Read our announcement blog post
Matano is an open source cloud-native alternative to SIEM, built for security teams on AWS.
We are on a mission to build the first open platform for threat hunting, detection & response, and cybersecurity analytics at petabyte scale.
- Security Data Lake: Matano normalizes unstructured security logs into a structured realtime data lake in your AWS account.
- Collect All Your Logs: Matano integrates out of the box with 50+ sources for security logs and can easily be extended with custom sources.
- Detection-as-Code: Use Python to build realtime detections as code. Matano also supports automatic import of Sigma detections.
- Log Transformation Pipeline: Matano supports custom VRL (Vector Remap Language) scripting to parse, enrich, normalize and transform your logs as they are ingested without managing any servers.
- No Vendor Lock-In: Matano uses an open table format (Apache Iceberg) and open schema standards (ECS) to give you full ownership of your security data in a vendor-neutral format.
- Bring Your Own Analytics: Query your security lake directly from any Iceberg-compatible engine (AWS Athena, Snowflake, Spark, Trino etc.) without having to copy data around.
- Serverless: Matano is fully serverless, designed specifically for AWS, and focused on high scale, low cost, and zero ops.
- Reduce SIEM costs drastically (1/10th the cost).
- Augment your SIEM with a security data lake for additional context during investigations.
- Instantly search for matches to IOCs across your data lake using standardized fields.
- Write detections-as-code using Python to detect suspicious behavior & create contextualized alerts.
- Easier to use cloud-native open source SIEM alternative for detection & response.
- ECS-compatible serverless alternative to ELK / Elastic Security stack.
- AbuseCH (URLhaus, MalwareBazaar, ThreatFox)
- AlienVault OTX
- MaxMind GeoIP (coming soon)
- GreyNoise Intelligence (coming soon)
- Custom (coming soon)
- Amazon Athena (default)
- Snowflake (preview)
- Spark
- Trino
- BigQuery Omni (BigLake)
- Dremio
View the complete installation instructions
Install the matano CLI to deploy Matano into your AWS account, and manage your Matano deployment.
Linux
curl -OL https://github.com/matanolabs/matano/releases/download/nightly/matano-linux-x64.sh
chmod +x matano-linux-x64.sh
sudo ./matano-linux-x64.sh
macOS
curl -OL https://github.com/matanolabs/matano/releases/download/nightly/matano-macos-x64.sh
chmod +x matano-macos-x64.sh
sudo ./matano-macos-x64.sh
Read the complete docs on getting started
To get started with Matano, run the matano init command.
- Make sure you have AWS credentials in your environment (or in an AWS CLI profile).
- The interactive CLI wizard will walk you through getting started by generating an initial Matano directory for you, initializing your AWS account, and deploying Matano into your AWS account.
- Initial deployment takes a few minutes.
Once initialized, your Matano directory is used to control & manage all resources in your project e.g. log sources, detections, and other configuration. It is structured as follows:
➜  example-matano-dir git:(main) tree
├── detections
│   └── aws_root_credentials
│       ├── detect.py
│       └── detection.yml
├── log_sources
│   ├── cloudtrail
│   │   ├── log_source.yml
│   │   └── tables
│   │       └── default.yml
│   └── zeek
│       ├── log_source.yml
│       └── tables
│           └── dns.yml
├── matano.config.yml
└── matano.context.json
When onboarding a new log source or authoring a detection, run matano deploy from anywhere in your project to deploy the changes to your account.
Read the complete docs on configuring custom log sources
Matano uses Vector Remap Language (VRL) to let users easily onboard custom log sources. It encourages you to normalize fields according to the Elastic Common Schema (ECS), which enables enhanced pivoting and bulk search for IOCs across your security data lake.
Users can define custom VRL programs to parse and transform unstructured logs as they are being ingested through one of the supported mechanisms for a log source (e.g. S3, SQS).
VRL is an expression-oriented language designed for transforming observability data (e.g. logs) in a safe and performant manner. It features a simple syntax and a rich set of built-in functions tailored specifically to observability use cases.
Let's have a look at a simple example. Imagine that you're working with HTTP log events that look like this:
{
  "line": "{\"status\":200,\"srcIpAddress\":\"1.1.1.1\",\"message\":\"SUCCESS\",\"username\":\"ub40fan4life\"}"
}
You want to apply these changes to each event:
- Parse the raw line string into JSON, and explode the fields to the top level
- Rename srcIpAddress to the source.ip ECS field
- Remove the username field
- Convert the message to lowercase
Adding this VRL program to your log source as a transform step would accomplish all of that:
transform: |
  . = object!(parse_json!(string!(.json.line)))
  .source.ip = del(.srcIpAddress)
  del(.username)
  .message = downcase(string!(.message))

schema:
  ecs_field_names:
    - source.ip
    - http.status
The resulting event:
{
  "message": "success",
  "status": 200,
  "source": {
    "ip": "1.1.1.1"
  }
}
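For readers more familiar with Python than VRL, the four steps above can be mirrored in plain Python. This is only an illustrative sketch: Matano runs the VRL program itself, the `transform` function name is hypothetical, and the sketch assumes the event is exactly the dictionary shown in the example (with `line` at the top level).

```python
import json


def transform(event: dict) -> dict:
    """Plain-Python mirror of the four VRL steps (illustration only)."""
    # 1. Parse the raw `line` string into JSON and promote its fields to the top level.
    record = json.loads(event["line"])
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object in 'line'")
    # 2. Rename srcIpAddress to the nested ECS field source.ip.
    record.setdefault("source", {})["ip"] = record.pop("srcIpAddress", None)
    # 3. Remove the username field.
    record.pop("username", None)
    # 4. Lowercase the message (fails if the message is not a string).
    record["message"] = record["message"].lower()
    return record


raw = {
    "line": '{"status":200,"srcIpAddress":"1.1.1.1","message":"SUCCESS","username":"ub40fan4life"}'
}
result = transform(raw)
print(json.dumps(result, indent=2, sort_keys=True))
```

The printed dictionary contains the same three fields as the resulting event above: `message`, `status`, and the nested `source.ip`.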
Read the complete docs on detections
Use Matano detections to define rules that can alert on threats in your security logs. Matano users define detections as code (DaC). A detection is a Python program that is invoked with data from a log source in realtime and can create an alert.
# Detect failed attempts to export an EC2 instance.
def detect(record):
    return (
        record.deepget("event.action") == "CreateInstanceExportTask"
        and record.deepget("event.provider") == "ec2.amazonaws.com"
        and record.deepget("event.outcome") == "failure"
    )
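A detection like this can be exercised locally before deploying. The sketch below is an assumption-laden illustration: `FakeRecord` is a hypothetical stand-in for the record object Matano passes to `detect`, and it only imitates the `deepget` dotted-path lookup used in the example above, not the real implementation.

```python
class FakeRecord(dict):
    """Hypothetical stand-in for the record Matano passes to detect()."""

    def deepget(self, path, default=None):
        # Walk a dotted path such as "event.action" through nested dicts.
        node = self
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node


def detect(record):
    return (
        record.deepget("event.action") == "CreateInstanceExportTask"
        and record.deepget("event.provider") == "ec2.amazonaws.com"
        and record.deepget("event.outcome") == "failure"
    )


failed_export = FakeRecord(
    {
        "event": {
            "action": "CreateInstanceExportTask",
            "provider": "ec2.amazonaws.com",
            "outcome": "failure",
        }
    }
)
unrelated = FakeRecord(
    {"event": {"action": "DescribeInstances", "outcome": "success"}}
)

print(detect(failed_export))  # True
print(detect(unrelated))  # False
```

Keeping the fake record this small makes it easy to feed sample events from your own logs into a detection while iterating on the rule logic.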
# Detect failed authentication events.
def detect(r):
    return (
        "authentication" in r.deepget("event.category", [])
        and r.deepget("event.outcome") == "failure"
    )


# Title used for the resulting alert.
def title(r):
    return f"Multiple failed logins from {r.deepget('user.full_name')} - {r.deepget('source.ip')}"


# Group rule matches by source IP.
def dedupe(r):
    return r.deepget("source.ip")
---
tables:
  - aws_cloudtrail
  - okta_system
  - o365_audit
alert:
  severity: medium
  threshold: 5
  deduplication_window_minutes: 15
  destinations:
    - slack_my_team
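To build intuition for how `threshold` and `deduplication_window_minutes` interact, here is a small sketch of one plausible reading: an alert is considered breached once a dedupe key collects at least `threshold` rule matches within the window. The sliding-window logic and the `breached_keys` helper are assumptions made for illustration; Matano's actual windowing semantics may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta

THRESHOLD = 5  # mirrors alert.threshold
WINDOW = timedelta(minutes=15)  # mirrors alert.deduplication_window_minutes


def breached_keys(matches):
    """Return dedupe keys with at least THRESHOLD matches inside one WINDOW.

    `matches` is an iterable of (timestamp, dedupe_key) pairs.
    """
    by_key = defaultdict(list)
    for ts, key in matches:
        by_key[key].append(ts)

    breached = set()
    for key, stamps in by_key.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            # Advance the left edge until the span fits inside WINDOW.
            while ts - stamps[start] > WINDOW:
                start += 1
            if end - start + 1 >= THRESHOLD:
                breached.add(key)
                break
    return breached


t0 = datetime(2023, 1, 1, 12, 0)
# Five failed logins from one IP within four minutes.
matches = [(t0 + timedelta(minutes=i), "10.0.0.1") for i in range(5)]
# Five failed logins from another IP, spread twenty minutes apart.
matches += [(t0 + timedelta(minutes=20 * i), "10.0.0.2") for i in range(5)]

print(breached_keys(matches))  # {'10.0.0.1'}
```

Under this reading, only the burst from `10.0.0.1` crosses the threshold; the slower activity from `10.0.0.2` never accumulates five matches inside a single 15-minute span.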
from detection import remotecache

# a cache of user -> ip[]
user_to_ips = remotecache("user_ip")


def detect(record):
    if (
        record.deepget("event.action") == "ConsoleLogin"
        and record.deepget("event.outcome") == "success"
    ):
        # A unique key on the user name
        user = record.deepget("user.name")

        existing_ips = user_to_ips[user] or []
        updated_ips = user_to_ips.add_to_string_set(
            user,
            record.deepget("source.ip")
        )

        # Alert on new IPs
        new_ips = set(updated_ips) - set(existing_ips)
        if existing_ips and new_ips:
            return True
    return False
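The stateful logic above can be tried out locally with in-memory stand-ins. In this sketch, `FakeRemoteCache` and `FakeRecord` are hypothetical classes that imitate only the operations the example uses (item lookup, `add_to_string_set`, and `deepget`). Their behavior, such as a missing key yielding `None`, is an assumption inferred from the example, not a description of Matano's remote cache.

```python
class FakeRemoteCache:
    """Hypothetical in-memory stand-in for the object returned by remotecache()."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        # Assumed: a missing key yields None (the example uses `... or []`).
        members = self._data.get(key)
        return list(members) if members is not None else None

    def add_to_string_set(self, key, value):
        # Assumed: adds the value and returns the updated members.
        members = self._data.setdefault(key, [])
        if value not in members:
            members.append(value)
        return list(members)


class FakeRecord(dict):
    """Hypothetical record supporting dotted-path lookups."""

    def deepget(self, path, default=None):
        node = self
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node


user_to_ips = FakeRemoteCache()


def detect(record):
    if (
        record.deepget("event.action") == "ConsoleLogin"
        and record.deepget("event.outcome") == "success"
    ):
        user = record.deepget("user.name")
        existing_ips = user_to_ips[user] or []
        updated_ips = user_to_ips.add_to_string_set(user, record.deepget("source.ip"))
        new_ips = set(updated_ips) - set(existing_ips)
        if existing_ips and new_ips:
            return True
    return False


def login(user, ip):
    # Build a minimal successful ConsoleLogin event for the given user and IP.
    return FakeRecord(
        {
            "event": {"action": "ConsoleLogin", "outcome": "success"},
            "user": {"name": user},
            "source": {"ip": ip},
        }
    )


results = [
    detect(login("alice", "1.1.1.1")),  # first login ever: no alert
    detect(login("alice", "1.1.1.1")),  # same IP again: no alert
    detect(login("alice", "2.2.2.2")),  # new IP for a known user: alert
]
print(results)  # [False, False, True]
```

Note that the first login for a user never alerts, because there is no prior history to compare against; only a new IP for an already-seen user triggers the detection.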
Read the complete docs on alerting
All alerts are automatically stored in a Matano table named matano_alerts. The alerts and rule matches are normalized to ECS and contain context about the original event that triggered the rule match, along with the alert and rule data.
Example Queries
Summarize alerts in the last week that have breached threshold
select
  matano.alert.id as alert_id,
  matano.alert.rule.name as rule_name,
  max(matano.alert.title) as title,
  count(*) as match_count,
  min(matano.alert.first_matched_at) as first_matched_at,
  max(ts) as last_matched_at,
  array_distinct(flatten(array_agg(related.ip))) as related_ip,
  array_distinct(flatten(array_agg(related.user))) as related_user,
  array_distinct(flatten(array_agg(related.hosts))) as related_hosts,
  array_distinct(flatten(array_agg(related.hash))) as related_hash
from
  matano_alerts
where
  matano.alert.first_matched_at > (current_timestamp - interval '7' day)
  and matano.alert.breached = true
group by
  matano.alert.rule.name,
  matano.alert.id
order by
  last_matched_at desc
Matano allows you to deliver alerts to external systems. You can use the Matano alerting SNS topic to deliver alerts to Email, Slack, and other services.
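One way to consume alerts from an SNS topic is an AWS Lambda function subscribed to it. The sketch below relies only on the standard SNS-to-Lambda event envelope (`Records[].Sns.Message`). It assumes the alert arrives as a JSON string in the message body and deliberately makes no assumptions about the fields inside the alert; the `extract_alerts` helper and the sample payload are hypothetical.

```python
import json


def extract_alerts(event: dict) -> list:
    """Decode the message body of every SNS record in a Lambda event."""
    alerts = []
    for record in event.get("Records", []):
        message = record.get("Sns", {}).get("Message", "")
        try:
            alerts.append(json.loads(message))
        except json.JSONDecodeError:
            # Keep non-JSON messages as raw text rather than dropping them.
            alerts.append({"raw": message})
    return alerts


def handler(event, context):
    # Lambda entry point: log each decoded alert and report how many were seen.
    alerts = extract_alerts(event)
    for alert in alerts:
        print(json.dumps(alert))
    return {"alerts": len(alerts)}


# Local smoke test with a hand-built SNS event (placeholder payload).
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"example_field": "example value"})}},
        {"Sns": {"Message": "plain text notification"}},
    ]
}
summary = handler(sample_event, None)
print(summary)  # {'alerts': 2}
```

From here, the decoded alerts could be forwarded to whichever system your team uses, such as a chat webhook or a ticketing API.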
- Traditional tools used to analyze security data (SIEMs) don't scale, and are too expensive and difficult to manage for cloud-based security teams.
- Cybersecurity vendors lock your data in proprietary formats, which makes it difficult to use outside of their product. With Matano, all your data is in open Apache Iceberg tables that can be directly queried from different tools (AWS Athena, Snowflake, etc.) without having to copy any data.
- Security is a Big Data problem: collecting data from your network, SaaS, and cloud environments can exceed 100TBs of data. Security teams are forced to either not collect some data, leave data unprocessed, or build an in-house data lake to cost-effectively analyze large datasets. Matano helps you easily build a security data lake with all features needed for detection and response.
- At scale, without a strategy to normalize data into a structured format, it is difficult to correlate across data sources & build effective alerts that don't create many false positives. Traditional SIEM query-based rules fail to accurately identify threats. Matano's detection-as-code approach offers greater flexibility and helps you harden your detections over time.
For general help on using Matano, please refer to the official Matano documentation. For additional help, feel free to use one of these channels to ask a question:
- Discord (Come join the Matano family, and hang out with the team and community)
- Forum (For deeper conversations about features, the project, or problems)
- GitHub (Bug reports, Contributions)
- Twitter (Get news hot off the press)
Thanks go to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind are welcome!