A small, self-contained threat correlation engine written in pure Python. It parses raw authentication logs, runs a set of rule-based correlation checks, enriches every suspicious source IP with reputation / geolocation data, and produces a triage-ready report, all from the command line.
It's intentionally minimal so the detection logic stays easy to read, but it's structured the way a real SIEM correlation tool is: a parser layer, a detection layer, an enrichment layer, and an orchestration layer that ties them together.
I wanted a hands-on project to practice the basics of detection engineering: writing parsers that don't break on the first malformed line, expressing correlation rules as small composable functions, and thinking carefully about thresholds and false positives. The goal isn't to compete with Splunk; it's to learn how the moving parts fit together.
- Regex-based log parser that emits structured `LogEvent` records (timestamp, level, event type, user, source IP).
- Five correlation rules covering brute-force, credential compromise, username spraying, distributed attacks against a single user, and off-hours access.
- Severity-tagged alerts (INFO / LOW / MEDIUM / HIGH / CRITICAL) that can be filtered from the CLI.
- IP enrichment via the free `ip-api.com` endpoint, with private-IP detection and an on-disk cache so you don't re-hit the API for IPs you've already looked up.
- Multi-format output: human-readable text, Markdown, or JSON for piping into other tooling.
- Zero heavy dependencies, only `requests`.
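
To make the parser feature concrete, here is a minimal sketch of a regex parser that skips malformed lines instead of crashing on them. The line layout, the regex, the event-type labels, the field names, and the `parse_lines` helper are all assumptions made for illustration; the real `parser.py` may expect a different log format.

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime

# Hypothetical line layout (an assumption, not the project's real format):
#   2026-04-12 08:15:47 WARNING LOGIN_FAILED user=bob ip=203.0.113.45
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<event_type>[A-Z_]+)\s+"
    r"user=(?P<user>\S+)\s+"
    r"ip=(?P<source_ip>\S+)$"
)


@dataclass(frozen=True)
class LogEvent:
    timestamp: datetime
    level: str
    event_type: str
    user: str
    source_ip: str


def parse_lines(lines: Iterable[str]) -> tuple[list[LogEvent], int]:
    """Return (events, skipped), where skipped counts unparseable lines."""
    events: list[LogEvent] = []
    skipped = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            skipped += 1  # wrong shape: count it and move on
            continue
        try:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            skipped += 1  # right shape, impossible date (e.g. month 13)
            continue
        events.append(
            LogEvent(ts, match["level"], match["event_type"],
                     match["user"], match["source_ip"])
        )
    return events, skipped
```

Returning the skipped count alongside the events keeps bad input visible in a summary without letting one broken line stop the run.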

Data flow through the four layers:

                 ┌────────────┐
    log.txt ───▶ │ parser.py  │ ──▶ list[LogEvent]
                 └────────────┘
                        │
                        ▼
                 ┌──────────────────────┐
                 │ detection_engine.py  │ ──▶ list[Alert]
                 │ R1 brute-force       │
                 │ R2 success-after-fail│
                 │ R3 username spray    │
                 │ R4 distributed BF    │
                 │ R5 off-hours login   │
                 └──────────────────────┘
                        │
                        ▼
                 ┌────────────────┐
                 │ ip_lookup.py   │ ──▶ dict[ip → IPInfo]
                 └────────────────┘
                        │
                        ▼
                 ┌────────────────────┐
                 │ auto_investigate.py│ ──▶ text / markdown / json report
                 └────────────────────┘

Installation:

    git clone https://github.com/<your-user>/ThreatCorrelationEngine.git
    cd ThreatCorrelationEngine
    python -m venv .venv
    source .venv/bin/activate   # Windows: .venv\Scripts\activate
    pip install -r requirements.txt

Python 3.10 or newer is required (the code uses modern type hints).
The simplest way to run the whole pipeline:

    python auto_investigate.py log.txt

You'll get a triage report listing every alert that fired, plus geo/ISP info for each suspicious IP.
Parse a log file and see structured events plus a summary:

    python parser.py log.txt
    python parser.py log.txt --json
    python parser.py log.txt --summary-only

Run only the detection engine:

    python detection_engine.py log.txt
    python detection_engine.py log.txt --min-severity HIGH
    python detection_engine.py log.txt --json

Look up a single IP:

    python ip_lookup.py 8.8.8.8
    python ip_lookup.py 8.8.8.8 --json

Generate a Markdown report and skip external lookups (offline mode):

    python auto_investigate.py log.txt --format markdown -o report.md --skip-lookup

| ID | Rule | Severity | What it catches |
|---|---|---|---|
| R1 | Brute-force login attempts | HIGH | >=5 failed logins from one IP within 5 minutes |
| R2 | Successful login after failures | CRITICAL | A burst of failures immediately followed by a success from the same IP |
| R3 | Username spraying | HIGH | One IP failing across >=3 distinct usernames |
| R4 | Distributed brute-force on a user | HIGH | One user account being attacked from >=3 distinct IPs |
| R5 | Off-hours successful login | MEDIUM | A successful login between 22:00 and 05:59 |
All thresholds live near the top of detection_engine.py so you can
tune them without digging through the rule bodies.
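
As a rough illustration of how R1 and R5 from the table could be written as small functions with their thresholds kept at module level, here is a sketch. The constant names, the `Event` and `Alert` dataclasses, the `LOGIN_FAILED` / `LOGIN_SUCCESS` labels, and `run_rules` are hypothetical and not taken from `detection_engine.py`. The sketch also fires R1 once per IP, at the moment the threshold is first reached, which may differ from how the real rule counts.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum

# Thresholds grouped at the top so they can be tuned in one place.
BRUTE_FORCE_THRESHOLD = 5                  # failed logins per IP
BRUTE_FORCE_WINDOW = timedelta(minutes=5)  # sliding window length
OFF_HOURS_START = 22                       # 22:00 and later counts as off-hours
OFF_HOURS_END = 6                          # up to and including 05:59


class Severity(IntEnum):
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    event_type: str  # hypothetical labels: "LOGIN_FAILED" / "LOGIN_SUCCESS"
    user: str
    source_ip: str


@dataclass(frozen=True)
class Alert:
    rule_id: str
    severity: Severity
    message: str


def rule_brute_force(events: list[Event]) -> list[Alert]:
    """R1: threshold or more failures from one IP inside the window."""
    alerts: list[Alert] = []
    windows: dict[str, deque] = defaultdict(deque)
    flagged: set[str] = set()
    minutes = int(BRUTE_FORCE_WINDOW.total_seconds() // 60)
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.event_type != "LOGIN_FAILED":
            continue
        window = windows[ev.source_ip]
        window.append(ev.timestamp)
        # Drop failures that have aged out of the sliding window.
        while ev.timestamp - window[0] > BRUTE_FORCE_WINDOW:
            window.popleft()
        if len(window) >= BRUTE_FORCE_THRESHOLD and ev.source_ip not in flagged:
            flagged.add(ev.source_ip)  # one alert per IP in this sketch
            alerts.append(Alert(
                "R1", Severity.HIGH,
                f"{ev.source_ip} produced {len(window)} failed logins "
                f"within {minutes} minutes.",
            ))
    return alerts


def rule_off_hours(events: list[Event]) -> list[Alert]:
    """R5: successful login between 22:00 and 05:59."""
    return [
        Alert("R5", Severity.MEDIUM,
              f"{ev.user} logged in at {ev.timestamp:%H:%M} from {ev.source_ip}.")
        for ev in events
        if ev.event_type == "LOGIN_SUCCESS"
        and (ev.timestamp.hour >= OFF_HOURS_START or ev.timestamp.hour < OFF_HOURS_END)
    ]


RULES = (rule_brute_force, rule_off_hours)


def run_rules(events: list[Event], min_severity: Severity = Severity.INFO) -> list[Alert]:
    """Run every rule, then keep alerts at or above min_severity."""
    alerts = [alert for rule in RULES for alert in rule(events)]
    return [alert for alert in alerts if alert.severity >= min_severity]
```

Using an `IntEnum` for severity makes a filter like `--min-severity HIGH` a plain `>=` comparison, and keeping each rule as a function from events to alerts means a new rule only has to be added to `RULES`.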

Example text report:

    ============================================================
    THREAT CORRELATION ENGINE - AUTO-INVESTIGATE REPORT
    ============================================================
    Total alerts: 4
    [CRITICAL] R2 Successful login after failure burst
    203.0.113.45 succeeded after 6 consecutive failures (possible credential compromise).
    window : 2026-04-12 08:15:47 -> 2026-04-12 08:16:10
    ips : 203.0.113.45
    users : bob
    [HIGH ] R1 Brute-force login attempts
    203.0.113.45 produced 6 failed logins within 5 minutes.
    ...
    --- IP Intelligence ---
    203.0.113.45 United States / Ashburn (Example ISP)
    198.51.100.7 Germany / Frankfurt (Example Hosting)
    ...

Project layout:

    .
    ├── parser.py             # Log parsing + summary
    ├── detection_engine.py   # Correlation rules + severity tagging
    ├── ip_lookup.py          # IP reputation / geo lookup with caching
    ├── auto_investigate.py   # End-to-end orchestration + reporting
    ├── log.txt               # Sample log used for testing
    ├── requirements.txt
    └── README.md

Known limitations:
- Pure rule-based detection, no ML or behavioural baselining.
- Only the small handful of rules listed above; production SIEMs ship with hundreds.
- IP enrichment depends on a free public API and its rate limits.
- No real-time ingestion; this is a batch tool that reads a file.
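
Since enrichment leans on a rate-limited public API, the private-IP check and the on-disk cache are what keep request counts down. The sketch below shows one way that combination could look. `CachedLookup`, the JSON cache layout, and the injected `fetch` callable (standing in for the real HTTP call) are assumptions for illustration, not the code in `ip_lookup.py`.

```python
import ipaddress
import json
from pathlib import Path
from typing import Callable


def is_private(ip: str) -> bool:
    """True for private and other non-public ranges; raises ValueError on junk."""
    return ipaddress.ip_address(ip).is_private


class CachedLookup:
    """Wrap a lookup function with a private-IP guard and a JSON file cache."""

    def __init__(self, fetch: Callable[[str], dict], cache_path: Path):
        self.fetch = fetch            # hypothetical: performs the real API call
        self.cache_path = cache_path
        self.cache = self._load()

    def _load(self) -> dict:
        # A missing or corrupt cache file just means starting empty.
        try:
            return json.loads(self.cache_path.read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            return {}

    def lookup(self, ip: str) -> dict:
        if is_private(ip):
            # Nothing useful to ask a public API about an internal address.
            return {"ip": ip, "private": True}
        if ip not in self.cache:
            self.cache[ip] = self.fetch(ip)
            # Persist after each new result so a crash loses at most one lookup.
            self.cache_path.write_text(
                json.dumps(self.cache, indent=2), encoding="utf-8"
            )
        return self.cache[ip]
```

Passing the fetch function in, rather than calling the API directly, also makes an offline mode straightforward: hand in a function that returns a placeholder record and no network request is ever made.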
Possible future improvements:
- Pluggable rule registry so new rules can be dropped into a folder.
- YAML/JSON config for thresholds and time windows.
- Streaming mode that tails a log file in real time.
- Threat-intel integration (AbuseIPDB, OTX, etc.).
- Simple web dashboard for browsing alerts.
Released under the MIT License. Feel free to reuse and adapt.
Author: Arpit Dhameliya. Built as a learning project around detection engineering and log correlation.