arpitsec/ThreatCorrelationEngine

Threat Correlation Engine

A small, self-contained threat correlation engine written in pure Python. It parses raw authentication logs, runs a set of rule-based correlation checks, enriches every suspicious source IP with reputation / geolocation data, and produces a triage-ready report, all from the command line.

It's intentionally minimal so the detection logic stays easy to read, but it's structured the way a real SIEM correlation tool is: a parser layer, a detection layer, an enrichment layer, and an orchestration layer that ties them together.


Why I built this

I wanted a hands-on project to practice the basics of detection engineering: writing parsers that don't break on the first malformed line, expressing correlation rules as small composable functions, and thinking carefully about thresholds and false positives. The goal isn't to compete with Splunk; it's to learn how the moving parts fit together.


Features

  • Regex-based log parser that emits structured LogEvent records (timestamp, level, event type, user, source IP).
  • Five correlation rules covering brute-force, credential compromise, username spraying, distributed attacks against a single user, and off-hours access.
  • Severity-tagged alerts (INFO / LOW / MEDIUM / HIGH / CRITICAL) that can be filtered from the CLI.
  • IP enrichment via the free ip-api.com endpoint, with private-IP detection and an on-disk cache so you don't re-hit the API for IPs you've already looked up.
  • Multi-format output: human-readable text, Markdown, or JSON for piping into other tooling.
  • No heavy dependencies: the only third-party package is requests.

Architecture

                ┌────────────┐
   log.txt ───▶ │  parser.py │ ──▶  list[LogEvent]
                └────────────┘
                       │
                       ▼
              ┌──────────────────────┐
              │ detection_engine.py  │ ──▶ list[Alert]
              │  R1 brute-force      │
              │ R2 success-after-fail│
              │  R3 username spray   │
              │  R4 distributed BF   │
              │  R5 off-hours login  │
              └──────────────────────┘
                       │
                       ▼
                ┌────────────────┐
                │  ip_lookup.py  │ ──▶ dict[ip → IPInfo]
                └────────────────┘
                       │
                       ▼
              ┌────────────────────┐
              │ auto_investigate.py│ ──▶ text / markdown / json report
              └────────────────────┘

Install

git clone https://github.com/<your-user>/ThreatCorrelationEngine.git
cd ThreatCorrelationEngine
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Python 3.10 or newer is required (the code uses modern type hints).


Usage

The simplest way to run the whole pipeline:

python auto_investigate.py log.txt

You'll get a triage report listing every alert that fired, plus geo/ISP info for each suspicious IP.

Each module also runs standalone

Parse a log file and see structured events plus a summary:

python parser.py log.txt
python parser.py log.txt --json
python parser.py log.txt --summary-only
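
As a rough illustration of what "structured events" means here, the following is a minimal sketch of a regex parser that skips malformed lines instead of crashing. The log line format, the regex, and the parse_line / parse_lines helpers are assumptions made for this example; the actual format of log.txt and the code in parser.py may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical line format (an assumption, not taken from the repository):
#   2026-04-12 08:15:47 WARNING LOGIN_FAILED user=bob ip=203.0.113.45
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<event_type>[A-Z_]+) "
    r"user=(?P<user>\S+) "
    r"ip=(?P<source_ip>\S+)$"
)


@dataclass(frozen=True)
class LogEvent:
    timestamp: datetime
    level: str
    event_type: str
    user: str
    source_ip: str


def parse_line(line: str) -> LogEvent | None:
    """Return a LogEvent, or None if the line does not fit the assumed format."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    try:
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None  # digits matched the regex but are not a real date
    return LogEvent(ts, match["level"], match["event_type"],
                    match["user"], match["source_ip"])


def parse_lines(lines):
    """Parse an iterable of lines; return (events, number_of_skipped_lines)."""
    events, skipped = [], 0
    for line in lines:
        event = parse_line(line)
        if event is None:
            skipped += 1
        else:
            events.append(event)
    return events, skipped


sample = [
    "2026-04-12 08:15:47 WARNING LOGIN_FAILED user=bob ip=203.0.113.45",
    "this line is malformed",
]
events, skipped = parse_lines(sample)
```

Counting skipped lines rather than raising keeps one bad line from aborting the whole run while still making data-quality problems visible in the summary.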

Run only the detection engine:

python detection_engine.py log.txt
python detection_engine.py log.txt --min-severity HIGH
python detection_engine.py log.txt --json
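
One way a --min-severity filter can work is to give the severity levels a numeric order and compare against it. This is a sketch under assumptions: the Severity enum, the dict-shaped alerts, and filter_by_min_severity are illustrative names, not the ones used in detection_engine.py.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Ordered so that a plain comparison expresses "at least this severe".
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def filter_by_min_severity(alerts, min_severity):
    """Keep alerts whose severity is at or above the named threshold."""
    threshold = Severity[min_severity.upper()]
    return [a for a in alerts if Severity[a["severity"]] >= threshold]


alerts = [
    {"rule": "R5", "severity": "MEDIUM"},
    {"rule": "R1", "severity": "HIGH"},
    {"rule": "R2", "severity": "CRITICAL"},
]
kept = filter_by_min_severity(alerts, "HIGH")
```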

Look up a single IP:

python ip_lookup.py 8.8.8.8
python ip_lookup.py 8.8.8.8 --json
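
The private-IP check and on-disk cache could be combined roughly as below. This is a sketch, not the code in ip_lookup.py: lookup_ip, the cache file name, and the injected fetch callable are assumptions, and the stub stands in for the real HTTP request so the example runs offline.

```python
import ipaddress
import json
import tempfile
from pathlib import Path


def lookup_ip(ip, cache_path, fetch):
    """Return enrichment data for `ip`, consulting an on-disk JSON cache first.

    `fetch` is any callable that takes an IP string and returns a dict. It is
    injected so this sketch needs no network access.
    """
    # is_private covers RFC 1918 ranges plus loopback and other reserved
    # blocks, none of which have meaningful geolocation.
    if ipaddress.ip_address(ip).is_private:
        return {"ip": ip, "private": True}

    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if ip not in cache:
        cache[ip] = {"ip": ip, "private": False, **fetch(ip)}
        cache_path.write_text(json.dumps(cache, indent=2))
    return cache[ip]


calls = []


def fake_fetch(ip):
    """Stand-in for the real API call; records which IPs were requested."""
    calls.append(ip)
    return {"country": "Exampleland", "isp": "Example ISP"}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "ip_cache.json"
    first = lookup_ip("8.8.8.8", cache_file, fake_fetch)
    second = lookup_ip("8.8.8.8", cache_file, fake_fetch)     # served from cache
    internal = lookup_ip("10.0.0.5", cache_file, fake_fetch)  # never fetched
```

Checking for private addresses before touching the cache or the network means internal hosts never count against the API's rate limit.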

Generate a Markdown report and skip external lookups (offline mode):

python auto_investigate.py log.txt --format markdown -o report.md --skip-lookup
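
To show how a single list of alerts can back all three output formats, here is a minimal sketch. The render_report function and the alert fields (severity, rule, message) are hypothetical, and the real report from auto_investigate.py contains more detail than this.

```python
import json


def render_report(alerts, fmt="text"):
    """Render a list of alert dicts as text, markdown, or JSON."""
    if fmt == "json":
        return json.dumps({"total_alerts": len(alerts), "alerts": alerts}, indent=2)
    if fmt == "markdown":
        lines = ["# Auto-investigate report", "", f"Total alerts: {len(alerts)}", ""]
        lines += [f"- **{a['severity']}** {a['rule']}: {a['message']}" for a in alerts]
        return "\n".join(lines)
    if fmt == "text":
        lines = [f"Total alerts: {len(alerts)}"]
        # Pad the severity to 8 characters so the brackets line up.
        lines += [f"[{a['severity']:<8}] {a['rule']} {a['message']}" for a in alerts]
        return "\n".join(lines)
    raise ValueError(f"unknown format: {fmt!r}")


alerts = [{"severity": "HIGH", "rule": "R1", "message": "Brute-force login attempts"}]
text_report = render_report(alerts, "text")
json_report = render_report(alerts, "json")
markdown_report = render_report(alerts, "markdown")
```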

Detection rules

ID  Rule                               Severity  What it catches
R1  Brute-force login attempts         HIGH      >=5 failed logins from one IP within 5 minutes
R2  Successful login after failures    CRITICAL  A burst of failures immediately followed by a success from the same IP
R3  Username spraying                  HIGH      One IP failing across >=3 distinct usernames
R4  Distributed brute-force on a user  HIGH      One user account being attacked from >=3 distinct IPs
R5  Off-hours successful login         MEDIUM    A successful login between 22:00 and 05:59

All thresholds live near the top of detection_engine.py so you can tune them without digging through the rule bodies.
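
To make two of these rules concrete, here is a sketch of R1 as a per-IP sliding window and R5 as a time-of-day check that wraps past midnight. The constant names, function names, and the (timestamp, ip) tuple input are assumptions for illustration; the rule bodies in detection_engine.py may be written differently.

```python
from collections import defaultdict, deque
from datetime import datetime, time, timedelta

# Values follow the rule table; the constant names are hypothetical.
BRUTE_FORCE_MIN_FAILURES = 5
BRUTE_FORCE_WINDOW = timedelta(minutes=5)
OFF_HOURS_START = time(22, 0)
OFF_HOURS_END = time(6, 0)  # exclusive, so 05:59 is the last off-hours minute


def brute_force_ips(failures):
    """failures: iterable of (timestamp, source_ip) for failed logins, any order.

    Returns the set of IPs with at least BRUTE_FORCE_MIN_FAILURES failures
    inside some window of length BRUTE_FORCE_WINDOW.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in sorted(failures):
        window = recent[ip]
        window.append(ts)
        # Drop failures that are too old relative to the newest one.
        while ts - window[0] > BRUTE_FORCE_WINDOW:
            window.popleft()
        if len(window) >= BRUTE_FORCE_MIN_FAILURES:
            flagged.add(ip)
    return flagged


def is_off_hours(ts):
    """True for 22:00 through 05:59; the range wraps past midnight, hence `or`."""
    t = ts.time()
    return t >= OFF_HOURS_START or t < OFF_HOURS_END


base = datetime(2026, 4, 12, 8, 15, 0)
fast = [(base + timedelta(seconds=10 * i), "203.0.113.45") for i in range(6)]
slow = [(base + timedelta(minutes=10 * i), "198.51.100.7") for i in range(6)]
flagged = brute_force_ips(fast + slow)
```

The second IP produces the same number of failures but spread over 50 minutes, so only the first one is flagged. That difference is the reason for using a time window rather than a plain count.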


Sample output

============================================================
THREAT CORRELATION ENGINE - AUTO-INVESTIGATE REPORT
============================================================
Total alerts: 4

[CRITICAL] R2 Successful login after failure burst
   203.0.113.45 succeeded after 6 consecutive failures (possible credential compromise).
   window : 2026-04-12 08:15:47 -> 2026-04-12 08:16:10
   ips    : 203.0.113.45
   users  : bob

[HIGH    ] R1 Brute-force login attempts
   203.0.113.45 produced 6 failed logins within 5 minutes.
   ...

--- IP Intelligence ---
  203.0.113.45        United States / Ashburn  (Example ISP)
  198.51.100.7        Germany / Frankfurt      (Example Hosting)
  ...

Project layout

.
├── parser.py             # Log parsing + summary
├── detection_engine.py   # Correlation rules + severity tagging
├── ip_lookup.py          # IP reputation / geo lookup with caching
├── auto_investigate.py   # End-to-end orchestration + reporting
├── log.txt               # Sample log used for testing
├── requirements.txt
└── README.md

Limitations

  • Pure rule-based detection, no ML or behavioural baselining.
  • Only the small handful of rules listed above; production SIEMs ship with hundreds.
  • IP enrichment depends on a free public API and its rate limits.
  • No real-time ingestion; this is a batch tool that reads a file.

Future improvements

  • Pluggable rule registry so new rules can be dropped into a folder.
  • YAML/JSON config for thresholds and time windows.
  • Streaming mode that tails a log file in real time.
  • Threat-intel integration (AbuseIPDB, OTX, etc.).
  • Simple web dashboard for browsing alerts.

License

Released under the MIT License. Feel free to reuse and adapt.

Author

Arpit Dhameliya. Built as a learning project around detection engineering and log correlation.
