Threat Correlation Engine

A small, self-contained threat correlation engine written in pure Python. It parses raw authentication logs, runs a set of rule-based correlation checks, enriches every suspicious source IP with geolocation and ISP data, and produces a triage-ready report, all from the command line.

It's intentionally minimal so the detection logic stays easy to read, but it's structured the way a real SIEM correlation tool is: a parser layer, a detection layer, an enrichment layer, and an orchestration layer that ties them together.

Why I built this

I wanted a hands-on project to practice the basics of detection engineering: writing parsers that don't break on the first malformed line, expressing correlation rules as small composable functions, and thinking carefully about thresholds and false positives. The goal isn't to compete with Splunk; it's to learn how the moving parts fit together.

Features

Regex-based log parser that emits structured LogEvent records (timestamp, level, event type, user, source IP).
Five correlation rules covering brute-force, credential compromise, username spraying, distributed attacks against a single user, and off-hours access.
Severity-tagged alerts (INFO / LOW / MEDIUM / HIGH / CRITICAL) that can be filtered from the CLI.
IP enrichment via the free ip-api.com endpoint, with private-IP detection and an on-disk cache so you don't re-hit the API for IPs you've already looked up.
Multi-format output: human-readable text, Markdown, or JSON for piping into other tooling.
No heavy dependencies: the only third-party package is requests.

Architecture

                ┌────────────┐
   log.txt ───▶ │  parser.py │ ──▶  list[LogEvent]
                └────────────┘
                       │
                       ▼
             ┌────────────────────────┐
             │ detection_engine.py    │ ──▶ list[Alert]
             │  R1 brute-force        │
             │  R2 success-after-fail │
             │  R3 username spray     │
             │  R4 distributed BF     │
             │  R5 off-hours login    │
             └────────────────────────┘
                       │
                       ▼
                ┌────────────────┐
                │  ip_lookup.py  │ ──▶ dict[ip → IPInfo]
                └────────────────┘
                       │
                       ▼
              ┌────────────────────┐
              │ auto_investigate.py│ ──▶ text / markdown / json report
              └────────────────────┘

Install

git clone https://github.com/<your-user>/ThreatCorrelationEngine.git
cd ThreatCorrelationEngine
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Python 3.10 or newer is required (the code uses modern type hints).

Usage

The simplest way to run the whole pipeline:

python auto_investigate.py log.txt

You'll get a triage report listing every alert that fired, plus geo/ISP info for each suspicious IP.

Each module also runs standalone

Parse a log file and see structured events plus a summary:

python parser.py log.txt
python parser.py log.txt --json
python parser.py log.txt --summary-only

Run only the detection engine:

python detection_engine.py log.txt
python detection_engine.py log.txt --min-severity HIGH
python detection_engine.py log.txt --json
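
One way a --min-severity filter can work is to give the five levels an explicit order and compare against a floor. The sketch below is only an assumption about the approach: the `Severity` enum, the `filter_by_min_severity` helper, and the dict-shaped alerts are hypothetical and not taken from detection_engine.py.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Order matches the levels listed under Features.
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def filter_by_min_severity(alerts: list[dict], minimum: str) -> list[dict]:
    """Keep alerts whose severity is at or above `minimum` (a level name)."""
    floor = Severity[minimum.upper()]
    return [a for a in alerts if Severity[a["severity"]] >= floor]


alerts = [
    {"rule": "R5", "severity": "MEDIUM"},
    {"rule": "R1", "severity": "HIGH"},
    {"rule": "R2", "severity": "CRITICAL"},
]
kept = filter_by_min_severity(alerts, "HIGH")
print([a["rule"] for a in kept])  # ['R1', 'R2']
```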

Look up a single IP:

python ip_lookup.py 8.8.8.8
python ip_lookup.py 8.8.8.8 --json
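
The private-IP check and on-disk cache mentioned under Features could be sketched as follows. This is an illustration under assumptions, not the code in ip_lookup.py: `is_private`, `lookup`, the JSON cache layout, and the injected `fetch` callable (standing in for the real HTTP call) are all hypothetical names.

```python
import ipaddress
import json
from pathlib import Path
from typing import Callable


def is_private(ip: str) -> bool:
    """True for addresses that should not be sent to a public lookup API."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def lookup(ip: str, fetch: Callable[[str], dict], cache_path: Path) -> dict:
    """Return info for `ip`, consulting the JSON cache before calling `fetch`.

    `fetch` is a stand-in for whatever performs the network request.
    """
    if is_private(ip):
        return {"ip": ip, "private": True}
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if ip not in cache:
        cache[ip] = fetch(ip)  # only reached on a cache miss
        cache_path.write_text(json.dumps(cache, indent=2))
    return cache[ip]
```

Passing the fetcher in as an argument keeps the caching logic testable offline, which fits the --skip-lookup style of workflow.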

Generate a Markdown report and skip external lookups (offline mode):

python auto_investigate.py log.txt --format markdown -o report.md --skip-lookup
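
For readers curious how one alert list can feed three output formats, here is a minimal sketch. The `render_report` function, the alert dict keys, and the exact layouts are assumptions for illustration; the real report in auto_investigate.py carries more detail (time window, IPs, users).

```python
import json


def render_report(alerts: list[dict], fmt: str = "text") -> str:
    """Render alerts as 'text', 'markdown', or 'json'."""
    if fmt == "json":
        return json.dumps({"total": len(alerts), "alerts": alerts}, indent=2)
    if fmt == "markdown":
        header = ["| Severity | Rule | Message |", "| --- | --- | --- |"]
        rows = [f"| {a['severity']} | {a['rule']} | {a['message']} |" for a in alerts]
        return "\n".join([f"Total alerts: {len(alerts)}", "", *header, *rows])
    if fmt == "text":
        # Pad the severity to 8 characters so the brackets line up.
        body = [f"[{a['severity']:<8}] {a['rule']} {a['message']}" for a in alerts]
        return "\n".join([f"Total alerts: {len(alerts)}", "", *body])
    raise ValueError(f"unknown format: {fmt!r}")
```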

Detection rules

ID	Rule	Severity	What it catches
R1	Brute-force login attempts	HIGH	>=5 failed logins from one IP within 5 minutes
R2	Successful login after failures	CRITICAL	A burst of failures immediately followed by a success from the same IP
R3	Username spraying	HIGH	One IP failing across >=3 distinct usernames
R4	Distributed brute-force on a user	HIGH	One user account being attacked from >=3 distinct IPs
R5	Off-hours successful login	MEDIUM	A successful login between 22:00 and 05:59
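
The R5 window crosses midnight, so a plain start <= t <= end comparison would never match. A minimal sketch of a wrap-around check, with hypothetical constant and function names (not taken from detection_engine.py):

```python
from datetime import time

OFF_HOURS_START = time(22, 0)  # inclusive
OFF_HOURS_END = time(5, 59)    # inclusive, per the table above


def is_off_hours(t: time) -> bool:
    """True when `t` falls in the 22:00 to 05:59 window that wraps past midnight."""
    t = t.replace(second=0, microsecond=0)  # compare at minute precision
    # Because the window wraps, the two halves are joined with `or`.
    return t >= OFF_HOURS_START or t <= OFF_HOURS_END
```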

All thresholds live near the top of detection_engine.py so you can tune them without digging through the rule bodies.
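
To make the idea concrete, here is a sketch of what module-level thresholds plus a sliding-window version of R1 might look like. The constant names, the `detect_brute_force` function, and the (timestamp, ip) input shape are assumptions for this example; only the values 5 failures and 5 minutes come from the rules table.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Tunable thresholds kept together at the top of the module.
BRUTE_FORCE_THRESHOLD = 5
BRUTE_FORCE_WINDOW = timedelta(minutes=5)


def detect_brute_force(failures: list[tuple[datetime, str]]) -> dict[str, int]:
    """Map each offending IP to the largest failure count seen in one window.

    `failures` holds (timestamp, source_ip) pairs for failed logins only.
    """
    recent: dict[str, deque] = defaultdict(deque)
    worst: dict[str, int] = {}
    for ts, ip in sorted(failures):
        window = recent[ip]
        window.append(ts)
        # Drop failures that fell out of the window ending at `ts`.
        while ts - window[0] > BRUTE_FORCE_WINDOW:
            window.popleft()
        if len(window) >= BRUTE_FORCE_THRESHOLD:
            worst[ip] = max(worst.get(ip, 0), len(window))
    return worst
```

A per-IP deque keeps the check linear in the number of events after sorting, and changing either constant changes the rule without touching the loop.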

Sample output

============================================================
THREAT CORRELATION ENGINE - AUTO-INVESTIGATE REPORT
============================================================
Total alerts: 4

[CRITICAL] R2 Successful login after failure burst
   203.0.113.45 succeeded after 6 consecutive failures (possible credential compromise).
   window : 2026-04-12 08:15:47 -> 2026-04-12 08:16:10
   ips    : 203.0.113.45
   users  : bob

[HIGH    ] R1 Brute-force login attempts
   203.0.113.45 produced 6 failed logins within 5 minutes.
   ...

--- IP Intelligence ---
  203.0.113.45        United States / Ashburn  (Example ISP)
  198.51.100.7        Germany / Frankfurt      (Example Hosting)
  ...

Project layout

.
├── parser.py             # Log parsing + summary
├── detection_engine.py   # Correlation rules + severity tagging
├── ip_lookup.py          # IP geo / ISP lookup with caching
├── auto_investigate.py   # End-to-end orchestration + reporting
├── log.txt               # Sample log used for testing
├── requirements.txt
└── README.md

Limitations

Pure rule-based detection, no ML or behavioural baselining.
Only the small handful of rules listed above; production SIEMs ship with hundreds.
IP enrichment depends on a free public API and its rate limits.
No real-time ingestion; this is a batch tool that reads a file.

Future improvements

Pluggable rule registry so new rules can be dropped into a folder.
YAML/JSON config for thresholds and time windows.
Streaming mode that tails a log file in real time.
Threat-intel integration (AbuseIPDB, OTX, etc.).
Simple web dashboard for browsing alerts.

License

Released under the MIT License. Feel free to reuse and adapt.

Author

Arpit Dhameliya. Built as a learning project around detection engineering and log correlation.
