# 🧠 Linux Evidence Overview & Triage Notebook

This notebook is designed to help an investigator perform a quick, visual triage of a **mounted Linux evidence image** over a specified time window.

It focuses on four questions:

1. **Who logged in, from where, and when?** (user login events)
2. **What commands were run?** (stack-ranked from the audit log, if present)
3. **What executable files appeared?** (new or modified executables in the window)
4. **What SUID/SGID/sticky files exist?** (potential privilege-escalation footholds)

## Environment prerequisites

The notebook expects a Python environment with at least:

- `pandas`, `matplotlib`, `seaborn`
- `tabulate`, which `DataFrame.to_markdown` needs in the final report cell
- A compatible `scipy` / `numpy` combination. If you see an error like *"A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy"*, run:
  ```bash
  pip install --upgrade scipy
  ```
  in the same environment as the notebook, then restart the kernel.

## Workflow

1. **Mount your evidence image read-only** so that it looks like a normal Linux filesystem (for example at `/mnt/evidence`).
2. In the next cell, set:
   - `EVIDENCE_ROOT` to the mount point of the evidence image, and
   - `DATE_FROM` / `DATE_TO` to the investigation time window.
3. Run each section in order. Each will:
   - Check for the relevant log / filesystem artifacts under `EVIDENCE_ROOT`.
   - Limit analysis to the date range you specify.
   - Produce **tables and charts** to help highlight anomalies.

> **Note:** This notebook is **read-only** with respect to the evidence. It walks the filesystem and parses log files, but does not modify them. Always work from a copy of the acquired evidence, not from original media.
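
One way to confirm step 1 before running anything else is to look up the mount point in `/proc/mounts` and check for the `ro` option. The sketch below is not part of the notebook's cells; `mount_options` and `is_read_only` are hypothetical helper names, and it assumes the usual Linux `/proc/mounts` field order (device, mount point, filesystem type, options).

```python
from typing import List, Optional


def mount_options(mounts_text: str, mount_point: str) -> Optional[List[str]]:
    """Return the option list for mount_point from /proc/mounts-style text, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep scanning: a later entry for the same path shadows earlier ones.
            found = fields[3].split(",")
    return found


def is_read_only(mounts_text: str, mount_point: str) -> bool:
    """True only if the mount point is listed and carries the 'ro' option."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "ro" in opts


# Illustrative sample; on a real system read the text from /proc/mounts instead.
sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/loop0 /mnt/evidence ext4 ro,noexec,nosuid,relatime 0 0\n"
)
print(is_read_only(sample, "/mnt/evidence"))
print(is_read_only(sample, "/"))
```

Mount points containing spaces appear escaped (as `\040`) in `/proc/mounts`, so this simple comparison would need adjusting for such paths.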


In [None]:
# 📦 Imports & plotting setup
import os
import stat
from datetime import datetime
from typing import Optional, Tuple, List

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="darkgrid")

# Make plots a bit larger by default
plt.rcParams["figure.figsize"] = (10, 4)


In [None]:
# 🔧 Evidence root and time window configuration

# CHANGE ME: set this to the root of your mounted evidence image.
# Example: EVIDENCE_ROOT = "/mnt/evidence"
EVIDENCE_ROOT = "/mnt/evidence"  # or None to run against the live system (not recommended for real investigations)

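# CHANGE ME: investigation time window (inclusive)
# Use ISO 8601 style strings. The window is compared against naive timestamps, and each
# section derives those differently: auth.log times are used as written (normally the
# evidence system's local time), audit.log epochs are converted to UTC, and filesystem
# times are converted to the local time zone of the machine running this notebook.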
DATE_FROM = "2024-01-01 00:00:00"
DATE_TO   = "2024-01-31 23:59:59"

# Parsed datetime objects
WINDOW_START = datetime.fromisoformat(DATE_FROM)
WINDOW_END = datetime.fromisoformat(DATE_TO)

print(f"Evidence root     : {EVIDENCE_ROOT}")
print(f"Analysis window   : {WINDOW_START} -> {WINDOW_END}")

if WINDOW_END <= WINDOW_START:
    raise ValueError("DATE_TO must be after DATE_FROM")


In [None]:
# 🧱 Helper functions

def build_path(relative_path: str) -> str:
    """Build an absolute path into the evidence tree (or live system if EVIDENCE_ROOT is None)."""
    if EVIDENCE_ROOT:
        return os.path.join(EVIDENCE_ROOT, relative_path.lstrip("/"))
    return relative_path


def in_window(ts: datetime) -> bool:
    return WINDOW_START <= ts <= WINDOW_END


def ensure_exists(path: str) -> bool:
    exists = os.path.exists(path)
    if not exists:
        print(f"[!] Missing expected file: {path}")
    return exists


## 1️⃣ User login events (auth.log)

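This section parses `auth.log` (or equivalent) under the evidence root to identify **successful SSH logins**, that is, `sshd` lines of the form `Accepted ... for <user> from <ip>`. Console, `su`, and `sudo` sessions are not matched.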

It extracts:

- **Timestamp** (assuming the year from `DATE_FROM` if the log does not contain a year)
- **Account name**
- **Source IP address**

Results are filtered to the configured time window, then visualised to highlight unusual **login times** or **source IPs**.
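
Because classic syslog timestamps carry no year, a window that crosses a year boundary needs more than a single assumed year. The following is a minimal sketch of one possible approach, not the logic used in the cell below: it tries each year the window covers and keeps the first candidate that falls inside it. `resolve_syslog_timestamp` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Optional

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def resolve_syslog_timestamp(month: str, day: int, time_str: str,
                             window_start: datetime,
                             window_end: datetime) -> Optional[datetime]:
    """Return the first year-qualified timestamp inside the window, or None."""
    if month not in MONTHS:
        return None
    month_num = MONTHS.index(month) + 1
    for year in range(window_start.year, window_end.year + 1):
        try:
            candidate = datetime.strptime(
                f"{year}-{month_num:02d}-{day:02d} {time_str}",
                "%Y-%m-%d %H:%M:%S",
            )
        except ValueError:
            continue  # impossible date, e.g. Feb 29 in a non-leap year
        if window_start <= candidate <= window_end:
            return candidate
    return None


start = datetime(2023, 12, 15)
end = datetime(2024, 1, 15, 23, 59, 59)
print(resolve_syslog_timestamp("Dec", 20, "08:00:00", start, end))
print(resolve_syslog_timestamp("Jan", 10, "12:34:56", start, end))
```

If the window is longer than a year, the same month and day can match more than once and this sketch returns the earliest match, so the result is still a guess that should be checked against log rotation dates.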


In [None]:
import re
import glob
import gzip

# Paths for auth-like log files; adjust here if your distro uses /var/log/secure
AUTH_GLOB = build_path("/var/log/auth.log*")
auth_paths = sorted(glob.glob(AUTH_GLOB))

print("Using auth log files:")
for p in auth_paths:
    print(f" - {p}")

if auth_paths:
    login_records = []
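    # Syslog timestamps omit the year, so assume the year of DATE_FROM.
    # Events from a later year are dated wrongly if the window crosses a year boundary.
    year_hint = WINDOW_START.year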

    # Example auth.log line:
    # Jan 10 12:34:56 hostname sshd[1234]: Accepted password for user from 1.2.3.4 port 5555 ssh2
    login_re = re.compile(
        r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"  # syslog prefix
        r"(?P<host>\S+)\s+sshd\[\d+\]:\s+Accepted .* for (?P<user>\S+) from (?P<ip>\S+)\s+"
    )

    month_map = {m: i for i, m in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1
    )}

    def open_maybe_gzip(path: str):
        return gzip.open(path, "rt", errors="ignore") if path.endswith(".gz") else open(path, "r", errors="ignore")

    for auth_path in auth_paths:
        try:
            f = open_maybe_gzip(auth_path)
        except OSError:
            continue
        with f:
            for line in f:
                m = login_re.match(line)
                if not m:
                    continue
                md = m.groupdict()
                month = month_map.get(md["month"])  # type: ignore
                day = int(md["day"])
                t_str = md["time"]

                try:
                    ts = datetime.strptime(f"{year_hint}-{month:02d}-{day:02d} {t_str}", "%Y-%m-%d %H:%M:%S")
                except (TypeError, ValueError):
                    # TypeError: unknown month abbreviation (month is None); ValueError: impossible date
                    continue

                if not in_window(ts):
                    continue

                login_records.append({
                    "timestamp": ts,
                    "user": md["user"],
                    "ip": md["ip"],
                })

    if login_records:
        login_df = pd.DataFrame(login_records).sort_values("timestamp")
        display(login_df.head())

        # Timeline of logins
        plt.figure(figsize=(12, 4))
        sns.histplot(login_df["timestamp"], bins=50)
        plt.title("Login events over time")
        plt.xlabel("Time")
        plt.ylabel("Count")
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()

        # Top source IPs
        ip_counts = login_df["ip"].value_counts().reset_index()
        ip_counts.columns = ["ip", "count"]

        plt.figure(figsize=(10, 4))
        sns.barplot(data=ip_counts.head(15), x="ip", y="count")
        plt.title("Top source IPs for logins in window")
        plt.xticks(rotation=45, ha="right")
        plt.tight_layout()
        plt.show()

    else:
        print("No login events found in the specified window across any auth.log files.")
else:
    print("No auth.log files found; skipping login event analysis.")


## 2️⃣ Audit log command ranking (audit.log)

If the Linux audit framework was enabled, `audit.log` can provide a **high-fidelity record of executed commands**.

This section:

- Parses `audit.log` under the evidence root.
- Extracts timestamps and the primary executable/command.
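- Filters out obvious **background noise** by skipping any `EXECVE` line that contains `cron`, `CRON`, `systemd`, or `anacron`. The match is a plain substring test, so a user command with one of those strings in its arguments is skipped too.
- Stack-ranks commands by frequency within the configured time window and visualises the top and bottom five entries.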

> Parsing of audit logs varies across distributions; this implementation focuses on common `EXECVE` records and may not capture every variant.
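
One variant worth knowing about: the audit subsystem writes an argument as an unquoted hex string when it contains spaces or other special characters, so a pattern that only accepts `a0="..."` skips those records. The sketch below shows one way to decode both forms. `decode_execve_args` and `ARG_RE` are hypothetical names, and the sample line is made up for illustration.

```python
import re
from typing import Dict, List

# Matches a0="quoted" or a0=HEXSTRING; "argc=3" does not match because
# the pattern requires digits directly after the leading "a".
ARG_RE = re.compile(
    r'\ba(?P<idx>\d+)=(?:"(?P<quoted>[^"]*)"|(?P<hex>[0-9A-Fa-f]+))(?=\s|$)'
)


def decode_execve_args(line: str) -> List[str]:
    """Return the EXECVE arguments in index order, decoding hex-encoded ones."""
    args: Dict[int, str] = {}
    for m in ARG_RE.finditer(line):
        if m.group("quoted") is not None:
            value = m.group("quoted")
        else:
            raw = m.group("hex")
            try:
                value = bytes.fromhex(raw).decode("utf-8", errors="replace")
            except ValueError:
                value = raw  # odd-length string: keep it undecoded
        args[int(m.group("idx"))] = value
    return [args[i] for i in sorted(args)]


sample_line = (
    'type=EXECVE msg=audit(1697040000.123:123): argc=3 '
    'a0="bash" a1="-c" a2=77686F616D69202D61'
)
print(decode_execve_args(sample_line))
```

Keeping the full argument list, rather than only `a0`, also makes it possible to tell `bash -c "whoami"` apart from an interactive `bash`.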


In [None]:
# Parse and rank commands from audit.log (if present)

AUDIT_GLOB = build_path("/var/log/audit/audit.log*")
audit_paths = sorted(glob.glob(AUDIT_GLOB))

print("Using audit log files:")
for p in audit_paths:
    print(f" - {p}")

exec_records: List[dict] = []

if audit_paths:
    # Typical EXECVE line:
    # type=EXECVE msg=audit(1697040000.123:123): argc=3 a0="bash" a1="-c" a2="whoami"
    ts_re = re.compile(r"audit\((?P<epoch>\d+)(?:\.\d+)?:")
    cmd_re = re.compile(r"\ba0=\"(?P<cmd>[^\"]+)\"")

    for audit_path in audit_paths:
        try:
            f = gzip.open(audit_path, "rt", errors="ignore") if audit_path.endswith(".gz") else open(audit_path, "r", errors="ignore")
        except OSError:
            continue
        with f:
            for line in f:
                if " type=EXECVE " not in line and not line.lstrip().startswith("type=EXECVE"):
                    continue

                # Skip obvious background/system noise
                if any(noise in line for noise in ["cron", "CRON", "systemd", "anacron"]):
                    continue

                ts_match = ts_re.search(line)
                cmd_match = cmd_re.search(line)
                if not ts_match or not cmd_match:
                    continue

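                try:
                    epoch = int(ts_match.group("epoch"))
                    # Audit timestamps are epoch seconds; this yields a naive UTC datetime,
                    # so the window is effectively interpreted as UTC in this section.
                    ts = datetime.utcfromtimestamp(epoch)
                except (ValueError, OverflowError, OSError):
                    continue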

                if not in_window(ts):
                    continue

                exec_records.append({
                    "timestamp": ts,
                    "command": cmd_match.group("cmd"),
                })

    if exec_records:
        exec_df = pd.DataFrame(exec_records)
        display(exec_df.head())

        # Stack-rank commands
        cmd_counts = exec_df["command"].value_counts().reset_index()
        cmd_counts.columns = ["command", "count"]

        top5 = cmd_counts.head(5)
        bottom5 = cmd_counts.tail(5) if len(cmd_counts) > 5 else cmd_counts

        print("Top 5 commands by frequency:")
        display(top5)
        print("Bottom 5 commands by frequency:")
        display(bottom5)

        plt.figure(figsize=(10, 4))
        sns.barplot(data=top5, x="count", y="command")
        plt.title("Top 5 commands from audit EXECVE events (within window)")
        plt.xlabel("Count")
        plt.ylabel("Command")
        plt.tight_layout()
        plt.show()

        if len(bottom5) > 0 and not bottom5.equals(top5):
            plt.figure(figsize=(10, 4))
            sns.barplot(data=bottom5.sort_values("count"), x="count", y="command")
            plt.title("Bottom 5 commands from audit EXECVE events (within window)")
            plt.xlabel("Count")
            plt.ylabel("Command")
            plt.tight_layout()
            plt.show()
    else:
        print("No EXECVE events in any audit.log file within the specified window (after filtering noise).")
else:
    print("No audit.log files found; skipping audit command analysis.")


## 3️⃣ Executable files created/changed in the window

This section walks the evidence filesystem under `EVIDENCE_ROOT` and finds files that:

- Are **regular files** with any execute bit set (user/group/other), and
- Have either a modification time (`st_mtime`) **or** an inode change time (`st_ctime`) within the investigation window.

> The exact meaning of `st_ctime` is platform-dependent: on Linux and other Unix systems it records the last inode/metadata change (for example a `chmod` or `chown`), not creation; Python on Windows reports creation time instead. Here we simply treat it as an additional signal alongside `st_mtime`.
>
> The output highlights paths where new or changed executables appeared during the window, which may indicate **dropped tools, malware, or scripts**.
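
The selection rule described above can be written as a small predicate over raw `stat` values, which makes it easy to see why `st_ctime` is worth checking: a file whose `st_mtime` looks old can still have had its permissions changed inside the window. This is a sketch with synthetic values; `executable_touched_in_window` is a hypothetical name, and timestamps are converted to the local time of the machine running the code.

```python
import stat
from datetime import datetime


def executable_touched_in_window(st_mode: int, st_mtime: float, st_ctime: float,
                                 start: datetime, end: datetime) -> bool:
    """True for regular files with any execute bit whose mtime or ctime is in [start, end]."""
    if not stat.S_ISREG(st_mode):
        return False
    if not st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return False
    mtime = datetime.fromtimestamp(st_mtime)
    ctime = datetime.fromtimestamp(st_ctime)
    return start <= mtime <= end or start <= ctime <= end


window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 1, 31, 23, 59, 59)

old = datetime(2021, 6, 1, 12).timestamp()      # well before the window
inside = datetime(2024, 1, 10, 12).timestamp()  # inside the window

exe = stat.S_IFREG | 0o755
plain = stat.S_IFREG | 0o644

# Old content, but metadata changed inside the window (e.g. chmod +x): selected.
print(executable_touched_in_window(exe, old, inside, window_start, window_end))
# Same timestamps without any execute bit: not selected.
print(executable_touched_in_window(plain, old, inside, window_start, window_end))
```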


In [None]:
# Walk filesystem for executable files touched/created in the window

if not EVIDENCE_ROOT:
    print("[!] EVIDENCE_ROOT is not set; executable scan is intended for mounted evidence.")
else:
    exec_records = []

    for root, dirs, files in os.walk(EVIDENCE_ROOT):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except (FileNotFoundError, PermissionError, OSError):
                continue

            # Only regular files
            if not stat.S_ISREG(st.st_mode):
                continue

            # Any execute bit set?
            if not (st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)):
                continue

            mtime = datetime.fromtimestamp(st.st_mtime)
            ctime = datetime.fromtimestamp(st.st_ctime)
            if not (in_window(mtime) or in_window(ctime)):
                continue

            rel_path = os.path.relpath(path, EVIDENCE_ROOT)
            exec_records.append({
                "path": rel_path,
                "mtime": mtime,
                "ctime": ctime,
                "mode": oct(st.st_mode & 0o777),
                "size": st.st_size,
            })

    if exec_records:
        exec_fs_df = pd.DataFrame(exec_records).sort_values("mtime")
        display(exec_fs_df.head(20))

        # Simple histogram by day (based on mtime)
        exec_fs_df["date"] = exec_fs_df["mtime"].dt.date
        counts_by_date = exec_fs_df["date"].value_counts().sort_index()

        plt.figure(figsize=(10, 4))
        counts_by_date.plot(kind="bar")
        plt.title("Executable files touched/created per day (within window)")
        plt.xlabel("Date")
        plt.ylabel("Count")
        plt.xticks(rotation=45, ha="right")
        plt.tight_layout()
        plt.show()
    else:
        print("No executable files found with mtime/ctime inside the specified window.")


## 4️⃣ SUID/SGID/sticky bit files

SUID, SGID, and sticky bits can be abused for **privilege escalation** or **persistence**.

This section scans the mounted evidence tree for files and directories where any of these bits are set and summarises them.

- **SUID (set-user-ID)**: file runs with the file owner's UID
- **SGID (set-group-ID)**: file runs with the file group's GID, or directory enforces group inheritance
- **Sticky bit**: on directories, only a file's owner, the directory's owner, or root can delete/rename contained files

The goal is to quickly surface unusual locations with these bits set for further manual review.
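
As a quick illustration of how these bits show up in a mode value, the sketch below tests each bit with the constants from Python's `stat` module and prints the `ls -l` style string from `stat.filemode`. `special_bits` is a hypothetical helper name and the example modes are synthetic.

```python
import stat
from typing import List


def special_bits(st_mode: int) -> List[str]:
    """Name the SUID/SGID/sticky bits set in a raw st_mode value."""
    flags = []
    if st_mode & stat.S_ISUID:
        flags.append("SUID")
    if st_mode & stat.S_ISGID:
        flags.append("SGID")
    if st_mode & stat.S_ISVTX:
        flags.append("STICKY")
    return flags


examples = {
    "SUID binary": stat.S_IFREG | 0o4755,
    "SGID directory": stat.S_IFDIR | 0o2775,
    "sticky directory": stat.S_IFDIR | 0o1777,
    "ordinary file": stat.S_IFREG | 0o644,
}

for label, mode in examples.items():
    # filemode renders the special bits as s/S in the execute slots and t/T at the end
    print(f"{label:18} {stat.filemode(mode)} {special_bits(mode)}")
```

In a listing, a world-writable sticky directory such as `/tmp` is expected, whereas a SUID file under a home directory or `/tmp` usually deserves a closer look.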


In [None]:
# Scan for SUID/SGID/sticky bit files and directories

if not EVIDENCE_ROOT:
    print("[!] EVIDENCE_ROOT is not set; SUID/SGID/sticky scan is intended for mounted evidence.")
else:
    special_records = []

    for root, dirs, files in os.walk(EVIDENCE_ROOT):
        # Check directories (sticky/SGID often used on dirs)
        for name in dirs:
            path = os.path.join(root, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except (FileNotFoundError, PermissionError, OSError):
                continue

            mode_bits = st.st_mode
            special_mask = stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX
            if mode_bits & special_mask:
                rel_path = os.path.relpath(path, EVIDENCE_ROOT)
                special_records.append({
                    "type": "dir",
                    "path": rel_path,
                    "mode": oct(mode_bits & 0o7777),
                })

        # Check files
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except (FileNotFoundError, PermissionError, OSError):
                continue

            mode_bits = st.st_mode
            special_mask = stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX
            if mode_bits & special_mask:
                rel_path = os.path.relpath(path, EVIDENCE_ROOT)
                special_records.append({
                    "type": "file",
                    "path": rel_path,
                    "mode": oct(mode_bits & 0o7777),
                })

    if special_records:
        special_df = pd.DataFrame(special_records).sort_values("path")
        display(special_df.head(50))

        # Simple breakdown by type
        by_type = special_df["type"].value_counts()
        print("\nCounts by object type:")
        print(by_type)

        # Breakdown by specific bit(s)
        def classify_bits(mode_str: str) -> str:
            mode_int = int(mode_str, 8)
            flags = []
            if mode_int & stat.S_ISUID:
                flags.append("SUID")
            if mode_int & stat.S_ISGID:
                flags.append("SGID")
            if mode_int & stat.S_ISVTX:
                flags.append("STICKY")
            return ",".join(flags) or "none"

        special_df["bits"] = special_df["mode"].apply(classify_bits)
        bits_counts = special_df["bits"].value_counts()
        print("\nCounts by special bit combination:")
        print(bits_counts)
    else:
        print("No SUID/SGID/sticky bit objects found under the evidence root.")


In [None]:
# 📄 Generate triage report as Markdown

from pathlib import Path
from IPython.display import Markdown, display

output_dir = Path(".")
report_path = output_dir / "evidence_triage_report.md"

sections: list[str] = []

sections.append("# Linux Evidence Triage Report")
sections.append("")
sections.append(f"- Evidence root: `{EVIDENCE_ROOT}`")
sections.append(f"- Analysis window: {WINDOW_START} -> {WINDOW_END}")
sections.append("")


def add_df_section(title: str, df, max_rows: int = 20) -> None:
    sections.append(f"## {title}")
    if df is None or getattr(df, "empty", True):
        sections.append("_No data available for this section in the selected window._")
    else:
        sections.append("")
        sections.append(df.head(max_rows).to_markdown(index=False))
    sections.append("")


# Login events
login_df = globals().get("login_df")
add_df_section("Login events", login_df)

# Audit EXECVE command frequency (top/bottom 5)
sections.append("## Audit EXECVE command frequency")
exec_top5 = globals().get("top5")
exec_bottom5 = globals().get("bottom5")
if exec_top5 is None or getattr(exec_top5, "empty", True):
    sections.append("_No EXECVE command data available in this window (or audit.log missing)._")
else:
    sections.append("### Top 5 commands by frequency")
    sections.append(exec_top5.to_markdown(index=False))
    sections.append("")
    if exec_bottom5 is not None and not exec_bottom5.empty:
        sections.append("### Bottom 5 commands by frequency")
        sections.append(exec_bottom5.to_markdown(index=False))
sections.append("")

# Executable files touched/created
exec_fs_df = globals().get("exec_fs_df")
add_df_section("Executable files touched/created in window", exec_fs_df)

# SUID/SGID/sticky entries
special_df = globals().get("special_df")
add_df_section("SUID/SGID/sticky bit objects", special_df)

report_md = "\n".join(sections)
report_path.write_text(report_md, encoding="utf-8")

print(f"Markdown report written to: {report_path.resolve()}")
display(Markdown(report_md))


