# Tamper-Evident Forensic Logging System

## What are Logs?

Log files are machine-generated records of events happening inside your system or application. Every login attempt, error message, service restart, or user action leaves a trace. A basic log entry includes:
- A Timestamp
- Severity Level ($\texttt{\color{lightblue}{INFO}} \texttt{/} \texttt{\color{Yellow}{WARNING}} \texttt{/} \texttt{\color{Red}{ERROR}}$)
- The Source/System from which the event came
- What action was taken / What went wrong
- The User or Process involved

Logs give us real-time and historical visibility into how our infrastructure behaves and what's going wrong when it doesn't.

## Why Do Log Files Matter?

Logs give us visibility into how our systems behave and what is causing failures, slowdowns or security risks. They help us:
- **Troubleshoot faster:** Logs show exactly what happened, when and what triggered it, so that we can trace the root cause without guessing.
- **Spot performance issues early:** Repeated errors or slow API calls often show up in logs before they trigger alerts.
- **Detect threats in real time:** Logs are often the first sign that something isn't right, like a series of failed logins or strange firewall activity.

## Problem Statement

Modern systems rely heavily on logs as a primary source of forensic evidence during legal proceedings, audits and cyber incidents. However, what happens if an adversary gains access to these logs and modifies them? **Traditional logs are written in plaintext**, **can be easily edited, deleted and reordered**, and **rarely provide cryptographic proof of integrity**. Once an adversary gains access and escalates privileges, traditional logs are no longer trustworthy. This project designs and implements a tamper-evident logging system that ensures:
- Every log entry depends on all previous entries
- Altering one entry breaks the entire chain
- Verification can be done later by an investigator

**This project is NOT about preventing attacks**. The design goal is _tamper evidence_, NOT _tamper prevention_. Traditional logging systems provide neither cryptographic guarantees of integrity nor evidence of tampering under adversarial conditions, which makes them unsuitable for forensic use after a system compromise.

$\boxed{\texttt{NOTE}}$ **Why isn't encrypting the logs enough?**  
> Encryption alone does not guarantee append-only behavior, ordering integrity, tamper detection or evidence continuity. An adversary can still delete encrypted logs entirely or replay old encrypted logs.

## Objectives

- Make sure logs are append-only
- Detect any tampering or reordering
- Preserve time integrity
- Enable secure remote backup of storage so that if the local logs get tampered with, we can always verify with the backup server (and vice versa)

However, this project does NOT aim to:
- Protect against compromised private keys
- Provide real-time intrusion prevention

This project is about damage detection and control after an adversarial attack.

## Threat Model

This project assumes the following:
- Attacker has user or admin access on client system
- Attacker can read log files
- Attacker can attempt to delete or modify logs
- Attacker may observe network traffic
- Attacker does NOT control the server

This system detects:
- Log deletion
- Log modification
- Log reordering
- Log injection (and rejects it)
- Replay

This project does NOT defend against:
- Malicious server
- Stolen client private key
- Hardware root compromise

### Approach

Each log entry contains:
- $\texttt{timestamp}$
- $\texttt{event\_data}$
- $\texttt{previous\_hash}$
- $\texttt{current\_hash}$

Where $\texttt{current\_hash}$ = $\texttt{HASH(timestamp || event\_data || previous\_hash)}$

This forms a hash chain:
```txt
    Entry 1 -> Hash 1
    Entry 2 -> Hash 2 (Dependent on Hash 1)
    Entry 3 -> Hash 3 (Dependent on Hash 2)
    ...
```

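Modifying any entry changes its hash, so the `previous_hash` stored in the next entry no longer matches and every later link in the chain fails verification. An attacker who rewrites the whole local file could recompute the hashes, which is why the server keeps its own copy of the chain.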
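The effect can be seen in a small in-memory sketch. It is only an illustration of the idea: the `entry_hash` and `build_chain` helpers, the `|` separator and the sample events are assumptions made for this example, not part of the project code.

```py
import hashlib

GENESIS = "0" * 64  # assumed starting value, mirroring a genesis block


def entry_hash(timestamp: str, event_data: str, previous_hash: str) -> str:
    """Hash one entry together with the hash of the entry before it."""
    material = f"{timestamp}|{event_data}|{previous_hash}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def build_chain(events):
    """Return the hash of every entry, each one depending on its predecessor."""
    hashes, previous = [], GENESIS
    for timestamp, event_data in events:
        previous = entry_hash(timestamp, event_data, previous)
        hashes.append(previous)
    return hashes


events = [
    ("2026-02-11T14:32:08Z", "User logged in"),
    ("2026-02-11T14:32:09Z", "Configuration file updated"),
    ("2026-02-11T14:32:10Z", "User logged out"),
]
original = build_chain(events)

# Tamper with the second event only.
tampered_events = list(events)
tampered_events[1] = ("2026-02-11T14:32:09Z", "Nothing happened")
tampered = build_chain(tampered_events)

# The first hash is untouched; the second and every later hash differ.
changed = [a != b for a, b in zip(original, tampered)]
print(changed)  # [False, True, True]
```

Editing one entry therefore forces an attacker to rewrite every entry after it, and the rewritten chain will still disagree with any copy of the hashes held elsewhere.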

## Components  

1. Client (Log Generator)  
    **Generates event logs**; however, it is NOT trusted for integrity.
    - Generates logs
    - Cryptographically proves who produced a log
    - Enables attribution and non-repudiation

2. Secure Channel (Authenticated + Encrypted Transport)
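    To ensure logs arrive at the server unaltered, confidential and replay-resistant. Without a secure channel, logs can be modified in transit, replayed, observed or injected by attackers. AES-GCM with an RSA-encrypted session key provides confidentiality and integrity in transit, and makes man-in-the-middle (MitM) attacks harder because only the holder of the server's private key can recover the session key.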

3. Server (Forensic Authority / Trust Anchor)
    It acts as the single source of truth for log integrity, order and time. The server does not trust logs blindly. It verifies, chains and commits them. It:
    - Enforces append-only behavior
    - Detects tampering
    - Maintains log continuity
    - Assigns authoritative timestamps
    - Stores evidence safely off-client

4. Persistent State (Hash Chain State Storage)
    To preserve continuity across time, crashes and restarts. Without persistent state:
    - Server restarts reset the hash chain
    - Attackers can exploit restarts

![Architecture](Architecture.png)

|Component|Enforces|
|---|---|
|Client|Authenticity|
|Secure Channel|Transport integrity|
|Server|Ordering + tamper detection|
|Persistent State|Long-term continuity|
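
One way the persistent-state component could keep the chain head across restarts is sketched below. This is a minimal sketch under assumptions: the JSON layout, the `chain_state.json` file name and the `save_chain_state` / `load_chain_state` helpers are hypothetical and are not taken from the project.

```py
import json
import os
import tempfile

GENESIS_HASH = "0" * 64


def save_chain_state(path: str, last_hash: str, entry_count: int) -> None:
    """Persist the chain head atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"last_hash": last_hash, "entry_count": entry_count}, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    os.replace(tmp_path, path)  # atomic rename within the same directory


def load_chain_state(path: str):
    """Return (last_hash, entry_count), or the genesis state if nothing was saved."""
    if not os.path.exists(path):
        return GENESIS_HASH, 0
    with open(path, "r", encoding="utf-8") as f:
        state = json.load(f)
    return state["last_hash"], state["entry_count"]


with tempfile.TemporaryDirectory() as tmp:
    state_path = os.path.join(tmp, "chain_state.json")
    fresh = load_chain_state(state_path)         # first start: no state yet
    save_chain_state(state_path, "ab" * 32, 7)   # after committing entry 7
    restored = load_chain_state(state_path)      # simulates a restart

print(fresh == (GENESIS_HASH, 0), restored == ("ab" * 32, 7))  # True True
```

Writing to a temporary file and renaming it means a reader sees either the old state or the new one, never a partial write.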


## Implementation

### 1. Log Generation

For demonstration purposes, I'll be generating dummy logs instead of using actual application logs.

#### 1.1 Importing modules:

```py
import hashlib
from datetime import datetime, timezone
import os
```

#### 1.2 Variable declaration

```py
LOG_FILE = "secure_logs.log"
GENESIS_HASH = "0" * 64  # Indicates start of chain
```
- `LOG_FILE` is the path of the log file
- `GENESIS_HASH` represents the start of the Hash chain (similar to blockchain genesis block)

#### 1.3 Hash Function

```py
def sha256(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()
```

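- Returns the SHA-256 hash of the input string
- SHA-256 operates on bytes, not on text, so the string must be encoded first. UTF-8 is used here; what matters is that the writer and the verifier use the same encoding, because a different encoding produces different bytes and therefore a different hash
- Lastly, the 32-byte digest is returned as a 64-character hexadecimal string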
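
A quick check of the points above, using the well-known SHA-256 test vector for `"abc"`. The UTF-16 comparison is added here only to illustrate why the encoding has to be fixed.

```py
import hashlib


def sha256(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


digest = sha256("abc")
print(digest)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(len(digest))  # 64 hex characters = 256 bits

# The same text under a different encoding is a different byte string,
# so it hashes to a different value.
utf16_digest = hashlib.sha256("abc".encode("utf-16")).hexdigest()
print(digest == utf16_digest)  # False
```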

#### 1.4 UTC Timestamp

```py
def get_utc_timestamp() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

- UTC is timezone independent
- However, it uses the system clock which can be tampered with
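
Because the client clock can be changed, a verifier may want to at least flag timestamps that move backwards. The sketch below is one possible check, not part of the project code; `parse_utc_timestamp` and `find_clock_rollback` are hypothetical helpers that assume the timestamp format used above.

```py
from datetime import datetime, timezone
from typing import List, Optional

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc_timestamp(value: str) -> datetime:
    """Parse the log's timestamp format back into an aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def find_clock_rollback(timestamps: List[str]) -> Optional[int]:
    """Return the 1-based position of the first timestamp earlier than its predecessor."""
    previous = None
    for position, value in enumerate(timestamps, start=1):
        current = parse_utc_timestamp(value)
        if previous is not None and current < previous:
            return position
        previous = current
    return None


in_order = ["2026-02-11T14:32:08Z", "2026-02-11T14:32:08Z", "2026-02-11T14:32:10Z"]
rolled_back = ["2026-02-11T14:32:08Z", "2026-02-11T14:32:10Z", "2026-02-11T14:31:59Z"]

print(find_clock_rollback(in_order))     # None
print(find_clock_rollback(rolled_back))  # 3
```

A backwards jump does not prove tampering (clock corrections happen), but it is a cheap signal worth recording next to the server's own receive time.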

#### 1.5 Get Last Hash

```py
def get_last_hash() -> str:
    if not os.path.exists(LOG_FILE):
        return GENESIS_HASH

    with open(LOG_FILE, "r", encoding="utf-8") as f:
        lines = f.readlines()
        if not lines:
            return GENESIS_HASH

        last_line = lines[-1].strip()
        return last_line.split("|")[-1]  # current_hash
```

- Checks if the log file exists
- If not, it returns GENESIS_HASH
- Reads the last log entry
- Extracts the final field (current_hash)
- Uses it as prev_hash for the next entry

This links every new entry to the current end of the chain, which is what makes append-only behavior verifiable:
- New logs must reference the last valid state
- Gaps or deletions are detectable
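
`get_last_hash` reads the whole file to find one line, which gets slower as the log grows. A possible alternative, sketched here under the assumption that entries stay one per line, reads backwards from the end of the file. `get_last_hash_tail` is a hypothetical name, and unlike the original it treats a file containing only blank lines as empty.

```py
import os
import tempfile

GENESIS_HASH = "0" * 64


def get_last_hash_tail(path: str, chunk_size: int = 4096) -> str:
    """Return the final field of the last line, reading only the end of the file."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return GENESIS_HASH

    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        buffer = b""
        # Walk backwards until the buffer holds one complete final line.
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            buffer = f.read(read_size) + buffer
            if b"\n" in buffer.rstrip(b"\r\n"):
                break

    last_line = buffer.rstrip(b"\r\n").split(b"\n")[-1].decode("utf-8").strip()
    if not last_line:
        return GENESIS_HASH
    return last_line.split("|")[-1]  # current_hash


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.log")
    missing = get_last_hash_tail(demo_path)
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("admin|t1|INFO|first|" + "0" * 64 + "|" + "a" * 64 + "\n")
        f.write("admin|t2|INFO|second|" + "a" * 64 + "|" + "b" * 64 + "\n")
    last = get_last_hash_tail(demo_path, chunk_size=16)

print(missing == GENESIS_HASH, last == "b" * 64)  # True True
```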

#### 1.6 Canonical Serialization

```py
canonical_entry = (f"{username}|{timestamp}|{level}|{message}|{prev_hash}")
```

- Cryptographic hashes must be computed over a stable representation.
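
One caveat of a delimiter-joined format is that a `|` inside a field makes the representation ambiguous, and the verifier shown later rejects any line that does not split into exactly six fields. The sketch below shows the ambiguity and one possible alternative, a sorted-key JSON encoding. The `canonical_bytes` helper is an assumption for illustration; the project itself uses the pipe-joined string.

```py
import json


def canonical_bytes(username, timestamp, level, message, prev_hash) -> bytes:
    """Serialize the fields unambiguously: fixed key order, no optional whitespace."""
    record = {
        "username": username,
        "timestamp": timestamp,
        "level": level,
        "message": message,
        "prev_hash": prev_hash,
    }
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


# Two different sets of fields that collapse to the same pipe-joined string.
fields_a = ["alice", "t", "INFO", "x|y", "h"]
fields_b = ["alice", "t", "INFO|x", "y", "h"]

pipe_collision = "|".join(fields_a) == "|".join(fields_b)
json_collision = canonical_bytes(*fields_a) == canonical_bytes(*fields_b)

print(pipe_collision)  # True
print(json_collision)  # False
```

Whichever format is chosen, the writer and the verifier must build the byte string in exactly the same way.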

#### 1.7 Generate Log

```py
def generate_log(username: str = "admin", level: str = "INFO", message: str = "Test Log") -> None:
    # Validation
    allowed_levels = {"INFO", "WARNING", "ERROR", "SUCCESS"}
    if level not in allowed_levels:
        raise ValueError(f"Invalid log level: {level}")

    if "\n" in message:
        raise ValueError("Log message must not contain newline characters")

    username = username.strip()
    message = message.strip()

    timestamp = get_utc_timestamp()
    prev_hash = get_last_hash()

    # Canonical serialization
    canonical_entry = (f"{username}|{timestamp}|{level}|{message}|{prev_hash}")

    current_hash = sha256(canonical_entry)
    full_log_entry = f"{canonical_entry}|{current_hash}"

    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(full_log_entry + "\n")
```

- Takes three optional string parameters (username, log level, message) with defaults. Returns `None`.
- Creates a set of valid log levels and checks that the provided `level` is one of them. If not, raises an error to prevent invalid entries.
- Ensures the message doesn't contain newlines, which would break the single-line log file format (each entry must be one line).
- Removes leading/trailing whitespace from username and message to normalize the data.
- Retrieves the current UTC time in ISO 8601 format (e.g. `2026-02-11T14:32:08Z`).
- Retrieves the hash from the last log entry (or `GENESIS_HASH` if the file is empty). This creates the chain.
- Combines all data fields into a single string in a fixed order. This stable format is essential for consistent hashing.
- Computes the SHA-256 hash of the canonical entry. This hash depends on all previous data (via prev_hash), creating the tamper-evident chain.
- Appends the computed hash to the canonical entry, creating the complete log record.
- Lastly, opens the log file in append mode (`a`) and writes the complete entry followed by a newline. The append mode ensures logs are only added instead of being overwritten.
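
The validation above checks only for newlines in `message`. Since `|` is the field separator, a stricter check could be applied to every free-text field. The following is a sketch of such a check, with `validate_field` being a hypothetical helper rather than part of the project.

```py
def validate_field(name: str, value: str) -> str:
    """Reject characters that would break the one-line, pipe-delimited format."""
    for ch in value:
        if ch == "|":
            raise ValueError(f"{name} must not contain the '|' delimiter")
        if ord(ch) < 32 or ord(ch) == 127:
            raise ValueError(f"{name} must not contain control characters")
    return value.strip()


def is_rejected(value: str) -> bool:
    """Small helper for the demonstration below."""
    try:
        validate_field("message", value)
    except ValueError:
        return True
    return False


cleaned = validate_field("message", "  Service started  ")
print(repr(cleaned))                        # 'Service started'
print(is_rejected("disk|full"))             # True
print(is_rejected("line one\nline two"))    # True
print(is_rejected("User logged in"))        # False
```

Calling a helper like this for both `username` and `message` before building `canonical_entry` would keep every stored line parseable into exactly six fields.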

#### 1.8 Log Entry

```py
def random_log_entry():
    """
    Generate a random log entry for testing purposes.
    """
    import random
    username = [
        "admin", "user1", "user2", "user3", "user4", "user5", "user6", "user7", "user8", "user9", "user10"
    ]
    username = random.choice(username)

    level = random.choice(["INFO", "WARNING", "ERROR", "SUCCESS"])

    message_warning = [
        "Disk space low",
        "High memory usage",
        "CPU temperature high",
        "Unusual login attempt",
        "Failed login attempt",
        "Configuration change detected",
    ]
    message_error = [
        "Failed to access resource",
        "Database connection failed",
        "Authentication failed",
        "Authorization denied",
        "System crash detected"
    ]
    message_success = [
        "Backup completed successfully",
        "Data migration completed",
        "System update applied",
        "User account created",
        "User account deleted"
    ]
    message_info = [
        "User logged in",
        "User logged out",
        "Scheduled task executed",
        "Configuration file updated",
        "Service started",
        "Service stopped"
    ]

    if level == "INFO":
        message = random.choice(message_info)
    elif level == "WARNING":
        message = random.choice(message_warning)
    elif level == "SUCCESS":
        message = random.choice(message_success)
    else:
        message = random.choice(message_error)

    timestamp = get_utc_timestamp()
    prev_hash = get_last_hash()

    canonical_entry = f"{username}|{timestamp}|{level}|{message}|{prev_hash}"
    current_hash = sha256(canonical_entry)
    
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(f"{canonical_entry}|{current_hash}\n")
    
    return f"{canonical_entry}|{current_hash}".encode("utf-8")
```

- Defines a function that generates a random test log entry. Takes no parameters.
- Imports the `random` module to enable random selection from lists.
- Defines a list of 11 usernames, then picks one at random and stores it in `username`.
- Randomly selects one of four severity levels.
- Creates four separate lists of predefined messages, each tailored to their respective log level.
- Uses the log level to pick the appropriate message list, then randomly selects a message from that list.
- Retrieves the current UTC timestamp and the hash from the previous log entry to maintain the chain.
- Combines all fields in appropriate format.
- Computes the SHA-256 hash of the canonical entry.
- Opens the log file in append mode and writes the full entry (canonical data + hash) as a new line.
- Returns the complete log entry encoded as UTF-8 bytes. This return value is used by the client to send logs to the server.

### 2. Verification

```py
def verify_log_file(log_file: str = LOG_FILE) -> None:
    """
    Verify integrity of the tamper-evident log file.
    Prints verification result and first point of failure (if any).
    """

    if not os.path.exists(log_file):
        print("[ERROR] Log file does not exist.")
        return

    expected_prev_hash = GENESIS_HASH

    with open(log_file, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()

            if not line:
                continue  # skip empty lines

            parts = line.split("|")

            if len(parts) != 6:
                print(f"[TAMPER DETECTED] Line {line_number}: malformed entry")
                return

            username, timestamp, level, message, stored_prev_hash, stored_curr_hash = parts

            # Check previous hash consistency
            if stored_prev_hash != expected_prev_hash:
                print(f"[TAMPER DETECTED] Line {line_number}: previous hash mismatch")
                print(f"Expected: {expected_prev_hash}")
                print(f"Found:    {stored_prev_hash}")
                return

            # Recompute current hash
            canonical_entry = (f"{username}|{timestamp}|{level}|{message}|{stored_prev_hash}")
            computed_hash = sha256(canonical_entry)

            if computed_hash != stored_curr_hash:
                print(f"[TAMPER DETECTED] Line {line_number}: hash verification failed")
                print(f"Expected: {computed_hash}")
                print(f"Found:    {stored_curr_hash}")
                return

            # Move chain forward
            expected_prev_hash = stored_curr_hash

    print("[OK] Log file integrity verified. No tampering detected.")
```

- Defines a verification function that takes an optional log file path parameter. Returns None and prints results instead.
- Checks if the log file exists. If not, prints an error and exits early.
- Sets the starting expected hash to GENESIS_HASH (64 zeros). This will be updated as we verify each entry, forming the chain.
- Opens the log file in read mode to iterate through each line.
- Loops through each line, automatically tracking the line number (starting from 1). Removes leading/trailing whitespace.
- Skips any blank lines in the file.
- Splits the line by pipe delimiter to extract the six fields: username, timestamp, level, message, prev_hash, current_hash.
- Checks that the entry has exactly 6 fields. If not, the entry is corrupted, and verification fails immediately.
- Verifies that the stored previous hash matches what we expect from the previous entry. If it doesn't match, the chain is broken (entries were deleted, modified or reordered).
- Rebuilds the canonical format that was hashed when the entry was created.
- Recalculates the SHA-256 hash using the canonical entry.
- Compares the recomputed hash with the stored hash. If they don't match, the entry was modified after creation.
- Updates `expected_prev_hash` to the current entry's hash, so the next entry must reference this in its `prev_hash` field.
- Prints success message if all entries pass verification without breaking the chain.
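
The verifier above prints its verdict, and it cannot notice entries removed from the end of the file, because a truncated chain is still internally consistent. The sketch below is one way to address both points: it returns a structured result and optionally compares the final hash with a head obtained from a trusted source such as the server. `VerifyResult`, `verify_lines`, `make_line` and the `trusted_head` parameter are assumptions for illustration.

```py
import hashlib
from typing import List, NamedTuple, Optional

GENESIS_HASH = "0" * 64


class VerifyResult(NamedTuple):
    ok: bool
    entries: int
    head: str
    error: Optional[str]


def sha256(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def verify_lines(lines: List[str], trusted_head: Optional[str] = None) -> VerifyResult:
    """Walk the chain; optionally require that it ends at a known-good hash."""
    expected_prev, count = GENESIS_HASH, 0
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split("|")
        if len(parts) != 6:
            return VerifyResult(False, count, expected_prev, f"line {number}: malformed entry")
        *fields, stored_prev, stored_curr = parts
        if stored_prev != expected_prev:
            return VerifyResult(False, count, expected_prev, f"line {number}: previous hash mismatch")
        if sha256("|".join(fields + [stored_prev])) != stored_curr:
            return VerifyResult(False, count, expected_prev, f"line {number}: hash verification failed")
        expected_prev, count = stored_curr, count + 1
    if trusted_head is not None and expected_prev != trusted_head:
        return VerifyResult(False, count, expected_prev, "chain head differs from trusted head")
    return VerifyResult(True, count, expected_prev, None)


def make_line(username, timestamp, level, message, prev_hash) -> str:
    canonical = f"{username}|{timestamp}|{level}|{message}|{prev_hash}"
    return f"{canonical}|{sha256(canonical)}"


lines, prev = [], GENESIS_HASH
for i, msg in enumerate(["User logged in", "Service started", "User logged out"]):
    lines.append(make_line("admin", f"2026-02-11T14:32:0{i}Z", "INFO", msg, prev))
    prev = lines[-1].split("|")[-1]

full = verify_lines(lines)
truncated = verify_lines(lines[:-1])                     # last entry deleted
anchored = verify_lines(lines[:-1], trusted_head=prev)   # same, checked against the real head
edited = verify_lines([lines[0], lines[1].replace("Service started", "Service stopped"), lines[2]])

print(full.ok, truncated.ok, anchored.ok, edited.error)
# True True False line 2: hash verification failed
```

The `truncated` case passing is the reason the design keeps an independent copy of the chain on the server.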

```py
verify_log_file()
```

```txt
[OK] Log file integrity verified. No tampering detected.
```


### 3. Key Generation Script

```py
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization

def generate_keys(name):
    private_key = rsa.generate_private_key(
        public_exponent=65537,
        key_size=2048
    )

    with open(f"{name}_private_key.pem", "wb") as f:
        f.write(
            private_key.private_bytes(
                serialization.Encoding.PEM,
                serialization.PrivateFormat.PKCS8,
                serialization.NoEncryption()
            )
        )

    with open(f"{name}_public_key.pem", "wb") as f:
        f.write(
            private_key.public_key().public_bytes(
                serialization.Encoding.PEM,
                serialization.PublicFormat.SubjectPublicKeyInfo
            )
        )

generate_keys("server")
generate_keys("client")
```

- Imports the RSA cryptographic algorithm for asymmetric key generation.
- Imports utilities to convert cryptographic keys into file-friendly formats (like PEM).
- Defines a function that takes a name parameter to distinguish between server and client keys.
- Creates a 2048-bit RSA private key. `public_exponent=65537` is the standard value used in RSA (the Fermat prime $F_4 = 2^{16} + 1$). Larger key size = stronger security but slower operations.
- Opens a new file (e.g. `server_private_key.pem` or `client_private_key.pem`) in write-binary mode ("wb"). The with statement ensures the file closes automatically.
- Converts the private key object into a serialized format: PEM encoding (human-readable text format), PKCS8 format (standard private key format), with no encryption (that is, no password protection), so anyone who can read the file can use the key.
- Writes the serialized private key bytes to the file.
- Opens a new file for the public key (e.g., server_public_key.pem or client_public_key.pem) in write-binary mode.
- Extracts the public key from the private key object, then serializes it in PEM encoding and SubjectPublicKeyInfo format (standard public key format).
- Writes the serialized public key bytes to the file.
- Lastly, calls the function twice to create two key pairs: one for the server and one for the client. This enables asymmetric encryption and signatures for secure communication.
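
Since the private keys are written without password protection, file permissions are the only thing guarding them on disk. The sketch below shows one way to create such a file readable by its owner only; `write_private_file` is a hypothetical helper, and the bytes written here are a placeholder rather than a real key.

```py
import os
import stat
import tempfile


def write_private_file(path: str, data: bytes) -> None:
    """Write sensitive bytes to a file that only the owner can read or write."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)  # the mode applies only when the file is created
    with os.fdopen(fd, "wb") as f:
        os.chmod(path, 0o600)  # also tighten a file that already existed
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    key_path = os.path.join(tmp, "client_private_key.pem")
    write_private_file(key_path, b"placeholder bytes, not a real key\n")
    mode = stat.S_IMODE(os.stat(key_path).st_mode)

print(oct(mode))  # 0o600 on POSIX; Windows ignores most permission bits
```

In `generate_keys`, the bytes returned by `private_key.private_bytes(...)` could be passed to a helper like this instead of a plain `open(..., "wb")`.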

### 4. Remote server



### 5. Transmitting 

```py
import random
import socket
import os
import struct
import time
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# random_log_entry() is the function defined in section 1.8

HOST = "127.0.0.1"
PORT = 5000

# Load server public key
with open("Server/server_public_key.pem", "rb") as f:
    server_public_key = serialization.load_pem_public_key(f.read())

# Load client private key (for signing)
with open("client_private_key.pem", "rb") as f:
    client_private_key = serialization.load_pem_private_key(
        f.read(), password=None
    )

# Generate AES session key
aes_key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(aes_key)

def sign(data: bytes) -> bytes:
    # RSA-PSS signature over the log bytes, using SHA-256
    return client_private_key.sign(
        data,
        padding.PSS(
            mgf=padding.MGF1(hashes.SHA256()),
            salt_length=padding.PSS.MAX_LENGTH
        ),
        hashes.SHA256()
    )

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))

    # Send AES key encrypted with RSA
    enc_key = server_public_key.encrypt(
        aes_key,
        padding.OAEP(
            mgf=padding.MGF1(hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None
        )
    )

    # Every message is prefixed with its length as a 4-byte big-endian integer
    s.sendall(struct.pack(">I", len(enc_key)) + enc_key)
    print("[+] AES key sent securely")

    for _ in range(10):
        log = random_log_entry()
        signature = sign(log)

        # Payload layout: 4-byte log length, log bytes, signature
        payload = (
            struct.pack(">I", len(log)) +
            log +
            signature
        )

        # Fresh 12-byte nonce for every AES-GCM encryption
        nonce = os.urandom(12)
        ciphertext = aesgcm.encrypt(nonce, payload, None)

        packet = nonce + ciphertext
        s.sendall(struct.pack(">I", len(packet)) + packet)

        print("[+] Log sent securely")
        time.sleep(random.uniform(0.5, 2.0))
```

```txt
[+] AES key sent securely
[+] Log sent securely
[+] Log sent securely
[+] Log sent securely
[+] Log sent securely
[+] Log sent securely
[+] Log sent securely
[+] Log sent securely
[+] Log sent securely
[+] Log sent securely
[+] Log sent securely
```

```py
verify_log_file()
verify_log_file("Server\\remote_logs.log")
```

```txt
[OK] Log file integrity verified. No tampering detected.
[OK] Log file integrity verified. No tampering detected.
```
