# LAB 4

Networks and Systems Security
Week 04
Malicious Software

### Section 1: File Integrity Checker

Objective

To understand how host-based antivirus systems detect unauthorized
file modifications by using file hashing as a baseline for system integrity.

Key Concepts
- Integrity Checking: A detective technique that identifies
unauthorized changes to files or system components.
- Hashing (SHA-256): Generates an effectively unique fingerprint for each file;
even a one-bit change produces a completely different hash.
- Baseline Comparison: Security systems record “clean” file
hashes and later compare them to detect tampering or infection.
- Malware Connection: Many viruses modify executable files —
this method detects such unauthorized changes.
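
The one-bit claim in the hashing concept above can be checked with a minimal sketch. The byte string is made up for illustration; only `hashlib` from the standard library is used.

```python
import hashlib

original = b"print('hello world')"
# Flip the lowest bit of the first byte to simulate a one-bit modification
tampered = bytes([original[0] ^ 0x01]) + original[1:]

hash_original = hashlib.sha256(original).hexdigest()
hash_tampered = hashlib.sha256(tampered).hexdigest()

# Count how many of the 64 hex characters differ between the two digests
differing = sum(a != b for a, b in zip(hash_original, hash_tampered))

print(hash_original)
print(hash_tampered)
print(f"{differing} of {len(hash_original)} hex characters differ")
```

Most of the hex characters should differ even though only a single bit of the input changed, which is why comparing digests is enough to spot tampering.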

code

In [None]:
# Script to loop through the files and generate a SHA256 hash for each. 
# The results should be saved in a csv file with the file name and the hash and time stamp. 

import hashlib
import os
import datetime
import csv

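# Hash every regular file directly inside a given directory
def hashfile(file_path):
    file_hashes = []

    for name in os.listdir(file_path):
        full_path = os.path.join(file_path, name)
        # Skip sub-directories: open() would fail on them
        if not os.path.isfile(full_path):
            continue

        sha256 = hashlib.sha256()
        with open(full_path, 'rb') as f:
            # Read in chunks so large files are not loaded into memory at once
            for chunk in iter(lambda: f.read(65536), b""):
                sha256.update(chunk)
        file_hashes.append({"file": name, "hash": sha256.hexdigest(), "time": datetime.datetime.now()})

    return file_hashes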


def writecsv(file_hashes, output_file):
    with open(output_file, "w", newline="") as csvfile:
        fieldnames = ["file", "hash", "time"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(file_hashes)


# file_path = input("Enter the path of the folder: ")
file_path = "C://Users//kierm//Desktop//Goldsmiths//Y3//Net and sys security//Workshop//4 Malicious Software//Data"

file_hashes = hashfile(file_path)
print(f"Hashes:{file_hashes}")

#writing the hashes to a csv file

# writecsv(file_hashes, "hashes.csv")
writecsv(file_hashes, "new-hashes.csv")

Hashes:[{'file': '1 one.txt', 'hash': 'dd982210e11dcf0f614c64101ef75ac1eed4093f7d4d0b82bfed9170a21e9384', 'time': datetime.datetime(2025, 12, 8, 11, 24, 6, 205418)}, {'file': '3 three.txt', 'hash': '41f79d28481e4f4c9c65fd14550e7fb51f3c05db5f616aa8a2d5cfa599d4c532', 'time': datetime.datetime(2025, 12, 8, 11, 24, 6, 223334)}, {'file': '4 four.txt', 'hash': '070f302b5b07bf09ccd0e3a39502d9fcb00a3dab84772438a23a417229bf14bd', 'time': datetime.datetime(2025, 12, 8, 11, 24, 6, 231174)}, {'file': '5 five', 'hash': 'de1b715513f19c0ef05a50b5e3b5742f17f73e02223a7289472cd06da4d32ba0', 'time': datetime.datetime(2025, 12, 8, 11, 24, 6, 256234)}]


#### Analysis
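The script scans the data folder, hashes each file with SHA-256, records the file name, hash and timestamp, and then writes everything to a CSV file. The first run produced the baseline (`hashes.csv`); the run shown here writes `new-hashes.csv` for later comparison.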

#### Discussion points
We rely on file hashes rather than timestamps or file sizes because malware can reset a file's timestamps after modifying it and can insert malicious code without changing the file size. A hash is computed from the file's contents, so even a tiny change produces a different, irreversible fingerprint of the file.

Rootkits are a good example of this: they typically make files appear unchanged so that the malicious code stays hidden.

Large updates are a prime opportunity for attackers, because many hashes legitimately change at once and a malicious change can hide among them. To combat this, confirm that updates are trusted by checking digital signatures, using trusted sources and keeping to a known update schedule, and keep the original files as a backup to revert to in an emergency.
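
The first discussion point (timestamps and sizes can be faked, content hashes cannot) can be sketched as follows. This is a minimal illustration under stated assumptions: the file name and contents are hypothetical, and `fingerprint` is a helper introduced here, not part of the lab script.

```python
import hashlib
import os
import tempfile

def fingerprint(path):
    """Return (size, modification time in ns, SHA-256 hex digest) for one file."""
    stat = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return stat.st_size, stat.st_mtime_ns, digest

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"total = 100\n")
    before = fingerprint(path)

    # "Attacker" step: overwrite with same-length content, then restore the old timestamps
    with open(path, "wb") as f:
        f.write(b"total = 999\n")
    os.utime(path, ns=(before[1], before[1]))
    after = fingerprint(path)

print("size unchanged: ", before[0] == after[0])
print("mtime unchanged:", before[1] == after[1])
print("hash unchanged: ", before[2] == after[2])
```

On most filesystems the size and modification time should compare equal while the hash does not, so a checker that trusted only metadata would miss the change.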

### Section 2: Detecting Suspicious File Changes
Objective

To simulate how malware detection software identifies file tampering by
comparing current file hashes to a trusted baseline.

Key Concepts
- Change Detection: Compares new hashes against the baseline to
flag altered or deleted files.
- Heuristic Insight: Sudden modification of system or executable
files often indicates infection.
- Forensics Use: Detecting when and which files changed helps
trace intrusion paths.

code

In [35]:
#function to detect changes between two csv files
def detect_changes(original_file, new_file):
    #load original hashes
    with open(original_file, "r") as csvfile:
        reader = csv.DictReader(csvfile)
        original_hashes = {row["file"]: row["hash"] for row in reader}

    #load new hashes
    with open(new_file, "r") as csvfile:
        reader = csv.DictReader(csvfile)
        new_hashes = {row["file"]: row["hash"] for row in reader}

    #detect changes
    changes = []
    for file, new_hash in new_hashes.items():
        original_hash = original_hashes.get(file)
        if original_hash is None:
            changes.append((file, "New file added"))
        elif original_hash != new_hash:
            changes.append((file, "File modified"))

    for file in original_hashes.keys():
        if file not in new_hashes:
            changes.append((file, "File deleted"))

    return changes


changes = detect_changes("hashes.csv", "new-hashes.csv")
for change in changes:
    print(f"{change[0]}: {change[1]}")

3 three.txt: File modified
4 four.txt: File modified
5 five: New file added
2 two.txt: File deleted


#### Analysis
Here I load both the original hashes and the newly generated hashes, then compare them to see which files were added, modified or removed. Each change is printed as a simple readable string, which shows exactly what has changed: two files were modified, one was added and one was deleted.

#### Discussion points
Hash checking provides valuable information alongside antivirus scanning. It can give an early warning of tampering before the malware itself is recognised, because it catches small changes that a signature-based antivirus might miss.

A rootkit, however, could intercept these reads and return the original, clean file contents, so my script would compute clean hashes without ever seeing the real infected versions. In a real scenario this lets rootkits go undetected for a very long time, compromising systems and confidentiality.

I could also replace any modified or deleted files with the originals so that none of the detected changes persist. This could be extended by alerting a user or admin, who confirms whether a change was intended before anything is reverted.
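
The restore idea above could look something like the sketch below. It assumes a trusted backup folder exists and that changes arrive as the same `(file, status)` tuples that `detect_changes` returns; `restore_files`, the folder layout and the demo file names are all hypothetical.

```python
import os
import shutil
import tempfile

def restore_files(changes, backup_dir, live_dir):
    """Copy a trusted backup over every modified or deleted file.

    New files are only reported for review, never deleted automatically.
    """
    restored, needs_review = [], []
    for name, status in changes:
        backup_path = os.path.join(backup_dir, name)
        if status in ("File modified", "File deleted") and os.path.isfile(backup_path):
            shutil.copy2(backup_path, os.path.join(live_dir, name))
            restored.append(name)
        else:
            needs_review.append((name, status))
    return restored, needs_review

# Small self-contained demo using temporary folders
with tempfile.TemporaryDirectory() as backup, tempfile.TemporaryDirectory() as live:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(backup, name), "w") as f:
            f.write("clean")
    with open(os.path.join(live, "a.txt"), "w") as f:
        f.write("tampered")   # a.txt was modified, b.txt was deleted
    with open(os.path.join(live, "c.txt"), "w") as f:
        f.write("unknown")    # c.txt was newly added

    demo_changes = [("a.txt", "File modified"), ("b.txt", "File deleted"), ("c.txt", "New file added")]
    restored, needs_review = restore_files(demo_changes, backup, live)
    with open(os.path.join(live, "a.txt")) as f:
        restored_content = f.read()

print("Restored:", restored)
print("Needs review:", needs_review)
```

Leaving new files for manual review is a deliberate choice in this sketch: deleting them automatically could destroy legitimate work, which is where the user or admin confirmation step would fit.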

### Section 3: Signature-Based Malware Detection
Objective

To understand how early antivirus programs detected malware using
pattern matching, and why modern malware easily bypasses this
method.

Key Concepts
- Signatures: Unique byte or code patterns associated with known
malware.
- Pattern Matching: Scans files for predefined suspicious
expressions or function calls.
- Weakness: Fails against obfuscated, encrypted, or polymorphic
malware.
- Modern Shift: Behavioural analysis and machine learning now
complement signature detection.

code

In [None]:
# Code to Scan for Suspicious Patterns such as eval(), base64.b64decode, socket.connect, exec(), import os
import os
import re

SIGNATURES = [r"eval\(", r"base64\.b64decode", r"socket\.connect", r"exec\(", r"import os"]

def scan_for_suspicious_patterns(file_path):
    # Walk the directory tree and check every line of every file against each signature
    for root, dirs, files in os.walk(file_path):
        for name in files:
            # errors="ignore" stops undecodable bytes in binary files from crashing the scan
            with open(os.path.join(root, name), "r", errors="ignore") as fp:
                for line_number, row in enumerate(fp, start=1):
                    for signature in SIGNATURES:
                        if re.search(signature, row):
                            print("\nString exists in file:", name)
                            print("Line Number:", line_number)
                            print("Line Content:", row.strip())
                            print("Matched Signature:", signature)

scan_for_suspicious_patterns(file_path)



String exists in file: 3 three.txt
Line Number: 3
Line Content: import os
Matched Signature: import os

String exists in file: 5 five
Line Number: 1
Line Content: base64.b64decode
Matched Signature: base64\.b64decode


#### Analysis
I simulate classic signature-based malware detection by scanning through each file in the data folder for specific suspicious patterns. When a pattern is found, the script prints the file name, line number, line content and matched signature, allowing us to verify the suspected malware.

#### Discussion points
Polymorphic and metamorphic viruses change their code as they spread, so no fixed pattern stays the same between copies. This makes them difficult to detect with fixed pattern matching like this, and the virus can stay undetected simply by changing.
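
A minimal sketch of why fixed patterns are easy to evade: the two snippets below are hypothetical strings that would both end up calling `eval`, but only the plain one contains the literal text the signature looks for. This is hand-written obfuscation, the simplest form of what polymorphic code automates. The signature list is copied from the scanner above; `matched_signatures` is a helper introduced for this example.

```python
import re

# Same signature list as the scanner above
SIGNATURES = [r"eval\(", r"base64\.b64decode", r"socket\.connect", r"exec\(", r"import os"]

def matched_signatures(source):
    """Return every signature pattern found anywhere in the given source text."""
    return [sig for sig in SIGNATURES if re.search(sig, source)]

# Two hypothetical snippets with the same effect; they are only scanned, never executed
plain = 'eval("2 + 2")'
obfuscated = 'getattr(__import__("builtins"), "ev" + "al")("2 + 2")'

print("plain:     ", matched_signatures(plain))
print("obfuscated:", matched_signatures(obfuscated))
```

The plain snippet matches the `eval\(` signature, while the obfuscated one matches nothing, even though its behaviour would be identical.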

Attackers can also exploit this system by intentionally flooding the scanner with false detections, which misleads the analysis and makes real threats harder to spot.

Heuristic and behavioural scanners look for suspicious actions and runtime behaviour instead of exact code patterns, so they can detect malware that signature matching would miss. Adding them would significantly improve our scanner.
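
As one hedged illustration of a heuristic check, the sketch below scores content by Shannon entropy, since encrypted or packed payloads tend to look far more random than ordinary text. This is a simple static heuristic chosen for the example, not runtime behaviour monitoring, and the sample data and the 7.0 cut-off are assumptions made up for illustration.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

ENTROPY_THRESHOLD = 7.0  # hypothetical cut-off, would need tuning on real files

plain_text = b"this is an ordinary line of text in an ordinary file " * 20
# Uniformly distributed bytes stand in for an encrypted or packed payload
packed_like = bytes(range(256)) * 4

for label, sample in (("plain text", plain_text), ("packed-like", packed_like)):
    score = shannon_entropy(sample)
    verdict = "suspicious" if score > ENTROPY_THRESHOLD else "ok"
    print(f"{label}: {score:.2f} bits/byte -> {verdict}")
```

Unlike a signature, this check does not depend on any specific string being present, but it can also flag harmless compressed files, so it would complement rather than replace the pattern scan.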

My research shows that detection methods must evolve alongside shapeshifting malware in order to keep credentials and systems safe.

### Section 4: Worm Propagation Simulation
Objective

To visualize how a worm spreads rapidly across a network using random
and opportunistic scanning strategies.

Key Concepts
- Worms: Self-contained programs that propagate through networks
without user action.
- Scanning: Worms randomly or strategically locate vulnerable
systems.
- Propagation Dynamics: The infection rate depends on network
topology and available hosts.
- Containment: Rate limiting, ingress filtering, and anomaly
detection reduce spread.

code

In [31]:
import random

def simulate_infection(hostnum, attempts):
    hosts = list(range(hostnum))
    infected = [random.choice(hosts)]  # start from one randomly chosen infected host
    timeline = []

    for step in range(50):  # hard cap of 50 steps
        newinfections = []
        # Every host infected at the start of this step makes `attempts` random scans
        for _ in infected[:]:
            for _ in range(attempts):
                target = random.choice(hosts)
                if target not in infected and target not in newinfections:
                    newinfections.append(target)

        # New infections are added after the loop, so they only start scanning next step
        infected.extend(newinfections)
        timeline.append(len(infected))
        print(f"Step {step}: {len(infected)} infected")
        if len(infected) == hostnum:
            break

    print("Simulation Complete!")
    return timeline

print("Sim 1")
simulate_infection(100, 2)
print("\nSim 2")
simulate_infection(100, 4)

Sim 1
Step 0: 3 infected
Step 1: 8 infected
Step 2: 22 infected
Step 3: 46 infected
Step 4: 84 infected
Step 5: 99 infected
Step 6: 100 infected
Simulation Complete!

Sim 2
Step 0: 5 infected
Step 1: 22 infected
Step 2: 69 infected
Step 3: 98 infected
Step 4: 100 infected
Simulation Complete!


[5, 22, 69, 98, 100]

#### Analysis
This simulates a network of hosts being infected by a worm. Each infected host scans randomly for vulnerable hosts a set number of times per step and infects any it finds. The infection count is printed at each step to show how quickly the worm spreads.

#### Discussion Points
Doubling the attempts per host raises the early growth rate from roughly 3x to roughly 5x per step, so the whole network is compromised in fewer steps (5 instead of 7 in the runs above). This shows how quickly a worm might spread in a real environment and how dangerous worms can be.
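
The effect of the attempts parameter can also be estimated without randomness. The sketch below is a mean-field approximation written for this report, not part of the lab code: with `I` infected hosts making `attempts * I` uniform random scans over `N` hosts, a given clean host is hit with probability `1 - (1 - 1/N) ** (attempts * I)`, so the expected number of new infections is `(N - I)` times that. The function name `expected_timeline` is hypothetical.

```python
def expected_timeline(hostnum, attempts, max_steps=50):
    """Expected infected count per step for uniform random scanning (mean-field estimate)."""
    infected = 1.0  # the simulation starts from a single infected host
    timeline = []
    for _ in range(max_steps):
        scans = attempts * infected
        # Chance that one particular clean host is hit by at least one of this step's scans
        p_hit = 1 - (1 - 1 / hostnum) ** scans
        infected += (hostnum - infected) * p_hit
        timeline.append(infected)
        if infected > hostnum - 0.5:  # rounds to a fully infected network
            break
    return timeline

for attempts in (2, 4):
    curve = expected_timeline(100, attempts)
    print(f"attempts={attempts}:", [round(x) for x in curve])
```

The estimated curves should land close to the simulated counts, with the early steps multiplying by about `1 + attempts` before the pool of clean hosts runs out and growth flattens.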

Staying on the local subnet lets a worm avoid triggering detections aimed at wider scans. It targets vulnerable hosts in the common IP ranges used on local networks, allowing faster infection with less noise.

Therefore, rate limiting seems the best way to stop these attacks. It slows the infection process and curbs the worm's explosive spread, giving us more time to find it via thresholding.
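
To get a feel for the rate limiting point, here is a minimal sketch under one specific assumption: the network lets only a fixed number of scans through per step, shared across all infected hosts. That is just one way to model rate limiting, and `simulate_with_rate_limit` and `max_scans_per_step` are names invented for this example.

```python
import random

def simulate_with_rate_limit(hostnum, attempts, max_scans_per_step, seed=None):
    """Random-scanning worm where only a fixed number of scans get through per step."""
    rng = random.Random(seed)
    infected = {rng.randrange(hostnum)}
    timeline = []
    while len(infected) < hostnum:
        # Scans the worm wants to send, capped by the (hypothetical) network-wide limit
        scans = min(attempts * len(infected), max_scans_per_step)
        targets = {rng.randrange(hostnum) for _ in range(scans)}
        # Hosts hit this step only start scanning in the next step
        infected |= targets
        timeline.append(len(infected))
    return timeline

unlimited = simulate_with_rate_limit(100, 2, max_scans_per_step=10**6, seed=1)
limited = simulate_with_rate_limit(100, 2, max_scans_per_step=5, seed=1)

print("Steps without a limit:", len(unlimited))
print("Steps with 5 scans per step:", len(limited))
```

With the cap in place, growth per step can never exceed the cap, so the curve becomes at most linear instead of exponential and a 100-host network needs at least 20 steps to fall. That extra time is what gives threshold-based detection a chance to react.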

#### Mini reflection
I struggled to start this coding exercise for a long time because I did not understand how to actually simulate worm propagation. I understood the concept in theory and expected an exponential infection process across hosts. However, handling the distinction between clean and infected hosts correctly, making sure newly infected hosts don't start infecting straight away, and keeping track of the propagation progress all took longer than expected. Working on this simulation of real infections made me appreciate worm growth and tracking, and deepened my understanding of strategies for stopping propagation.

### Section 5: Countermeasure Design Challenge
Objective

To consolidate learning by designing a layered defence plan combining
prevention, detection, and mitigation techniques.

Key Concepts
- Defence in Depth: Layering multiple, complementary protection
methods.
- Host-based vs. Network-based: Combining internal integrity
monitoring with perimeter defences.
- User Awareness: Users targeted by social engineering remain one of
the weakest links.
- Resilience: Recovery and containment strategies matter as much
as prevention.

code


In [33]:
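def detect_anomalies(timeline, threshold):
    # Flag every step whose increase over the previous step exceeds the threshold
    anomalies = []
    for step in range(1, len(timeline)):
        growth = timeline[step] - timeline[step - 1]
        if growth > threshold:
            anomalies.append((step, growth))

    if anomalies:
        for step, growth in anomalies:
            print(f"Step {step}: Sudden growth of {growth} infections")
    else:
        print("No anomalies detected.")

    # Return the list so the caller's assignment receives the results instead of None
    return anomalies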

print("Running simulation:")
timeline = simulate_infection(100, 2)

print("\nAnomaly Detection Results:")
anomalies = detect_anomalies(timeline, 10)

Running simulation:
Step 0: 3 infected
Step 1: 8 infected
Step 2: 18 infected
Step 3: 43 infected
Step 4: 70 infected
Step 5: 95 infected
Step 6: 100 infected
Simulation Complete!

Anomaly Detection Results:
Step 3: Sudden growth of 25 infections
Step 4: Sudden growth of 27 infections
Step 5: Sudden growth of 25 infections


#### Analysis
To detect abnormal behaviour on a network, I check for jumps in the infection count between consecutive steps that exceed my threshold (10 in this run). This mimics how real systems detect anomalies through sudden activity spikes.

#### Discussion points
Malware outbreaks usually begin with backdoors or zero-day exploits or, most commonly, with users being tricked by phishing and scam links. The first layer of defence to be broken is therefore usually at the user level.

Large and diverse datasets of normal and malicious network behaviour can be used to train classification models that predict abnormal patterns on a network. The models should be trained in a closed environment to allow for proper tuning (filtering out noise and balancing false positives against detection sensitivity), then tested on separate, unlinked test data.

To avoid false positives, security teams should become familiar with their network's normal traffic and avoid overly sensitive settings that would overwhelm detectors with noise. This way they can identify when a real outbreak might be occurring based on application, time and network flow.
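
One hedged way to turn "get familiar with average network data" into a number is to derive the threshold from a baseline of normal activity rather than picking it by hand. The sketch below uses the mean plus `k` standard deviations, a common rule of thumb; the connection counts are made up for illustration and `baseline_threshold` and `flag_spikes` are hypothetical helpers.

```python
import statistics

def baseline_threshold(normal_counts, k=3.0):
    """Return mean + k standard deviations of activity seen during normal operation."""
    return statistics.mean(normal_counts) + k * statistics.stdev(normal_counts)

def flag_spikes(observed, threshold):
    """Return (index, value) for every observation above the threshold."""
    return [(i, value) for i, value in enumerate(observed) if value > threshold]

# Hypothetical per-minute connection counts, made up for illustration
normal_counts = [12, 15, 11, 14, 13, 16, 12, 14]
observed = [13, 15, 14, 41, 67, 15]

threshold = baseline_threshold(normal_counts)
print(f"Threshold: {threshold:.1f}")
print("Flagged:", flag_spikes(observed, threshold))
```

Raising `k` makes the detector less sensitive and cuts false positives; lowering it catches smaller spikes at the cost of more noise, which is the trade-off described above.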

#### Mini reflection
Again, I wasn't sure how to handle this task. My original plan was to simulate a network with a sudden activity spike for detection, but this became too complicated. I opted for a much simpler solution that reuses my previous worm propagation function, on the assumption that large spikes in infection would cause, and are therefore equivalent to, unusual spikes in network activity. As a simple demo simulation this works, and it gave me hands-on experience with thresholding and a better understanding of how it detects anomalies.