# Section 1: File Integrity Checker

Comparing hashes is an efficient way of checking the integrity of a file. Changing even a single byte of a file causes the entire hash of the file to change. Knowing this behaviour, we can compare a file against its known hash to check whether it has been modified in any way.
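
As a small illustration of that behaviour, the sketch below hashes two byte strings that differ in exactly one byte and counts how many hex characters of the digests differ. The sample strings and the helper name `sha256_hex` are made up for the example.

```python
from hashlib import sha256

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return sha256(data).hexdigest()

original = b"Quarterly report: revenue 1000"
tampered = b"Quarterly report: revenue 9000"  # exactly one byte differs

original_hash = sha256_hex(original)
tampered_hash = sha256_hex(tampered)

# Count how many of the 64 hex characters differ between the two digests.
changed = sum(a != b for a, b in zip(original_hash, tampered_hash))

print(original_hash)
print(tampered_hash)
print(f"{changed} of {len(original_hash)} hex characters differ")
```

The two digests should look unrelated, which is why a stored hash is enough to reveal a modification.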

The following code makes a record of the hashes of all the files in the `Folder` directory.

In [1]:
import csv
import datetime
import os
from hashlib import sha256

with open('hashes.csv', 'w', newline='') as x:
    # csv.writer quotes file names that contain commas.
    writer = csv.writer(x)
    writer.writerow(['Name', 'Hash', 'Timestamp'])
    for name in os.listdir('Folder'):
        path = os.path.join('Folder', name)
        # Skip subdirectories; only regular files can be hashed.
        if not os.path.isfile(path):
            continue
        with open(path, 'rb') as f:
            file_hash = sha256(f.read()).hexdigest()
        writer.writerow([name, file_hash, datetime.datetime.now()])
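
The cell above reads each file fully into memory with `f.read()`, which is fine for small files. For large files, one possible variation is to feed the hash in chunks. The sketch below assumes a helper named `hash_file` and a 64 KiB chunk size; both are choices made for this example only.

```python
from hashlib import sha256

def hash_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so memory use stays constant."""
    digest = sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is the same digest as hashing the whole content at once, so existing records in `hashes.csv` would remain comparable.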

# Section 2: Detecting Suspicious File Changes

The following function compares the contents of any folder against the record of hashes to evaluate the integrity of the files in that folder.

In [80]:
import csv

def compare(folder):
    baseline = {}

    with open('hashes.csv') as x:
        reader = csv.DictReader(x)

        for row in reader:
            baseline[row['Name']] = row['Hash']

    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Skip subdirectories; only regular files can be hashed.
        if not os.path.isfile(path):
            continue
        with open(path, 'rb') as f:
            fileContent = f.read()
            hash = sha256(fileContent).hexdigest()

            # Checks if the file is in the baseline.
            if name in baseline:
                # Checks if the file in the baseline has been tampered with.
                if baseline[name] == hash:
                    print(f"The file: '{name}', is intact. \n")
                else:
                    print(f"The file: '{name}', is compromised.")
                    now = datetime.datetime.now()
                    print(f'Time of last check: {now}. \n')
            # Checks if a renamed version of the file is there.
            elif hash in baseline.values():
                print(f"The file: '{name}', was renamed. File content is intact \n")
            # Checks if the file is a new file.
            else:
                print(f"The file: '{name}', is a new file with no record in the baseline. \n")


compare('New Folder')

The file: 'Erm.txt', is a new file with no record in the baseline. 

The file: 'Funny Cat Pic OG.jpg', was renamed. File content is intact 

The file: 'Funny Cat Pic.jpg', is compromised.
Time of last check: 2025-11-06 19:27:18.239512. 

The file: 'Hmmm.txt', is intact. 
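
The `compare` function only walks the files currently in the folder, so a file that is listed in the baseline but has been deleted is never reported. A minimal sketch of that extra check follows. The helper name `find_missing` and the sample data are hypothetical.

```python
def find_missing(baseline, folder_names):
    """Return baseline file names that are absent from the folder listing."""
    return sorted(set(baseline) - set(folder_names))

# Hypothetical baseline and folder listing, for illustration only.
baseline = {'a.txt': 'hash-a', 'b.txt': 'hash-b', 'c.txt': 'hash-c'}
present = ['a.txt', 'c.txt', 'new.txt']

for name in find_missing(baseline, present):
    print(f"The file: '{name}', is missing from the folder.")
```

In the notebook this could be called as `find_missing(baseline, os.listdir(folder))`. A renamed file would also appear as missing under its old name, so the result is best read alongside the rename check.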



# Section 3: Signature-Based Malware Detection

## Polymorphic and Metamorphic Viruses

Metamorphic and polymorphic malware are two types of malware that can change their code as they propagate through a system. The main difference between them is that polymorphic malware can morph itself to change its code using a variable encryption key, whereas metamorphic malware rewrites its code without an encryption key.

Polymorphic malware uses an encryption key to change its shape and signature. Polymorphic malware consists of two parts, namely:
1. Encrypted virus body. Code that changes its shape.
2. Virus decryption routine. Code that doesn't change its shape and decrypts and encrypts the other part.

Metamorphic malware is rewritten with each iteration without using an encryption key. Malware authors use multiple transformation techniques to do this, such as:
- register renaming
- code permutation
- code expansion
- code shrinking
- garbage code insertion

For both polymorphic and metamorphic viruses, the changing code makes traditional signature-based detection ineffective. Even if the virus is detected and placed on a blocklist, it will soon transform and no longer match the recorded signature. A polymorphic virus keeps a static decryption routine, which makes it more susceptible to detection than a metamorphic virus. (https://www.techtarget.com/searchsecurity/definition/metamorphic-and-polymorphic-malware#:~:text=The%20main%20difference%20between%20them,code%20without%20an%20encryption%20key.)
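
The effect on hash-based signatures can be shown with a harmless toy. In the sketch below, a single-byte XOR stands in for the variable encryption key (an assumption made for simplicity, not how real malware encrypts), and the "body" is just a harmless string.

```python
from hashlib import sha256

def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (a toy stand-in for encryption)."""
    return bytes(b ^ key for b in data)

body = b"print('hello')"  # harmless placeholder for the unchanged logic

copy_a = xor_bytes(body, 0x21)
copy_b = xor_bytes(body, 0x7F)

# The stored bytes, and therefore their hashes, differ between the two copies.
print(sha256(copy_a).hexdigest() == sha256(copy_b).hexdigest())  # False

# Applying the same key again recovers the identical body in both cases.
print(xor_bytes(copy_a, 0x21) == xor_bytes(copy_b, 0x7F) == body)  # True
```

A signature recorded for `copy_a` would not match `copy_b`, even though both decode to the same content.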

One effective way to detect such malware is signature-less detection, such as ML algorithms that analyse the broader picture and extract 'features' from the files analysed. For example, an ML model can look at the randomness in various areas of the file. (https://www.crowdstrike.com/en-gb/cybersecurity-101/malware/polymorphic-virus/)
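
One common way to quantify "randomness" is Shannon entropy, measured in bits per byte. The sketch below is only an illustration of such a feature, not a detection model. The function names and the 256-byte window size are assumptions for this example.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    probabilities = [n / total for n in Counter(data).values()]
    return sum(p * math.log2(1 / p) for p in probabilities)

def windowed_entropy(data: bytes, window: int = 256):
    """Entropy of each consecutive, non-overlapping window of the data."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]

sample = b"A" * 256 + bytes(range(256))  # repetitive region, then a high-entropy one
print(windowed_entropy(sample))  # [0.0, 8.0]
```

Encrypted or compressed regions tend to score close to 8, so a per-window list like this could serve as one input feature among many.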

## Signature-Based Malware Detection Simulation:

In [7]:
import os
import re

def scan_files(directory):
    # Define the suspicious signatures
    signatures = [r"eval\(", r"base64\.b64decode", r"socket\.connect", r"exec\(", r"import os"]
    
    print(f"--- Scanning Directory: {directory} ---")

    # Walk through the directory and all subdirectories
    for root, dirs, files in os.walk(directory):
        for filename in files:
            path = os.path.join(root, filename)
            
            # # Skip scanning this script itself
            # if filename == os.path.basename(__file__):
            #     continue
            
            # Open file with 'errors=ignore' to handle non-text files without crashing
            with open(path, "r", encoding="utf-8", errors="ignore") as f:
                content = f.read()
                
            # Check for matches
            matches = []
            for sig in signatures:
                if re.search(sig, content):
                    matches.append(sig)
            
            # Print results if matches are found
            if matches:
                print(f"[ALERT] Found in {path}: {matches}")

    print(f"--- End of Scan for Directory: {directory} ---\n")

In [6]:
scan_files(".")

--- Scanning Directory: . ---
[ALERT] Found in .\Workshop 4.ipynb: ['import os']
--- End of Scan for Directory: . ---



The scan flagged `.\Workshop 4.ipynb`, which is this notebook itself. This is expected: the notebook contains the text `import os`, which is one of the signatures, so the match is a false positive rather than evidence of malware.
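
The same effect can be reproduced on an in-memory string. The sketch below reuses the signature list from `scan_files`; the helper name `match_signatures` and the sample script are hypothetical.

```python
import re

SIGNATURES = [r"eval\(", r"base64\.b64decode", r"socket\.connect", r"exec\(", r"import os"]

def match_signatures(content: str):
    """Return the signatures that occur anywhere in the given text."""
    return [sig for sig in SIGNATURES if re.search(sig, content)]

benign_script = "import os\nprint(os.getcwd())\n"
print(match_signatures(benign_script))     # ['import os']
print(match_signatures("print('hello')"))  # []
```

A perfectly ordinary script is flagged because `import os` is common in legitimate code, which shows how broad signatures trade precision for coverage.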

# Section 4: Worm Propagation Simulation

A worm is a type of malware that can self-propagate (self-replicate) without human interaction. It can also use the victim's internet connection or LAN to spread to other hosts on the network.

Worms exploit vulnerabilities in the victim's security to steal sensitive information, install backdoors that can be used to access the system, corrupt files and more. They also consume large amounts of memory and bandwidth, causing the system to malfunction. (https://www.fortinet.com/uk/resources/cyberglossary/worm-virus)

The following code simulates how a worm would propagate through a network of 500 computers.

In [10]:
import random

TOTAL_HOSTS = 500 # Total number of hosts in the network
ATTEMPTS_PER_INFECTED_HOST = 5 # Number of scan attempts per infected host per time step
SUCCESS_RATE = 0.20 # Probability of successful infection upon scanning a vulnerable host

def simulate_worm_spread(total_hosts, attempts, success_rate):
    # Initialise network state: 0 = Clean, 1 = Infected
    network = [0] * total_hosts
    network[0] = 1 # Start with 1 infected host
    
    print("--- Worm Propagation Simulation ---")
    
    time_step = 0
    # Stop when all hosts are infected or maximum time is reached
    while sum(network) < total_hosts and time_step < 50:
        infected_count = sum(network)
        print(f"Time Step {time_step}: {infected_count} infected hosts.")
        
        newly_infected_indices = set()
        
        # Get indices of currently infected hosts
        infected_indices = []
        for i, state in enumerate(network):
            if state == 1:
                infected_indices.append(i)
        
        # Each infected host attempts to scan new hosts
        for attempt in range(attempts):
            for infected_index in infected_indices:
                # 1. Randomly select a target host index
                target_index = random.randint(0, total_hosts - 1)
                
                # 2. Check if the target host is vulnerable (clean)
                if network[target_index] == 0:
                    # 3. Check for successful infection based on SUCCESS_RATE
                    if random.random() < success_rate:
                        newly_infected_indices.add(target_index)

        # Update the network state with newly infected hosts
        for index in newly_infected_indices:
            network[index] = 1

        time_step += 1
        
    print(f"Time Step {time_step}: {sum(network)} infected hosts. Simulation Complete.")

simulate_worm_spread(TOTAL_HOSTS, ATTEMPTS_PER_INFECTED_HOST, SUCCESS_RATE)

--- Worm Propagation Simulation ---
Time Step 0: 1 infected hosts.
Time Step 1: 2 infected hosts.
Time Step 2: 3 infected hosts.
Time Step 3: 7 infected hosts.
Time Step 4: 16 infected hosts.
Time Step 5: 33 infected hosts.
Time Step 6: 60 infected hosts.
Time Step 7: 109 infected hosts.
Time Step 8: 188 infected hosts.
Time Step 9: 296 infected hosts.
Time Step 10: 390 infected hosts.
Time Step 11: 452 infected hosts.
Time Step 12: 481 infected hosts.
Time Step 13: 492 infected hosts.
Time Step 14: 499 infected hosts.
Time Step 15: 499 infected hosts.
Time Step 16: 500 infected hosts. Simulation Complete.


#### 1. How does the infection curve change if you double the number of attempts per host?

Doubling the number of attempts per host will steepen the curve. The worm spreads faster because more scans mean a greater chance of successfully infecting a host. The curve will level off at the same point, as the number of hosts remains 500.
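
To compare the two settings without random noise, the sketch below uses a mean-field approximation of the simulation: it tracks the expected number of infected hosts rather than individual hosts. This is a simplification and will not reproduce any single run exactly. The function names are assumptions for this example.

```python
def expected_curve(total_hosts, attempts, success_rate, steps):
    """Mean-field approximation of the infected count at each time step."""
    infected = 1.0
    curve = [infected]
    for _ in range(steps):
        scans = attempts * infected
        # Chance that one particular clean host escapes every scan this step.
        p_escape = (1 - success_rate / total_hosts) ** scans
        infected += (total_hosts - infected) * (1 - p_escape)
        curve.append(infected)
    return curve

def steps_to_reach(curve, target):
    """First time step at which the curve reaches the target, or None."""
    for step, value in enumerate(curve):
        if value >= target:
            return step
    return None

for attempts in (5, 10):
    curve = expected_curve(500, attempts, 0.20, steps=50)
    print(f"attempts={attempts}: 95% infected by step {steps_to_reach(curve, 475)}")
```

With 10 attempts the 95% mark is reached in fewer steps than with 5, while both curves flatten out at 500 hosts.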

#### 2. Why might a worm choose a local subnet propagation strategy?

A local subnet strategy means the worm first targets hosts on the same local subnet (hosts with the same network prefix). This lets the worm spread quickly because of low latency and high bandwidth. Traffic inside a subnet also often never crosses a perimeter firewall, so the worm can bypass that layer of security.
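
What "the same network prefix" means can be checked with Python's standard `ipaddress` module. The helper name `same_subnet`, the /24 prefix length and the addresses below are illustrative assumptions.

```python
import ipaddress

def same_subnet(host_a: str, host_b: str, prefix_length: int = 24) -> bool:
    """True if both IPv4 addresses share the same network prefix."""
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{host_a}/{prefix_length}", strict=False)
    return ipaddress.ip_address(host_b) in network

print(same_subnet("192.168.1.20", "192.168.1.77"))  # True
print(same_subnet("192.168.1.20", "192.168.2.77"))  # False
```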

#### 3. Which containment strategy (rate limiting, scan detection, thresholding) would be most effective here?

Scan detection is the most effective against worms. It works by looking for two specific patterns: port scanning and IP sweeping. Port scanning is when one device checks many different ports on a single target. IP sweeping is when one device checks the same port on many different computers. The offending device can then be quarantined.
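
The two patterns can be sketched as simple counting over a connection log. The code below is a minimal illustration under assumed inputs: the log format of `(source, destination, port)` tuples, the function name `detect_scans` and the threshold of 10 are all hypothetical.

```python
from collections import defaultdict

def detect_scans(events, threshold=10):
    """Flag sources that touch more than `threshold` ports or hosts.

    events is an iterable of (source, destination, port) tuples.
    """
    ports_per_target = defaultdict(set)  # (source, destination) -> ports tried
    hosts_per_port = defaultdict(set)    # (source, port) -> destinations tried
    for source, destination, port in events:
        ports_per_target[(source, destination)].add(port)
        hosts_per_port[(source, port)].add(destination)

    alerts = set()
    for (source, _), ports in ports_per_target.items():
        if len(ports) > threshold:
            alerts.add((source, "port scan"))
    for (source, _), hosts in hosts_per_port.items():
        if len(hosts) > threshold:
            alerts.add((source, "IP sweep"))
    return sorted(alerts)

# Hypothetical log: A probes 20 ports on one target, B probes port 445 on 20 hosts.
log = [("A", "10.0.0.5", port) for port in range(1, 21)]
log += [("B", f"10.0.0.{n}", 445) for n in range(1, 21)]
log += [("C", "10.0.0.9", 443)]
print(detect_scans(log))  # [('A', 'port scan'), ('B', 'IP sweep')]
```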

Rate limiting also works against worms. It controls the volume of traffic allowed to pass through the network. This slows the spread of the worm from an exponential rate to an almost linear one. (https://www.cloudflare.com/en-gb/learning/bots/what-is-rate-limiting/)
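
One widely used way to implement rate limiting is a token bucket. The sketch below is an assumed, simplified version with timestamps passed in explicitly so that the behaviour is deterministic. The class name and the rate and capacity values are illustrative.

```python
class TokenBucket:
    """Allow at most `rate` connections per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the previous call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=5)
# 100 connection attempts arriving within one second (hypothetical timestamps).
allowed = sum(bucket.allow(now=i / 100) for i in range(100))
print(f"{allowed} of 100 attempts allowed")
```

Only a handful of the 100 attempts get through, so an infected host behind such a limiter can scan at a roughly constant rate instead of as fast as it likes.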

Thresholding is the mechanism used to configure the strategies above. It works by setting numerical limits, for example, "Alert if > 100 connections per second". This is a static strategy and is dependent on how well the limits are tuned. It can often result in false positives.

# Section 5: Countermeasure Design Challenge

The code below is a network monitoring function.

In [11]:
import psutil
import time
from collections import defaultdict

def monitor_network():
    # Configuration: How many connections trigger an alert
    threshold = 15
    check_interval = 5  # Seconds to wait between scans
    
    print(f"--- Monitoring Network (Alert Threshold: {threshold}) ---")
    
    while True:
        # Reset counts for this time step
        connection_counts = defaultdict(int)
        
        # 1. Get all active network connections
        connections = psutil.net_connections(kind='inet')
        
        # 2. Count connections for each Process ID (PID)
        for conn in connections:
            if conn.status == 'ESTABLISHED' and conn.pid:
                connection_counts[conn.pid] += 1
        
        # 3. Check if any process exceeds the threshold
        for pid, count in connection_counts.items():
            if count > threshold:
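                try:
                    proc_name = psutil.Process(pid).name()
                except (psutil.NoSuchProcess, psutil.AccessDenied):
                    # The process exited or is protected; use a placeholder name.
                    proc_name = "unknown"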
                
                print(f"[ALERT] High Traffic: {proc_name} (PID: {pid}) has {count} connections.")
        
        # Wait before scanning again
        time.sleep(check_interval)

monitor_network()

--- Monitoring Network (Alert Threshold: 15) ---
[ALERT] High Traffic: chrome.exe (PID: 22120) has 19 connections.
[ALERT] High Traffic: chrome.exe (PID: 22120) has 31 connections.


KeyboardInterrupt: 

#### 1. Which layer of defence failed first in most real-world malware outbreaks?

Usually the human layer or patch management.

Phishing is often used as a gateway for attacks. This is because humans make mistakes. Humans are often far from objective, and even with ample training, external factors may affect their judgement.

A failure to apply security updates regularly is also often a primary point of failure. For example, the WannaCry ransomware worm used the EternalBlue exploit to propagate. Microsoft had already released patches to close the exploit before the attack started, but many organisations had failed to update their Windows systems. (https://en.wikipedia.org/wiki/WannaCry_ransomware_attack)

#### 2. How can AI-based systems be trained to detect abnormal patterns safely?

The AI runs on a clean network first to learn what "normal" looks like. Engineers then create synthetic data that mimics the behaviour of malware (like rapid scanning) without the destructive payload. This allows the AI to practise spotting anomalies without putting real data at risk.
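
The "learn normal, then test on synthetic anomalies" workflow can be sketched with a very simple statistical stand-in for the AI: a z-score against a baseline. This is an assumption made to keep the example short; a real system would use a richer model. All readings below are invented.

```python
import statistics

def fit_baseline(normal_counts):
    """Learn the mean and standard deviation of traffic on a clean network."""
    return statistics.mean(normal_counts), statistics.stdev(normal_counts)

def is_anomalous(count, baseline, z_threshold=3.0):
    """Flag a count that lies more than z_threshold deviations above normal."""
    mean, stdev = baseline
    return (count - mean) / stdev > z_threshold

# Hypothetical connections-per-minute readings from a clean network.
normal = [12, 15, 11, 14, 13, 16, 12, 14]
baseline = fit_baseline(normal)

# Synthetic burst imitating rapid scanning; no real malware is involved.
print(is_anomalous(14, baseline))   # False
print(is_anomalous(120, baseline))  # True
```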

#### 3. How should security teams balance detection sensitivity and false positives?

If a detection system is too sensitive, it will raise false alerts (false positives). This may cause 'alert fatigue' in the analysts, making them lax, and they might miss a real attack amid the noise.

On the other hand, if a system only alerts for 100% known malware, it will miss zero-day attacks.

To combat this, teams often use contextual scoring: instead of a simple yes-or-no alert, the system returns a risk score that tells the analysts how likely it is that the connection is malicious.
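
A contextual score could be as simple as a weighted sum of signals. The sketch below is purely illustrative: the signal names, weights and cut-off values are hypothetical and would need tuning in practice.

```python
def risk_score(signals, weights):
    """Combine boolean signals into a score between 0 and 1."""
    total = sum(weights.values())
    matched = sum(weight for name, weight in weights.items() if signals.get(name))
    return matched / total

def triage(score, alert_at=0.7, review_at=0.3):
    """Map a score to an action instead of a plain yes or no."""
    if score >= alert_at:
        return "alert"
    if score >= review_at:
        return "review"
    return "ignore"

# Hypothetical signals and weights; real values would need tuning.
WEIGHTS = {"many_connections": 2, "unknown_process": 3, "signature_match": 5}

browser = {"many_connections": True}
suspect = {"many_connections": True, "unknown_process": True, "signature_match": True}
print(triage(risk_score(browser, WEIGHTS)))  # ignore
print(triage(risk_score(suspect, WEIGHTS)))  # alert
```

Under this scheme, a browser with many connections, like the `chrome.exe` alerts in Section 5, scores low on its own and does not add to the analysts' noise.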