# Week 4

### Malicious Software

This notebook goes over key behaviours related to malware:

- file integrity checking
- signature scanning
- worm propagation

# Section 1: File Integrity Checker

Key Concepts:

- Integrity Checking: A defensive technique that detects unauthorized changes to files or system components.
- Hashing (SHA-256): Generates a fixed-length fingerprint that is, for practical purposes, unique to each file's content; even a one-bit change produces a completely different hash.
- Baseline Comparison: Security systems record “clean” file hashes and later compare them to detect tampering or infection.
- Malware Connection: Many viruses modify executable files — this method detects such unauthorized changes.
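
To make the "one-bit change" claim concrete, here is a minimal sketch that flips a single bit in a short message and counts how many bits of the SHA-256 digest differ. The helper names `sha256_hex` and `differing_bits` and the sample message are illustrative assumptions, not part of the notebook's scripts.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

original = b"hello world"
# flip the lowest bit of the first byte: 'h' (0x68) becomes 'i' (0x69)
tampered = bytes([original[0] ^ 0x01]) + original[1:]

digest_a = sha256_hex(original)
digest_b = sha256_hex(tampered)

print(digest_a)
print(digest_b)
print(f"{differing_bits(digest_a, digest_b)} of 256 bits differ")
```

For a well-behaved hash function, roughly half of the output bits should change, which is why the two digests look unrelated.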

The code below is a script which loops through the files in the folder `files` and generates a SHA-256 hash for each. The results are saved in a CSV file with the file name, hash and timestamp.

In [None]:
import os, hashlib, csv
from datetime import datetime

folder_path = './files'

records = []
csv_output = "file_hashes.csv"

# loop through files in the folder
for filename in os.listdir(folder_path):
    file_path = os.path.join(folder_path, filename)

    if os.path.isfile(file_path):
        # compute hash (named `hasher` to avoid shadowing the built-in hash())
        hasher = hashlib.sha256()
        with open(file_path, "rb") as f:
            # hash file in chunks of 4096 bytes
            for block in iter(lambda: f.read(4096), b""):
                hasher.update(block)

        hash_hex = hasher.hexdigest()
        timestamp = datetime.now().isoformat()

        records.append((filename, hash_hex, timestamp))

# create CSV file and write records
with open(csv_output, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Filename", "SHA256 Hash", "Timestamp"])
    writer.writerows(records)

print(f"Hashes written to {csv_output}")

Hashes written to file_hashes.csv


Security systems rely on file hashes rather than timestamps or file sizes because both are easy to manipulate: an attacker can alter a file's content while keeping its size the same and resetting its timestamp. A cryptographic hash such as SHA-256 acts as an effectively unique fingerprint of the file's content. Even a single-byte change results in a completely different hash, so tampering is obvious when the hash is compared with a trusted value. This makes hashing a strong method of integrity verification.
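
As a rough illustration of why size and timestamp are weak signals, the sketch below tampers with a temporary file, keeps its length the same, and restores the original modification time with `os.utime`. The file name `report.txt`, its contents, and the helper `fingerprint` are hypothetical and exist only for this demo.

```python
import hashlib, os, tempfile

def fingerprint(path):
    """Return (size, mtime_ns, sha256 hex) for the file at path."""
    info = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return info.st_size, info.st_mtime_ns, digest

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"transfer 100 to alice")

    size_before, mtime_before, hash_before = fingerprint(path)
    atime_before = os.stat(path).st_atime_ns

    # tamper: overwrite with different content of the same length
    with open(path, "wb") as f:
        f.write(b"transfer 900 to alice")
    # put the original timestamps back
    os.utime(path, ns=(atime_before, mtime_before))

    size_after, mtime_after, hash_after = fingerprint(path)

print("size unchanged: ", size_before == size_after)
print("mtime unchanged:", mtime_before == mtime_after)
print("hash unchanged: ", hash_before == hash_after)
```

A check based only on size and modification time would see no difference here, while the SHA-256 comparison exposes the edit.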

Malware can try to replace legitimate files with malicious ones of the same size to fool naive checks based on size alone. More sophisticated malware can modify a file so that it still matches the recorded hash by exploiting a weak hashing algorithm. A collision is a weakness in which two different inputs produce the same hash value. Practical collisions have been demonstrated for older algorithms such as MD5 and SHA-1, while none are publicly known for SHA-256. Lastly, some malware disables or bypasses the integrity-checking software entirely.
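
The idea of a weak algorithm can be sketched with a deliberately poor checksum. The `additive_checksum` function below is a toy invented for this example (it is not a real hash function, and real collision attacks on MD5 or SHA-1 are far more involved); it only shows how two different inputs can share one checksum value while SHA-256 still tells them apart.

```python
import hashlib

def additive_checksum(data: bytes) -> int:
    """Toy checksum: sum of all bytes modulo 256. Not a real hash function."""
    return sum(data) % 256

original = b"pay 12 dollars"
tampered = b"pay 21 dollars"  # same bytes, two digits swapped

same_checksum = additive_checksum(original) == additive_checksum(tampered)
same_sha256 = hashlib.sha256(original).digest() == hashlib.sha256(tampered).digest()

print("toy checksum matches:", same_checksum)
print("SHA-256 matches:     ", same_sha256)
```

Because the toy checksum ignores byte order, any reordering of the same bytes collides, which is the kind of structural weakness an attacker looks for.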

What happens if a legitimate system update changes many hashes?
When an update legitimately modifies files, their hashes will naturally change. If the system is too strict, it will flag these changes as tampering, producing false positives. To avoid this, a system should:
1. Maintain a whitelist of trusted software updates.
2. Use signed update packages to verify the authenticity of an update before updating the hash database.
3. Update the hashes immediately after a verified update, so that the baseline matches the most up-to-date files.
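
The steps above can be sketched as a baseline that only accepts new hashes from an update manifest whose authenticity has been verified. This is a minimal sketch under stated assumptions: it uses an HMAC from the standard library as a stand-in for a real public-key signature, and `VENDOR_KEY`, `sign_manifest`, `apply_update`, and the file name `app.exe` are all hypothetical.

```python
import hashlib, hmac, json

# Hypothetical shared key; a real updater would verify a public-key signature instead.
VENDOR_KEY = b"demo-key-not-for-production"

def sign_manifest(manifest: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over the manifest (stand-in for a vendor signature)."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def apply_update(baseline: dict, manifest: dict, tag: str, key: bytes) -> bool:
    """Update the baseline in place only if the manifest's tag verifies."""
    expected = sign_manifest(manifest, key)
    if not hmac.compare_digest(expected, tag):
        return False  # untrusted update: leave the baseline untouched
    baseline.update(manifest)
    return True

baseline = {"app.exe": hashlib.sha256(b"version 1").hexdigest()}

# a genuine update: new hash for app.exe, tagged with the vendor key
good_manifest = {"app.exe": hashlib.sha256(b"version 2").hexdigest()}
good_tag = sign_manifest(good_manifest, VENDOR_KEY)

# a forged update: the attacker does not know the key
bad_manifest = {"app.exe": hashlib.sha256(b"malware").hexdigest()}
bad_tag = sign_manifest(bad_manifest, b"attacker-guess")

accepted_bad = apply_update(baseline, bad_manifest, bad_tag, VENDOR_KEY)
accepted_good = apply_update(baseline, good_manifest, good_tag, VENDOR_KEY)
print("forged update accepted: ", accepted_bad)
print("genuine update accepted:", accepted_good)
```

Gating the baseline refresh on verification means a legitimate update stops raising alerts, while an attacker cannot simply rewrite the hash database to hide a modified file.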

# Section 2: Detecting Suspicious File Changes

This section simulates how malware detection software would identify file tampering by comparing current file hashes to a trusted baseline.

- Change Detection: Compares new hashes against the baseline to flag altered or deleted files.
- Heuristic Insight: Sudden modification of system or executable files often indicates infection.
- Forensics Use: Detecting when and which files changed helps trace intrusion paths.

The code below takes the files in the folder and compares them to the previously created CSV.

In [None]:
import os, hashlib, csv
from datetime import datetime

folder_path = './files'

new_hashes = []
csv_output = "file_hashes.csv"

# loop through files in the folder
for filename in os.listdir(folder_path):
    file_path = os.path.join(folder_path, filename)

    if os.path.isfile(file_path):
        # compute hash (named `hasher` to avoid shadowing the built-in hash())
        hasher = hashlib.sha256()
        with open(file_path, "rb") as f:
            # hash file in chunks of 4096 bytes
            for block in iter(lambda: f.read(4096), b""):
                hasher.update(block)

        hash_hex = hasher.hexdigest()
        timestamp = datetime.now().isoformat()

        new_hashes.append((filename, hash_hex, timestamp))


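# load previous hashes from CSV into a dictionary
previous_hashes = {}

try:
    with open(csv_output, "r", newline="") as csvf:
        reader = csv.DictReader(csvf)
        for row in reader:
            previous_hashes[row["Filename"]] = row["SHA256 Hash"]
except FileNotFoundError:
    print(f"Baseline {csv_output} not found; run Section 1 first.")

# compare new hashes with previous hashes
mismatches = []
for new_filename, new_hash, new_timestamp in new_hashes:
    old_hash = previous_hashes.get(new_filename)
    if old_hash is None:
        # file is not in the baseline
        mismatches.append((new_filename, "NEW", new_timestamp))
    elif old_hash != new_hash:
        # content differs from the baseline
        mismatches.append((new_filename, "MODIFIED", new_timestamp))

# files in the baseline that are no longer in the folder
current_names = {name for name, _, _ in new_hashes}
for old_filename in previous_hashes:
    if old_filename not in current_names:
        mismatches.append((old_filename, "DELETED", datetime.now().isoformat()))

# report results
if mismatches:
    for name, status, when in mismatches:
        print(f"[{status}] {name} (checked {when})")
else:
    print("No changes detected.")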



