<a href="https://colab.research.google.com/github/Kolawole-a2/Kola_Projects/blob/main/SEAS8416__AFOLABI_Assignment8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Our Objective:**

Build a log analysis function that detects two event types. The events are not correlated, so each can occur independently of the other. The function scans each log entry with regular expressions, which requires investigating Python's `re` library and how regular expressions are built. If either event occurs, the function calls a separate function that alerts an administrator.
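Without the Colab upload and widgets, the objective reduces to a small loop. The following is a minimal, standard-library-only sketch under stated assumptions, not the notebook's code. `detect_events` and the `events` dictionary are hypothetical names, and the alert is a printed message passed in as a callback. The first two log lines are shortened from the sample output further down, and the third is made up for illustration.

```python
import re

def alert_admin(event_type, log_entry):
    """Stand-in alert: prints a message (a real system might email or page someone)."""
    print(f"[ALERT] Event '{event_type}' detected! Log entry: {log_entry.strip()}")

def detect_events(lines, events, alert=alert_admin):
    """Check every line against every event pattern independently.

    `events` (hypothetical structure) maps an event name to a regex string.
    Returns the number of alerts raised.
    """
    compiled = {name: re.compile(p, re.IGNORECASE) for name, p in events.items()}
    count = 0
    for line in lines:
        for name, regex in compiled.items():
            if regex.search(line):  # search() looks for a match anywhere in the line
                alert(name, line)
                count += 1
    return count

logs = [
    "Jul 10 16:03:18 combo sshd(pam_unix)[30682]: authentication failure; user=root",  # shortened
    "Jul 10 13:17:22 combo ftpd[30284]: connection from 220.94.205.45",  # shortened
    "Jan  1 00:00:00 host app: Unauthorized access to /admin",  # made up for illustration
]
events = {
    "Auth Failure": r"authentication failure",
    "Unauthorized Access": r"unauthorized access",
}
print(detect_events(logs, events))  # prints two alerts, then 2
```

Each pattern is checked on every line, so a single line that matched both events would raise two alerts. This is consistent with treating the events as independent.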


**The function below runs the interactive interface.**

Here’s what will happen in order:

1. File Upload: Prompts the user to upload a log or CSV file.

2. Show Sample Lines: Displays up to 5 random lines from the file so the user can preview the content.

3. Create Dropdowns: Lets the user pick two predefined regex patterns to test.

4. Text Inputs: Each text field shows the regex tied to its dropdown. Users can also edit it manually or type a custom pattern.

5. Update Mechanism: If a user picks a new dropdown value, the corresponding text input updates.

6. Test Button: Runs the patterns currently in the text fields against the sample lines.

7. Run Button: Scans the full file and prints alerts for each match.
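Steps 2 and 6 can be tried without Colab or widgets. The sketch below is an illustrative stand-in, not the notebook's code. `preview_and_test` is a hypothetical helper, and its `seed` parameter is an added assumption so a preview can be reproduced (the notebook itself calls an unseeded `random.sample`). The sketch also skips empty patterns, because `re.compile("")` matches every line. An empty "Custom Pattern" entry would otherwise do exactly that.

```python
import random
import re

def preview_and_test(lines, patterns, k=5, seed=None):
    """Sample up to k lines (step 2) and test each pattern on them (step 6).

    Returns (sample, results): results maps each non-empty pattern to a list of
    booleans, one per sampled line, or to None if the regex is invalid.
    """
    rng = random.Random(seed)  # seeded generator: an assumption for reproducibility
    sample = rng.sample(lines, min(k, len(lines)))
    results = {}
    for pattern in patterns:
        if not pattern:  # an empty regex would match every line
            continue
        try:
            regex = re.compile(pattern, re.IGNORECASE)
        except re.error:
            results[pattern] = None
            continue
        results[pattern] = [bool(regex.search(line)) for line in sample]
    return sample, results

# Three lines copied from the sample output shown later in this notebook
lines = [
    "Jul 25 23:24:09 combo ftpd[26475]: connection from 217.187.83.50 () at Mon Jul 25 23:24:09 2005",
    "Jul 10 16:03:18 combo sshd(pam_unix)[30682]: authentication failure; logname= uid=0 "
    "euid=0 tty=NODEVssh ruser= rhost=150.183.249.110  user=root",
    "Jul 10 13:17:22 combo ftpd[30284]: connection from 220.94.205.45 () at Sun Jul 10 13:17:22 2005",
]
sample, results = preview_and_test(lines, ["authentication failure", "login failure", ""], seed=0)
for pattern, hits in results.items():
    print(f"{pattern!r}: {sum(hits)} of {len(sample)} sample lines match")
# 'authentication failure': 1 of 3 sample lines match
# 'login failure': 0 of 3 sample lines match
```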



```python
import re
import random
from google.colab import files
import ipywidgets as widgets
from IPython.display import display, clear_output

# Common regex patterns for the dropdowns (matched case-insensitively)
common_patterns = {
    "Login Failure": r"login failure",
    "Unauthorized Access": r"unauthorized access",
    "Backdoor Attack": r"backdoor",
    "Date Pattern (YYYY-MM-DD)": r"\d{4}-\d{2}-\d{2}",
    "IP Address": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "User 'admin'": r"user\s*'admin'",
    "Custom Pattern": "",  # empty: the user types their own regex
}

def alert_admin(event_type, log_entry):
    """Alert the administrator; in this notebook the alert is a printed message."""
    print(f"[ALERT] Event '{event_type}' detected! Log entry: {log_entry.strip()}")

def compile_pattern(pattern):
    """Compile a pattern, or return None (with a warning) if it is empty or invalid."""
    if not pattern:
        # re.compile("") matches every line, which would flood the output with alerts
        print("⚠️ Empty pattern skipped.")
        return None
    try:
        return re.compile(pattern, re.IGNORECASE)
    except re.error as e:
        print(f"⚠️ Invalid regex '{pattern}': {e}")
        return None

def test_regex_on_lines(pattern, lines):
    regex = compile_pattern(pattern)
    if regex is None:
        return
    print(f"\n🔍 Testing pattern: {pattern}")
    for i, line in enumerate(lines, start=1):
        result = "✅ Match" if regex.search(line) else "❌ No match"
        print(f"{result} on Line {i}: {line.strip()}")

def analyze_log(lines, pattern1, pattern2):
    # The two events are independent, so every line is checked against both patterns
    compiled = [(p, compile_pattern(p)) for p in (pattern1, pattern2)]
    found = False
    for line in lines:
        for pattern, regex in compiled:
            if regex is not None and regex.search(line):
                alert_admin(pattern, line)
                found = True
    if not found:
        print("\nNo matching events were found.")

def run_gui():
    print("📁 Please upload your log or CSV file:")
    uploaded = files.upload()  # dict mapping each uploaded filename to its contents as bytes
    if not uploaded:
        print("No file uploaded.")
        return

    filename = next(iter(uploaded))
    # Decode the uploaded bytes directly; CSV files are scanned as plain text lines
    lines = uploaded[filename].decode("utf-8", errors="ignore").splitlines()

    # Sample up to 5 random lines to show the user
    sample_lines = random.sample(lines, min(5, len(lines)))
    print(f"\n📄 File '{filename}' loaded. Showing {len(sample_lines)} sample lines for testing:")
    for i, line in enumerate(sample_lines, start=1):
        print(f"Line {i}: {line.strip()}")

    # Dropdowns pick a predefined pattern; text inputs hold the (editable) regex
    dropdown1 = widgets.Dropdown(options=list(common_patterns), value="Login Failure", description="Event 1:")
    pattern1_input = widgets.Text(description="Pattern 1:", value=common_patterns[dropdown1.value])
    dropdown2 = widgets.Dropdown(options=list(common_patterns), value="Unauthorized Access", description="Event 2:")
    pattern2_input = widgets.Text(description="Pattern 2:", value=common_patterns[dropdown2.value])

    # Copy the newly selected dropdown entry's regex into its text input
    def update_pattern1(change):
        pattern1_input.value = common_patterns[change.new]

    def update_pattern2(change):
        pattern2_input.value = common_patterns[change.new]

    dropdown1.observe(update_pattern1, names="value")
    dropdown2.observe(update_pattern2, names="value")

    button_test = widgets.Button(description="🧪 Test Patterns on Sample Lines")
    button_run = widgets.Button(description="🚀 Scan Full File")
    controls = (dropdown1, pattern1_input, dropdown2, pattern2_input, button_test, button_run)

    def on_test_click(b):
        clear_output(wait=True)
        display(*controls)
        print(f"\n🔎 Sample lines from '{filename}':")
        for i, line in enumerate(sample_lines, start=1):
            print(f"Line {i}: {line.strip()}")
        test_regex_on_lines(pattern1_input.value, sample_lines)
        test_regex_on_lines(pattern2_input.value, sample_lines)

    def on_run_click(b):
        clear_output(wait=True)
        display(*controls)  # keep the controls visible after the scan
        print(f"📊 Analyzing '{filename}' with your patterns...\n")
        analyze_log(lines, pattern1_input.value, pattern2_input.value)

    button_test.on_click(on_test_click)
    button_run.on_click(on_run_click)
    display(*controls)

# Run the GUI-driven analyzer
run_gui()
```
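One caveat about the predefined patterns: the "IP Address" regex only checks the dotted shape of an address, so it also accepts strings such as `999.10.0.1`. If only real addresses should raise alerts, one option is to post-filter the regex candidates with Python's standard `ipaddress` module. This is a minimal sketch. `valid_ipv4s` is a hypothetical helper, not part of the notebook.

```python
import ipaddress
import re

# Same expression as the dropdown's "IP Address" entry: shape check only
IP_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def valid_ipv4s(line):
    """Return the regex candidates in `line` that are also valid IPv4 addresses."""
    found = []
    for candidate in IP_CANDIDATE.findall(line):  # non-capturing group: full matches
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:  # AddressValueError is a subclass of ValueError
            continue
        found.append(candidate)
    return found

print(valid_ipv4s("connection from 220.94.205.45 () at Sun Jul 10"))  # ['220.94.205.45']
print(valid_ipv4s("rhost=999.10.0.1"))  # []
```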



```
🔎 Sample lines from 'linux log file (2).txt':
Line 1: Jul  3 23:16:09 combo ftpd[763]: connection from 62.99.164.82 (62.99.164.82.sh.interxion.inode.at) at Sun Jul  3 23:16:09 2005
Line 2: Jul 25 23:24:09 combo ftpd[26475]: connection from 217.187.83.50 () at Mon Jul 25 23:24:09 2005
Line 3: Jul 10 16:03:18 combo sshd(pam_unix)[30682]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=150.183.249.110  user=root
Line 4: Jun 28 20:58:46 combo sshd(pam_unix)[12665]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=62-192-102-94.dsl.easynet.nl  user=root
Line 5: Jul 10 13:17:22 combo ftpd[30284]: connection from 220.94.205.45 () at Sun Jul 10 13:17:22 2005

🔍 Testing pattern: authentication failure
❌ No match on Line 1: Jul  3 23:16:09 combo ftpd[763]: connection from 62.99.164.82 (62.99.164.82.sh.interxion.inode.at) at Sun Jul  3 23:16:09 2005
❌ No match on Line 2: Jul 25 23:24:09 combo ftpd[26475]: connection from 217.187.83.50 () at Mon Jul 
```

**Work Summary Note**

This project is a small log analysis tool that detects text patterns in log files that may indicate suspicious or unauthorized activity. It combines regular expressions (regex), interactive ipywidgets controls, and Colab file uploads. This combination makes it suitable for beginners, analysts, and students who want to explore cybersecurity or system monitoring.

At its core, the program lets users upload any text-based log or CSV file and search it for patterns such as login failures, unauthorized access attempts, or traces of backdoor activity. CSV files are not parsed into columns; every file is scanned line by line as plain text. A graphical interface of dropdowns, text fields, and buttons lets users select predefined patterns or enter custom ones. These patterns are regular expressions, matched case-insensitively, which makes them well suited to finding specific kinds of text in large files.
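To see how the predefined patterns behave on real entries, the check below runs three of them against two lines copied from the sample output earlier in this notebook. It is an illustrative sketch, not the notebook's own code. The "Login Failure" pattern does not match the `authentication failure` wording that Linux PAM writes to these logs, and the test run above searched for `authentication failure` instead.

```python
import re

# Three of the notebook's predefined patterns
patterns = {
    "Login Failure": r"login failure",
    "Unauthorized Access": r"unauthorized access",
    "IP Address": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}

# Two lines copied from the sample output earlier in this notebook
lines = [
    "Jul 10 16:03:18 combo sshd(pam_unix)[30682]: authentication failure; logname= uid=0 "
    "euid=0 tty=NODEVssh ruser= rhost=150.183.249.110  user=root",
    "Jul 10 13:17:22 combo ftpd[30284]: connection from 220.94.205.45 () at Sun Jul 10 13:17:22 2005",
]

# For each pattern, record whether it matches each line (case-insensitive, like the tool)
matches = {
    name: [bool(re.search(p, line, re.IGNORECASE)) for line in lines]
    for name, p in patterns.items()
}
for name, hits in matches.items():
    print(name, hits)
# Login Failure [False, False]
# Unauthorized Access [False, False]
# IP Address [True, True]
```

Predefined patterns are only starting points. Checking them against a few real lines first shows whether their wording matches the log format being analyzed.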

Once a file is uploaded, the tool displays up to five randomly chosen lines so users can preview the data and test their search patterns before scanning the full file. This catches pattern mistakes early and shows users what they are about to analyze.

**There are two main options for analysis:**

**Test Patterns on Sample Lines** – This quickly checks whether each search pattern matches the sample lines, reporting a match or no match for every line.

**Scan Full File** – This scans every line of the uploaded file and raises an alert for each line that matches either pattern.

If a match is found during the full scan, the tool calls `alert_admin()`, which prints an alert naming the matched pattern together with the full log line. In this notebook the administrator alert is a printed message. It helps users identify and respond to possible security events in system logs.

Overall, this tool offers a practical introduction to log analysis using Python. It provides an interactive environment that makes it easy to explore, test, and apply regular expressions to real-world data without requiring advanced programming knowledge. The flexibility of customizing search patterns also means it can be adapted to various use cases and log formats, from security monitoring to basic data exploration.
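As an example of adapting a custom pattern to a specific log format, a regex with named groups can both detect the sshd `authentication failure` entries in the sample output and extract the remote host and user. The pattern below is an illustrative assumption fitted to that one line format. It is not one of the notebook's predefined patterns, and `SSH_FAIL` is a hypothetical name.

```python
import re

# Hypothetical custom pattern for sshd/PAM failure lines like those in the sample output
SSH_FAIL = re.compile(
    r"sshd\(pam_unix\)\[\d+\]: authentication failure;.*?rhost=(?P<rhost>\S+)\s+user=(?P<user>\S+)",
    re.IGNORECASE,
)

line = ("Jun 28 20:58:46 combo sshd(pam_unix)[12665]: authentication failure; logname= uid=0 "
        "euid=0 tty=NODEVssh ruser= rhost=62-192-102-94.dsl.easynet.nl  user=root")

m = SSH_FAIL.search(line)
if m:
    # Named groups make the extracted fields easy to read
    print(m.group("rhost"), m.group("user"))  # 62-192-102-94.dsl.easynet.nl root
```

Pasting the same expression into a "Custom Pattern" text field would detect these lines, because `search()` only needs a match. Extracting the named groups requires a small code change like the one above.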