
# Lab 9: Build a Log Aggregator

In this lab, you will write your own log generator, then build a command-line utility that scans log files, summarizes their contents, and provides insight into system behavior. You will use data structures to track log message levels such as `INFO`, `WARNING`, `ERROR`, and `CRITICAL`.

This lab reinforces:
- File I/O
- Pattern recognition (regex)
- Dictionaries and counters
- Functions and modularity
- CLI arguments, logging



## Part 1: Create Log Files (20%)
Using the example log format below, create a **Python file** that writes log entries in a structured tree format.

You will find examples in the folder called Logs that you can use to build your program.

Remember that each set of logs should have varied levels of log entries (`INFO`, `WARNING`, `ERROR`, `CRITICAL`) and message types tailored to different service components.
You must create 5 structured logs. Here are some example components:

    sqldb
    ui
    frontend.js
    backend.js
    frontend.flask
    backend.flask

You may use ChatGPT to create sample messages, but NOT THE LOGS themselves. For example:

    System failure
    Database corruption
    Disk failure detected
    Database corruption


In [1]:
# Paste your python file here 
# don't forget to upload it with your submission
import logging
import random

# Define the components and their sample messages (messages generated using ChatGPT)
components = {
    "frontend": [
        "User navigated to homepage",
        "Loaded dashboard components",
        "User attempted invalid input on form",
        "Failed to load user profile data",
        "Frontend script crashed unexpectedly"
    ],
    "backend": [
        "Received API request for user data",
        "Slow response detected on API endpoint",
        "Failed to process payment request",
        "Backend server crash: out of memory",
        "Authentication token verified"
    ],
    "sqldb": [
        "Database connection established",
        "Query execution time exceeds threshold",
        "Failed to insert new user record",
        "Database corruption detected in users table",
        "Backup completed successfully"
    ],
    "authserver": [
        "User login request received",
        "Multiple failed login attempts detected",
        "Session creation failed",
        "Authentication service unavailable",
        "Password reset token generated"
    ],
    "system": [
        "System maintenance scheduled",
        "CPU usage exceeds 85%",
        "Disk write failure in logging system",
        "Kernel panic detected",
        "System heartbeat OK"
    ]
}

# Log levels to choose from at random
levels = [logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]

# Configure a logger that writes to "<component_name>.log"
def setup_logger(component_name):
    logger = logging.getLogger(component_name)
    logger.setLevel(logging.DEBUG)
    # The handler writes to the current working directory, which is the
    # script's folder when you run the script from there.
    if not logger.handlers:  # avoid duplicate entries if this cell is re-run
        handler = logging.FileHandler(f"{component_name}.log")
        formatter = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s')
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger

# Create a logger for each component and write random entries
for component, messages in components.items():
    logger = setup_logger(component)

    for _ in range(5):  # five entries per component, formatted like the examples
        message = random.choice(messages)
        level = random.choice(levels)
        # logger.log() takes the numeric level, so no if/elif chain is needed
        logger.log(level, message)

print("Log files created ")


Log files created 



### Example Log Format

You will work with logs that follow this simplified structure:

```
2025-04-11 23:20:36,913 | my_app | INFO | Request completed
2025-04-11 23:20:36,914 | my_app.utils | ERROR | Unhandled exception
2025-04-11 23:20:36,914 | my_app.utils.db | CRITICAL | Disk failure detected
```
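
The dotted names in the sample (`my_app`, `my_app.utils`, `my_app.utils.db`) match how Python's `logging` module names child loggers, so one possible way to get a tree-style layout is to attach a single handler to the parent logger and let child records propagate up to it. The following is only a minimal sketch: the function name `build_tree_logger` and the file name `my_app.log` are assumptions for illustration, not part of the lab.

```python
import logging

def build_tree_logger(path="my_app.log"):  # hypothetical helper and file name
    """Attach one file handler to the parent logger; child loggers propagate to it."""
    parent = logging.getLogger("my_app")
    parent.setLevel(logging.DEBUG)
    if not parent.handlers:  # avoid adding a second handler on re-run
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
        )
        parent.addHandler(handler)
    return parent

if __name__ == "__main__":
    build_tree_logger()
    # Child loggers are created by name; the dots define the hierarchy.
    logging.getLogger("my_app").info("Request completed")
    logging.getLogger("my_app.utils").error("Unhandled exception")
    logging.getLogger("my_app.utils.db").critical("Disk failure detected")
```

Because the children have no handlers of their own, every record ends up in the parent's file with the child's full dotted name in the `%(name)s` field.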


In [None]:
# One example output, from authserver.log


#2025-04-27 20:28:30,376 | authserver | ERROR | Session creation failed
#2025-04-27 20:28:30,376 | authserver | WARNING | Multiple failed login attempts detected
#2025-04-27 20:28:30,376 | authserver | ERROR | Session creation failed
#2025-04-27 20:28:30,376 | authserver | ERROR | Password reset token generated
#2025-04-27 20:28:30,377 | authserver | WARNING | Multiple failed login attempts detected


## Part 2: Logging the Log File (40%)
Create a new file for this part.
### Part 2a: Read the Log File (see Lab 7) (10%)


Write a function to read the contents of a log file into a list of lines. Handle file errors gracefully.

### Part 2b: Parse Log Lines (see code below if you get stuck) (10%)

Use a regular expression to extract:
- Timestamp
- Log name
- Log level
- Message
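
One way to pull out those four fields is a regular expression with named groups. This is only a sketch: the names `LOG_LINE` and `split_log_line` are hypothetical, and the pattern assumes the ` | ` separator and timestamp layout shown in the example log format.

```python
import re

# Hypothetical pattern; assumes fields are separated by " | ".
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r" \| (?P<name>[\w.]+)"      # dotted names such as my_app.utils.db
    r" \| (?P<level>[A-Z]+)"
    r" \| (?P<message>.*)$"
)

def split_log_line(line):
    """Return a dict of the four fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

if __name__ == "__main__":
    print(split_log_line(
        "2025-04-11 23:20:36,914 | my_app.utils | ERROR | Unhandled exception"
    ))
```

Named groups make the calling code easier to read than positional `group(1)` to `group(4)` lookups, at the cost of a longer pattern.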

### Part 2c: Count Log Levels (20%)

Create a function to count how many times each log level appears. Store the results in a dictionary, then write it out as a JSON file.
You may pick your own format, but here is an example:
```python
{
    "INFO": 
    {
        "Request completed": 42, 
        "Heartbeat OK": 7
    },

    "WARNING":
    {
        ...
    }
}

```
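
A nested structure like the one above could be built with `collections.Counter`. The sketch below assumes the lines have already been parsed into `(level, message)` pairs; the function name `summarize` is hypothetical.

```python
import json
from collections import Counter, defaultdict

def summarize(pairs):
    """Count messages per log level from (level, message) pairs."""
    counts = defaultdict(Counter)
    for level, message in pairs:
        counts[level][message] += 1
    # Convert to plain dicts so the result is an ordinary nested dictionary.
    return {level: dict(messages) for level, messages in counts.items()}

if __name__ == "__main__":
    sample = [
        ("INFO", "Request completed"),
        ("INFO", "Request completed"),
        ("ERROR", "Unhandled exception"),
    ]
    print(json.dumps(summarize(sample), indent=4))
```

Using `defaultdict(Counter)` removes the need to check whether a level or message key already exists before incrementing it.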


In [None]:
# Paste your Python file here. Don't forget to upload it with your submission.
# Parts 2a-c: reads the selected log files (here, all five), parses each line (including the timestamp), and counts messages per log level.
# To summarize a single component, list only that component's file in log_files.
import os
import re
import json

def read_log_file(filename):
    lines = []
    try:
        with open(filename, 'r') as file:
            lines = file.readlines()
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
    except IOError as e:
        print(f"An I/O error occurred: {e}")
    return lines

def parse_log_line(line):
    pattern = r'^(.*?\d{2,3}) \| (.*?) \| (.*?) \| (.*)$'
    match = re.match(pattern, line)
    if match:
        timestamp = match.group(1)
        log_name = match.group(2)
        log_level = match.group(3)
        message = match.group(4)
        return timestamp, log_name, log_level, message
    else:
        return None

def count_log_levels(lines):
    log_counts = {}

    for line in lines:
        parsed = parse_log_line(line.strip())
        if parsed:
            _, _, log_level, message = parsed

            if log_level not in log_counts:
                log_counts[log_level] = {}

            if message not in log_counts[log_level]:
                log_counts[log_level][message] = 1
            else:
                log_counts[log_level][message] += 1
    return log_counts

def save_as_json(data, filename):
    with open(filename, 'w') as json_file:
        json.dump(data, json_file, indent=4)
    print(f"✅ Results saved to {filename}")

if __name__ == "__main__":
    # Read and combine all five log files
    all_lines = []
    log_files = ['frontend.log', 'backend.log', 'sqldb.log', 'authserver.log', 'system.log']

    for file in log_files:
        lines = read_log_file(file)
        all_lines.extend(lines)

    # Count across all logs
    log_counts = count_log_levels(all_lines)

    # Save combined results to a JSON file
    save_as_json(log_counts, 'Part2_combined_log_summary.json')


✅ Results saved to Part2_combined_log_summary.json


In [None]:
# Paste your python file here 
# don't forget to upload it with your submission


## Step 3: Generate Summary Report (40%)
Create a new file for this part.
### Step 3a (20%):
Develop a function that continuously monitors your JSON file(s) and prints a real-time summary of log activity. It should keep a count of the messages grouped by log level (INFO, WARNING, ERROR, CRITICAL) and display only the critical messages. (I.e., when new data comes in, the summary should update and any new critical message should be printed.)
- Note: do not reprocess the entire file on each update.

### Step 3b: Use Matplotlib (Lecture 10) (20%)
Develop a function that continuously monitors your JSON file(s) and graphs, in real time, a bar or pie plot of the messages (one graph for each log level).
- The graph should show the distribution of log messages by level (INFO, WARNING, ERROR, CRITICAL).


### Critical notes:
- Your code must use daemon threads (Lecture 14).
- 3a and 3b do not need to run at the same time. 
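
The daemon-thread and "do not reprocess" requirements can be sketched together. Note the assumptions: a JSON summary is normally rewritten as a whole, so this sketch instead follows a line-oriented log file (here `system.log`) and remembers a byte offset so that only newly appended lines are read. The names `read_new_lines` and `monitor` are hypothetical, and adapting the idea to your own JSON output is up to you.

```python
import threading
import time
from collections import Counter

def read_new_lines(path, offset):
    """Return (complete lines appended after offset, new offset)."""
    try:
        with open(path, "rb") as handle:
            handle.seek(offset)
            data = handle.read()
    except FileNotFoundError:
        return [], offset
    # Keep only complete lines; a partial last line is picked up next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end

def monitor(path, counts, stop, poll_seconds=0.5):
    """Poll the file, count levels, and print each new CRITICAL message."""
    offset = 0
    while not stop.is_set():
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            parts = line.split(" | ", 3)
            if len(parts) != 4:
                continue  # skip lines that do not follow the log format
            level, message = parts[2], parts[3]
            counts[level] += 1
            if level == "CRITICAL":
                print(f"CRITICAL: {message}")
        if lines:
            print(dict(counts))  # updated summary by level
        stop.wait(poll_seconds)

if __name__ == "__main__":
    counts = Counter()
    stop = threading.Event()
    worker = threading.Thread(
        target=monitor, args=("system.log", counts, stop), daemon=True
    )
    worker.start()
    time.sleep(2)  # the main program would do other work here
    stop.set()
```

Setting `daemon=True` means the thread will not keep the program alive once the main thread exits; the `Event` gives a clean way to stop it earlier.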


In [None]:
# Paste your python file here 
# don't forget to upload it with your submission

In [None]:
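# Here is a sample regex that parses a log line and extracts the relevant information.
# You will need to modify it. Review Lecture 11.
import re

def parse_log_line(line):
    pattern = r"^(.*?)\s\|\s(\w+)\s\|\s(\w+)\s\|\s(.*)$"
    match = re.match(pattern, line)
    # Return (timestamp, log name, level, message), or None if the line does not match
    return match.groups() if match else None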
