# Huynh Gia Phong Tat (Jayden)


# Lab 9: Build a Log Aggregator

In this lab, you will create your own log generator and build a command-line utility that scans log files, summarizes their contents, and provides insight into system behavior. You will use data structures to track log message levels such as `INFO`, `WARNING`, `ERROR`, and `CRITICAL`.

This lab reinforces:
- File I/O
- Pattern recognition (regex)
- Dictionaries and counters
- Functions and modularity
- Optional: CLI arguments, logging



## Part 1: Create Log files (20%)
Using the example log format below, create a **Python file** that logs errors in a structured tree format.

You will find examples in the folder called Logs that you can use to build your program.

Remember: each set of logs should have varied levels of log entries (`INFO`, `WARNING`, `ERROR`, `CRITICAL`) and message types tailored to different service components.
You must create 5 structured logs. Here are some examples:

    sqldb
    ui
    frontend.js
    backend.js
    frontend.flask
    backend.flask

You may use ChatGPT to create sample outputs, NOT THE LOGS. For example:

    System failure
    Database corruption
    Disk failure detected
    Database corruption
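
A minimal sketch of how entries with varied levels could be generated for several components, assuming the standard `logging` module and the pipe-separated layout shown under "Example Log Format". The function name `write_sample_logs`, the `SAMPLE_MESSAGES` table, and the "Slow response" message are hypothetical and only illustrate the idea.

```python
import logging
import os
import random

# Hypothetical sample messages per level; replace them with your own.
SAMPLE_MESSAGES = {
    "INFO": ["Request completed", "Heartbeat OK"],
    "WARNING": ["Slow response"],
    "ERROR": ["Unhandled exception"],
    "CRITICAL": ["Disk failure detected", "Database corruption"],
}


def write_sample_logs(log_dir, components, entries_per_component=20, seed=None):
    """Write one log file per component with randomly chosen levels and messages."""
    rng = random.Random(seed)
    os.makedirs(log_dir, exist_ok=True)
    formatter = logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
    paths = []
    for component in components:
        path = os.path.join(log_dir, f"{component}.log")
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(formatter)
        logger = logging.getLogger(component)
        logger.setLevel(logging.DEBUG)
        logger.propagate = False  # keep entries out of the root logger's output
        logger.addHandler(handler)
        try:
            for _ in range(entries_per_component):
                level = rng.choice(list(SAMPLE_MESSAGES))
                # logging.INFO, logging.WARNING, ... are the numeric level constants
                logger.log(getattr(logging, level), rng.choice(SAMPLE_MESSAGES[level]))
        finally:
            # Detach and close so repeated calls do not duplicate entries
            logger.removeHandler(handler)
            handler.close()
        paths.append(path)
    return paths
```

Calling `write_sample_logs("Logs", ["sqldb", "ui", "frontend.js", "backend.js", "backend.flask"])` would create five files in a `Logs` folder, one per component.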


### How I approached this

I wanted my logs to look **professional**, kind of like **system logs** I’ve seen in real tools (provided by you).  
I started by creating a **formatter** to match the exact style I was given:  
`MM/DD + padded level + source + message`.

Then, instead of making everything a **giant function**,  
I made a **class** so I could reuse the logger setup for any component I needed.  
I like how clean it looks now: I just pass in the `component` and `function` names,  
and it logs exactly how I want.

I also made sure everything uses `snake_case` because I’m trying to be **consistent**.

In [3]:
import logging
import os
from datetime import datetime

# Trying to match the RSVP Agent log format
# Starting with a custom formatter class
class sample_formatter(logging.Formatter):
    def format(self, record):
        # Format the timestamp as MM/DD HH:MM:SS
        timestamp = datetime.fromtimestamp(record.created).strftime("%m/%d %H:%M:%S")

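        # Pad the log level to 7 characters (the width of WARNING) so the columns line up;
        # CRITICAL is 8 characters, so those lines run one column wider
        level = record.levelname.ljust(7)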

        # Build the source field with the logger name
        source = f":{record.name}:"

        # Combine all pieces into the final log line
        return f"{timestamp} {level}: {source} {record.getMessage()}"


# Creating a structured logger that uses the formatter above
class structured_logger:

    def __init__(self, component_name, function_name):
        # Passing in the component and function names, just going to hold onto these for now
        self.component_name = component_name
        self.function_name = function_name

        # Logger name needs to be unique, using both names here to make it work
        self.logger = logging.getLogger(f"{component_name}:{function_name}")

        # Going with DEBUG for now, I think that gives me all the log levels
        self.logger.setLevel(logging.DEBUG)

        # Had to separate this into its own method to keep things cleaner
        self._setup_handler()


    def _setup_handler(self):
        # Needed a folder to hold the logs so they don’t end up everywhere
        os.makedirs("Logs", exist_ok=True)

        # One log file per component seemed like the easiest way to organize it
        log_file_path = f"Logs/{self.component_name}.Log"

        # I kept running into duplicate handlers so this check avoids that
        if not self.logger.handlers:
            # This part took a while to figure out — I originally had the path wrong
            file_handler = logging.FileHandler(log_file_path)

            # I finally got the formatter to apply correctly after tweaking this line
            formatter = sample_formatter()
            file_handler.setFormatter(formatter)

            # I remember forgetting to attach this the first time — nothing was logging
            self.logger.addHandler(file_handler)


    def log(self, level, message):
        # Tried a bunch of ways to get the right log method — this one worked best
        log_method = getattr(self.logger, level.lower(), None)

        # Without this check, I was getting errors when passing unknown levels
        if log_method:
            log_method(message)

        # I thought about adding a fallback print but this felt cleaner for now



### Example Log Format

You will work with logs that follow this simplified structure:

```
2025-04-11 23:20:36,913 | my_app | INFO | Request completed
2025-04-11 23:20:36,914 | my_app.utils | ERROR | Unhandled exception
2025-04-11 23:20:36,914 | my_app.utils.db | CRITICAL | Disk failure detected
```


## Part 2: Logging the Log File (40%)
    New File
### Part 2a: Read the Log File (see lab 7) (10%)


Write a function to read the contents of a log file into a list of lines. Handle file errors gracefully.

### Part 2b: Parse Log Lines (see code below if you get stuck) (10%)

Use a regular expression to extract:
- Timestamp
- Log name
- Log level
- Message
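
A minimal sketch of one way to pull these four fields out of the pipe-separated format shown under "Example Log Format". The names `LOG_LINE` and `parse_line` are hypothetical, and the pattern assumes the timestamp always has the `YYYY-MM-DD HH:MM:SS,mmm` shape.

```python
import re

# Assumed layout: "timestamp | logger name | LEVEL | message"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r" \| (?P<name>[\w.]+)"   # dotted logger names such as my_app.utils.db
    r" \| (?P<level>[A-Z]+)"
    r" \| (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of the four fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None
```

Named groups keep the result readable: `parse_line(...)["level"]` is clearer than indexing into a tuple, and lines that do not match return `None`, so callers can skip them.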

### Part 2c: Count Log Levels (20%)

Create a function to count how many times each log level appears. Store the results in a dictionary, then output it as a JSON file.
You may pick your own format, but here is an example.
```python
{
    "INFO": 
    {
        "Request completed": 42, 
        "Heartbeat OK": 7
    },

    "WARNING":
    {
        ...
    }
}

```


In [None]:
# Paste your Python file here. Don't forget to upload it with your submission.
import os
import re
import json
from collections import defaultdict

# Use this from project 1 to read the log file
# I had to change the import path a bit to match my folder structure
def read_log_file(filepath):
    try:
        with open(filepath, 'r', encoding='utf-8') as file:
            return file.readlines()
    except FileNotFoundError:
        print(f"File not found: {filepath}")
        return []
    except Exception as e:
        print(f"Couldn't read the file: {e}")
        return []


def parse_log_lines(log_lines):
    parsed_logs = []
    # No regex here yet: the timestamp is fixed-width, so I slice it off
    # and then split the rest on the first two colons.
    # Layout this expects: "MM/DD HH:MM:SS LEVEL  :name: message"
    timestamp_width = len("MM/DD HH:MM:SS")  # 14 characters
    for line in log_lines:
        # Skip blank or very short lines that can't hold a full entry
        if len(line.strip()) < 20:
            continue
        timestamp = line[:timestamp_width].strip()
        remainder = line[timestamp_width:].strip()
        if ':' not in remainder:
            continue
        # Everything before the first colon is the (padded) level
        level_part, rest = remainder.split(':', 1)
        level = level_part.strip()
        if ':' not in rest:
            continue
        # Everything before the next colon is the name, the rest is the message
        name_part, message = rest.split(':', 1)
        name = name_part.strip()
        message = message.strip()
        parsed_logs.append({
            "timestamp": timestamp,
            "level": level,
            "name": name,
            "message": message
        })
    return parsed_logs

# This function counts the occurrences of each log level and message, and returns a summary. Took a while to get this right
# because I was trying to use a list of tuples instead of a dictionary. This way is much cleaner.
def count_log_levels(parsed_logs):
    summary = defaultdict(lambda: defaultdict(int))
    for entry in parsed_logs:
        level = entry['level']
        msg = entry['message']
        summary[level][msg] += 1
    return summary


def main():
    # This part is just to make sure the log file is in the right place
    # I had to change the path a bit to match my folder structure
    log_input_path = os.path.join("Logs", "RSVP_Agent_processing.log")
    log_output_path = os.path.join("Logs", "log_summary.json")
    print(f"Trying to read the log file from: {log_input_path}")
    lines = read_log_file(log_input_path)
    print(f"Going through {len(lines)} lines...")
    parsed = parse_log_lines(lines)
    print("Counting up the log levels and messages...")
    summary = count_log_levels(parsed)
    # This part gave me some headaches until I realized the folder had to exist,
    # so this creates the same folder the summary gets written to
    os.makedirs(os.path.dirname(log_output_path), exist_ok=True)
    print("Checking the full path just to be sure:")
    print("   ", os.path.abspath(log_input_path))
    print(f"Saving everything to: {log_output_path}")
    with open(log_output_path, 'w', encoding='utf-8') as f:
        json.dump(summary, f, indent=4)
    print("Done! Should be good to go.")


if __name__ == "__main__":
    main()


Trying to read the log file from: Labs\Lab_9\Logs\RSVP_Agent_processing.log
File not found: Labs\Lab_9\Logs\RSVP_Agent_processing.log
Going through 0 lines...
Counting up the log levels and messages...
Checking the full path just to be sure:
    c:\Games\Github ETE 4990\homeworkfolder-Jayden-gif-dev-1\Labs\Lab_9\Labs\Lab_9\Logs\RSVP_Agent_processing.log
Saving everything to: Labs\Lab_9\Logs\log_summary.json
Done! Should be good to go.



## Step 3: Generate Summary Report (40%)
    New File
### Step 3a (20%):
Develop a function that continuously monitors your JSON file(s) and prints a real-time summary of log activity. It should keep count of the messages grouped by log level (INFO, WARNING, ERROR, CRITICAL) and display only the critical messages. (I.e., if new data comes in, the summary will change and a new critical message will be printed.)
- Note: do not reprocess the entire file on each update.
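
One way to avoid reprocessing the whole file is to remember the byte offset reached on the previous pass and read only what was appended since. The sketch below assumes an append-only file with one JSON object per line (JSON Lines), each with `level` and `message` keys. A single JSON document that is rewritten on every update cannot be read this way. The names `read_new_records` and `update_summary` are hypothetical.

```python
import json
import os
from collections import Counter


def read_new_records(path, offset):
    """Return (records, new_offset) for JSON lines appended after byte `offset`."""
    if not os.path.exists(path):
        return [], offset
    records = []
    with open(path, "rb") as f:
        f.seek(offset)  # jump straight to where the previous pass stopped
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            offset += len(raw)
            text = raw.decode("utf-8").strip()
            if text:
                records.append(json.loads(text))
    return records, offset


def update_summary(counts, records):
    """Add records to the running per-level counts; return the new CRITICAL messages."""
    critical = []
    for record in records:
        counts[record["level"]] += 1
        if record["level"] == "CRITICAL":
            critical.append(record["message"])
    return critical
```

A polling loop would start with `counts = Counter()` and `offset = 0`, then repeatedly call `read_new_records(path, offset)`, pass the result to `update_summary`, print `counts` plus any returned critical messages, and sleep for a short interval.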

### Step 3b: Use Matplotlib (Lecture 10) (20%)
Develop a function that continuously monitors your JSON file(s) and graphs, in real time, a bar or pie plot of each of the errors (a graph for each log level).
- The graph should show the distribution of log messages by level (INFO, WARNING, ERROR, CRITICAL).
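
A minimal sketch of a live bar chart of totals per level, assuming the summary has the nested `{level: {message: count}}` shape from Part 2c. The names `level_totals`, `draw_level_bars`, and `live_plot`, and the `load_summary` callable (anything that returns the current summary dict) are hypothetical. A per-level chart of individual messages would follow the same pattern.

```python
def level_totals(summary):
    """Collapse {level: {message: count}} into {level: total count}."""
    levels = ["INFO", "WARNING", "ERROR", "CRITICAL"]
    return {level: sum(summary.get(level, {}).values()) for level in levels}


def draw_level_bars(ax, totals):
    """Redraw a bar chart of per-level totals on an existing Matplotlib Axes."""
    ax.clear()
    ax.bar(list(totals.keys()), list(totals.values()))
    ax.set_xlabel("Log level")
    ax.set_ylabel("Messages")
    ax.set_title("Log messages by level")


def live_plot(load_summary, interval=1.0, updates=None):
    """Refresh the chart every `interval` seconds; `updates=None` runs until the window closes."""
    import matplotlib.pyplot as plt  # imported here so level_totals works without Matplotlib

    plt.ion()  # interactive mode: drawing does not block
    fig, ax = plt.subplots()
    done = 0
    while plt.fignum_exists(fig.number) and (updates is None or done < updates):
        draw_level_bars(ax, level_totals(load_summary()))
        plt.pause(interval)  # lets the GUI redraw, then waits
        done += 1
```

Matplotlib GUI backends generally expect to be driven from the main thread, so in this sketch the plotting loop stays there and any file watching would run in a background thread.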


### Critical notes:
- Your code must use Daemon Threads (Lecture 14)
- 3a and 3b do not need to run at the same time. 
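
A minimal sketch of running a periodic task on a daemon thread, assuming the standard `threading` module. The helper name `start_daemon_poller` and its `task` parameter are hypothetical; `task` stands for whatever function checks the file for new data.

```python
import threading


def start_daemon_poller(task, interval=1.0):
    """Run `task()` every `interval` seconds on a daemon thread; return (thread, stop_event)."""
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            task()
            # Waits up to `interval` seconds, but wakes early when stop is requested
            stop_event.wait(interval)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop_event
```

Because the thread is a daemon, it does not keep the program alive: once the main thread finishes, the poller ends with it. The main thread therefore needs something to do (a plotting loop or a simple sleep loop), and calling `stop_event.set()` gives a clean way to stop early.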


In [None]:
# Paste your python file here 
# don't forget to upload it with your submission

In [None]:
# Here is a sample regex that parses a log file and extracts relevant information. 
# you will need to modify it. Review Lecture 11
import re

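def parse_log_line(line):
    # \w+ does not match dots, so names like "my_app.utils" need a modified pattern
    pattern = r"^(.*?)\s\|\s(\w+)\s\|\s(\w+)\s\|\s(.*)$"
    match = re.match(pattern, line)
    # Return (timestamp, name, level, message), or None when the line does not match
    return match.groups() if match else None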
