
# Lab 9: Build a Log Aggregator

In this lab, you will create your own log generator and build a command-line utility that scans log files, summarizes their contents, and provides insight into system behavior. You will use data structures to track log message levels such as `INFO`, `WARNING`, `ERROR`, and `CRITICAL`.

This lab reinforces:
- File I/O
- Pattern recognition (regex)
- Dictionaries and counters
- Functions and modularity
- Optional: CLI arguments, logging



## Part 1: Create Log Files (20%)
Using the example log format below, create a **Python file** that will log errors in a structured tree format.

You will find examples in the folder called Logs that you can use to build your program.
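One way to read "structured tree format" is Python's dotted logger names, where `my_app.utils.db` is a child of `my_app.utils`, which is a child of `my_app`. The sketch below assumes that reading and uses the pipe-separated layout from the Example Log Format section. The logger names are taken from that example, and the in-memory stream is only there so the result can be inspected. A handler attached to the parent receives records from every child, because records propagate up the tree.

```python
import io
import logging

# Capture output in memory for this demonstration; a real run would use a file handler.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
)

# Attach the handler once, to the parent logger only.
app_logger = logging.getLogger("my_app")
app_logger.setLevel(logging.DEBUG)
app_logger.addHandler(handler)

# Child loggers have no handlers of their own; their records propagate to "my_app".
logging.getLogger("my_app.utils.db").critical("Disk failure detected")
logging.getLogger("my_app.utils").error("Unhandled exception")
app_logger.info("Request completed")

captured = stream.getvalue().splitlines()
for line in captured:
    print(line)
```

With this layout, adding a new component only requires a new dotted name, not a new handler.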

Remember that each set of logs should have varied levels of log entries (`INFO`, `WARNING`, `ERROR`, `CRITICAL`) and message types tailored to the different service components.
You must create 5 structured logs. Here are some examples:

    sqldb
    ui
    frontend.js
    backend.js
    frontend.flask
    backend.flask

You may use ChatGPT to create sample outputs, NOT THE LOGS. For example:

    System failure
    Database corruption
    Disk failure detected
    Database corruption


In [1]:
# Paste your python file here 
# don't forget to upload it with your submission
import logging
from logging.handlers import RotatingFileHandler

# Custom loggers, one per service component
logger = logging.getLogger(__name__)
sqldb_logger = logging.getLogger("sqldb")
frontend_logger = logging.getLogger("frontend.js")
backend_flask_logger = logging.getLogger("backend.flask")
backend_logger = logging.getLogger("backend.js")
frontend_flask_logger = logging.getLogger("frontend.flask")


# Handlers control where the log messages go
console_handler = logging.StreamHandler()
#file_handler = logging.FileHandler()
# Note: with the default backupCount=0 the file never rolls over, even with maxBytes set;
# pass backupCount=N to keep N rotated files.
rotating_file_handler = RotatingFileHandler("app.log", maxBytes=2000)

console_handler.setLevel(logging.WARNING)
rotating_file_handler.setLevel(logging.ERROR)

logging_format = logging.Formatter("%(asctime)s, %(name)s, %(levelname)s, %(message)s")

console_handler.setFormatter(logging_format)
rotating_file_handler.setFormatter(logging_format)

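for log in [sqldb_logger, frontend_logger, backend_flask_logger, backend_logger, frontend_flask_logger]:
    log.setLevel(logging.DEBUG)
    # Attach handlers only once so that re-running this cell does not duplicate every log line.
    if not log.handlers:
        log.addHandler(console_handler)
        log.addHandler(rotating_file_handler)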

# logger.warning("This is a warning")
# logger.error("This is a error")

sqldb_logger.info("Database connection successful")
frontend_logger.warning("User attempted invalid operation")
backend_flask_logger.error("Failed to fetch user data")
backend_logger.critical("System failure detected")
frontend_flask_logger.info("Homepage loaded successfully")

2025-04-27 18:41:03,313, backend.flask, ERROR, Failed to fetch user data
2025-04-27 18:41:03,316, backend.js, CRITICAL, System failure detected



### Example Log Format

You will work with logs that follow this simplified structure:

```
2025-04-11 23:20:36,913 | my_app | INFO | Request completed
2025-04-11 23:20:36,914 | my_app.utils | ERROR | Unhandled exception
2025-04-11 23:20:36,914 | my_app.utils.db | CRITICAL | Disk failure detected
```


## Part 2: Logging the Log File (40%)
Write this part in a new file.

### Part 2a: Read the Log File (see Lab 7) (10%)


Write a function to read the contents of a log file into a list of lines. Handle file errors gracefully.

### Part 2b: Parse Log Lines (see code below if you get stuck) (10%)

Use a regular expression to extract:
- Timestamp
- Log name
- Log level
- Message
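The four fields above can be captured with named groups. This is a minimal sketch, assuming the pipe-separated layout shown in the Example Log Format section; `LOG_PATTERN` and `parse_line` are illustrative names, and the separator must be adjusted if your own logs use a different one.

```python
import re

# One named group per field; the level group only accepts the standard level names.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r" \| (?P<name>[\w.]+)"
    r" \| (?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)"
    r" \| (?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of the four fields, or None if the line is not a log entry."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None

sample = "2025-04-11 23:20:36,914 | my_app.utils.db | CRITICAL | Disk failure detected"
parsed = parse_line(sample)
print(parsed)
print(parse_line("this is not a log line"))
```

Anchoring the timestamp and level this tightly means stray lines in the file are rejected instead of being split into four arbitrary pieces.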

### Part 2c: Count Log Levels (20%)

Create a function to count how many times each log level appears. Store the results in a dictionary, then write the dictionary out as a JSON file.
You may pick your own format, but here is an example.
```python
{
    "INFO": 
    {
        "Request completed": 42, 
        "Heartbeat OK": 7
    },

    "WARNING":
    {
        ...
    }
}

```
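A minimal sketch of this nested count, assuming the log lines have already been parsed into `(level, message)` pairs. The function name `count_messages` and the sample records are illustrative and not part of the lab.

```python
import json
from collections import Counter, defaultdict

def count_messages(records):
    """Count messages per level from an iterable of (level, message) pairs."""
    counts = defaultdict(Counter)
    for level, message in records:
        counts[level][message] += 1
    # Convert to plain dicts so the result serializes cleanly with json.
    return {level: dict(counter) for level, counter in counts.items()}

records = [
    ("INFO", "Request completed"),
    ("INFO", "Heartbeat OK"),
    ("ERROR", "Unhandled exception"),
    ("INFO", "Request completed"),
]
summary = count_messages(records)
summary_json = json.dumps(summary, indent=4)
print(summary_json)
```

Using `defaultdict(Counter)` removes the need to check whether a level or message has been seen before incrementing it.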


In [2]:
# Part 2a
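def read_log_file(file_path):
    """Return the lines of file_path as a list, or [] if it cannot be read."""
    try:
        with open(file_path, "r") as file:
            return file.readlines()
    except FileNotFoundError:
        print("Error: Log file not found.")
    except OSError as error:
        # Covers permission problems and other I/O failures.
        print(f"Error: Could not read log file: {error}")
    return []

# Replace the path below with any file you want to read. The r prefix makes it a raw
# string, so backslashes in the Windows path are not treated as escape sequences.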
lines = read_log_file(r"C:\Users\loren\OneDrive\Desktop\DOOM\chocolate-doom-3.0.1-win32 (1)\INSTALL.txt")
print(lines)
    

['\n', '== Chocolate Doom installation ==\n', '\n', 'These are instructions for how to install and set up Chocolate Doom\n', 'for play.\n', '\n', '== Obtaining an IWAD file ==\n', '\n', 'To play, you need an IWAD file.  This file contains the game data\n', '(graphics, sounds, etc). The full versions of the games are\n', 'proprietary and need to be bought.  The IWAD file has one of the\n', 'following names:\n', '\n', '   doom1.wad                   (Shareware Doom)\n', '   doom.wad                    (Registered / Ultimate Doom)\n', '   doom2.wad                   (Doom 2)\n', '   tnt.wad                     (Final Doom: TNT: Evilution)\n', '   plutonia.wad                (Final Doom: Plutonia Experiment)\n', '   chex.wad                    (Chex Quest)\n', '   freedm.wad                  (FreeDM)\n', '\n', "If you don't have a copy of a commercial version, you can download\n", 'the shareware version of Doom (extract the file named doom1.wad):\n', '\n', ' * https://www.doomworld.com/idg

In [3]:
#Part 2b
import re

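def split_log_line(line):
    """Split "timestamp, name, level, message" into a 4-tuple, or return None."""
    # Fields are separated by ", ". The comma inside the timestamp (seconds,milliseconds)
    # is not followed by a space, so it stays part of the first group.
    pattern = r"(.+?), (.+?), (.+?), (.+)"
    match = re.match(pattern, line)
    if match:
        return match.groups()
    return None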

lines = read_log_file("app.log")
for line in lines:
    result = split_log_line(line)
    if result:
        print(result)

        

("{'timestamp': ' 2025-04-27T12:10:00Z'", "'level': 'ERROR'", "'component': 'backend.flask'", "'message': 'Failed to fetch user data'}")
("{'timestamp': ' 2025-04-27T12:15:00Z'", "'level': 'CRITICAL'", "'component': 'backend.js'", "'message': 'System failure detected'}")
('2025-04-27 11:31:32,664', 'backend.flask', 'ERROR', 'Failed to fetch user data')
('2025-04-27 11:31:32,665', 'backend.js', 'CRITICAL', 'System failure detected')
('2025-04-27 11:33:14,483', 'backend.flask', 'ERROR', 'Failed to fetch user data')
('2025-04-27 11:33:14,483', 'backend.flask', 'ERROR', 'Failed to fetch user data')
('2025-04-27 11:33:14,487', 'backend.js', 'CRITICAL', 'System failure detected')
('2025-04-27 11:33:14,487', 'backend.js', 'CRITICAL', 'System failure detected')
('2025-04-27 11:34:21,729', 'backend.flask', 'ERROR', 'Failed to fetch user data')
('2025-04-27 11:34:21,729', 'backend.flask', 'ERROR', 'Failed to fetch user data')
('2025-04-27 11:34:21,729', 'backend.flask', 'ERROR', 'Failed to fetch

In [4]:
import json

def count_log_levels(lines):
    log_counts = {}

    for line in lines:
        result = split_log_line(line)
        if result:
            timestamp, component, level, message = result

            # Create the per-level dictionary the first time a level is seen.
            if level not in log_counts:
                log_counts[level] = {}

            # Count each distinct message within its level.
            if message not in log_counts[level]:
                log_counts[level][message] = 1
            else:
                log_counts[level][message] += 1
            
    return log_counts
lines = read_log_file("app.log")
counts = count_log_levels(lines)
        
with open("log_summary.json", "w") as json_file:
    json.dump(counts, json_file, indent=4)


## Step 3: Generate Summary Report (40%)
Write this part in a new file.

### Step 3a (20%):
Develop a function that continuously monitors your JSON file(s) and prints a real-time summary of log activity. It should keep a count of the messages grouped by log level (`INFO`, `WARNING`, `ERROR`, `CRITICAL`) and display only the critical messages. (I.e., when new data comes in, the summary changes and any new critical message is printed.)
- Note: do not reprocess the entire file on each update.
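One way to avoid reprocessing is to remember how far into the file you have already read and continue from that byte offset. The sketch below assumes a line-oriented file that is only appended to (for example the raw log, or a file with one JSON object per line); a single JSON document that is rewritten on every update cannot be read this way. `read_new_lines` is an illustrative name.

```python
def read_new_lines(path, offset):
    """Return (new_lines, new_offset), reading only the bytes added after offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only complete lines; a partially written last line waits for the next call.
    end = chunk.rfind(b"\n") + 1  # 0 when no complete line is available yet
    complete = chunk[:end]
    return complete.decode("utf-8").splitlines(), offset + end
```

A monitoring loop would start with `offset = 0`, call `read_new_lines` on each cycle, update its running counts from the returned lines, and store the returned offset for the next call.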

### Step 3b: Use Matplotlib (Lecture 10) (20%)
Develop a function that continuously monitors your JSON file(s) and graphs, in real time, a bar or pie plot of the messages, with one graph for each log level.
- The graph should show the distribution of log messages by level  (INFO, WARNING, ERROR, CRITICAL)  


### Critical notes:
- Your code must use Daemon Threads (Lecture 14)
- 3a and 3b do not need to run at the same time. 
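A minimal sketch of the daemon-thread pattern, assuming a background worker that polls on a fixed interval. The `worker` function, the `ticks` list, and the timings are placeholders for illustration. A daemon thread does not keep the program alive, so the main thread must keep running for as long as the monitoring is needed.

```python
import threading
import time

ticks = []                      # stands in for "work done" by the monitor
stop_event = threading.Event()  # lets the main thread ask the worker to finish

def worker():
    # Poll until asked to stop; Event.wait doubles as an interruptible sleep.
    while not stop_event.is_set():
        ticks.append(time.monotonic())
        stop_event.wait(0.01)

thread = threading.Thread(target=worker, daemon=True)
thread.start()

time.sleep(0.05)       # the main thread does its own work here
stop_event.set()       # request a clean shutdown
thread.join(timeout=1)

print(f"daemon={thread.daemon}, alive={thread.is_alive()}, ticks={len(ticks)}")
```

The stop event is optional for a daemon thread, since it is terminated when the main thread exits, but it makes the shutdown explicit and easier to test.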


In [None]:
import json
import threading
import time

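def monitor_log_summary():
    # Load the current summary so that only later changes are reported.
    try:
        with open("log_summary.json", "r") as file:
            old_data = json.load(file)
    except (FileNotFoundError, json.JSONDecodeError):
        old_data = {}

    while True:
        try:
            with open("log_summary.json", "r") as file:
                new_data = json.load(file)
        except FileNotFoundError:
            new_data = {}
        except json.JSONDecodeError:
            # The file may be mid-write; try again on the next cycle.
            time.sleep(2)
            continue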
            
        if new_data != old_data:
            print("\n--- Updated Log Summary ---")
            for level in new_data:
                total = sum(new_data[level].values())
                print(f"{level}: {total}")
                
            if "CRITICAL" in new_data:
                for message in new_data["CRITICAL"]:
                    if "CRITICAL" not in old_data or message not in old_data["CRITICAL"]:
                        print(f"CRITICAL Message: {message}")

            old_data = new_data
            
        time.sleep(2)

monitor_thread = threading.Thread(target=monitor_log_summary, daemon=True)
monitor_thread.start()

# Keep the main thread alive: a daemon thread stops as soon as the main thread exits.
while True:
    time.sleep(1)



In [None]:

import json
import threading
import time
import matplotlib.pyplot as plt

LEVELS = ["INFO", "WARNING", "ERROR", "CRITICAL"]
latest_data = {}
data_changed = threading.Event()

def watch_log_summary():
    # Runs in a daemon thread: polls the JSON file and flags changes.
    global latest_data
    old_data = None
    while True:
        try:
            with open("log_summary.json", "r") as file:
                new_data = json.load(file)
        except FileNotFoundError:
            new_data = {}
        except json.JSONDecodeError:
            # The file may be mid-write; try again on the next cycle.
            time.sleep(2)
            continue

        if new_data != old_data:
            latest_data = new_data
            old_data = new_data
            data_changed.set()

        time.sleep(2)

def plot_log_summary(axs, data):
    # Redraw one bar chart per log level on the existing axes.
    for ax, level in zip(axs.flat, LEVELS):
        ax.clear()
        ax.set_title(level)
        if data.get(level):
            messages = list(data[level].keys())
            counts = list(data[level].values())
            ax.bar(messages, counts)
            ax.set_ylabel("Count")
            plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
        else:
            ax.set_xticks([])
            ax.set_yticks([])

monitor_thread = threading.Thread(target=watch_log_summary, daemon=True)
monitor_thread.start()

# Matplotlib is not thread-safe, so all drawing stays in the main thread.
# One figure is created up front and reused for every update.
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
fig.suptitle("Log Summary by Level")

while True:
    if data_changed.is_set():
        data_changed.clear()
        plot_log_summary(axs, latest_data)
        fig.tight_layout()
    plt.pause(1)  # runs the GUI event loop and keeps the main thread alive


