Exercise 1: Log File Analysis System

Scenario

You're working at a cloud services company that needs to analyze system logs from
multiple servers. The logs are in JSON format and contain information about server
events, errors, and performance metrics.

Task

Create a system that can process and analyze these log entries to help identify
potential issues and generate reports.

In [3]:
import json

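def read_json(file_path):
    """Load a JSON log file and return its parsed contents."""
    # JSON is UTF-8 by specification, so don't rely on the platform default encoding.
    with open(file_path, 'r', encoding='utf-8') as f:
        return json.load(f)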


file_path = r'C:\Users\aravi\Desktop\Assessment_Python\assessments\20241219\sample-dataset-1.json'  
logs = read_json(file_path)

In [None]:
logs
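
The dataset itself is not reproduced in this notebook, so the shape of a log entry has to be inferred from the keys the functions below access (server_id, timestamp, status, priority, event_type, metrics.cpu_usage). A minimal sketch with hypothetical values, which may be handy for trying the functions without the file:

```python
import json

# Hypothetical entries: the field names come from the functions in this
# notebook, but every value here is made up for illustration.
SAMPLE_LOGS_JSON = """
[
  {"server_id": "srv-01", "timestamp": "2024-12-19T10:00:00", "status": "warning",
   "priority": "high", "event_type": "performance", "metrics": {"cpu_usage": 91.5}},
  {"server_id": "srv-02", "timestamp": "2024-12-19T09:30:00", "status": "ok",
   "priority": "low", "event_type": "network", "metrics": {"cpu_usage": 35.0}},
  {"server_id": "srv-01", "timestamp": "2024-12-19T09:30:00", "status": "error",
   "priority": "medium", "event_type": "database", "metrics": {}}
]
"""

# json.loads parses a string; json.load (used above) parses an open file.
sample_logs = json.loads(SAMPLE_LOGS_JSON)
print(len(sample_logs))  # 3
```

The third entry deliberately has an empty metrics object, since real logs may not report CPU usage for every event.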

1. Create a function that filters log entries to find:

All high-priority warnings using filter() and lambda
Servers with CPU usage above 80% using list comprehension

In [None]:
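def filter_logs(logs):
    """Return (high-priority warning entries, IDs of servers with CPU usage above 80%)."""
    high_priority_warnings = list(filter(
        lambda x: x.get("status") == "warning" and x.get("priority") == "high", logs))
    # .get() with fallbacks skips entries without CPU metrics instead of raising KeyError.
    high_cpu_servers = [x["server_id"] for x in logs
                        if x.get("event_type") == "performance"
                        and (x.get("metrics", {}).get("cpu_usage") or 0) > 80]
    return high_priority_warnings, high_cpu_servers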
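
A small worked example of the two filtering techniques, run on hypothetical entries (only the key names mirror this notebook; the values are invented). It assumes some entries may lack a metrics field, which is why it reads values with .get():

```python
# Hypothetical entries; only the keys mirror the ones used in this notebook.
sample_logs = [
    {"server_id": "srv-01", "status": "warning", "priority": "high",
     "event_type": "performance", "metrics": {"cpu_usage": 91.5}},
    {"server_id": "srv-02", "status": "warning", "priority": "low",
     "event_type": "performance", "metrics": {"cpu_usage": 35.0}},
    {"server_id": "srv-03", "status": "error", "priority": "high",
     "event_type": "database"},  # no "metrics" key at all
]

# filter() keeps the entries for which the lambda returns True.
high_priority_warnings = list(filter(
    lambda e: e.get("status") == "warning" and e.get("priority") == "high",
    sample_logs,
))

# .get() with defaults means an entry without metrics is skipped, not fatal.
high_cpu_servers = [
    e["server_id"] for e in sample_logs
    if e.get("metrics", {}).get("cpu_usage", 0) > 80
]

print([e["server_id"] for e in high_priority_warnings])  # ['srv-01']
print(high_cpu_servers)  # ['srv-01']
```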


2. Create a function that extracts unique server IDs using map() and set()

In [None]:
def unique_server_ids(logs):
    """Return the set of distinct server IDs found in the logs."""
    # map() extracts each ID; set() drops the duplicates.
    return set(map(lambda x: x["server_id"], logs))
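
For comparison, a sketch showing that map() plus set() gives the same result as a set comprehension. The entries are hypothetical. Sets are unordered, so sorted() is used when a stable display order is wanted:

```python
# Hypothetical entries with a repeated server ID.
sample_logs = [
    {"server_id": "srv-02"},
    {"server_id": "srv-01"},
    {"server_id": "srv-02"},
]

ids_via_map = set(map(lambda e: e["server_id"], sample_logs))
ids_via_comprehension = {e["server_id"] for e in sample_logs}

print(ids_via_map == ids_via_comprehension)  # True
print(sorted(ids_via_map))  # ['srv-01', 'srv-02']
```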

3. Create a function that sorts the log entries by:
Timestamp (primary key)
Priority (secondary key)

Use the sorted() function with a lambda key.

In [None]:
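def sorted_entries(logs):
    """Sort by timestamp, then by priority (assumed order: high, medium, low)."""
    # Comparing the raw strings would order priorities alphabetically: high, low, medium.
    # Timestamps are compared as strings, which is chronological only for ISO 8601 values.
    priority_rank = {"high": 0, "medium": 1, "low": 2}
    return sorted(logs, key=lambda x: (x["timestamp"],
                                       priority_rank.get(x["priority"], len(priority_rank))))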
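
Sorting on the raw priority string orders the values alphabetically (high, low, medium), which is probably not the intended severity order. The sketch below contrasts that with an explicit rank. It assumes ISO 8601 timestamps and that high should come before medium and low. The PRIORITY_RANK mapping and the sample entries are hypothetical:

```python
from datetime import datetime

# Assumed severity order; unknown priorities sort last.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}

# Hypothetical entries: two share a timestamp, so priority breaks the tie.
sample_logs = [
    {"server_id": "a", "timestamp": "2024-12-19T09:30:00", "priority": "low"},
    {"server_id": "b", "timestamp": "2024-12-19T09:30:00", "priority": "medium"},
    {"server_id": "c", "timestamp": "2024-12-19T09:00:00", "priority": "low"},
]

# Alphabetical tie-break: "low" sorts before "medium".
by_string = sorted(sample_logs, key=lambda e: (e["timestamp"], e["priority"]))

# Ranked tie-break: timestamps parsed to datetime, priorities mapped to ranks.
by_rank = sorted(
    sample_logs,
    key=lambda e: (
        datetime.fromisoformat(e["timestamp"]),
        PRIORITY_RANK.get(e.get("priority"), len(PRIORITY_RANK)),
    ),
)

print([e["server_id"] for e in by_string])  # ['c', 'a', 'b']
print([e["server_id"] for e in by_rank])    # ['c', 'b', 'a']
```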

4. Bonus: Create a function that generates a summary report showing:
Count of events by priority
List of unique event types
Average CPU usage across all servers

In [5]:
def generate_summary_report(logs):
    priority_counts = {}
    event_types = set()
    total_cpu = 0
    valid_log_count = 0

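    for entry in logs:
        if not entry:
            continue  # skip empty or null entries
        priority = entry.get('priority', 'unknown')
        priority_counts[priority] = priority_counts.get(priority, 0) + 1
        event_type = entry.get('event_type')
        if event_type is not None:  # don't report None as an event type
            event_types.add(event_type)
        cpu_usage = entry.get('metrics', {}).get('cpu_usage')
        if cpu_usage is not None:
            total_cpu += cpu_usage
            valid_log_count += 1

    # Average over the entries that report CPU usage; 0 if none do.
    avg_cpu_usage = total_cpu / valid_log_count if valid_log_count > 0 else 0
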
    
    return {
        'priority_counts': priority_counts,
        'event_types': list(event_types),
        'average_cpu_usage': avg_cpu_usage
    }


In [None]:
filter_logs(logs)

In [None]:
unique_server_ids(logs)

In [None]:
sorted_entries(logs)

In [6]:
generate_summary_report(logs)

{'priority_counts': {'high': 5, 'low': 1, 'medium': 2},
 'event_types': ['performance',
  'security',
  'network',
  'database',
  'application'],
 'average_cpu_usage': 74.2}
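
The same report can also be sketched with collections.Counter and statistics.mean from the standard library. This is an alternative formulation, not the approach used above, and the entries are hypothetical. Like the function above, it averages over log entries that report CPU usage rather than per server:

```python
from collections import Counter
from statistics import mean

# Hypothetical entries; the second one reports no CPU usage.
sample_logs = [
    {"priority": "high", "event_type": "performance", "metrics": {"cpu_usage": 90.0}},
    {"priority": "high", "event_type": "security"},
    {"priority": "low", "event_type": "performance", "metrics": {"cpu_usage": 50.0}},
]

# Counter tallies each priority in a single pass.
priority_counts = Counter(e.get("priority", "unknown") for e in sample_logs)

# A set comprehension removes duplicates; sorted() gives a stable order.
event_types = sorted({e["event_type"] for e in sample_logs if "event_type" in e})

# Only entries that actually carry a cpu_usage value contribute to the mean.
cpu_values = [e["metrics"]["cpu_usage"] for e in sample_logs
              if "cpu_usage" in e.get("metrics", {})]
average_cpu = mean(cpu_values) if cpu_values else 0

print(dict(priority_counts))  # {'high': 2, 'low': 1}
print(event_types)            # ['performance', 'security']
print(average_cpu)            # 70.0
```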