Automated Log Analysis Using Python

A comprehensive Blue Team cybersecurity tool for detecting anomalies and extracting suspicious activities from system and network logs.

Created by: Abu Bakar and Raqib Hayat

📚 Academic Project Information

Course: CS-332 Information Security

Semester: 6 - Spring 2025

Instructor: Dr. Mudassar Raza

Project Type: Semester Project

CLO Alignment: CLO-3, CLO-4

Team Classification: Blue Team Operations Tool

Course Learning Outcomes (CLOs)

CLO-3: Apply security mechanisms and protocols to protect information systems
CLO-4: Evaluate and implement cybersecurity tools and techniques for threat detection and response

This project demonstrates practical application of cybersecurity concepts through automated log analysis, focusing on defensive security operations and threat detection methodologies taught in the Information Security curriculum.

🎯 Project Overview

The Python Log Analyzer is an automated cybersecurity tool designed to parse, analyze, and detect suspicious activities in various log formats. This tool addresses the critical need for efficient log monitoring in cybersecurity operations, where manual analysis is time-consuming and error-prone.

Blue Team Focus Areas

This tool specifically supports Blue Team defensive security operations by:

Threat Detection: Automated identification of malicious activities in log data
Incident Response: Rapid analysis capabilities for security event investigation
Continuous Monitoring: Batch processing that can be scheduled for ongoing security surveillance
Evidence Collection: Forensic-ready reporting for incident documentation
Compliance Support: Structured analysis supporting regulatory requirements

Key Features

Multi-format Support: Handles Syslog, Apache, and custom log formats
Automated Parsing: Uses regex patterns to extract structured data from unstructured logs
Anomaly Detection: Identifies suspicious patterns like brute-force attacks and frequent errors
Threshold-based Alerting: Customizable thresholds for flagging suspicious IPs
Visual Reports: Generates both text reports and graphical charts
Command-line Interface: Easy integration into security workflows and automation scripts

🛡️ Cybersecurity Framework Integration

Blue Team Operations Alignment

This tool follows industry-standard Blue Team methodologies:

Detection Engineering: Automated pattern recognition for threat indicators
Log Analysis: Systematic examination of system and network logs
Incident Response: Rapid identification and documentation of security events
Threat Hunting: Proactive search for indicators of compromise (IOCs)
Security Operations Center (SOC) Support: Integration-ready for SOC workflows

Academic Learning Integration

The project demonstrates practical application of course concepts:

Information Security Principles: Supporting confidentiality, integrity, and availability through log monitoring
Threat Landscape Understanding: Recognition of common attack patterns
Defense-in-Depth: Multi-layered security approach through comprehensive log monitoring
Risk Assessment: Categorization and prioritization of security threats
Security Tools Development: Hands-on experience with cybersecurity tool creation

🚀 Installation and Setup

Prerequisites

Ensure you have Python 3.6+ installed on your system:

```
python --version
# or
python3 --version
```

Required Libraries

Install the necessary Python packages:

```
pip install pandas matplotlib regex
```

Note: The regex library provides enhanced pattern matching. If unavailable, the script falls back to Python's built-in re module.
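The fallback described in the note can be sketched as follows. This is a minimal illustration and not necessarily how log_analyzer.py does it; the `IP_PATTERN` name and the sample string are assumptions made for this example.

```python
# Prefer the third-party "regex" package; fall back to the standard library.
try:
    import regex as re
except ImportError:
    import re

# Hypothetical pattern used only for this illustration: a dotted-quad IPv4 address.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

match = IP_PATTERN.search("Failed password for root from 192.168.1.100")
found_ip = match.group(0) if match else None
```

Because both modules expose `compile` and `search` with the same basic interface, the rest of the code does not need to know which one was imported.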

Download the Tool

Save the log_analyzer.py script to your working directory
Download the sample log files:
- syslog_failed_ssh.log
- apache_access_errors.log
Make the script executable (Linux/Mac):
```
chmod +x log_analyzer.py
```

📖 Usage Guide

Basic Command Structure

```
python log_analyzer.py <log_file> <log_type> [options]
```

Parameters

log_file (required): Path to the log file to analyze
log_type (required): Type of log file (syslog, apache, or custom)
--threshold (optional): Minimum count to flag IP as suspicious (default: 5)
--status-code (optional): Filter Apache logs by specific HTTP status code
--output (optional): Save report to specified file
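The parameters above map naturally onto Python's `argparse` module. The sketch below is one plausible way to declare them, not the script's actual source; the `build_parser` function name and the help strings are assumptions.

```python
import argparse

def build_parser():
    """Declare the documented positional arguments and options."""
    parser = argparse.ArgumentParser(
        description="Automated log analysis (illustrative sketch)"
    )
    parser.add_argument("log_file", help="Path to the log file to analyze")
    parser.add_argument(
        "log_type",
        choices=["syslog", "apache", "custom"],
        help="Type of log file",
    )
    parser.add_argument(
        "--threshold", type=int, default=5,
        help="Minimum count to flag an IP as suspicious",
    )
    parser.add_argument(
        "--status-code", type=int, default=None,
        help="Filter Apache logs by a specific HTTP status code",
    )
    parser.add_argument(
        "--output", default=None,
        help="Save the report to this file",
    )
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["syslog_failed_ssh.log", "syslog", "--threshold", "5"])
```

Note that `argparse` exposes `--status-code` as `args.status_code`, replacing the hyphen with an underscore.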

Example Commands

1. Analyze Syslog for Failed SSH Logins

```
python log_analyzer.py syslog_failed_ssh.log syslog --threshold 5
```

What it does:

Parses syslog entries for "Failed password" messages
Groups failed attempts by IP address
Flags IPs with 5+ failed login attempts
Generates a security report with flagged IPs
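The steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `FAILED_SSH` pattern and the `flag_failed_logins` helper are hypothetical names, and the comparison uses `>=` to match the "5+" wording.

```python
import re
from collections import Counter

# Assumed pattern: captures the source IPv4 address of a failed SSH password attempt.
FAILED_SSH = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)

def flag_failed_logins(lines, threshold=5):
    """Count failed logins per IP and keep IPs with at least `threshold` attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_SSH.search(line)
        if match:
            counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}

# Made-up input: five attempts from one address, one attempt from another.
sample = [
    "May 15 08:32:10 server1 sshd[12345]: Failed password for root from 192.168.1.100"
] * 5
sample.append(
    "May 15 08:33:00 server1 sshd[12346]: Failed password for admin from 203.0.113.50"
)
flagged = flag_failed_logins(sample, threshold=5)
```

Only the first address reaches the threshold in this sample, so it is the only one flagged.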

2. Analyze Apache Logs for 404 Errors

```
python log_analyzer.py apache_access_errors.log apache --status-code 404 --threshold 10
```

What it does:

Parses Apache access logs
Filters for 404 "Not Found" errors
Identifies IPs generating 10+ 404 errors
Useful for detecting reconnaissance activities
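A rough sketch of this kind of filtering is shown below. It assumes the Apache Common Log Format prefix and uses hypothetical names (`APACHE_LINE`, `count_status_by_ip`); when no status code is given, it counts every 4xx and 5xx response, which is an assumption about the default behaviour.

```python
import re
from collections import Counter

# Assumed Common Log Format prefix: ip ident user [time] "request" status size
APACHE_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def count_status_by_ip(lines, status_code=None):
    """Count matching responses per client IP.

    With status_code=None, every 4xx/5xx response counts as an error.
    """
    counts = Counter()
    for line in lines:
        match = APACHE_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        status = int(match.group("status"))
        if status_code is None:
            is_hit = 400 <= status <= 599
        else:
            is_hit = status == status_code
        if is_hit:
            counts[match.group("ip")] += 1
    return counts

# Made-up sample entries for illustration.
lines = [
    '192.168.1.50 - - [15/May/2024:10:30:15 +0000] "GET /admin.php HTTP/1.1" 404 512',
    '192.168.1.50 - - [15/May/2024:10:30:16 +0000] "GET /login.php HTTP/1.1" 404 512',
    '192.168.1.50 - - [15/May/2024:10:30:17 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '198.51.100.7 - - [15/May/2024:10:31:00 +0000] "GET /api HTTP/1.1" 500 0',
]
not_found = count_status_by_ip(lines, status_code=404)
all_errors = count_status_by_ip(lines)
```

The resulting counts can then be compared against the threshold in the same way as for failed logins.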

3. General Apache Error Analysis with Report Output

```
python log_analyzer.py apache_access_errors.log apache --threshold 3 --output security_report.txt
```

What it does:

Analyzes all HTTP errors (4xx, 5xx status codes)
Flags IPs with 3+ error-causing requests
Saves detailed report to security_report.txt

4. Custom Log Analysis

```
python log_analyzer.py custom.log custom --threshold 2 --output custom_report.txt
```

What it does:

Applies keyword-based detection for custom log formats
Looks for suspicious terms like "failed", "error", "denied", "unauthorized"
Extracts IP addresses and timestamps where possible
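Keyword-based detection of this kind might look like the sketch below. The keyword list comes from the description above; the `scan_custom_line` helper, the IP pattern, and the ISO-like timestamp pattern are assumptions, since custom log layouts vary.

```python
import re

SUSPICIOUS_KEYWORDS = ("failed", "error", "denied", "unauthorized")

# Assumed patterns: a dotted-quad IPv4 address and a "YYYY-MM-DD HH:MM:SS" timestamp.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")

def scan_custom_line(line):
    """Return details for a suspicious line, or None if no keyword matches."""
    lowered = line.lower()
    hits = [kw for kw in SUSPICIOUS_KEYWORDS if kw in lowered]
    if not hits:
        return None
    ip = IP_RE.search(line)
    timestamp = TIMESTAMP_RE.search(line)
    return {
        "keywords": hits,
        "ip": ip.group(0) if ip else None,                      # None when no IP is present
        "timestamp": timestamp.group(0) if timestamp else None, # None when no timestamp is present
    }

# Made-up log line for illustration.
entry = scan_custom_line(
    "2024-05-15 10:30:15 auth: access DENIED for user bob from 10.0.0.8"
)
```

Matching is case-insensitive because the line is lowercased before the keyword check, and missing fields are reported as `None` rather than raising an error.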

🔍 Understanding the Output

Console Output Example

```
🚀 Starting Automated Log Analysis...
Target: syslog_failed_ssh.log (syslog format)
📖 Parsing syslog log file: syslog_failed_ssh.log
✅ Processed 20 lines, parsed 18 entries
🔍 Analyzing suspicious activities...
🎯 Found 17 suspicious entries out of 18 total entries
🚨 Flagged 2 IP addresses exceeding threshold of 5

============================================================
📊 AUTOMATED LOG ANALYSIS REPORT
============================================================
Log File: syslog_failed_ssh.log
Log Type: SYSLOG
Analysis Date: 2024-05-15 14:30:25
Threshold: 5

--- SUMMARY STATISTICS ---
Total Log Entries: 18
Suspicious Entries: 17
Flagged IP Addresses: 2

--- FLAGGED IP ADDRESSES (Exceeding Threshold) ---
  🚨 192.168.1.100: 7 suspicious events [HIGH RISK]
  🚨 203.0.113.50: 6 suspicious events [MEDIUM RISK]

--- RECOMMENDATIONS ---
  🔒 Consider blocking or monitoring flagged IP addresses
  📈 Investigate high-risk IPs for potential security threats
  🕐 Review logs for unusual timing patterns
```

Generated Files

Text Report: Detailed analysis saved to the specified output file
Visualization Chart: Bar chart showing top suspicious IPs (suspicious_ips_chart_[logtype].png)

🛡️ Cybersecurity Applications

1. Incident Detection

Brute-force Attacks: Identifies repeated failed login attempts
Web Reconnaissance: Detects scanning for common vulnerabilities
System Intrusions: Flags unusual system events and errors

2. Compliance Monitoring

Audit Trails: Provides evidence of security monitoring
Regulatory Requirements: Can support the log-review requirements of standards such as PCI DSS and HIPAA
Forensic Analysis: Creates detailed records for incident investigation

3. Operational Security

Scheduled Monitoring: Batch runs can be automated for recurring log analysis (live streaming is listed under Planned Features)
Threshold Alerting: Customizable rules for different threat levels
Integration Ready: Outputs can feed into SIEM systems

🔧 Technical Details

Supported Log Formats

Syslog Format

Pattern Detection: Failed SSH logins, system errors, unauthorized access
Regex Patterns: Extracts timestamps, hostnames, process names, and messages
Example: May 15 08:32:10 server1 sshd[12345]: Failed password for root from 192.168.1.100
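As an illustration of the field extraction described above, the following sketch splits the example line into its parts. The `SYSLOG_LINE` pattern is an assumption based on the traditional BSD syslog layout and may differ from the pattern the tool actually uses.

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host process[pid]: message" (the pid is optional).
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

line = "May 15 08:32:10 server1 sshd[12345]: Failed password for root from 192.168.1.100"
match = SYSLOG_LINE.match(line)

# groupdict() returns the named groups as a plain dictionary.
fields = match.groupdict() if match else {}
```

Named groups keep the downstream code readable, since each field is accessed by name rather than by position.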

Apache Combined/Common Log Format

HTTP Analysis: Status codes, request methods, user agents
Error Detection: 4xx client errors, 5xx server errors
Example: 192.168.1.50 - - [15/May/2024:10:30:15 +0000] "GET /admin.php HTTP/1.1" 404 512

Custom Logs

Flexible Parsing: Keyword-based suspicious activity detection
IP Extraction: Identifies IP addresses in various log formats
Extensible: Easy to modify for specific log structures

Analysis Algorithms

Pattern Matching: Uses compiled regex patterns for efficient parsing
Statistical Analysis: Counts events by IP, time windows, and event types
Threshold-based Detection: Flags IPs exceeding configurable limits
Risk Assessment: Categorizes threats as HIGH, MEDIUM, or LOW risk
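The threshold and risk steps could be combined roughly as shown below. The exact cut-offs used by log_analyzer.py are not documented here, so the rule in `classify_risk` is an assumption, chosen only so that it agrees with the sample console output (7 events rated HIGH and 6 rated MEDIUM at a threshold of 5). Both function names are hypothetical.

```python
from collections import Counter

def classify_risk(count, threshold):
    """Map an event count to a risk label using assumed cut-offs."""
    if count >= threshold + 2:   # assumption: well above the threshold
        return "HIGH"
    if count >= threshold:       # assumption: at or just above the threshold
        return "MEDIUM"
    return "LOW"

def assess(ip_counts, threshold=5):
    """Return (ip, count, risk) for flagged IPs, most active first."""
    ranked = Counter(ip_counts).most_common()
    return [
        (ip, n, classify_risk(n, threshold))
        for ip, n in ranked
        if n >= threshold
    ]

# Counts taken from the sample report, plus one made-up low-activity address.
report = assess(
    {"192.168.1.100": 7, "203.0.113.50": 6, "10.0.0.5": 2},
    threshold=5,
)
```

Keeping the classification in its own function makes the cut-offs easy to tune for a specific environment.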

📊 Sample Data Analysis

Demo 1: SSH Brute-force Detection

Input: syslog_failed_ssh.log with multiple failed login attempts
Command: python log_analyzer.py syslog_failed_ssh.log syslog --threshold 5
Result: Identifies 192.168.1.100 with 7 failed attempts and 203.0.113.50 with 6 attempts

Demo 2: Web Attack Detection

Input: apache_access_errors.log with reconnaissance attempts
Command: python log_analyzer.py apache_access_errors.log apache --status-code 404 --threshold 3
Result: Flags 203.0.113.75 for excessive 404 errors (potential vulnerability scanning)

🎓 Educational Value & Learning Outcomes

Skills Demonstrated

Technical Skills:

Python programming for cybersecurity applications
Regular expression pattern matching for log parsing
Data analysis and visualization techniques
Command-line interface development
File I/O and error handling

Cybersecurity Concepts:

Log analysis methodologies
Threat detection and anomaly identification
Incident response procedures
Risk assessment and categorization
Security monitoring best practices

Blue Team Operations:

Defensive security tool development
Automated threat detection systems
Security operations center (SOC) workflows
Continuous monitoring implementation
Evidence collection and documentation

Real-world Applications

This project prepares students for careers in:

SOC Analyst: Monitoring and analyzing security events
Incident Response Specialist: Investigating and responding to security incidents
Security Engineer: Developing and implementing security tools
Cybersecurity Consultant: Assessing and improving organizational security posture

🔮 Future Enhancements

Planned Features

Machine Learning Integration: Advanced anomaly detection using ML algorithms
Real-time Processing: Live log monitoring with streaming analysis
Web Interface: GUI for easier configuration and visualization
SIEM Integration: Direct integration with Security Information and Event Management systems
Geo-IP Analysis: Location-based threat intelligence
Threat Intelligence Feeds: Integration with IOC (Indicators of Compromise) databases

Contributing

This tool is designed to be extensible. Key areas for enhancement:

Additional log format parsers
Advanced visualization options
Custom rule engines
Performance optimizations for large log files

📚 References and Resources

Official Documentation (Primary Sources)

Python Documentation: https://docs.python.org/3/
Pandas Documentation: https://pandas.pydata.org/pandas-docs/stable/
Matplotlib Documentation: https://matplotlib.org/stable/contents.html
Regular Expressions: https://regex101.com/

Cybersecurity Standards and Frameworks

NIST Cybersecurity Framework: Guidelines for log monitoring and incident response
OWASP Logging Cheat Sheet: Best practices for security logging
SANS Blue Team Operations: Defensive security methodologies
MITRE ATT&CK Framework: Threat detection and response strategies

Academic References

Course textbook and lecture materials (CS-332 Information Security)
Industry white papers on log analysis and threat detection
Research papers on automated security monitoring systems

⚠️ Important Notes

Academic Integrity

This project was developed following academic guidelines and represents original work by the project team. All external resources and references have been properly cited.

Limitations

Performance: Large log files (> 1 GB) may require optimization
Complex Attacks: May miss sophisticated, low-and-slow attacks
False Positives: Requires tuning for specific environments
Context: Provides detection but limited contextual analysis

Security Considerations

Log Privacy: Ensure compliance with data protection regulations
Access Control: Restrict access to log files and analysis reports
Data Retention: Follow organizational policies for log data storage
Alert Fatigue: Balance sensitivity with practicality

🆘 Troubleshooting

Common Issues

File Not Found Error

```
❌ Error: Log file 'logfile.log' not found.
```

Solution: Verify the file path and ensure read permissions

No Entries Parsed

```
❌ No valid log entries found. Please check the log file format.
```

Solution: Ensure the log type matches the actual file format

Permission Denied

```
❌ Error: Permission denied accessing '/var/log/secure'.
```

Solution: Run with appropriate permissions or copy the logs to an accessible location
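For readers extending the tool, the file-related errors above correspond to standard Python exceptions. The sketch below shows one way to turn them into messages; the `read_log_lines` helper is hypothetical, and the message text only approximates the tool's output (the emoji prefix is omitted).

```python
def read_log_lines(path):
    """Return (lines, error): lines is None when the file cannot be read."""
    try:
        # errors="replace" keeps parsing going if the log contains invalid bytes.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            return handle.read().splitlines(), None
    except FileNotFoundError:
        return None, f"Error: Log file '{path}' not found."
    except PermissionError:
        return None, f"Error: Permission denied accessing '{path}'."

# A path that is assumed not to exist, to exercise the first error branch.
lines, error = read_log_lines("no_such_file_for_demo.log")
```

Returning the error as a value, rather than printing inside the helper, leaves the caller free to decide how to report it.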

Getting Help

For questions or issues:

Check the error messages for specific guidance
Verify log file format matches the specified type
Test with provided sample files first
Review regex patterns for custom log formats
Consult official documentation for libraries used
Contact project team or course instructor for academic support

📝 Project Submission Details

Project Team: Abu Bakar (NUM-BSCS-2022-41) and Raqib Hayat (NUM-BSCS-2022-40)

Course: CS-332 Information Security

Submission Date: 27/05/2025

Tool Version: 1.0

Last Updated: May 2025

Project Deliverables

Complete Python log analysis tool
Comprehensive documentation (this README)
Sample log files for testing
Usage examples and demonstrations
Technical architecture documentation

This tool is developed for educational purposes as part of the CS-332 Information Security course curriculum. It demonstrates practical application of cybersecurity concepts and is designed for legitimate security monitoring purposes. Always ensure compliance with applicable laws and organizational policies when analyzing log data.

Academic Disclaimer: This project follows all academic integrity guidelines and represents original work by the development team under the supervision of Dr. Mudassar Raza for the Spring 2025 semester of CS-332 Information Security.
