πŸ“Š System Data Analysis - Anomaly Detection

🎯 Project Description

This Python project analyzes system performance data (CPU, RAM, disk, swap, SNMP response time) to detect and treat anomalies. It implements several smoothing and data-cleaning techniques, then generates correlation reports between the different metrics.

πŸ“ Project Structure

ads/
β”œβ”€β”€ data/                          # Raw input data
β”‚   β”œβ”€β”€ cpu.csv                    # CPU usage (%)
β”‚   β”œβ”€β”€ diskIO.csv                 # Disk activity
β”‚   β”œβ”€β”€ ram.csv                    # RAM memory usage
β”‚   β”œβ”€β”€ swap.csv                   # Swap memory usage
β”‚   └── TimeSNMP.csv               # SNMP response time
β”œβ”€β”€ data_cleaned/                  # Cleaned data (auto-generated)
β”œβ”€β”€ data_standardized/             # Standardized data (auto-generated)
β”œβ”€β”€ reports/                       # Generated reports (auto-generated)
β”œβ”€β”€ main.py                        # Main script
β”œβ”€β”€ config.py                      # Centralized configuration
β”œβ”€β”€ logger.py                      # Logging module
β”œβ”€β”€ anomaly_detection.py           # Advanced anomaly detection
β”œβ”€β”€ visualization.py               # Advanced visualization
β”œβ”€β”€ setup.py                       # Installation script
β”œβ”€β”€ requirements.txt               # Python dependencies
└── README.md                      # This file

πŸ”§ Features

1. Data Processing

  • Automatic reading of all CSV files in the data/ folder
  • Smart date parsing (French format DD/MM/YYYY)
  • Format handling: delimiter ;, decimal , (see the reading sketch below)
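
A minimal reading sketch with pandas, assuming the format described above; the file path and the "Date" column name are illustrative, not the project's actual identifiers:

import pandas as pd

# Read a metrics file: ';' delimiter, ',' decimal separator, DD/MM/YYYY dates
df = pd.read_csv(
    "data/cpu.csv",
    sep=";",
    decimal=",",
    parse_dates=["Date"],
    dayfirst=True,       # interpret dates as day-first (DD/MM/YYYY)
)
df = df.set_index("Date").sort_index()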

2. Implemented Smoothing Techniques

  • Moving average (moving_average): window-based smoothing
  • Exponential smoothing (ewma): exponentially decreasing weights on older observations
  • Savitzky-Golay filter (savgol): polynomial smoothing
  • Kalman filter (kalman): optimal recursive estimation (see the sketch below)
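
A minimal sketch of the first three techniques, assuming a pandas Series s with a datetime index (window sizes and alpha are illustrative defaults; a Kalman filter needs a state-space model and is omitted here):

import pandas as pd
from scipy.signal import savgol_filter

def moving_average(s: pd.Series, window: int = 5) -> pd.Series:
    # Mean over a sliding window
    return s.rolling(window=window, min_periods=1).mean()

def ewma(s: pd.Series, alpha: float = 0.2) -> pd.Series:
    # Exponentially decreasing weights on older observations
    return s.ewm(alpha=alpha, adjust=False).mean()

def savgol(s: pd.Series, window: int = 11, polyorder: int = 2) -> pd.Series:
    # Least-squares polynomial fit over each sliding window
    return pd.Series(savgol_filter(s.to_numpy(), window, polyorder), index=s.index)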

3. Anomaly Detection and Treatment

  • IQR detection: identification of outliers via the interquartile range
  • Capping: replacement of anomalies with the IQR bounds
  • Z-score standardization: data normalization (see the sketch below)
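
A minimal sketch of these three steps, assuming a pandas Series s (the 1.5 multiplier matches the default listed under Configurable Parameters; function names are illustrative):

import pandas as pd

def iqr_bounds(s: pd.Series, k: float = 1.5):
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    # Replace values beyond the IQR bounds with the bounds themselves
    lo, hi = iqr_bounds(s, k)
    return s.clip(lower=lo, upper=hi)

def zscore(s: pd.Series) -> pd.Series:
    # Standardize to mean 0, standard deviation 1
    return (s - s.mean()) / s.std()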

4. Correlation Analysis

  • Cross-correlation between all metrics
  • Configurable threshold (default: Β±0.7)
  • Temporal interpolation to handle missing data (see the sketch below)
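
A minimal sketch of this step, assuming each metric was loaded as a Series with a datetime index (the dict layout and the 5-minute grid are illustrative assumptions):

import pandas as pd

def correlation_report(series: dict, threshold: float = 0.7):
    # Align all metrics on a common 5-minute grid, interpolating gaps over time
    frame = pd.DataFrame({name: s.resample("5min").mean()
                          for name, s in series.items()})
    frame = frame.interpolate(method="time")
    corr = frame.corr()
    # Report pairs whose absolute correlation exceeds the threshold
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if abs(corr.loc[a, b]) >= threshold:
                print(f"  - Correlation ({corr.loc[a, b]:+.2f}) between '{a}' and '{b}'.")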

5. Visualization & Reporting

  • Professional HTML/PDF reports with executive summaries
  • Comparative plots: raw vs. smoothed vs. treated data (see the sketch below)
  • Problem timelines with anomaly classification
  • Statistical dashboards and performance metrics
  • Automated report generation with recommendations
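
A minimal sketch of a comparative plot, assuming three aligned pandas Series; the styling and output path are illustrative choices, not the project's actual settings:

import matplotlib.pyplot as plt

def plot_comparison(raw, smoothed, treated, title="CPU usage (%)"):
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(raw.index, raw, label="raw", alpha=0.4)
    ax.plot(smoothed.index, smoothed, label="smoothed (EWMA)")
    ax.plot(treated.index, treated, label="treated (IQR capping)")
    ax.set_title(title)
    ax.legend()
    fig.tight_layout()
    fig.savefig("reports/comparison.png", dpi=150)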

πŸš€ Installation and Usage

Automatic Installation

python setup.py

Manual Installation

# 1. Install dependencies
pip install -r requirements.txt

# 2. Verify installation
python setup.py

# 3. Generate realistic data (optional)
python generate_realistic_data.py

# 4. Run basic analysis
python main.py

# 5. Generate professional reports
python generate_report.py

Configuration

Modify config.py to customize:

  • Smoothing and anomaly detection parameters
  • Correlation thresholds
  • Visualization and reporting options
  • Output directories and file formats
  • Metrics to process

Generated Outputs

  1. Cleaned files in data_cleaned/

    • Data smoothed by EWMA (Ξ±=0.2)
    • Anomalies treated by IQR capping
  2. Standardized files in data_standardized/

    • Normalized data (mean=0, std=1)
  3. Professional reports in reports/

    • HTML report with interactive visualizations
    • PDF report for professional presentation
    • Statistical charts and performance dashboards
    • Problem timeline and analysis charts
  4. Console reports:

    • Individual processing report
    • Correlation analysis between metrics
    • Detailed statistics by column
    • Advanced problem classification

πŸ“ˆ Data Format

Input Data

  • CSV format with delimiter ; (a sample file is shown below)
  • Date column: DD/MM/YYYY HH:MM format
  • Value columns: French number format (comma as decimal separator)
  • Frequency: one measurement every 5 minutes (except TimeSNMP: ~2 minutes)
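
For illustration, a few rows in this format might look like the following (the values are invented):

Date;% cpu
01/03/2024 10:00;42,5
01/03/2024 10:05;43,1
01/03/2024 10:10;97,8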

Available Metrics

  • CPU: processor usage percentage
  • RAM: memory usage
  • Swap: virtual memory (swap) usage
  • DiskIO: disk input/output activity
  • TimeSNMP: SNMP query response time

πŸ” Processing Algorithms

Processing Pipeline

  1. Reading raw data
  2. EWMA smoothing (Ξ±=0.2) to reduce noise
  3. Anomaly detection by IQR method
  4. Capping of outliers
  5. Z-score standardization
  6. Saving results to data_cleaned/ and data_standardized/ (condensed sketch below)
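
A condensed sketch of this pipeline for one column, reusing the helpers sketched in the Features section (function names are illustrative):

def process_column(s, alpha=0.2, iqr_k=1.5):
    smoothed = ewma(s, alpha=alpha)                    # step 2: EWMA smoothing
    lo, hi = iqr_bounds(smoothed, k=iqr_k)             # step 3: IQR detection
    n_anomalies = int(((smoothed < lo) | (smoothed > hi)).sum())
    treated = smoothed.clip(lower=lo, upper=hi)        # step 4: capping
    standardized = zscore(treated)                     # step 5: standardization
    return treated, standardized, n_anomalies          # step 6: caller saves results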

Configurable Parameters

  • EWMA alpha: 0.2 (smoothing factor)
  • IQR threshold: 1.5 (multiplier for anomaly detection)
  • Correlation threshold: 0.7 (significant correlation cutoff)

πŸ“Š Output Example

--- Individual File Processing Report ---
Starting processing of CSV files in 'data' folder:

--- Processing: cpu.csv ---
  Data columns identified: ['% cpu']
  Processing column: '% cpu'
    Initial standard deviation: 0.5000
    EWMA smoothing (alpha=0.2) applied. Standard deviation: 0.4000
    Number of anomalies detected (IQR on smoothed): 15
    Anomalies treated by capping. Standard deviation: 0.3500
  Cleaned data saved in: data_cleaned/cpu_cleaned.csv
  Standardized data saved in: data_standardized/cpu_standardized.csv

--- Cross-File Correlation Report ---
Significant correlations (threshold: +/- 0.7):
  - Strong positive correlation (0.85) between 'cpu_% cpu' and 'ram_% ram'.
  - Strong negative correlation (-0.72) between 'cpu_% cpu' and 'TimeSNMP_time'.

πŸ› οΈ Advanced Usage

Available Scripts

python main.py                    # Basic analysis with anomaly detection
python generate_realistic_data.py # Generate synthetic server data
python analyze_problems.py        # Advanced problem classification
python generate_report.py         # Professional HTML/PDF reports
python project_summary.py         # Complete project overview

Customization

Modify config.py for:

# Anomaly detection parameters
ANOMALY_CONFIG = {
    'method': 'iqr',             # 'iqr', 'zscore', 'isolation_forest'
    'iqr_multiplier': 1.5,       # Sensitivity threshold
}

# Reporting options
VISUALIZATION_CONFIG = {
    'save_plots': True,          # Auto-save charts
    'plot_format': 'png',        # Output format
}

Extending the System

The modular architecture allows easy addition of:

  • New anomaly detection algorithms in anomaly_detection.py (example sketch below)
  • Custom visualizations in visualization.py
  • Additional report formats in generate_report.py
  • New data generators in generate_realistic_data.py
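
As a hypothetical example, a new detector in anomaly_detection.py could follow the same Series-in, boolean-mask-out shape as the IQR sketch above (this interface is an assumption, not the project's actual API):

import pandas as pd

def detect_zscore(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    # Flag points more than `threshold` standard deviations from the mean
    z = (s - s.mean()) / s.std()
    return z.abs() > threshold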

πŸ“ Technical Notes

  • Error handling: robust to missing or corrupted files
  • Performance: processing optimized for large data volumes
  • Memory: file-by-file processing to avoid overload
  • Reproducibility: deterministic results with fixed seeds
