This Python project analyzes system performance data (CPU, RAM, disk, swap, SNMP time) to detect and process anomalies. It implements several smoothing and data cleaning techniques, then generates correlation reports between different metrics.
```
ads/
├── data/                    # Raw input data
│   ├── cpu.csv              # CPU usage (%)
│   ├── diskIO.csv           # Disk activity
│   ├── ram.csv              # RAM usage
│   ├── swap.csv             # Swap usage
│   └── TimeSNMP.csv         # SNMP response time
├── data_cleaned/            # Cleaned data (auto-generated)
├── data_standardized/       # Standardized data (auto-generated)
├── reports/                 # Generated reports (auto-generated)
├── main.py                  # Main script
├── config.py                # Centralized configuration
├── logger.py                # Logging module
├── anomaly_detection.py     # Advanced anomaly detection
├── visualization.py         # Advanced visualization
├── setup.py                 # Installation script
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
- Automatic reading of all CSV files in the `data/` folder
- Smart date parsing (French format `DD/MM/YYYY`)
- Format handling: `;` delimiter, decimal comma
- Moving average (`moving_average`): window-based smoothing
- Exponential smoothing (`ewma`): decreasing weight for older observations
- Savitzky-Golay filter (`savgol`): polynomial smoothing
- Kalman filter (`kalman`): optimal recursive estimation
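The first three smoothers map directly onto pandas/SciPy primitives. A minimal sketch (function names mirror the list above; the Kalman filter is omitted for brevity, and the project's actual signatures may differ):

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

def moving_average(series, window=5):
    # Window-based smoothing: mean over a sliding window.
    return series.rolling(window=window, min_periods=1).mean()

def ewma(series, alpha=0.2):
    # Exponential smoothing: weights decay geometrically with age.
    return series.ewm(alpha=alpha).mean()

def savgol(series, window=7, polyorder=2):
    # Polynomial smoothing: fit a low-order polynomial in each window.
    return pd.Series(savgol_filter(series, window, polyorder), index=series.index)

# Noisy sine as a stand-in for a metric column.
noisy = pd.Series(np.sin(np.linspace(0, 6, 100))
                  + np.random.default_rng(0).normal(0, 0.3, 100))
smoothed = ewma(noisy)
```

Smoothing before anomaly detection reduces variance from measurement noise while preserving the slow-moving signal.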
- IQR detection: identification of outliers
- Capping: replacement of anomalies with the IQR bounds
- Z-score standardization: data normalization
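These three operations are short in pandas. A sketch under the usual definitions (the helper names are illustrative, not the project's API):

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    # Outliers lie outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap_outliers(series, k=1.5):
    # Capping: clip anomalies to the IQR bounds instead of dropping them.
    lo, hi = iqr_bounds(series, k)
    return series.clip(lower=lo, upper=hi)

def zscore(series):
    # Z-score standardization: mean 0, standard deviation 1.
    return (series - series.mean()) / series.std()

s = pd.Series([10, 11, 9, 10, 12, 11, 100])  # 100 is an outlier
capped = cap_outliers(s)
```

Capping (rather than deleting) keeps the time axis intact, which matters for the later correlation step.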
- Cross-correlation between all metrics
- Configurable threshold (default: ±0.7)
- Temporal interpolation to handle missing data
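A sketch of cross-correlation with interpolation and a ±0.7 threshold (the function name and return shape are illustrative, not the project's actual API):

```python
import pandas as pd

def significant_correlations(df, threshold=0.7):
    # Interpolate missing points, then keep metric pairs whose
    # Pearson correlation exceeds the threshold in magnitude.
    corr = df.interpolate().corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 2)))
    return pairs

df = pd.DataFrame({
    'cpu':  [10, 20, 30, 40, 50],
    'ram':  [12, 22, 28, 43, 49],  # tracks cpu closely
    'snmp': [5, 3, 6, 2, 4],       # unrelated
})
```

With a true `DatetimeIndex`, `interpolate(method='time')` would weight gaps by elapsed time rather than by row position.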
- Professional HTML/PDF reports with executive summaries
- Comparative plots: raw vs. smoothed vs. treated data
- Problem timelines with anomaly classification
- Statistical dashboards and performance metrics
- Automated report generation with recommendations
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Verify installation
python setup.py

# 3. Generate realistic data (optional)
python generate_realistic_data.py

# 4. Run basic analysis
python main.py

# 5. Generate professional reports
python generate_report.py
```

Modify `config.py` to customize:
- Smoothing and anomaly detection parameters
- Correlation thresholds
- Visualization and reporting options
- Output directories and file formats
- Metrics to process
- Cleaned files in `data_cleaned/`:
  - Data smoothed by EWMA (α=0.2)
  - Anomalies treated by IQR capping
- Standardized files in `data_standardized/`:
  - Normalized data (mean=0, std=1)
- Professional reports in `reports/`:
  - HTML report with interactive visualizations
  - PDF report for professional presentation
  - Statistical charts and performance dashboards
  - Problem timeline and analysis charts
- Console reports:
  - Individual processing report
  - Correlation analysis between metrics
  - Detailed statistics by column
  - Advanced problem classification
- CSV format with `;` delimiter
- Date column: `DD/MM/YYYY HH:MM` format
- Value columns: French format (comma as decimal separator)
- Frequency: measurements every 5 minutes (except TimeSNMP: ~2 minutes)
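Files in this format can be read with pandas in one call. A sketch with inline sample data (the real code reads from `data/`, and the column name is illustrative):

```python
import io
import pandas as pd

raw = "Date;% cpu\n01/02/2024 10:00;12,5\n01/02/2024 10:05;13,0\n"

df = pd.read_csv(
    io.StringIO(raw),      # in practice: "data/cpu.csv"
    sep=';',               # semicolon delimiter
    decimal=',',           # French decimal comma
    parse_dates=['Date'],
    dayfirst=True,         # DD/MM/YYYY, not MM/DD/YYYY
)
```

Without `dayfirst=True`, `01/02/2024` would silently parse as January 2 instead of February 1.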
- CPU: processor usage percentage
- RAM: memory usage
- Swap: virtual (swap) memory usage
- DiskIO: disk input/output activity
- TimeSNMP: SNMP query response time
1. Read the raw data
2. EWMA smoothing (α=0.2) to reduce noise
3. Anomaly detection with the IQR method
4. Capping of outliers at the IQR bounds
5. Z-score standardization
6. Save the results
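The steps above (minus file I/O) can be chained end to end along these lines; a minimal sketch, not the actual implementation in `main.py`:

```python
import pandas as pd

def process_series(series, alpha=0.2, iqr_k=1.5):
    # Smooth the raw signal with EWMA to reduce noise.
    smoothed = series.ewm(alpha=alpha).mean()
    # Detect anomalies with the IQR rule on the smoothed signal.
    q1, q3 = smoothed.quantile(0.25), smoothed.quantile(0.75)
    lo, hi = q1 - iqr_k * (q3 - q1), q3 + iqr_k * (q3 - q1)
    n_anomalies = int(((smoothed < lo) | (smoothed > hi)).sum())
    # Cap outliers at the IQR bounds.
    capped = smoothed.clip(lower=lo, upper=hi)
    # Z-score standardization.
    standardized = (capped - capped.mean()) / capped.std()
    return capped, standardized, n_anomalies

# Flat signal with one spike, mimicking a transient load anomaly.
s = pd.Series([10.0] * 30 + [200.0] + [10.0] * 30)
capped, standardized, n = process_series(s)
```

Note that detection runs on the smoothed series, matching the "IQR on smoothed" line in the sample report below.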
- EWMA alpha: `0.2` (smoothing factor)
- IQR threshold: `1.5` (multiplier for anomaly detection)
- Correlation threshold: `0.7` (significant correlation)
```
--- Individual File Processing Report ---
Starting processing of CSV files in 'data' folder:

--- Processing: cpu.csv ---
Data columns identified: ['% cpu']
Processing column: '% cpu'
Initial standard deviation: 0.5000
EWMA smoothing (alpha=0.2) applied. Standard deviation: 0.4000
Number of anomalies detected (IQR on smoothed): 15
Anomalies treated by capping. Standard deviation: 0.3500
Cleaned data saved in: data_cleaned/cpu_cleaned.csv
Standardized data saved in: data_standardized/cpu_standardized.csv

--- Cross-File Correlation Report ---
Significant correlations (threshold: +/- 0.7):
- Strong positive correlation (0.85) between 'cpu_% cpu' and 'ram_% ram'.
- Strong negative correlation (-0.72) between 'cpu_% cpu' and 'TimeSNMP_time'.
```
```bash
python main.py                    # Basic analysis with anomaly detection
python generate_realistic_data.py # Generate synthetic server data
python analyze_problems.py        # Advanced problem classification
python generate_report.py         # Professional HTML/PDF reports
python project_summary.py         # Complete project overview
```

Modify `config.py` for:
```python
# Anomaly detection parameters
ANOMALY_CONFIG = {
    'method': 'iqr',        # 'iqr', 'zscore', 'isolation_forest'
    'iqr_multiplier': 1.5,  # Sensitivity threshold
}

# Reporting options
VISUALIZATION_CONFIG = {
    'save_plots': True,     # Auto-save charts
    'plot_format': 'png',   # Output format
}
```

The modular architecture allows easy addition of:
- New anomaly detection algorithms in `anomaly_detection.py`
- Custom visualizations in `visualization.py`
- Additional report formats in `generate_report.py`
- New data generators in `generate_realistic_data.py`
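For instance, a simple z-score detector could be dropped into `anomaly_detection.py` along these lines (the function name and interface are illustrative; the module's actual contract may differ):

```python
import pandas as pd

def detect_zscore(series, threshold=3.0):
    # Return a boolean mask: True where the point's z-score
    # magnitude exceeds the threshold.
    z = (series - series.mean()) / series.std()
    return z.abs() > threshold

s = pd.Series([0.0] * 20 + [100.0])  # one obvious outlier
mask = detect_zscore(s)
```

Selecting it would then presumably be a matter of setting `'method': 'zscore'` in `ANOMALY_CONFIG`.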
- Error handling: robust to missing or corrupted files
- Performance: processing optimized for large data volumes
- Memory: file-by-file processing to avoid overload
- Reproducibility: deterministic results via fixed seeds