# CyberSec Log Analyzer: CICIDS Log Parsing, Feature Extraction, and Anomaly Detection
This notebook shows how to parse CICIDS logs, extract time- and user-based features, and train an XGBoost classifier that flags anomalous events.

In [None]:
# Install dependencies if running in a fresh environment
# !pip install pandas duckdb clickhouse-connect xgboost scikit-learn

In [None]:
# Import required modules
import pandas as pd
from parsers.cicids_parser import parse_cicids_csv
from ml.feature_extractor import extract_time_user_features
from ml.xgb_anomaly import train_xgb_classifier, predict_anomaly

## 1. Parse CICIDS Log File
Load a sample CICIDS CSV log file and inspect the data.

In [None]:
# Example: Replace with your actual file path
cicids_file = '../data/CICIDS_sample.csv'
df = parse_cicids_csv(cicids_file)
df.head()
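The implementation of `parse_cicids_csv` is not shown in this notebook. As a rough sketch of what such a parser might do, the hypothetical function below reads a CSV with pandas, strips whitespace from column names, and replaces infinite values with `NaN`. The assumptions here are that the CSV headers may carry stray leading spaces and that rate columns may contain infinities, both commonly reported for CICIDS2017 exports. The function name and the `Timestamp` column are illustrative, not part of the project's API.

```python
import numpy as np
import pandas as pd


def parse_cicids_csv_sketch(path_or_buffer):
    """Hypothetical stand-in for parse_cicids_csv; not the project's code."""
    df = pd.read_csv(path_or_buffer, low_memory=False)

    # Assumption: header names may have leading/trailing spaces (e.g. " Label").
    df.columns = df.columns.str.strip()

    # Assumption: rate columns may hold +/-inf; map them to NaN so that
    # downstream code can impute or drop them explicitly.
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].replace([np.inf, -np.inf], np.nan)

    # Assumption: a "Timestamp" column exists in some exports; parse it if so.
    if "Timestamp" in df.columns:
        df["Timestamp"] = pd.to_datetime(df["Timestamp"], errors="coerce")

    return df
```

The function accepts either a file path or a file-like object, so it can be tried on a small in-memory CSV before pointing it at a full capture.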

## 2. Extract Time- and User-Based Features
Generate features such as time since last event, user event count, hour, and day of week.

In [None]:
# Assume 'timestamp' and 'user' columns exist in the parsed DataFrame
df_feat = extract_time_user_features(df, time_col='timestamp', user_col='user')
df_feat.head()
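For readers without access to `ml.feature_extractor`, the four features named above can be approximated with plain pandas. This is a minimal sketch under stated assumptions, not the module's actual implementation: the function name is hypothetical, `time_since_last_event` is taken to mean seconds since the same user's previous event (0 for a user's first event), and `user_event_count` is taken to mean the total number of events per user.

```python
import pandas as pd


def extract_time_user_features_sketch(df, time_col="timestamp", user_col="user"):
    """Hypothetical approximation of extract_time_user_features."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])

    # Sort so that each user's events are contiguous and in time order.
    out = out.sort_values([user_col, time_col]).reset_index(drop=True)

    # Seconds since the same user's previous event; first event per user is 0.
    out["time_since_last_event"] = (
        out.groupby(user_col)[time_col].diff().dt.total_seconds().fillna(0.0)
    )

    # Total number of events recorded for each user.
    out["user_event_count"] = out.groupby(user_col)[time_col].transform("count")

    # Calendar features taken from the timestamp (Monday is 0).
    out["hour"] = out[time_col].dt.hour
    out["dayofweek"] = out[time_col].dt.dayofweek
    return out
```

Note that the sketch returns rows sorted by user and time, so the row order may differ from the input DataFrame.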

## 3. Train XGBoost Anomaly Classifier
Train an XGBoost model using the extracted features and a binary anomaly label.

In [None]:
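# For demonstration, create a synthetic label only if none is present.
# Caution: this label is derived from 'user_event_count', which is also a
# feature, so the model can learn it trivially. Use real labels in practice.
if 'is_anomaly' not in df_feat.columns:
    df_feat['is_anomaly'] = (df_feat['user_event_count'] > 50).astype(int)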
feature_cols = ['time_since_last_event', 'user_event_count', 'hour', 'dayofweek']
model, report, X_test, y_test, y_pred, y_prob = train_xgb_classifier(
    df_feat, label_col='is_anomaly', feature_cols=feature_cols)
print('Classification report:')
print(report)

## 4. Predict Anomalies and Output Results
Apply the trained model to flag anomalies and attach an anomaly score to each event.

In [None]:
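# Note: df_feat includes the rows used for training, so these scores are
# optimistic. Score held-out or new data for a realistic estimate.
flag, prob = predict_anomaly(model, df_feat, feature_cols)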
df_feat['anomaly_flag'] = flag
df_feat['anomaly_score'] = prob
df_feat[['user', 'timestamp', 'anomaly_flag', 'anomaly_score']].head()
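How `predict_anomaly` turns model output into a flag is not shown in this notebook. A common pattern, sketched below under the assumption that the score is the predicted probability of the positive class, is to threshold that probability. The function name and the default threshold of 0.5 are hypothetical.

```python
import numpy as np


def predict_anomaly_sketch(model, df, feature_cols, threshold=0.5):
    """Hypothetical approximation of predict_anomaly.

    `model` is any fitted classifier exposing predict_proba, such as
    an XGBClassifier.
    """
    # Column 1 of predict_proba holds the probability of the positive class.
    prob = np.asarray(model.predict_proba(df[feature_cols]))[:, 1]

    # Flag events whose score meets or exceeds the threshold.
    flag = (prob >= threshold).astype(int)
    return flag, prob
```

Raising the threshold reduces false alarms but misses more true anomalies, so it is usually chosen from a precision-recall curve on held-out data rather than left at the default.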

---
**End of notebook.**