# 📊 RiskRadar
## IT Incident SLA Breach Risk: Decision Support Project

This notebook presents the complete **RiskRadar** project.
It explains the problem, dataset, architecture, program structure,
data flow, and design decisions in a clear and practical way.

The focus is on understanding **how and why** the system works,
not just how to run the code.

## 🟦 1) Introduction

RiskRadar is a decision-support application built to analyse IT incident data
and estimate the **risk of SLA breaches**.

In real IT service management systems, incidents are not stored as a single
clean record. Instead, each incident goes through multiple updates such as:
- reassignment to different teams
- status changes
- reopen events
- notes and system updates over time

These updates are stored as **event logs**, which makes the raw data difficult
to use directly for machine learning.

This project focuses on transforming those event logs into a usable format,
training a model to predict SLA breach risk, and presenting the results
in a dashboard that supports real operational decisions.

## 🎯 2) Problem Statement

The dataset contains multiple rows for the same incident.
Each row represents a separate update or event.

Machine learning models require:
- one row per sample
- fixed and consistent feature representation

Training directly on event-level data leads to:
- duplicated incidents
- inconsistent features
- unreliable predictions

The objectives of this project are:
1. Convert event-level logs into one row per incident
2. Predict SLA breach probability for each incident
3. Translate probability into meaningful risk levels
4. Support decisions using visualisation and explanation

## 🗂️ 3) Dataset Description

### 📄 incident_event_log.csv

This dataset is an **event log** where:
- the same incident number appears multiple times
- each row represents an update or action
- fields such as priority, category, or assignment group may change over time

This structure reflects real-world IT systems, but it is not directly usable
for machine learning.

Therefore, the first major step in this project is to summarise
the data at the **incident level**.
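
A quick way to confirm the event-log shape is to count rows per incident. The sketch below uses a few made-up rows in place of the real file. The column names `number`, `incident_state`, and `priority` are assumptions and may differ from the actual CSV header.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for incident_event_log.csv (column names are assumed).
SAMPLE_CSV = """number,incident_state,priority
INC001,New,3
INC001,Active,3
INC001,Resolved,2
INC002,New,4
INC002,Closed,4
"""


def rows_per_incident(csv_text: str, id_column: str = "number") -> Counter:
    """Count how many event rows each incident has."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[id_column] for row in reader)


counts = rows_per_incident(SAMPLE_CSV)
for incident, n_events in counts.items():
    print(incident, n_events)
```

Any incident with a count above 1 would be duplicated if the raw rows were fed to a model as-is.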

## 🏗️ 4) Project Structure Overview

The project is organised into small, focused files instead of one large script.

| Layer | File | Purpose |
|-----|-----|-----|
| Data | incident_event_log.csv | Raw event-level data |
| Data | data_processing.py | Builds incident-level summary table |
| Data | feature_engineering.py | Feature calculation helpers |
| Model | model_training.py | Trains and evaluates ML model |
| Model | run_train.py | Training runner script |
| Model | best_model.joblib | Saved trained model |
| Decision | decision_logic.py | Risk band and action mapping |
| Explain | llm_explainer.py | Human-readable explanation |
| App | app.py | Streamlit dashboard (main program) |

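At run time, only **app.py** is executed directly.
**run_train.py** is run separately, and only when the model needs retraining.
All other files support these two entry points.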

## 🔗 5) How the Programs Work Together

### High-level flow
```text
app.py
 ├── data_processing.py
 │     └── feature_engineering.py
 ├── loads best_model.joblib
 ├── decision_logic.py
 ├── llm_explainer.py
 └── renders dashboard
```

## 🧼 6) Data Processing: data_processing.py

This module converts raw event logs into a usable dataset.

Main responsibilities:
- load the CSV safely
- parse timestamps
- group rows by incident number
- aggregate event history into summary features

Typical features created:
- total number of events
- maximum and average reassignment count
- maximum reopen count
- descriptive fields such as category, priority, and assignment group, reduced to one value per incident

This is the most critical step in the pipeline.
If summarisation is incorrect, all downstream results are affected.
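
The summarisation step can be sketched with a pandas `groupby`. This is a minimal illustration, not the project's actual code: the column names (`number`, `sys_updated_at`, `reassignment_count`, `reopen_count`, `priority`) and the choice to keep the latest value of a changing field are assumptions.

```python
import pandas as pd

# Made-up events; every column name here is an assumption for illustration.
events = pd.DataFrame(
    {
        "number": ["INC001", "INC001", "INC001", "INC002", "INC002"],
        "sys_updated_at": pd.to_datetime(
            [
                "2024-01-01 09:00",
                "2024-01-01 10:30",
                "2024-01-02 08:00",
                "2024-01-03 12:00",
                "2024-01-03 15:00",
            ]
        ),
        "reassignment_count": [0, 1, 2, 0, 0],
        "reopen_count": [0, 0, 1, 0, 0],
        "priority": ["3 - Moderate", "3 - Moderate", "2 - High", "4 - Low", "4 - Low"],
    }
)


def summarise_incidents(events: pd.DataFrame) -> pd.DataFrame:
    """Collapse event rows into one summary row per incident."""
    # Sort by time so that "last" means the most recent update.
    ordered = events.sort_values(["number", "sys_updated_at"])
    summary = ordered.groupby("number").agg(
        n_events=("sys_updated_at", "size"),
        max_reassignments=("reassignment_count", "max"),
        mean_reassignments=("reassignment_count", "mean"),
        max_reopens=("reopen_count", "max"),
        # Keeping the latest value is one possible choice for fields that change.
        priority=("priority", "last"),
    )
    return summary.reset_index()


summary = summarise_incidents(events)
print(summary)
```

The result has exactly one row per incident number, which is the shape a classifier expects.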


## 🧩 7) Feature Engineering: feature_engineering.py

This module supports data processing by keeping feature logic consistent.

It handles:
- count calculations
- aggregation logic
- missing value handling
- consistent feature naming

Separating feature engineering ensures that:
- training and prediction use the same features
- the model behaves reliably inside the application
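
One way to keep training and prediction aligned is to define the feature list once and build every input vector from it. The sketch below assumes hypothetical feature names and a default of `0.0` for missing values; the project's actual names and missing-value rule may differ.

```python
from typing import Mapping, Optional

# Hypothetical feature list; one shared definition keeps column order fixed.
FEATURE_COLUMNS = ["n_events", "max_reassignments", "mean_reassignments", "max_reopens"]

# Assumed default for missing numeric values; the project may use another rule.
DEFAULT_VALUE = 0.0


def to_feature_vector(row: Mapping[str, Optional[float]]) -> list[float]:
    """Return the features in a fixed order, filling missing values."""
    vector = []
    for name in FEATURE_COLUMNS:
        value = row.get(name)
        vector.append(DEFAULT_VALUE if value is None else float(value))
    return vector


training_row = {"n_events": 3, "max_reassignments": 2, "mean_reassignments": 1.0, "max_reopens": 1}
# Keys in a different order, one value missing, one key absent.
scoring_row = {"max_reopens": 0, "n_events": 5, "max_reassignments": None}

print(to_feature_vector(training_row))
print(to_feature_vector(scoring_row))
```

Because both rows pass through the same function, the model always sees the same columns in the same order.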

## 🤖 8) Model Training: model_training.py

This module handles the machine learning training process.

Training steps:
1. select input features and target label (sla_breached)
2. split data into training and test sets
3. train a classification model
4. evaluate performance using standard metrics
5. save the trained model as best_model.joblib

The model outputs probabilities rather than only binary predictions,
which supports risk-based decision making.
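
The training steps above can be sketched with scikit-learn. The document does not name the classifier, so `LogisticRegression` is an assumption here, and the data is synthetic rather than the real incident summary table.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the incident summary table (not real project data).
rng = np.random.default_rng(42)
X = rng.integers(0, 10, size=(400, 3)).astype(float)  # e.g. events, reassignments, reopens
# Synthetic label standing in for sla_breached: higher counts make a breach more likely.
y = (X[:, 1] + X[:, 2] + rng.normal(0, 2, size=400) > 9).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Assumed model choice; any classifier exposing predict_proba would fit this flow.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Probability of the positive class (breach) for each test incident.
breach_probability = model.predict_proba(X_test)[:, 1]
print(classification_report(y_test, model.predict(X_test)))

# Save and reload; a temporary directory is used so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "best_model.joblib"
    joblib.dump(model, model_path)
    reloaded = joblib.load(model_path)
    reloaded_probability = reloaded.predict_proba(X_test)[:, 1]
```

The reloaded model returns the same probabilities as the one in memory, which is what lets the application score incidents without retraining.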

## ▶️ 9) Training Runner: run_train.py

This file provides a clean way to retrain the model.

Instead of manually running training code,
the model can be retrained using a single command.

This improves reproducibility and keeps the workflow simple.
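
A runner of this kind is often a thin command-line wrapper. The sketch below is illustrative only: the `--data` and `--out` flags and the `train_and_save` function are hypothetical names, and the training call is replaced by a placeholder.

```python
import argparse
from typing import Optional, Sequence


def train_and_save(data_path: str, model_path: str) -> None:
    """Placeholder for the real training call in model_training.py (hypothetical name)."""
    print(f"Would train on {data_path} and save to {model_path}")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Retrain the RiskRadar model.")
    parser.add_argument("--data", default="incident_event_log.csv", help="Path to the event log CSV")
    parser.add_argument("--out", default="best_model.joblib", help="Where to save the trained model")
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = parse_args(argv)
    train_and_save(args.data, args.out)


# Called with an explicit argument list here; a real script would call main() under
# an `if __name__ == "__main__":` guard and read the arguments from the command line.
main(["--data", "incident_event_log.csv"])
```

Keeping defaults in the parser means a plain `python run_train.py` would be enough for the common case.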

## 💾 10) Saved Model: best_model.joblib

This file stores the trained model.

Benefits:
- fast application startup
- no retraining required
- consistent predictions

This separation reflects real-world ML deployment practices.

## 🚦 11) Decision Logic: decision_logic.py

Raw probabilities such as 0.65 are not intuitive for operational teams.

This module:
- converts probability into a risk band (High / Medium / Low)
- maps each risk band to a recommended action

This step transforms model output into actionable insight.
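
The mapping can be as small as two thresholds and a lookup table. The cut-offs (0.70 and 0.40) and the action texts below are assumptions chosen for illustration; the document does not state the project's actual values.

```python
# Assumed cut-offs; the project's actual thresholds are not stated in this document.
HIGH_THRESHOLD = 0.70
MEDIUM_THRESHOLD = 0.40

# Illustrative actions only; the real mapping lives in decision_logic.py.
ACTIONS = {
    "High": "Escalate now and assign a senior engineer",
    "Medium": "Monitor closely and review at the next stand-up",
    "Low": "Handle through the normal queue",
}


def risk_band(probability: float) -> str:
    """Convert a breach probability into a High / Medium / Low band."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    if probability >= HIGH_THRESHOLD:
        return "High"
    if probability >= MEDIUM_THRESHOLD:
        return "Medium"
    return "Low"


def recommend(probability: float) -> tuple[str, str]:
    """Return the risk band together with its recommended action."""
    band = risk_band(probability)
    return band, ACTIONS[band]


for p in (0.15, 0.65, 0.90):
    print(p, recommend(p))
```

With these assumed cut-offs, the 0.65 from the example above falls in the Medium band.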

## 📝 12) Explanation Logic: llm_explainer.py

This module generates short explanations for each incident.

Explanations are based on:
- predicted probability
- risk band
- key incident features such as event counts and priority

The goal is to make predictions understandable
without over-interpreting the model.
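
As a rough illustration, an explanation can be assembled from a fixed template. This sketch is a template-based stand-in and does not call a language model; the feature names and the cut-offs used to pick which facts to mention are assumptions.

```python
def explain(incident: dict, probability: float, band: str) -> str:
    """Build a short, cautious explanation from a fixed template."""
    reasons = []
    # Assumed rule-of-thumb cut-offs, used only to decide which facts to mention.
    if (incident.get("n_events") or 0) >= 10:
        reasons.append(f"{incident['n_events']} updates logged")
    if (incident.get("max_reassignments") or 0) >= 3:
        reasons.append(f"reassigned {incident['max_reassignments']} times")
    if (incident.get("max_reopens") or 0) >= 1:
        reasons.append("reopened at least once")
    if incident.get("priority"):
        reasons.append(f"priority {incident['priority']}")

    detail = "; ".join(reasons) if reasons else "no standout factors in the summary features"
    return (
        f"{band} risk (estimated breach probability {probability:.0%}). "
        f"Notable factors: {detail}. These are associations, not confirmed causes."
    )


incident = {"n_events": 14, "max_reassignments": 4, "max_reopens": 0, "priority": "2 - High"}
print(explain(incident, 0.72, "High"))
```

The closing sentence of the template is one way to avoid over-interpreting the model: it presents the listed factors as context rather than causes.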

## 🖥️ 13) Application Layer: app.py

This is the main entry point of the project.

The application:
- orchestrates all other modules
- handles user interaction
- displays metrics, visuals, and tables
- allows exporting the scored dataset

All other files exist to support this application.
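
The orchestration and export steps can be sketched without any UI code. Everything below is illustrative: `fake_predict` stands in for the saved model, the band cut-offs are assumed, and the column names are made up.

```python
import csv
import io


def fake_predict(incident: dict) -> float:
    """Stand-in for the saved model; returns a made-up breach probability."""
    return min(1.0, 0.1 + 0.15 * incident["max_reassignments"])


def band_for(probability: float) -> str:
    # Assumed cut-offs, used only for this illustration.
    if probability >= 0.7:
        return "High"
    if probability >= 0.4:
        return "Medium"
    return "Low"


def score_incidents(incidents: list[dict]) -> list[dict]:
    """Attach a probability and risk band to each incident summary row."""
    scored = []
    for incident in incidents:
        probability = fake_predict(incident)
        scored.append(
            {
                **incident,
                "breach_probability": round(probability, 3),
                "risk_band": band_for(probability),
            }
        )
    return scored


def to_csv(rows: list[dict]) -> str:
    """Serialise scored rows to CSV text, for example for a download button."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


incidents = [
    {"number": "INC001", "max_reassignments": 5},
    {"number": "INC002", "max_reassignments": 0},
]
scored = score_incidents(incidents)
print(to_csv(scored))
```

In a Streamlit app, the returned text could be handed to `st.download_button` to offer the scored dataset as a file.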

## 📊 14) Dashboard and Visualisation

The dashboard is designed to answer practical questions.

Key visuals include:
- risk probability distribution
- risk band counts
- top risky assignment groups
- top risky categories
- priority versus risk heatmap
- calibration plot

Each visual helps users understand patterns,
model behaviour, and operational risk.
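
Of these visuals, the calibration plot is the least self-explanatory, so here is a sketch of the numbers behind one. It groups predictions into equal-width bins and compares the mean predicted probability with the observed breach rate in each bin. The data is made up, and the bin count of 5 is an arbitrary choice.

```python
def calibration_bins(probabilities, outcomes, n_bins=5):
    """Group predictions into equal-width bins and compare predicted vs observed rates."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    table = []
    for index, members in enumerate(bins):
        if not members:
            continue  # skip empty bins rather than report a misleading 0
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        table.append(
            {
                "bin": f"{index / n_bins:.1f}-{(index + 1) / n_bins:.1f}",
                "count": len(members),
                "mean_predicted": round(mean_predicted, 3),
                "observed_rate": round(observed_rate, 3),
            }
        )
    return table


# Made-up scores and outcomes (1 = SLA breached).
probabilities = [0.05, 0.10, 0.30, 0.35, 0.45, 0.55, 0.85, 0.90, 0.95, 1.00]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]

table = calibration_bins(probabilities, outcomes)
for row in table:
    print(row)
```

A well-calibrated model has `mean_predicted` close to `observed_rate` in every bin; plotting one against the other gives the calibration curve.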

## 📈 15) Model Evaluation

Evaluation metrics are included to avoid blind trust in predictions.

Metrics shown:
- precision
- recall
- F1 score
- confusion matrix
- threshold tuning

Tuning the threshold trades recall (catching real SLA breaches)
against precision (avoiding unnecessary alerts).
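
The effect of the threshold can be shown on a handful of made-up predictions. The sketch below computes the confusion counts, precision, recall, and F1 by hand at a few thresholds; the labels and probabilities are invented for illustration.

```python
def confusion_counts(y_true, probabilities, threshold):
    """Return (tp, fp, fn, tn) when flagging incidents at or above the threshold."""
    tp = fp = fn = tn = 0
    for actual, p in zip(y_true, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, probabilities, threshold):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, probabilities, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels (1 = SLA breached) and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
probabilities = [0.10, 0.20, 0.45, 0.50, 0.55, 0.65, 0.80, 0.90]

for threshold in (0.3, 0.5, 0.7):
    precision, recall, f1 = precision_recall_f1(y_true, probabilities, threshold)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this sample, raising the threshold from 0.5 to 0.7 removes the one false alert but misses two of the four real breaches.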

## 🧠 16) Design Reasoning (Why This Architecture)

This project uses a modular design instead of a single large script.

Reasons:
- easier to maintain and debug
- easier to explain in interviews
- training and application logic are separated
- each file has a clear responsibility

This structure reflects how real ML systems are organised,
even at a small scale.

## 🔗 17) Dependency Explanation

- app.py depends on data_processing.py, decision_logic.py, llm_explainer.py, and the saved model best_model.joblib
- data_processing.py depends on feature_engineering.py
- model_training.py depends on incident summary data
- decision_logic.py depends on model probability output
- llm_explainer.py depends on computed features and risk band

Removing any of these modules would break a specific part of the pipeline,
which shows that each file has a clear purpose.

## 🔄 18) End-to-End Lifecycle Summary

1. Raw event-level data
2. Incident-level summarisation
3. Model training
4. Model saving
5. Risk scoring
6. Decision mapping
7. Explanation generation
8. Visualisation and exploration

This lifecycle is what makes the project complete,
not just technically correct.

## 🎓 19) Project Level Assessment

Overall level: **Intermediate to Advanced**

The project demonstrates:
- real data engineering
- probability-based ML
- decision-support thinking
- explainable and visual output

It is suitable for academic evaluation,
interviews, and portfolio presentation.

## ✅ 20) Final Conclusion

RiskRadar demonstrates a full machine learning workflow:
- raw data handling
- feature engineering
- model training and reuse
- decision logic
- explanation layer
- interactive dashboard

The focus is on building something usable and understandable,
not just training a model and stopping there.