<a href="https://colab.research.google.com/github/Rami-Troudi/Sentinel/blob/main/sentinel/ai/Notebook/Sentinel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# I. Business Understanding


##  1. PROBLEM STATEMENT

### Business Context

Insurance companies, particularly **Lloyd Assurance**, face significant challenges in road safety and risk management:

- **Reactive Risk Assessment**: Insurance companies evaluate driver risk primarily after accidents occur, leading to inaccurate premium pricing and increased claims costs.

- **Limited Behavioral Data**: Traditional insurance models rely on static factors (age, vehicle type, location) rather than actual driving behavior, resulting in poor risk segmentation.

- **High Accident Rates**: Tunisia experiences high road accident rates with limited preventive measures. Insurance companies lack tools to incentivize and reward safe driving behavior.

- **Lack of Engagement**: Customers have no visibility into their driving performance and no motivation to improve, leading to continued risky behavior and higher claims.

- **Geographic Risk Gaps**: Insurance companies cannot identify high-risk road zones dynamically, missing opportunities for targeted interventions and pricing adjustments.

### Problem Definition

**How can we leverage IoT sensor data and artificial intelligence to proactively assess driver behavior, predict risk levels, identify dangerous zones, and incentivize safer driving through personalized feedback and rewards?**

### Impact

- **For Drivers**: Improved safety awareness, reduced accident risk, financial rewards for good behavior
- **For Lloyd Assurance**: Reduced claims, better risk pricing, enhanced customer loyalty, competitive differentiation
- **For Society**: Fewer accidents, reduced fatalities, lower healthcare costs

### Success Criteria

- Classify drivers into risk categories (Safe/Average/Risky) with >85% classification accuracy
- Generate actionable Safety & Eco scores that correlate with real driving risk
- Identify high-risk geographic zones with precision
- Provide personalized recommendations that drivers can act upon

---

##  2. BUSINESS OBJECTIVES (BO)

### BO1: Risk-Based Driver Segmentation

**Objective:** Enable Lloyd Assurance to segment their customer base by actual driving risk rather than demographic proxies.

**Business Value:**
- More accurate premium pricing
- Reduced adverse selection
- 15-20% potential reduction in claims costs

**Success Metrics:**
- Classify 100% of drivers into 3 risk categories: Safe (score ≥ 80), Average (50 ≤ score < 80), Risky (score < 50)
- Driver classification accuracy validated against historical claims data
- Risk categories show statistically significant difference in accident rates

### BO2: Proactive Accident Prevention

**Objective:** Reduce accident frequency by providing real-time feedback and incentives for safer driving behavior.

**Business Value:**
- Lower claims frequency
- Enhanced brand reputation as a safety-focused insurer
- Contribution to CSR (Corporate Social Responsibility) goals

**Success Metrics:**
- Generate personalized safety recommendations for 100% of drivers
- Identify and alert drivers about their top 3 risky behaviors
- Track behavior improvement over time (not in MVP, but planned)

### BO3: Geographic Risk Intelligence

**Objective:** Create dynamic risk maps identifying dangerous road zones to inform pricing, coverage decisions, and public safety initiatives.

**Business Value:**
- Geographic pricing optimization
- Partnership opportunities with municipalities
- Risk-based coverage adjustments

**Success Metrics:**
- Generate heatmaps identifying top 10% most dangerous zones
- Correlate zones with accident frequency
- Provide zone-specific risk scores

### BO4: Customer Engagement & Retention

**Objective:** Increase customer loyalty through gamification (eco-points) and transparent performance tracking.

**Business Value:**
- Reduced customer churn
- Premium product differentiation
- Positive word-of-mouth marketing

**Success Metrics:**
- Safety Score and Eco Score displayed in real time in the mobile app
- Eco-points awarded based on performance
- Customer satisfaction increase (measured post-launch)

---

##  3. DATA SCIENCE OBJECTIVES (DSO)

### DSO1: Driving Behavior Scoring System

**Technical Objective:** Develop ML/rule-based models to calculate Safety Score (0-100) and Eco Score (0-100) from sensor data.

**Input Data:**
- Accelerometer readings (ax, ay, az) from IEEE & Mendeley datasets
- Gyroscope readings (gx, gy, gz) from Mendeley dataset
- Speed, GPS coordinates, timestamps from IEEE dataset
- Labeled harsh events (braking, acceleration, turns)

**Approach:**

1. **Feature Engineering:**
   - Extract harsh braking events (ax < -0.4g threshold)
   - Extract harsh acceleration events (ax > 0.3g threshold)
   - Calculate sharp turns (gyroscope magnitude > threshold)
   - Compute speed statistics (mean, variance, max)
   - Calculate jerk (rate of acceleration change)
   - Aggregate features per trip

2. **Scoring Algorithm:**
   - **Safety Score:** Rule-based weighted penalty system
     ```
     Safety = max(0, 100 - (harsh_brake*5 + harsh_accel*3 + sharp_turns*2 + speeding*4))
     ```
   - **Eco Score:** Smoothness-based calculation
     ```
     Eco = max(0, 100 - (harsh_accel*8 + harsh_brake*6 + speed_variance*2 + idle_time*1))
     ```

3. **Validation:**
   - Compare scores against labeled driving styles (aggressive/normal)
   - Ensure score distribution makes intuitive sense (bell curve expected)
   - Test edge cases (perfect driver = 100, very aggressive = <20)
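
The feature-engineering step above could be sketched as follows. This is a minimal illustration rather than the project's pipeline: the helper names `count_events` and `extract_trip_features`, the choice to count a sustained manoeuvre as one event, and the input units (longitudinal acceleration in g, speed in km/h, a fixed sample rate) are all assumptions. Sharp-turn detection is left out because the gyroscope threshold is not specified.

```python
from statistics import mean, pvariance

HARSH_BRAKE_G = -0.4  # braking threshold from the feature list (ax < -0.4g)
HARSH_ACCEL_G = 0.3   # acceleration threshold from the feature list (ax > 0.3g)


def count_events(values, is_harsh):
    """Count entries into the harsh region, so one sustained manoeuvre counts once."""
    events, inside = 0, False
    for value in values:
        harsh = is_harsh(value)
        if harsh and not inside:
            events += 1
        inside = harsh
    return events


def extract_trip_features(ax_g, speed_kmh, sample_rate_hz):
    """Aggregate one trip's samples into per-trip features.

    ax_g: longitudinal acceleration samples in g (assumed axis convention).
    speed_kmh: speed samples in km/h (assumed unit).
    sample_rate_hz: fixed sampling rate of the accelerometer (assumed).
    """
    dt = 1.0 / sample_rate_hz
    # Jerk: change in acceleration between consecutive samples, in g/s.
    jerk = [(b - a) / dt for a, b in zip(ax_g, ax_g[1:])]
    return {
        "harsh_brake": count_events(ax_g, lambda v: v < HARSH_BRAKE_G),
        "harsh_accel": count_events(ax_g, lambda v: v > HARSH_ACCEL_G),
        "speed_mean": mean(speed_kmh),
        "speed_variance": pvariance(speed_kmh),
        "speed_max": max(speed_kmh),
        "max_abs_jerk": max((abs(j) for j in jerk), default=0.0),
    }
```

Counting entries into the harsh region instead of individual samples keeps the event count independent of the sampling rate, which matters when the two datasets were recorded at different frequencies.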

**Deliverables:**
- `calculate_safety_score()` function
- `calculate_eco_score()` function
- Feature extraction pipeline
- Trained model saved as `sentinel_scoring_baseline.pkl`
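
As a rough sketch, the two scoring deliverables might look like the code below. It applies the penalty weights from the scoring formulas and clamps the result to the 0-100 range that the success criteria require. The argument units (event counts per trip, and whatever scaling is applied to `speed_variance` and `idle_time`) are not defined in this section, so the inputs are assumed to be pre-scaled; `_clamp` is a hypothetical helper.

```python
def _clamp(value, low=0.0, high=100.0):
    """Keep a score inside the [low, high] range."""
    return max(low, min(high, value))


def calculate_safety_score(harsh_brake, harsh_accel, sharp_turns, speeding):
    """Rule-based weighted penalty: start at 100 and subtract per-event penalties."""
    penalty = harsh_brake * 5 + harsh_accel * 3 + sharp_turns * 2 + speeding * 4
    return _clamp(100 - penalty)


def calculate_eco_score(harsh_accel, harsh_brake, speed_variance, idle_time):
    """Smoothness-based penalty: harsh inputs and uneven speed lower the score."""
    penalty = harsh_accel * 8 + harsh_brake * 6 + speed_variance * 2 + idle_time * 1
    return _clamp(100 - penalty)
```

For example, a trip with 2 harsh brakes, 1 harsh acceleration, 3 sharp turns and 1 speeding event accumulates a penalty of 10 + 3 + 6 + 4 = 23 and a Safety Score of 77.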

**Success Criteria:**
- Scores range properly between 0-100
- Aggressive drivers score <50, normal drivers score from 50 up to (but not including) 80, safe drivers score ≥80
- Model inference time <500ms per trip

### DSO2: Driver Risk Classification

**Technical Objective:** Classify drivers into 3 risk categories using unsupervised or supervised learning.

**Approach:**

1. **Feature Set:**
   - Aggregated scores (Safety Score, Eco Score)
   - Event frequencies (harsh braking count, acceleration count)
   - Trip statistics (avg speed, max speed, duration)

2. **Classification Method:**
   - **Option A (Simple):** Threshold-based on Safety Score
     - Safe: Score ≥ 80
     - Average: 50 ≤ Score < 80
     - Risky: Score < 50
   
   - **Option B (ML):** K-Means clustering (k=3) on normalized features

3. **Output:**
   - Driver category label
   - Confidence score
   - Category statistics (% in each category)
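
Option A and the category statistics could be sketched as below, using the thresholds listed above. The helper name `category_shares` is hypothetical, and the confidence score is omitted because this section does not define how it is computed.

```python
from collections import Counter

CATEGORIES = ("Safe", "Average", "Risky")


def classify_driver(safety_score):
    """Threshold-based classification (Option A)."""
    if safety_score >= 80:
        return "Safe"
    if safety_score >= 50:
        return "Average"
    return "Risky"


def category_shares(safety_scores):
    """Return the percentage of drivers that fall in each category."""
    counts = Counter(classify_driver(score) for score in safety_scores)
    total = len(safety_scores)
    return {name: 100.0 * counts[name] / total for name in CATEGORIES}
```

Because every score falls on exactly one side of each threshold, this approach assigns each driver to exactly one category by construction.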

**Deliverables:**
- `classify_driver()` function
- Classification model (if ML approach)
- Driver segmentation report

**Success Criteria:**
- Clear separation between clusters (silhouette score >0.5 if using K-Means)
- Categories align with business intuition
- 100% of drivers assigned to exactly one category
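
For Option B, the sketch below shows the mechanics of K-Means (k=3) on min-max normalized features in plain Python. It is an illustration under assumptions, not the project's model: in practice a library implementation such as scikit-learn's `KMeans` would normally be used, the deterministic centroid initialization is a simplification chosen for reproducibility, and naming clusters by their Safety Score centroid (taken to be the first feature) is a hypothetical convention. The silhouette check is not covered here.

```python
import math


def min_max_normalize(rows):
    """Scale each feature column to [0, 1] so no feature dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        tuple((value - low) / span for value, low, span in zip(row, lows, spans))
        for row in rows
    ]


def kmeans(points, k=3, max_iterations=100):
    """Lloyd's algorithm with a deterministic start (assumes k >= 2)."""
    ordered = sorted(points)
    # Initial centroids: k evenly spaced points from the sorted input.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


def name_clusters(labels, centroids, names=("Risky", "Average", "Safe")):
    """Name clusters by ascending centroid value of the first feature."""
    order = sorted(range(len(centroids)), key=lambda c: centroids[c][0])
    cluster_to_name = {cluster: names[rank] for rank, cluster in enumerate(order)}
    return [cluster_to_name[label] for label in labels]
```

A typical call would normalize rows of `(safety_score, eco_score, harsh_brake_count)`, run `kmeans`, and pass the result to `name_clusters` to obtain one category label per driver.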

### DSO3: Geographic Risk Zone Detection

**Technical Objective:** Identify dangerous road zones by clustering GPS coordinates with high incident frequencies.

**Approach:**

1. **Data Preparation:**
   - Extract GPS coordinates from IEEE dataset
   - Link coordinates to harsh events (braking, acceleration, turns)

2. **Zone Risk Analysis:**
   - Grid-based aggregation (divide area into grid cells)
   - Count harsh events per grid cell
   - Calculate risk score per zone = (harsh_events / total_trips) * 100

3. **Visualization:**
   - Generate heatmap overlay on map
   - Identify top 10% most dangerous zones
   - Export zone risk data for dashboard
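
The grid-based aggregation above could be sketched as follows. The input layout (`trip_id`, latitude, longitude, harsh-event flag), the cell size of 0.01 degrees, and the return shape are assumptions made for illustration; only the risk formula and the top 10% cut-off come from the approach above. Note that, as written, the formula can exceed 100 when a cell records more harsh events than trips.

```python
import math
from collections import defaultdict


def detect_risk_zones(trip_points, cell_size_deg=0.01, top_fraction=0.10):
    """Aggregate harsh events on a lat/lon grid and rank the cells by risk.

    trip_points: iterable of (trip_id, lat, lon, is_harsh_event) tuples (assumed layout).
    Returns (all zones sorted by descending risk, the top `top_fraction` of zones).
    """
    events = defaultdict(int)  # harsh events per grid cell
    trips = defaultdict(set)   # distinct trips that passed through each cell
    for trip_id, lat, lon, is_harsh in trip_points:
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        trips[cell].add(trip_id)
        if is_harsh:
            events[cell] += 1

    zones = [
        {
            "cell": cell,
            "harsh_events": events[cell],
            "total_trips": len(trip_ids),
            # Risk score per zone = (harsh_events / total_trips) * 100
            "risk_score": events[cell] / len(trip_ids) * 100,
        }
        for cell, trip_ids in trips.items()
    ]
    zones.sort(key=lambda zone: zone["risk_score"], reverse=True)
    top_n = max(1, math.ceil(len(zones) * top_fraction))
    return zones, zones[:top_n]
```

The list of zone dictionaries can be written to CSV or converted to GeoJSON for the dashboard, and the cell indices multiplied by `cell_size_deg` recover each cell's south-west corner for plotting.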

**Deliverables:**
- `detect_risk_zones()` function
- GeoJSON/CSV file with zone coordinates and risk scores
- Heatmap visualization (using Folium or Plotly)

**Success Criteria:**
- Minimum 50 unique zones identified
- Risk scores show clear variation (not uniform)
- High-risk zones visually identifiable on map

### DSO4: Personalized Recommendation Engine

**Technical Objective:** Generate actionable safety tips based on individual driver behavior patterns.


**Prioritization:**
- Rank issues by severity (safety > eco)
- Limit to top 3-5 recommendations per driver
- Track whether the driver improves on specific metrics
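
One way the prioritization rules could be sketched is shown below. Everything in the template library is hypothetical: the messages, the assumption that `idle_time` is measured in minutes, and the reuse of the DSO1 penalty weights as severity weights. The fallback tip is likewise an assumption, added so that every driver receives at least one message.

```python
# Hypothetical template library: (feature key, kind, severity weight, message).
TEMPLATES = [
    ("harsh_brake", "safety", 5,
     "You braked harshly {count} times. Leave more following distance and brake earlier."),
    ("speeding", "safety", 4,
     "You exceeded the speed limit {count} times. Ease off in signed zones."),
    ("harsh_accel", "safety", 3,
     "You accelerated harshly {count} times. Press the pedal more gradually."),
    ("sharp_turns", "safety", 2,
     "You took {count} sharp turns. Slow down before entering bends."),
    ("idle_time", "eco", 1,
     "You idled for {count} minutes. Switch the engine off during long stops."),
]

# Hypothetical message for drivers with no detected issues.
FALLBACK_TIP = "No risky behavior detected on this trip. Keep it up."


def generate_recommendations(features, max_tips=3):
    """Return up to `max_tips` tips, safety issues first, most severe first."""
    issues = []
    for key, kind, weight, message in TEMPLATES:
        count = features.get(key, 0)
        if count > 0:
            # Sort key: safety before eco, then larger weighted severity first.
            issues.append((kind != "safety", -weight * count, message.format(count=count)))
    issues.sort()
    tips = [message for _, _, message in issues[:max_tips]]
    return tips or [FALLBACK_TIP]
```

Embedding the driver's own counts in each message keeps the tips specific, and ending each one with an instruction keeps them actionable.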

**Deliverables:**
- `generate_recommendations()` function
- Recommendation templates library
- Integration spec for mobile app

**Success Criteria:**
- Every driver receives 1-5 personalized tips
- Recommendations are specific (include numbers/metrics)
- Tips are actionable (tell driver what to do)


# II. Data Understanding