## Step 1: Business & Strategy Layer


### Problem Framing

> _"What are we solving? Why does it matter?"_

#### Business Problem
Insurance companies suffer major financial losses due to **fraudulent claims**. Manual detection is slow, expensive, and inconsistent. We want to **build an ML system** that can detect suspicious claims **automatically and early**.

#### ML Framing

| Aspect                 | Description                                                |
|------------------------|------------------------------------------------------------|
| **Type of ML Problem** | **Classification** (Binary)                                |
| **Input**              | Structured tabular data from customer claims               |
| **Target**             | `FraudFound_P` → `1` (fraud), `0` (non-fraud)              |
| **Supervised?**        | Yes — historical data is labeled                           |
| **Real-World Impact**  | Save money, reduce risk, increase trust                    |
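
The framing above can be sketched in code as a split of each claim record into features and a binary label. This is a minimal illustration, not the project's actual loader: only the `FraudFound_P` column name comes from the table, while `claim_id`, `claim_amount`, and the sample rows are hypothetical.

```python
TARGET = "FraudFound_P"  # target column named in the framing table


def split_features_target(rows, target=TARGET):
    """Separate each claim record into a feature dict and a 0/1 label."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [int(row[target]) for row in rows]
    if any(y not in (0, 1) for y in labels):
        raise ValueError("target must be binary: 1 (fraud) or 0 (non-fraud)")
    return features, labels


# Hypothetical sample rows; the real dataset has many more columns.
claims = [
    {"claim_id": "C1", "claim_amount": 12000, "FraudFound_P": 0},
    {"claim_id": "C2", "claim_amount": 85000, "FraudFound_P": 1},
    {"claim_id": "C3", "claim_amount": 4000, "FraudFound_P": 0},
    {"claim_id": "C4", "claim_amount": 23000, "FraudFound_P": 0},
]

X, y = split_features_target(claims)
fraud_share = sum(y) / len(y)  # share of fraudulent claims in the sample
```

Checking `fraud_share` early is useful because fraud is usually the minority class, which affects how the metrics in the next section should be read.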

#### Framing Checklist
- [x] Clear business goal
- [x] Framed as ML classification problem
- [x] Labeled historical data available
- [x] Reasonable to automate


### Success Metrics Definition

> _"How do we define success?"_

#### Business Metrics
- **Amount of money saved** by catching fraud early
- **Reduction in investigation time**
- **Improved claim processing efficiency**

#### ML Metrics

| Metric       | Why it matters                                                    |
|--------------|-------------------------------------------------------------------|
| **Recall**   | Catch as many fraudulent cases as possible (minimize false negatives) |
| **Precision**| Avoid wrongly flagging legit claims (minimize false positives)   |
| **F1 Score** | Balance between precision and recall                              |
| **AUC-ROC**  | Overall model discrimination power                                |
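
To make the four metrics concrete, here is a small dependency-free sketch that computes them from labels and model scores. The sample labels, scores, and the 0.5 threshold are made up for illustration. In practice a library such as scikit-learn would typically be used instead of these hand-written helpers.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the fraud class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the chance a random fraud outscores a random non-fraud (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one fraud and one non-fraud example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data: 3 frauds among 10 claims, with made-up model scores.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.3, 0.4, 0.05, 0.8, 0.15]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # flag claims scoring 0.5 or higher

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Note that precision, recall, and F1 depend on the chosen threshold, while AUC-ROC is computed from the raw scores and does not.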



### Stakeholder Mapping & Feedback Loops

> _"Who cares about this model? Who uses it? How do we get feedback?"_

#### Stakeholders

| Role                    | Involvement                               |
|-------------------------|-------------------------------------------|
| **Data Science Team**   | Model development and evaluation          |
| **Fraud Investigators** | Use model predictions to prioritize cases |
| **Claims Managers**     | Business decision-makers                  |
| **IT/DevOps**           | Deployment and integration support        |

#### Feedback Loops

| Type               | Example                                                             |
|--------------------|---------------------------------------------------------------------|
| **Human-in-loop**  | Fraud team flags false positives and false negatives; the corrected labels feed into retraining |
| **Batch retraining**| Model retrained monthly on new verified claims                    |


### Cost vs Impact Modeling

> _"Is the effort worth it? What are the tradeoffs?"_

#### Financial Impact

| Scenario / Assumption       | Estimate                               |
|-----------------------------|----------------------------------------|
| False Negative (miss fraud) | ₹50,000+ per claim lost                |
| False Positive (flag legit) | ₹1,000 investigation cost              |
| Average fraud rate          | ~10%                                   |
| Model catch rate goal       | 80–90% of frauds                       |
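
As a rough worked example, the figures in the table can be combined into an expected cost per 1,000 claims. The sketch below takes the ₹50,000 and ₹1,000 costs, the ~10% fraud rate, and the 80–90% catch rate from the table. The 5% false positive rate is a hypothetical value chosen only for illustration, and the calculation ignores investigation costs for correctly flagged claims.

```python
def expected_cost(n_claims, fraud_rate, recall, false_positive_rate,
                  fn_cost=50_000, fp_cost=1_000):
    """Expected loss in rupees from missed frauds plus false alarms."""
    n_fraud = n_claims * fraud_rate
    n_legit = n_claims - n_fraud
    missed_frauds = n_fraud * (1 - recall)          # false negatives
    false_alarms = n_legit * false_positive_rate    # false positives
    return missed_frauds * fn_cost + false_alarms * fp_cost


# No model: every fraudulent claim is paid out and nothing is flagged.
baseline = expected_cost(1000, 0.10, recall=0.0, false_positive_rate=0.0)

# Model at both ends of the catch-rate goal, with an assumed 5% false positive rate.
with_model_80 = expected_cost(1000, 0.10, recall=0.80, false_positive_rate=0.05)
with_model_90 = expected_cost(1000, 0.10, recall=0.90, false_positive_rate=0.05)

print(round(baseline))        # 5000000
print(round(with_model_80))   # 1045000
print(round(with_model_90))   # 545000
```

Under these assumptions the missed-fraud term dominates: each extra percentage point of recall is worth far more than a percentage point of false positive rate.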

#### Trade-Offs

- **High recall** → more frauds caught, but more false alarms
- **High precision** → fewer false alarms, but some frauds may slip through

> We balance this with **cost-weighted metrics** and stakeholder feedback.
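
One way to read "cost-weighted metrics" is to pick the decision threshold that minimizes total misclassification cost rather than maximizing F1. The sketch below assumes the ₹50,000 and ₹1,000 costs from the Financial Impact table. The labels, scores, and candidate thresholds are made up, and `total_cost` and `best_threshold` are hypothetical helper names, not part of any library.

```python
FN_COST = 50_000  # cost of a missed fraud, from the Financial Impact table
FP_COST = 1_000   # cost of investigating a legitimate claim


def total_cost(y_true, scores, threshold):
    """Cost of flagging every claim whose score is at or above the threshold."""
    pairs = list(zip(y_true, scores))
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    return fn * FN_COST + fp * FP_COST


def best_threshold(y_true, scores, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda th: total_cost(y_true, scores, th))


# Illustrative data: 3 frauds among 10 claims, with made-up model scores.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.3, 0.4, 0.05, 0.8, 0.15]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

chosen = best_threshold(y_true, scores, candidates)
cost_at_chosen = total_cost(y_true, scores, chosen)
cost_at_default = total_cost(y_true, scores, 0.5)
```

Because a missed fraud costs 50 times more than a false alarm here, the cost-minimizing threshold tends to sit lower than the usual 0.5 default, trading extra investigations for fewer missed frauds.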

#### Tools
- ROI simulation with Python (`cost_matrix`, sensitivity analysis)
- SHAP to explain which features drive a claim's fraud score, so flagged claims can be reviewed with their investigation cost in mind
- Business simulation notebooks (optional in later phases)
