## Framing the Problem

![CRISP](./images/6_page.jpg)

#### What is CRISP-DM?
CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It is a cyclical process that provides a structured approach to planning, organizing, and implementing a data mining project. The process consists of six major phases:

- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment


I. Business Understanding:
Any good project starts with a deep understanding of the customer’s needs. Data mining projects are no exception, and CRISP-DM recognizes this. Two key tasks in this phase are:


- Determine business objectives: First, “thoroughly understand, from a business perspective, what the customer/company really wants to accomplish” (CRISP-DM Guide), then define the business success criteria.
- Assess situation: Determine resource availability and project requirements, assess risks and contingencies, and conduct a cost-benefit analysis.

II. Data Understanding:
Next is the Data Understanding phase. Building on the foundation of Business Understanding, it focuses on identifying, collecting, and analyzing the data sets that can help you accomplish the project goals. **You want to answer the question: how can data solve this problem?** This phase has four tasks:



- Collect initial data: Acquire the necessary data and (if necessary) load it into your analysis tool.
- Describe data: Examine the data and document its surface properties like data format, number of records, or field identities.
- Explore data: Dig deeper into the data. Query it, visualize it, and identify relationships among the data.
- Verify data quality: How clean/dirty is the data? Document any quality issues.
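
The four tasks above can be sketched in a few lines of plain Python. This is only an illustration: the inline CSV sample (`RAW`) and its values are made up for the example, and in practice you would usually load a real extract with a data analysis library rather than the standard `csv` module.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real extract (assumption).
RAW = """customer_id,tenure,monthly_charges,contract_type
1,12,70.5,month-to-month
2,,55.0,one-year
3,48,,two-year
4,5,99.9,month-to-month
"""

# Collect initial data: load it into the analysis tool (here, plain Python).
rows = list(csv.DictReader(io.StringIO(RAW)))

# Describe data: surface properties such as record count and field names.
n_records = len(rows)
fields = list(rows[0].keys())

# Explore data: a simple frequency count of one categorical field.
contract_counts = Counter(row["contract_type"] for row in rows)

# Verify data quality: count missing (empty) values per field.
missing = {f: sum(1 for row in rows if row[f] == "") for f in fields}

print("records:", n_records)
print("fields:", fields)
print("contract types:", dict(contract_counts))
print("missing values:", missing)
```

The `missing` dictionary is the kind of finding you would record in a data quality report before moving on to Data Preparation.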

![DS_PROCESS](./images/7_page.jpg)

## What Does It Mean to Understand a Business Problem?
Understanding a business problem means identifying what the stakeholders care about, what decisions need to be made, and how success is measured.

For example:
- A bank wants to reduce customer churn
- An e-commerce site wants to increase recommendation click-throughs

Your role as a data scientist is to translate this need into a predictive or analytical solution.

## Framing the Problem: From Business to Data
Many business problems can be framed as one of a few machine learning problem types:

| Business Question | ML Problem Type |
|------------------|------------------|
| Will a user churn? | Classification |
| How much will a customer spend? | Regression |
| Which products are similar? | Clustering |

Framing includes:
- Identifying the target variable
- Understanding features
- Defining what a row in the dataset represents
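
One way to make the framing explicit is to write it down as a small data structure before touching any model. The sketch below is a hypothetical helper, not part of any library: the `ProblemFraming` class and its field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemFraming:
    """A lightweight record of how a business question maps to an ML task."""
    business_question: str
    ml_type: str          # e.g. "classification", "regression", "clustering"
    row_unit: str         # what a single row in the dataset represents
    target: str = ""      # left empty for unsupervised problems like clustering
    features: List[str] = field(default_factory=list)

    def summary(self) -> str:
        target = self.target or "none (unsupervised)"
        return (
            f"{self.business_question} -> {self.ml_type}; "
            f"one row per {self.row_unit}; target: {target}; "
            f"features: {', '.join(self.features)}"
        )


churn_framing = ProblemFraming(
    business_question="Will a user churn?",
    ml_type="classification",
    row_unit="customer",
    target="churn",
    features=["tenure", "monthly_charges", "contract_type"],
)
print(churn_framing.summary())
```

Filling in every field forces the three framing questions to be answered up front; a blank `row_unit` or `target` is usually a sign that the problem is not yet well defined.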

## Example: Churn Prediction
Let's consider a telecom company that wants to predict whether a customer will cancel their subscription.

**Business Goal:** Reduce churn by identifying high-risk customers.

**ML Translation:**
- Supervised classification
- Target variable: `churn` (yes/no)
- Features: `tenure`, `monthly_charges`, `contract_type`, etc.
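
As a minimal sketch of what this translation looks like in data terms, the snippet below turns a few customer records into a feature matrix `X` and a label vector `y`. The records are invented for illustration, and the `encode` helper and its single binary flag for `contract_type` are simplifying assumptions; a real project would typically use a fuller encoding.

```python
# Hypothetical customer records; one row represents one customer (assumption).
records = [
    {"tenure": 2, "monthly_charges": 89.0, "contract_type": "month-to-month", "churn": "yes"},
    {"tenure": 40, "monthly_charges": 55.5, "contract_type": "two-year", "churn": "no"},
    {"tenure": 5, "monthly_charges": 99.9, "contract_type": "month-to-month", "churn": "yes"},
    {"tenure": 24, "monthly_charges": 70.0, "contract_type": "one-year", "churn": "no"},
]


def encode(record):
    """Turn one customer record into (features, label)."""
    features = [
        float(record["tenure"]),
        float(record["monthly_charges"]),
        # Simplified encoding: 1.0 for month-to-month contracts, else 0.0.
        1.0 if record["contract_type"] == "month-to-month" else 0.0,
    ]
    label = 1 if record["churn"] == "yes" else 0  # target variable
    return features, label


pairs = [encode(r) for r in records]
X = [features for features, _ in pairs]
y = [label for _, label in pairs]

print(X)
print(y)
```

Any supervised classifier could then be trained on `X` and `y`; the point here is only that the target and the features are separated explicitly.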

## Defining Success Metrics
Different problems require different evaluation metrics:
- **Classification:** Accuracy, Precision, Recall, F1, AUC
- **Regression:** MAE, RMSE, R²

> For churn prediction, we usually care most about **recall**, because we don't want to miss customers who are likely to churn.
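
To see why the choice of metric matters, here is a small sketch that computes most of the metrics listed above from scratch (AUC is omitted because it needs predicted scores rather than hard labels). The labels and predictions are made-up values for illustration, and the function names are hypothetical rather than taken from a library. The regression helper assumes the true values are not all identical.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2; assumes y_true is not constant."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"mae": mae, "rmse": rmse, "r2": 1 - ss_res / ss_tot}


# Hypothetical churn labels: 3 churners out of 10, and the model finds only 1.
churn_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
churn_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
clf = classification_metrics(churn_true, churn_pred)
print(clf)

# Hypothetical customer spend values for the regression case.
reg = regression_metrics([100, 200, 300], [110, 190, 300])
print(reg)
```

In this made-up example the classifier reaches 80% accuracy while catching only one of the three churners, which is why accuracy alone can be misleading when the positive class is rare.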

#### Framing the Problem

How well you frame the problem is a major difference between a junior and a senior data scientist.

Example: translate each business problem into a data science problem.
1. "We've made changes in our organisation. Can you help us identify the biggest drivers of churn?"
2. "We would like to optimize our pricing system."

![framing_the_problem_1](./images/9_page.jpg)