# 01 – From Business Story to Data
## Understanding the Problem Before Writing Any Code

In this notebook, we will **not train a model yet**.

Our goal is to understand:
- the business problem (customer churn)
- how this problem becomes a machine learning task
- how data represents past examples
- what the **inputs** are and what the **output** is


## 1️⃣ The Business Problem: Customer Churn

A company provides a service to customers.

Some customers **stay**.  
Some customers **leave** (this is called **churn**).

### Business question:
> Can we predict if a customer is likely to leave?

Why this matters:
- Keeping customers is cheaper than finding new ones
- If we can predict churn early, the business can act before the customer leaves (for example, by contacting them)


## 2️⃣ Why Not Use Simple Rules?

Example rule:
> IF monthly charges are high AND contract is month-to-month  
> THEN customer will churn

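Problem:
- Real customers are more complex than a single rule
- Many factors interact with each other
- Hard-coded rules do not scale: every new factor means more thresholds and conditions to pick and maintain by hand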
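To make the rule concrete, here is a minimal sketch of it as code, applied to the five customers shown in `df.head()` further down (only the two columns the rule needs). The cut-off `HIGH_CHARGES = 50.0` and the function name `rule_predicts_churn` are hypothetical: the point is that a person has to choose every number and every condition by hand.

```python
import pandas as pd

# Toy data: Contract and MonthlyCharges of the five customers shown in df.head()
customers = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year", "Month-to-month"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.3, 70.7],
})

# Hypothetical cut-off for "high" monthly charges, picked by hand
HIGH_CHARGES = 50.0


def rule_predicts_churn(row: pd.Series) -> str:
    """Hard-coded rule: high charges AND month-to-month contract -> churn."""
    if row["MonthlyCharges"] > HIGH_CHARGES and row["Contract"] == "Month-to-month":
        return "Yes"
    return "No"


# apply(..., axis=1) calls the rule once per row (one customer at a time)
customers["RulePrediction"] = customers.apply(rule_predicts_churn, axis=1)
print(customers)
```

Every extra factor (tenure, internet service, payment method, and so on) would need its own hand-picked conditions, and the number of combinations grows quickly.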

This is where **Machine Learning** helps.


## 3️⃣ Machine Learning View of the Problem

Machine Learning does **not magically guess the future**.

It learns from **past examples**.

We show the model:
- information about past customers
- whether they churned or not

The model learns patterns from this data.
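This is not training yet, but the core idea of "learning from past examples" can be sketched with pandas: take past customers whose outcome is known and summarize that outcome by one factor. The toy data below are the `Contract` and `Churn` values of the five customers shown in `df.head()` further down; five rows are far too few to trust, so treat the numbers only as an illustration.

```python
import pandas as pd

# Toy "past examples": Contract and Churn of the five customers in df.head()
past = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year", "Month-to-month"],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})

# Encode the outcome as 1 (left) / 0 (stayed) so that a mean is a churn rate
churned = (past["Churn"] == "Yes").astype(int)

# Churn rate per contract type: a pattern read directly from past outcomes
churn_rate = churned.groupby(past["Contract"]).mean()
print(churn_rate)
```

A trained model goes further by combining many columns at once, but it learns from the same kind of table: past rows with a known outcome.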


## 4️⃣ Data = Past Examples

In machine learning:
- each **row** represents one past customer
- each **column** represents information about that customer

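The columns that describe the customer are called **features**. One column, `Churn`, is different: it records whether the customer left, which is what we want to predict.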
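In pandas, this row/column view is literally the shape of a `DataFrame`. A small sketch using a hand-picked subset of columns from the five customers shown in `df.head()` below (not the full dataset):

```python
import pandas as pd

# Hand-picked subset of the five customers shown in df.head()
toy = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK", "7795-CFOCW", "9237-HQITU"],
    "tenure": [1, 34, 2, 45, 2],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year", "Month-to-month"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.3, 70.7],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})

# shape is (number of rows, number of columns) = (customers, pieces of information)
n_customers, n_columns = toy.shape
print(f"{n_customers} customers, {n_columns} columns")
print(list(toy.columns))
```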


## Load the Data

```python
import pandas as pd

# Load the churn dataset
df = pd.read_csv("data/churn.csv")

# Look at the first few rows
df.head()
```

|   | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


## 5️⃣ What Does One Row Represent?

One row = **one customer in the past**

For that customer, we know:
- their contract type
- their monthly charges
- how long they stayed
- whether they churned or not

This is exactly what the model will learn from.
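One row can be pulled out with `DataFrame.iloc`, which selects by position. A sketch on the customers from `df.head()` above, keeping only the columns for the facts just listed:

```python
import pandas as pd

# Customers from df.head(), restricted to the facts listed above
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year", "Month-to-month"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.3, 70.7],
    "tenure": [1, 34, 2, 45, 2],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})

# .iloc[0] selects the first row by position and returns it as a Series
first_customer = toy.iloc[0]

# Every value in that Series describes the same past customer
print("Contract:       ", first_customer["Contract"])
print("Monthly charges:", first_customer["MonthlyCharges"])
print("Tenure:         ", first_customer["tenure"])
print("Churned:        ", first_customer["Churn"])
```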


## 6️⃣ Inputs vs Output (Very Important)

In supervised machine learning, we always separate:

### 🔹 Inputs (Features)
Information we use to make a prediction  
Examples:
- contract type
- monthly charges
- tenure

### 🔹 Output (Target / Label)
What we want to predict  
Example:
- churn (Yes / No)

⚠️ The model must **never see the output as an input**; otherwise it could simply copy the answer instead of learning a pattern.


```python
# Define the output (target)
y = df["Churn"]

# Define the inputs (features)
X = df.drop(columns=["Churn"])
```
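The rule that the model must never see the output as an input can be checked in code rather than by eye. A minimal sketch on a toy frame (a few columns of the five customers from `df.head()` above); the same check, written with `X` instead of `X_toy`, works unchanged on the real inputs.

```python
import pandas as pd

# Toy subset of the five customers from df.head()
toy = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 2],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year", "Month-to-month"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.3, 70.7],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})

# Same split as the cell above
y_toy = toy["Churn"]
X_toy = toy.drop(columns=["Churn"])

# Guard: the answer must not be one of the inputs
assert "Churn" not in X_toy.columns

# The target takes only two values, Yes and No
print(y_toy.value_counts())
```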

## 7️⃣ Quick Sanity Check

Before moving on, we check:
- Do inputs and output have the same number of rows?
- Does each row in X match one value in y?

If yes → we are ready for training (next notebook).


```python
print("Inputs shape:", X.shape)
print("Output shape:", y.shape)
```

```
Inputs shape: (7043, 20)
Output shape: (7043,)
```
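The shapes answer the first question. The second one (does row *i* of `X` belong with value *i* of `y`?) can be checked by comparing index labels, because both column selection and `drop` keep the original `DataFrame` index. A sketch on toy data; in this notebook the same two `assert` lines would apply to `X` and `y` directly.

```python
import pandas as pd

# Toy subset of the five customers from df.head()
toy = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 2],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.3, 70.7],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})
y_toy = toy["Churn"]
X_toy = toy.drop(columns=["Churn"])

# 1) Same number of rows: one target value per input row
assert len(X_toy) == len(y_toy)

# 2) Same index labels in the same order: row i of X_toy pairs with value i of y_toy
assert X_toy.index.equals(y_toy.index)

print("Sanity checks passed:", X_toy.shape, y_toy.shape)
```

The index check matters once data gets filtered or shuffled: if rows of `X` are reordered without doing the same to `y`, the shapes still match while the pairs are wrong.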


```python
X.head()
```

|   | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 |


## ✅ What We Learned in This Notebook

- How a business problem becomes a machine learning task
- Data represents **past examples**
- Each row = one customer
- Inputs (X) are kept separate from the output (y)

🚫 We did NOT train a model yet  
➡️ That comes next
