# Session 54: Classification in Python

**Unit 5: Basics of Predictive Analytics**
**Hour: 54**
**Mode: Practical Lab**

---

### 1. Objective

This lab focuses on training our first **classification** model. We will use **Logistic Regression**, a fundamental algorithm for binary classification, to predict customer churn based on our Telco dataset.

The process will follow the same Scikit-learn workflow we learned for Linear Regression, demonstrating the library's consistency and ease of use.

### 2. Setup

We will repeat the full data preparation workflow we designed in Session 51.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load data
url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)

# For this example, let's use a few strong predictors
df_subset = df[['tenure', 'MonthlyCharges', 'Contract', 'Churn']].copy()
df_subset.dropna(inplace=True)

# 1. Separate Features (X) and Target (y)
X = df_subset.drop('Churn', axis=1)
y = df_subset['Churn']

# 2. One-Hot Encode Categorical Features
X_encoded = pd.get_dummies(X, columns=['Contract'], drop_first=True)

# 3. Split Data
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=42)
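
To see what the one-hot encoding step does, here is a minimal sketch on a toy frame. The three rows and the name `toy` are made up for illustration; only the `Contract` category names are taken from the Telco dataset.

```python
import pandas as pd

# Hypothetical toy rows; not taken from the real dataset.
toy = pd.DataFrame({
    "tenure": [1, 24, 60],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# drop_first=True drops the first category (alphabetically), which becomes the baseline.
toy_encoded = pd.get_dummies(toy, columns=["Contract"], drop_first=True)
encoded_columns = list(toy_encoded.columns)

print(encoded_columns)
print(toy_encoded)
```

The `Month-to-month` row has both dummy columns switched off, so that category acts as the reference level against which the other two contract types are compared.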

### 3. Training the Logistic Regression Model

We follow the exact same `initialize -> fit -> predict` pattern.

#### Step 1: Initialize the Model

We create an instance of the `LogisticRegression` class.

In [None]:
# max_iter=1000 raises the solver's iteration limit (scikit-learn's default is 100), giving it more room to converge on unscaled features.
log_model = LogisticRegression(max_iter=1000)

#### Step 2: Fit the Model

We train the model on our training data.

In [None]:
log_model.fit(X_train, y_train)

The model has now learned one coefficient per feature column (tenure, monthly charges, and the contract-type dummy columns), each describing how that feature shifts the likelihood of churning.
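
One way to inspect what a fitted model has learned is to look at its `coef_`, `intercept_`, and `classes_` attributes. The sketch below uses a small invented dataset (`X_toy`, `y_toy`, and `toy_model` are hypothetical names, not part of the lab), so the numbers it prints say nothing about the real Telco data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: short-tenure, high-charge customers churn; the rest stay.
X_toy = pd.DataFrame({
    "tenure": [1, 2, 3, 4, 40, 50, 60, 70],
    "MonthlyCharges": [90.0, 85.0, 95.0, 80.0, 30.0, 25.0, 35.0, 20.0],
})
y_toy = pd.Series(["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"])

toy_model = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

# classes_ is sorted; in the binary case the coefficients describe
# the log-odds of the second class, here "Yes".
print("Classes:", toy_model.classes_)

# Pair each coefficient with its feature name for readability.
coefs = pd.Series(toy_model.coef_[0], index=X_toy.columns)
print(coefs)
print("Intercept:", toy_model.intercept_[0])
```

A negative coefficient means that larger values of the feature lower the predicted odds of churn, and a positive one raises them. On the lab's model the same attributes are available as `log_model.coef_` and `log_model.intercept_`.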

#### Step 3: Make Predictions

We use the trained model to predict the outcome for the unseen test set. `predict` returns one class label (`'Yes'` or `'No'`) per row.

In [None]:
y_pred = log_model.predict(X_test)
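
Behind each label is a predicted probability, available through `predict_proba`. The sketch below, again on an invented toy dataset with hypothetical names, illustrates that `predict` returns the class with the highest probability.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data (not the Telco dataset).
X_toy = pd.DataFrame({
    "tenure": [1, 2, 3, 4, 40, 50, 60, 70],
    "MonthlyCharges": [90.0, 85.0, 95.0, 80.0, 30.0, 25.0, 35.0, 20.0],
})
y_toy = pd.Series(["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"])
toy_model = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

# Two hypothetical new customers.
X_new = pd.DataFrame({"tenure": [2, 65], "MonthlyCharges": [88.0, 28.0]})

proba = toy_model.predict_proba(X_new)  # one column per entry in classes_
labels = toy_model.predict(X_new)       # hard class labels

# Reproduce predict() by picking the most probable class in each row.
manual = toy_model.classes_[np.argmax(proba, axis=1)]

print(proba.round(3))
print(labels, manual)
```

Each row of `proba` sums to 1, with columns ordered as in `toy_model.classes_`. For the lab's model the equivalent call would be `log_model.predict_proba(X_test)`.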

Let's look at the first 10 predictions and compare them to the actual outcomes.

In [None]:
print("Predictions:", y_pred[:10])
print("Actuals:    ", y_test[:10].values)
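
A side-by-side table can make this comparison easier to read. The sketch below uses hypothetical stand-ins (`actual`, `predicted`) for `y_test[:10]` and `y_pred[:10]`; the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a Series with a shuffled index (like y_test after
# train_test_split) and a plain NumPy array (like y_pred).
actual = pd.Series(["No", "Yes", "No", "No", "Yes", "No"],
                   index=[185, 2715, 3825, 1807, 132, 1263])
predicted = np.array(["No", "No", "No", "Yes", "Yes", "No"])

# .values drops the shuffled index so both columns line up by position.
comparison = pd.DataFrame({"actual": actual.values, "predicted": predicted})
comparison["match"] = comparison["actual"] == comparison["predicted"]

print(comparison)
print("Matches:", comparison["match"].sum(), "of", len(comparison))
```

Using `.values` matters here because `y_test` keeps the original row labels from the full dataset, while `y_pred` is an array with no index at all.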

The model seems to be making some correct and some incorrect predictions. To know how well it's doing overall, we need to evaluate it.

### 4. Conclusion

In this lab, you have successfully trained your first classification model in Python.
1.  You prepared the data for a classification task (separating X/y, one-hot encoding, splitting).
2.  You followed the standard Scikit-learn `fit`/`predict` workflow with the `LogisticRegression` model.
3.  You have generated a set of predictions on unseen data.

This demonstrates the power and consistency of the Scikit-learn library for tackling different kinds of machine learning problems.

**Next Session:** We will learn how to evaluate the performance of our classification model using the most common metric: accuracy.