# Session 52: Linear Regression in Python

**Unit 5: Basics of Predictive Analytics**
**Hour: 52**
**Mode: Practical Lab**

---

### 1. Objective

This lab focuses on training a predictive model. We will build a **Linear Regression** model using Scikit-learn to solve the same problem we tackled in Excel: predicting a customer's `TotalCharges` based on their `tenure` and `MonthlyCharges`.

### 2. Setup

We will perform a simplified data preparation workflow for this regression task.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load and clean data
url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)
# Entries in 'TotalCharges' that cannot be parsed as numbers (e.g. blanks) become NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df.dropna(inplace=True)  # For simplicity, drop the few rows with missing values

# 1. Separate Features (X) and Target (y)
X = df[['tenure', 'MonthlyCharges']]
y = df['TotalCharges']

# 2. Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
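
To see what the split does without downloading the Telco file, the sketch below applies `train_test_split` to a small made-up table. The `demo` DataFrame and its ten rows are hypothetical stand-ins, not real customer data. It shows that `test_size=0.2` holds out 20% of the rows and that features and target stay aligned row by row.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Telco data: 10 made-up customers.
demo = pd.DataFrame({
    "tenure": [1, 5, 12, 24, 36, 48, 60, 72, 3, 18],
    "MonthlyCharges": [30.0, 45.5, 70.0, 20.0, 99.9, 55.0, 80.0, 110.0, 25.0, 65.0],
})
demo["TotalCharges"] = demo["tenure"] * demo["MonthlyCharges"]

X_demo = demo[["tenure", "MonthlyCharges"]]
y_demo = demo["TotalCharges"]

# Same arguments as in the lab: 20% test rows, fixed seed for repeatability.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)          # (8, 2) (2, 2)
print(X_tr.index.equals(y_tr.index))   # features and target keep matching rows
```

Fixing `random_state` means the same rows land in the test set on every run, which keeps results reproducible across the class.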

### 3. The Scikit-learn Workflow: `fit` and `predict`

Training a model in Scikit-learn follows the same three-step pattern for almost every algorithm:
1.  **Initialize** the model.
2.  **`fit`** the model to the training data (`X_train`, `y_train`). This is the "learning" step.
3.  **`predict`** on new, unseen data (`X_test`). This is the "testing" step.
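
The three steps above can be sketched end to end on a toy problem. The data here is invented for illustration (`X_toy`, `y_toy`, and `toy_model` are hypothetical names): the target follows `y = 2x + 1` exactly, so a linear model should recover that line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 2x + 1 exactly.
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = 2.0 * X_toy[:, 0] + 1.0

toy_model = LinearRegression()                   # 1. initialize
toy_model.fit(X_toy, y_toy)                      # 2. fit on training data
toy_pred = toy_model.predict(np.array([[5.0]]))  # 3. predict on an unseen input

print(toy_pred)  # close to 11, since 2 * 5 + 1 = 11
```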

#### Step 1: Initialize the Model

We create an instance of the `LinearRegression` class.

In [None]:
model = LinearRegression()

#### Step 2: Fit the Model

We call the `.fit()` method, passing our training data. The model now learns the intercept and coefficients that minimize the sum of squared errors on the training set (ordinary least squares), just as Excel did.

In [None]:
model.fit(X_train, y_train)

The model is now trained!
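
For readers curious about what `.fit()` computes, the sketch below solves the same least-squares problem directly with NumPy and compares the result with Scikit-learn. The data is synthetic (generated with a fixed seed, not taken from the Telco file), and names such as `X_syn` and `sk_model` are chosen only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data loosely shaped like the lab's problem (not the real dataset).
rng = np.random.default_rng(0)
tenure = rng.integers(1, 73, size=200).astype(float)
monthly = rng.uniform(20.0, 120.0, size=200)
total = tenure * monthly + rng.normal(0.0, 50.0, size=200)

X_syn = np.column_stack([tenure, monthly])
sk_model = LinearRegression().fit(X_syn, total)

# Ordinary least squares by hand: prepend a column of ones for the intercept,
# then find the beta that minimizes ||A @ beta - total||^2.
A = np.column_stack([np.ones(len(X_syn)), X_syn])
beta, *_ = np.linalg.lstsq(A, total, rcond=None)

print(np.allclose(beta[0], sk_model.intercept_))  # intercepts agree
print(np.allclose(beta[1:], sk_model.coef_))      # coefficients agree
```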

### 4. Interpreting the Model

We can inspect the learned coefficients, just like we did in Excel.
*   The **intercept** is stored in `model.intercept_`.
*   The **coefficients** are stored in `model.coef_`, one per feature, in the same order as the columns of `X` (here `tenure` first, then `MonthlyCharges`).

In [None]:
print(f"Intercept: {model.intercept_:.2f}")
print(f"Coefficients: {model.coef_}")

# Let's make that more readable
print(f"Coefficient for 'tenure': {model.coef_[0]:.2f}")
print(f"Coefficient for 'MonthlyCharges': {model.coef_[1]:.2f}")

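**The Model Equation:**
Substituting the printed intercept and coefficients gives an equation of the following form. Treat the numbers shown here as an example and use the values from your own output:
`Predicted TotalCharges = (73.43 * tenure) + (19.32 * MonthlyCharges) - 1324.75`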

**Interpretation:**
*   For every one-month increase in `tenure`, the `TotalCharges` are predicted to increase by `$73.43`, holding `MonthlyCharges` constant.
*   For every one-dollar increase in `MonthlyCharges`, the `TotalCharges` are predicted to increase by `$19.32`, holding `tenure` constant.
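
The "holding the other feature constant" reading can be checked numerically. The sketch below fits a model on a handful of made-up rows (`X_small` and the example customer are hypothetical), rebuilds one prediction by hand from `intercept_` and `coef_`, and confirms that adding one month of tenure changes the prediction by exactly the tenure coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training rows: columns are [tenure, MonthlyCharges].
X_small = np.array([
    [1, 30.0], [12, 50.0], [24, 70.0],
    [36, 90.0], [48, 60.0], [60, 100.0],
])
y_small = X_small[:, 0] * X_small[:, 1]
m = LinearRegression().fit(X_small, y_small)

# Hypothetical customer: 24 months of tenure, $80 per month.
by_hand = m.intercept_ + m.coef_[0] * 24 + m.coef_[1] * 80.0
by_model = m.predict(np.array([[24, 80.0]]))[0]
print(np.isclose(by_hand, by_model))  # the equation reproduces .predict()

# One extra month of tenure, same MonthlyCharges.
one_more_month = m.predict(np.array([[25, 80.0]]))[0]
print(np.isclose(one_more_month - by_model, m.coef_[0]))
```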

### 5. Making Predictions

Now we use our trained model to make predictions on the `X_test` data that it has never seen before.

In [None]:
y_pred = model.predict(X_test)

Let's compare the first few predictions (`y_pred`) with the actual true values (`y_test`).

In [None]:
print("First 5 Predictions:", y_pred[:5])
print("First 5 Actual Values:", y_test.iloc[:5].to_numpy())  # .iloc selects by position
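
A side-by-side table is often easier to read than two printed arrays. The sketch below uses three invented actual/predicted pairs (the numbers and the names `actual`, `predicted`, and `comparison` are illustrative only); in the lab, `y_test` and `y_pred` would take their place.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for y_test (a Series) and y_pred (an array).
actual = pd.Series([1500.0, 320.5, 4800.0], index=[101, 7, 55], name="TotalCharges")
predicted = np.array([1620.0, 410.5, 4500.0])

# Build the table from plain arrays so rows are matched by position,
# then reuse the original row labels for reference.
comparison = pd.DataFrame(
    {"actual": actual.to_numpy(), "predicted": predicted},
    index=actual.index,
)
comparison["error"] = comparison["predicted"] - comparison["actual"]

print(comparison.round(2))
```

A positive error means the model overestimated that customer's charges; a negative error means it underestimated them.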

The predictions seem to be in the right ballpark, but they aren't perfect. How good are they really?

### 6. Conclusion

In this lab, you learned the standard Scikit-learn workflow for training a predictive model:
1.  **Initialize** the model (`LinearRegression()`).
2.  **Fit** the model to the training data (`.fit()`).
3.  **Predict** on new data (`.predict()`).
4.  Inspect the model's learned parameters (`.intercept_` and `.coef_`) to understand how it makes predictions.

We have successfully trained a model and used it to make predictions.

**Next Session:** We will learn how to formally evaluate the performance of our regression model to know how accurate our predictions are.