**Logistic Regression for Classification with California Housing Data**


**Introduction**

Welcome back to our machine learning adventure! Today, we'll dive into logistic regression, a widely used technique for classification tasks. We'll use the same California housing dataset, but this time our goal is to predict whether a block group's median house value is "expensive" or "affordable" based on its features.

**1. Libraries and Data Preparation**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [None]:
# Load the California housing dataset
housing = fetch_california_housing(as_frame=True)
df = housing.frame

# Define the target variable: Is the house expensive? (above median value)
median_house_value = df['MedHouseVal'].median()
df['Expensive'] = df['MedHouseVal'] > median_house_value

# Define features and target
X = df.drop(['MedHouseVal', 'Expensive'], axis=1)
y = df['Expensive']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

**2. Introduction to Logistic Regression**

* **Classification:** Logistic regression is used to predict a categorical outcome (a class label). In this case, our labels are "Expensive" or "Affordable."
* **Probability:** Rather than predicting the label directly, logistic regression models the probability that a data point belongs to a specific class (here, "Expensive").
* **Sigmoid Function:** The core of logistic regression is the sigmoid function, which maps any real number (a weighted sum of the features) to a value strictly between 0 and 1 that can be read as a probability.
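To make the sigmoid concrete, here is a minimal sketch of the function applied to a few scores. The helper name `sigmoid` and the example scores are illustrative choices, not part of the notebook's model.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a value strictly between 0 and 1."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Scores far below zero map close to 0, a score of zero maps to 0.5,
# and scores far above zero map close to 1.
scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probabilities = sigmoid(scores)

for z, p in zip(scores, probabilities):
    print(f"z = {z:+.1f} -> p = {p:.3f}")
```

In logistic regression, the score `z` is the intercept plus the weighted sum of the feature values, so the model's output is `sigmoid(z)`.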

**3. Building the Logistic Regression Model**

In [None]:
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

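* We raise `max_iter` from its default of 100 to give the solver more iterations to converge. Because the features are on very different scales (for example, `Population` versus `AveRooms`), the solver may still emit a convergence warning; standardizing the features usually helps.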
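One common way to help the solver converge is to standardize the features before fitting. The sketch below shows the pattern with `StandardScaler` in a pipeline. It uses synthetic stand-in data (an assumption, so the example runs without downloading anything), and the variable names ending in `_demo` or starting with `Xd_`/`yd_` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing features (an assumption for this sketch):
# four features whose scales differ by several orders of magnitude.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=1,
    n_clusters_per_class=1, random_state=42,
)
X_demo = X_demo * np.array([1.0, 10.0, 100.0, 1000.0])

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Standardize each feature, then fit logistic regression on the scaled values.
# The scaler is fitted on the training split only, which avoids leaking
# test-set statistics into the model.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(Xd_train, yd_train)

log_reg = scaled_model.named_steps["logisticregression"]
print(f"Test accuracy: {scaled_model.score(Xd_test, yd_test):.2f}")
print("Solver iterations used:", log_reg.n_iter_[0])
```

In the notebook, the same pipeline could be fitted on `X_train` and `y_train` in place of the synthetic arrays.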

**4. Making Predictions and Evaluating**

In [None]:
# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy:.2f}')
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', report)

* **Accuracy:** The proportion of correctly predicted labels.
* **Confusion Matrix:** A table summarizing correct and incorrect predictions.
* **Classification Report:** Provides precision, recall, F1-score, and support for each class.
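To show how these metrics relate to one another, the following sketch computes accuracy, precision, recall, and F1 by hand from a confusion matrix. The ten labels are made up for illustration and are not model output from this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for ten houses (1 = "Expensive", 0 = "Affordable").
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1])
y_pred_demo = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 1])

# For labels 0 and 1, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy_demo = (tp + tn) / (tp + tn + fp + fn)
precision_demo = tp / (tp + fp)  # of the predicted "Expensive", how many were right
recall_demo = tp / (tp + fn)     # of the actual "Expensive", how many were found
f1_demo = 2 * precision_demo * recall_demo / (precision_demo + recall_demo)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy_demo:.2f} precision={precision_demo:.2f} "
      f"recall={recall_demo:.2f} f1={f1_demo:.2f}")
```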

**5. Examining Feature Importance**

In [None]:
# Get the coefficients (log-odds)
coefficients = pd.DataFrame({'Feature': X.columns, 'Coefficient': model.coef_[0]})

# Sort by absolute value for magnitude of impact
coefficients = coefficients.reindex(coefficients['Coefficient'].abs().sort_values(ascending=False).index)

# Display the coefficients (to_markdown requires the optional `tabulate` package)
print("Feature Importance (Log-Odds):\n" + coefficients.to_markdown(index=False))

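* **Interpretation:** Each coefficient is the change in the log-odds of the "Expensive" class for a one-unit increase in that feature, holding the other features fixed. A positive coefficient means the feature increases the probability of "Expensive," and a negative coefficient decreases it. Because the features are measured in different units, coefficient magnitudes are only directly comparable if the features are on a similar scale.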
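Log-odds are often easier to read after exponentiating them into odds ratios. The sketch below uses hypothetical coefficients and placeholder feature names, not the fitted values from the model above.

```python
import numpy as np
import pandas as pd

# Hypothetical coefficients (log-odds), NOT the fitted values from the notebook.
coef_demo = pd.DataFrame({
    "Feature": ["feature_a", "feature_b", "feature_c"],
    "Coefficient": [0.70, -0.25, 0.00],
})

# exp(coefficient) is the factor by which the odds of "Expensive" are multiplied
# for a one-unit increase in the feature, holding the other features fixed.
coef_demo["OddsRatio"] = np.exp(coef_demo["Coefficient"])

print(coef_demo.round(3))
```

An odds ratio above 1 means the odds go up, below 1 means they go down, and exactly 1 means the feature has no effect on the odds.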

**Visualize Feature Importance:**
(The plot uses the same approach as in the linear regression example.)

**Student Challenges and Notes:**

* **Probability Threshold:** Discuss how changing the probability threshold for classifying as "Expensive" can affect precision and recall.
* **Class Imbalance:** Because we split at the median, the two classes here are roughly balanced. If a dataset is imbalanced (e.g., far more "Affordable" houses), explore techniques like oversampling or undersampling.
* **Advanced Models:** Introduce students to other classification models like Support Vector Machines or Random Forests.
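For the probability threshold challenge, one possible starting point is to classify from `predict_proba` with a custom cutoff instead of calling `predict`. The sketch below uses synthetic stand-in data (an assumption) and hypothetical variable names; the three thresholds are arbitrary examples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for this sketch).
X_thr, y_thr = make_classification(
    n_samples=2000, n_features=5, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)
Xt_train, Xt_test, yt_train, yt_test = train_test_split(
    X_thr, y_thr, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(Xt_train, yt_train)

# Column 1 of predict_proba holds the estimated probability of the positive class.
proba_positive = clf.predict_proba(Xt_test)[:, 1]

results = {}
for threshold in (0.3, 0.5, 0.7):
    # Label a sample positive only if its probability reaches the threshold.
    y_hat = (proba_positive >= threshold).astype(int)
    results[threshold] = (
        precision_score(yt_test, y_hat, zero_division=0),
        recall_score(yt_test, y_hat, zero_division=0),
    )
    precision, recall = results[threshold]
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold can never increase recall, since fewer samples are labeled positive; precision usually rises with it, although that is not guaranteed.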

In [None]:
# Visualize feature importance
plt.figure(figsize=(12, 6))
sns.barplot(data=coefficients, x='Coefficient', y='Feature')
plt.title('Logistic Regression Coefficients (Log-Odds), Sorted by Magnitude')
plt.xlabel('Coefficient Value (Log-Odds)')
plt.ylabel('Feature')
plt.grid(axis='x', alpha=0.75)
plt.show()

**Explanation:**
* The code creates a bar chart of the feature weights in the logistic regression model.
* **Data:** We use the `coefficients` DataFrame we created earlier, which contains the features and their corresponding log-odds coefficients, sorted by absolute value.
* **Bar Plot:** `sns.barplot` is used to create a horizontal bar chart.
    * **x-axis:** Shows the signed coefficient values (log-odds).
    * **y-axis:** Displays the feature names.
* **Title and Labels:** We set informative titles and labels for clarity.
* **Grid:** Adds vertical grid lines to make it easier to read the values.
* **Show:** The `plt.show()` function displays the plot.

**Interpretation:**
* The length of each bar is the magnitude of the feature's coefficient: the change in the log-odds of "Expensive" for a one-unit increase in that feature. Because the features are measured in different units, bar lengths are only directly comparable if the features are on a similar scale.
* The direction of the bar (positive or negative) indicates the direction of the effect:
    * Positive: Higher values of the feature increase the likelihood of "Expensive."
    * Negative: Higher values of the feature decrease the likelihood of "Expensive."

**Student Challenges and Notes:**
* **Understanding Log-Odds:** Discuss the concept of log-odds and how it relates to probability. Students might need a refresher on interpreting coefficients in logistic regression.
* **Comparing Features:** Encourage students to analyze the plot. Ask them: Which features are the most important predictors of whether a house is "expensive"?
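* **Limitations:** Explain that feature importance in logistic regression isn't as straightforward as in linear regression. The coefficients are linear in the log-odds, not in the probability, so the same coefficient changes the predicted probability by different amounts depending on where a data point starts.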
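The non-linearity can be illustrated with a small worked example: the same coefficient shifts the probability by different amounts depending on the starting log-odds. The coefficient value and the baseline log-odds below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

coefficient = 0.8  # hypothetical change in log-odds per one-unit feature increase
baseline_log_odds = np.array([-3.0, 0.0, 3.0])  # hypothetical starting points

p_before = sigmoid(baseline_log_odds)
p_after = sigmoid(baseline_log_odds + coefficient)
delta_p = p_after - p_before

for z, before, after, delta in zip(baseline_log_odds, p_before, p_after, delta_p):
    print(f"log-odds {z:+.1f}: p {before:.3f} -> {after:.3f} (change {delta:+.3f})")
```

The change in probability is largest near a baseline of 0.5 and shrinks as the baseline probability approaches 0 or 1, even though the change in log-odds is identical in all three cases.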
