
# Section 4: Python example - regression and classification models

In predictive modelling, regression and classification are two foundational techniques: regression predicts continuous outcomes, while classification assigns observations to discrete categories. This section provides practical Python examples of both using scikit-learn, a widely used machine learning library for Python. The examples show how to build, evaluate, and interpret a linear regression model and a logistic regression model.

1. Setting Up the Environment:

To begin, make sure scikit-learn is installed in your Python environment. If it is missing, you can add it with pip (the `%pip` magic installs into the environment the notebook kernel is running in):

In [None]:
%pip install scikit-learn

2. Importing Required Libraries:

In addition to scikit-learn, import pandas for data manipulation, NumPy for numerical operations, and matplotlib for visualization:

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, confusion_matrix, accuracy_score
import matplotlib.pyplot as plt

3. Preparing the Data:

Let's create a synthetic dataset for a regression problem and a classification problem:

In [None]:
import pandas as pd
from sklearn.datasets import make_regression, make_classification

# Regression Data
X_reg, y_reg = make_regression(n_samples=200, n_features=1, noise=15, random_state=42)
df_reg = pd.DataFrame(data=X_reg, columns=['Feature'])
df_reg['Target'] = y_reg

# Classification Data
X_clf, y_clf = make_classification(
    n_samples=200,
    n_features=2,
    n_informative=2,  # both features carry signal (needs n_classes * n_clusters_per_class <= 2**n_informative)
    n_redundant=0,
    n_clusters_per_class=1,  # one cluster per class keeps the problem simple
    n_classes=2,
    random_state=42
)
df_clf = pd.DataFrame(data=X_clf, columns=['Feature1', 'Feature2'])
df_clf['Target'] = y_clf

# Display the first few rows of the dataframes
print(df_reg.head())
print(df_clf.head())
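
The `n_informative` and `n_clusters_per_class` settings used for the classification data are not arbitrary. As far as we are aware, `make_classification` requires that `n_classes * n_clusters_per_class` does not exceed `2**n_informative`, and raises a `ValueError` otherwise. The sketch below illustrates this; the second set of parameters is a hypothetical invalid configuration chosen only for the demonstration, and `X_ok`, `y_ok`, and `raised` are names introduced here.

```python
from sklearn.datasets import make_classification

# Valid: 2 classes * 1 cluster per class = 2, which is <= 2**2 = 4
X_ok, y_ok = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, n_classes=2, random_state=42,
)
print(X_ok.shape, y_ok.shape)

# Invalid (illustrative settings, not used elsewhere in this section):
# 2 classes * 2 clusters per class = 4, which is > 2**1 = 2
try:
    make_classification(
        n_samples=200, n_features=2, n_informative=1, n_redundant=0,
        n_clusters_per_class=2, n_classes=2, random_state=42,
    )
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")
```

If you change the number of classes or clusters, raise `n_informative` (and `n_features`) accordingly.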


4. Building and Training Models:

Linear Regression:

In [None]:
# Splitting the regression data
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(df_reg[['Feature']], df_reg['Target'], test_size=0.2, random_state=42)
# Initializing and training the linear regression model
model_reg = LinearRegression()
model_reg.fit(X_train_reg, y_train_reg)
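
Once a linear regression model is fitted, its parameters can be read directly, which is a first step towards interpreting it. The following is a minimal, self-contained sketch that recreates the regression data with the same generator settings and additionally asks `make_regression` for the true coefficient (`coef=True`) so the estimate can be compared with it. The names `X`, `y`, `true_coef`, `model`, `slope`, and `intercept` are local to this sketch.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# coef=True also returns the coefficient the generator used to build y
X, y, true_coef = make_regression(
    n_samples=200, n_features=1, noise=15, coef=True, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

slope = model.coef_[0]          # expected change in Target per unit of Feature
intercept = model.intercept_    # predicted Target when Feature is 0

print(f"Estimated slope: {slope:.2f} (generator's slope: {float(true_coef):.2f})")
print(f"Estimated intercept: {intercept:.2f}")
```

Because the data include noise, the estimated slope will be close to, but not exactly equal to, the generator's value.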

Logistic Regression:

In [None]:
# Splitting the classification data
X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(df_clf[['Feature1', 'Feature2']], df_clf['Target'], test_size=0.2, random_state=42)
# Initializing and training the logistic regression model
model_clf = LogisticRegression()
model_clf.fit(X_train_clf, y_train_clf)
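
A fitted logistic regression model produces class probabilities, and the predicted label is simply the class with the higher probability. The sketch below, which rebuilds the classification data with the same settings so that it runs on its own, shows this relationship using `predict_proba` and `predict`. The names `clf`, `proba`, and `labels` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, n_classes=2, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = LogisticRegression().fit(X_train, y_train)

# One row per sample; columns follow the order of clf.classes_
proba = clf.predict_proba(X_test[:5])
labels = clf.predict(X_test[:5])

print("Classes:", clf.classes_)
print("Probabilities:\n", proba.round(3))
print("Predicted labels:", labels)

# The coefficients describe how each feature shifts the log-odds of class 1
print("Coefficients:", clf.coef_, "Intercept:", clf.intercept_)

# The predicted label is the class with the largest probability
most_likely = clf.classes_[np.argmax(proba, axis=1)]
```

Looking at the probabilities, rather than only the labels, shows how confident the model is about each prediction.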

5. Evaluating the Models:

Evaluating Linear Regression:

In [None]:
# Predicting and calculating the RMSE
y_pred_reg = model_reg.predict(X_test_reg)
rmse = np.sqrt(mean_squared_error(y_test_reg, y_pred_reg))
print(f"RMSE for the Regression Model: {rmse:.2f}")
# Plotting regression results
plt.scatter(X_test_reg, y_test_reg, color='blue', label='Actual')
plt.plot(X_test_reg, y_pred_reg, color='red', label='Predicted')
plt.title('Linear Regression')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.legend()
plt.show()
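
RMSE is expressed in the units of the target, so on its own it is hard to judge whether a value is good. As a hedged sketch, the example below adds two complementary metrics from `sklearn.metrics` (MAE and R²) and compares the model against a naive baseline that always predicts the training mean. The baseline and the variable names are choices made for this illustration; the data are recreated so the block runs independently.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=1, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)  # proportion of test-set variance explained

# Naive baseline: always predict the mean of the training targets
baseline_pred = np.full_like(y_test, y_train.mean())
baseline_rmse = np.sqrt(mean_squared_error(y_test, baseline_pred))

print(f"RMSE: {rmse:.2f}  MAE: {mae:.2f}  R^2: {r2:.3f}")
print(f"Baseline RMSE (predict the mean): {baseline_rmse:.2f}")
```

A model is only useful if its error is clearly lower than that of such a baseline.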

Evaluating Logistic Regression:

In [None]:
# Making predictions and evaluating the model
y_pred_clf = model_clf.predict(X_test_clf)
acc = accuracy_score(y_test_clf, y_pred_clf)
print(f"Accuracy for the Logistic Regression Model: {acc:.2f}")
# Confusion Matrix
cm = confusion_matrix(y_test_clf, y_pred_clf)
print("Confusion Matrix:\n", cm)
# Visualizing classification results, coloured by predicted class
plt.figure(figsize=(10, 6))
scatter = plt.scatter(X_test_clf['Feature1'], X_test_clf['Feature2'], c=y_pred_clf, cmap='coolwarm')
plt.title('Logistic Regression Classification')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
# One legend entry per predicted class instead of a single generic label
plt.legend(*scatter.legend_elements(), title='Predicted Class')
plt.show()
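
The confusion matrix contains more information than accuracy alone. As a minimal sketch, the example below unpacks the four cells of a binary confusion matrix and derives accuracy, precision, and recall from them by hand, treating class 1 as the positive class (an assumption of this sketch). The data and model are recreated with the same settings so the block is self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, n_classes=2, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
y_pred = LogisticRegression().fit(X_train, y_train).predict(X_test)

# Rows are actual classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Accuracy: {accuracy:.2f}  Precision: {precision:.2f}  Recall: {recall:.2f}")

# scikit-learn's helpers should agree with the manual calculation
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred))
```

Precision and recall matter most when the two kinds of error (false positives and false negatives) have different costs.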

6. Conclusion:

These examples illustrate the fundamental steps in building and evaluating predictive models for both regression and classification tasks using scikit-learn. We have covered these steps in previous weeks, but practising them again should help reinforce the core concept: evaluate everything!