
# Section 4 - Python example - logistic regression implementation

Logistic regression is a widely used statistical method for binary classification. It models the probability that an event occurs by passing a linear combination of the input features through a logistic (sigmoid) curve, which maps any real number into the range 0 to 1. This makes it well suited to problems where each outcome falls into one of two categories. In this example, we'll implement logistic regression in Python using the scikit-learn library, working on a binary classification problem: predicting whether a customer will make a purchase based on their age and income.
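To make the logistic curve concrete, here is a minimal sketch of the sigmoid function applied to a linear score. The `intercept` and `age_coef` values are hypothetical, picked only for illustration; they are not fitted to the data used later in this section.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration (not fitted values)
intercept = -4.0
age_coef = 0.1

example_ages = np.array([20, 40, 60])
linear_scores = intercept + age_coef * example_ages  # the linear part of the model
probabilities = sigmoid(linear_scores)               # squashed into (0, 1)

for age, p in zip(example_ages, probabilities):
    print(f"age={age}: P(purchase)={p:.3f}")
```

A score of zero maps to a probability of exactly 0.5, and larger scores push the probability toward 1.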

1. Setting Up the Environment:

Make sure your Python environment has the required libraries. If NumPy, pandas, scikit-learn, or Matplotlib is missing, you can install them with pip:

In [None]:
%pip install numpy pandas scikit-learn matplotlib

2. Importing Required Libraries:

Start by importing the necessary libraries. We'll use pandas for data manipulation, NumPy for numerical operations, Matplotlib for plotting, and several scikit-learn modules for splitting the data, fitting the logistic regression model, and evaluating it.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

3. Generating Synthetic Data:

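For this example, we'll create a synthetic dataset with two features, 'Age' and 'Income', and a target indicating whether a customer makes a purchase (1) or not (0). Note that the purchase probability below is generated from age alone: it rises linearly from 0 at age 18 toward 1 at age 70. Income is drawn independently, so it carries no real signal about the target.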

In [None]:
# Set the seed for reproducibility
np.random.seed(42)

# Generate synthetic data
data_size = 1000
ages = np.random.randint(18, 70, data_size)
incomes = np.random.randint(30000, 100000, data_size)
purchases = np.random.binomial(1, p=(ages - 18) / (70 - 18), size=data_size)  # Purchase probability rises linearly with age; income has no effect

# Create a DataFrame
df = pd.DataFrame({
    'Age': ages,
    'Income': incomes,
    'Purchase': purchases
})

# Show the first few entries
print(df.head())
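As a quick sanity check on the generated data, the sketch below compares purchase rates across age bands and income quartiles. The band edges and the names `rate_by_age` and `rate_by_income` are choices made for this illustration. Since the purchase probability was built from age only, the rate should climb across age bands and stay roughly flat across income quartiles.

```python
import numpy as np
import pandas as pd

# Recreate the synthetic dataset so this cell runs on its own
np.random.seed(42)
data_size = 1000
ages = np.random.randint(18, 70, data_size)
incomes = np.random.randint(30000, 100000, data_size)
purchases = np.random.binomial(1, p=(ages - 18) / (70 - 18), size=data_size)
df = pd.DataFrame({'Age': ages, 'Income': incomes, 'Purchase': purchases})

# Purchase rate within four equal-width age bands (edges chosen for illustration)
age_bands = pd.cut(df['Age'], bins=[17, 30, 43, 56, 69])
rate_by_age = df.groupby(age_bands, observed=True)['Purchase'].mean()
print(rate_by_age)

# Purchase rate within income quartiles
income_bands = pd.qcut(df['Income'], q=4)
rate_by_income = df.groupby(income_bands, observed=True)['Purchase'].mean()
print(rate_by_income)
```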

4. Data Visualization:

A scatter plot of age against income, colored by purchase outcome, shows how the two classes are distributed across the features and which feature separates them.

In [None]:
plt.scatter(df['Age'], df['Income'], c=df['Purchase'], cmap='winter', alpha=0.5)
plt.title('Customer Data (Age vs Income)')
plt.xlabel('Age')
plt.ylabel('Income')
plt.colorbar(label='Purchase')
plt.show()

5. Preparing Data for Modeling:

Before modeling, split the data into features (X) and target (y), and then into training and testing sets.

In [None]:
# Define features and target
X = df[['Age', 'Income']]
y = df['Purchase']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
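It can be worth confirming that the split has the sizes and class balance you expect. The sketch below repeats the split with `stratify=y`, an optional `train_test_split` argument that is not used in the original cell, so that the share of purchasers is nearly identical in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Recreate the synthetic dataset so this cell runs on its own
np.random.seed(42)
data_size = 1000
ages = np.random.randint(18, 70, data_size)
incomes = np.random.randint(30000, 100000, data_size)
purchases = np.random.binomial(1, p=(ages - 18) / (70 - 18), size=data_size)
df = pd.DataFrame({'Age': ages, 'Income': incomes, 'Purchase': purchases})

X = df[['Age', 'Income']]
y = df['Purchase']

# stratify=y is an optional addition that preserves the class ratio in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Purchase rate (train):", y_train.mean())
print("Purchase rate (test): ", y_test.mean())
```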

6. Building and Training the Logistic Regression Model:

Use scikit-learn to create and train the logistic regression model. With its default settings, `LogisticRegression` applies L2 regularization.

In [None]:
# Create a logistic regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)
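Because 'Age' and 'Income' sit on very different scales, the solver may converge slowly or emit a convergence warning on the raw features. One common remedy, shown here as a sketch rather than as part of the original workflow, is to standardize the features inside a pipeline. The name `scaled_model` is introduced only for this illustration. Standardized coefficients are also easier to compare with each other.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Recreate the synthetic dataset and split so this cell runs on its own
np.random.seed(42)
data_size = 1000
ages = np.random.randint(18, 70, data_size)
incomes = np.random.randint(30000, 100000, data_size)
purchases = np.random.binomial(1, p=(ages - 18) / (70 - 18), size=data_size)
df = pd.DataFrame({'Age': ages, 'Income': incomes, 'Purchase': purchases})

X = df[['Age', 'Income']]
y = df['Purchase']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize each feature to zero mean and unit variance, then fit the model
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X_train, y_train)

# Coefficients are now per standard deviation of each feature
coefs = scaled_model.named_steps['logisticregression'].coef_[0]
for name, coef in zip(X.columns, coefs):
    print(f"{name}: {coef:.3f}")
```

Since the synthetic target depends only on age, the age coefficient should dominate and the income coefficient should sit close to zero.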

7. Making Predictions and Evaluating the Model:

After training, use the model to make predictions, and then evaluate its performance using a confusion matrix and classification report.

In [None]:
# Predicting the test set results
predictions = model.predict(X_test)

# Evaluating the model
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
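The `predict` method returns hard class labels, but the fitted model also exposes probabilities through `predict_proba`. The sketch below uses them to compute ROC AUC and to apply a custom decision threshold. The 0.3 threshold is an arbitrary value for illustration, and the feature standardization step is an assumption added here to keep the solver well behaved.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Recreate the synthetic dataset and split so this cell runs on its own
np.random.seed(42)
data_size = 1000
ages = np.random.randint(18, 70, data_size)
incomes = np.random.randint(30000, 100000, data_size)
purchases = np.random.binomial(1, p=(ages - 18) / (70 - 18), size=data_size)
df = pd.DataFrame({'Age': ages, 'Income': incomes, 'Purchase': purchases})

X = df[['Age', 'Income']]
y = df['Purchase']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardization is an added assumption, not part of the original steps
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Column 1 of predict_proba holds the estimated probability of class 1 (purchase)
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"ROC AUC: {auc:.3f}")

# Apply an illustrative threshold lower than the default 0.5
threshold = 0.3
custom_predictions = (proba >= threshold).astype(int)
print(confusion_matrix(y_test, custom_predictions))
```

Lowering the threshold labels more customers as likely purchasers, which trades additional false positives for fewer false negatives.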

8. Conclusion:

This implementation shows how logistic regression can be applied to a simple binary classification problem in Python. Because the synthetic purchase probability depends only on age, the fitted model should rely mainly on age, with income contributing little. On real customer data, both features could carry signal. Logistic regression outputs probabilities rather than only class labels, so predictions can be ranked or thresholded to match the relative cost of false positives and false negatives.