## Logistic Regression in Python

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

# Dataset
data = {
    'Customer_ID': [1, 2, 3, 4, 5, 6],
    'Ad_Clicked': [1, 0, 1, 1, 0, 1],
    'Time_on_Site': [5.5, 1.2, 6.3, 3.0, 2.2, 7.8],
    'Age': [25, 30, 35, 28, 45, 40],
    'Income': [40000, 45000, 50000, 30000, 70000, 60000],
    'Purchased': [1, 0, 1, 0, 0, 1],
}

df = pd.DataFrame(data)

# Features and Target
X = df[['Ad_Clicked', 'Time_on_Site', 'Age', 'Income']]
y = df['Purchased']

# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Confusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)

# Feature Coefficients
coefficients = pd.DataFrame({'Feature': X.columns, 'Coefficient': model.coef_[0]})
print("\nFeature Coefficients:\n", coefficients)

Confusion Matrix:
 [[1 0]
 [0 1]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2


Feature Coefficients:
         Feature  Coefficient
0    Ad_Clicked     0.118672
1  Time_on_Site     0.852840
2           Age    -0.363016
3        Income     0.000178


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
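
The warning above suggests either raising `max_iter` or scaling the data. A minimal sketch of the scaling route follows, assuming the same toy dataset and split as the cell above. The pipeline, the variable names `scaled_model` and `log_reg`, and `max_iter=1000` are illustrative choices, not part of the original notebook. The coefficients it prints are on the standardized scale, so they will differ from the ones shown above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy dataset as in the cell above
df = pd.DataFrame({
    "Ad_Clicked": [1, 0, 1, 1, 0, 1],
    "Time_on_Site": [5.5, 1.2, 6.3, 3.0, 2.2, 7.8],
    "Age": [25, 30, 35, 28, 45, 40],
    "Income": [40000, 45000, 50000, 30000, 70000, 60000],
    "Purchased": [1, 0, 1, 0, 0, 1],
})
X = df[["Ad_Clicked", "Time_on_Site", "Age", "Income"]]
y = df["Purchased"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Standardize each feature (zero mean, unit variance on the training rows),
# then fit logistic regression on the scaled values.
# max_iter=1000 is an assumed, generous iteration budget.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name
log_reg = scaled_model.named_steps["logisticregression"]
scaled_coefficients = pd.DataFrame(
    {"Feature": X.columns, "Coefficient": log_reg.coef_[0]}
)
print(scaled_coefficients)
print("Iterations used:", log_reg.n_iter_[0])
```

Because every feature is on the same scale after standardization, the magnitudes of these coefficients can be compared with each other more meaningfully than the raw ones.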


Interpretation and Insights
Model Accuracy:
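The model classified both test samples correctly. However, the test set contains only two rows (and the training set only four), so this result says very little about how the model would generalize. The convergence warning above also shows that the solver stopped at its iteration limit, so the coefficients are not fully optimized.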

Feature Impact:

Ad_Clicked: Positive coefficient (+0.12) indicates that customers who clicked the ad are somewhat more likely to purchase.
Time_on_Site: Positive coefficient (+0.85) suggests that each additional minute on the site increases the likelihood of purchase.
Age: Negative coefficient (-0.36) per year of age, showing that younger customers were more inclined to purchase in this sample.
Income: Positive coefficient (+0.000178) that looks negligible only because income is measured in dollars. A $10,000 difference in income shifts the log-odds by about 1.78, so raw coefficients on unscaled features cannot be compared directly.
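
To make the coefficient sizes easier to read, each one can be converted to an odds ratio, since in logistic regression a one-unit increase in a feature multiplies the odds of purchase by exp(coefficient). The sketch below uses the coefficients printed in the output above. The $10,000 income step is an assumed unit chosen for illustration only.

```python
import math

# Coefficients copied from the "Feature Coefficients" output above
coefficients = {
    "Ad_Clicked": 0.118672,
    "Time_on_Site": 0.852840,
    "Age": -0.363016,
    "Income": 0.000178,
}

# exp(coefficient) is the multiplicative change in the odds of purchase
# for a one-unit increase in that feature, holding the others fixed
odds_ratios = {feature: math.exp(coef) for feature, coef in coefficients.items()}
for feature, ratio in odds_ratios.items():
    print(f"{feature}: odds ratio per unit = {ratio:.4f}")

# Income is measured in dollars, so one unit is a single dollar.
# Rescaling to an assumed step of $10,000 gives a more readable figure.
income_step = 10_000
income_ratio_per_step = math.exp(coefficients["Income"] * income_step)
print(f"Income: odds ratio per ${income_step:,} = {income_ratio_per_step:.2f}")
```

With only four training rows and an unconverged solver, these ratios describe this particular fit rather than any reliable effect.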

Insights for Business:

Ad Effectiveness: Ad clicks are positively associated with purchases, although the effect in this model is modest. Optimizing ad campaigns to drive clicks may still help.
Engagement Metrics: Increasing user time on site (e.g., via engaging content or promotions) could improve purchase likelihood.
Customer Targeting: Younger customers were more likely to purchase in this sample; consider tailoring marketing efforts accordingly once a larger dataset confirms the pattern.

Next Steps:

Expand the dataset to validate model reliability.
Consider additional features like previous purchase history or product preferences for a more robust prediction.
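
One way to check reliability as the dataset grows is cross-validation, which scores the model on several different held-out folds instead of a single two-row test set. The following is a rough sketch on the same toy data. The 3-fold stratified split, the scaling pipeline, and `max_iter=1000` are assumptions made for illustration. With six rows the scores are far too noisy to interpret, so it only shows the mechanics.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy dataset as in the first cell
df = pd.DataFrame({
    "Ad_Clicked": [1, 0, 1, 1, 0, 1],
    "Time_on_Site": [5.5, 1.2, 6.3, 3.0, 2.2, 7.8],
    "Age": [25, 30, 35, 28, 45, 40],
    "Income": [40000, 45000, 50000, 30000, 70000, 60000],
    "Purchased": [1, 0, 1, 0, 0, 1],
})
X = df[["Ad_Clicked", "Time_on_Site", "Age", "Income"]]
y = df["Purchased"]

# Scaling lives inside the pipeline so it is re-fit on each training fold,
# which keeps information from the held-out fold out of the scaler
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Three stratified folds: each held-out fold has one buyer and one non-buyer
cv = StratifiedKFold(n_splits=3)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```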

In [15]:
import numpy as np
# Coefficients
intercept = -1.5  # Illustrative placeholder; the fitted value is available as model.intercept_[0]
coefficients = {
    "Ad_Clicked":  0.118672,
    "Time_on_Site": 0.852840,
    "Age": -0.363016,
    "Income": 0.000178,
}

# Customer data
customer_data = {
    "Ad_Clicked": 1,  # Clicked on an ad (binary 0 or 1)
    "Time_on_Site": 5,  # Minutes spent on the site
    "Age": 30,  # Customer's age
    "Income": 50000,  # Annual income in dollars
}

# Calculate z
z = intercept + sum(coefficients[feature] * customer_data[feature] for feature in coefficients)

# Calculate probability
probability_of_purchase = 1 / (1 + np.exp(-z))
print(f"Probability of Purchase: {probability_of_purchase:.2f}")

# Prediction
if probability_of_purchase >= 0.5:
    print("Prediction: Customer is likely to purchase.")
else:
    print("Prediction: Customer is not likely to purchase.")

Probability of Purchase: 0.71
Prediction: Customer is likely to purchase.
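
The manual calculation above can be cross-checked against scikit-learn itself. The sketch below refits the model on the same toy split, reads the intercept and coefficients from the fitted estimator instead of typing them in, and compares the hand-computed sigmoid with `predict_proba`. The customer values (with `Ad_Clicked` set to 1) and `max_iter=1000` are assumptions for illustration. The fitted numbers may differ from those printed earlier, and the solver may still emit a convergence warning on unscaled data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

features = ["Ad_Clicked", "Time_on_Site", "Age", "Income"]
df = pd.DataFrame({
    "Ad_Clicked": [1, 0, 1, 1, 0, 1],
    "Time_on_Site": [5.5, 1.2, 6.3, 3.0, 2.2, 7.8],
    "Age": [25, 30, 35, 28, 45, 40],
    "Income": [40000, 45000, 50000, 30000, 70000, 60000],
    "Purchased": [1, 0, 1, 0, 0, 1],
})
X_train, _, y_train, _ = train_test_split(
    df[features], df["Purchased"], test_size=0.33, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Hypothetical customer, with columns in the same order used for training
customer = pd.DataFrame(
    [{"Ad_Clicked": 1, "Time_on_Site": 5, "Age": 30, "Income": 50000}]
)[features]

# Manual route: z = intercept + sum(coefficient * value), then the sigmoid
values = customer.iloc[0].to_numpy(dtype=float)
z = model.intercept_[0] + float(np.dot(model.coef_[0], values))
manual_probability = 1 / (1 + np.exp(-z))

# Library route: column 1 of predict_proba is the probability of class 1
library_probability = model.predict_proba(customer)[0, 1]

print(f"Manual:  {manual_probability:.4f}")
print(f"Library: {library_probability:.4f}")
```

The two values should agree, which confirms that the hand calculation mirrors what the fitted binary model does when both use the same intercept and coefficients.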
