
# 11: Logistic Regression Modelling

Logistic regression is a statistical method for estimating the probability of an event that has two possible outcomes. It is commonly used in binary classification problems, where the goal is to decide which of two categories an observation belongs to. The logistic function maps a linear combination of the input features to a probability between 0 and 1 for the positive outcome. That probability can be interpreted directly or compared with a threshold, such as 0.5, to make a binary decision.
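As a minimal sketch of how the logistic function turns a linear combination of features into a probability and then into a binary decision, consider the snippet below. The intercept and slope are hypothetical values chosen only for illustration; they are not fitted to any dataset.

```python
import numpy as np

def logistic(z):
    """Map a linear predictor z (the log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for illustration only (not fitted to any data)
intercept, slope = -1.5, 0.8
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

prob = logistic(intercept + slope * x)  # estimated P(y = 1 | x)
label = (prob >= 0.5).astype(int)       # binary decision at a 0.5 threshold

print(np.round(prob, 3))
print(label)
```

The probability crosses 0.5 exactly where the linear predictor crosses zero, so the 0.5 threshold corresponds to a linear decision boundary in the features.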

## 11.1 How to Perform Logistic Regression Using Python



In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# Load the bank dataset (replace '/content/bank.csv' with the actual path to your copy)
bank_data = pd.read_csv('/content/bank.csv')

# For simplicity, use two predictors (X) and the binary response (y)
X = bank_data[['duration', 'pdays']]
X = sm.add_constant(X)  # add the intercept column 'const'
y = bank_data['deposit'].map({'yes': 1, 'no': 0})  # Convert 'deposit' to binary

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Perform logistic regression on the training set
logreg_bank = sm.Logit(y_train, X_train).fit()

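# Build the summary of the model fitted on the training set
# (in a notebook, wrap this in print() to display it when it is not the last line of the cell)
logreg_bank.summary2()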

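# Create a synthetic dataset of new observations to score
# (the held-out X_test and y_test from the split above are not used in this cell)
np.random.seed(42)  # For reproducibility
num_samples = 1000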

test_data = {
    'duration': np.random.randint(1, 100, num_samples),
    'pdays': np.random.randint(1, 30, num_samples),
}

test_df = pd.DataFrame(test_data)

# Add a constant to the synthetic dataset so its columns match the training design matrix
X_test_synthetic = sm.add_constant(test_df, has_constant='add')

# Predict using the logistic regression model
y_pred_synthetic = logreg_bank.predict(X_test_synthetic)

# Convert predicted probabilities to binary predictions
y_pred_binary_synthetic = (y_pred_synthetic >= 0.5).astype(int)

# View the predictions
synthetic_predictions = pd.DataFrame({'Predicted Probability': y_pred_synthetic, 'Predicted Binary': y_pred_binary_synthetic})
print(synthetic_predictions)


Optimization terminated successfully.
         Current function value: 0.540955
         Iterations 6
     Predicted Probability  Predicted Binary
0                 0.170975                 0
1                 0.214509                 0
2                 0.159231                 0
3                 0.198858                 0
4                 0.182978                 0
..                     ...               ...
995               0.170948                 0
996               0.164001                 0
997               0.156300                 0
998               0.172047                 0
999               0.162416                 0

[1000 rows x 2 columns]


## 11.2 How to Perform Poisson Regression Using Python

In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

# Load the bank dataset (replace '/content/bank.csv' with the actual path to your copy)
bank_data = pd.read_csv('/content/bank.csv')

# For simplicity, use 'previous' as the predictor and 'duration' as the response
X = bank_data[['previous']]
X = sm.add_constant(X)  # add the intercept column 'const'
y = bank_data[['duration']]

# Binarize 'duration': 1 if above the median, 0 otherwise
y_binary = (y > y.median()).astype(int)

# Run Poisson regression using GLM (log link)
# Note: Poisson regression is normally used for count responses; y_binary is a 0/1
# indicator, so treat this fit as a demonstration of the syntax
poisreg_bank = sm.GLM(y_binary, X, family=sm.families.Poisson()).fit()

# View the model results
poisreg_bank.summary()


Dep. Variable:     duration            No. Observations:      11162
Model:             GLM                 Df Residuals:          11160
Model Family:      Poisson             Df Model:              1
Link Function:     Log                 Scale:                 1.0
Method:            IRLS                Log-Likelihood:        -9441.3
Date:              Tue, 05 Dec 2023    Deviance:              7742.5
Time:              18:38:24            Pearson chi2:          5590.0
No. Iterations:    5                   Pseudo R-squ. (CS):    9.762e-05
Covariance Type:   nonrobust

              coef    std err          z    P>|z|    [0.025    0.975]
const      -0.7001      0.014    -49.189    0.000    -0.728    -0.672
previous    0.0059      0.006      1.062    0.288    -0.005     0.017
