# Building a Logistic Regression using the statsmodels library

Create a logistic regression based on the bank data provided. 

The data is based on the marketing campaign efforts of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

Note that the first column of the dataset is the index.

You can check the statsmodels documentation here:
https://www.statsmodels.org/stable/user-guide.html

## Import the relevant libraries

In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report


## Load the data

Load the ‘Example-bank-data.csv’ dataset.

In [5]:
data = pd.read_csv('Example-bank-data.csv')
data.head()

| Unnamed: 0.1 | Unnamed: 0 | duration | y |
|---|---|---|---|
| 0 | 0 | 117 | no |
| 1 | 1 | 274 | yes |
| 2 | 2 | 167 | no |
| 3 | 3 | 686 | yes |
| 4 | 4 | 157 | no |


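We want to know whether the bank marketing strategy was successful. The outcome variable y holds the strings yes and no. statsmodels expects the outcome encoded as 1s and 0s before a logistic regression can be fitted, whereas the scikit-learn estimator used in this notebook accepts the string labels directly.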

In [16]:
#Prepare the data by separating the features (X) and the target variable (y):

X = data.iloc[:, 1:-1]  # Exclude the first column (index) and the last column (target variable)
y = data.iloc[:, -1]    # Target variable
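
The 0/1 encoding mentioned above is not performed in the cell itself. A minimal sketch of one way to do it, using only the five rows shown by `data.head()`; the column name `y_binary` is a hypothetical choice, not something defined in the notebook:

```python
import pandas as pd

# The five rows displayed by data.head() above (not the full dataset)
sample = pd.DataFrame({
    "duration": [117, 274, 167, 686, 157],
    "y": ["no", "yes", "no", "yes", "no"],
})

# Map the string labels to integers: yes -> 1, no -> 0.
# "y_binary" is a hypothetical column name used only for illustration.
sample["y_binary"] = sample["y"].map({"yes": 1, "no": 0})

print(sample)
```

Applying the same `map` call to `data['y']` would encode the full target column, assuming it contains only the values yes and no.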



### Declare the dependent and independent variables

In [17]:
#Split the data into training and testing sets using train_test_split():

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


### Simple Logistic Regression

Run the regression and visualize it on a scatter plot (no need to plot the line).

In [18]:
#Create a logistic regression model using LogisticRegression() and fit it:
model = LogisticRegression()
model.fit(X_train, y_train)


LogisticRegression()

In [20]:
#Make predictions on the testing data using predict():
y_pred = model.predict(X_test)
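
For a binary logistic regression, `predict()` amounts to computing the probability of the positive class and comparing it with 0.5. The sketch below spells that out in plain Python. The intercept and slope are hypothetical values chosen for illustration, not the coefficients fitted above, and the function names are likewise made up for this example:

```python
import math

def predict_proba_yes(duration, intercept, slope):
    """Logistic model: P(yes | duration) = 1 / (1 + exp(-(intercept + slope * duration)))."""
    z = intercept + slope * duration
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(duration, intercept, slope, threshold=0.5):
    """Return 'yes' when the predicted probability exceeds the threshold, else 'no'."""
    return "yes" if predict_proba_yes(duration, intercept, slope) > threshold else "no"

# Hypothetical coefficients, for illustration only (not the fitted values)
INTERCEPT, SLOPE = -1.5, 0.005

# Two durations taken from the data.head() output
for d in (117, 686):
    p = predict_proba_yes(d, INTERCEPT, SLOPE)
    print(d, round(p, 3), predict_label(d, INTERCEPT, SLOPE))
```

The fitted values can be read from `model.intercept_` and `model.coef_`, and `model.predict_proba(X_test)` returns the probabilities directly.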


In [21]:
# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:\n", classification_rep)

Accuracy: 0.6923076923076923
Classification Report:
               precision    recall  f1-score   support

          no       0.70      0.81      0.75        59
         yes       0.69      0.53      0.60        45

    accuracy                           0.69       104
   macro avg       0.69      0.67      0.68       104
weighted avg       0.69      0.69      0.69       104
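
To make the report easier to read, the sketch below recomputes its figures from confusion-matrix counts. The notebook does not print those counts. The values used here are inferred from the supports (59 and 45) and the rounded recalls, and they reproduce the reported accuracy exactly, but treat them as a reconstruction rather than notebook output:

```python
# Confusion-matrix counts inferred from the report above (not printed by the notebook).
tn, fp = 48, 11   # actual "no":  predicted no / predicted yes (support 59)
fn, tp = 21, 24   # actual "yes": predicted no / predicted yes (support 45)

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Metrics for the "yes" class
precision_yes = tp / (tp + fp)
recall_yes = tp / (tp + fn)
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

# Metrics for the "no" class (treating "no" as the class of interest)
precision_no = tn / (tn + fn)
recall_no = tn / (tn + fp)
f1_no = 2 * precision_no * recall_no / (precision_no + recall_no)

print(f"accuracy: {accuracy:.4f}")
print(f"yes: precision={precision_yes:.2f} recall={recall_yes:.2f} f1={f1_yes:.2f}")
print(f"no:  precision={precision_no:.2f} recall={recall_no:.2f} f1={f1_no:.2f}")
```

A recall of 0.53 for yes means that roughly half of the clients who actually subscribed were missed, which the overall accuracy of 0.69 does not show on its own.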

