Skip to content

Creating a logistic regression model using python on bank data, to find out if the customer has subscribed to a specific plan or not

Notifications You must be signed in to change notification settings

Lakshya-Ag/Bank-Marketing-Logistic-Regression

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Bank-Marketing

Creating a logistic regression model using python on a bank data, to find out if the customer have subscribed to a specific plan or not.

Problem Statement

The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to access if the product (bank term deposit) would be ('yes') or not ('no') subscribed.

In this project, you need to build a model for deciding whether a campaign will be successful in getting a client to sign up for the term deposits.

Dataset

The dataset is in the form of a csv file and the link to download is given below: https://cdn.upgrad.com/UpGrad/temp/e4993de3-06a6-4c7d-b12f-774ce36b592e/bank.csv

Data Description

Bank client data

  • age (numeric)
  • job: type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
  • marital: marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
  • education: (categorical "unknown","secondary","primary","tertiary")
  • default: has credit in default? (binary: "yes","no")
  • balance: average yearly balance, in euros (numeric)
  • housing: has housing loan? (binary: "yes","no")
  • loan: has personal loan? (binary: "yes","no")

Data related to the last contact of the current campaign

  • contact: contact communication type (categorical: "unknown","telephone","cellular")
  • day: last contact day of the month (numeric)
  • month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
  • duration: last contact duration, in seconds (numeric)

Other attributes:

  • campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
  • pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
  • previous: number of contacts performed before this campaign and for this client (numeric)
  • poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')

Output variable (desired target):

  • y: has the client subscribed a term deposit? (binary: "yes","no")

Objectives

You are required to prepare a well-commented an interactive python notebook as your solution to this problem statement. The notebook must meet the following objectives:

  • Clean the data and drop useless columns.
  • Make an EDA report, i.e., perform a univariate and bivariate analysis. Also, derive new features based on the given features, remove outliers and correlated variables if necessary.
  • Visualize the distributions of various features and correlations between them.
  • Perform feature engineering to extract the correct features for the model.
  • Build a logistic regression model
  • Evaluate the model used.

Model Evaluation

When you're done with the model building and residual analysis and have made predictions on the test set, just make sure you use y_test and y_pred.

where y_test is the test data set for the target variable, and y_pred is the variable containing the predicted values of the target variable on the test set. Also, remember if the VIF for the selected features is not coming high always check the p-values of the variables before applying the model on test data.