## Classification-based Collaborative Filtering Systems using Logistic Regression 

__Introduction:__
In this type of recommender system, recommendations are made based on similarities between different users. For example, suppose you are a bank and a new customer has applied for a loan. The new customer doesn't have a credit history, and you want to know whether she/he will pay back the loan. Collaborative filtering recommendation systems help in these types of scenarios.
The dataset is open-sourced and available for download at the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing#).
    

In [16]:
# importing libraries
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

In [17]:
# importing the data set. Note that the dataset is not in its original format: it includes some dummy variables that were created for this project
bank_full = pd.read_csv('..Datasets//bank_full_w_dummy_vars.csv')
bank_full.head()

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,...,job_unknown,job_retired,job_services,job_self_employed,job_unemployed,job_maid,job_student,married,single,divorced
0,58,management,married,tertiary,no,2143,yes,no,unknown,5,...,0,0,0,0,0,0,0,1,0,0
1,44,technician,single,secondary,no,29,yes,no,unknown,5,...,0,0,0,0,0,0,0,0,1,1
2,33,entrepreneur,married,secondary,no,2,yes,yes,unknown,5,...,0,0,0,0,0,0,0,1,0,0
3,47,blue-collar,married,unknown,no,1506,yes,no,unknown,5,...,0,0,0,0,0,0,0,1,0,0
4,33,unknown,single,unknown,no,1,no,no,unknown,5,...,1,0,0,0,0,0,0,0,1,1


In [18]:
bank_full.info() # Understanding the dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 37 columns):
age                             45211 non-null int64
job                             45211 non-null object
marital                         45211 non-null object
education                       45211 non-null object
default                         45211 non-null object
balance                         45211 non-null int64
housing                         45211 non-null object
loan                            45211 non-null object
contact                         45211 non-null object
day                             45211 non-null int64
month                           45211 non-null object
duration                        45211 non-null int64
campaign                        45211 non-null int64
pdays                           45211 non-null int64
previous                        45211 non-null int64
poutcome                        45211 non-null object
y                               45

The __y_binary__ column is the target column. We will use it to train our model and predict whether new users will subscribe, based on their user attributes.
We're going to train our model on the last 19 binary variables. These variables describe a person's housing loan status, their loan default status, and their responses to past marketing campaigns (in other words, whether they subscribed or did not subscribe in the past). They also describe their employment industry and their relationship status.
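
As an illustration of how binary (dummy) variables like these can be derived from categorical columns, here is a minimal sketch using `pd.get_dummies`. The toy rows and the choice of prefix are assumptions for demonstration only; this is not necessarily how the dummy columns in this dataset were actually produced.

```python
import pandas as pd

# Toy stand-in for a few raw rows; values are illustrative only.
raw = pd.DataFrame({
    "job": ["management", "technician", "unknown"],
    "marital": ["married", "single", "single"],
})

# One 0/1 column per category; the "job" prefix gives names such as job_unknown.
job_dummies = pd.get_dummies(raw["job"], prefix="job", dtype=int)
# No prefix here, which yields plain column names like "married" and "single".
marital_dummies = pd.get_dummies(raw["marital"], dtype=int)

# Attach the new indicator columns to the original frame.
encoded = pd.concat([raw, job_dummies, marital_dummies], axis=1)
print(encoded)
```

Each row gets a 1 in exactly one column per original categorical variable, which is the 0/1 layout the model is trained on.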

In [19]:
# features: the 19 binary columns at positions 18 through 36
X = bank_full.iloc[:, 18:37].values

# target variable
y = bank_full.iloc[:,17].values

We are now ready to train our model.

In [20]:
# instantiating logistic regression, fitting it, and predicting on the training data
LogReg = LogisticRegression()
LogReg.fit(X, y)
y_pred = LogReg.predict(X)
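
The cell above fits the model and predicts on the same rows. A common alternative is to hold out part of the data and score only on rows the model has not seen. The sketch below shows that pattern with `train_test_split`. It uses synthetic binary features as a stand-in for the bank data, so `X_demo`, `y_demo`, and the 30% test size are assumptions rather than part of the original analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 1000 rows of 19 binary features.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(1000, 19))
# Synthetic label loosely tied to the first two features, plus noise.
y_demo = ((X_demo[:, 0] + X_demo[:, 1] + rng.random(1000)) > 1.8).astype(int)

# Hold out 30% of the rows; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Score only on rows the model did not see during fitting.
y_test_pred = model.predict(X_test)
print(classification_report(y_test, y_test_pred))
```

With the real data, the same split could be applied to `X` and `y` before calling `fit`.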

### Evaluating our model
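To evaluate the performance of our model, we're going to use the `classification_report` function from sklearn. Note that `y_pred` was predicted on the same rows the model was trained on, so the scores below describe fit to the training data rather than performance on unseen customers.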

In [21]:
# print our classification report
print(classification_report(y, y_pred))

              precision    recall  f1-score   support

           0       0.90      0.99      0.94     39922
           1       0.67      0.17      0.27      5289

    accuracy                           0.89     45211
   macro avg       0.79      0.58      0.61     45211
weighted avg       0.87      0.89      0.86     45211
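
To make the columns of the report concrete, the sketch below recomputes precision, recall, and F1 for class 1 from raw confusion-matrix counts. The labels are a small made-up example, not rows from the bank dataset. In the report above, a recall of 0.17 for class 1 means the model finds about 17% of the customers who actually subscribed.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Toy labels (illustrative only): 1 = subscribed, 0 = did not subscribe.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_hat = np.array([0, 0, 0, 0, 0, 1, 1, 0, 0, 0])

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of the predicted 1s, how many were right
recall = tp / (tp + fn)     # of the actual 1s, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)
```

The hand-computed values agree with `precision_score`, `recall_score`, and `f1_score` from sklearn on the same arrays.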



### Predicting a new customer

Now we want to test our model with a new customer. We have some information about the new customer, so we create a list of values for the different attributes: 1 for attributes that are positive (i.e., present) and 0 for attributes that are negative or not present. Based on what we know about the new customer, we want to predict a label. In other words, we are predicting whether he/she will accept the offer if we market to them.

In [13]:
new_user = [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]]
y_pred = LogReg.predict(new_user)
y_pred

array([0], dtype=int64)

We got a label of 0 for the new customer, which means the model predicts that she/he will not accept the offer.
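
A hard 0/1 label hides how confident the model is. Logistic regression can also return class probabilities through `predict_proba`. The sketch below trains a stand-in model on synthetic data (an assumption, since the bank dataset is not loaded here) and scores the same `new_user` row, so the printed probabilities are not those of the model trained above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training data (assumption): 19 binary features.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(500, 19))
# Synthetic label tied to the feature at position 5, plus noise.
y_demo = ((X_demo[:, 5] + rng.random(500)) > 1.3).astype(int)

demo_model = LogisticRegression().fit(X_demo, y_demo)

# Same layout as new_user above: one row, 19 values.
new_user = [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]]

# Columns follow demo_model.classes_, so index 1 is the probability of class 1.
proba = demo_model.predict_proba(new_user)
print(demo_model.classes_, proba)
```

With the notebook's own model, `LogReg.predict_proba(new_user)` would return the corresponding probabilities for the real customer features.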

<center> THE END </center>