# Recommendation System with Python Machine Learning & AI

In this hands-on demo, we will cover the different types of recommendation systems out there, and show how to build each one in Python. 

First, we will discuss the concepts behind how recommendation systems work. 
Once we are familiar with the underlying concepts, we will talk about how to apply statistical and machine learning methods to construct our own recommenders. 

Discussion will include how to build a popularity-based recommender using the Pandas library, how to recommend similar items based on correlation, and how to deploy various machine learning algorithms to make recommendations. 

Learning Objectives:
+ Working with recommendation systems
+ Evaluating similarity based on correlation
+ Building a popularity-based recommender
+ Classification-based recommendations
+ Making a collaborative filtering system
+ Content-based recommender systems
+ Evaluating recommenders



# What is a recommendation system?

The fundamental purpose of a recommendation system is to find and recommend items that a user is most likely to be interested in.


**Collaborative Filtering:** 

Collaborative filtering systems recommend items based on crowdsourced user preference data: the ratings of many users are used to estimate which items a given user is likely to prefer. There are two approaches to collaborative filtering: user-based and item-based.
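As a rough sketch of the difference between the two approaches, the example below builds a tiny user-by-item ratings table and computes both kinds of similarity with Pearson correlation. The table `toy_ratings`, its user labels, and its item names are hypothetical and are not part of the restaurant data used later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are items, NaN = not rated.
toy_ratings = pd.DataFrame(
    {
        "item_a": [2.0, 2.0, 0.0, 1.0],
        "item_b": [2.0, 1.0, 0.0, np.nan],
        "item_c": [0.0, 1.0, 2.0, 2.0],
    },
    index=["u1", "u2", "u3", "u4"],
)

# Item-based: correlate item columns over the users who rated both items.
item_similarity = toy_ratings.corr(method="pearson")

# User-based: correlate user rows over the items both users rated.
user_similarity = toy_ratings.T.corr(method="pearson")

print(item_similarity.round(2))
print(user_similarity.round(2))
```

In an item-based system you would look up the row of the item a user already likes; in a user-based system you would look up the row of the user and borrow ratings from the most similar users.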

**Popularity-Based Systems:**

Popularity-based systems recommend the items that are most popular overall. The recommendations are not necessarily personalized.


To show how popularity-based systems work, we are going to use Python. First, we import the libraries that we need. In Python, a *library is a collection of functions and methods that lets you perform many actions without writing the code yourself*.

In [None]:
import pandas as pd
import numpy as np

## What is pandas:
In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

## What is numpy:
Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.


More resources can be found here:
https://colab.research.google.com/notebooks/mlcc/intro_to_pandas.ipynb

## Data

For this part we need to load a dataset into our programming environment. We will use one from the UCI Machine Learning Repository.

This dataset is hosted on:
https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data


I have already saved the dataset files in my GitHub repository, so we can load them directly from shareable links.

In [None]:
url = 'https://raw.githubusercontent.com/ArashVafa/ML-AI/master/rating_final.csv'
frame = pd.read_csv(url)

We can check the data and see what we are dealing with.

In [None]:
frame.head()

Unnamed: 0,userID,placeID,rating,food_rating,service_rating
0,U1077,135085,2,2,2
1,U1077,135038,2,2,1
2,U1077,132825,2,2,2
3,U1077,135060,1,2,2
4,U1068,135104,1,1,2


In [None]:
url2 = 'https://raw.githubusercontent.com/ArashVafa/ML-AI/master/chefmozcuisine.csv'
cuisine = pd.read_csv(url2)

cuisine.tail()

Unnamed: 0,placeID,Rcuisine
911,132005,Seafood
912,132004,Seafood
913,132003,International
914,132002,Seafood
915,132001,Dutch-Belgian


## Recommending based on counts

The first and most obvious way of recommending is based on popularity: if one restaurant has received more ratings than the others, we recommend that one.

First we group our data by placeID and count the ratings for each place, then sort the counts in descending order and look at the first few.

![alt text](https://miro.medium.com/max/1169/1*drbslVSlF6M5WL1NsBdRQQ.png) 

In [None]:
rating_count = pd.DataFrame(frame.groupby('placeID')['rating'].count())

rating_count.sort_values('rating', ascending=False).head()

placeID,rating
135085,36
132825,32
135032,28
135052,25
132834,25


Now we have the IDs of the most-rated restaurants. We put those IDs in a new table and merge it with the cuisine table, which gives the cuisine associated with each ID.

In [None]:
most_rated_places = pd.DataFrame([135085, 132825, 135032, 135052, 132834], index=np.arange(5), columns=['placeID'])

summary = pd.merge(most_rated_places, cuisine, on='placeID')
summary

Unnamed: 0,placeID,Rcuisine
0,135085,Fast_Food
1,132825,Mexican
2,135032,Cafeteria
3,135032,Contemporary
4,135052,Bar
5,135052,Bar_Pub_Brewery
6,132834,Mexican
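The place IDs above were typed in by hand from the previous output. A minimal sketch of how the same summary could be derived directly from the counts is shown below. The tables `toy_frame` and `toy_cuisine` are hypothetical stand-ins with the same column names as `frame` and `cuisine`, and the choice of a left merge is an assumption made so that places without a cuisine entry are kept.

```python
import pandas as pd

# Hypothetical stand-ins for the `frame` and `cuisine` tables used above.
toy_frame = pd.DataFrame({
    "userID": ["U1", "U2", "U3", "U1", "U2", "U3"],
    "placeID": [10, 10, 10, 20, 20, 30],
    "rating": [2, 1, 2, 1, 0, 2],
})
toy_cuisine = pd.DataFrame({
    "placeID": [10, 20, 20, 30],
    "Rcuisine": ["Mexican", "Bar", "Bar_Pub_Brewery", "Cafeteria"],
})

# Count ratings per place and keep the two most-rated places.
top_counts = (
    toy_frame.groupby("placeID")["rating"]
    .count()
    .rename("rating_count")
    .sort_values(ascending=False)
    .head(2)
    .reset_index()
)

# Attach cuisines; a place listed under several cuisines yields several rows.
toy_summary = top_counts.merge(toy_cuisine, on="placeID", how="left")
print(toy_summary)
```

As in the output above, a place that appears under more than one cuisine shows up once per cuisine after the merge.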


In [None]:
cuisine['Rcuisine'].describe()

count         916
unique         59
top       Mexican
freq          239
Name: Rcuisine, dtype: object

## Segment 3 - Making Recommendations Based on Correlation

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data. In the broadest sense correlation is any statistical association, though it commonly refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a limited supply product and its price.
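The degree to which two variables are linearly related is usually measured with the Pearson correlation coefficient: the sum of the products of the two centered variables, divided by the square root of the product of their sums of squared deviations. The sketch below computes it by hand and compares the result with `np.corrcoef`. The two rating lists are hypothetical and the helper name `pearson_r` is only for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation of two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    return (x_centered * y_centered).sum() / np.sqrt(
        (x_centered ** 2).sum() * (y_centered ** 2).sum()
    )

# Hypothetical ratings given by the same five users to two items.
item_1 = [2, 1, 0, 2, 1]
item_2 = [2, 1, 1, 2, 0]

r_manual = pearson_r(item_1, item_2)
r_numpy = np.corrcoef(item_1, item_2)[0, 1]
print(r_manual, r_numpy)
```

Both values agree, and a result between 0 and 1 indicates a positive but imperfect linear relationship between the two sets of ratings.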



In these systems, you use Pearson's R correlation to recommend an item that is most similar to the item a user has already chosen. In other words, you recommend an item whose review scores correlate with those of an item the user has already chosen; similarity is based on user ratings.

Just to refresh on Pearson's R: the Pearson correlation coefficient is a measure of linear correlation between two variables, or in this case, two items' ratings. It is represented by the symbol R. When R is close to 1 or -1, you know you have a strong linear relationship between the two variables. As R gets closer to zero, the two variables are less linearly correlated.

Correlation-based recommenders use item-based similarity. That is, they recommend an item based on how well it correlates with other items with respect to user ratings.

Let's look at the logic of this. Check out our mystery shopper here, Shopper D. We see that she has already chosen and reviewed the camera. She gave it a rating of four stars. Now let's see who else reviewed the camera. It looks like users A, B, and C also reviewed the camera, but now let's take a closer look at the ratings each of these users gave. User A gave it four stars, user B gave four stars, and user C gave 2.5 stars. Based on correlations between user ratings, we'd say that user A's and user B's ratings are more similar to, or more highly correlated with, user D's ratings. Now let's look at what other items user A and user B liked. They both gave pretty good ratings t

In [None]:
url3 = 'https://raw.githubusercontent.com/ArashVafa/ML-AI/master/geoplaces2.csv'

geodata = pd.read_csv(url3, encoding = 'latin-1')
#geodata = pd.read_csv(url3)
geodata.tail()

Unnamed: 0,placeID,latitude,longitude,the_geom_meter,name,address,city,state,country,fax,zip,alcohol,smoking_area,dress_code,accessibility,price,url,Rambience,franchise,area,other_services
125,132866,22.14122,-100.931311,0101000020957F000013871838EC4A58C1B5DF74F8E396...,Chaires,Ricardo B. Anaya,San Luis Potosi,San Luis Potosi,Mexico,?,?,No_Alcohol_Served,not permitted,informal,completely,medium,?,familiar,f,closed,none
126,135072,22.149192,-101.002936,0101000020957F0000E7B79B1DB94758C1D29BC363D8AA...,Sushi Itto,Venustiano Carranza 1809 C Polanco,San Luis Potosi,SLP,Mexico,?,78220,No_Alcohol_Served,none,informal,no_accessibility,medium,sushi-itto.com.mx,familiar,f,closed,none
127,135109,18.921785,-99.23535,0101000020957F0000A6BF695F136F5AC1DADF87B20556...,Paniroles,?,?,?,?,?,?,Wine-Beer,not permitted,informal,no_accessibility,medium,?,quiet,f,closed,Internet
128,135019,18.875011,-99.159422,0101000020957F0000B49B2E5C6E785AC12F9D58435241...,Restaurant Bar Coty y Pablo,Paseo de Las Fuentes 24 Pedregal de Las Fuentes,Jiutepec,Morelos,Mexico,?,?,No_Alcohol_Served,none,informal,completely,low,?,familiar,f,closed,none
129,132877,22.135364,-100.934948,0101000020957F000090735015B84B58C1AF0DC0414698...,sirloin stockade,?,?,?,?,?,?,No_Alcohol_Served,none,informal,completely,low,?,familiar,f,closed,none


In [None]:
places =  geodata[['placeID', 'name']]
places.head()

Unnamed: 0,placeID,name
0,134999,Kiku Cuernavaca
1,132825,puesto de tacos
2,135106,El Rincón de San Francisco
3,132667,little pizza Emilio Portes Gil
4,132613,carnitas_mata


# **Grouping and Ranking Data**

In [None]:
rating = pd.DataFrame(frame.groupby('placeID')['rating'].mean())
rating.head()

placeID,rating
132560,0.5
132561,0.75
132564,1.25
132572,1.0
132583,1.0


In [None]:
rating['rating_count'] = pd.DataFrame(frame.groupby('placeID')['rating'].count())
rating.head()

placeID,rating,rating_count
132560,0.5,4
132561,0.75,4
132564,1.25,4
132572,1.0,15
132583,1.0,4
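The mean rating and the rating count were built in two separate steps above. As a possible alternative, both can be computed in one pass with `agg`. The table `toy_frame` below is a hypothetical stand-in for `frame`, and the renaming step is only there to match the column names used in this section.

```python
import pandas as pd

# Hypothetical ratings table with the same column names as `frame`.
toy_frame = pd.DataFrame({
    "placeID": [10, 10, 10, 20, 20, 30],
    "rating": [2, 1, 0, 2, 2, 1],
})

# Compute mean and count per place in a single groupby, then rename the columns.
toy_rating = (
    toy_frame.groupby("placeID")["rating"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "rating", "count": "rating_count"})
)
print(toy_rating)
```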


In [None]:
rating.describe()

Unnamed: 0,rating,rating_count
count,130.0,130.0
mean,1.179622,8.930769
std,0.349354,6.124279
min,0.25,3.0
25%,1.0,5.0
50%,1.181818,7.0
75%,1.4,11.0
max,2.0,36.0


In [None]:
rating.sort_values('rating_count', ascending=False).head()

placeID,rating,rating_count
135085,1.333333,36
132825,1.28125,32
135032,1.178571,28
135052,1.28,25
132834,1.0,25


In [None]:
places[places['placeID']==135085]

Unnamed: 0,placeID,name
121,135085,Tortas Locas Hipocampo


In [None]:
cuisine[cuisine['placeID']==135085]

Unnamed: 0,placeID,Rcuisine
44,135085,Fast_Food


## **Preparing Data For Analysis**

In [None]:
places_crosstab = pd.pivot_table(data=frame, values='rating', index='userID', columns='placeID')
places_crosstab.head()

placeID,132560,132561,132564,132572,132583,132584,132594,132608,132609,132613,132626,132630,132654,132660,132663,132665,132667,132668,132706,132715,132717,132723,132732,132733,132740,132754,132755,132766,132767,132768,132773,132825,132830,132834,132845,132846,132847,132851,132854,132856,...,135044,135045,135046,135047,135048,135049,135050,135051,135052,135053,135054,135055,135057,135058,135059,135060,135062,135063,135064,135065,135066,135069,135070,135071,135072,135073,135074,135075,135076,135079,135080,135081,135082,135085,135086,135088,135104,135106,135108,135109
userID
U1001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,1.0,,,,,,,,...,,1.0,,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,
U1002,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,...,,,,,,,,,1.0,,,,,,1.0,,1.0,,,,,,,,,,,,,,,,,1.0,,,,1.0,,
U1003,,,,,,,,,,,,,,,,,,,,,,2.0,,,,2.0,2.0,,,,,2.0,,,,,,,,,...,,,,,,,,,,,,,,,2.0,,,,0.0,,,,,,,,,2.0,,2.0,2.0,,,,,,,,,
U1004,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,1.0,2.0,,,,,,,,,,,,,,,,,,,,,2.0,,
U1005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,...,,,,,,,1.0,,,,,,1.0,,,,,,,,2.0,,,,,,,,2.0,,,,,,,,,,,


In [None]:
Tortas_ratings = places_crosstab[135085]
Tortas_ratings[Tortas_ratings>=0]

userID
U1001    0.0
U1002    1.0
U1007    1.0
U1013    1.0
U1016    2.0
U1027    1.0
U1029    1.0
U1032    1.0
U1033    2.0
U1036    2.0
U1045    2.0
U1046    1.0
U1049    0.0
U1056    2.0
U1059    2.0
U1062    0.0
U1077    2.0
U1081    1.0
U1084    2.0
U1086    2.0
U1089    1.0
U1090    2.0
U1092    0.0
U1098    1.0
U1104    2.0
U1106    2.0
U1108    1.0
U1109    2.0
U1113    1.0
U1116    2.0
U1120    0.0
U1122    2.0
U1132    2.0
U1134    2.0
U1135    0.0
U1137    2.0
Name: 135085, dtype: float64

# **Evaluating Similarity Based on Correlation**

In [None]:
similar_to_Tortas = places_crosstab.corrwith(Tortas_ratings)

corr_Tortas = pd.DataFrame(similar_to_Tortas, columns=['PearsonR'])
corr_Tortas.dropna(inplace=True)
corr_Tortas.head()



placeID,PearsonR
132572,-0.428571
132723,0.301511
132754,0.930261
132825,0.700745
132834,0.814823


In [None]:
Tortas_corr_summary = corr_Tortas.join(rating['rating_count'])

In [None]:
Tortas_corr_summary[Tortas_corr_summary['rating_count']>=10].sort_values('PearsonR', ascending=False).head(10)

placeID,PearsonR,rating_count
135076,1.0,13
135085,1.0,36
135066,1.0,12
132754,0.930261,13
135045,0.912871,13
135062,0.898933,21
135028,0.892218,15
135042,0.881409,20
135046,0.867722,11
132872,0.840168,12
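Several places in the table above have a correlation of exactly 1.0. One possible explanation is that `corrwith` only uses the users who rated both places, so a coefficient can rest on very few shared ratings even when the total `rating_count` is 10 or more. The sketch below illustrates this on a hypothetical table `toy_crosstab`. The column names are made up, and counting shared raters is a suggested extra check, not a step from the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-place ratings; NaN means the user did not rate the place.
toy_crosstab = pd.DataFrame(
    {
        "target": [0.0, 1.0, 2.0, 2.0, 1.0, np.nan],
        "few_shared": [np.nan, 1.0, 2.0, np.nan, np.nan, 0.0],
        "many_shared": [0.0, 2.0, 2.0, 1.0, 1.0, 2.0],
    },
    index=["U1", "U2", "U3", "U4", "U5", "U6"],
)
target = toy_crosstab["target"]

# corrwith uses only the users who rated both the target and the other place.
pearson = toy_crosstab.corrwith(target)

# Keep the users who rated the target, then count their ratings per place.
shared_raters = toy_crosstab.loc[target.notna()].count()

toy_corr_summary = pd.DataFrame({"PearsonR": pearson, "shared_raters": shared_raters})
print(toy_corr_summary)
```

Here `few_shared` reaches a perfect correlation from only two shared users, while `many_shared` has a lower but better-supported coefficient. Filtering on the number of shared raters may therefore give more trustworthy neighbours than filtering on the total count alone.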


In [None]:
places_corr_Tortas = pd.DataFrame([135085, 132754, 135045, 135062, 135028, 135042, 135046], index = np.arange(7), columns=['placeID'])
summary = pd.merge(places_corr_Tortas, cuisine,on='placeID')
summary

Unnamed: 0,placeID,Rcuisine
0,135085,Fast_Food
1,132754,Mexican
2,135028,Mexican
3,135042,Chinese
4,135046,Fast_Food


In [None]:
places[places['placeID']==135046]

Unnamed: 0,placeID,name
42,135046,Restaurante El Reyecito


In [None]:
cuisine['Rcuisine'].describe()

count         916
unique         59
top       Mexican
freq          239
Name: Rcuisine, dtype: object

# Chapter 2 - Machine Learning Based Recommendation Systems
## Segment 1 - Classification-based Collaborative Filtering Systems
## Logistic Regression as a Classifier

In [None]:
from pandas import Series, DataFrame
from sklearn.linear_model import LogisticRegression

In [None]:
url4 = 'https://raw.githubusercontent.com/ArashVafa/ML-AI/master/bank_full_w_dummy_vars.csv'

bank_full = pd.read_csv(url4)
bank_full.head()

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y,y_binary,housing_loan,credit_in_default,personal_loans,prev_failed_to_subscribe,prev_subscribed,job_management,job_tech,job_entrepreneur,job_bluecollar,job_unknown,job_retired,job_services,job_self_employed,job_unemployed,job_maid,job_student,married,single,divorced
0,58,management,married,tertiary,no,2143,yes,no,unknown,5,may,261,1,-1,0,unknown,no,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0
1,44,technician,single,secondary,no,29,yes,no,unknown,5,may,151,1,-1,0,unknown,no,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1
2,33,entrepreneur,married,secondary,no,2,yes,yes,unknown,5,may,76,1,-1,0,unknown,no,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0
3,47,blue-collar,married,unknown,no,1506,yes,no,unknown,5,may,92,1,-1,0,unknown,no,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0
4,33,unknown,single,unknown,no,1,no,no,unknown,5,may,198,1,-1,0,unknown,no,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1


In [None]:
bank_full.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 37 columns):
 #   Column                        Non-Null Count  Dtype 
---  ------                        --------------  ----- 
 0   age                           45211 non-null  int64 
 1   job                           45211 non-null  object
 2   marital                       45211 non-null  object
 3   education                     45211 non-null  object
 4   default                       45211 non-null  object
 5   balance                       45211 non-null  int64 
 6   housing                       45211 non-null  object
 7   loan                          45211 non-null  object
 8   contact                       45211 non-null  object
 9   day                           45211 non-null  int64 
 10  month                         45211 non-null  object
 11  duration                      45211 non-null  int64 
 12  campaign                      45211 non-null  int64 
 13  pdays           

In [None]:
# Features: the 19 dummy-variable columns at positions 18 through 36 (housing_loan ... divorced).
X = bank_full.iloc[:, 18:37].values
# Target: the y_binary column at position 17.
y = bank_full.iloc[:, 17].values

In [None]:
LogReg = LogisticRegression()
LogReg.fit(X, y)

In [None]:
new_user = [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]]
y_pred = LogReg.predict(new_user)
y_pred
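`LogReg.predict` returns a class label. For a binary logistic regression, that label comes from passing a weighted sum of the features through the logistic function and comparing the result with 0.5. The sketch below shows that calculation with made-up numbers: `toy_coef`, `toy_intercept`, and `toy_user` are hypothetical and are not values fitted to the bank data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for three binary features; not fitted to the bank data.
toy_coef = np.array([1.5, -2.0, 0.5])
toy_intercept = -0.25

# Hypothetical user: first and third features on, second off.
toy_user = np.array([1, 0, 1])

score = toy_coef @ toy_user + toy_intercept  # linear score w.x + b
probability = sigmoid(score)                 # estimated P(class = 1)
prediction = int(probability >= 0.5)         # 0.5 cut-off for the class label
print(score, probability, prediction)
```

With a fitted scikit-learn model, `LogReg.predict_proba(new_user)` returns the estimated class probabilities, which can be more informative for ranking recommendations than the label alone.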

## Segment 2 - Model-based Collaborative Filtering Systems
## SVD Matrix Factorization

In [None]:
import sklearn
from sklearn.decomposition import TruncatedSVD

The MovieLens dataset was collected by the GroupLens Research Project at the University of Minnesota. You can download the dataset for this demonstration at the following URL: https://grouplens.org/datasets/movielens/100k/

I have already uploaded the dataset files to GitHub, so we can fetch them from there.

In [None]:
columns = ['user_id', 'item_id', 'rating', 'timestamp']

url5 = 'https://raw.githubusercontent.com/ArashVafa/ML-AI/master/ml-100k/u.data'


frame = pd.read_csv(url5, sep='\t', names=columns)
frame.head()

In [None]:
columns = ['item_id', 'movie title', 'release date', 'video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
          'Animation', 'Childrens', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
          'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

url6 = 'https://raw.githubusercontent.com/ArashVafa/ML-AI/master/ml-100k/u.item'

movies = pd.read_csv(url6, sep='|', names=columns, encoding='latin-1')
movie_names = movies[['item_id', 'movie title']]
movie_names.head()

In [None]:
combined_movies_data = pd.merge(frame, movie_names, on='item_id')
combined_movies_data.head()

In [None]:
combined_movies_data.groupby('item_id')['rating'].count().sort_values(ascending=False).head()

In [None]:
# Boolean mask for item 50 (named to avoid shadowing the built-in `filter`).
is_item_50 = combined_movies_data['item_id'] == 50
combined_movies_data[is_item_50]['movie title'].unique()

# Building a Utility Matrix

In [None]:
rating_crosstab = combined_movies_data.pivot_table(values='rating', index='user_id', columns='movie title', fill_value=0)
rating_crosstab.head()

# Transposing the Matrix

In [None]:
rating_crosstab.shape

In [None]:
X = rating_crosstab.T
X.shape

## Decomposing the Matrix

In [None]:
SVD = TruncatedSVD(n_components=12, random_state=17)

resultant_matrix = SVD.fit_transform(X)

resultant_matrix.shape

## Generating a Correlation Matrix

In [None]:
corr_mat = np.corrcoef(resultant_matrix)
corr_mat.shape

## Isolating Star Wars From the Correlation Matrix

In [None]:
movie_names = rating_crosstab.columns
movies_list = list(movie_names)

star_wars = movies_list.index('Star Wars (1977)')
star_wars

In [None]:
# Use the index found above instead of a hard-coded row number.
corr_star_wars = corr_mat[star_wars]
corr_star_wars.shape

## Recommending a Highly Correlated Movie

In [None]:
list(movie_names[(corr_star_wars<1.0) & (corr_star_wars > 0.9)])

In [None]:
list(movie_names[(corr_star_wars<1.0) & (corr_star_wars > 0.95)])
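The steps above (utility matrix, transpose, decomposition, correlation, lookup) can be sketched end to end on a toy matrix. This example swaps in NumPy's `np.linalg.svd` for scikit-learn's `TruncatedSVD`, so component signs and the individual correlation values may differ from what `TruncatedSVD` would give. The matrix `utility`, the titles in `movie_titles`, and the choice of `k = 3` are all hypothetical.

```python
import numpy as np

# Hypothetical utility matrix: rows are users, columns are movies, 0 = not rated.
movie_titles = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
utility = np.array([
    [5, 5, 1, 0, 2],
    [4, 4, 0, 1, 5],
    [1, 1, 5, 4, 3],
    [0, 0, 4, 5, 1],
    [5, 5, 2, 1, 4],
    [2, 2, 5, 3, 0],
], dtype=float)

# Transpose so that each row describes one movie.
X_toy = utility.T

# Keep the top-k right singular vectors and project each movie onto them.
k = 3
_, _, vt = np.linalg.svd(X_toy, full_matrices=False)
reduced = X_toy @ vt[:k].T  # shape: (n_movies, k)

# Correlate movies in the reduced space and rank the neighbours of "Movie A".
corr_toy = np.corrcoef(reduced)
target = movie_titles.index("Movie A")
ranked = [movie_titles[i] for i in np.argsort(-corr_toy[target]) if i != target]
print(ranked)
```

"Movie A" and "Movie B" were given identical ratings by every user, so they land on the same point in the reduced space and "Movie B" ranks first.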

## Segment 3 - Content-Based Recommender Systems
## Nearest Neighbors Algorithm

In [None]:
import sklearn
from sklearn.neighbors import NearestNeighbors

mtcars dataset source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.

In [None]:
url7 = 'https://raw.githubusercontent.com/ArashVafa/ML-AI/master/mtcars.csv'

cars = pd.read_csv(url7)

cars.columns = ['car_names', 'mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
cars.head()

In [None]:
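# Target specification, in the same column order as X: mpg, disp, hp, wt.
t = [15, 300, 160, 3.2]

# .ix was removed from pandas; select columns 1, 3, 4, 6 by position with .iloc.
X = cars.iloc[:, [1, 3, 4, 6]].values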
X[0:5]

In [None]:
nbrs = NearestNeighbors(n_neighbors=1).fit(X)

In [None]:
print(nbrs.kneighbors([t]))
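`kneighbors` returns the distance to the closest row and that row's position. With the default settings of `NearestNeighbors` the distance is Euclidean, so the lookup can be reproduced by hand as in the sketch below. The three rows in `toy_X` and the labels in `toy_names` are hypothetical and are not taken from the mtcars file.

```python
import numpy as np

# Hypothetical feature rows in the same order as above: mpg, disp, hp, wt.
toy_names = ["car_a", "car_b", "car_c"]
toy_X = np.array([
    [21.0, 160.0, 110.0, 2.6],
    [14.0, 360.0, 245.0, 3.6],
    [16.0, 280.0, 180.0, 3.4],
])
query = np.array([15.0, 300.0, 160.0, 3.2])  # same values as t above

# Euclidean distance from the query to every row, then pick the smallest.
distances = np.linalg.norm(toy_X - query, axis=1)
nearest = int(np.argmin(distances))
print(toy_names[nearest], distances[nearest])
```

Because the features are on very different scales, the distance is dominated by `disp` and `hp`. Standardizing the columns before fitting may be worth considering if `mpg` and `wt` should carry comparable weight.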

In [None]:
cars

## Segment 4 - Evaluating Recommendation Systems

In [None]:
from pandas import Series, DataFrame
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

This bank marketing dataset is open-sourced and available for download at the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing#).

It was originally created by: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

In [None]:
bank_full.head()

In [None]:
bank_full.info()

In [None]:
# .ix was removed from pandas; select by position with .iloc instead.
# Features: columns 18 through 36. Target: column 17 (y_binary).
X = bank_full.iloc[:, 18:37].values
y = bank_full.iloc[:, 17].values

In [None]:
LogReg = LogisticRegression()
LogReg.fit(X, y)
y_pred = LogReg.predict(X)

In [None]:
print(classification_report(y, y_pred))
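`classification_report` prints precision, recall, and F1 score for each class. As a small illustration of what those numbers mean, the sketch below computes them by hand for the positive class. The label arrays `y_true` and `y_hat` are hypothetical and unrelated to the bank data.

```python
import numpy as np

# Hypothetical true labels and predictions for a binary problem.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])

true_pos = int(np.sum((y_true == 1) & (y_hat == 1)))
false_pos = int(np.sum((y_true == 0) & (y_hat == 1)))
false_neg = int(np.sum((y_true == 1) & (y_hat == 0)))

precision = true_pos / (true_pos + false_pos)  # share of predicted positives that are correct
recall = true_pos / (true_pos + false_neg)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```

Note that the report above is computed on the same rows the model was trained on, so it is likely to look better than performance on unseen users. Holding out a test set would give a more realistic estimate.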