These materials are from LinkedIn Learning ([Building a Recommendation System with Python Machine Learning & AI](https://www.linkedin.com/learning/building-a-recommendation-system-with-python-machine-learning-ai)).

# Chapter 1 - Simple Approaches to Recommender Systems

## Recommendations Based on Counts

These datasets are hosted on: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data
They were originally published by: Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys’11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.

```python
import pandas as pd
import numpy as np
```

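```python
frame = pd.read_csv('rating_final.csv')        # ratings: userID, placeID, rating, ...
cuisine = pd.read_csv('chefmozcuisine.csv')    # placeID -> Rcuisine
geodata = pd.read_csv('geoplaces2.csv')        # may need an explicit encoding, e.g. encoding='mbcs' (Windows-only codec)
```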

```python
# number of ratings each place received (the column keeps the name 'rating')
rating_count = pd.DataFrame(frame.groupby('placeID')['rating'].count())

rating_count.sort_values('rating', ascending=False).head()
```

| placeID | rating (count) |
|---|---|
| 135085 | 36 |
| 132825 | 32 |
| 135032 | 28 |
| 135052 | 25 |
| 132834 | 25 |
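A count measures how often a place is rated, not how well. A minimal sketch, on a small made-up ratings table in the same long format as `rating_final.csv` (all IDs and values are hypothetical), that reports each place's number of ratings next to its mean rating:

```python
import pandas as pd

# Hypothetical ratings in the long format of rating_final.csv (values are made up).
ratings = pd.DataFrame({
    'userID':  ['U1', 'U2', 'U3', 'U1', 'U2', 'U3', 'U4', 'U1'],
    'placeID': [101, 101, 101, 202, 202, 202, 202, 303],
    'rating':  [2, 1, 2, 0, 1, 0, 1, 2],
})

# Number of ratings and mean rating per place, in one groupby.
popularity = ratings.groupby('placeID')['rating'].agg(['count', 'mean'])

# Rank by count (popularity), breaking ties by mean rating.
ranked = popularity.sort_values(['count', 'mean'], ascending=False)
print(ranked)
```

Here place 202 tops the count ranking while having the lowest mean rating, which a count-only view would hide.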


```python
# the five most-rated places from the table above
most_rated_places = pd.DataFrame([135085, 132825, 135032, 135052, 132834], index=np.arange(5), columns=['placeID'])

summary = pd.merge(most_rated_places, cuisine, on='placeID')
summary
```

| placeID | Rcuisine |
|---|---|
| 135085 | Fast_Food |
| 132825 | Mexican |
| 135032 | Cafeteria |
| 135032 | Contemporary |
| 135052 | Bar |
| 135052 | Bar_Pub_Brewery |
| 132834 | Mexican |
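The merged table has seven rows for five places because `pd.merge` returns one row per matching (placeID, Rcuisine) pair, so places listed under several cuisines appear more than once. A minimal sketch, on hypothetical stand-ins for the two CSV files, that takes the top IDs straight from the counts instead of typing them in, then collapses the cuisines to one row per place:

```python
import pandas as pd

# Hypothetical stand-ins for rating_final.csv and chefmozcuisine.csv.
ratings = pd.DataFrame({
    'placeID': [1, 1, 1, 2, 2, 3],
    'rating':  [2, 1, 0, 2, 2, 1],
})
cuisine = pd.DataFrame({
    'placeID':  [1, 2, 2, 3],
    'Rcuisine': ['Mexican', 'Bar', 'Bar_Pub_Brewery', 'Cafeteria'],
})

# Take the two most-rated places directly from the counts.
counts = ratings.groupby('placeID')['rating'].count()
top_ids = counts.sort_values(ascending=False).head(2).index

# The merge yields one row per (place, cuisine) pair ...
summary = pd.merge(pd.DataFrame({'placeID': top_ids}), cuisine, on='placeID')
# ... so join each place's cuisines into a single string.
per_place = summary.groupby('placeID')['Rcuisine'].agg(lambda s: ', '.join(sorted(s)))
print(per_place)
```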


## Recommendations Based on Correlation

```python
# one row per user, one column per place; NaN where the user did not rate the place
places_crosstab = pd.pivot_table(data=frame, values='rating', index='userID', columns='placeID')
places_crosstab.head()
```

| userID | 132560 | 132561 | 132564 | 132572 | 132583 | 132584 | 132594 | 132608 | 132609 | 132613 | ... | 135080 | 135081 | 135082 | 135085 | 135086 | 135088 | 135104 | 135106 | 135108 | 135109 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U1001 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| U1002 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN |
| U1003 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| U1004 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN |
| U1005 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
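Most cells of this user-by-place matrix are `NaN`, because each user rates only a handful of places. The sparsity can be measured directly; a minimal sketch on a made-up ratings table (hypothetical IDs and values) using the same `pivot_table` call:

```python
import pandas as pd

# Hypothetical long-format ratings.
ratings = pd.DataFrame({
    'userID':  ['U1', 'U1', 'U2', 'U3', 'U3', 'U3'],
    'placeID': [101, 202, 202, 101, 303, 404],
    'rating':  [2, 1, 0, 1, 2, 2],
})

# Same reshaping as above: one row per user, one column per place.
crosstab = pd.pivot_table(data=ratings, values='rating', index='userID', columns='placeID')

# Fraction of user/place pairs that have no rating.
sparsity = crosstab.isna().to_numpy().mean()
print(crosstab.shape, f'{sparsity:.0%} missing')
```

Sparsity matters for the next step: the correlation between two places can only use the users who rated both.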


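```python
# Evaluating similarity based on correlation
# target: the most-rated place, 135085 (called "Tortas" in the variable names)
Tortas_ratings = places_crosstab[135085]
similar_to_Tortas = places_crosstab.corrwith(Tortas_ratings)

corr_Tortas = pd.DataFrame(similar_to_Tortas, columns=['PearsonR'])
corr_Tortas.dropna(inplace=True)   # drop places whose correlation is undefined (e.g. too few shared raters)
corr_Tortas.head()
```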



| placeID | PearsonR |
|---|---|
| 132572 | -0.428571 |
| 132723 | 0.301511 |
| 132754 | 0.930261 |
| 132825 | 0.700745 |
| 132834 | 0.814823 |
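`corrwith` computes each Pearson correlation only over users who rated both places, so a high value may rest on very few users. A common refinement, not part of the code above, is to count those shared raters and keep only correlations backed by enough of them. A minimal sketch on a made-up matrix; `MIN_RATINGS = 3` is an arbitrary threshold:

```python
import numpy as np
import pandas as pd

# Hypothetical user x place ratings (NaN = not rated).
crosstab = pd.DataFrame(
    {
        'A': [2, 1, 0, 2, 1],
        'B': [2, 1, 0, 2, np.nan],
        'C': [np.nan, np.nan, np.nan, 2, 0],
        'D': [0, 1, 2, 0, 1],
    },
    index=['U1', 'U2', 'U3', 'U4', 'U5'],
)
target = 'A'

# Pearson r between the target and every place, over pairwise-complete rows.
corr = crosstab.corrwith(crosstab[target]).to_frame('PearsonR')

# Shared raters: users who rated both the target and each place.
rated_target = crosstab[target].notna()
corr['overlap'] = crosstab[rated_target].notna().sum()

MIN_RATINGS = 3  # arbitrary threshold; tune it for the real data
similar = corr.drop(index=target)
similar = similar[similar['overlap'] >= MIN_RATINGS].sort_values('PearsonR', ascending=False)
print(similar)
```

Place C correlates perfectly with A, but only through two shared users, so the filter drops it.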


# Chapter 2 - Machine Learning Based Recommendation Systems

## Classification-based Collaborative Filtering Systems

### Logistic Regression as a Classifier

This bank marketing dataset is open-sourced and available for download at the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing#).

It was originally created by: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
# data: column 17 is the target, columns 18-36 are the dummy-coded features
bank_full = pd.read_csv('bank_full_w_dummy_vars.csv')
X = bank_full.iloc[:, 18:37].values   # positional selection of columns 18..36
y = bank_full.iloc[:, 17].values
# fit
LogReg = LogisticRegression()
LogReg.fit(X, y)
# predict: scikit-learn expects a 2-D input, i.e. a list of samples
new_user = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = LogReg.predict([new_user])
```
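For recommending, the predicted probability is often more useful than the hard 0/1 label from `predict`, because it lets you rank candidates by how likely they are to respond. A minimal sketch on synthetic binary features (the data, the number of features, and the ranking step are assumptions for illustration, not the bank dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic 0/1 features; the label mostly follows feature 0 (with a little noise).
X = rng.integers(0, 2, size=(200, 5))
y = (X[:, 0] + 0.2 * rng.standard_normal(200) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

# Candidate users to score; predict/predict_proba expect a 2-D array (one row per user).
candidates = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
])
proba_yes = model.predict_proba(candidates)[:, 1]   # P(y = 1) for each candidate

# Indices of candidates ordered from most to least likely to respond.
ranking = np.argsort(-proba_yes)
print(proba_yes.round(3), ranking)
```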


## Model-based Collaborative Filtering Systems

### SVD Matrix Factorization

The MovieLens dataset was collected by the GroupLens Research Project at the University of Minnesota. You can download the dataset for this demonstration at: https://grouplens.org/datasets/movielens/100k/

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD
# data
columns = ['user_id', 'item_id', 'rating', 'timestamp']
frame = pd.read_csv('ml-100k/u.data', sep='\t', names=columns)
columns = ['item_id', 'movie title', 'release date', 'video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
          'Animation', 'Childrens', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
          'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

movies = pd.read_csv('ml-100k/u.item', sep='|', names=columns, encoding='latin-1')
movie_names = movies[['item_id', 'movie title']]
combined_movies_data = pd.merge(frame, movie_names, on='item_id')
# most-rated movies (for inspection only)
combined_movies_data.groupby('item_id')['rating'].count().sort_values(ascending=False).head()
# users x movies matrix; unrated entries are filled with 0
rating_crosstab = combined_movies_data.pivot_table(values='rating', index='user_id', columns='movie title', fill_value=0)
# fit: transpose so each row is a movie, then compress to 12 latent features
X = rating_crosstab.T
SVD = TruncatedSVD(n_components=12, random_state=17)
resultant_matrix = SVD.fit_transform(X)
corr_mat = np.corrcoef(resultant_matrix)   # movie x movie correlation matrix
# predict: movies whose latent features correlate strongly with Star Wars
movie_names = rating_crosstab.columns
movies_list = list(movie_names)
star_wars = movies_list.index('Star Wars (1977)')
corr_star_wars = corr_mat[star_wars]
# < 1.0 is meant to exclude Star Wars itself
list(movie_names[(corr_star_wars < 1.0) & (corr_star_wars > 0.9)])
```
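The pipeline above treats each movie as a row of user ratings, compresses those rows to 12 latent features with `TruncatedSVD`, and correlates movies in the reduced space. A minimal sketch of the same steps on a tiny made-up ratings matrix, in which the hypothetical "Movie A" and "Movie B" received identical ratings and should therefore come out as each other's closest match; `n_components=4` is an arbitrary choice for this toy size:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Hypothetical users x movies ratings (0 = not rated); Movie A and Movie B are identical.
ratings = pd.DataFrame(
    [[5, 5, 0, 1, 0, 2],
     [4, 4, 1, 0, 2, 0],
     [0, 0, 5, 4, 0, 1],
     [1, 1, 4, 5, 1, 0],
     [5, 5, 0, 0, 3, 1],
     [0, 0, 2, 1, 5, 4],
     [2, 2, 0, 3, 4, 5],
     [0, 0, 1, 2, 0, 3]],
    columns=['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E', 'Movie F'],
)

# Transpose so each movie is a row described by its users' ratings.
X = ratings.T
svd = TruncatedSVD(n_components=4, random_state=17)
latent = svd.fit_transform(X)      # shape: (n_movies, 4)
corr_mat = np.corrcoef(latent)     # movie x movie correlations in latent space

titles = list(ratings.columns)
target = titles.index('Movie A')
scores = pd.Series(corr_mat[target], index=titles).drop('Movie A')
print(scores.sort_values(ascending=False))
```

Keep `n_components` at 3 or more: with only two latent values per movie, every Pearson correlation between two non-constant rows is exactly +1 or -1.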

## Content-Based Recommender Systems

### Nearest Neighbors Algorithm

*mtcars dataset source:* 
Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.

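```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors
# data
cars = pd.read_csv('mtcars.csv')
t = [15, 300, 160, 3.2]                  # query car: mpg, disp, hp, wt
X = cars.iloc[:, [1, 3, 4, 6]].values    # mpg, disp, hp, wt (assuming column 0 holds the car names)
# fit
nbrs = NearestNeighbors(n_neighbors=1).fit(X)
# returns (distances, indices) of the closest car
nbrs.kneighbors([t])
```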
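`NearestNeighbors` uses Euclidean distance by default, so on the raw columns `disp` and `hp` (hundreds) swamp `mpg` and `wt` (tens and units). Standardizing the features first is a common remedy, though not what the code above does. A minimal sketch on a made-up four-car catalogue, comparing the raw and the standardized nearest neighbor for the same query:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical catalogue; columns are mpg, disp, hp, wt (the order used for t above).
cars = np.array([
    [30.0, 305.0, 165.0, 1.6],   # close on disp/hp only
    [15.5, 330.0, 175.0, 3.2],   # close on mpg/wt, moderately close on disp/hp
    [21.0, 160.0, 110.0, 2.6],
    [33.0,  75.0,  52.0, 1.8],
])
query = np.array([[15.0, 300.0, 160.0, 3.2]])

# Raw Euclidean distance: disp and hp dominate.
raw_idx = NearestNeighbors(n_neighbors=1).fit(cars).kneighbors(query)[1][0][0]

# Standardize each column (zero mean, unit variance) using the catalogue's statistics.
scaler = StandardScaler().fit(cars)
nbrs = NearestNeighbors(n_neighbors=1).fit(scaler.transform(cars))
scaled_idx = nbrs.kneighbors(scaler.transform(query))[1][0][0]

print(raw_idx, scaled_idx)  # → 0 1
```

Unscaled, car 0 wins because its `disp` and `hp` almost match, even though its `mpg` and `wt` are far off; after scaling, car 1, which is reasonably close on all four features, is the nearest.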

# Chapter 3 - Evaluating Recommendation Systems

This chapter reuses the bank marketing dataset from Chapter 2 (Moro et al., 2014; UCI Machine Learning Repository).

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

bank_full = pd.read_csv('bank_full_w_dummy_vars.csv')
X = bank_full.iloc[:, 18:37].values   # dummy-coded features (columns 18-36)
y = bank_full.iloc[:, 17].values      # target

LogReg = LogisticRegression()
LogReg.fit(X, y)
y_pred = LogReg.predict(X)   # predictions on the training data itself (in-sample)

print(classification_report(y, y_pred))
```

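```
             precision    recall  f1-score   support

          0       0.90      0.99      0.94     39922
          1       0.67      0.17      0.27      5289

avg / total       0.87      0.89      0.86     45211
```

Recall for class 1 is only 0.17: the model finds few of the 5,289 positive cases, and the support-weighted averages in the last row mostly reflect the much larger class 0.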
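The report above scores the model on the same rows it was trained on, which tends to overstate performance. A minimal sketch of a held-out evaluation with `train_test_split`, on synthetic imbalanced data; the 30% test size, `stratify=y`, and the data-generating coefficients are assumptions for illustration. It also recomputes class-1 precision, TP / (TP + FP), and recall, TP / (TP + FN), by hand to show what the report's columns mean:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic, imbalanced binary problem driven mainly by features 0 and 1.
X = rng.integers(0, 2, size=(2000, 6))
logit = -3.0 + 2.5 * X[:, 0] + 1.0 * X[:, 1]
y = (rng.random(2000) < 1 / (1 + np.exp(-logit))).astype(int)

# Hold out 30% of the rows; stratify keeps the class ratio equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)   # scored on rows the model never saw
print(classification_report(y_test, y_pred, zero_division=0))

# Class-1 precision and recall by hand (max(..., 1) avoids division by zero).
tp = np.sum((y_pred == 1) & (y_test == 1))
fp = np.sum((y_pred == 1) & (y_test == 0))
fn = np.sum((y_pred == 0) & (y_test == 1))
precision = tp / max(tp + fp, 1)
recall = tp / max(tp + fn, 1)
print(precision, recall)
```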

