<a href="https://colab.research.google.com/github/PhDNoe/PI_ML_OPS/blob/main/recoSystem/MLOPS_RS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Noelia's ML_OPS Project!  👻👻

### Notebook: Recommendation System
![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)


### Mount Google Drive

![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%cd /content/drive/My\ Drive/DatosParaColab/MLOPS
%ls

/content/drive/My Drive/DatosParaColab/MLOPS
all_ratings.csv              model_platforlessm.pkl
all_together_with_score.csv  model_platform.pkl


![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

### Load all_ratings.csv 


![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

In [None]:
import pandas as pd

In [None]:
rating = pd.read_csv('all_ratings.csv')
rating.head()

Unnamed: 0.1,Unnamed: 0,userId,rating,timestamp,movieId
0,0,1,1.0,1425941529,as680
1,1,1,4.5,1425942435,ns2186
2,2,1,5.0,1425941523,hs2381
3,3,1,5.0,1425941546,ns3663
4,4,1,5.0,1425941556,as9500


![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

### We have two options
Case 1) The same movie on two different platforms is treated as two distinct, independent movies, so each platform copy gets its own average rating. <br>
Case 2) A movie is treated as the same item on every platform, so its ratings from all platforms are pooled into a single average.
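
To make the difference between the two cases concrete, here is a minimal sketch in plain Python. The ratings below are made up for illustration, and `mean_score_per_movie` is a hypothetical helper, not part of this notebook. It assumes, as in the `movieId` values shown above, that the first character of the id encodes the platform.

```python
from collections import defaultdict

# Hypothetical ratings: (userId, movieId, score). The first character of
# movieId is assumed to encode the platform, as in 'as680' or 'ns680'.
ratings = [
    (1, "as680", 2.0),
    (2, "as680", 3.0),
    (3, "ns680", 5.0),
    (4, "hs2381", 4.0),
]

def mean_score_per_movie(rows, strip_platform):
    """Average score per movie id, optionally dropping the platform prefix."""
    totals = defaultdict(lambda: [0.0, 0])
    for _user, movie_id, score in rows:
        key = movie_id[1:] if strip_platform else movie_id
        totals[key][0] += score
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

case_1 = mean_score_per_movie(ratings, strip_platform=False)  # one entry per platform copy
case_2 = mean_score_per_movie(ratings, strip_platform=True)   # platform copies pooled together
```

In case 1 the ids `as680` and `ns680` keep separate averages, while in case 2 their three ratings are pooled under `s680`.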

![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)


In [None]:
# Case 1: rating keeps the platform prefix, so the same movie on different platforms is a different item
rating.drop(columns=["Unnamed: 0"], inplace=True)
rating.rename(columns={'rating':'score'}, inplace=True)

# Case 2: rating2 drops the one-character platform prefix, so a movie is the same item on every platform
rating2 = rating.copy()
# Slice off only the first character; x.replace(x[0], "") would remove every occurrence of it
rating2['movieId'] = rating2['movieId'].str[1:]

In [None]:
rating2.head()

Unnamed: 0,userId,score,timestamp,movieId
0,1,1.0,1425941529,s680
1,1,4.5,1425942435,s2186
2,1,5.0,1425941523,s2381
3,1,5.0,1425941546,s3663
4,1,5.0,1425941556,s9500



![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
### Install scikit-surprise


![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

In [None]:
!pip install scikit-surprise

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

### Load dataset. Case 1

![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

In [None]:

from surprise import Dataset, Reader
from surprise import SVD
from surprise.model_selection import train_test_split
from surprise.model_selection import cross_validate

# Build a Surprise dataset from the rating dataframe (scores on a 0-5 scale)
reader = Reader(rating_scale=(0, 5))
data_platform = Dataset.load_from_df(rating[['userId', 'movieId', 'score']], reader)




![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
### Train on rating dataframe (case 1)

![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

In [None]:
train_set_p, test_set_p = train_test_split(data_platform, test_size = 0.25)
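
The split above is random, so the test-set RMSE changes slightly from run to run. The sketch below shows the idea of a seeded 75/25 split in plain Python. `split_ratings` and its `seed` argument are hypothetical names used only for illustration; Surprise's `train_test_split` accepts a `random_state` argument for the same purpose.

```python
import random

def split_ratings(rows, test_size=0.25, seed=42):
    """Shuffle a copy of rows with a fixed seed and split off a test fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    # The first n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (userId, movieId, score) rows.
rows = [(user, f"s{item}", 3.0) for user in range(4) for item in range(5)]
train_rows, test_rows = split_ratings(rows)
```

Calling `split_ratings(rows)` again with the same seed returns the same two lists, which makes results comparable between runs.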


In [None]:
model_platform = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.4)
model_platform.fit(train_set_p)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f056bb50fd0>

---
### Predict

---

In [None]:
userId=124378
movieId = 'hs2381'
predicted_score_ss = model_platform.predict(userId, movieId).est

predicted_score_ss

3.627841631886107
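
For intuition about where the number above comes from: as documented for Surprise's `SVD`, the estimate is the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors, clipped to the rating scale. The sketch below reproduces that formula with made-up values; `svd_estimate` is a hypothetical helper, not a Surprise function.

```python
def svd_estimate(global_mean, user_bias, item_bias, user_factors, item_factors,
                 rating_scale=(0, 5)):
    """Biased matrix-factorization estimate: mu + b_u + b_i + q_i . p_u, clipped."""
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + dot
    low, high = rating_scale
    return min(high, max(low, raw))

# Hypothetical learned values for one (user, item) pair.
known = svd_estimate(3.5, 0.2, -0.1, [0.1, -0.3], [0.4, 0.2])

# A user and item unseen during training contribute no bias and no factors,
# so the estimate falls back to the global mean.
unknown = svd_estimate(3.5, 0.0, 0.0, [], [])
```

This also explains why a prediction for a user or movie that is missing from the training set tends to sit close to the overall mean score.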

---
### Save model

----

In [None]:
import pickle

with open('model_platform.pkl', 'wb') as f:
    pickle.dump(model_platform, f)
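
A quick way to check that saving works is a save-and-load round trip. The sketch below uses a plain dictionary as a stand-in for the fitted model and a temporary directory instead of Drive; `model_stub` and the file name are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model; any picklable object behaves the same way.
model_stub = {"n_factors": 50, "n_epochs": 20, "lr_all": 0.005, "reg_all": 0.4}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_stub.pkl")  # hypothetical file name
    # Write the object to disk in binary mode.
    with open(path, "wb") as f:
        pickle.dump(model_stub, f, protocol=pickle.HIGHEST_PROTOCOL)
    # Read it back; the result is an equal but separate object.
    with open(path, "rb") as f:
        restored = pickle.load(f)
```

Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.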

---
### RMSE across the entire test set

---

In [None]:
from surprise import accuracy
# Compute predictions on the test set
predictions = model_platform.test(test_set_p)

# Compute the model's error (RMSE)
accuracy.rmse(predictions)

RMSE: 0.9701


0.9700930783606576
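
RMSE is the square root of the mean squared difference between the true and the estimated scores. The sketch below computes it by hand on made-up pairs; `rmse` is a hypothetical helper that mirrors what `accuracy.rmse` reports, not the library function itself.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_score, estimated_score) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, estimated) scores: two exact hits and two misses by one point.
pairs = [(5.0, 4.0), (3.0, 3.0), (1.0, 2.0), (4.0, 4.0)]
error = rmse(pairs)
```

An RMSE of about 0.97 therefore means the predictions are typically off by roughly one point on the 0 to 5 scale, with large misses weighted more heavily than small ones.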

---
### Load the model (if needed)

---

In [None]:
import pickle

# Load the model from the file
with open('model_platform.pkl', 'rb') as f:
    model_platform = pickle.load(f)

---
### Cross-validation on 2 folds (slow to run)

---

In [None]:
model_cv = SVD(n_factors=50)
x = cross_validate(model_cv, data_platform, measures=['RMSE', 'MAE'], cv=2, verbose=True)
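
With `cv=2` the data is divided into two folds, and each fold serves once as the test set while the other is used for training. The sketch below shows that partitioning on row indices only. `k_fold_indices` is a hypothetical helper, and unlike a real run it does not shuffle the rows first.

```python
def k_fold_indices(n_rows, n_splits=2):
    """Yield (train_indices, test_indices) for each of n_splits contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // n_splits + (1 if i < n_rows % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(5, n_splits=2))
```

Each fold requires a full fit and a full evaluation, so the total cost grows with the number of folds, which is why only two are used here.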

![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

### Import garbage collector 

![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

In [None]:
import gc


In [None]:
gc.collect()

0
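
`gc.collect()` returns the number of unreachable objects it found, so the `0` above means there were no reference cycles to clean up. Objects that are still referenced by a name are not freed until that name is deleted. The sketch below illustrates this with a small reference cycle; the `Node` class is hypothetical and only serves the example.

```python
import gc

class Node:
    """Tiny object used to build a reference cycle."""
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a   # cycle: reference counting alone cannot free these
del a, b                  # drop the names; the cycle keeps both objects alive
collected = gc.collect()  # the cycle detector finds and frees them
```

In the same spirit, deleting large objects that are no longer needed before collecting is what actually releases their memory.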

![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
### Load dataset. Case 2

![divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

In [None]:
data_platformless = Dataset.load_from_df(rating2[['userId', 'movieId', 'score']], reader)


In [None]:
train_set_pss, test_set_pss = train_test_split(data_platformless, test_size = 0.25)
model_platformless = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.4)


In [None]:
model_platformless.fit(train_set_pss)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f6eb20d85b0>

---
### Prediction

---

In [None]:
userId=124378
movieId = 's2381'
predicted_score_ss = model_platformless.predict(userId, movieId).est

predicted_score_ss

3.62119327658048

---
### Cross-validation on 2 folds

---

In [None]:
model_cv2 = SVD(n_factors=50)
x2 = cross_validate(model_cv2, data_platformless, measures=['RMSE', 'MAE'], cv=2, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 2 split(s).

                  Fold 1  Fold 2  Mean    Std     
RMSE (testset)    0.9920  0.9917  0.9918  0.0002  
MAE (testset)     0.7698  0.7698  0.7698  0.0000  
Fit time          106.39  120.31  113.35  6.96    
Test time         113.99  131.03  122.51  8.52    


----

### Save model

---

In [None]:
import pickle

with open('model_platformless.pkl', 'wb') as f:
    pickle.dump(model_platformless, f)

---
### Predictions across the entire test dataset

---

In [None]:

predictions_ss = model_platformless.test(test_set_pss)

# Compute the model's error (RMSE)
accuracy.rmse(predictions_ss)

RMSE: 0.9709


0.9709165599562262

---
### Load the model

---

In [None]:
# Load the model from the file
with open('model_platformless.pkl', 'rb') as f:
    model_platformless = pickle.load(f)

---
