## Code Samples

### Predictor sample

```python
from predict import main as predict

# User and movie to predict for, and the MovieLens dataset to load
userID = 1
movie_ID = 3
dataset_name = 'ml-latest-small'

predict([userID, movie_ID, dataset_name])
```

```
Predicted rating for movie "Grumpier Old Men (1995)" and user with userId=1: 4.036883202686046 (rounded: 4.0)
```
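The notebook does not show how the rounded value is obtained. Ratings in `ml-latest-small` use half-star steps from 0.5 to 5.0, so one plausible rule is to snap the raw prediction to the nearest half star and clip it to that range. The helper `round_to_half_star` below is a hypothetical sketch of that assumption, not the project's code:

```python
def round_to_half_star(prediction: float) -> float:
    """Hypothetical helper: snap a raw prediction to the 0.5-5.0 half-star scale."""
    snapped = round(prediction * 2) / 2   # nearest multiple of 0.5
    return min(5.0, max(0.5, snapped))    # clip to the valid rating range


print(round_to_half_star(4.036883202686046))  # 4.0
```

Python's `round()` sends exact ties to the even integer, so a raw value of 3.25 snaps to 3.0 rather than 3.5; `math.floor(x * 2 + 0.5) / 2` would always round ties up instead.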


### Dataset sample
Let's validate the result above with the `get_movie_by_id` method of our `MovieLensDataset` class, checking that movie 3 is indeed "Grumpier Old Men (1995)".

```python
from dataset import MovieLensDataset

dataset = MovieLensDataset(dataset_name)
dataset.get_movie_by_id(movie_ID)
```

```
Movie(id=3, title='Grumpier Old Men (1995)', genres=['Comedy', 'Romance'])
```
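The `Movie(...)` output suggests a small record type built from one row of `movies.csv`. A minimal sketch, assuming the movies table is a pandas DataFrame indexed by `movieId` and that `Movie` is a dataclass; the real `MovieLensDataset` may be implemented differently, and the toy table holds only two rows:

```python
from dataclasses import dataclass, field
from typing import List

import pandas as pd


@dataclass
class Movie:
    id: int
    title: str
    genres: List[str] = field(default_factory=list)


def get_movie_by_id(movies: pd.DataFrame, movie_id: int) -> Movie:
    """Hypothetical sketch: build a Movie from a movies table indexed by movieId."""
    row = movies.loc[movie_id]  # raises KeyError if the id is unknown
    # Genres are stored as a single pipe-separated string, e.g. "Comedy|Romance".
    return Movie(id=movie_id, title=row["title"], genres=row["genres"].split("|"))


# Toy excerpt of movies.csv, indexed by movieId.
movies = pd.DataFrame(
    {
        "title": ["Toy Story (1995)", "Grumpier Old Men (1995)"],
        "genres": ["Adventure|Animation|Children|Comedy|Fantasy", "Comedy|Romance"],
    },
    index=pd.Index([1, 3], name="movieId"),
)
print(get_movie_by_id(movies, 3))
# Movie(id=3, title='Grumpier Old Men (1995)', genres=['Comedy', 'Romance'])
```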

### Baseline predictor sample

```python
from baseline import BaselinePredictor

baseline = BaselinePredictor()
baseline.fit(dataset)
baseline.predict(userID, movie_ID)
```

```
3.2596153846153846
```
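The notebook does not show what `BaselinePredictor` computes. A common baseline in recommender systems predicts the global mean rating mu plus a user bias b_u and a movie bias b_i. The sketch below implements that formula with simple shrinkage; the class name `BiasBaseline` and the `reg` parameter are hypothetical, and this is not necessarily what `BaselinePredictor` does:

```python
import pandas as pd


class BiasBaseline:
    """Hypothetical mu + b_u + b_i baseline; not the project's BaselinePredictor."""

    def __init__(self, reg: float = 10.0):
        # Shrinkage: pulls biases of users/movies with few ratings toward 0.
        self.reg = reg

    def fit(self, ratings: pd.DataFrame) -> "BiasBaseline":
        """Expects columns userId, movieId and rating."""
        self.mu = ratings["rating"].mean()  # global mean rating
        dev = ratings["rating"] - self.mu
        # Movie bias: damped mean deviation of the movie's ratings from mu.
        by_movie = dev.groupby(ratings["movieId"])
        self.b_item = by_movie.sum() / (by_movie.count() + self.reg)
        # User bias: damped mean deviation left after removing the movie bias.
        resid = dev - ratings["movieId"].map(self.b_item)
        by_user = resid.groupby(ratings["userId"])
        self.b_user = by_user.sum() / (by_user.count() + self.reg)
        return self

    def predict(self, user_id: int, movie_id: int) -> float:
        # Unseen users or movies get a bias of 0, i.e. fall back toward mu.
        return float(self.mu + self.b_user.get(user_id, 0.0) + self.b_item.get(movie_id, 0.0))


# Toy ratings, for illustration only.
toy = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [1, 3, 1, 3, 1],
    "rating": [4.0, 4.0, 3.0, 2.0, 5.0],
})
toy_baseline = BiasBaseline(reg=0.0).fit(toy)
print(toy_baseline.predict(1, 3))
```

With `reg=0.0` the biases are plain mean deviations, and the toy call predicts 3.6 + 0.5 - 0.6 = 3.5 (up to floating-point rounding). A positive `reg` shrinks the biases toward zero, which keeps predictions for sparsely rated users and movies close to the global mean.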

Since we are evaluating predictions, it helps to compare them with the rating the user actually gave. Let's look it up.

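```python
# Look up the rating user 1 actually gave movie 3
ratings = dataset.get_ratings()
ratings = ratings.reset_index()  # move the index into a regular column so it can be filtered on
result = ratings[(ratings['userId'] == userID) & (ratings['movieId'] == movie_ID)]
print(result)
```

```
   userId  movieId  rating  timestamp
1       1        3     4.0  964981247
```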
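For repeated checks like this, the filter can be wrapped in a small helper that returns a plain number. A sketch, assuming (as the `reset_index()` call suggests) that `get_ratings()` returns a DataFrame indexed by `userId` with `movieId` and `rating` columns; `get_actual_rating` is a hypothetical name and the data is toy data:

```python
from typing import Optional

import pandas as pd


def get_actual_rating(ratings: pd.DataFrame, user_id: int, movie_id: int) -> Optional[float]:
    """Hypothetical helper: the rating user_id gave movie_id, or None if there is none."""
    flat = ratings.reset_index()  # assumes userId is the index, as in the cell above
    match = flat[(flat["userId"] == user_id) & (flat["movieId"] == movie_id)]
    if match.empty:
        return None
    return float(match["rating"].iloc[0])


# Toy ratings indexed by userId, for illustration only.
toy_ratings = pd.DataFrame(
    {"movieId": [10, 20], "rating": [3.5, 4.0]},
    index=pd.Index([7, 7], name="userId"),
)
print(get_actual_rating(toy_ratings, 7, 20))  # 4.0
print(get_actual_rating(toy_ratings, 7, 30))  # None
```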


As we can see, in this case the baseline gives a prediction close to the mean rating (around 3.5), whereas the predictor returns a more accurate result (4.04 against the actual 4.0), even though it lies substantially further from the mean. We expect this behaviour to occur often, given how our algorithm is constructed.
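The accuracy comparison can be checked directly from the numbers printed above, by taking the absolute error of each prediction against the actual rating of 4.0:

```python
actual = 4.0  # rating user 1 gave movie 3
predictions = {
    "predictor": 4.036883202686046,
    "baseline": 3.2596153846153846,
}
for name, value in predictions.items():
    # Absolute error of this prediction against the actual rating.
    print(f"{name}: absolute error = {abs(value - actual):.3f}")
```

This prints 0.037 for the predictor and 0.740 for the baseline. A single user-movie pair is only an illustration; an average such as RMSE over a held-out set of ratings would be needed to compare the two models in general.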

### Preprocessing
Finally, let's demonstrate the `MovieLensDatasetPreprocessor` class.

```python
from preprocessing import MovieLensDatasetPreprocessor

preprocessor = MovieLensDatasetPreprocessor()
print(dataset.get_movies().head())
preprocessor.fit(dataset)
```

```
                                      title  \
movieId                                       
1                          Toy Story (1995)   
2                            Jumanji (1995)   
3                   Grumpier Old Men (1995)   
4                  Waiting to Exhale (1995)   
5        Father of the Bride Part II (1995)   

                                              genres  
movieId                                               
1        Adventure|Animation|Children|Comedy|Fantasy  
2                         Adventure|Children|Fantasy  
3                                     Comedy|Romance  
4                               Comedy|Drama|Romance  
5                                             Comedy  
```


```python
ohe_movies = preprocessor.movies_ohe()
ohe_movies.head()
```

| movieId | Action | Adventure | Animation | Children | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | IMAX | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
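One straightforward way to produce such a table from the pipe-separated `genres` column is pandas' `Series.str.get_dummies(sep="|")`. The sketch below applies it to the five movies printed earlier; whether `movies_ohe` works this way is an assumption:

```python
import pandas as pd

# The five movies printed earlier, indexed by movieId.
movies = pd.DataFrame(
    {
        "title": [
            "Toy Story (1995)",
            "Jumanji (1995)",
            "Grumpier Old Men (1995)",
            "Waiting to Exhale (1995)",
            "Father of the Bride Part II (1995)",
        ],
        "genres": [
            "Adventure|Animation|Children|Comedy|Fantasy",
            "Adventure|Children|Fantasy",
            "Comedy|Romance",
            "Comedy|Drama|Romance",
            "Comedy",
        ],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="movieId"),
)

# Split each genres string on "|" and create one 0/1 column per genre (sorted by name).
ohe = movies["genres"].str.get_dummies(sep="|")
print(ohe)
```

On this five-row excerpt only the seven genres that occur get columns; run over the whole `movies.csv`, the same call yields one column for every distinct genre string in the file, including a placeholder such as `(no genres listed)` if present, which one may want to drop.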


As we can see above, `movies_ohe` works as intended: each movie has a 1 in exactly the columns of its listed genres. Now let's move on to preprocessing the ratings:

```python
preprocessed_ratings = preprocessor.preprocess_ratings()
print(preprocessed_ratings.head())
```

```
        Action  Adventure  Animation  Children  Comedy  Crime  Documentary  \
userId                                                                       
1        False       True       True      True    True  False        False   
1        False      False      False     False    True  False        False   
1         True      False      False     False   False   True        False   
1        False      False      False     False   False  False        False   
1        False      False      False     False   False   True        False   

        Drama  Fantasy  Film-Noir  ...  rating_0.5  rating_1.0  rating_1.5  \
userId                             ...                                       
1       False     True      False  ...       False       False       False   
1       False    False      False  ...       False       False       False   
1       False    False      False  ...       False       False       False   
1       False    False      False  ...       False       False 
```

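As you can see, the `timestamp` column has been dropped and the rating has been one-hot encoded into `rating_0.5`, `rating_1.0` and similar columns, which sit alongside the genre columns of the rated movie.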
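A minimal pandas sketch of such a step on toy data: it assumes the preprocessor drops `timestamp`, joins each rating with the genre flags of the rated movie, and encodes `rating` with `pd.get_dummies(..., prefix="rating")`. This illustrates the technique only; it is not `MovieLensDatasetPreprocessor`'s actual code, and the toy frames are made up:

```python
import pandas as pd

# Toy ratings (indexed by userId) and a toy genre one-hot table (indexed by movieId).
ratings = pd.DataFrame(
    {"movieId": [10, 20], "rating": [4.0, 2.5], "timestamp": [0, 0]},
    index=pd.Index([7, 7], name="userId"),
)
genres_ohe = pd.DataFrame(
    {"Adventure": [True, False], "Comedy": [True, True], "Romance": [False, True]},
    index=pd.Index([10, 20], name="movieId"),
)

# 1. Drop the timestamp column.
features = ratings.drop(columns="timestamp")
# 2. Attach the genre flags of each rated movie (ratings.movieId -> genres_ohe index).
features = features.join(genres_ohe, on="movieId")
# 3. One-hot encode the rating; only values that occur get a rating_<value> column.
features = pd.get_dummies(features, columns=["rating"], prefix="rating")
print(features)
```

If a fixed set of rating columns is needed regardless of which values occur in a given split, converting `rating` to a `pd.Categorical` with all ten half-star levels before calling `get_dummies` makes it emit a column for every level.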