# Movie Recommender System - Thirteen Analytics Consulting

© Explore Data Science Academy

---

<img alt="Movie recommendations" src="https://github.com/Explore-AI/unsupervised-predict-streamlit-template/raw/master/resources/imgs/Image_header.png">

## Introduction

In the 21st century, recommender systems have become socially and economically important for optimizing everyday decisions. The idea is to make choosing seamless by increasing the chance that a user ends up with the best option in each scenario. Applications of recommender systems are prominent in areas such as email spam classification and movie title selection. In this project, the goal is to develop a model that helps viewers find the best title among thousands of movie titles.

The key objectives of the project include:
* Develop a recommendation algorithm based on content or collaborative filtering.
* Predict how an individual will rate a movie based on historical movie selection patterns.
* Demonstrate that the model can underpin an economically robust system that gives users a personalized recommendation system for daily use.
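
To make the rating-prediction objective concrete, here is a minimal sketch of a bias-based baseline: the predicted rating is the global mean plus a per-user and a per-movie offset. This is not the model developed in this project and not a full collaborative-filtering algorithm. The function names `fit_baseline` and `predict` are hypothetical, and the 0.5 to 5.0 clipping range is an assumption about the rating scale.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Fit a simple bias baseline.

    ratings: iterable of (user_id, movie_id, rating) tuples.
    Returns (global_mean, user_bias, item_bias).
    """
    ratings = list(ratings)
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # User bias: average deviation of a user's ratings from the global mean.
    user_dev = defaultdict(list)
    for user, _, rating in ratings:
        user_dev[user].append(rating - global_mean)
    user_bias = {u: sum(d) / len(d) for u, d in user_dev.items()}

    # Item bias: average residual left after removing mean and user bias.
    item_dev = defaultdict(list)
    for user, movie, rating in ratings:
        item_dev[movie].append(rating - global_mean - user_bias[user])
    item_bias = {m: sum(d) / len(d) for m, d in item_dev.items()}

    return global_mean, user_bias, item_bias


def predict(model, user_id, movie_id, low=0.5, high=5.0):
    """Predict a rating; unseen users or movies fall back to a zero bias."""
    global_mean, user_bias, item_bias = model
    raw = global_mean + user_bias.get(user_id, 0.0) + item_bias.get(movie_id, 0.0)
    # Clip to the assumed rating scale.
    return min(high, max(low, raw))


# Toy data for illustration only (not taken from the project dataset).
toy_ratings = [(1, 10, 4.0), (1, 20, 5.0), (2, 10, 2.0)]
model = fit_baseline(toy_ratings)
estimate = predict(model, user_id=2, movie_id=20)
```

A baseline like this is mainly useful as a reference point: a content-based or collaborative model should be expected to beat it on held-out ratings.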

## Comet API

Comet is an experiment-tracking platform that works like version control for machine learning models. Each project is made up of experiments that record the libraries used and the actions taken to reach the final model.

In [1]:
# import comet_ml at the top of your file
from comet_ml import Experiment

# Create an experiment with your api key
experiment = Experiment(
    api_key="pRMxFxeNwUPYOyNGu3BPn91GY",
    project_name="movie-recommender",
    workspace="jakam",
)

COMET INFO: Experiment is live on comet.ml https://www.comet.ml/jakam/movie-recommender/96f8a94e0c0c499ca855c1bae991b792
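
Because the API key grants access to the workspace, one option is to keep it out of the notebook and read it from the environment instead. The sketch below assumes the key has been exported as an environment variable named `COMET_API_KEY`; the helper `comet_settings` is hypothetical and only builds the keyword arguments used in the cell above.

```python
import os


def comet_settings(env=os.environ):
    """Build Experiment keyword arguments without hard-coding the key.

    Assumes the key is stored in the COMET_API_KEY environment variable.
    """
    api_key = env.get("COMET_API_KEY")
    if not api_key:
        raise RuntimeError("Set COMET_API_KEY before creating the experiment")
    return {
        "api_key": api_key,
        "project_name": "movie-recommender",
        "workspace": "jakam",
    }


# Usage in the notebook (requires comet_ml to be installed):
# from comet_ml import Experiment
# experiment = Experiment(**comet_settings())
```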



## Importing Packages

In [2]:
# Standard Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
%matplotlib inline
from datetime import datetime
import re
#import preprocessor as p

import surprise
from sklearn.metrics.pairwise import cosine_similarity 
from sklearn.feature_extraction.text import TfidfVectorizer
from surprise import Dataset
from surprise import Reader

# Libraries used during sorting procedures.
import operator # <-- Convenient item retrieval during iteration
import heapq # <-- Efficient sorting of large lists


import warnings
warnings.filterwarnings('ignore')

## Loading Data

The project uses a special version of the MovieLens dataset, which has been enriched with additional data and resampled for fair evaluation.

**Data Source**<br>
The data is maintained by the GroupLens research group in the Department of Computer Science and Engineering at the University of Minnesota. The additional data was legally scraped from [IMDB](https://www.imdb.com/).

**Provided Files**
* genome_scores.csv - A score mapping the strength between movies and tag-related properties.
* genome_tags.csv - User-assigned tags for genome-related scores.
* imdb_data.csv - Additional movie metadata scraped from IMDB using the links.csv file.
* links.csv - File providing a mapping between a MovieLens ID and associated IMDB and TMDB IDs.
* sample_submission.csv - Sample of the submission format for the hackathon.
* tags.csv - User-assigned tags for the movies within the dataset.
* test.csv - The test split of the dataset. Contains user and movie IDs with no rating data.
* train.csv - The training split of the dataset. Contains user and movie IDs with associated rating data.
<br>

In [5]:
df_train=pd.read_csv('data/train.csv')
df_test=pd.read_csv('data/test.csv')
df_tag=pd.read_csv('data/tags.csv')
df_movies=pd.read_csv('data/movies.csv')
df_links=pd.read_csv('data/links.csv')
df_imdb=pd.read_csv('data/imdb_data.csv')
df_genome_tags=pd.read_csv('data/genome_tags.csv')
df_genome_scores=pd.read_csv('data/genome_scores.csv')

All the data required for the analysis has now been loaded, ready for preview and exploratory analysis.
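
The loading cell above assumes that all eight files sit in a `data/` folder next to the notebook. As a minimal sketch, a pre-flight check like the following can report which files are absent before `pd.read_csv` raises an error. The helper `missing_files` and the `EXPECTED_FILES` list are hypothetical names; the file names are copied from the loading cell.

```python
from pathlib import Path

# File names taken from the loading cell above.
EXPECTED_FILES = [
    "train.csv", "test.csv", "tags.csv", "movies.csv", "links.csv",
    "imdb_data.csv", "genome_tags.csv", "genome_scores.csv",
]


def missing_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in expected if not (base / name).is_file()]
```

Calling `missing_files()` returns an empty list when every file is in place.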

In [7]:
# Preview train data
df_train.head(10)

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 5163 | 57669 | 4.0 | 1518349992 |
| 1 | 106343 | 5 | 4.5 | 1206238739 |
| 2 | 146790 | 5459 | 5.0 | 1076215539 |
| 3 | 106362 | 32296 | 2.0 | 1423042565 |
| 4 | 9041 | 366 | 3.0 | 833375837 |
| 5 | 120949 | 81768 | 3.0 | 1289595242 |
| 6 | 19630 | 62049 | 4.0 | 1246729817 |
| 7 | 21066 | 2282 | 1.0 | 945785907 |
| 8 | 117563 | 120474 | 4.0 | 1515108225 |
| 9 | 144018 | 1997 | 5.0 | 1109967647 |
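
The `timestamp` values in the preview look like Unix epoch seconds. Assuming that interpretation (seconds since 1 January 1970, UTC), a minimal sketch for turning one of them into a readable date is shown below; the helper name `to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(timestamp_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)


# Timestamps of rows 0 and 4 in the preview above.
first_rating_time = to_utc(1518349992)
oldest_preview_time = to_utc(833375837)
print(first_rating_time.isoformat())
print(oldest_preview_time.isoformat())
```

Under the same assumption, the whole column could be converted in pandas with `pd.to_datetime(df_train['timestamp'], unit='s')`.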


### End Experiment

In [None]:
#experiment.end()