# Movie Recommendations

# Notebook 1: Introduction

## Purpose

With so many movies available, and so much information about them online, it can be hard for viewers to choose a film they will enjoy. Recommendation systems address this problem by suggesting movies a viewer is likely to enjoy.

The following notebooks implement several recommendation systems using the latest small dataset from [MovieLens](https://grouplens.org/datasets/movielens/). The raw data was not uploaded to GitHub, in line with the Usage License: "The user may not redistribute the data without separate permission."

The content-based and collaborative filtering recommendation systems were selected so that they can all be evaluated on the same metric, the **mean average precision at k (MAP@k)**.
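
As a rough illustration of the metric, the sketch below computes MAP@k in plain Python. It is not necessarily the implementation used in the later notebooks: the function names, the dictionary-based inputs, and the choice to normalise by `min(len(relevant), k)` are assumptions made for this example (other normalisation conventions exist), and each recommendation list is assumed to contain no duplicates.

```python
from typing import Dict, Hashable, Sequence, Set


def average_precision_at_k(
    recommended: Sequence[Hashable], relevant: Set[Hashable], k: int
) -> float:
    """AP@k for one user: sum of precision@i at each hit rank i, normalised."""
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumed convention: divide by the best achievable number of hits in the top k.
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(
    recommendations: Dict[Hashable, Sequence[Hashable]],
    relevant_items: Dict[Hashable, Set[Hashable]],
    k: int,
) -> float:
    """MAP@k: AP@k averaged over every user that has a recommendation list."""
    if not recommendations:
        return 0.0
    total = sum(
        average_precision_at_k(recs, relevant_items.get(user, set()), k)
        for user, recs in recommendations.items()
    )
    return total / len(recommendations)


# Hypothetical users and movie ids, for illustration only.
example_recommendations = {"u1": [1, 2, 3, 4], "u2": [5, 6, 7]}
example_relevant = {"u1": {1, 3}, "u2": {7}}
example_map = mean_average_precision_at_k(example_recommendations, example_relevant, k=3)
```

Here `u1` has hits at ranks 1 and 3, giving AP@3 = (1/1 + 2/3) / 2, and `u2` has a single hit at rank 3, giving AP@3 = 1/3; `example_map` is the mean of the two.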

# Notebook Structure 

### Notebook 1: Introduction 

### Notebook 2: Initial Setup

**1. Setup** - importing libraries and functions

**2. Load Data** - loading datasets

**3. Clean Data** - removing duplicates 

### Notebook 3: Exploratory Data Analysis (EDA)

**4. Exploratory Data Analysis (EDA)** - data visualisations

### Notebook 4: Data Preparation

**5. Data Preparation** - preparing data for modelling purposes

### Notebook 5: Recommendation Systems

**6. Modelling** - building models on the training set

### Notebook 6: Evaluation

**7. Evaluation** - evaluating models with MAP@k using the test set

**8. Conclusion** - concluding remarks

# About the Data

Summary
=======

This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from [MovieLens](http://movielens.org), a movie recommendation service. It contains approximately 100,000 ratings and 3,600 tag applications across roughly 9,000 movies, made by about 600 users, and was last updated in September 2018. The dataset was sourced [here](https://grouplens.org/datasets/movielens/) and was not uploaded to GitHub, in line with the Usage License: "*The user may not redistribute the data without separate permission.*"


Content and Use of Files
========================

Formatting and Encoding
-----------------------

The dataset files are written as [comma-separated values](http://en.wikipedia.org/wiki/Comma-separated_values) files with a single header row. Columns that contain commas (`,`) are escaped using double-quotes (`"`). These files are encoded as UTF-8. 
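
A minimal sketch of how this format can be read with Python's standard `csv` module is shown below. The two sample rows are written for this example (they are not copied from the dataset), and an in-memory string stands in for the real file.

```python
import csv
import io

# Illustrative rows in the documented layout; the second title contains a
# comma, so the whole field is wrapped in double-quotes.
sample = (
    "movieId,title,genres\n"
    "1,Example Movie (1999),Adventure|Comedy\n"
    '2,"Example Movie, The (2000)",Drama\n'
)

# For a real file, open it as UTF-8 and let the csv module handle newlines:
#     open("movies.csv", encoding="utf-8", newline="")
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

for row in rows:
    print(row["movieId"], "|", row["title"], "|", row["genres"])
```

`csv.DictReader` uses the single header row for the field names and unquotes the escaped title, so the embedded comma does not split the column.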


Movie Ids
---------

These movie ids are consistent with those used on the MovieLens web site (e.g., id `1` corresponds to the URL <https://movielens.org/movies/1>). 


User Ids
--------

MovieLens users in `ratings.csv` were selected at random for inclusion. Their ids have been anonymized. 


Movies Data File Structure (movies.csv)
---------------------------------------

Movie information is contained in the file `movies.csv`. Each line of this file after the header row represents one movie, and has the following format:

    movieId,title,genres

Movie titles are entered manually or imported from <https://www.themoviedb.org/>, and include the year of release in parentheses. Errors and inconsistencies may exist in these titles.
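
Because the release year is embedded in the title, it usually has to be separated out before it can be used as a feature. The helper below is one possible approach, not the method used in the later notebooks; `split_title_and_year` is a hypothetical name, and titles without a trailing four-digit year are returned unchanged with `None` as the year, since inconsistencies may exist.

```python
import re
from typing import Optional, Tuple

# A four-digit year in parentheses at the very end of the title.
_YEAR_PATTERN = re.compile(r"^(?P<name>.*?)\s*\((?P<year>\d{4})\)\s*$")


def split_title_and_year(title: str) -> Tuple[str, Optional[int]]:
    """Return (title without year, year), or (title, None) if no year is found."""
    match = _YEAR_PATTERN.match(title)
    if match is None:
        return title.strip(), None
    return match.group("name"), int(match.group("year"))


print(split_title_and_year("Toy Story (1995)"))
print(split_title_and_year("Title Without Year"))
```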

Genres are a pipe-separated list, and are selected from the following:

* Action
* Adventure
* Animation
* Children's
* Comedy
* Crime
* Documentary
* Drama
* Fantasy
* Film-Noir
* Horror
* Musical
* Mystery
* Romance
* Sci-Fi
* Thriller
* War
* Western
* (no genres listed)
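
The pipe-separated genres field can be turned into a list with a small helper such as the one below. This is a sketch: `parse_genres` is a hypothetical name, and mapping the `(no genres listed)` placeholder to an empty list is a design choice made for this example.

```python
from typing import List

NO_GENRES = "(no genres listed)"


def parse_genres(genres_field: str) -> List[str]:
    """Split a pipe-separated genres field into a list of genre names."""
    field = genres_field.strip()
    if not field or field == NO_GENRES:
        # Treat the placeholder (or an empty field) as "no genres".
        return []
    return [genre.strip() for genre in field.split("|") if genre.strip()]


print(parse_genres("Adventure|Animation|Comedy"))
print(parse_genres("(no genres listed)"))
```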

Ratings Data File Structure (ratings.csv)
-----------------------------------------

All ratings are contained in the file `ratings.csv`. Each line of this file after the header row represents one rating of one movie by one user, and has the following format:

    userId,movieId,rating,timestamp

The lines within this file are ordered first by userId, then, within user, by movieId.

Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars).
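
A quick sanity check for this scale might look like the following sketch. `is_valid_rating` is a hypothetical helper, not part of the dataset tooling; it relies on the fact that doubling a valid rating gives a whole number between 1 and 10.

```python
def is_valid_rating(rating: float) -> bool:
    """True if rating is one of 0.5, 1.0, 1.5, ..., 5.0."""
    doubled = float(rating) * 2
    # Half-star steps become whole numbers once doubled; NaN and infinity fail here.
    return doubled.is_integer() and 1 <= doubled <= 10


print([r for r in (0.0, 0.5, 3.3, 4.5, 5.0, 5.5) if is_valid_rating(r)])
```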

Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.
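
Since the timestamps are Unix epoch seconds, they can be converted to timezone-aware datetimes with the standard library. The function name `timestamp_to_utc` below is an assumption made for this example.

```python
from datetime import datetime, timezone


def timestamp_to_utc(timestamp: int) -> datetime:
    """Convert seconds since 1970-01-01 00:00 UTC into an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# One day (86,400 seconds) after the epoch.
print(timestamp_to_utc(86400).isoformat())
```

Passing `tz=timezone.utc` explicitly avoids an implicit conversion to the local timezone of the machine running the notebook.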