# Unsupervised Learning Predict Student Solution

© Explore Data Science Academy

---
### Honour Code

We, **Team 9**, confirm - by submitting this document - that the solutions in this notebook are a result of our own work and that we abide by the [EDSA honour code](https://drive.google.com/file/d/1QDCjGZJ8-FmJE3bZdIQNwnJyQKPhHZBn/view?usp=sharing).

Non-compliance with the honour code constitutes a material breach of contract.



### Predict Overview: EDSA Movie Recommendation 2022

The global movie industry is a multi-billion dollar industry. According to <a href="https://www.forbes.com/sites/bradadgate/2022/03/17/overview-of-the-entertainment-market-in-2021-coming-out-of-covid-19/?sh=2e3d5f94519c">Forbes</a>, the US entertainment (home and mobile) market generated a total revenue of 36.8 billion USD in 2021. This market consists of digital and physical (disc) sales as well as the theatrical market. The revenue represents a year-over-year increase of 14% and surpasses the previous record of 36.1 billion USD set in 2019.

When pay TV subscriptions are included, the revenue for the US entertainment market jumps to 133.5 billion USD, a slight drop from 2020 (133.7 billion USD). This indicates that pay TV subscriptions are the biggest revenue-generating facet of the movie industry.

Globally, the home, mobile and theatrical market totaled 99.7 billion USD in revenue in 2021; with pay TV subscriptions included, the entertainment market reached 328.2 billion USD. According to the 2021 report by The Motion Picture Association, there were 135 streaming video providers in the U.S. offering movies and television shows to viewers, with Netflix being a major player.

Providers of streaming services depend heavily on movie recommendation algorithms. The Netflix Recommendation Engine is the most successful of these algorithms. It’s so accurate that 80% of Netflix viewer activity is driven by personalised recommendations from the engine. It’s estimated that the Netflix Recommendation Engine saves Netflix <a href="https://www.lighthouselabs.ca/en/blog/how-netflix-uses-data-to-optimize-their-product#:~:text=The%20Netflix%20Recommendation%20Engine&text=It's%20so%20accurate%20that%2080,is%20driven%20by%20personalised%20recommendations.">over 1 billion USD per year</a>.

Providers of streaming services are in a race to optimize their movie recommendation algorithms so that they perform as well as, or even better than, the Netflix Recommendation Engine; developing a good movie recommendation system therefore comes with enormous economic gains. As a team, we intend to develop a recommendation algorithm, based on content or collaborative filtering, that is capable of accurately predicting how a user will rate a movie they have not yet viewed, based on their historical preferences. This will enable our algorithm to recommend movies that users are most likely to rate highly and want to watch.

<a id="cont"></a>

## Table of Contents

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Loading the Data</a>

<a href=#three>3. Exploratory Data Analysis (EDA)</a>

<a href=#four>4. Data Engineering</a>

<a href=#five>5. Modelling</a>

<a href=#six>6. Model Performance</a>

<a href=#seven>7. Model Explanations</a>

<a href=#eight>8. Comet</a>

<a id="one"></a>
## 1. Importing Packages
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Importing Packages ⚡ |
| :--------------------------- |
| In this section, we import and briefly discuss the libraries that will be used throughout our analysis and modelling. |

---

In [1]:
from comet_ml import Experiment

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import string
from wordcloud import WordCloud

<a id="two"></a>
## 2. Loading the Data
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Loading the data ⚡ |
| :--------------------------- |
| In this section, we load the data from the CSV files into DataFrames. |

---

In [None]:
#df = pd.read_csv("train.csv")
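
As a minimal sketch of the loading step, the helper below wraps `pd.read_csv` and fails early with a readable message when the file is missing. The helper name `load_csv` is hypothetical; only the file name `train.csv` comes from the cell above, and its location relative to the notebook is an assumption.

```python
from pathlib import Path

import pandas as pd


def load_csv(path, **read_kwargs):
    """Read one CSV file into a DataFrame (hypothetical helper, not part of the template)."""
    path = Path(path)
    if not path.is_file():
        # Fail early with the resolved path so a wrong working directory is easy to spot.
        raise FileNotFoundError(f"Expected a CSV file at {path.resolve()}")
    # Extra keyword arguments are passed straight through to pandas.
    return pd.read_csv(path, **read_kwargs)
```

With the data file next to the notebook, the commented-out line above would become `df = load_csv("train.csv")`.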

<a id="three"></a>
## 3. Exploratory Data Analysis (EDA)
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Exploratory data analysis ⚡ |
| :--------------------------- |
| In this section, we perform an in-depth analysis of all the variables in the DataFrame. |



Here we use various methods to take an in-depth look at our DataFrame. These methods include:
<ul>
<li>isnull()</li>
<li>info()</li>
<li>shape</li>
<li>WordCloud()</li>

</ul>
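
The methods listed above can be sketched on a small toy table. The column names (`userId`, `movieId`, `rating`, `title`) and the values are illustrative assumptions, not the real dataset, and `title_wordcloud` is a hypothetical helper that is only defined here, not called.

```python
import io

import numpy as np
import pandas as pd

# Toy ratings table; column names and values are made up for illustration.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, np.nan, 3.5, 5.0],
    "title": ["Toy Story", "Heat", "Toy Story", "Casino"],
})

# shape is an attribute: (number of rows, number of columns).
n_rows, n_cols = ratings.shape

# isnull() marks missing cells; summing counts them per column.
missing_per_column = ratings.isnull().sum()

# info() prints by default; passing a buffer captures the summary as text.
buffer = io.StringIO()
ratings.info(buf=buffer)
info_text = buffer.getvalue()


def title_wordcloud(titles):
    """Build a word cloud from an iterable of title strings (hypothetical helper)."""
    from wordcloud import WordCloud  # third-party package, imported lazily

    text = " ".join(titles)
    return WordCloud(width=800, height=400, background_color="white").generate(text)
```

In a notebook, the returned word cloud could be displayed with `plt.imshow(title_wordcloud(ratings["title"]), interpolation="bilinear")`.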

<a id="four"></a>
## 4. Data Engineering
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Data engineering ⚡ |
| :--------------------------- |
| In this section, we clean the dataset and possibly create new features - as identified in the EDA phase. |

---
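
A minimal sketch of what cleaning and feature creation could look like, assuming a ratings table with hypothetical columns `userId`, `movieId` and `rating`. The specific steps (dropping missing ratings, keeping the last duplicate, adding a per-user rating count) are illustrative choices, not findings from the EDA.

```python
import pandas as pd


def clean_ratings(df):
    """Drop unusable rows and add one simple feature (illustrative sketch)."""
    cleaned = (
        df.dropna(subset=["rating"])  # a row without a rating carries no signal
          .drop_duplicates(subset=["userId", "movieId"], keep="last")  # one rating per user/movie pair
          .reset_index(drop=True)
    )
    # New feature: how many movies each user has rated after cleaning.
    cleaned["user_rating_count"] = cleaned.groupby("userId")["rating"].transform("count")
    return cleaned
```

Keeping the last duplicate assumes later rows reflect a user's most recent opinion; if the data carries a timestamp column, sorting by it first would make that assumption explicit.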

<a id="five"></a>
## 5. Modelling
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Modelling ⚡ |
| :--------------------------- |
| In this section, we construct a recommendation algorithm based on content or collaborative filtering, capable of accurately predicting how a user will rate a movie they have not yet viewed, based on their historical preferences. |

---
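
To make the idea of collaborative filtering concrete, here is a minimal user-based sketch in NumPy. It is an illustration under simplifying assumptions, not the model built in this notebook: ratings live in a dense users x items array, `0` means "not rated", and similarity is plain cosine similarity over the raw rating vectors. The function name `predict_rating` is hypothetical.

```python
import numpy as np


def predict_rating(ratings, user, item):
    """Predict ratings[user, item] as a similarity-weighted mean of other users' ratings."""
    ratings = np.asarray(ratings, dtype=float)
    target = ratings[user]

    # Neighbours are the other users who have rated the target item.
    has_rated = ratings[:, item] > 0
    has_rated[user] = False
    if not has_rated.any():
        # Nobody else rated the item: fall back to the user's own mean rating.
        own = target[target > 0]
        return float(own.mean()) if own.size else 0.0

    neighbours = ratings[has_rated]
    # Cosine similarity between the target user and each neighbour.
    norms = np.linalg.norm(neighbours, axis=1) * np.linalg.norm(target)
    sims = np.divide(
        neighbours @ target, norms, out=np.zeros(len(neighbours)), where=norms > 0
    )
    if sims.sum() == 0:
        # No overlap with any neighbour: use their unweighted mean for the item.
        return float(neighbours[:, item].mean())
    return float(sims @ neighbours[:, item] / sims.sum())
```

For example, with `R = [[5, 4, 0], [5, 4, 2], [1, 1, 5]]`, calling `predict_rating(R, user=0, item=2)` leans towards the rating given by user 1, whose tastes are closer to user 0's. Treating unrated items as zeros biases the similarity; a fuller implementation would typically compare users only on co-rated items or mean-centre the ratings.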

<a id="six"></a>
## 6. Model Performance
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Model performance ⚡ |
| :--------------------------- |
| In this section, we compare the relative performance of the various trained ML models on a holdout dataset and comment on what model is the best and why. |

---
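
One common way to compare rating predictors on a holdout set is root mean squared error (RMSE). The notebook does not fix a metric at this point, so RMSE is an assumption here, and the holdout ratings, model names and predictions below are made-up illustrative values.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical holdout ratings and predictions from two made-up models.
holdout = [4.0, 3.0, 5.0, 2.0]
predictions = {
    "model_a": [3.5, 3.0, 4.5, 2.5],
    "model_b": [4.0, 2.0, 5.0, 3.0],
}

# Lower RMSE is better, so the best model is the one with the smallest score.
scores = {name: rmse(holdout, preds) for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)
```

RMSE penalises large errors more heavily than small ones, which is why `model_b`, with two errors of a full star, scores worse than `model_a`, with three errors of half a star.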

<a id="seven"></a>
## 7. Model Explanations
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Model explanation ⚡ |
| :--------------------------- |
| In this section, we discuss how the best performing model works in a simple way so that both technical and non-technical stakeholders can grasp the intuition behind the model's inner workings. |

---

<a id="eight"></a>
## 8. Comet
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Comet ⚡ |
| :--------------------------- |
| In this section, we use Comet to customize and combine our data, code, visualizations, and reports. |

---