# Unsupervised Learning Solution
### EDSA - Movie Recommendation 2022 
#### AI Incorporated - Team 4 EDSA

© Explore Data Science Academy

<img src="https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2205222%2Fbca114f2e4f6b9b46f2cc76527d7401e%2FImage_header.png?generation=1593773828621598&alt=media" width=100%/> 

<a id="cont"></a>
## Table of Contents

<a href=#one>1. Introduction</a>

    1.1 Overview
    1.2 Problem Statement
    1.3 Model Versioning with COMET
    1.4 Required Installations
       
<a href=#two>2. Import Packages</a>

<a href=#three>3. Collect Data</a>

<a href=#four>4. Exploratory Data Analysis (EDA)</a>
    
    4.1 

<a href=#five>5. Data Processing</a>
    
    5.1 

<a href=#six>6. Feature Engineering</a>

<a href=#seven>7. Modelling</a>
    
    7.1 

<a href=#eight>8. Model Performance</a>
    
    8.1 

<a href=#nine>9. Saving & Exporting Model</a>
    
    9.1 Export Test Prediction as CSV
    9.2 Log to Comet

<a href=#ten>10. Conclusion</a>

<a href=#eleven>11. Recommendation</a>

<a href=#ref>Reference Document Links</a>

<a id="one"></a>
## 1. INTRODUCTION
<a href=#cont>Back to Table of Contents</a>

#### 1.1 Overview

In today’s technology-driven world, recommender systems are socially and economically critical for ensuring that individuals can make optimised choices about the content they engage with daily. One application where this is especially true is movie recommendation, where intelligent algorithms can help viewers find great titles among tens of thousands of options.

We will therefore construct a recommendation algorithm based on `Content` and `Collaborative` filtering, capable of accurately predicting how a user will rate a movie they have not yet viewed, based on their historical preferences.
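To make the collaborative half of that idea concrete, here is a minimal sketch of user-based collaborative filtering: a user's rating for an unseen movie is estimated as a similarity-weighted average of the ratings given to that movie by the most similar users. The dense users x movies matrix with `0` meaning "not rated", the function name `predict_rating` and the neighbourhood size `k` are illustrative assumptions, not the final model of this notebook.

```python
import numpy as np


def predict_rating(ratings: np.ndarray, user: int, item: int, k: int = 2) -> float:
    """Estimate ratings[user, item] from the k most similar users who rated the item.

    `ratings` is a dense users x movies matrix in which 0 means "not rated"
    (an assumption made for this sketch).
    """
    rated = ratings > 0
    target = ratings[user]
    neighbours = []
    for other in range(ratings.shape[0]):
        if other == user or not rated[other, item]:
            continue
        # Cosine similarity over the movies both users have rated.
        common = rated[user] & rated[other]
        if not common.any():
            continue
        a, b = target[common], ratings[other, common]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        neighbours.append((sim, ratings[other, item]))
    if not neighbours:
        # No usable neighbours: fall back to the user's own mean rating.
        return float(target[rated[user]].mean())
    # Keep the k most similar neighbours and average their ratings, weighted by similarity.
    top = sorted(neighbours, reverse=True)[:k]
    weights = np.array([sim for sim, _ in top])
    values = np.array([value for _, value in top])
    return float(weights @ values / weights.sum())


ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 4, 1],
    [1, 1, 2, 5],
    [5, 5, 5, 2],
])
print(round(predict_rating(ratings, user=0, item=2), 2))  # 4.5
```

Raw cosine similarity keeps the sketch short; mean-centring each user's ratings first (Pearson correlation) is a common refinement, because it corrects for users who rate everything high or everything low.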

<img src="https://miro.medium.com/max/1400/1*odvftNNQJp3O6vpwmZsJOQ.png" width=100%/> 

#### 1.2 Problem Statement

In this era of artificial intelligence, everything from government to education to the ever-growing entertainment industry now relies on AI technology to boost efficiency. Organisations that use recommender systems aim to increase sales and deliveries through highly personalised offers and an enhanced customer experience.

Providing an accurate and robust solution to this challenge therefore has immense economic potential: users of the system receive personalised recommendations, which generates platform affinity for the streaming services that best facilitate their audience's viewing.

#### 1.3 Model Versioning with COMET

To begin with, we will use Comet, a tool for model versioning and experimentation that records the parameters and conditions of each experiment, allowing us to reproduce our results or go back to a previous version of an experiment.

In [1]:
# Install Comet
# !pip install comet_ml

In [2]:
# Import Comet package
from comet_ml import Experiment

# Set the API key: get your api_key, project_name and workspace
# from your Comet project folder, then uncomment the lines below.

# experiment = Experiment(
#     api_key="__________",
#     project_name="____________",
#     workspace="________",
# )

#### 1.4 Required Installations

So, let's proceed.

<a id="two"></a>
## 2. IMPORT PACKAGES
<a href=#cont>Back to Table of Contents</a>

In this section, we import the libraries (collections of modules grouped by functionality) needed for this analysis and modelling. We will require:

* Libraries for data manipulation, such as Pandas and NumPy.
* Libraries for data visualisation, such as Matplotlib and Seaborn.
* Libraries for data preparation, feature selection, model building, performance calculation and more.

See the in-line comments below for the purpose of each import.

In [6]:
""" 
For a seamless run, 
All required libraries will be imported here. 
"""

# Libraries for data loading, data manipulation and data visualisation
import pandas as pd                                                   # for loading CSV data
import numpy as np                                                    # Used for mathematical operations
import matplotlib.pyplot as plt                                       # for Graphical Representation                                                 
import seaborn as sns                                                 # for specialized plots
import re                                                             # for handling Regular expressions                                                           
sns.set()                                                             # set plot style

# Libraries for data preparation


# Libraries for Feature Extraction


# Libraries for Model Building


# Libraries for timing and calculating performance metrics
import time

# Libraries to Save/Restore Models
import pickle

import warnings
warnings.filterwarnings('ignore')
%matplotlib inline 

<a id="three"></a>
## 3. Collect Data
<a href=#cont>Back to Table of Contents</a>

This dataset consists of several million 5-star ratings obtained from users of the online MovieLens movie recommendation service. The MovieLens dataset has long been used by industry and academic researchers to improve the performance of explicitly-based recommender systems, and now you get to as well!

Data available on [Kaggle](https://www.kaggle.com/competitions/edsa-movie-recommendation-2022/data)

We'll be using a special version of the MovieLens dataset which is enriched with additional data, and resampled for fair evaluation purposes.

In [3]:
# Load Data


In [4]:
# View Dataset
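A minimal loading sketch, assuming the ratings file uses the MovieLens-style columns `userId`, `movieId`, `rating` and `timestamp` (Unix seconds). A tiny in-memory CSV stands in for the real file so the cell runs anywhere; the helper name `load_ratings` and the file name `train.csv` in the comment are assumptions about the Kaggle download.

```python
import io

import pandas as pd


def load_ratings(source) -> pd.DataFrame:
    """Read a ratings CSV and convert the Unix-seconds timestamp column to datetimes."""
    df = pd.read_csv(source)
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
    return df


# Stand-in for the real file, e.g. load_ratings("train.csv") (file name assumed).
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,296,5.0,1147880044\n"
    "1,306,3.5,1147868817\n"
    "2,296,4.0,1147868828\n"
)
train_df = load_ratings(sample)
print(train_df.head())
```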


<a id="four"></a>
## 4. Exploratory Data Analysis (EDA)
<a href=#cont>Back to Table of Contents</a>

This phase looks to understand patterns in our data, pinpoint any outliers and identify relationships between variables. We will carry out data analysis, descriptive statistics and data visualisations, all in a bid to understand the data well enough to refine it properly during feature engineering, in preparation for modelling.

Let's proceed with the EDA.

In [5]:
# Overview of data
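For the overview, a few descriptive statistics already say a lot about a ratings table: how many users and movies there are, how the ratings are distributed, and how sparse the user x movie matrix is. The helper name `ratings_overview` is hypothetical and the toy data is illustrative only.

```python
import pandas as pd


def ratings_overview(df: pd.DataFrame) -> dict:
    """Summarise a long-format ratings table with userId, movieId and rating columns."""
    n_users = df["userId"].nunique()
    n_movies = df["movieId"].nunique()
    return {
        "n_ratings": len(df),
        "n_users": n_users,
        "n_movies": n_movies,
        # Share of the user x movie matrix with no rating (assumes one row per user/movie pair).
        "sparsity": 1 - len(df) / (n_users * n_movies),
        "rating_counts": df["rating"].value_counts().sort_index().to_dict(),
    }


ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 3.5, 5.0, 4.0, 3.5],
})
print(ratings_overview(ratings))
```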


In [6]:
# check to confirm count of null values
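A short sketch of the null-value check: it reports the count and percentage of missing values, listing only the columns that actually have gaps. The helper name `missing_report` and the toy `movies` table are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for the columns that have any."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return report[report["missing"] > 0]


# Toy movies table with one missing title and one missing genre list.
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A (1995)", "Movie B (1995)", None],
    "genres": ["Adventure|Animation", np.nan, "Comedy"],
})
print(missing_report(movies))
```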


<a id="five"></a>
## 5. DATA PROCESSING
<a href=#cont>Back to Table of Contents</a>

The primary function of data processing is to provide faster access to higher-quality data, which is key to any successful model building and also enables more valuable insights to be extracted. Therefore, let's commence processing and cleaning our data.
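As one possible set of cleaning steps, the sketch below drops rows with missing IDs or ratings, keeps only the most recent rating when a user rated the same movie twice, and downcasts the columns to smaller dtypes to cut memory use. The function name `clean_ratings`, the presence of a `timestamp` column and the keep-the-latest rule are assumptions, not steps prescribed by the dataset.

```python
import pandas as pd


def clean_ratings(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, de-duplicate user/movie pairs and shrink dtypes."""
    out = df.dropna(subset=["userId", "movieId", "rating"])
    # If a user rated the same movie more than once, keep the latest rating.
    out = out.sort_values("timestamp").drop_duplicates(subset=["userId", "movieId"], keep="last")
    # Smaller dtypes reduce memory use on millions of rows.
    out = out.astype({"userId": "int32", "movieId": "int32", "rating": "float32"})
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 10, 10, 20],
    "rating": [3.0, 4.5, 5.0, None],
    "timestamp": [100, 200, 150, 120],
})
clean = clean_ratings(raw)
print(clean)
```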

<a id="six"></a>
## 6. Feature Engineering
<a href=#cont>Back to Table of Contents</a>

Feature engineering prepares our data so that the selected, structured features can be served to the models on request.
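For content-based filtering, one simple feature set is a one-hot encoding of each movie's genres. This sketch assumes genres are stored as a pipe-separated string (as in MovieLens, e.g. `"Adventure|Animation"`); the helper name `genre_features` is hypothetical.

```python
import pandas as pd


def genre_features(movies: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode pipe-separated genres into one column per genre, indexed by movieId."""
    dummies = movies["genres"].fillna("").str.get_dummies(sep="|")
    dummies.index = pd.Index(movies["movieId"], name="movieId")
    return dummies


movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": ["Adventure|Animation|Comedy", "Adventure|Fantasy", "Comedy|Romance"],
})
features = genre_features(movies)
print(features)
```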

<a id="seven"></a>
## 7. Modeling
<a href=#cont>Back to Table of Contents</a>

There are several modelling techniques we can apply to predict ratings, and of the many options, we will be trying:

`1. Base Model: ` Description..... 

`2. Model 2: ` Description..... 

`3. Model 3: ` Description..... 
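A minimal content-based filtering sketch: each movie is represented by its genre vector, and the movies whose vectors have the highest cosine similarity to a given movie are recommended. The function name `similar_movies` and the toy genre matrix are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def similar_movies(features: pd.DataFrame, movie_id, top_n: int = 2) -> list:
    """Return the movieIds whose genre vectors are most cosine-similar to movie_id."""
    X = features.to_numpy(dtype=float)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for movies with no genres
    X = X / norms[:, None]
    target = X[features.index.get_loc(movie_id)]
    # Dot products of unit vectors are cosine similarities; drop the movie itself.
    scores = pd.Series(X @ target, index=features.index).drop(movie_id)
    return scores.sort_values(ascending=False).head(top_n).index.tolist()


features = pd.DataFrame(
    [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]],
    index=pd.Index([1, 2, 3, 4], name="movieId"),
    columns=["Adventure", "Animation", "Comedy", "Romance"],
)
print(similar_movies(features, 1))  # [2, 4]
```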


<a id="eight"></a>
## 8. MODEL PERFORMANCE
<a href=#cont>Back to Table of Contents</a>

Here we review the individual performance of our machine learning models and explain why we choose one over another.

#### 8.1 Model Testing Scores


In [7]:
# Test scores 
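Rating prediction is commonly scored with root mean squared error (RMSE), the square root of the mean squared difference between actual and predicted ratings, which penalises large misses more heavily than small ones. Assuming RMSE is the metric used for the test scores, a minimal implementation is:

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between actual and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


print(rmse([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))
```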


#### 8.2 Best Model Resolution

From the Result we can conclusively say ..............

#### 8.3 Hypertune Best Model

For every model, our goal is to minimise the error, that is, to make predictions as close as possible to the actual values. This is the main objective of hyperparameter tuning.

In [8]:
# Creating a pipeline for the gridsearch

# set parameter grid
# param_grid = { }  

# hyper_best_model = Pipeline([ ])

# # Fitting data to the best model
# hyper_best_model.fit(X_train, y_train) 

# # predicting the fit on validation set
# y_pred = hyper_best_model.predict(X_val)  
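The commented cell above sets up a scikit-learn style pipeline for grid search. As a library-free illustration of the same idea, the sketch below swaps in a simple baseline model, predicting a rating as the global mean plus a regularised user bias and movie bias, and tries each candidate regularisation strength on a validation set, keeping the one with the lowest RMSE. The model choice, the names `fit_baseline`, `predict_baseline` and `grid_search`, and the candidate values are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def fit_baseline(train: pd.DataFrame, reg: float):
    """Fit r_hat = mu + b_user + b_item, shrinking each bias towards 0 by `reg`."""
    mu = train["rating"].mean()
    dev = train["rating"] - mu
    b_item = dev.groupby(train["movieId"]).sum() / (reg + train.groupby("movieId").size())
    resid = dev - train["movieId"].map(b_item)
    b_user = resid.groupby(train["userId"]).sum() / (reg + train.groupby("userId").size())
    return mu, b_user, b_item


def predict_baseline(model, df: pd.DataFrame) -> np.ndarray:
    mu, b_user, b_item = model
    # Users or movies not seen in training fall back to a zero bias.
    return (mu + df["userId"].map(b_user).fillna(0) + df["movieId"].map(b_item).fillna(0)).to_numpy()


def grid_search(train: pd.DataFrame, val: pd.DataFrame, regs):
    """Return (best_reg, {reg: validation RMSE}) over the candidate strengths."""
    scores = {}
    for reg in regs:
        pred = predict_baseline(fit_baseline(train, reg), val)
        scores[reg] = float(np.sqrt(np.mean((val["rating"].to_numpy() - pred) ** 2)))
    best = min(scores, key=scores.get)
    return best, scores


train = pd.DataFrame({"userId": [1, 1, 2, 2, 3, 3],
                      "movieId": [10, 20, 10, 30, 20, 30],
                      "rating": [4, 3, 5, 4, 2, 3]})
val = pd.DataFrame({"userId": [1, 2, 3], "movieId": [30, 20, 10], "rating": [3.5, 4.0, 2.5]})
best_reg, scores = grid_search(train, val, [0.0, 1.0, 5.0, 25.0])
print(best_reg, scores)
```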

In [9]:
# Best Model Score in %


#### 8.4 Best Model Visual Evaluation
Measuring effectiveness and performance is exactly what a confusion matrix is designed to do, so we will present it both in numbers and in visuals.

As you can see above........
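A confusion matrix needs discrete classes, while predicted ratings are continuous. One way to build a confusion-style table, assuming ratings come in half-star steps from 0.5 to 5.0, is to snap each prediction to the nearest half star and cross-tabulate it against the actual rating. The helper name `rating_confusion` is hypothetical; note that NumPy rounds exact halves to the nearest even value.

```python
import numpy as np
import pandas as pd


def rating_confusion(y_true, y_pred) -> pd.DataFrame:
    """Cross-tabulate actual ratings against predictions snapped to half stars (0.5 to 5.0)."""
    actual = pd.Series(np.asarray(y_true, dtype=float), name="actual")
    snapped = np.clip(np.round(np.asarray(y_pred, dtype=float) * 2) / 2, 0.5, 5.0)
    predicted = pd.Series(snapped, name="predicted")
    return pd.crosstab(actual, predicted)


table = rating_confusion([4.0, 4.0, 3.5, 5.0], [3.9, 4.2, 3.4, 4.6])
print(table)
```

With Seaborn already imported, `sns.heatmap(table, annot=True, fmt="d")` would render the same table visually.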

<a id="nine"></a>
## 9. SAVING & EXPORTING MODEL
<a href=#cont>Back to Table of Contents</a>

Now, we don't want our models just sitting in a Jupyter notebook. At this point, let's save the results in the desired format (preferably CSV) and the model as a pickle file. These will be used for deployment, to solve real-life scenarios.

#### 9.1 Export Test Prediction as CSV

In [12]:
# Uncomment to run (select the lines and press CTRL + /)

# X_test = vect.transform(df_test['message']) 
# test_pred = hyper_best_model.predict(X_test)
# save_df = pd.DataFrame(test_pred, columns=['sentiment'])
# output=pd.DataFrame({'tweetid': df_test['tweetid']})
# submission=output.join(save_df)

In [13]:
# submission.to_csv('submission_hyper.csv', index=False)
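A sketch of building a rating submission from (userId, movieId) pairs and their predicted ratings. The `Id` column built as `<userId>_<movieId>` and the column name `rating` are assumptions about the competition's submission format; confirm them against the sample submission file on the Kaggle data page. The helper name `build_submission` is hypothetical.

```python
import pandas as pd


def build_submission(test_df: pd.DataFrame, predictions) -> pd.DataFrame:
    """Pair each (userId, movieId) row with its predicted rating (Id format assumed)."""
    ids = test_df["userId"].astype(str) + "_" + test_df["movieId"].astype(str)
    return pd.DataFrame({"Id": ids, "rating": predictions})


test_df = pd.DataFrame({"userId": [1, 1, 2], "movieId": [10, 20, 30]})
submission = build_submission(test_df, [3.8, 4.1, 2.9])
csv_text = submission.to_csv(index=False)  # pass a path instead to write the file
print(csv_text)
```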

In [14]:
# Export Model as pickle file

# model_save_path = "1.0_EDSA_T4_Content_Recommender.pkl"
# with open(model_save_path,'wb') as file:
#     pickle.dump(hyper_best_model, file)
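For deployment, the saved file has to be loaded back. This sketch shows a save/load round trip with `pickle`, writing to a temporary directory so it leaves nothing behind; a plain dict stands in for the trained model, and the helper names `save_model` and `load_model` are hypothetical. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile


def save_model(model, path: str) -> None:
    with open(path, "wb") as file:
        pickle.dump(model, file)


def load_model(path: str):
    # Only load pickles from trusted sources: unpickling can run arbitrary code.
    with open(path, "rb") as file:
        return pickle.load(file)


# Any picklable object works; a dict stands in for the trained model here.
model = {"mu": 3.5, "reg": 5.0}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "1.0_EDSA_T4_Content_Recommender.pkl")
    save_model(model, path)
    restored = load_model(path)
print(restored == model)  # True
```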

#### 9.2 Log to Comet

In [15]:
# Create dictionaries for the data we want to log
# These are defined manually because the parameters applied to our model are the best ones found by the grid search.

# params ={"random_state": 42,
#          "model_type ": "LogisticRegression",
#          "Bag of words": "Count_Vectorizer",
#          "C": 0.1,
#          "min_df": 1,
#          "max_df": 0.9,
#          "n_grams": "(1, 2)"
#         }

# nb_metrics ={"Accuracy": metrics.accuracy_score(y_val_en, y_pred),
#              "recall": metrics.recall_score(y_val_en, y_pred, average='micro'),
#              "f1": metrics.f1_score(y_val_en, y_pred, average='micro'),
#             }

# confusionmatrix = confusion_matrix(y_val_en, y_pred)

In [16]:
# Log parameters and results
# experiment.log_parameters(params)
# experiment.log_metrics(nb_metrics)
# experiment.log_notebook('5.0 Advance_Classification_Notebook.ipynb', overwrite=False)
# experiment.log_confusion_matrix(labels=["News", "pro", "Neutral","Anti"], matrix=confusionmatrix)

NOTE: When using Comet within a Jupyter notebook, the experiment must be ended on completion, as illustrated below.

In [17]:
# STRICTLY FOR LOCAL JUPYTER NOTEBOOKS
# experiment.end()

Kindly [go to the Streamlit webpage](http://) to test-run our model on the web.

<a id="ten"></a>
## 10. Conclusion
<a href=#cont>Back to Table of Contents</a>

   In summary .......

<a id="eleven"></a>
## 11. Recommendation
<a href=#cont>Back to Table of Contents</a>

.......

<a id="ref"></a>
## Reference Links
<a href=#cont>Back to Table of Contents</a>

* [EXPLORE Data Science Academy Resources](https://explore-datascience.net/)
* [GitHub Collab Ref.](https://github.com/)
* [Comet Collab Ref](https://www.comet.ml/) 
* [Kaggle Collab Ref](https://www.kaggle.com/competitions/edsa-movie-recommendation-2022/overview)
* [How to Build a Movie Recommendation System by Ramya Vidiyala](https://towardsdatascience.com/how-to-build-a-movie-recommendation-system-67e321339109)