# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objective

At the end of this experiment, you will be able to:

* Recommend movies to the users

In [None]:
#@title Experiment Walkthrough Video

from IPython.display import HTML

HTML("""<video width="800" height="400" controls>
  <source src="https://cdn.talentsprint.com/talentsprint/archives/sc/aiml/aiml_labs_blr/movie_recommendation_system_knn.mp4" type="video/mp4">
</video>
""")

## Dataset

### Description

The dataset chosen for this experiment is a subset of the original MovieLens dataset.

Consider the problem of recommending movies to users. You have M users and N movies.
Now, you want to predict whether a given test user $x$ will watch movie $y$.

User $x$ has seen some movies in the past and not others. You will use $x$'s movie-watching history as the feature for the recommendation system.

Let us use KNN to find the K nearest neighbour users (users with similar taste) to $x$, and make predictions based on their entries for movie $y$.

A user has either seen a movie (1) or not seen it (0). You can represent this as a matrix of size M×N (M rows and N columns). In this notebook, the matrix is stored as a dictionary whose keys are (userId, movieId) pairs.

Each element of the matrix is either zero or one. If the (u, m) entry in this matrix is 1, then the $u^{th}$ user has seen the movie $m$.
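
The dictionary representation can be sketched on a tiny example. The names `toy_pairs`, `toy_seen` and `toy_matrix` and the values in them are hypothetical and only serve as an illustration; they are not part of the dataset.

```python
# Toy data (hypothetical): each pair is (userId, movieId) for a movie the user has seen.
toy_pairs = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 1)]
toy_users, toy_movies = 3, 3  # M = 3 users, N = 3 movies

# Dictionary view: only the seen pairs are stored, each with value 1.
toy_seen = {pair: 1 for pair in toy_pairs}

# Matrix view: the full M x N binary matrix, built by looking up every (u, m) entry.
# A missing key means "not seen", so it maps to 0.
toy_matrix = [[toy_seen.get((u, m), 0) for m in range(toy_movies)]
              for u in range(toy_users)]

for row in toy_matrix:
    print(row)
```

Each printed row is one user's N-dimensional binary vector.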

#### Training set
M×N binary matrix indicating seen/not-seen.
#### Test set
L test cases with $(x, y)$ pairs. $x$ is an N-dimensional binary vector whose $y^{th}$ entry is missing; that entry is what we want to predict.


### Data Source

* AIML_DS_MOVIE-TRAIN_SMALLSUBSETOFMOVIELENSDATASET.csv

* AIML_DS_MOVIE-TEST_SMALLSUBSETOFMOVIELENSDATASET.csv

Both files are small subsets of the original MovieLens dataset:
https://grouplens.org/datasets/movielens/



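* The code for the distance used when computing nearest neighbours is given below. It is a similarity score in the spirit of cosine similarity: the number of movies both users have seen, so a higher value means a closer neighbour.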
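
As a minimal sketch of the two measures involved: the `distance` function in this notebook counts the movies two users have both seen, which is the dot product of their binary vectors, while cosine similarity additionally divides by the lengths of the two vectors. The vectors `u1`, `u2`, `u3` and the helper names below are hypothetical and used only for illustration.

```python
import math

def dot(a, b):
    """Number of positions where both binary vectors are 1."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths (0.0 for an all-zero vector)."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)

# Hypothetical watch histories over 4 movies (1 = seen, 0 = not seen).
u1 = [1, 1, 0, 1]
u2 = [1, 0, 0, 1]
u3 = [1, 1, 1, 1]

print(dot(u1, u2), dot(u1, u3))  # shared-movie counts
print(cosine_similarity(u1, u2), cosine_similarity(u1, u3))
```

The raw count favours users who have watched many movies, whereas the cosine form corrects for how much each user has watched overall. Both rank "higher" as "closer".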

### Setup Steps

In [None]:
#@title Please enter your registration id to start: (e.g. P181900101) { run: "auto", display-mode: "form" }
Id = "2100121" #@param {type:"string"}


In [None]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "5142192291" #@param {type:"string"}


In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()
  
notebook= "U4W23_57_MovieRecommendationSystem_KNN_C" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")
    ipython.magic("sx wget https://cdn.talentsprint.com/aiml/Experiment_related_data/AIML_DS_MOVIE-TEST_SMALLSUBSETOFMOVIELENSDATASET.csv")
    ipython.magic("sx wget https://cdn.talentsprint.com/aiml/Experiment_related_data/AIML_DS_MOVIE-TRAIN_SMALLSUBSETOFMOVIELENSDATASET.csv")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None
    
    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getWalkthrough() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook, "feedback_walkthrough":Walkthrough ,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}

      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:        
        print(r["err"])
        return None   
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://aiml.iiith.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: return submission_id
    

def getAdditional():
  try:
    if not Additional: 
      raise NameError
    else:
      return Additional  
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None
  
  
def getWalkthrough():
  try:
    if not Walkthrough:
      raise NameError
    else:
      return Walkthrough
  except NameError:
    print ("Please answer Walkthrough Question")
    return None
  
def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None
  

def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError 
    else: 
      return Answer
  except NameError:
    print ("Please answer Question")
    return None
  

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup() 
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


### Importing required packages


In [None]:
import pandas as pd

### Setting up the files

In [None]:
Train_set = "AIML_DS_MOVIE-TRAIN_SMALLSUBSETOFMOVIELENSDATASET.csv"
Test_set = "AIML_DS_MOVIE-TEST_SMALLSUBSETOFMOVIELENSDATASET.csv"   

In [None]:
Train_set

'AIML_DS_MOVIE-TRAIN_SMALLSUBSETOFMOVIELENSDATASET.csv'

### Loading the data from set up files


In [None]:
rated = pd.read_csv(Train_set, converters={"userId":int, "movieId":int})
rated.describe()

| | userId | movieId | rating |
|---|---|---|---|
| count | 80045.0 | 80045.0 | 80045.0 |
| mean | 345.401574 | 1654.71185 | 3.544594 |
| std | 195.180637 | 1887.186635 | 1.058349 |
| min | 0.0 | 0.0 | 0.5 |
| 25% | 179.0 | 327.0 | 3.0 |
| 50% | 363.0 | 870.0 | 4.0 |
| 75% | 518.0 | 2337.0 | 4.0 |
| max | 670.0 | 9065.0 | 5.0 |


In [None]:
rated

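| | userId | movieId | rating |
|---|---|---|---|
| 0 | 0 | 0 | 2.5 |
| 1 | 0 | 1 | 3.0 |
| 2 | 0 | 2 | 3.0 |
| 3 | 0 | 3 | 2.0 |
| 4 | 0 | 5 | 2.0 |
| ... | ... | ... | ... |
| 80040 | 670 | 7005 | 2.5 |
| 80041 | 670 | 4771 | 4.0 |
| 80042 | 670 | 1329 | 4.0 |
| 80043 | 670 | 1331 | 2.5 |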


In [None]:
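# Note: the IDs are zero-based (the minimum is 0), so the maximum ID is one less than
# the number of possible IDs. The range(userCount) and range(movieCount) loops below
# therefore stop just short of the highest user ID and the highest movie ID.
userCount = max(rated.userId)
movieCount = max(rated.movieId)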
print(userCount, movieCount)

670 9065


In [None]:
# Users who have watched a movie are marked with 1 in the dictionary
seen = {}
for x in rated.values:
    seen[(int(x[0]), int(x[1]))] = 1     # Key: (userId, movieId), value: 1
len(seen)

80045

In [None]:
# Build every possible (user, movie) pair
allUsersMovies = [(u,m) for u in range(userCount) for m in range(movieCount)]

# 670*9065 is the total number of possible (user, movie) pairs
len(allUsersMovies)

6073550

In [None]:
# If a (user, movie) pair is not present in the data, that user has not watched that movie,
# so it is stored as 0 in the dictionary
for x in allUsersMovies:
    if x not in seen:
        seen[x] = 0

Now that the data is loaded into a dictionary, let us recast the distance function to use it. Given two users $u_1$ and $u_2$ and a target movie $mx$, we must ignore both users' entries for $mx$ while computing the distance, because the entry for $mx$ is exactly what we are trying to predict.
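
The effect of leaving out $mx$ can be seen on a small example. This is a minimal sketch with two hypothetical watch histories; `u1`, `u2` and `mx` here are made-up values and not taken from the dataset.

```python
# Hypothetical watch histories over 4 movies (1 = seen, 0 = not seen).
u1 = [1, 0, 1, 1]
u2 = [1, 1, 0, 1]
mx = 3  # index of the movie whose entry we want to predict

# Count of movies both users have seen, over all movies.
overlap_all = sum(a * b for a, b in zip(u1, u2))

# Subtract the contribution of movie mx so that it does not influence the score.
overlap_without_mx = overlap_all - u1[mx] * u2[mx]

print(overlap_all, overlap_without_mx)
```

Both users have seen movies 0 and 3, so the full overlap is 2. Once movie 3 is left out, only movie 0 counts and the score drops to 1.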

In [None]:
# Similarity between user u1 and user u2: the number of movies both have seen, ignoring movie mx
def distance(u1, u2, mx):
    d = 0 - seen[(u1, mx)] * seen[(u2, mx)]  # Pre-subtract the mx term so that it cancels when the loop below adds it

    for m in range(movieCount):
        d += seen[(u1, m)] * seen[(u2, m)]      # Adds 1 for every movie that both users have seen
    return d

def kNN(k, givenUser, givenMovie):
    '''Compute the similarity between the given user and every other user,
    and return the k users with the highest scores (a higher score means a closer neighbour).'''
    distances = []
    for u in range(userCount):
        if u != givenUser:  # Skip the given user themselves
            distances.append([distance(u, givenUser, givenMovie), u])
    distances.sort()
    distances.reverse() # Highest score first, because higher = closer
    return distances[:k]

def prediction(k, givenUser, givenMovie):
    '''For the given user and movie, find the k nearest neighbours using the similarity score above.
       If more than half of those neighbours have seen the movie, predict that the user will watch it.'''
    neighbours = kNN(k, givenUser, givenMovie)
    howmanySaw = sum([seen[(u, givenMovie)] for d, u in neighbours])

    return 2 * howmanySaw > k      # Predict 1 (True) if more than half of the similar users have seen this movie, otherwise 0 (False).
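
The same neighbour search and majority vote can also be sketched with one set of seen movies per user, which avoids storing a 0 for every unseen (user, movie) pair. This is an alternative illustration under assumptions, not the notebook's own method: the names `toy_ratings`, `movies_by_user`, `shared_count` and `predict_with_sets` and the toy data are hypothetical.

```python
# Toy data (hypothetical): (userId, movieId) pairs for movies that were seen.
toy_ratings = [(0, 0), (0, 1),
               (1, 0), (1, 1), (1, 2),
               (2, 0), (2, 1), (2, 2),
               (3, 3)]

# One set of seen movies per user.
movies_by_user = {}
for user, movie in toy_ratings:
    movies_by_user.setdefault(user, set()).add(movie)

def shared_count(u1, u2, mx):
    """Number of movies both users have seen, ignoring movie mx."""
    return len((movies_by_user[u1] & movies_by_user[u2]) - {mx})

def predict_with_sets(k, given_user, given_movie):
    """Majority vote among the k users who share the most movies with given_user."""
    scored = [(shared_count(u, given_user, given_movie), u)
              for u in movies_by_user if u != given_user]
    scored.sort(reverse=True)  # highest shared count first
    neighbours = scored[:k]
    how_many_saw = sum(given_movie in movies_by_user[u] for _, u in neighbours)
    return 2 * how_many_saw > k

print(predict_with_sets(2, 0, 2))  # both nearest neighbours (users 1 and 2) saw movie 2
print(predict_with_sets(2, 0, 3))  # neither nearest neighbour saw movie 3
```

With sets, the cost of comparing two users depends on how many movies they have seen rather than on the total number of movies.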

In [None]:
test_data = pd.read_csv(Test_set)
test_data.head()

| | userId | movieId | rating |
|---|---|---|---|
| 0 | 0 | 4 | 4.0 |
| 1 | 0 | 9 | 2.0 |
| 2 | 0 | 13 | 4.0 |
| 3 | 0 | 16 | 2.0 |
| 4 | 0 | 17 | 2.5 |


In [None]:
# Take input from test data for prediction
prediction(5,0,4)

False
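
To go beyond a single prediction, one option is to measure how often the predictor says True for a list of (user, movie) pairs. The sketch below assumes that every pair in the list is a movie the user actually rated, so True is the desired answer. The helper `hit_rate` and the stand-in predictor `toy_predict` are hypothetical; the sample pairs are the first five rows of the test data shown above.

```python
def hit_rate(predict, k, pairs):
    """Fraction of (user, movie) pairs for which predict(k, user, movie) is True."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for user, movie in pairs if predict(k, user, movie))
    return hits / len(pairs)

# Stand-in predictor for illustration only: answers True for even movie ids.
def toy_predict(k, user, movie):
    return movie % 2 == 0

sample_pairs = [(0, 4), (0, 9), (0, 13), (0, 16), (0, 17)]

# 2 of the 5 movie ids are even, so this prints 0.4
print(hit_rate(toy_predict, 5, sample_pairs))
```

In the notebook, `prediction` could be passed in place of `toy_predict`, with pairs taken from `zip(test_data.userId, test_data.movieId)`. Every call loops over all users and all movies, so a full pass over the test file may be slow.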

### Summary

In the above experiment, we learnt how to build a movie recommendation system using a KNN classifier.

### Please answer the questions below to complete the experiment:

In [None]:
#@title State True or False: In the experiment above, two users are considered nearest neighbors, if they both have watched same number of movies(not necessarily common movies)? { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "TRUE" #@param ["","TRUE","FALSE"]


In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good, But Not Challenging for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "nn" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [None]:
#@title  Experiment walkthrough video? { run: "auto", vertical-output: true, display-mode: "form" }
Walkthrough = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 15090
Date of submission:  13 Feb 2021
Time of submission:  08:32:57
View your submissions: https://aiml.iiith.talentsprint.com/notebook_submissions
