#### Using the small MovieLens data set, this program lets a user enter a movie they like (one that appears in the data set) and recommends up to ten other movies to watch.

For this task, we will use a correlation-based recommender.
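
To make the idea concrete before loading the real data, here is a minimal sketch on a made-up rating table (the movie names and ratings are hypothetical). It ranks movies by the Pearson correlation of their ratings with one chosen movie, using only the users who rated both.

```python
import numpy as np
import pandas as pd

# Toy user-by-movie rating table (hypothetical data); NaN means "not rated".
ratings = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, 2.0],
        "Movie B": [4.0, 5.0, 2.0, 1.0],
        "Movie C": [1.0, 2.0, 5.0, np.nan],
    },
    index=["u1", "u2", "u3", "u4"],
)

# Pearson correlation of every column with Movie A; each pair of movies
# is compared only over the users who rated both.
similarity = ratings.corrwith(ratings["Movie A"])

# Drop the movie itself and rank the rest from most to least similar.
ranked = similarity.drop("Movie A").sort_values(ascending=False)
print(ranked)
```

Movie B comes out on top because the same users tend to like or dislike it together with Movie A, while Movie C is rated in the opposite direction.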

In [1]:
import pandas as pd
import numpy as np
import warnings
rat=pd.read_csv('ratings.csv')
mov=pd.read_csv('movies.csv')

In [3]:
mov.head()

,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [4]:
rat=rat.drop('timestamp',axis=1)

In [5]:
mov.shape

(9742, 3)

So there are 9742 movies in this database.

In [6]:
rat.shape

(100836, 3)

So with about 100,000 ratings of about 10,000 movies, there are roughly 10 ratings per movie on average (100,836 / 9,742 ≈ 10.35).
Now let's attach titles to the ratings.

In [7]:
rat2=rat.merge(mov, on='movieId', how='left')
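
As a small illustration of what the left merge does, here is a sketch on toy frames (the names `toy_rat` and `toy_mov` and their contents are hypothetical). Every rating row is kept and gains a title; the `validate` argument is an optional safety check that each `movieId` appears only once in the movie table.

```python
import pandas as pd

# Hypothetical miniature versions of the ratings and movies tables.
toy_rat = pd.DataFrame(
    {"userId": [1, 1, 2], "movieId": [10, 20, 10], "rating": [4.0, 3.5, 5.0]}
)
toy_mov = pd.DataFrame(
    {
        "movieId": [10, 20, 30],
        "title": ["Movie A (2000)", "Movie B (2001)", "Movie C (2002)"],
    }
)

# how='left' keeps every rating row; validate raises an error if a movieId
# is repeated in toy_mov, which would silently duplicate rating rows.
merged = toy_rat.merge(toy_mov, on="movieId", how="left", validate="many_to_one")
print(merged)
```

Note that Movie C, which has no ratings, does not appear in the merged result at all.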

It is also useful to know how many ratings each movie has in the database.

In [8]:
numratings=rat2.groupby('title')['rating'].count()
numratings.head()

title
'71 (2014)                                 1
'Hellboy': The Seeds of Creation (2004)    1
'Round Midnight (1986)                     2
'Salem's Lot (2004)                        1
'Til There Was You (1997)                  2
Name: rating, dtype: int64

Now we can build the reshaped table, with each movie in its own column and each user who submitted ratings in their own row.

In [9]:
bigtable=rat2.pivot_table(index='userId',columns='title',values='rating' )
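
For readers unfamiliar with `pivot_table`, this sketch shows the same reshaping on a tiny long-format table (hypothetical data). Movies a user did not rate become `NaN`, and the default aggregation averages any repeated user/title pairs.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) rating.
long_format = pd.DataFrame(
    {
        "userId": [1, 1, 2, 3],
        "title": ["Movie A", "Movie B", "Movie A", "Movie B"],
        "rating": [4.0, 3.0, 5.0, 2.0],
    }
)

# One row per user, one column per title; missing ratings become NaN.
wide = long_format.pivot_table(index="userId", columns="title", values="rating")
print(wide)
```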

In [10]:
bigtable.shape

(610, 9719)

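That's mysterious: what happened to the other 23 movies? There were no missing titles, so two likely explanations remain. Movies that nobody rated never make it into the merged table, and movies that share an identical title collapse into a single column of the pivot table.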
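
One way to investigate a gap like this is to count the movies with no ratings and the rated movies whose titles repeat. The sketch below does so on toy frames (hypothetical data); the same checks could be run on `mov` and `rat`, but the counts for the real data are not computed here.

```python
import pandas as pd

# Hypothetical data: movie 4 has no ratings, movies 2 and 3 share a title.
toy_mov = pd.DataFrame(
    {
        "movieId": [1, 2, 3, 4],
        "title": ["Movie A (2000)", "Movie B (2001)", "Movie B (2001)", "Movie C (2002)"],
    }
)
toy_rat = pd.DataFrame(
    {"userId": [1, 2, 1, 2], "movieId": [1, 1, 2, 3], "rating": [4.0, 5.0, 3.0, 2.0]}
)

# Movies that never appear in the ratings table.
is_rated = toy_mov["movieId"].isin(toy_rat["movieId"])
n_unrated = int((~is_rated).sum())

# Among rated movies, titles that repeat an earlier title.
n_shared_titles = int(toy_mov.loc[is_rated, "title"].duplicated().sum())

# Columns actually produced by the merge-then-pivot pipeline.
n_columns = (
    toy_rat.merge(toy_mov, on="movieId", how="left")
    .pivot_table(index="userId", columns="title", values="rating")
    .shape[1]
)
print(len(toy_mov), n_unrated, n_shared_titles, n_columns)
```

In this toy case the four movies shrink to two columns: one is lost because it has no ratings and one because its title duplicates another.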

In [12]:
warnings.simplefilter("ignore")

Once that's ready we can start the program!

Let's get a movie selection from the user. The titles in the data set include the year of release, so the prompt shows a few sample titles as a convenience.

In [14]:
rng=np.random.default_rng()
valid=False
while not valid:
    # Show five random rated titles so the user sees the expected 'Title (Year)' format
    samples=rng.choice(bigtable.columns.to_numpy(), size=5, replace=False)
    prompt=('Choose a movie that you enjoy, including the year of release.\n'+
            'Here are a few options:\n'+str(samples)+'\n')
    movchoice=input(prompt)
    # Only movies that have ratings appear as columns in bigtable
    if movchoice in bigtable.columns:
        valid=True
    else:
        print('You entered ',movchoice)
        again=input('That movie name or year is not in the list. Would you like to try again? (y/n)')
        if again !='y':
            print('OK, goodbye\n')
            break


if valid:
    print('\n\n')
    # This is the main engine: correlate every movie's ratings with the chosen movie's
    corr2=bigtable.corrwith(bigtable[movchoice])
    corr2=corr2.dropna()
    # Exclude the chosen movie itself, which correlates perfectly with itself
    corr2=corr2.drop(movchoice, errors='ignore')
    # Attach the rating counts and keep only movies with more than 50 ratings
    corr2=pd.DataFrame(corr2).join(numratings, on='title')
    corr2=corr2[corr2['rating']>50]
    if len(corr2)>0:
        print('Here are a number of movies that you may enjoy if you enjoyed '+ movchoice+'\n')
        print(corr2.sort_values(0, ascending=False).head(10))
    else:
        print('Unfortunately, it seems that I cannot recommend anything reliably. Try a different movie next time.')

Choose a movie that you enjoy, including the year of release.
Here are a few options:
['Young Guns (1988)' 'Boys from Brazil, The (1978)' 'Duchess, The (2008)'
 'Lost Skeleton of Cadavra, The (2002)' "Dad's Army (1971)"]
Young Guns (1988)



Here are a number of movies that you may enjoy if you enjoyed Young Guns (1988)

                                                           0  rating
title                                                               
Eyes Wide Shut (1999)                               0.934439      53
Spirited Away (Sen to Chihiro no kamikakushi) (...  0.878310      87
Superbad (2007)                                     0.857690      55
No Country for Old Men (2007)                       0.843527      64
Up (2009)                                           0.833333     105
Bourne Identity, The (2002)                         0.810501     112
Toy Story 3 (2010)                                  0.805837      55
Naked Gun 33 1/3: The Final Insult (1994)           0.80

Works great!
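
One possible refinement: the filter in the program keeps movies with more than 50 ratings overall, which is not the same as requiring many users who rated both movies. The sketch below counts those shared raters on a toy table (hypothetical data; `min_overlap` is an assumed threshold, not a value from this notebook).

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-movie table; NaN means "not rated".
table = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, np.nan, 2.0],
        "Movie B": [4.0, np.nan, 3.0, 1.0],
        "Movie C": [np.nan, 2.0, 5.0, np.nan],
    },
    index=[1, 2, 3, 4],
)
choice = "Movie A"

# Keep only the users who rated the chosen movie, then count how many of
# them also rated each other movie.
rated_choice = table[choice].notna()
overlap = table.notna().loc[rated_choice].sum()

# Assumed threshold for illustration: require at least 2 shared raters.
min_overlap = 2
reliable = overlap[overlap >= min_overlap].index.drop(choice)
print(overlap)
print(list(reliable))
```

With the real table, `overlap` could be joined to the correlation results in the same way `numratings` is, so that the threshold applies to shared raters rather than to total ratings.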