<a href="https://colab.research.google.com/github/ilmuneraka/letterboxd-friends-ranker/blob/main/Letterboxd_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

DOMAIN = "https://letterboxd.com"

# Scraping with BeautifulSoup

In [2]:
# we will try to get html of my letterboxd page
username = "cacingpincang"
url = DOMAIN + "/" + username + "/films/"
url_page = requests.get(url)
soup = BeautifulSoup(url_page.content, 'html.parser')

In [3]:
# we will take a look at the html
soup


<!DOCTYPE html>

<!--[if lt IE 7 ]> <html lang="en" class="ie6 lte9 lte8 lte7 lte6 no-js"> <![endif]-->
<!--[if IE 7 ]>    <html lang="en" class="ie7 lte9 lte8 lte7 no-js"> <![endif]-->
<!--[if IE 8 ]>    <html lang="en" class="ie8 lte9 lte8 no-js"> <![endif]-->
<!--[if IE 9 ]>    <html lang="en" class="ie9 lte9 no-js"> <![endif]-->
<!--[if (gt IE 9)|!(IE)]><!--> <html class="no-mobile no-js" id="html" lang="en"> <!--<![endif]-->
<head>
<meta charset="utf-8"/>
<meta content="width=1024" name="viewport"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="Alfian Hakim’s films" name="description"/>
<meta content="https://letterboxd.com/cacingpincang/films/" property="og:url"/>
<meta content="Alfian Hakim’s films" property="og:title"/>
<meta content="Alfian Hakim’s films" property="og:description"/>
<meta content="https://s.ltrbxd.com/static/img/default-share.e38c5d62.png" property="og:image"/>
<meta content="Letterboxd" name="application-name"/>
<meta content

## Checkpoint 1
- The movie list sits inside an unordered list (ul) tag with the class name "poster-list -p70 -grid film-list clear", so we will find that tag to get the list of movies
- From each entry we can take the movie's rating, title, link, and whether it was liked
- The rating is shown as stars, so we need a function that converts it to a number
- We also need to deal with pagination
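As a minimal sketch of the star conversion mentioned above, the rating can also be computed by counting characters instead of looking it up in a table. This is an alternative, not the function the notebook uses, and the name `stars_to_float` is hypothetical. It assumes a rating string contains only "★" and "½".

```python
def stars_to_float(stars: str) -> float:
    """Convert a star string such as '★★★½' to 3.5; return -1 if it is not a rating."""
    stars = stars.strip()
    # Reject empty strings and anything containing characters other than ★ and ½.
    if not stars or set(stars) - {"★", "½"}:
        return -1
    # Each full star counts as 1 and the half sign as 0.5.
    return stars.count("★") + 0.5 * stars.count("½")


example_ratings = [stars_to_float(s) for s in ["★★★★", "★★½", "½", ""]]
```

Counting avoids listing all ten combinations by hand, but it does not check that the half sign comes last, so the explicit lookup table is stricter.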

In [4]:
def transform_ratings(some_str):
    """
    Transform a raw star rating into a number.
    :param some_str: star rating as shown on the page, e.g. "★★★½"
    :return: the numeric rating, or -1 if the string is not a known rating
    """
    stars = {
        "★": 1,
        "★★": 2,
        "★★★": 3,
        "★★★★": 4,
        "★★★★★": 5,
        "½": 0.5,
        "★½": 1.5,
        "★★½": 2.5,
        "★★★½": 3.5,
        "★★★★½": 4.5
    }
    # anything that is not a known star string (e.g. an unrated film) becomes -1
    return stars.get(some_str, -1)

# find the unordered list
ul = soup.find("ul", {"class": "poster-list"})

# find the <li> inside <ul>
movies = ul.find_all("li")
# iterate through the movies and print the title, stars, numeric rating, and liked flag
for movie in movies:
  stars_text = movie.find('p', {"class": "poster-viewingdata"}).get_text().strip()
  print(movie.find('img')['alt'])
  print(stars_text)
  print(transform_ratings(stars_text))
  print("Liked: " + str(movie.find('span', {'class': 'like'}) is not None))

A Man Called Otto
★★★★
4
Liked: False
Puss in Boots: The Last Wish
★★★★★
5
Liked: True
The Menu
★★★★
4
Liked: False
Glass Onion: A Knives Out Mystery
★★★★
4
Liked: False
The Banshees of Inisherin
★★★★★
5
Liked: True
Stealing Raden Saleh
★★★★½
4.5
Liked: False
All or Nothing: Arsenal
★★★★
4
Liked: False
Nope
★★★★
4
Liked: False
Missing Home
★★★★½
4.5
Liked: False
Elvis
★★★★
4
Liked: False
Men
★★½
2.5
Liked: False
Srimulat: Hil Yang Mustahal – Babak Pertama
★★★★
4
Liked: False
Doctor Strange in the Multiverse of Madness
★★★★
4
Liked: False
Sonic the Hedgehog 2
★★★★
4
Liked: True
Everything Everywhere All at Once
★★★★½
4.5
Liked: True
The Batman
★★★★½
4.5
Liked: True
Neymar: The Perfect Chaos
★★★★
4
Liked: False
Spider-Man: No Way Home
★★★½
3.5
Liked: False
Nussa
★★★½
3.5
Liked: False
Clickbait
★★★½
3.5
Liked: False
Free Guy
★★★★
4
Liked: False
The Suicide Squad
★★★★
4
Liked: True
Red Rocket
★★★★½
4.5
Liked: False
Cruella
★★★
3
Liked: False
Wrath of Man
★★★½
3.5
Liked: False
Mortal Kombat

In [5]:
# dealing with pagination
li_pagination = soup.find_all("li", {"class": "paginate-page"})
li_pagination

[<li class="paginate-page paginate-current"><span>1</span></li>,
 <li class="paginate-page"><a href="/cacingpincang/films/page/2/">2</a></li>,
 <li class="paginate-page"><a href="/cacingpincang/films/page/3/">3</a></li>,
 <li class="paginate-page"><a href="/cacingpincang/films/page/4/">4</a></li>]

In [6]:
if len(li_pagination) == 0:
  # this is when there's only one page
  ul = soup.find("ul", {"class": "poster-list"})
  if ul is not None:
    movies = ul.find_all("li")
else:
  # this is when there's more than one page
  for i in range(len(li_pagination)):
    url = DOMAIN + "/" + username + "/films/page/" + str(i+1)
    url_page = requests.get(url)
    soup = BeautifulSoup(url_page.content, 'html.parser')
    ul = soup.find("ul", {"class": "poster-list"})
    if ul is not None:
      movies = ul.find_all("li")
      print("the first movie of page {} is {}".format(i+1, movies[0].find('img')['alt']))

the first movie of page 1 is A Man Called Otto
the first movie of page 2 is Take the Ball, Pass the Ball
the first movie of page 3 is 3 Idiots
the first movie of page 4 is Reservoir Dogs
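
Counting the pagination items works for this profile because every page has its own link. As a hedged sketch, the page URLs can instead be built from the highest page label, which stays correct if a longer profile shows only some of its page links (an assumption about the site that is not verified here). The helper name `page_urls` is hypothetical, and it takes the link texts as plain strings so it can be tried without a network request.

```python
def page_urls(domain, username, page_labels):
    """Build one /films/page/N/ URL per page, given the pagination link texts."""
    # Keep only numeric labels; anything else (e.g. an ellipsis) is ignored.
    numbers = [int(label) for label in page_labels if label.strip().isdigit()]
    # With no pagination at all there is exactly one page.
    last_page = max(numbers, default=1)
    return [f"{domain}/{username}/films/page/{n}/" for n in range(1, last_page + 1)]


# Labels taken from the pagination output shown earlier: pages 1 to 4.
urls = page_urls("https://letterboxd.com", "cacingpincang", ["1", "2", "3", "4"])
```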


In [7]:
# we will store the data in a pandas DataFrame
# so the final function looks like this

def scrape_films(username):
    movies_dict = {'id': [], 'title': [], 'rating': [], 'liked': [], 'link': []}

    def collect(soup):
        # append every movie found on one page to movies_dict
        ul = soup.find("ul", {"class": "poster-list"})
        if ul is None:
            return
        for movie in ul.find_all("li"):
            poster = movie.find('div')
            stars_text = movie.find('p', {"class": "poster-viewingdata"}).get_text().strip()
            movies_dict['id'].append(poster['data-film-id'])
            movies_dict['title'].append(movie.find('img')['alt'])
            movies_dict['rating'].append(transform_ratings(stars_text))
            movies_dict['liked'].append(movie.find('span', {'class': 'like'}) is not None)
            movies_dict['link'].append(poster['data-target-link'])

    url = DOMAIN + "/" + username + "/films/"
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    collect(soup)  # the first page is already loaded, so reuse it

    # check number of pages: take the highest page label instead of counting the items,
    # in case the site hides some page links on long lists (an assumption, not verified here)
    li_pagination = soup.find_all("li", {"class": "paginate-page"})
    page_numbers = [int(li.get_text()) for li in li_pagination if li.get_text().strip().isdigit()]
    last_page = max(page_numbers, default=1)
    for page in range(2, last_page + 1):
        url = DOMAIN + "/" + username + "/films/page/" + str(page)
        soup = BeautifulSoup(requests.get(url).content, 'html.parser')
        collect(soup)

    return pd.DataFrame(movies_dict)

In [8]:
# let's try it
my_film = scrape_films("cacingpincang")

In [9]:
my_film.head()

| | id | title | rating | liked | link |
|---|---|---|---|---|---|
| 0 | 842221 | A Man Called Otto | 4.0 | False | /film/a-man-called-otto/ |
| 1 | 242285 | Puss in Boots: The Last Wish | 5.0 | True | /film/puss-in-boots-the-last-wish/ |
| 2 | 521323 | The Menu | 4.0 | False | /film/the-menu-2022/ |
| 3 | 586723 | Glass Onion: A Knives Out Mystery | 4.0 | False | /film/glass-onion-a-knives-out-mystery/ |
| 4 | 598882 | The Banshees of Inisherin | 5.0 | True | /film/the-banshees-of-inisherin/ |


## Checkpoint 2
- It worked!!

# Creating Other Functions
- Now we will scrape the friends list with a similar approach; to keep things short, I will go straight to the final function
- I use 4 different interpretations of 'friends': (1) following, (2) followers, (3) the union of following and followers, and (4) mutual, i.e. the intersection of following and followers
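
To make interpretations (3) and (4) concrete, here is a small sketch on made-up lists. The usernames are hypothetical and no scraping is involved; it only shows how the union and the intersection can be built while keeping the order in which people appear.

```python
# Hypothetical usernames, standing in for the scraped following/followers lists.
following = ["ana", "budi", "citra"]
followers = ["budi", "dewi", "ana"]

# (3) union: dict.fromkeys drops duplicates and keeps first-seen order
both = list(dict.fromkeys(following + followers))

# (4) mutual: people I follow who also follow me back
followers_set = set(followers)  # set membership checks are fast
mutual = [name for name in following if name in followers_set]
```

Here `both` holds four distinct usernames and `mutual` holds the two that appear in both lists.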

In [10]:
def list_friends(username, ftype='following'):
    def scrape_usernames(kind):
        # collect every username listed on the /following/ or /followers/ pages
        names = []
        url = DOMAIN + "/" + username + "/{0}/".format(kind)
        while True:
            url_page = requests.get(url)
            soup = BeautifulSoup(url_page.content, 'html.parser')
            for friend in soup.find_all('div', {'class': 'person-summary'}):
                names.append(friend.find('a', {'class': 'avatar'})['href'].replace('/', ''))

            # check if there's a next page
            next_link = soup.find('a', {'class': 'next'})
            if next_link is None:
                break
            url = DOMAIN + next_link['href']
        return names

    friends_list = []
    if ftype in ('following', 'followers'):
        friends_list = scrape_usernames(ftype)
    elif ftype == 'both':
        # union: drop duplicates while keeping the original order
        friends_list = list(dict.fromkeys(scrape_usernames('following') + scrape_usernames('followers')))
    elif ftype == 'mutual':
        # intersection: people I follow who also follow me
        followers = set(scrape_usernames('followers'))
        friends_list = [name for name in scrape_usernames('following') if name in followers]
    # any other ftype returns an empty list
    return friends_list

In [11]:
my_friends = list_friends('cacingpincang', 'mutual')

In [12]:
my_friends[:10]

['jigsaw123',
 'ashleewilliams',
 'sefitofransgiox',
 'alamledp',
 'imajinasi',
 'vonnysimarmata',
 'ajinur',
 'mukhlis',
 'rez4',
 'perryshadow']

## Checkpoint 3
- Now that we can get our friends' usernames, we will reuse the previous function to get our friends' movies
- We will also compare our friends' movies with ours, so we need a function that defines a similarity score
- Lastly, we need a function that runs the comparison and summarizes it in a table

## My Simple Similarity Scoring System (the index_score column)
- If you both give the same rating and agree on liking it (both liked it or both didn't), the score is perfect (in this case: 2), meaning you have exactly the same opinion of that movie
- If you both like the movie but give different ratings, the score is 2 (you both like it) minus the rating difference divided by 5 (the maximum rating)
- In every other case (neither of you liked it and the ratings differ, or only one of you liked it), the score is 1 minus the rating difference divided by 5

The index_score is the sum of these scores divided by the maximum possible (2 per movie), and the final score is the index_score multiplied by the number of movies both have rated

### Why multiply it by the number of movies?
My assumption is that if my friend and I have very similar rating and liking behavior but watch different movies, then we're not really close because we have different references
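
As a worked example of these rules, the sketch below scores four made-up shared movies and combines them the same way. The function name `pair_score` and the sample ratings are hypothetical; only the rules come from the description above.

```python
def pair_score(rating_a, liked_a, rating_b, liked_b):
    """Score one shared movie on a 0..2 scale, following the three rules above."""
    gap = abs(rating_a - rating_b) / 5
    if rating_a == rating_b and liked_a == liked_b:
        return 2.0            # identical opinion
    if liked_a and liked_b:
        return 2.0 - gap      # both liked it, ratings differ
    return 1.0 - gap          # neither liked it, or only one did


# (my rating, my like, friend's rating, friend's like) for made-up movies
shared = [
    (4.5, True, 4.5, True),    # identical opinion: 2.0
    (5.0, True, 4.0, True),    # both liked, one star apart: about 1.8
    (3.0, False, 4.0, False),  # neither liked, one star apart: about 0.8
    (4.0, True, 4.0, False),   # same rating, only one liked: 1.0
]
scores = [pair_score(*row) for row in shared]

# Similarity as a fraction of the maximum (2 per movie), then scaled by the overlap.
index_score = sum(scores) / (2 * len(shared))
total_index = index_score * len(shared)
```

The four scores add up to about 5.6 out of a possible 8, so the index_score is about 0.7 and the final score about 2.8. Since the final score equals half the sum of the per-movie scores, a friend with more shared movies can outrank one with a higher similarity.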

In [13]:
def score_index(rating_x, liked_x, rating_y, liked_y):
    # same rating and same liked status: identical opinion
    if rating_x == rating_y and liked_x == liked_y:
        score = 2.0
    # both like but different ratings
    elif liked_x and liked_y:
        score = 2.0-(abs(rating_x-rating_y)/5)
    # neither liked it, or only one of them liked it
    else:
        score = 1.0-(abs(rating_x-rating_y)/5)
    return score


def compare_ratings_friends(username_a, df_a, username_b, df_b):
    
    # movies they both liked: merging on 'liked' keeps rows where both agree, then keep the liked ones
    df_liked = pd.merge(df_a[['id', 'title', 'link', 'liked']], df_b[['id', 'liked']], on=['id', 'liked'])
    df_liked = df_liked[df_liked['liked']==True].reset_index(drop=True)
    
    # movies they gave the same rating: merging on 'rating' keeps only equal ratings
    df_same = pd.merge(df_a[['id', 'title', 'rating']], df_b[['id', 'rating']], on=['id', 'rating'])
    
    # movies they gave different ratings
    df_different = pd.merge(df_a[['id', 'title', 'rating']], df_b[['id', 'rating']], how='inner', on='id')
    df_different = df_different[df_different['rating_x']!=df_different['rating_y']].reset_index(drop=True)
    df_different['difference'] = df_different['rating_x']-df_different['rating_y']
    df_different['difference_abs'] = df_different['difference'].abs()
    df_different = df_different.rename(columns={'rating_x': 'rating_{0}'.format(username_a), 'rating_y': 'rating_{0}'.format(username_b)})
    
    # calculate index: total score as a fraction of the maximum possible (2 per movie)
    df_merge = pd.merge(df_a, df_b, on = ['id', 'title'])
    if len(df_merge) > 0:
        df_merge['score'] = df_merge.apply(lambda row: score_index(row['rating_x'], row['liked_x'],
                                                                   row['rating_y'], row['liked_y']),
                                           axis = 1)
        index = df_merge['score'].sum()/(2*len(df_merge))
    else:
        index = 0
    return df_liked, df_same, df_different, index

def scrape_friends(username, friends_list, limit=20):
    df_a = scrape_films(username)
    df_a = df_a[df_a['rating']!=-1].reset_index(drop=True)
    
    friends_dict = {}
    friends_dict['username'] = []
    friends_dict['index_score'] = []
    friends_dict['no_of_movies'] = []
    
    friends_data = {}
    for username_b in friends_list:
        print('scraping for '+username_b)
        df_b = scrape_films(username_b)
        df_b = df_b[df_b['rating']!=-1].reset_index(drop=True)
        no_of_movies = len(pd.merge(df_a[['id']], df_b[['id']]))
        # limit is the minimum number of movies you and your friend both rated
        # if number of movies you both have rated is less than the limit, then we won't calculate similarity
        if no_of_movies >= limit:
            friends_dict['username'].append(username_b)
            print('comparing for '+username_b)
            df_liked, df_same, df_different, index = compare_ratings_friends(username, df_a, username_b, df_b)
            friends_dict['index_score'].append(index)
            friends_dict['no_of_movies'].append(no_of_movies)
            friends_data[username_b] = {}
            friends_data[username_b]['df_b'] = df_b
            friends_data[username_b]['df_liked'] = df_liked
            friends_data[username_b]['df_same'] = df_same
            friends_data[username_b]['df_different'] = df_different

    df_friends = pd.DataFrame(friends_dict)
    # the final score of friends is similarity*number of movies you both have watched
    # why multiply it with number of movies?
    # my assumption is if I have very similar rating and liking behavior with my friend,
    # but we watch different movies then we're not really close because we have different references
    df_friends['total_index'] = df_friends['index_score']*df_friends['no_of_movies']
    return df_friends, friends_data, df_a

In [14]:
df_friends, friends_data, df_a = scrape_friends('cacingpincang', my_friends)

scraping for jigsaw123
scraping for ashleewilliams
scraping for sefitofransgiox
scraping for alamledp
comparing for alamledp
scraping for imajinasi
scraping for vonnysimarmata
comparing for vonnysimarmata
scraping for ajinur
comparing for ajinur
scraping for mukhlis
comparing for mukhlis
scraping for rez4
scraping for perryshadow
comparing for perryshadow
scraping for vitaminc1
comparing for vitaminc1
scraping for wetuburubur
scraping for omarlae2106
comparing for omarlae2106
scraping for brendonyu668
comparing for brendonyu668
scraping for bluewhale307
scraping for ccj6529
comparing for ccj6529
scraping for katekidz
comparing for katekidz
scraping for jenniferrx
comparing for jenniferrx
scraping for inanamaa
comparing for inanamaa
scraping for asddljhgfh
comparing for asddljhgfh
scraping for outkastatlast
scraping for christine2597
comparing for christine2597
scraping for jordlyphe
comparing for jordlyphe
scraping for resulonrsl
comparing for resulonrsl
scraping for txmnxwton
comparin



comparing for zandar
scraping for hjc2244
comparing for hjc2244
scraping for skipperjens
comparing for skipperjens
scraping for matthewzeitoun
scraping for kikyrahmannisa
comparing for kikyrahmannisa
scraping for djdoublem3
comparing for djdoublem3
scraping for rom0618
comparing for rom0618


In [15]:
# let's see my top 10 friends
df_friends.sort_values('total_index', ascending=False).head(10)

| | username | index_score | no_of_movies | total_index |
|---|---|---|---|---|
| 7 | brendonyu668 | 0.641509 | 53 | 34.0 |
| 5 | vitaminc1 | 0.596364 | 55 | 32.8 |
| 9 | katekidz | 0.573585 | 53 | 30.4 |
| 26 | rom0618 | 0.571277 | 47 | 26.85 |
| 22 | hjc2244 | 0.489815 | 54 | 26.45 |
| 17 | laythtbaileh | 0.62561 | 41 | 25.65 |
| 11 | inanamaa | 0.558889 | 45 | 25.15 |
| 0 | alamledp | 0.610256 | 39 | 23.8 |
| 3 | mukhlis | 0.5475 | 40 | 21.9 |
| 13 | christine2597 | 0.4875 | 44 | 21.45 |


## Checkpoint 4
- Now that I know **brendonyu668** is my closest friend, I will look more closely at his highly rated movies, hoping I will like them too because we have similar taste

In [26]:
# movies that I and brendonyu668 liked
friends_data['brendonyu668']['df_liked']

| | id | title | link | liked |
|---|---|---|---|---|
| 0 | 598882 | The Banshees of Inisherin | /film/the-banshees-of-inisherin/ | True |
| 1 | 474474 | Everything Everywhere All at Once | /film/everything-everywhere-all-at-once/ | True |
| 2 | 348914 | The Batman | /film/the-batman/ | True |
| 3 | 369835 | The Suicide Squad | /film/the-suicide-squad/ | True |
| 4 | 510047 | Promising Young Woman | /film/promising-young-woman/ | True |
| 5 | 291610 | Three Billboards Outside Ebbing, Missouri | /film/three-billboards-outside-ebbing-missouri/ | True |
| 6 | 353117 | Get Out | /film/get-out-2017/ | True |
| 7 | 52516 | Django Unchained | /film/django-unchained/ | True |
| 8 | 45409 | Shutter Island | /film/shutter-island/ | True |
| 9 | 40100 | The Hangover | /film/the-hangover/ | True |


In [28]:
# movies that brendonyu668 liked
friends_data['brendonyu668']['df_b'][friends_data['brendonyu668']['df_b']['liked'] == True].head(10)

| | id | title | rating | liked | link |
|---|---|---|---|---|---|
| 1 | 484263 | Guillermo del Toro's Pinocchio | 4.5 | True | /film/guillermo-del-toros-pinocchio/ |
| 2 | 521323 | The Menu | 4.0 | True | /film/the-menu-2022/ |
| 3 | 586723 | Glass Onion: A Knives Out Mystery | 4.0 | True | /film/glass-onion-a-knives-out-mystery/ |
| 4 | 598882 | The Banshees of Inisherin | 5.0 | True | /film/the-banshees-of-inisherin/ |
| 12 | 293465 | Top Gun: Maverick | 5.0 | True | /film/top-gun-maverick/ |
| 14 | 565852 | The Northman | 4.5 | True | /film/the-northman/ |
| 16 | 474474 | Everything Everywhere All at Once | 5.0 | True | /film/everything-everywhere-all-at-once/ |
| 18 | 348914 | The Batman | 4.5 | True | /film/the-batman/ |
| 21 | 441471 | West Side Story | 4.5 | True | /film/west-side-story-2021/ |
| 23 | 466291 | tick, tick...BOOM! | 4.5 | True | /film/tick-tick-boom-2021/ |


In [16]:
# let's see my bottom 10 friends
df_friends.sort_values('total_index').head(10)

| | username | index_score | no_of_movies | total_index |
|---|---|---|---|---|
| 25 | djdoublem3 | 0.480952 | 21 | 10.1 |
| 1 | vonnysimarmata | 0.565 | 20 | 11.3 |
| 6 | omarlae2106 | 0.421667 | 30 | 12.65 |
| 23 | skipperjens | 0.607143 | 21 | 12.75 |
| 18 | ikiefriandi | 0.593182 | 22 | 13.05 |
| 12 | asddljhgfh | 0.578261 | 23 | 13.3 |
| 24 | kikyrahmannisa | 0.593478 | 23 | 13.65 |
| 20 | theorollason | 0.503448 | 29 | 14.6 |
| 8 | ccj6529 | 0.551724 | 29 | 16.0 |
| 10 | jenniferrx | 0.543333 | 30 | 16.3 |


# Creating Functions to Rank Movies
- Notice that the function we created before stores our friends' movie lists inside friends_data; we need these to make movie recommendations
- The main goal of this simple recommendation system is to find movies that our friends have rated well and liked; if our closer friends rated a movie highly, it should get a higher score

4 attributes are considered when ranking movies:
1. Number of friends that have rated the movie
2. Number of friends that liked the movie
3. Ratings given by our friends on the movie
4. Friends' index score, so our closer friends' favorite movies are ranked higher
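
A minimal sketch of how such attributes can be combined into one ranking score, using plain Python and made-up numbers. The film names and values are hypothetical. The weights 6, 3 and 2 mirror the ones used in recommend_movies, and the count of friends who rated the movie is left out because its weight there is 0.

```python
# Hypothetical aggregated rows: mean rating, number of likes, mean friend score.
candidates = [
    {"title": "Film A", "rating": 4.5, "liked": 4, "friends_score": 20.0},
    {"title": "Film B", "rating": 5.0, "liked": 1, "friends_score": 25.0},
    {"title": "Film C", "rating": 3.0, "liked": 4, "friends_score": 10.0},
]
R_W, L_W, FS_W = 6, 3, 2  # rating, liked and friends-score weights

# Likes and friend scores are scaled by their maximum so each term lies in 0..1.
max_liked = max(c["liked"] for c in candidates)
max_fs = max(c["friends_score"] for c in candidates)

for c in candidates:
    c["index"] = (R_W * c["rating"] / 5
                  + L_W * c["liked"] / max_liked
                  + FS_W * c["friends_score"] / max_fs)

# Highest index first.
ranked = sorted(candidates, key=lambda c: c["index"], reverse=True)
```

With these weights the highest possible index is 6 + 3 + 2 = 11. Film A ranks first here because it combines a high rating with the most likes, even though Film B has the higher rating.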

In [17]:
def recommend_movies(df_friends, friends_data, df_a):
    # collect each friend's movies, tagged with that friend's total_index
    frames = []
    for friend, data in friends_data.items():
        df_friend_movies = data['df_b'].copy()
        df_friend_movies['friends_score'] = df_friends[df_friends['username'] == friend]['total_index'].values[0]
        frames.append(df_friend_movies)
    df_movies = pd.concat(frames)

    # number of friends that rated each movie
    # (groupby/size avoids relying on the column names value_counts produces, which differ across pandas versions)
    df_no_of_rate = df_movies.groupby('id').size().reset_index(name='no_of_rate')

    df_recom = df_movies.groupby(['id', 'title', 'link']).agg({'rating':'mean', 'liked':'sum', 'friends_score':'mean'})
    df_recom = df_recom.reset_index()
    df_recom = pd.merge(df_recom, df_no_of_rate, on='id')

    # excluding movies that we already rated (df_a holds only rated movies)
    df_recom = pd.merge(df_recom, df_a[['id']], how="outer", on='id', indicator=True)
    df_recom = df_recom[df_recom['_merge'] == 'left_only']
    del df_recom['_merge']

    # in this case, i use weights on the 4 attributes
    # r_w is the rating weight, l_w the liked weight, fs_w the friends score weight,
    # and nor_w the weight for the number of friends that rated the movie
    # i set nor_w to 0, so only 3 of the 4 attributes are used;
    # 'total_index' already rewards friends who share many movies with me
    # you can formulate your own calculation here
    r_w = 6
    l_w = 3
    fs_w = 2
    nor_w = 0
    df_recom['index'] = (r_w/5*df_recom['rating']
                         + l_w*df_recom['liked']/df_recom['liked'].max()
                         + fs_w*df_recom['friends_score']/df_recom['friends_score'].max()
                         + nor_w*df_recom['no_of_rate']/df_recom['no_of_rate'].max())
    return df_recom

In [18]:
# let's try it
df_recom = recommend_movies(df_friends, friends_data, df_a)

In [19]:
# let's see my top 10 recommendations
df_recom.sort_values('index', ascending=False).head(10)

| | id | title | link | rating | liked | friends_score | no_of_rate | index |
|---|---|---|---|---|---|---|---|---|
| 702 | 293465 | Top Gun: Maverick | /film/top-gun-maverick/ | 4.607143 | 11.0 | 21.121429 | 14.0 | 9.771008 |
| 525 | 251943 | Spider-Man: Into the Spider-Verse | /film/spider-man-into-the-spider-verse/ | 4.5625 | 10.0 | 21.55 | 16.0 | 9.46992 |
| 1010 | 371378 | Dune | /film/dune-2021/ | 4.175 | 11.0 | 19.8625 | 20.0 | 9.178382 |
| 233 | 171384 | Whiplash | /film/whiplash-2014/ | 4.5 | 8.0 | 22.240909 | 11.0 | 8.890107 |
| 1300 | 422682 | Marriage Story | /film/marriage-story-2019/ | 4.25 | 9.0 | 22.404167 | 12.0 | 8.872438 |
| 389 | 216086 | The Handmaiden | /film/the-handmaiden/ | 4.916667 | 5.0 | 25.691667 | 6.0 | 8.774911 |
| 35 | 114564 | Her | /film/her/ | 4.208333 | 8.0 | 22.858333 | 12.0 | 8.576426 |
| 2364 | 51896 | The Dark Knight | /film/the-dark-knight/ | 4.555556 | 6.0 | 24.2 | 9.0 | 8.52656 |
| 1387 | 433863 | The Lighthouse | /film/the-lighthouse-2019/ | 4.3 | 8.0 | 19.59 | 15.0 | 8.494171 |
| 954 | 359859 | Roma | /film/roma-2018/ | 4.6 | 6.0 | 21.35 | 10.0 | 8.412246 |


## Checkpoint 5
- Looks like I need to watch Top Gun: Maverick because my friends rated it very highly, and 11 of 14 liked it
- I will try to explore more with advanced queries, for example movies that have an index higher than 7.5 but are less popular (rated by more than 2 but no more than 8 friends)

In [24]:
df_recom[(df_recom['index']>7.5) & (df_recom['no_of_rate'] <= 8) & (df_recom['no_of_rate'] > 2)].sort_values('index', ascending=False).head(10)

| | id | title | link | rating | liked | friends_score | no_of_rate | index |
|---|---|---|---|---|---|---|---|---|
| 389 | 216086 | The Handmaiden | /film/the-handmaiden/ | 4.916667 | 5.0 | 25.691667 | 6.0 | 8.774911 |
| 2457 | 532082 | Chernobyl | /film/chernobyl/ | 4.75 | 5.0 | 21.975 | 6.0 | 8.356283 |
| 1451 | 441471 | West Side Story | /film/west-side-story-2021/ | 4.75 | 5.0 | 20.408333 | 6.0 | 8.264127 |
| 3007 | 668077 | The Queen's Gambit | /film/the-queens-gambit/ | 4.25 | 6.0 | 23.34375 | 8.0 | 8.109525 |
| 3109 | 695473 | CODA | /film/coda-2021/ | 4.4375 | 5.0 | 20.45625 | 8.0 | 7.891945 |
| 2366 | 51902 | Akira | /film/akira/ | 4.75 | 3.0 | 23.2875 | 4.0 | 7.888035 |
| 2443 | 527671 | The Father | /film/the-father-2020/ | 4.357143 | 5.0 | 22.0 | 7.0 | 7.886325 |
| 743 | 310705 | The Favourite | /film/the-favourite/ | 4.25 | 5.0 | 23.8875 | 8.0 | 7.868783 |
| 2351 | 51858 | Amélie | /film/amelie/ | 4.666667 | 3.0 | 24.616667 | 3.0 | 7.866221 |
| 903 | 350384 | The Big Sick | /film/the-big-sick/ | 4.833333 | 3.0 | 20.816667 | 3.0 | 7.842692 |


Voila! I will probably watch The Handmaiden next since it has an average rating of 4.92 from 6 friends, with 5 likes

# Deployment with Streamlit
I also deployed these functions as a Streamlit web app, which you can access at https://letterboxd-friends-ranker.streamlit.app

**Thank you!**