# Netflix recommendations

<div class="alert alert-block alert-warning">
Replace <code>YOUR_GITHUB_TOKEN</code> in the install command below. To get your token, follow the instructions in the <a href="../README.md">README.md</a>.
</div>

## Boilerplate

In [None]:
%pip install gcsfs pandas
%pip install  'https://us-central1-data-359211.cloudfunctions.net/github-proxy/superlinked-1.11.3-py3-none-any.whl?token=YOUR_GITHUB_TOKEN'

In [1]:
import os
os.chdir("../../..")

## Imports and constants

In [2]:
import pandas as pd

from datetime import timedelta

from superlinked.evaluation.charts.recency_plotter import RecencyPlotter
from superlinked.framework.common.dag.period_time import PeriodTime
from superlinked.framework.common.schema.schema import schema
from superlinked.framework.common.schema.schema_object import String, Timestamp
from superlinked.framework.common.schema.id_schema_object import IdField
from superlinked.framework.common.parser.dataframe_parser import DataFrameParser
from superlinked.framework.dsl.executor.in_memory.in_memory_executor import InMemoryExecutor, InMemoryApp
from superlinked.framework.dsl.index.index import Index
from superlinked.framework.dsl.query.param import Param
from superlinked.framework.dsl.query.query import Query
from superlinked.framework.dsl.query.result import Result
from superlinked.framework.dsl.source.in_memory_source import InMemorySource
from superlinked.framework.dsl.space.text_similarity_space import TextSimilaritySpace
from superlinked.framework.dsl.space.recency_space import RecencySpace

In [3]:
MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"
YEAR_IN_DAYS = 365
TOP_N = 10
DATASET_URL = "https://storage.googleapis.com/superlinked-notebook-netflix-shows-dataset/titles.csv"

## Explore dataset

In [4]:
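import ast
movie_df = pd.read_csv(DATASET_URL)
movie_df = movie_df[["description", "genres", "title", "release_year"]].drop_duplicates(subset=["description"]).dropna(how="any")
# NOTE: the built-in hash() of a string is salted per Python process, so these ids change between runs
movie_df["id"] = movie_df["description"].map(hash)
# genres holds a stringified list; literal_eval parses it without executing arbitrary code
movie_df["genres"] = movie_df["genres"].apply(lambda x: " ".join(ast.literal_eval(x)))
# Unix timestamp (seconds) of 1 January of the release year
movie_df["timestamp"] = [pd.Timestamp(year=year, month=1, day=1).timestamp() for year in movie_df["release_year"].tolist()]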
movie_df.head()

,description,genres,title,release_year,id,timestamp
0,This collection includes 12 World War II-era p...,documentation,Five Came Back: The Reference Films,1945,6505068203725275578,-788918400.0
1,A mentally unstable Vietnam War veteran works ...,drama crime,Taxi Driver,1976,-1613392425401971389,189302400.0
2,Intent on seeing the Cahulawassee River before...,drama action thriller european,Deliverance,1972,-2809990181430763402,63072000.0
3,"King Arthur, accompanied by his squire, recrui...",fantasy action comedy,Monty Python and the Holy Grail,1975,2024336573761082562,157766400.0
4,12 American military prisoners in World War II...,war action,The Dirty Dozen,1967,3346870346573015289,-94694400.0
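The three derived columns above (`id`, `genres`, `timestamp`) can also be reproduced row by row with the standard library. The sketch below is only an illustration: `stable_id`, `parse_genres` and `year_to_timestamp` are hypothetical helper names, the raw genre format (a stringified Python list) is inferred from the cell above, and the SHA-256 based id is a swapped-in alternative to `hash`, which is salted per Python process and so gives different ids on every run.

```python
import ast
import hashlib
from datetime import datetime, timezone


def stable_id(text: str) -> int:
    """Deterministic signed 64-bit id derived from the text (hypothetical helper)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the first 8 bytes so the id fits into a signed 64-bit integer.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def parse_genres(raw: str) -> str:
    """Turn a stringified list (assumed format) into a space-separated string."""
    return " ".join(ast.literal_eval(raw))


def year_to_timestamp(year: int) -> float:
    """Unix timestamp (seconds, UTC) of 1 January of the given year."""
    return datetime(year, 1, 1, tzinfo=timezone.utc).timestamp()


print(parse_genres("['drama', 'crime']"))
print(year_to_timestamp(1976))
print(stable_id("A mentally unstable Vietnam War veteran"))
```

The timestamps agree with the `timestamp` column in the table above (for example, 1976 maps to 189302400.0), and years before 1970 give negative values.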


## Set up Superlinked

In [5]:
@schema
class MovieSchema:
    description: String
    title: String
    release_timestamp: Timestamp
    genres: String
    id: IdField

In [6]:
movie = MovieSchema()

In [7]:
description_space = TextSimilaritySpace(text=movie.description, model=MODEL)
title_space = TextSimilaritySpace(text=movie.title, model=MODEL)
genre_space = TextSimilaritySpace(text=movie.genres, model=MODEL)
recency_space = RecencySpace(timestamp=movie.release_timestamp, period_time_list=[
    PeriodTime(timedelta(days=4 * YEAR_IN_DAYS)), 
    PeriodTime(timedelta(days=10 * YEAR_IN_DAYS)), 
    PeriodTime(timedelta(days=40 * YEAR_IN_DAYS))],
    negative_filter=-0.25)

In [8]:
movie_index = Index(spaces=[description_space, title_space, genre_space, recency_space])

In [9]:
query_text_param = Param("query_text")

simple_query = (
    Query(movie_index, weights={
        description_space: Param("description_weight"),
        title_space: Param("title_weight"),
        genre_space: Param("genre_weight"),
        recency_space: Param("recency_weight")
    })
    .find(movie)
    .similar(description_space.text, query_text_param)
    .similar(title_space.text, query_text_param)
    .similar(genre_space.text, query_text_param)
)

advanced_query = (
    Query(movie_index, weights={
        description_space: Param("description_weight"),
        title_space: Param("title_weight"),
        genre_space: Param("genre_weight"),
        recency_space: Param("recency_weight")
    })
    .find(movie)
    .similar(description_space.text, Param("description_query_text"))
    .similar(title_space.text, Param("title_query_text"))
    .similar(genre_space.text, Param("genre_query_text"))
)

In [10]:
df_parser = DataFrameParser(schema=movie, mapping={movie.release_timestamp: "timestamp"})

In [11]:
source: InMemorySource = InMemorySource(movie, parser=df_parser)
executor: InMemoryExecutor = InMemoryExecutor(sources=[source], indices=[movie_index])
app: InMemoryApp = executor.run()

The next cell might take several minutes to run. Getting a coffee or a glass of water, or doing a quick plank, is advised.

In [12]:
source.put([movie_df])

## Understanding recency

Recency can seem quite complex at first, so let's see what the score looks like over the relevant time periods.

In [13]:
recency_plotter = RecencyPlotter(recency_space)
chart = recency_plotter.plot_recency_curve()
chart

Notice the breaks in the score at 4, 10 and 40 years: those are our period times. Titles older than 40 years get the `negative_filter` score (here -0.25).
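To build some intuition for where such breaks come from, here is a toy model of a multi-period recency score. It is not Superlinked's actual formula: the linear decay per period, the plain averaging and the name `toy_recency_score` are all assumptions made for illustration. Only the period lengths and the `negative_filter` value are taken from the cells above.

```python
from datetime import timedelta

YEAR_IN_DAYS = 365
PERIODS = [timedelta(days=n * YEAR_IN_DAYS) for n in (4, 10, 40)]
NEGATIVE_FILTER = -0.25


def toy_recency_score(age: timedelta) -> float:
    """Toy score: average of per-period linear decays (assumed shape, not the library's)."""
    age = max(age, timedelta(0))  # treat future timestamps as "just released"
    if age > max(PERIODS):
        # Anything older than the largest period gets the flat negative_filter value.
        return NEGATIVE_FILTER
    # Each period contributes 1 at age 0 and reaches 0 at the end of that period.
    contributions = [max(0.0, 1.0 - age / period) for period in PERIODS]
    return sum(contributions) / len(contributions)


for years in (0, 2, 4, 7, 10, 25, 40, 41):
    age = timedelta(days=years * YEAR_IN_DAYS)
    print(f"{years:>2} years old -> {toy_recency_score(age):.3f}")
```

In this toy version the slope changes at 4 and 10 years, when the shorter periods stop contributing, and the score drops to `negative_filter` once a title is older than 40 years.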

## Run queries

### Helpers

In [14]:
KEEPCOLS = ["description", "genres", "title", "release_year", "order"]

def get_ordered_result_tuples(result: Result, top_n: int) -> list[tuple[int, int]]:
    # (rank, movie id) pairs for the top_n entities, with ranks starting at 1
    return [(i + 1, int(entity.id_.object_id)) for i, entity in enumerate(result.entities[:top_n])]

def get_movies_by_id_list(id_list_tuple: list[tuple[int, int]], df: pd.DataFrame, keepcols: list[str] | None = None) -> pd.DataFrame:
    if keepcols is None:
        keepcols = list(KEEPCOLS)
    if df.index.name != "id":
        df = df.set_index("id")
    # explicit copy avoids a possible SettingWithCopyWarning when the "order" column is added
    result_df = df.loc[[id_tuple[1] for id_tuple in id_list_tuple]].copy()
    result_df["order"] = [id_tuple[0] for id_tuple in id_list_tuple]
    return result_df[keepcols].reset_index(drop=True).set_index("order")

def parse_results(result: Result, df: pd.DataFrame, top_n: int = TOP_N) -> pd.DataFrame:
    id_tuples = get_ordered_result_tuples(result=result, top_n=top_n)
    return get_movies_by_id_list(id_list_tuple=id_tuples, df=df)

### Queries

With the simple query, I can search all of the fields with the same text.

In [15]:
result: Result = app.query(
    simple_query,
    query_text="Heartfelt romantic comedy",
    description_weight=1,
    title_weight=1,
    genre_weight=1,
    recency_weight=0
)

In [16]:
parse_results(result, movie_df, 10)

order,description,genres,title,release_year
1,When a group of people meets at the same party...,comedy romance,"Love, Surreal and Odd",2017
2,'Love Actually' follows the lives of eight ver...,drama comedy romance,Love Actually,2003
3,A young woman develops romantic feelings for h...,comedy romance,Must Be... Love,2013
4,Romantic anthology web series revolving around...,drama romance,Love Daily,2018
5,"Love You, is a 2011 Taiwanese drama starring J...",comedy drama romance,Drunken to Love You,2011
6,Laida Magtalas is a modern-day Belle who works...,comedy drama romance,A Very Special Love,2008
7,It tells the love story of two childhood sweet...,comedy romance,A Love So Beautiful,2017
8,Christina's love life is stuck in neutral. Aft...,comedy romance,The Sweetest Thing,2002
9,Mike Birbiglia shares a lifetime of romantic b...,comedy documentation romance,Mike Birbiglia: My Girlfriend's Boyfriend,2013
10,"An LA girl, unlucky in love, falls for an East...",romance comedy,Love Hard,2021


After looking at the results, I notice some titles I have already seen. I can bias the results towards recent titles by upweighting recency. Weights are normalised to have unit sum, so only their relative sizes matter, not their absolute values.
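The unit-sum normalisation mentioned above can be pictured with a few lines of plain Python. This is a minimal sketch, assuming positive weights; `normalise_weights` is a hypothetical helper and not part of the Superlinked API.

```python
def normalise_weights(weights: dict[str, float]) -> dict[str, float]:
    """Rescale weights so that they sum to 1 (sketch, positive sums only)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("this sketch only handles weights with a positive sum")
    return {name: value / total for name, value in weights.items()}


equal = normalise_weights({"description": 1, "title": 1, "genre": 1, "recency": 1})
doubled = normalise_weights({"description": 2, "title": 2, "genre": 2, "recency": 2})

# Both settings describe the same relative importance of the four spaces.
print(equal)
print(equal == doubled)
```

Setting every weight to 1 or every weight to 2 therefore expresses the same preference; only the ratios between the weights differ from query to query.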

In [17]:
result: Result = app.query(
    simple_query,
    query_text="Heartfelt romantic comedy",
    description_weight=1,
    title_weight=1,
    genre_weight=1,
    recency_weight=1
)

In [18]:
parse_results(result, movie_df, 10)

order,description,genres,title,release_year
1,"In this romantic comedy, several friends, each...",comedy romance,F*ck Love Too,2022
2,"Fidelity tells a story of marital fidelity, in...",drama romance,"Devotion, a Story of Love and Desire",2022
3,This black humor pan-Arabic anthology series i...,comedy drama romance,"Love, Life & Everything in Between",2022
4,"An LA girl, unlucky in love, falls for an East...",romance comedy,Love Hard,2021
5,Aspiring pop star Erica ends up as the enterta...,romance comedy,Resort to Love,2021
6,Exploration into the tense relationship of suc...,romance,Love or Money,2021
7,A relatable romance drama about a couple in th...,drama romance,Welcome to Wedding Hell,2022
8,When a group of people meets at the same party...,comedy romance,"Love, Surreal and Odd",2017
9,An ad executive and a fashion designer-blogger...,comedy romance,Love Tactics,2022
10,Often (mis)guided by a cheeky imaginary wizard...,comedy romance,Eternally Confused and Eager for Love,2022


Still using the simple query, I can give more weight to a space if I think my query is more related to it, so that matches there count more. Here I upweight the genre, leave the description as is, and downweight the title, because my query text is mostly a genre with some additional context. I also keep recency at unit weight, as I would like the results to be somewhat biased towards recent movies.
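Why changing the weights reorders the results can be seen in a toy example. The sketch below assumes a simplified scoring model, a weighted sum of per-space similarities, which is an illustration and not necessarily how Superlinked combines spaces internally. The candidates `title_match` and `genre_match` and all similarity values are made up.

```python
def combined_score(similarities: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-space similarity scores (assumed, simplified model)."""
    return sum(weights[space] * similarities[space] for space in weights)


# Hypothetical per-space similarities of two candidates to the same query text.
candidates = {
    "title_match": {"description": 0.3, "title": 0.9, "genre": 0.4, "recency": 0.5},
    "genre_match": {"description": 0.4, "title": 0.2, "genre": 0.9, "recency": 0.5},
}


def rank(weights: dict[str, float]) -> list[str]:
    """Candidate names ordered from best to worst under the given weights."""
    return sorted(candidates, key=lambda name: combined_score(candidates[name], weights), reverse=True)


equal_weights = {"description": 1, "title": 1, "genre": 1, "recency": 1}
genre_heavy = {"description": 1, "title": 0.1, "genre": 2, "recency": 1}

print(rank(equal_weights))
print(rank(genre_heavy))
```

With equal weights the candidate with the strong title match comes first. With the genre-heavy weights from the next cell, the candidate with the strong genre match overtakes it.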

In [19]:
result = app.query(
    simple_query,
    query_text="Heartfelt romantic comedy",
    description_weight=1,
    title_weight=0.1,
    genre_weight=2,
    recency_weight=1
)

In [20]:
parse_results(result, movie_df, 10)

order,description,genres,title,release_year
1,"In this romantic comedy, several friends, each...",comedy romance,F*ck Love Too,2022
2,"An LA girl, unlucky in love, falls for an East...",romance comedy,Love Hard,2021
3,A relatable romance drama about a couple in th...,drama romance,Welcome to Wedding Hell,2022
4,This black humor pan-Arabic anthology series i...,comedy drama romance,"Love, Life & Everything in Between",2022
5,In this rom-com challenging the concept of sou...,comedy romance,Four to Dinner,2022
6,Short films follow young adults as they naviga...,drama romance comedy,Feels Like Ishq,2021
7,Todd and Rory are intellectual soul mates. He ...,comedy romance,Straight Up,2019
8,Guille decides it's time to take the next step...,comedy romance,"Let's Tie the Knot, Honey!",2022
9,Incurable romantic Lotte's life is upended whe...,comedy romance,Just Say Yes,2021
10,"In pursuit of both success and validation, a g...",romance comedy,Slay,2021


With the advanced query, I can even supply different search terms for each attribute of the movie.

In [21]:
result = app.query(
    advanced_query,
    description_query_text="Heartfelt lovely romantic comedy for a cold autumn evening.",
    title_query_text="love",
    genre_query_text="drama comedy romantic",
    description_weight=1,
    title_weight=1,
    genre_weight=1,
    recency_weight=0
)

In [22]:
parse_results(result, movie_df, 10)

order,description,genres,title,release_year
1,'Love Actually' follows the lives of eight ver...,drama comedy romance,Love Actually,2003
2,Rebellious Mickey and good-natured Gus navigat...,comedy drama romance,Love,2016
3,A rising black painter tries to break into a c...,romance drama,Really Love,2020
4,"An LA girl, unlucky in love, falls for an East...",romance comedy,Love Hard,2021
5,Romantic anthology web series revolving around...,drama romance,Love Daily,2018
6,Adam and Marklin’s 5-year relationship has gon...,comedy drama romance,Almost Love,2019
7,Laida Magtalas is a modern-day Belle who works...,comedy drama romance,A Very Special Love,2008
8,Two young kids fall in love with each other. B...,romance drama,Endless Love,1981
9,Love is as tough as it is sweet for a lovestru...,comedy drama,A Love So Beautiful,2020
10,A young woman develops romantic feelings for h...,comedy romance,Must Be... Love,2013


I can even give a different weight to each subsearch. Here I really care that the title is related to love, but I am not very attached to my description of the movie I would like to see.

In [23]:
result = app.query(
    advanced_query,
    description_query_text="Heartfelt lovely romantic comedy for a cold autumn evening.",
    title_query_text="love",
    genre_query_text="drama comedy romantic",
    description_weight=0.2,
    title_weight=3,
    genre_weight=1,
    recency_weight=0
)

In [24]:
parse_results(result, movie_df, 10)

order,description,genres,title,release_year
1,Rebellious Mickey and good-natured Gus navigat...,comedy drama romance,Love,2016
2,'Love Actually' follows the lives of eight ver...,drama comedy romance,Love Actually,2003
3,The story of a family and the various situatio...,thriller drama,Love,2020
4,A rising black painter tries to break into a c...,romance drama,Really Love,2020
5,Adam and Marklin’s 5-year relationship has gon...,comedy drama romance,Almost Love,2019
6,Two young kids fall in love with each other. B...,romance drama,Endless Love,1981
7,"The story of Richard and Mildred Loving, an in...",drama romance,Loving,2016
8,Laida Magtalas is a modern-day Belle who works...,comedy drama romance,A Very Special Love,2008
9,"Love, Now is a 72 episode Taiwanese idol roman...",drama,"Love, Now",2012
10,Ian Montes is a picture of success. Despite be...,drama romance,A Love Story,2007


Then I can bias towards recent movies again.

In [25]:
result = app.query(
    advanced_query,
    description_query_text="Heartfelt lovely romantic comedy for a cold autumn evening.",
    title_query_text="love",
    genre_query_text="drama comedy romantic",
    description_weight=0.2,
    title_weight=3,
    genre_weight=1,
    recency_weight=5
)

In [26]:
parse_results(result, movie_df, 10)

order,description,genres,title,release_year
1,"After his ad agency goes bankrupt, an indebted...",romance drama,Doom of Love,2022
2,An ad executive and a fashion designer-blogger...,comedy romance,Love Tactics,2022
3,"Fidelity tells a story of marital fidelity, in...",drama romance,"Devotion, a Story of Love and Desire",2022
4,This black humor pan-Arabic anthology series i...,comedy drama romance,"Love, Life & Everything in Between",2022
5,A modern love story set in the near future whe...,scifi comedy romance drama,AI Love You,2022
6,Often (mis)guided by a cheeky imaginary wizard...,comedy romance,Eternally Confused and Eager for Love,2022
7,"An LA girl, unlucky in love, falls for an East...",romance comedy,Love Hard,2021
8,The story of a family and the various situatio...,thriller drama,Love,2020
9,Haruto Asakura falls in love with hairdresser ...,drama romance,Love Like the Falling Petals,2022
10,"Inside a national weather service, love proves...",drama romance,Forecasting Love and Weather,2022


Or towards older ones, with a negative recency weight.

In [27]:
result = app.query(
    advanced_query,
    description_query_text="Heartfelt lovely romantic comedy for a cold autumn evening.",
    title_query_text="love",
    genre_query_text="drama comedy romantic",
    description_weight=0.2,
    title_weight=3,
    genre_weight=1,
    recency_weight=-10
)

parse_results(result, movie_df, 10)

order,description,genres,title,release_year
1,Two young kids fall in love with each other. B...,romance drama,Endless Love,1981
2,"Anil, a street singer, is humiliated and drive...",drama romance,Disco Dancer,1982
3,An honest man dreams of a better life for his ...,romance crime drama,Ujala,1959
4,Two talented song-and-dance men team up after ...,romance comedy,White Christmas,1954
5,"Brian Cohen is an average young Jewish man, bu...",comedy,Life of Brian,1979
6,Geeky student Arnie Cunningham falls for Chris...,horror thriller european,Christine,1983
7,Maharaj Brajbhan lives a wealthy lifestyle in ...,drama action romance,Bandie,1978
8,Shahjada Ijjat Beg comes to India with his car...,romance drama action,Sohni Mahiwal,1984
9,Two small children and a ship's cook survive a...,romance action drama,The Blue Lagoon,1980
10,"In the 1930s, bored waitress Bonnie Parker fal...",crime drama action,Bonnie and Clyde,1967


Notice that every movie released before 1984 has the same recency score, because our largest period time is 40 years.
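A consequence is that a negative recency weight cannot tell old titles apart: beyond 40 years the recency term is the same constant for all of them, so the text spaces alone decide their order. The toy model below illustrates this under loud assumptions: an additive score, a linear recency decay over a single 40-year period, a query year of 2024 (inferred from the 1984 cut-off), and made-up similarity values for the placeholder titles `old_a`, `old_b` and `recent`.

```python
NEGATIVE_FILTER = -0.25
MAX_PERIOD_YEARS = 40
QUERY_YEAR = 2024  # assumption, inferred from the 1984 cut-off mentioned above


def toy_recency(release_year: int) -> float:
    """Assumed shape: linear decay over 40 years, flat negative_filter beyond that."""
    age = QUERY_YEAR - release_year
    if age > MAX_PERIOD_YEARS:
        return NEGATIVE_FILTER
    return 1.0 - age / MAX_PERIOD_YEARS


def toy_total(text_similarity: float, release_year: int, recency_weight: float) -> float:
    """Additive toy score: text similarity plus weighted recency (not the library's formula)."""
    return text_similarity + recency_weight * toy_recency(release_year)


# (placeholder title, release year, made-up text similarity)
movies = [("old_a", 1981, 0.9), ("old_b", 1954, 0.6), ("recent", 2021, 0.8)]

ranking = sorted(movies, key=lambda m: toy_total(m[2], m[1], recency_weight=-10), reverse=True)
print([title for title, _, _ in ranking])
```

Both old titles receive the same recency contribution, so `old_a` ranks above `old_b` purely because of its higher text similarity, while the recent title is pushed far down.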