<img src = "https://github.com/singlestore-labs/spaces-notebooks/blob/e551e274bb67bb1e5081131ee1150cdba713fc43/common/images/singlestore-jupyter.png?raw=true">

<div id="singlestore-header" style="display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/browser.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Movie Recommender Part 6</h1>
    </div>
</div>

<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>Select the database from the drop-down menu at the top of this notebook.</p>
    </div>
</div>

In [11]:
%%sql
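-- Recreate movies_backup with the same schema as movies
DROP TABLE IF EXISTS movies_backup;
CREATE TABLE IF NOT EXISTS movies_backup LIKE movies;
-- Add a full-text index on genres so MATCH ... AGAINST can search it
ALTER TABLE movies_backup ADD FULLTEXT genres_index (genres);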

In [12]:
%%sql
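-- Copy all rows from movies
INSERT INTO movies_backup SELECT * FROM movies;
-- Optional (disabled): rewrite genres from 'Drama|Romance' to 'Drama, Romance'
-- UPDATE movies_backup SET genres = REPLACE(genres, '|', ', ');
-- Flush in-memory rows to disk so the full-text index covers the newly inserted data
OPTIMIZE TABLE movies_backup FLUSH;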


In [13]:
%%sql
SELECT id, title, genres FROM movies_backup LIMIT 5;

id,title,genres
121,"Boys of St. Vincent, The (1993)",Drama
187,Party Girl (1995),Comedy
498,Mr. Jones (1993),Drama|Romance
516,Renaissance Man (1994),Comedy|Drama|War
690,"Promise, The (Versprechen, Das) (1994)",Romance


# Example Queries

## Find Top Movies for a User Based on Similarity

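This query ranks movies for a particular user (here, the user with `id = 1`) by the dot product of the user's `factors` vector and each movie's `factors` vector, computed with SingleStore's `<*>` operator. Higher scores indicate a closer match.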

In [14]:
%%sql
SELECT
    m.title,
    m.genres,
    (u.factors <*> m.factors) AS similarity_score
FROM
    users u, movies_backup m
WHERE
    u.id = 1
ORDER BY
    similarity_score DESC
LIMIT 10;

title,genres,similarity_score
Airport (1970),Drama,0.5594869256019592
Center Stage (2000),Drama,0.4920910000801086
"Dark Half, The (1993)",Horror|Mystery,0.4820263087749481
Christmas Vacation (1989),Comedy,0.4356957674026489
Beauty and the Beast (1991),Animation|Children's|Musical,0.4331190288066864
Caddyshack (1980),Comedy,0.4262603521347046
"Love Bug, The (1969)",Children's|Comedy,0.4238982498645782
Willy Wonka and the Chocolate Factory (1971),Adventure|Children's|Comedy|Fantasy,0.4190407991409302
Awakenings (1990),Drama,0.4100484251976013
Hush (1998),Thriller,0.4086280465126037
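
To make the ranking concrete, here is a minimal Python sketch of what `<*>` followed by `ORDER BY ... DESC LIMIT` computes. The factor vectors and titles are made-up toy values, not rows from the `users` or `movies_backup` tables, and the helper names `dot` and `top_k_by_similarity` are hypothetical.

```python
# Minimal sketch of dot-product ranking; all vectors are hypothetical toy values.

def dot(a, b):
    """Dot product of two equal-length vectors (the role <*> plays in the SQL)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k_by_similarity(user_factors, movie_factors, k=10):
    """Score every movie against the user and keep the k highest scores."""
    scored = [(title, dot(user_factors, factors)) for title, factors in movie_factors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # ORDER BY similarity_score DESC
    return scored[:k]  # LIMIT k


user = [0.5, 0.1, 0.3]
movies = {
    "Movie A": [0.9, 0.0, 0.2],
    "Movie B": [0.1, 0.8, 0.1],
    "Movie C": [0.4, 0.4, 0.4],
}
for title, score in top_k_by_similarity(user, movies, k=2):
    print(f"{title}: {score:.2f}")
```

Here Movie A (0.51) ranks ahead of Movie C (0.36), and Movie B (0.16) falls outside the top 2. Note that a raw dot product is not normalized like cosine similarity, so vector magnitude also affects the score.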


## Find Top Movies for a User Based on Full-Text

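This query does not use the user's factor vectors. It keeps movies whose `genres` match the full-text search string `'Thriller -Horror -Sci-Fi'` (Thriller, excluding Horror and Sci-Fi) and orders them by full-text relevance score. The join with `users` only requires that user 1 exists; it has no effect on the ranking.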

In [15]:
%%sql
SELECT
    m.title,
    m.genres,
    MATCH(m.genres) AGAINST ('Thriller -Horror -Sci-Fi') AS score
FROM
    users u, movies_backup m
WHERE
    u.id = 1
    AND MATCH(m.genres) AGAINST ('Thriller -Horror -Sci-Fi')
ORDER BY
    score DESC
LIMIT 10;

title,genres,score
"Assignment, The (1997)",Thriller,1.0
Three Days of the Condor (1975),Thriller,1.0
"In Crowd, The (2000)",Thriller,1.0
Pacific Heights (1990),Thriller,1.0
Turbulence (1997),Thriller,1.0
Foreign Correspondent (1940),Thriller,1.0
Marathon Man (1976),Thriller,1.0
Four Rooms (1995),Thriller,1.0
Switchback (1997),Thriller,1.0
Love and a .45 (1994),Thriller,1.0
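
The search string is intended to match `Thriller` while excluding anything that matches `Horror` or `Sci-Fi`. The Python sketch below mimics only that include/exclude filtering on the pipe-separated `genres` values, assuming each genre is a single token. It does not reproduce SingleStore's tokenizer or its relevance `score`. The helpers `parse_query` and `matches` and the entry `"Hypothetical Movie"` are illustrative, not part of the notebook.

```python
# Minimal sketch of include/exclude matching on pipe-separated genres.

def parse_query(query):
    """Split 'Thriller -Horror -Sci-Fi' into wanted and excluded lowercase terms."""
    wanted, excluded = set(), set()
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:].lower())  # a leading '-' marks an excluded term
        else:
            wanted.add(token.lower())
    return wanted, excluded


def matches(genres, wanted, excluded):
    """True if the genres hit at least one wanted term and no excluded term."""
    tokens = {g.lower() for g in genres.split("|")}
    return bool(tokens & wanted) and not (tokens & excluded)


wanted, excluded = parse_query("Thriller -Horror -Sci-Fi")
catalog = {
    "Hush (1998)": "Thriller",
    "Ghost (1990)": "Comedy|Romance|Thriller",
    "Dark Half, The (1993)": "Horror|Mystery",
    "Hypothetical Movie": "Horror|Thriller",
}
print([title for title, genres in catalog.items() if matches(genres, wanted, excluded)])
```

Only `Hush (1998)` and `Ghost (1990)` survive: `Dark Half, The (1993)` has no Thriller genre, and the hypothetical Horror/Thriller entry is removed by the exclusion.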


## Find Top Movies for a User Based on Similarity and Full-Text

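This query combines both approaches: the full-text predicate restricts the candidates to movies matching `'Thriller -Horror -Sci-Fi'`, and the remaining movies are ranked by dot-product similarity to user 1. The full-text match acts only as a filter; its score does not contribute to the ordering.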

In [16]:
%%sql
SELECT
    m.title,
    m.genres,
    (u.factors <*> m.factors) AS similarity_score
FROM
    users u, movies_backup m
WHERE
    u.id = 1
    AND MATCH(m.genres) AGAINST ('Thriller -Horror -Sci-Fi')
ORDER BY
    similarity_score DESC
LIMIT 10;

title,genres,similarity_score
Hush (1998),Thriller,0.4086280465126037
Ghost (1990),Comedy|Romance|Thriller,0.4045349061489105
What Lies Beneath (2000),Thriller,0.388087660074234
Executive Decision (1996),Action|Thriller,0.366390973329544
Surviving the Game (1994),Action|Adventure|Thriller,0.3404798805713653
"Thomas Crown Affair, The (1968)",Crime|Drama|Thriller,0.3370590806007385
Fatal Attraction (1987),Thriller,0.3334504663944244
Mulholland Falls (1996),Crime|Film-Noir|Thriller,0.3333781063556671
Disclosure (1994),Drama|Thriller,0.3146659135818481
I Saw What You Did (1965),Thriller,0.3138026297092438
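
The filter-then-rank order can be sketched in plain Python. This is a minimal illustration with hypothetical toy vectors and titles, and a simplified genre filter that treats each pipe-separated genre as one token; the helper names `passes_filter` and `recommend` are illustrative only.

```python
# Minimal filter-then-rank sketch; all data below is hypothetical.

def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def passes_filter(genres, wanted, excluded):
    """Keep movies with at least one wanted genre and no excluded genre."""
    tokens = set(genres.split("|"))
    return bool(tokens & wanted) and not (tokens & excluded)


def recommend(user_factors, movies, wanted, excluded, k=10):
    """Filter candidates by genre first, then order the survivors by dot product."""
    candidates = [
        (title, dot(user_factors, factors))
        for title, (genres, factors) in movies.items()
        if passes_filter(genres, wanted, excluded)  # the WHERE ... MATCH(...) step
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)  # ORDER BY similarity_score DESC
    return candidates[:k]  # LIMIT k


user = [0.5, 0.1, 0.3]
movies = {
    "Movie A": ("Drama", [0.9, 0.0, 0.2]),
    "Movie B": ("Action|Thriller", [0.1, 0.8, 0.1]),
    "Movie C": ("Thriller", [0.4, 0.4, 0.4]),
    "Movie D": ("Horror|Thriller", [0.9, 0.9, 0.9]),
}
for title, score in recommend(user, movies, {"Thriller"}, {"Horror", "Sci-Fi"}):
    print(f"{title}: {score:.2f}")
```

Movie A has the highest similarity but is dropped by the genre filter, in the same way that Airport (1970), the top result of the similarity-only query, is absent from the results above.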


<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>Select the database from the drop-down menu at the top of this notebook. This updates the <b>connection_url</b> variable, which SQLAlchemy uses to connect to the selected database.</p>
    </div>
</div>

In [17]:
# connection_url is set by the notebook environment when a database is selected
from sqlalchemy import create_engine

db_connection = create_engine(connection_url)

In [18]:
import pandas as pd

# Run the combined full-text + similarity query again, also fetching each movie's poster URL
df = pd.read_sql("""
SELECT
    m.title,
    m.poster,
    (u.factors <*> m.factors) AS similarity_score
FROM
    users u
CROSS JOIN
    movies_backup m
WHERE
    u.id = 1
    AND MATCH(m.genres) AGAINST ('Thriller -Horror -Sci-Fi')
ORDER BY
    similarity_score DESC
LIMIT 10;
""", con = db_connection)

In [19]:
# Display movie recommendations with posters
import html

from IPython.display import display, HTML

# Build an HTML <img> tag for a poster URL (escaped so quotes cannot break the attribute)
def display_image(url):
    return f'<img src="{html.escape(str(url), quote=True)}" width="100">'

# Generate HTML content for a bordered table with a title column and a poster column
html_table = [
    '<table style="border-collapse: collapse; width: 100%; border: 1px solid #ddd;">',
    '<tr style="text-align: left; border-bottom: 1px solid #ddd;">',
    '<th style="padding: 10px; border-right: 1px solid #ddd;">Title</th>',
    '<th style="padding: 10px;">Poster</th>',
    '</tr>',
]

# Iterate over the title and poster columns to populate the table rows
for title, poster in zip(df["title"], df["poster"]):
    html_table.append('<tr>')
    html_table.append(f'<td style="padding: 10px; border-right: 1px solid #ddd;">{html.escape(str(title))}</td>')
    html_table.append(f'<td style="padding: 10px;">{display_image(poster)}</td>')
    html_table.append('</tr>')

html_table.append('</table>')

display(HTML(''.join(html_table)))

Title,Poster
Hush (1998),
Ghost (1990),
What Lies Beneath (2000),
Executive Decision (1996),
Surviving the Game (1994),
"Thomas Crown Affair, The (1968)",
Fatal Attraction (1987),
Mulholland Falls (1996),
Disclosure (1994),
I Saw What You Did (1965),
