# Visual RecSys for Streaming Platforms

Visual similarity recommendation suggests items or content based on how closely they resemble a reference item visually. Such recommendation systems are common in e-commerce, image search engines, and content recommendation platforms.

Methodology:
1. Data Collection
2. Feature Extraction
3. Similarity Calculation
4. Ranking and Recommendation

## 1. Problem
Given an image of a movie poster, recommend visually similar movie posters from a dataset of movie posters.

## 2. Data Collection
We use the following dataset from Kaggle:
https://www.kaggle.com/datasets/akshaypawar7/millions-of-movies

We use a refined version of this dataset, `movies.csv`, which is shared along with this code.

### Loading the data from the movie CSV file

In [7]:
# Load the movie data from the CSV file
import pandas as pd

csv_path = "movies.csv"
moviedata = pd.read_csv(csv_path)  # comma is the default delimiter
moviedata.head()

|  | movie_id | title | genres | original_language | overview | production_companies | release_date | runtime | vote_average | vote_count | credits | keywords | poster_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Avatar: The Way of Water | Science Fiction-Adventure-Action | en | Set more than a decade after the events of the... | 20th Century Studios-Lightstorm Entertainment | 14-12-2022 | 192 | 7.751 | 6748 | Sam Worthington-Zoe Saldaña-Sigourney Weaver-... | loss of loved one-dying and death-alien life-f... | https://image.tmdb.org/t/p/w500/t6HIqrRAclMCA6... |
| 1 | 2 | The Pope's Exorcist | Horror-Mystery-Thriller | en | Father Gabriele Amorth Chief Exorcist of the V... | Screen Gems-2.0 Entertainment-Jesus & Mary-Wor... | 05-04-2023 | 103 | 7.433 | 545 | Russell Crowe-Daniel Zovatto-Alex Essoe-Franco... | spain-rome italy-vatican-pope-pig-possession-c... | https://image.tmdb.org/t/p/w500/9JBEPLTPSm0d1m... |
| 2 | 3 | Shazam! Fury of the Gods | Action-Comedy-Fantasy | en | Billy Batson and his foster siblings who trans... | New Line Cinema-The Safran Company-DC Films-Wa... | 15-03-2023 | 130 | 6.84 | 1355 | Zachary Levi-Asher Angel-Jack Dylan Grazer-Ada... | superhero-end of the world-super power-aftercr... | https://image.tmdb.org/t/p/w500/2VK4d3mqqTc7LV... |
| 3 | 4 | The Super Mario Bros. Movie | Animation-Adventure-Family-Fantasy-Comedy | en | While working underground to fix a water main ... | Universal Pictures-Illumination-Nintendo | 05-04-2023 | 92 | 7.556 | 332 | Chris Pratt-Anya Taylor-Joy-Charlie Day-Jack B... | video game-plumber-magic mushroom-based on vid... | https://image.tmdb.org/t/p/w500/qNBAXBIQlnOThr... |
| 4 | 5 | Ant-Man and the Wasp: Quantumania | Action-Adventure-Science Fiction | en | Super-Hero partners Scott Lang and Hope van Dy... | Marvel Studios-Kevin Feige Productions | 15-02-2023 | 125 | 6.448 | 1547 | Paul Rudd-Evangeline Lilly-Jonathan Majors-Kat... | hero-ant-sequel-superhero-based on comic-famil... | https://image.tmdb.org/t/p/w500/ngl2FKBlU4fhbd... |


In [2]:
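import os
import snowflake.connector

# Create (or recreate) the database and schema used by the rest of the notebook.
# The password is read from the SNOWFLAKE_PASSWORD environment variable
# (assumed name) instead of being hard-coded in the notebook.
ctx = snowflake.connector.connect(user='SHANTHINIMCA', password=os.environ["SNOWFLAKE_PASSWORD"], account='hg09621.central-india.azure')
cs = ctx.cursor()
try:
    cs.execute("USE WAREHOUSE COMPUTE_WH")
    cs.execute("DROP DATABASE IF EXISTS MOVIES_DB")
    cs.execute("CREATE DATABASE MOVIES_DB")
    cs.execute("DROP SCHEMA IF EXISTS MOVIES_TABLES")
    cs.execute("CREATE SCHEMA MOVIES_TABLES")
finally:
    cs.close()
    ctx.close()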

In [3]:
import os
from snowflake.snowpark.session import Session
import snowflake.snowpark.functions as F
import snowflake.snowpark.types as T

# Snowpark session for the same account; the password is read from the
# SNOWFLAKE_PASSWORD environment variable (assumed name) instead of being hard-coded.
connection_parameters = {
    "account": "hg09621.central-india.azure",
    "user": "SHANTHINIMCA",
    "password": os.environ["SNOWFLAKE_PASSWORD"],
    "role": "ACCOUNTADMIN",
    "warehouse": "COMPUTE_WH",
    "database": "MOVIES_DB",
    "schema": "MOVIES_TABLES",
}

session = Session.builder.configs(connection_parameters).create()

### Loading the data into the MOVIE_DATA table

In [4]:
# Load the data frame into the MOVIE_DATA table
import os
import snowflake.connector

# The "?" placeholders below require the qmark paramstyle
# (the connector's default is pyformat, i.e. %s placeholders).
snowflake.connector.paramstyle = "qmark"

ctx = snowflake.connector.connect(user='SHANTHINIMCA', password=os.environ["SNOWFLAKE_PASSWORD"], account='hg09621.central-india.azure', role="ACCOUNTADMIN", warehouse="COMPUTE_WH", database="MOVIES_DB", schema="MOVIES_TABLES")
cursor = ctx.cursor()
try:
    cursor.execute("DROP TABLE IF EXISTS MOVIE_DATA")
    print('Creating table....')
    cursor.execute("CREATE TABLE MOVIE_DATA(movie_id int primary key, movie_title TEXT, genres TEXT, original_language varchar(255), overview TEXT, production TEXT, release_date varchar(255), runtime varchar(255), voter_rating varchar(255), voters_count varchar(255), credits TEXT, keywords TEXT, Poster_path varchar(255))")
    print("Table is created....")
    # Replace NaN with None so that missing values are stored as SQL NULL
    rows = moviedata.astype(object).where(moviedata.notna(), None)
    sql = "INSERT INTO MOVIE_DATA VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)"
    for _, row in rows.iterrows():
        cursor.execute(sql, row.tolist())
        print("Record inserted:-" + str(row["title"]))
    # Binary column that will hold the downloaded poster images
    cursor.execute("ALTER TABLE MOVIE_DATA ADD poster_image BINARY")
    print("Table is altered....")
finally:
    cursor.close()
    ctx.close()

Creating table....
Table is created....
Record inserted:-Avatar: The Way of Water
Record inserted:-The Pope's Exorcist
Record inserted:-Shazam! Fury of the Gods
Record inserted:-The Super Mario Bros. Movie
Record inserted:-Ant-Man and the Wasp: Quantumania
Record inserted:-Creed III
Record inserted:-Knock at the Cabin
Record inserted:-Guardians of the Galaxy Volume 3
Record inserted:-Plane
Record inserted:-Black Panther: Wakanda Forever
Record inserted:-Scream VI
Record inserted:-The Eighth Clause
Record inserted:-Puss in Boots: The Last Wish
Record inserted:-Evil Dead Rise
Record inserted:-John Wick: Chapter 4
Record inserted:-Lord of the Streets
Record inserted:-A Man Called Otto
Record inserted:-Huesera: The Bone Woman
Record inserted:-Dungeons & Dragons: Honor Among Thieves
Record inserted:-Narvik
Record inserted:-Ghosted
Record inserted:-Diabolik - Ginko all'attacco!
Record inserted:-65
Record inserted:-Shotgun Wedding
Record inserted:-M3GAN
Record inserted:-Cocaine Bear
...

### Downloading the poster images and storing them in the movie table

The code below can be adjusted by changing the bounds on the `movie_id` column in the `SELECT` query, so that only the desired number of poster images is downloaded and inserted.
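One way to apply this is to process the table in fixed-size `movie_id` windows instead of one large range. The helper below is a hypothetical sketch (`id_batches` is not part of the original code) that yields exclusive `(lower, upper)` bounds for a filter of the form `movie_id > lower AND movie_id < upper`, the same shape as the query in the next cell.

```python
def id_batches(first_id, last_id, batch_size):
    """Yield exclusive (lower, upper) bounds covering ids first_id..last_id.

    Hypothetical helper: each pair is meant for a filter of the form
    movie_id > lower AND movie_id < upper.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)  # inclusive last id of this batch
        yield start - 1, end + 1                    # exclusive bounds for > and <
        start = end + 1


# Example: ids 1..250 in batches of 100
for lower, upper in id_batches(1, 250, 100):
    print(lower, upper)
```

Passing the bounds as query parameters (rather than formatting them into the SQL string) keeps the query safe and lets the same statement be reused for every batch.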

In [5]:
import os
import requests
import snowflake.connector

# The "?" placeholders below require the qmark paramstyle
snowflake.connector.paramstyle = "qmark"
os.makedirs("posters", exist_ok=True)

ctx = snowflake.connector.connect(user='SHANTHINIMCA', password=os.environ["SNOWFLAKE_PASSWORD"], account='hg09621.central-india.azure', role="ACCOUNTADMIN", warehouse="COMPUTE_WH", database="MOVIES_DB", schema="MOVIES_TABLES")
cursor = ctx.cursor()
try:
    # Adjust these bounds to control how many posters are downloaded
    cursor.execute("SELECT movie_id, movie_title, poster_path FROM movie_data WHERE movie_id > 0 AND movie_id < 100000")
    myresult = cursor.fetchall()
    print(len(myresult))
    for movie_id, title, poster_url in myresult:
        filename = "posters/" + str(movie_id) + ".jpg"
        print(str(movie_id) + ":- " + title + ":-" + str(poster_url))
        if not poster_url:
            continue  # no poster URL for this movie
        r = requests.get(poster_url, timeout=30)
        # Only store the image if it was retrieved successfully
        if r.status_code == 200:
            imagedata = r.content
            # Keep a local copy of the poster as well
            with open(filename, 'wb') as f:
                f.write(imagedata)
            cursor.execute("UPDATE movie_data SET poster_image = ? WHERE movie_id = ?", (imagedata, movie_id))
            print(str(movie_id) + ":- " + title + " :- image inserted")
finally:
    cursor.close()
    ctx.close()

122
1:- Avatar: The Way of Water:-https://image.tmdb.org/t/p/w500/t6HIqrRAclMCA60NsSmeqe9RmNV.jpg
1:- Avatar: The Way of Water :- image inserted
2:- The Pope's Exorcist:-https://image.tmdb.org/t/p/w500/9JBEPLTPSm0d1mbEcLxULjJq9Eh.jpg
2:- The Pope's Exorcist :- image inserted
3:- Shazam! Fury of the Gods:-https://image.tmdb.org/t/p/w500/2VK4d3mqqTc7LVZLnLPeRiPaJ71.jpg
3:- Shazam! Fury of the Gods :- image inserted
4:- The Super Mario Bros. Movie:-https://image.tmdb.org/t/p/w500/qNBAXBIQlnOThrVvA6mA2B5ggV6.jpg
4:- The Super Mario Bros. Movie :- image inserted
5:- Ant-Man and the Wasp: Quantumania:-https://image.tmdb.org/t/p/w500/ngl2FKBlU4fhbdsrtdom9LVLBXw.jpg
5:- Ant-Man and the Wasp: Quantumania :- image inserted
6:- Creed III:-https://image.tmdb.org/t/p/w500/cvsXj3I9Q2iyyIo95AecSd1tad7.jpg
6:- Creed III :- image inserted
7:- Knock at the Cabin:-https://image.tmdb.org/t/p/w500/dm06L9pxDOL9jNSK4Cb6y139rrG.jpg
7:- Knock at the Cabin :- image inserted
...
64:- Sniper: The White Raven :- image inserted
65:- Spider-Man: No Way Home:-https://image.tmdb.org/t/p/w500/uJYYizSuA9Y3DCs0qS4qWvHfZg4.jpg
65:- Spider-Man: No Way Home :- image inserted
66:- Violent Night:-https://image.tmdb.org/t/p/w500/1XSYOP0JjjyMz1irihvWywro82r.jpg
66:- Violent Night :- image inserted
67:- Ritual:-https://image.tmdb.org/t/p/w500/9i4v0kITWH4xVdTN0JgNjwKf3CY.jpg
67:- Ritual :- image inserted
68:- The Woman King:-https://image.tmdb.org/t/p/w500/438QXt1E3WJWb3PqNniK0tAE5c1.jpg
68:- The Woman King :- image inserted
69:- Babylon:-https://image.tmdb.org/t/p/w500/wjOHjWCUE0YzDiEzKv8AfqHj3ir.jpg
69:- Babylon :- image inserted
70:- Thor: Love and Thunder:-https://image.tmdb.org/t/p/w500/pIkRyD18kl4FhoCNQuWxWu5cBLM.jpg
70:- Thor: Love and Thunder :- image inserted
71:- Somebody I Used to Know:-https://image.tmdb.org/t/p/w500/ovHxxphDgjyEpYriDoGoIHfrdZL.jpg
71:- Somebody I Used to Know :- image inserted
...

## 3. Feature Extraction
We use the `ResNet50` model with pre-trained `ImageNet` weights to extract image features through transfer learning.

### `ResNet50` 
ResNet-50 is a deep convolutional neural network (CNN) with 50 weighted layers. It is built from residual blocks: stacks of convolutional layers whose input is added back to their output through shortcut ("skip") connections. The network also contains pooling layers and a final fully connected classification layer.
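In the original ResNet formulation (He et al., 2016), a residual block with input $\mathbf{x}$ learns a residual mapping $\mathcal{F}$ through its stacked layers (weights $W_i$), and the skip connection adds the input back unchanged. When the dimensions of $\mathbf{x}$ and $\mathcal{F}$ differ, the shortcut applies a linear projection $W_s$ instead:

$$
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
\qquad\text{or}\qquad
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s\,\mathbf{x}
$$

Because the block only has to learn the difference from the identity, very deep stacks of such blocks remain trainable.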

The input to ResNet-50 is typically a 224x224 RGB image, and the output is a vector of class probabilities. Trained on a large dataset such as ImageNet, the model learns to classify images into one of 1,000 predefined classes. ResNet-50 is also widely used as a starting point for transfer learning, where the pre-trained model is fine-tuned for a specific task on a smaller dataset. In this project the classification head is dropped (`include_top=False`) and the network is used as a frozen feature extractor: for a 224x224 input it produces a 7x7x2048 feature map, which `GlobalMaxPooling2D` reduces to one 2048-dimensional embedding per poster.
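The last two steps of the embedding, global max pooling over the spatial feature map and the L2 normalisation performed in `extract_features`, are plain array operations. The NumPy sketch below reproduces them on a random array that stands in for the ResNet50 output; the random input is only a placeholder, not real poster features, and `pool_and_normalise` is a hypothetical name.

```python
import numpy as np


def pool_and_normalise(feature_map):
    """Global max pooling over the spatial axes, then L2 normalisation.

    feature_map has shape (height, width, channels), e.g. (7, 7, 2048)
    for ResNet50 with include_top=False on a 224x224 input.
    """
    pooled = feature_map.max(axis=(0, 1))      # same reduction as GlobalMaxPooling2D for one image
    return pooled / np.linalg.norm(pooled)     # scale to unit length


rng = np.random.default_rng(0)
fake_map = rng.random((7, 7, 2048)).astype(np.float32)  # placeholder for a real ResNet50 output
embedding = pool_and_normalise(fake_map)
print(embedding.shape)                               # (2048,)
print(round(float(np.linalg.norm(embedding)), 4))    # 1.0
```

Normalising every embedding to unit length is what lets the later Euclidean-distance search behave like a cosine-similarity search.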

### `ImageNet`
The ImageNet weights are the weights of a neural network pre-trained on the ImageNet dataset, a large-scale collection of millions of labelled images spanning thousands of categories. For ResNet50, the Keras option `weights='imagenet'` loads weights trained on the 1,000-class ImageNet (ILSVRC) subset.


### Genres list

The movie database uses 19 genres:

1. Action
2. Adventure
3. Animation
4. Comedy
5. Crime
6. Documentary
7. Drama
8. Family
9. Fantasy
10. History
11. Horror
12. Music
13. Mystery
14. Romance
15. Science Fiction
16. Thriller
17. TV Movie
18. War
19. Western
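In `movies.csv` a film's genres are stored as one hyphen-separated string (for example `Science Fiction-Adventure-Action`), the same format the front end later splits with `split('-')`. A minimal pandas sketch for deriving the genre list and counting films per genre from that column, assuming no genre name contains a hyphen (true for the 19 genres above); the three-row frame reuses genre strings from the first rows of the dataset.

```python
import pandas as pd

# Toy frame using genre strings from the first rows of movies.csv
movies = pd.DataFrame({
    "title": ["Avatar: The Way of Water", "The Pope's Exorcist", "Shazam! Fury of the Gods"],
    "genres": ["Science Fiction-Adventure-Action", "Horror-Mystery-Thriller", "Action-Comedy-Fantasy"],
})

# Split each hyphen-separated string into a list, then give each genre its own row
genre_rows = movies.assign(genre=movies["genres"].str.split("-")).explode("genre")

genre_counts = genre_rows["genre"].value_counts()
print(sorted(genre_counts.index))
print(genre_counts["Action"])  # 2
```

Run on the full `moviedata` frame, the sorted index gives the genre list directly, and the counts show how many posters each per-genre feature file will contain.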

### Genre-wise feature extraction from the poster images


In [None]:
import os
import pickle

import numpy as np
import snowflake.connector
import tensorflow
from numpy.linalg import norm
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.layers import GlobalMaxPooling2D
from tensorflow.keras.preprocessing import image
from tqdm import tqdm


# Write raw image bytes to a file on the local machine
def write_file(data, filename):
    with open(filename, 'wb') as f:
        f.write(data)


# Extract an L2-normalised feature vector for one image
def extract_features(img_path, model):
    img = image.load_img(img_path, target_size=(224, 224))
    img_array = image.img_to_array(img)
    expanded_img_array = np.expand_dims(img_array, axis=0)
    preprocessed_img = preprocess_input(expanded_img_array)
    result = model.predict(preprocessed_img, verbose=0).flatten()
    return result / norm(result)


# Frozen ResNet50 backbone followed by global max pooling -> 2048-d embedding
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False
model = tensorflow.keras.Sequential([base, GlobalMaxPooling2D()])

genres = ["Action", "Adventure", "Animation", "Comedy", "Crime", "Documentary", "Drama", "Family", "Fantasy", "History", "Horror", "Music", "Mystery", "Romance", "Science Fiction", "Thriller", "TV Movie", "War", "Western"]

# The "?" placeholder below requires the qmark paramstyle
snowflake.connector.paramstyle = "qmark"
os.makedirs("posters", exist_ok=True)
os.makedirs("Extraction", exist_ok=True)

ctx = snowflake.connector.connect(user='SHANTHINIMCA', password=os.environ["SNOWFLAKE_PASSWORD"], account='hg09621.central-india.azure', role="ACCOUNTADMIN", warehouse="COMPUTE_WH", database="MOVIES_DB", schema="MOVIES_TABLES")
cursor = ctx.cursor()
try:
    cursor.execute("SELECT CURRENT_DATABASE()")
    record = cursor.fetchone()
    print("You're connected to database: ", record)
    sql = "SELECT movie_id, poster_image FROM movie_data WHERE movie_id > 0 AND movie_id < 50000 AND poster_image IS NOT NULL AND genres LIKE ?"
    for genre in genres:
        feature_list = []
        filenumber = []
        cursor.execute(sql, ("%" + genre + "%",))
        myresult = cursor.fetchall()
        for movie_id, poster_bytes in tqdm(myresult):
            filenumber.append(movie_id)
            # Write the poster to disk, then extract its features
            poster_image_path = "posters/" + str(movie_id) + ".jpg"
            write_file(poster_bytes, poster_image_path)
            feature_list.append(extract_features(poster_image_path, model))
        print(np.array(feature_list).shape)
        # Embeddings and movie ids are saved in the same order:
        # entry i of one file belongs to entry i of the other
        with open('Extraction/' + genre + '_imageFeaturesEmbeddings.pkl', 'wb') as f:
            pickle.dump(feature_list, f)
        print(len(filenumber))
        with open('Extraction/' + genre + '_imageFeaturesFileNumber.pkl', 'wb') as f:
            pickle.dump(filenumber, f)
        print("file has been loaded")
finally:
    cursor.close()
    ctx.close()
print("program has terminated")

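Note: the code above only needs to be run once. For each genre it writes two pickle files to the `Extraction/` directory: the poster embeddings and the matching movie ids.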
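Because each genre's embeddings and movie ids live in two separate pickle files, the recommender relies on position i in one file matching position i in the other. A hypothetical loader (`load_genre_index` is not part of the original code) that reads both files with the same naming scheme and checks that they line up before use:

```python
import os
import pickle
import tempfile

import numpy as np


def load_genre_index(genre, directory="Extraction"):
    """Load one genre's embeddings and movie ids and check that they are aligned."""
    emb_path = os.path.join(directory, genre + "_imageFeaturesEmbeddings.pkl")
    ids_path = os.path.join(directory, genre + "_imageFeaturesFileNumber.pkl")
    with open(emb_path, "rb") as f:
        embeddings = np.array(pickle.load(f))
    with open(ids_path, "rb") as f:
        movie_ids = list(pickle.load(f))
    if len(embeddings) != len(movie_ids):
        raise ValueError(f"{genre}: {len(embeddings)} embeddings but {len(movie_ids)} ids")
    return embeddings, movie_ids


# Usage with two tiny placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "Action_imageFeaturesEmbeddings.pkl"), "wb") as f:
        pickle.dump([np.ones(4) / 2, np.zeros(4)], f)
    with open(os.path.join(tmp, "Action_imageFeaturesFileNumber.pkl"), "wb") as f:
        pickle.dump([1, 5], f)
    emb, ids = load_genre_index("Action", tmp)
    print(emb.shape, ids)  # (2, 4) [1, 5]
```

Failing loudly on a length mismatch is cheaper than silently recommending the wrong poster after a partial re-run of the extraction step.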

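## 4. Similarity Calculation
A brute-force nearest-neighbour search (scikit-learn's `NearestNeighbors` with `algorithm='brute'`) finds the posters closest to the reference poster, using Euclidean distance as the measure of visual similarity: the smaller the distance, the more similar the posters.
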
### `Euclidean distance`

Euclidean distance, also known as the Euclidean metric, is the straight-line distance between two points in Euclidean space.
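For two embeddings $\mathbf{a}$ and $\mathbf{b}$ of dimension $D$, the distance is defined below. Because the feature extraction step divides every embedding by its norm, all embeddings have unit length, and the squared Euclidean distance then depends only on the cosine of the angle $\theta$ between them:

$$
d(\mathbf{a}, \mathbf{b}) = \lVert \mathbf{a} - \mathbf{b} \rVert_2 = \sqrt{\sum_{i=1}^{D} (a_i - b_i)^2}
$$

$$
\lVert \mathbf{a} - \mathbf{b} \rVert_2^2 = \lVert \mathbf{a} \rVert_2^2 + \lVert \mathbf{b} \rVert_2^2 - 2\,\mathbf{a}\cdot\mathbf{b} = 2 - 2\cos\theta
\quad\text{when } \lVert \mathbf{a} \rVert_2 = \lVert \mathbf{b} \rVert_2 = 1
$$

So ranking posters by increasing Euclidean distance gives the same order as ranking them by decreasing cosine similarity.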

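### `Brute-Force Algorithm`

A brute-force algorithm, also known as exhaustive search, systematically tries every possible solution. For nearest-neighbour search this means computing the distance from the query embedding to every stored embedding and keeping the closest ones, so each query costs O(n·d) for n stored embeddings of dimension d.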
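Applied to the poster embeddings, brute-force search computes the distance from the query to every stored vector and sorts. The NumPy sketch below illustrates what `NearestNeighbors(algorithm='brute', metric='euclidean')` returns for a single query; it is not scikit-learn's actual implementation, and `brute_force_knn` is a hypothetical name.

```python
import numpy as np


def brute_force_knn(embeddings, query, k):
    """Return (distances, indices) of the k stored embeddings closest to query."""
    # Euclidean distance from the query to every stored embedding: O(n * d) work
    distances = np.linalg.norm(embeddings - query, axis=1)
    k = min(k, len(embeddings))
    # Stable sort so that ties keep their original order
    order = np.argsort(distances, kind="stable")[:k]
    return distances[order], order


stored = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [0.8, 0.6]])  # unit-length toy embeddings
dist, idx = brute_force_knn(stored, np.array([1.0, 0.0]), k=3)
print(idx)  # [0 3 2]
```

Exact search is simple and fast enough for a few thousand posters per genre; approximate indexes only become worthwhile at much larger scales.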

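## 5. Ranking & Recommendation
For each genre of the reference film, the five posters with the smallest Euclidean distance to the reference poster are recommended. The front end asks for six neighbours per genre because the reference film itself is usually among them, and it is excluded from the results.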
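The ranking step can be sketched as follows: map each neighbour position back to a movie id through the genre's id list, drop the reference film, and keep the first five in distance order. The function `top_recommendations` is hypothetical and works on the output of a nearest-neighbour search rather than on the database.

```python
def top_recommendations(neighbour_indices, movie_ids, reference_id, n=5):
    """Map neighbour positions to movie ids, skip the reference film, keep the n nearest.

    neighbour_indices: positions returned by the nearest-neighbour search, nearest first.
    movie_ids: the genre's id list, where movie_ids[i] belongs to embedding i.
    """
    ranked = []
    for position in neighbour_indices:
        movie_id = movie_ids[position]
        if movie_id != reference_id and movie_id not in ranked:
            ranked.append(movie_id)
        if len(ranked) == n:
            break
    return ranked


# Six neighbours are requested so that five remain after removing the reference film
ids_for_genre = [11, 42, 7, 19, 3, 25, 8]
print(top_recommendations([1, 4, 0, 6, 2, 5], ids_for_genre, reference_id=42))
# [3, 11, 8, 7, 25]
```

Iterating over the neighbour positions (rather than over database rows) is what preserves the distance order in the final list.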

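# Movie Recommendation Front-End Application

The front-end application is built with Streamlit. It reads movie titles, genres, and poster images from a local MySQL `movies` database, and the per-genre feature files from the `Extraction/` directory.

To view the Streamlit app in a browser, run it with the following command: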

    streamlit run pythonfile

Example:

    streamlit run c:\users\vivek.kakumanu\desktop\python_learnings\python_script\project\poster_recomendation_system\movie_recomendation_based_on_genres.py


In [None]:
import os
import pickle
import random

import mysql.connector as mysql
import numpy as np
import streamlit as st
import tensorflow
from mysql.connector import Error
from numpy.linalg import norm
from PIL import Image
from sklearn.neighbors import NearestNeighbors
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.layers import GlobalMaxPooling2D
from tensorflow.keras.preprocessing import image

st.title('Movie Poster Recommender System')
os.makedirs("uploads", exist_ok=True)


# Local path used to cache a poster image
def file_name(movie_id):
    return "uploads/" + str(movie_id) + ".jpg"


# Save poster bytes to disk; returns True on success
def save_uploaded_file(data, movie_id):
    try:
        with open(file_name(movie_id), 'wb') as f:
            f.write(data)
        return True
    except (OSError, TypeError):
        return False


# Extract an L2-normalised feature vector for one image
def feature_extraction(img_path, model):
    img = image.load_img(img_path, target_size=(224, 224))
    img_array = image.img_to_array(img)
    expanded_img_array = np.expand_dims(img_array, axis=0)
    preprocessed_img = preprocess_input(expanded_img_array)
    result = model.predict(preprocessed_img, verbose=0).flatten()
    return result / norm(result)


# Return the positions of the nearest posters (up to 6) in one genre's feature file
def recommend(features, genre):
    with open('Extraction/' + genre + '_imageFeaturesEmbeddings.pkl', 'rb') as f:
        feature_list = np.array(pickle.load(f))
    neighbors = NearestNeighbors(n_neighbors=min(6, len(feature_list)), algorithm='brute', metric='euclidean')
    neighbors.fit(feature_list)
    distances, indices = neighbors.kneighbors([features])
    return indices[0]


# Frozen ResNet50 backbone followed by global max pooling -> 2048-d embedding
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False
model = tensorflow.keras.Sequential([base, GlobalMaxPooling2D()])


# Show five recommended posters for each genre of the selected movie
def predict_movies(movie_id):
    features = feature_extraction(file_name(movie_id), model)
    cursor.execute("SELECT movie_title, genres FROM movies.movie_data WHERE movie_id = %s", (movie_id,))
    title, genre_text = cursor.fetchone()
    st.write(title)
    st.write(genre_text)
    for genre in (genre_text.split('-') if genre_text else []):
        ids_file = 'Extraction/' + genre + '_imageFeaturesFileNumber.pkl'
        if not os.path.exists(ids_file):
            continue  # no feature files for this genre
        indices = recommend(features, genre)
        with open(ids_file, 'rb') as f:
            filenumbers = pickle.load(f)
        # Map neighbour positions to movie ids, nearest first
        neighbour_ids = [int(filenumbers[i]) for i in indices]
        where_in = ','.join(['%s'] * len(neighbour_ids))
        sql = ("SELECT movie_id, movie_title, poster_image FROM movies.movie_data "
               "WHERE movie_id IN (%s) AND genres LIKE %%s AND movie_id <> %%s" % where_in)
        cursor.execute(sql, tuple(neighbour_ids) + ("%" + genre + "%", movie_id))
        rows = {row[0]: row for row in cursor.fetchall()}
        # Keep the nearest-neighbour order (the selected movie itself was filtered out)
        recommended = [rows[i] for i in neighbour_ids if i in rows][:5]
        st.header(genre)
        for col, (rec_id, rec_title, poster) in zip(st.columns(5), recommended):
            with col:
                if save_uploaded_file(poster, rec_id):
                    st.image(Image.open(file_name(rec_id)))
                    st.write(rec_title)


# Draw a new set of 30 random movie ids (also used by the Refresh button)
def refresh_movies():
    st.session_state.movie_id = [random.randint(1, 50000) for _ in range(30)]


if "movie_id" not in st.session_state:
    refresh_movies()

try:
    # The MySQL password is read from the MYSQL_PASSWORD environment variable (assumed name)
    conn = mysql.connect(host='localhost', database='movies', user='root', password=os.environ["MYSQL_PASSWORD"])
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("SELECT DATABASE()")
        print("You're connected to database: ", cursor.fetchone())
        ids = st.session_state.movie_id
        where_in = ','.join(['%s'] * len(ids))
        sql = ("SELECT movie_id, movie_title, poster_image FROM movies.movie_data "
               "WHERE movie_id IN (%s) AND genres NOT LIKE %%s "
               "AND genres NOT LIKE %%s AND genres NOT LIKE %%s" % where_in)
        cursor.execute(sql, tuple(ids) + ("%TV Movie%", "%Romance%", "%Drama%"))
        # Show up to five posters, each with a button that triggers the recommendations
        clicked_id = None
        for col, (movie_id, title, poster) in zip(st.columns(5), cursor.fetchall()[:5]):
            with col:
                if save_uploaded_file(poster, movie_id):
                    st.image(Image.open(file_name(movie_id)))
                    if st.button(title, key="poster_" + str(movie_id)):
                        clicked_id = movie_id
        if clicked_id is not None:
            predict_movies(clicked_id)
    else:
        st.header("Data Base Connection issue")
except Error as e:
    print(e)

# The callback draws new random movies before the app reruns
st.button("Refresh", on_click=refresh_movies)

Save the application code above as `mrs.py`; it can then be launched from a notebook cell:

In [None]:
!streamlit run mrs.py