# <center> **Database (Movies)**

## **Overview**

CSV files containing the data were generated during the previous scraping phase. We now read these CSV files into Pandas DataFrames, clean the data, and insert it into a SQL database.


## **Questions**

**Which data should go into SQL and which into NoSQL?**<br>
SQL for tabular data, NoSQL for non-tabular or unstructured data.

**Should string IDs be generated automatically?**

**Requests made through our API must have read-only access to the database.**

**How do we link the SQL table Movie to the NoSQL infos? (info_id?)**

**How should the infos and movies tables be created? Is a common ID really needed?**


## **Sources**

**Neo4j for building ML graphs**<br>
https://neo4j.com/docs/getting-started/appendix/tutorials/guide-import-relational-and-etl/


**Wikidata**<br>
https://query.wikidata.org/querybuilder/?uselang=fr<br>


## **Creating the relational schema**

![relation_schema](images/mpd.png)

Online tool: https://drawdb.vercel.app/editor<br>
https://www.mocodo.net/<br>

Link to my schema: [Link Mocodo](https://www.mocodo.net/?mcd=eNq9U0Fu2zAQvPMVe_FNKOqrb6pMNwRsKhDlBj0ZtMykBCTSoEgX6Q_6lKBP6C3-WClSkoUYPRQ1woOwyx3OzixWs9mND5rBbc9fGFPG8oykJckpLDEszz_vMV2mNMOwymnW3VO8XuN_oPxfkUzCSVsjgLetriS3Uivg4JSAipuDVLyW9vwC83kCDZctcFc5j7DWyL2ziX_tWjhqdxI_esqag9JNIwy8_oLlCl5_wwI2utIH3dWMOJrzSyuUFYbDkRvfDCphqlp8gHdzfvMVQktS4KzMC7aAgzSistrs5CG5JIo3AqU9hI91film-eY-Z7irV7o56lZEyJhEVFriz3lBcAfjVjxp8xxhfeJ5trTsAdopO9RjPEhlsMm_EJzARxojFsLRh2_ESva2nPa1KNVzbBnJ3oJGI-guZdAL_nqFGo0gGn6KdE3Ka9RgBqEC-9sH78qIkxTfg6lG-zAO2ttPoLXc7IzfY_WU9LgEXJxkUNOTJDCnl3hsuYASsxINyYXdSlt7-oMz4R_puGvBW7GLbdV-p7QVbYhiWx9L9ajD64mqIILQVd5JmI9OfRguUfguJk9d03Dz7E2Yeme_uWavuKwReihIWWIKn-LEtt24p6YW0CH82LZxo9ywTW7YpKnWoIqRDVmnxXT6VyH6A09tYpQ=)<br>

The online tool **drawdb** can export a .SQL file describing our database (tables and their relations).

![export_SQL](images/export_SQL.png)

After a few modifications to this file, we can create the database and its tables from the command prompt.

<code>"C:\Program Files\MySQL\MySQL Server 8.0\bin\mysql.exe" < movies.sql -u root -p</code>



One difficulty we ran into is representing, in the database, the movies similar to a given movie. This is a "many-to-many" relation in which the table references itself, a common case in social networks, where a user has friends who are themselves users.<br>
Two situations occur: <br>
- symmetric relations (user Alice being friends with user Bob necessarily implies that Bob is friends with Alice),
- asymmetric relations (Alice is friends with Bob, but Bob is not necessarily friends with Alice).

In our case the relation is .....????

To represent this relation ....





**About Self many-to-many relationship**: https://stackoverflow.com/questions/17128472/many-to-many-on-same-table
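A self-referencing many-to-many relation of this kind is usually stored in a junction table whose two columns both point at the same parent table. The sketch below is only an illustration: it uses an in-memory SQLite database so that it runs anywhere (the project itself targets MySQL), and the `similar_movies` table and its column names are assumptions, not part of the actual schema. Each row stores one direction; the symmetric reading is obtained by querying both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (
    movie_id TEXT PRIMARY KEY,
    title    TEXT NOT NULL
);
-- Hypothetical junction table: both columns reference movies.movie_id.
CREATE TABLE similar_movies (
    movie_id         TEXT NOT NULL REFERENCES movies(movie_id),
    similar_movie_id TEXT NOT NULL REFERENCES movies(movie_id),
    PRIMARY KEY (movie_id, similar_movie_id),
    CHECK (movie_id <> similar_movie_id)
);
""")

# Toy data: m1 lists m2 and m3 as similar, but nothing is stored the other way.
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [("m1", "Movie 1"), ("m2", "Movie 2"), ("m3", "Movie 3")])
conn.executemany("INSERT INTO similar_movies VALUES (?, ?)",
                 [("m1", "m2"), ("m1", "m3")])


def similar_asymmetric(conn, movie_id):
    """Follow the relation in the stored direction only."""
    rows = conn.execute(
        "SELECT similar_movie_id FROM similar_movies "
        "WHERE movie_id = ? ORDER BY similar_movie_id", (movie_id,))
    return [row[0] for row in rows]


def similar_symmetric(conn, movie_id):
    """Treat the relation as symmetric by looking at both columns."""
    rows = conn.execute(
        "SELECT similar_movie_id FROM similar_movies WHERE movie_id = ? "
        "UNION "
        "SELECT movie_id FROM similar_movies WHERE similar_movie_id = ? "
        "ORDER BY 1", (movie_id, movie_id))
    return [row[0] for row in rows]


print(similar_asymmetric(conn, "m2"))  # nothing stored in that direction
print(similar_symmetric(conn, "m2"))   # m1 is found through the other column
```

With this layout the choice between symmetric and asymmetric semantics is made at query time, so the table definition does not need to change once that question is settled.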

n-n relationships

Note:
The "reviews" table serves as the junction table between movies and users: the movies-users relation is n-n because a user can write reviews for several movies and a movie has reviews from several users.




**Note**<br>
Ideally, neither the categories nor the countries should be scraped on their own. They should be taken from the scraped movie information, and only the missing categories added to the categories table (that is, query the categories already in the database, then insert the new ones; likewise for the countries).

The same applies to directors, actors and composers.
We need a function that completes the tables already in the database.
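The "complete what is already there" step boils down to a set difference between the values found in the scraped movies and the values already stored. A minimal sketch in plain Python, assuming the existing table has been loaded as a `{value: id}` dictionary (the helper name `complete_lookup` is hypothetical):

```python
import uuid


def complete_lookup(existing, scraped_values):
    """Return (updated mapping, rows to insert) without touching `existing`.

    existing:        dict {value: id} read from the table
    scraped_values:  iterable of values found in the scraped movies
    """
    updated = dict(existing)
    new_rows = []
    for value in scraped_values:
        value = value.strip()
        if value and value not in updated:
            updated[value] = str(uuid.uuid4())
            new_rows.append((updated[value], value))  # (id, value) to INSERT
    return updated, new_rows


existing = {"Drame": "id-1", "Comédie": "id-2"}
updated, new_rows = complete_lookup(existing, ["Drame", "Action", "Action", " Comédie "])
print([value for _, value in new_rows])
```

Only the values missing from the table produce rows to insert, so the function can be run again on every new batch of scraped movies.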

### **Imports**

In [459]:
%reset

In [2]:
import math
import copy
import time
import re
import uuid
import json
import requests
import numpy as np
import pandas as pd
from tqdm import tqdm
from unidecode import unidecode
from collections import namedtuple

import mysql.connector
from IPython.display import display

pd.set_option('display.max_rows', 10)
tqdm.pandas()

### **Tools**

In [470]:
months_FR = ['janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre']
months_EN = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']

# UNUSED
tables_fields = list(zip(['directors', 'actors', 'composers', 'categories', 'countries'], \
                         ['director_id', 'actor_id', 'composer_id', 'category_id', 'country_id'],
                         ['director_name', 'actor_name', 'composer_name', 'category', 'country']))

# --------------------------------------------------------- #
#                                                           #
#   Named Tuple defining the structures of the SQL tables   #
#                                                           #
# --------------------------------------------------------- #

Table = namedtuple('Table', (['Table_name', 'Field_id', 'Field']))
tup_categories = Table('categories', 'category_id', 'category')
tup_countries  = Table('countries', 'country_id', 'country')
tup_directors  = Table('directors', 'director_id', 'director_name')
tup_composers  = Table('composers', 'composer_id', 'composer_name')
tup_actors     = Table('actors', 'actor_id', 'actor_name')

def generate_ID():
    ''' Generate an ID '''
    return str(uuid.uuid4())

def string_with_comma_to_list_of_strings(st):
    ''' Convert a comma-separated string such as "string1, string2, ..." into
        a list of strings ['string1', 'string2', ...] (items shorter than
        two characters are dropped).

        Return: A list of strings.

        Arg:
         - st: string with the value to split.
    '''
    if pd.isna(st):
        return []
    return [item.strip() for item in st.split(",") if len(item.strip()) > 1]

def duration_to_minutes(st):
    ''' Convert duration string into number of minutes (integer)
        1h 35min   ---->    95

        Return: Integer representing the number of minutes.
        Arg:
         - st: duration string to convert.
    '''
    if pd.isna(st) or st == '': return 0
    if 'h' in st:
        a, b = st.split('h')
        if 'min' in b:
            b = b.replace('min', '')
            return 60 * int(a) + int(b.strip())
        return 60 * int(a)
    assert 'min' in st
    st = st.replace('min', '')
    return int(st)  # value is already in minutes, no hour conversion

def unique_values_of_columns(df_data, column):
    ''' return a list with unique values found in the column,
        values in the column are like : 'value1,value2,value3' ..... 
        so we split all values in each row of the column and stack them in a list.

        Return : A list of values.

        Args:
         - df_data: dataframe with the data to extract,
         - column: string with the name of the column we want to work with.     
    '''
    # dropna() returns a new Series, so the caller's dataframe is left untouched
    ds = df_data[column].dropna()
    ds = ds.apply(string_with_comma_to_list_of_strings)
    # One row per item; empty lists become NaN and are dropped
    return ds.explode().dropna().unique()

def convert_months_FR_to_EN(st):
    ''' Convert french months to english months

        Return: string with a month in english.

        Arg:
         - st: string with a french date.
    '''
    if pd.isna(st): return ''
    for month_FR, month_EN in zip(months_FR, months_EN):
        if month_FR in st:
            return st.replace(month_FR, month_EN)
    print('ERROR', st)
    return st
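The duration strings handled above (`1h 35min`, `2h`, `35min`) can also be parsed with a single regular expression. The sketch below is an alternative, not the notebook's own helper: the name `parse_duration` is hypothetical, it expects a string or `None`, and returning `None` for unrecognised input is a design assumption.

```python
import re

# Optional "<n>h" part followed by an optional "<n>min" part.
_DURATION_RE = re.compile(r"^\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*min)?\s*$")


def parse_duration(text):
    """Return the duration in minutes, or None if the text is not recognised."""
    match = _DURATION_RE.match(text or "")
    if not match or not any(match.groups()):
        return None
    hours, minutes = match.groups()
    return 60 * int(hours or 0) + int(minutes or 0)


for sample in ("1h 35min", "2h", "35min", "unknown"):
    print(sample, "->", parse_duration(sample))
```

Returning `None` rather than `0` keeps missing durations distinguishable from real ones when the column is later inspected.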

### **Functions for filling the tables**

In [471]:
# ------------------------ #
#                          #
#    Filling SQL tables    #
#                          #
# ------------------------ #

def fill_in_table(lst, table_name, field_id, field, connector, cursor):
    ''' Fill in an SQL table from a list of values.
        For each value in the list 'lst' we generate an ID and insert into the table (ID, value).

        CAREFUL: To run only once for each table, otherwise : "ERROR Duplicate entry '0' for key 'table_name.PRIMARY'"

        Return: A dictionary mapping each value of the list to an ID newly generated {value1 : ID1, value2 : ID2, ...}

        Args:
         - lst: list of values to insert into the table,
         - table_name: string with the name of the table,
         - field_id: string with the ID of the value record inserted in the table,
         - field: string of the field in the table,
         - connector: MySQL connector connected to the relevant database,
         - cursor: MySQL cursor to execute SQL statements.
    '''
    assert False
    dic_return = {}
    for item in lst:
        dic_return[item] = generate_ID()
        query = f"INSERT INTO {table_name} ({field_id}, {field}) VALUES (%s, %s)"
        val = (dic_return[item], item)
        cursor.execute(query, val)
    connector.commit()
    return dic_return

def fill_in_categorial_table_with_new_values(tup_table, arr_values, connector, cursor):
    ''' Insert values into a database table,
        First we have to check that the value is not already in the table.

       Return: dictionary {value : id} with the whole table.
       Args:
        - tup_table (Table): named_tuple with all infos about table fields,
        - arr_values (np.array): array with all of values to insert into the table,
        - connector: MySQL connector connected to the relevant database,
        - cursor: MySQL cursor to execute SQL statements.
    '''

    # Query to get the values already in the table
    query = (f"SELECT {tup_table.Field}, {tup_table.Field_id} FROM {tup_table.Table_name};")
    cursor.execute(query)
    result = cursor.fetchall()
    dic = dict(result)  # rows are (value, id) pairs

    # Set difference: the values that are not yet in the table
    arr_diff = np.setdiff1d(arr_values, list(dic.keys()), assume_unique = True)

    # Fill in the table with new values
    for item in arr_diff:
        dic[item] = generate_ID()
        query = f"INSERT INTO {tup_table.Table_name} ({tup_table.Field_id}, {tup_table.Field}) VALUES (%s, %s)"
        val = (dic[item], item)
        cursor.execute(query, val)

    # Validate all SQL operations
    connector.commit()
    return dic

def fill_in_pivot_table(table_name, field, lst_values, movie_id, cursor):
    ''' Fill in pivot table with couple value such (item, movie_id)
        where item is a value of lst_values.

        Args:
         - table_name: string with name of the pivot_table to fill in,
         - field: string with the name of the field (category_id, actor_id ...),
         - lst_values: list of values of the field,
         - movie_id: id of the movie to be connected,
         - cursor: MySQL cursor to execute SQL statements.
    '''
    for item in lst_values:
        query = f"INSERT INTO {table_name} ({field}, movie_id) VALUES (%s, %s)"
        val = (item, movie_id)
        cursor.execute(query, val)
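The pivot rows of one movie can also be sent in a single call with `executemany`, which DB-API cursors (including those of `mysql.connector`) provide. The sketch below uses an in-memory SQLite database so that it runs on its own; SQLite expects `?` placeholders where `mysql.connector` expects `%s`, hence the `placeholder` parameter. The function name `fill_in_pivot_table_batch` is hypothetical.

```python
import sqlite3


def fill_in_pivot_table_batch(table_name, field, lst_values, movie_id, cursor,
                              placeholder="?"):
    """Insert every (item, movie_id) pair of a pivot table in one call."""
    query = (f"INSERT INTO {table_name} ({field}, movie_id) "
             f"VALUES ({placeholder}, {placeholder})")
    cursor.executemany(query, [(item, movie_id) for item in lst_values])


# Stand-alone demonstration on a throwaway table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE category_movie (category_id TEXT, movie_id TEXT)")
fill_in_pivot_table_batch("category_movie", "category_id", ["c1", "c2", "c3"], "m1", cur)
conn.commit()

rows = cur.execute(
    "SELECT category_id, movie_id FROM category_movie ORDER BY category_id").fetchall()
print(rows)
```

With the MySQL cursor used in this notebook, the same function would be called with `placeholder="%s"`.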

In [472]:
def fill_in_categorial_tables(df_movies, connector, cursor):
    ''' Fill in the 5 categorial tables:
            categories, countries, directors, actors, composers.

        Return: 5 dictionaries for the 5 tables, 
                each dictionary containing pairs {value : id}

        Arg:
         - df_movies (Pandas Dataframe): containing all movies infos,
         - connector: MySQL connector connected to the relevant database,
         - cursor: MySQL cursor to execute SQL statements.
    '''
    # Fill in categories table
    arr_categories = np.array(unique_values_of_columns(df_movies, 'categories'))
    dict_category_id = fill_in_categorial_table_with_new_values(tup_categories, arr_categories, connector, cursor)

    # Fill in countries table
    arr_countries = np.array(unique_values_of_columns(df_movies, 'countries'))
    dict_country_id = fill_in_categorial_table_with_new_values(tup_countries, arr_countries, connector, cursor)

    # Fill in directors table
    arr_directors = np.array(unique_values_of_columns(df_movies, 'directors'))
    dict_director_id = fill_in_categorial_table_with_new_values(tup_directors, arr_directors, connector, cursor)

    # Fill in actors table
    arr_actors = np.array(unique_values_of_columns(df_movies, 'actors'))
    dict_actor_id = fill_in_categorial_table_with_new_values(tup_actors, arr_actors, connector, cursor)

    # Fill in composers table
    arr_composers = np.array(unique_values_of_columns(df_movies, 'composers'))
    dict_composer_id = fill_in_categorial_table_with_new_values(tup_composers, arr_composers, connector, cursor)

    return (dict_category_id, dict_country_id, dict_director_id, dict_actor_id, dict_composer_id)

def formatting_data(df, tuple_dict):
    '''
       Converting some columns of "df" into the appropriate format:
        - correct date format,
        - duration in minutes,
        - convert categories / countries into their IDs in the DB,
        - convert directors / actors / composers into their IDs in the DB.

        Return dataframe with formatted data.

        Args:
         - df Pandas Dataframe to be formatted,
         - tuple_dict: tuple with the 5 dictionaries corresponding to the 5 tables:
                       categories, countries, directors, actors, composers.
    '''

    # 'notes' is included because it is converted further down
    required_columns = {'categories', 'countries', 'directors', 'actors', 'composers',
                        'duration', 'date', 'notes', 'reviews', 'star_rating'}
    assert required_columns.issubset(df.columns)

    (dict_category_id, dict_country_id, dict_director_id, dict_actor_id, dict_composer_id) = tuple_dict

    df_formated = df.copy()
    df_formated['categories'] = df_formated['categories'].apply(string_with_comma_to_list_of_strings)
    df_formated['categories'] = df_formated['categories'].apply(lambda lst : list(set([dict_category_id[k] for k in lst])))

    df_formated['countries'] = df_formated['countries'].apply(string_with_comma_to_list_of_strings)
    df_formated['countries'] = df_formated['countries'].apply(lambda lst : list(set([dict_country_id[k] for k in lst])))

    df_formated['directors'] = df_formated['directors'].apply(string_with_comma_to_list_of_strings)
    df_formated['directors'] = df_formated['directors'].apply(lambda lst : list(set([dict_director_id[k] for k in lst])))

    df_formated['actors'] = df_formated['actors'].apply(string_with_comma_to_list_of_strings)
    df_formated['actors'] = df_formated['actors'].apply(lambda lst : list(set([dict_actor_id[k] for k in lst])))

    df_formated['composers'] = df_formated['composers'].apply(string_with_comma_to_list_of_strings)
    df_formated['composers'] = df_formated['composers'].apply(lambda lst : list(set([dict_composer_id[k] for k in lst])))

    df_formated['duration'] = df_formated['duration'].apply(duration_to_minutes)

    df_formated['date'] = df_formated['date'].apply(convert_months_FR_to_EN)
    df_formated['date'] = pd.to_datetime(df_formated['date'], format='mixed')

    df_formated['notes'] = df_formated['notes'].apply(int)
    df_formated['reviews'] = df_formated['reviews'].astype(int)
    df_formated['star_rating'] = df_formated['star_rating'].apply(lambda x : float(x.replace(',', '.')))

    return df_formated

In [478]:
def fill_in_movie_table(df_movies_formatted, connector, cursor):
    '''
       Fill in the 'movies' table and all related tables (infos + 5 pivot tables)
       
       Args:
        - df_movies_formatted: Pandas dataframe with formatted data,
        - connector: MySQL connector connected to the relevant database,
        - cursor: MySQL cursor to execute SQL statements.
    '''
    
    for movie in df_movies_formatted.itertuples():

        # ----------- #
        #    infos    #
        # ----------- #

        info_id = generate_ID()
        sql = "INSERT INTO infos (info_id, summary, url_thumbnail) VALUES (%s, %s, %s)"
        val = (info_id, 
               movie[13],  # summary 
               movie[14])  # url_thumbnail
        
        if pd.isna(movie[13]) or pd.isna(movie[14]):
            print(movie)
            
        cursor.execute(sql, val)

        # ----------- #
        #    movies   #
        # ----------- #

        movie_id = generate_ID()

        sql = "INSERT INTO movies (movie_id, title, original_title, release_date, duration, nb_notes, \
                                   nb_reviews, info_id, star_rating) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"
        val = (movie_id, 
               movie[1], # title
               movie[2], # original_title
               movie[3], # release_date
               movie[4], # duration
               movie[8], # nb_notes
               movie[9], # nb_reviews
               info_id,  
               movie[7]) # star_rating
        if pd.isna(movie[1]) or pd.isna(movie[2]) or pd.isna(movie[3]) or pd.isna(movie[4]) or pd.isna(movie[8]) or pd.isna(movie[9]) or pd.isna(movie[7]):
            print(movie)

        cursor.execute(sql, val)

        # -------------------- #
        #     Pivot tables     #   
        # -------------------- #
        #    category_movie    #
        #    country_movie     #
        #    director_movie    #
        #    actor_movie       #
        #    composer_movie    #
        # -------------------- #

        # Fill in pivot table: category_movie
        lst_categories = movie[5]
        fill_in_pivot_table('category_movie', 'category_id', lst_categories, movie_id, cursor)

        # Fill in pivot table: country_movie
        lst_countries = movie[6]
        fill_in_pivot_table('country_movie', 'country_id', lst_countries, movie_id, cursor)

        # Fill in pivot table: director_movie
        lst_directors = movie[10]
        fill_in_pivot_table('director_movie', 'director_id', lst_directors, movie_id, cursor)

        # Fill in pivot table: actor_movie
        lst_actors = movie[11]
        fill_in_pivot_table('actor_movie', 'actor_id', lst_actors, movie_id, cursor)

        # Fill in pivot table: composer_movie
        lst_composers = movie[12]
        fill_in_pivot_table('composer_movie', 'composer_id', lst_composers, movie_id, cursor)
        
        connector.commit()

## **Creating the MySQL database**
We start **MySQL Shell**, then switch to **SQL** mode with the <code>\sql</code> command.

We can then create our database and its tables by running the "movies.sql" script:

<code> "C:\Program Files\MySQL\MySQL Server 8.0\bin\mysql.exe" < movies.sql -u root -p</code><br>

Once the 'movies' database has been created, we can open a connector to it.

In [24]:
connector = mysql.connector.connect(user='root', password='admin', \
                              host = '127.0.0.1', database='movies')
cursor = connector.cursor(buffered=True)
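The connection above hard-codes the credentials. One possible alternative is to read them from environment variables; the sketch below assumes variable names (`MOVIES_DB_USER`, `MOVIES_DB_PASSWORD`, `MOVIES_DB_HOST`, `MOVIES_DB_NAME`) that are purely hypothetical and not defined anywhere in this project.

```python
import os


def db_config_from_env(env=os.environ):
    """Build the keyword arguments for mysql.connector.connect() from the environment.

    The variable names are assumptions; the password has no default on purpose,
    so a missing variable raises KeyError instead of silently using a weak value.
    """
    return {
        "user": env.get("MOVIES_DB_USER", "root"),
        "password": env["MOVIES_DB_PASSWORD"],
        "host": env.get("MOVIES_DB_HOST", "127.0.0.1"),
        "database": env.get("MOVIES_DB_NAME", "movies"),
    }


# Demonstration with an explicit mapping instead of the real environment.
config = db_config_from_env({"MOVIES_DB_PASSWORD": "example-only"})
print(sorted(config))
# In the notebook this would be used as:
# connector = mysql.connector.connect(**db_config_from_env())
```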

In [481]:
connector.disconnect()

## **Reading the data from the CSV files**

In [None]:
# ds_categories = pd.read_csv('csv/categories.csv', delimiter = ',')
# ds_categories = ds_categories[ds_categories.columns[0]]
# ds_countries = pd.read_csv('csv/countries.csv', delimiter = ',')
# ds_countries = ds_countries[ds_countries.columns[0]]
# print("Categories :", ds_categories.shape)
# print("Countries :", ds_countries.shape)

# DONE
csv_files =  ['csv/movies_year_1960_to_1970.csv',
              'csv/movies_year_1970_to_1980.csv',
              'csv/movies_year_1980_to_1990.csv',
              'csv/movies_year_1990_to_1995.csv',
              'csv/movies_year_1995_to_2000.csv',
              'csv/movies_year_2000_to_2003.csv',
              'csv/movies_year_2003_to_2006.csv',
              'csv/movies_year_2006_to_2010.csv',
              'csv/movies_year_2010_to_2015.csv',
              'csv/movies_year_2015_to_2019.csv',
              'csv/movies_year_2019_to_2022.csv',
              'csv/movies_year_2022_to_2025.csv',
              'csv/movies_year_2025_week_1_to_5.csv'  # Weeks 1 to 5 (included)
              ]

In [None]:
df_movies.head()
df_movies.columns

Index(['title', 'original_title', 'date', 'duration', 'categories',
       'countries', 'star_rating', 'notes', 'reviews', 'directors', 'actors',
       'composers', 'summary', 'url_thumbnail', 'url_reviews',
       'url_similar_movies'],
      dtype='object')

In [476]:
file_name = csv_files[0]
df_movies = pd.read_csv(file_name, delimiter = ',')
df_movies['title'].values

array(['Totto-Chan, la petite fille à la fenêtre', 'Tout ira bien',
       'Quiet Life', 'Maja, une épopée finlandaise ', 'Pepe', 'La Source',
       'Eephus, le dernier tour de piste', 'VidaaMuyarchi',
       'La Clepsydre', 'Grande Maison Paris', 'Gosses de Tokyo',
       'Game Changer', 'Les Extraordinaires aventures de Morph',
       "Les Contes d'Hoffmann (The Royal Opera)", 'Toutes pour une',
       'La Voyageuse', 'L’Espion de Dieu', 'Retour en Alexandrie',
       'On the Go', 'God Save the Tuche', 'Paddington au Pérou',
       'Une guitare à la mer', 'Pororo et le Dragon Géant', 'Beurk !'],
      dtype=object)

## **Filling the database**

In [479]:
def fill_in_db(csv_files, connector, cursor):
    ''' Fill in the database from csv files

        Args:
         - csv_files: list of csv files containing all the movie information,
         - connector: MySQL connector connected to the relevant database,
         - cursor: MySQL cursor to execute SQL statements.
    '''

    for file_name in csv_files:
        df_movies = pd.read_csv(file_name, delimiter = ',')
        
        tup_dict = fill_in_categorial_tables(df_movies, connector, cursor)
        
        df_formatted = formatting_data(df_movies, tup_dict)
        fill_in_movie_table(df_formatted, connector, cursor)

fill_in_db(csv_files, connector, cursor)

In [487]:
cursor.execute("SELECT * FROM movies;")
# print('Nb movies:', len(cursor.fetchall()))
df_temp = pd.DataFrame(cursor.fetchall())
df_temp

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,0008bdd8-c9e3-46f2-9762-e4468be52f9d,Une vie meilleure,Une vie meilleure,111,2012-01-04,2905,393,791a5669-9621-40bc-8ed1-42d25550c62d,3.1
1,000f6964-a5f1-47f4-b3cb-ed3616cbd98c,Les SEGPA,Les SEGPA,99,2022-04-20,1284,217,f4f11e31-7ddc-4633-afa0-859b3e2dd9df,1.7
2,0019962a-b8c8-42af-839c-8759d4f78fc7,Knock Knock,Knock Knock,100,2015-09-23,2835,320,c59e46ea-c738-4ae2-986d-4354cd33ba30,2.0
3,002173ed-b16c-45c4-b42c-6ff8de02bcab,Bobby : seul contre tous,Prayers for Bobby,90,2012-03-06,809,86,80fbd674-5881-4cdd-9b89-c7e6b5107647,4.3
4,0023340e-287c-4f91-88e2-1de0fb500ae1,Austerlitz,Austerlitz,166,1960-06-02,159,28,f1517dd1-8767-4cc7-ac49-9fee37af48ce,3.7
...,...,...,...,...,...,...,...,...,...
8835,ffa5f7ae-e988-42ea-9c4e-790af704a8ae,Laisse-moi entrer,Let Me In,112,2010-10-06,2420,466,03748e74-ec02-4ce1-87ce-c74b003c6de6,3.3
8836,ffae8bc1-abfd-44ed-a659-c786da9d2daa,La Piel que Habito,La Piel que Habito,117,2011-08-17,12003,839,19e22b51-d6b2-48df-ac7f-4e5a8ac9fde5,3.9
8837,ffb0e5dc-0a74-4a8d-b3e5-dec6777c3b89,American Outlaws,American Outlaws,93,2021-05-22,230,28,c74db164-0b5c-437e-bdd4-d56d39032124,2.6
8838,ffcfc388-76da-4271-bf09-1f88a97a6d96,Thor,Thor,115,2011-04-27,43552,2150,e55fd35b-b3b2-48d3-b123-8be081b69380,3.4


## **Creating a CSV with all movies**

## **A few SQL queries**

**Query**: Check the amount of data in the database.

In [15]:
cursor.execute("SELECT COUNT(*) FROM movies;")
print('Nb movies:', cursor.fetchall()[0][0])

cursor.execute("SELECT COUNT(DISTINCT actor_name) FROM actors;")
print('Nb actors:', cursor.fetchall()[0][0])

cursor.execute("SELECT COUNT(DISTINCT director_name) FROM directors;")
print('Nb directors:', cursor.fetchall()[0][0])

cursor.execute("SELECT COUNT(DISTINCT composer_name) FROM composers;")
print('Nb composers:', cursor.fetchall()[0][0])

Nb movies: 8833
Nb actors: 47094
Nb directors: 3437
Nb composers: 3030


**Query**: Display the Terminator movies and the actors of these movies.

In [5]:
query = (" \
SELECT DISTINCT title \
FROM movies \
WHERE title LIKE '%terminator%';")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns=['title'])
for item in result:
    print(item[0])
print()

query = (" \
SELECT DISTINCT a.actor_name \
FROM actors AS a \
JOIN actor_movie AS am ON am.actor_id = a.actor_id \
JOIN movies AS m ON m.movie_id = am.movie_id \
WHERE m.title LIKE '%terminator%' \
ORDER BY a.actor_name;")
cursor.execute(query)
result = cursor.fetchall()
for item in result:
    print(item[0])

Terminator 2 : le Jugement Dernier
Terminator Renaissance
Terminator Genisys
Terminator 3 : le Soulèvement des Machines
Terminator
Terminator: Dark Fate

Aaron V. Williamson
Abdul Salaam El Razzac
Alan D. Purwin
Alicia Borrachero
Anjul Nigam
Anthony Michael Frederick
Anton Yelchin
Arlette Torres
Arnold Schwarzenegger
Babak Tafti
Benjamin Wood
Bess Motta
Beth Bailey
Bill Paxton
Björn Freiberg
Blair Jackson
Brandon Stacy
Brett Azar
Brian Reece
Brian Sites
Brian Steele
Brian Thompson
Bryant Prince
Bryce Dallas Howard
Buster Reeves
Carolyn Hennesy
Cassandra Starr
Castulo Guerra
Chopper Bernet
Chris Ashworth
Chris Browning
Chris Hardwick
Christian Bale
Christine Horn
Christopher Lawford
Claire Danes
Clàudia Trujillo
Common
Courtney B. Vance
Dan Stanton
Danny Cooksey
David Andrews
Dayo Okeniyi
DeVaughn Nixon
Dick Miller
Diego Boneta
Don Lake
Don Stanton
Douglas M. Griffin
Douglas Smith (III)
Dylan Kenin
Earl Boen
Edward Furlong
Elizabeth Morehead
Emilia Clarke
Enrique Arce
Foued Zayani
Franc

**Query**: Display the actors who played in several "terminator" movies.

In [14]:
query = (" \
SELECT DISTINCT a.actor_name, COUNT(*) \
FROM actors AS a \
JOIN actor_movie AS am ON am.actor_id = a.actor_id \
JOIN movies AS m ON m.movie_id = am.movie_id \
WHERE m.title LIKE '%terminator%' \
GROUP BY a.actor_name \
HAVING COUNT(*) > 1 \
ORDER BY a.actor_name;")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns = ['name', 'nb'])
df

Unnamed: 0,name,nb
0,Arnold Schwarzenegger,5
1,Earl Boen,3
2,Edward Furlong,2
3,Linda Hamilton,3
4,Michael Papajohn,2


**Query**: Display the movies whose star rating is at least 4.5 (the query filters on values above 4.4).

In [445]:
query = (" \
SELECT m.title \
FROM movies AS m \
WHERE m.star_rating > 4.4;")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns=['title'])
df

Unnamed: 0,title
0,Le Seigneur des anneaux : le retour du roi
1,Lion
2,Star Wars : Episode V - L'Empire contre-attaque
3,Princesse Mononoké
4,Intouchables
...,...
70,"Alien, le huitième passager"
71,Star Wars : Episode IV - Un nouvel espoir (La ...
72,Green Book : Sur les routes du sud
73,Spider-Man : Across The Spider-Verse


**Query**: Display the names of the actors who played in several movies.

In [419]:
query = (" \
SELECT a.actor_name, COUNT(*) \
FROM movies AS m \
JOIN actor_movie AS am ON am.movie_id = m.movie_id \
JOIN actors AS a ON a.actor_id = am.actor_id \
GROUP BY a.actor_name \
HAVING COUNT(*) > 1 \
ORDER BY a.actor_name;")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns=['actor', 'nb films'])
df

Unnamed: 0,actor,nb films
0,50 Cent,7
1,A. D. Miles,2
2,A.J. Buckley,3
3,A.J. Cook,4
4,A.J. Langer,2
...,...,...
21383,Zohra Sehgal,2
21384,Zoltan Butuc,2
21385,Zooey Deschanel,11
21386,Zosia Mamet,2


**Query**: Display the number of actors who played in 1 movie, then 2 movies, and so on.

In [420]:
query = (" \
WITH nb_films AS ( \
    SELECT a.actor_name AS name, COUNT(*) AS nb \
    FROM movies AS m \
    JOIN actor_movie AS am ON am.movie_id = m.movie_id \
    JOIN actors AS a ON a.actor_id = am.actor_id \
    GROUP BY a.actor_name \
    HAVING COUNT(*) > 0 \
    ORDER BY a.actor_name \
    ) \
SELECT DISTINCT nb, COUNT(*) \
FROM nb_films \
GROUP BY nb \
ORDER BY nb;")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns=['nb films', "nb d'acteurs"])
df
# 914 + 2 * 74 + 3 * 14 + 6 = 1110, which is the number of actors counting repetitions: OK

Unnamed: 0,nb films,nb d'acteurs
0,1,25539
1,2,8404
2,3,4264
3,4,2316
4,5,1415
...,...,...
58,64,2
59,70,1
60,74,1
61,77,1
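The query above aggregates twice: first the number of movies per actor, then the number of actors per movie count. The same two steps can be mirrored in plain Python with `collections.Counter`. The `(actor, movie)` pairs below are toy data standing in for the `actor_movie` join, not values from the database.

```python
from collections import Counter

# Hypothetical (actor, movie) pairs, one per row of the actor_movie join.
cast = [("A", "m1"), ("A", "m2"), ("B", "m1"),
        ("C", "m2"), ("C", "m3"), ("D", "m3")]

# Step 1 (the CTE): number of movies per actor.
films_per_actor = Counter(actor for actor, _ in cast)

# Step 2 (the outer query): number of actors for each movie count.
actors_per_count = Counter(films_per_actor.values())

# Sanity check: sum(nb_films * nb_actors) must equal the number of pairs.
total = sum(nb * actors for nb, actors in actors_per_count.items())

print(sorted(actors_per_count.items()))
print(total == len(cast))
```

The final check is the same kind of consistency test as the comment in the cell above: weighting each movie count by its number of actors gives back the number of rows in the junction table.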


**Movies whose title contains the word 'mains'**

In [441]:
query = (""" \
SELECT * \
FROM movies \
WHERE title LIKE "%mains%";""")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,3439dbdf-675b-422b-8742-5241b14e2332,De beaux lendemains,The Sweet Hereafter,112,1997-10-08,292,25,54ba0527-8e53-4d4e-8091-33de296f5c13,3.7
1,73e8eede-4d0e-4999-b7d9-8c44e5e423e8,Mains armées,Mains armées,105,2012-07-11,1783,333,6bbfa9c0-1d27-4168-ac0a-366a8a461a46,2.3
2,aaf33142-51f6-4bce-8c93-5143baa2f0aa,Pusher 2 - Du sang sur les mains,Pusher II,96,2006-07-26,3668,213,327e490a-972f-447f-a75a-750a17161027,3.7
3,ad5a82ea-71cc-42ed-9373-55fbf70f38c5,Des mains en or,Des mains en or,90,2023-06-07,918,156,35dc15ff-8703-4217-bf8d-b99cd3fe200f,3.1
4,c480f278-6aac-41e5-9ac0-0701f4632298,Edward aux mains d'argent,Edward Scissorhands,105,1991-04-10,78904,1588,15430bf3-8137-4f8a-96ba-66a94d18b031,4.4
5,c9ec5bd1-e1ca-4b9d-91f0-db5c6b659e5a,Entre ses mains,Entre ses mains,90,2005-09-21,1794,197,48f93f93-2db1-45f0-b6c6-6593678f82a8,2.9
6,d17a6c79-7643-46ca-bc87-14a66f5c072a,Des Mains en or,Gifted Hands: The Ben Carson Story,86,2023-02-01,388,25,f143237a-4872-4992-a797-ced0ea197879,3.9
7,df380591-1c22-4274-8ac3-4fba6df773b3,Laisse tes mains sur mes hanches,Laisse tes mains sur mes hanches,111,2003-04-02,761,52,e5cd60ec-4833-4c87-b81d-d0886e8ebb0e,2.3
8,e9242bc9-d7f3-4b69-a4da-bc301fb17b3b,Petites mains,Petites mains,87,2024-05-01,866,181,766b106c-efe2-49f3-a48f-1f26abf6e225,3.5


**Display the titles shared by more than two movies**

In [457]:
query = (""" \
SELECT COUNT(*), title \
FROM movies \
GROUP BY title \
HAVING COUNT(*) > 2 \
ORDER BY title;""")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns = ['counter', 'title'])
df

Unnamed: 0,counter,title
0,3,Eva
1,3,Fahrenheit 451
2,3,La Belle et La Bête
3,5,Les Misérables
4,3,Les Trois mousquetaires
5,3,Lucky Luke
6,4,Pinocchio
7,3,Robin des Bois
8,3,The Killer


**Display the actor who played in the largest number of movies**

In [None]:
query = (""" \
SELECT COUNT(*), a.actor_name \
FROM actors AS a \
JOIN actor_movie AS am ON am.actor_id = a.actor_id \
JOIN movies AS m ON m.movie_id = am.movie_id \
GROUP BY a.actor_name \
HAVING COUNT(*) > 20 \
ORDER BY COUNT(*) DESC;""")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns = ['counter', 'actor'])
df

**Display the names of the actors who played in several Jurassic Park movies**

In [19]:
query = (""" \
SELECT a.actor_name, COUNT(*) \
FROM movies AS m \
JOIN actor_movie AS am ON am.movie_id = m.movie_id \
JOIN actors AS a ON a.actor_id = am.actor_id \
WHERE title LIKE '%jurassic Park%' \
GROUP BY a.actor_name \
HAVING COUNT(*) > 1; \
""")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns = ['actor', 'counter'])
df

Unnamed: 0,actor,counter
0,Joseph Mazzello,2
1,Richard Attenborough,2
2,Jeff Goldblum,2
3,Ariana Richards,2
4,Laura Dern,2
5,Sam Neill,2


**Display the actors who played in a movie whose music was composed by James Newton Howard**

In [22]:
query = (""" \
SELECT DISTINCT a.actor_name \
FROM composers AS c \
JOIN composer_movie AS cm ON cm.composer_id = c.composer_id \
JOIN movies AS m on m.movie_id = cm.movie_id \
JOIN actor_movie AS am ON am.movie_id = m.movie_id \
JOIN actors AS a ON a.actor_id = am.actor_id \
WHERE c.composer_name LIKE '%Newton Howard%' \
ORDER BY a.actor_name; \
""")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns = ['actor'])
df

Unnamed: 0,actor
0,Aaron Eckhart
1,Aasif Mandvi
2,Abigail Breslin
3,Adam Alexi-Malle
4,Adam Baldwin
...,...
1530,Zoë Kravitz
1531,Zoe Lister-Jones
1532,Zoe Renee
1533,Zooey Deschanel


**Display the movies starring both Depardieu and Pierre Richard**

In [48]:
query = (""" \
SELECT m.title AS 'title', \
       m.release_date AS 'date', \
       d.director_name AS 'director', \
       c.composer_name AS 'composer' \
FROM movies AS m \
JOIN actor_movie AS am ON am.movie_id = m.movie_id \
JOIN actors AS a ON a.actor_id = am.actor_id \
JOIN composer_movie AS cm ON cm.movie_id = m.movie_id \
JOIN composers AS c ON c.composer_id = cm.composer_id \
JOIN director_movie AS dm ON dm.movie_id = m.movie_id \
JOIN directors AS d ON d.director_id = dm.director_id \
WHERE a.actor_name LIKE '%Pierre Richard%'
INTERSECT
SELECT m.title AS 'title', \
       m.release_date AS 'date', \
       d.director_name AS 'director', \
       c.composer_name AS 'composer' \
FROM movies AS m \
JOIN actor_movie AS am ON am.movie_id = m.movie_id \
JOIN actors AS a ON a.actor_id = am.actor_id \
JOIN composer_movie AS cm ON cm.movie_id = m.movie_id \
JOIN composers AS c ON c.composer_id = cm.composer_id \
JOIN director_movie AS dm ON dm.movie_id = m.movie_id \
JOIN directors AS d ON d.director_id = dm.director_id \
WHERE a.actor_name LIKE '%Depardieu%' \
ORDER BY date; \
""")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns = ['title', 'date', 'director', 'composer'])
df

Unnamed: 0,title,date,director,composer
0,La Chèvre,1981-12-09,Francis Veber,Vladimir Cosma
1,Les compères,1983-11-23,Francis Veber,Vladimir Cosma
2,Les Fugitifs,1986-12-17,Francis Veber,Vladimir Cosma
3,Les Clefs de bagnole,2003-12-10,Laurent Baffie,Ramon Pipin
4,Essaye-moi,2006-03-15,Pierre-François Martin-Laval,Pierre Van Dormael
5,Umami,2023-05-17,Slony Sow,Frédéric Holyszewski
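
`INTERSECT` requires the whole join block to be written twice, and some engines or older versions do not support it. A possible alternative is a single pass with `GROUP BY` and conditional sums in `HAVING`. The sketch below is only an illustration under stated assumptions: an in-memory SQLite database, the notebook's table names, and made-up film rows.

```python
import sqlite3

# Toy in-memory database; film titles and credits are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, actor_name TEXT);
CREATE TABLE actor_movie (actor_id INTEGER, movie_id INTEGER);
INSERT INTO movies VALUES (1, 'Film X'), (2, 'Film Y'), (3, 'Film Z');
INSERT INTO actors VALUES (1, 'Pierre Richard'), (2, 'Gérard Depardieu'), (3, 'Actor C');
INSERT INTO actor_movie VALUES (1, 1), (2, 1), (1, 2), (3, 2), (2, 3);
""")

# Keep a film only if at least one credited actor matches each pattern.
cur.execute("""
SELECT m.title
FROM movies AS m
JOIN actor_movie AS am ON am.movie_id = m.movie_id
JOIN actors AS a ON a.actor_id = am.actor_id
WHERE a.actor_name LIKE '%Pierre Richard%'
   OR a.actor_name LIKE '%Depardieu%'
GROUP BY m.movie_id, m.title
HAVING SUM(CASE WHEN a.actor_name LIKE '%Pierre Richard%' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN a.actor_name LIKE '%Depardieu%' THEN 1 ELSE 0 END) > 0
ORDER BY m.title
""")
common_films = cur.fetchall()
print(common_films)
conn.close()
```

Counting matches per pattern, rather than counting distinct actors, stays correct when several actors share the surname being searched.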


**Display the Pierre Richard films with music composed by Cosma**

In [51]:
query = (""" \
SELECT m.title AS 'title', \
       m.release_date AS 'date', \
       d.director_name AS 'director', \
       c.composer_name AS 'composer' \
FROM movies AS m \
JOIN actor_movie AS am ON am.movie_id = m.movie_id \
JOIN actors AS a ON a.actor_id = am.actor_id \
JOIN composer_movie AS cm ON cm.movie_id = m.movie_id \
JOIN composers AS c ON c.composer_id = cm.composer_id \
JOIN director_movie AS dm ON dm.movie_id = m.movie_id \
JOIN directors AS d ON d.director_id = dm.director_id \
WHERE a.actor_name LIKE '%Pierre Richard%'
INTERSECT
SELECT m.title AS 'title', \
       m.release_date AS 'date', \
       d.director_name AS 'director', \
       c.composer_name AS 'composer' \
FROM movies AS m \
JOIN actor_movie AS am ON am.movie_id = m.movie_id \
JOIN actors AS a ON a.actor_id = am.actor_id \
JOIN composer_movie AS cm ON cm.movie_id = m.movie_id \
JOIN composers AS c ON c.composer_id = cm.composer_id \
JOIN director_movie AS dm ON dm.movie_id = m.movie_id \
JOIN directors AS d ON d.director_id = dm.director_id \
WHERE c.composer_name LIKE '%Cosma%' \
ORDER BY date; \
""")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns = ['title', 'date', 'director', 'composer'])
df

Unnamed: 0,title,date,director,composer
0,Alexandre le Bienheureux,1968-02-09,Yves Robert,Vladimir Cosma
1,Le Distrait,1970-12-09,Pierre Richard,Vladimir Cosma
2,Le Grand Blond avec une chaussure noire,1972-12-05,Yves Robert,Vladimir Cosma
3,La Moutarde me monte au nez,1974-10-09,Claude Zidi,Vladimir Cosma
4,Le Retour du grand blond,1974-12-18,Yves Robert,Vladimir Cosma
...,...,...,...,...
11,Le Jumeau,1984-10-10,Yves Robert,Vladimir Cosma
12,Les Rois du gag,1985-03-06,Claude Zidi,Vladimir Cosma
13,Les Fugitifs,1986-12-17,Francis Veber,Vladimir Cosma
14,Les Malheurs d'Alfred,2011-04-01,Pierre Richard,Vladimir Cosma


**All the joins combined (Depardieu's films with their director and composer)**

In [None]:
query = (""" \
SELECT m.title AS 'title', \
       m.release_date AS 'date', \
       d.director_name AS 'director', \
       c.composer_name AS 'composer' \
FROM movies AS m \
JOIN actor_movie AS am ON am.movie_id = m.movie_id \
JOIN actors AS a ON a.actor_id = am.actor_id \
JOIN composer_movie AS cm ON cm.movie_id = m.movie_id \
JOIN composers AS c ON c.composer_id = cm.composer_id \
JOIN director_movie AS dm ON dm.movie_id = m.movie_id \
JOIN directors AS d ON d.director_id = dm.director_id \
WHERE a.actor_name LIKE '%Depardieu%'
ORDER BY date; \
""")
cursor.execute(query)
result = cursor.fetchall()
df = pd.DataFrame(result, columns = ['title', 'date', 'director', 'composer'])
df
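
Each cell above retypes the column list by hand when building the DataFrame. DB-API cursors expose the result column names through `cursor.description`, so the list can be derived from the query itself. The sketch below assumes an in-memory SQLite database with made-up rows, and `fetch_with_columns` is a hypothetical helper name. With the notebook's own cursor, the result would feed `pd.DataFrame(rows, columns=columns)`.

```python
import sqlite3

# Toy in-memory database; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT, release_date TEXT);
INSERT INTO movies VALUES (1, 'Film X', '2001-01-01'), (2, 'Film Y', '1999-05-05');
""")

def fetch_with_columns(cur, sql, params=()):
    """Run a query and return (column_names, rows)."""
    cur.execute(sql, params)
    # cursor.description holds one tuple per result column; item 0 is its name.
    columns = [col[0] for col in cur.description]
    return columns, cur.fetchall()

columns, rows = fetch_with_columns(
    cur,
    """
    SELECT m.title AS "title", m.release_date AS "date"
    FROM movies AS m
    ORDER BY m.release_date
    """,
)
print(columns)
print(rows)
conn.close()
```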

In [69]:
from urllib.parse import quote

item = {'actor_name' : 'Jen Law'}

o = quote(f"http://127.0.0.1:8000/movies?actor={item['actor_name']}")
o

'http%3A//127.0.0.1%3A8000/movies%3Factor%3DJen%20Law'
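
As the output above shows, passing the full URL to `quote` also percent-encodes the `:`, `?` and `=` delimiters, so the result is no longer a usable URL. A minimal sketch of two alternatives, both from the standard library, is to encode only the parameter value, or to let `urlencode` build the query string. The `base_url` variable is introduced here for illustration.

```python
from urllib.parse import quote, urlencode

item = {'actor_name': 'Jen Law'}
base_url = "http://127.0.0.1:8000/movies"

# Option 1: percent-encode only the parameter value.
url_quote = f"{base_url}?actor={quote(item['actor_name'])}"

# Option 2: let urlencode build the whole query string
# (quote_via=quote encodes spaces as %20 instead of the default '+').
query_string = urlencode({'actor': item['actor_name']}, quote_via=quote)
url_encoded = f"{base_url}?{query_string}"

print(url_quote)
print(url_encoded)
```

Both variants leave the scheme, host, port and `?actor=` part untouched and only escape the space in the actor's name.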