## 🎥 Build a web scraping bot to collect movie data

**The goal** of this project is to build a bot that extracts data from IMDb and runs analyses on the films.

**Project context:**

You work at a web agency specializing in data analysis and web scraping.

You are assigned to a project in which the client wants to know which factors determine a film's success. To answer this, you need to build a movie database from information collected on several websites, starting with IMDb's Top 250 movies. You will write a Python program that uses Beautiful Soup to retrieve the data and store it in a file.

You can then enrich the database with other websites (for example, Rotten Tomatoes). The work is done as a team, and the deliverable is a GitHub project containing a scrapy.py file with the functions that produce a CSV file of the data.
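The brief asks for a scrapy.py file whose functions produce a CSV. A minimal sketch of that export step, assuming each scraped film is held as a dict keyed by column name; the function name `write_movies_csv` and the `COLUMNS` list are hypothetical, not the project's actual API. It relies only on the standard-library `csv` module, which the next cell imports for this purpose.

```python
import csv
from typing import Iterable, Mapping

# Hypothetical column order; adapt it to the fields actually scraped
COLUMNS = ["Films", "Année", "Directeur", "Note", "Votes"]


def write_movies_csv(rows: Iterable[Mapping[str, object]], path: str) -> int:
    """Write a header plus one CSV line per film; return the number of films written."""
    count = 0
    # newline="" lets the csv module manage line endings; utf-8 keeps accents such as "é"
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" skips keys that are not in COLUMNS
        writer = csv.DictWriter(f, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Keys missing from a row are written as empty cells, which keeps every line the same width.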

🎬 Importing the libraries

In [77]:
import pandas as pd
import numpy as np

import csv       # export the scraped data to a CSV file
import requests  # download each page and keep its content in a variable
from bs4 import BeautifulSoup

## 🟨 IMDb movies

🎬 Creating the lists

In [206]:
titles = []
years = []
directors = []
stars1 = []
stars2 = []
stars3 = []
stars4 = []
time = []
genres = []
imdb_ratings = []
metascores = []
votes = []
dollar = []

🎬 All the result pages

In [207]:
pages = np.arange(1, 251, 50)
pages

array([  1,  51, 101, 151, 201])
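Each value in `pages` is the rank of the first film on a results page and is meant for IMDb's `start` query parameter (as in `start=51` in the second link listed further down). A small sketch that builds the five URLs with `urllib.parse.urlencode`, so every `key=value` pair keeps its `=`; the helper name `imdb_top250_urls` is hypothetical and the parameters are copied from the links in this notebook.

```python
from urllib.parse import urlencode

BASE = "https://www.imdb.com/search/title/"


def imdb_top250_urls(page_size: int = 50, total: int = 250) -> list[str]:
    """Build one search URL per results page (start = 1, 51, 101, ...)."""
    urls = []
    for start in range(1, total + 1, page_size):
        query = urlencode(
            {"groups": "top_250", "sort": "user_rating,desc", "start": start, "ref_": "adv_nxt"},
            safe=",",  # keep "user_rating,desc" readable instead of %2C
        )
        urls.append(f"{BASE}?{query}")
    return urls
```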

🎬 Language header

In [208]:
headers = {'Accept-Language': 'en-US, en;q=0.5'}

🎬 Structured page data

In [209]:
# Stocker chacune des urls de 50 films
for page in pages:
    # Récupérer le contenu de chaque url
    page = requests.get('https://www.imdb.com/search/title/?groups=top_250&sort=user_rating,desc&start' + str(page) + '&ref_=adv_nxt', headers=headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    
    # Méthode pour extraire tous les conteneurs qui ont un attribut de : find_all() - div - classlister-item mode-advanced
    movie_div = soup.find_all('div', class_='lister-item mode-advanced')
    
    for container in movie_div:
        # Le nom du film
        name = container.h3.a.text
        titles.append(name)
        
        # L'année du film
        year = container.h3.find('span', class_='lister-item-year').text
        years.append(year)
        
        # La durée du film
        runtime = container.find('span', class_='runtime').text if container.p.find('span', class_='runtime') else '-'
        time.append(runtime)
        
        # Directeur du film
        director = container.find('p',class_='').find_all('a')[0].text
        directors.append(director)
        
        # Stars du film
        star = container.find('p',class_='').find_all('a')[1].text
        stars1.append(star)
        
        star = container.find('p',class_='').find_all('a')[2].text
        stars2.append(star)
        
        star = container.find('p',class_='').find_all('a')[3].text
        stars3.append(star)
        
        star = container.find('p',class_='').find_all('a')[4].text
        stars4.append(star)

        # Genre du film
        genre = container.find('span', class_="genre").text
        genres.append(genre)
        
        # La note du film
        imdb = float(container.strong.text)
        imdb_ratings.append(imdb)
        
        # Le metascore du film
        m_score = container.find('span', class_='metascore').text if container.find('span', class_='metascore') else '-'
        metascores.append(m_score)
        
        # Nombre de vote sur le film
        nv = container.find_all('span', attrs={'name':'nv'})
        vote = nv[0].text
        votes.append(vote)
        
        # Prix du film
        grosses = nv[1].text if len(nv) > 1 else '-'
        dollar.append(grosses)
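The loop reads the director and four stars by position (`find_all('a')[0]` to `[4]`), which raises `IndexError` for any film credited with fewer than four stars. A defensive sketch, assuming the link texts of that `<p>` are first collected into a plain list of strings: missing stars are filled with `'-'`, the placeholder the loop already uses for runtime and metascore. Like the loop, it assumes the first link is the only director; `split_credits` is a hypothetical helper.

```python
def split_credits(names: list[str], n_stars: int = 4, missing: str = "-") -> tuple[str, list[str]]:
    """Split the credit link texts into (director, exactly n_stars stars)."""
    if not names:
        return missing, [missing] * n_stars
    director, *stars = names
    # Pad with the placeholder, then cut to exactly n_stars entries
    stars = (stars + [missing] * n_stars)[:n_stars]
    return director, stars
```

Inside the loop, the input list would be `[a.text for a in container.find('p', class_='').find_all('a')]`.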

In [210]:
page

<Response [200]>

🎬 Result-page URLs

In [211]:
# https://www.imdb.com/search/title/?groups=top_250&sort=user_rating,desc&ref_=adv_prv
# https://www.imdb.com/search/title/?groups=top_250&sort=user_rating,desc&start=51&ref_=adv_nxt

🎬 Targeting the part of the HTML that holds the data

In [212]:
#print(type(movie_div))
#print(len(movie_div))

🎬 Displaying the first film's HTML

In [213]:
first_movie = movie_div[0]
first_movie

<div class="lister-item mode-advanced">
<div class="lister-top-right">
<div class="ribbonize" data-caller="filmosearch" data-tconst="tt0111161"></div>
</div>
<div class="lister-item-image float-left">
<a href="/title/tt0111161/"> <img alt="The Shawshank Redemption" class="loadlate" data-tconst="tt0111161" height="98" loadlate="https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmNhMC00ZDIzLWFmNTEtODM1ZmRlYWMwMWFmXkEyXkFqcGdeQXVyMTMxODk2OTU@._V1_UX67_CR0,0,67,98_AL_.jpg" src="https://m.media-amazon.com/images/S/sash/4FyxwxECzL-U1J8.png" width="67"/>
</a> </div>
<div class="lister-item-content">
<h3 class="lister-item-header">
<span class="lister-item-index unbold text-primary">1.</span>
<a href="/title/tt0111161/">The Shawshank Redemption</a>
<span class="lister-item-year text-muted unbold">(1994)</span>
</h3>
<p class="text-muted">
<span class="certificate">R</span>
<span class="ghost">|</span>
<span class="runtime">142 min</span>
<span class="ghost">|</span>
<span class="genre">
Dram

🎬 Building the DataFrame

In [214]:
movies = pd.DataFrame({'Films':titles,
                       'Année':years,
                       'Directeur':directors,
                       'Stars 1': stars1,
                       'Stars 2': stars2,
                       'Stars 3': stars3,
                       'Stars 4': stars4,
                       'Durée':time,
                       'Genre':genres,
                       'Note':imdb_ratings,
                       'Metascore':metascores,
                       'Votes':votes,
                       'Prix':dollar}).replace("\n","", regex=True)
movies.head()

| | Films | Année | Directeur | Stars 1 | Stars 2 | Stars 3 | Stars 4 | Durée | Genre | Note | Metascore | Votes | Prix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | (1994) | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 142 min | Drama | 9.3 | 80 | 2481165 | $28.34M |
| 1 | The Godfather | (1972) | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 175 min | Crime, Drama | 9.2 | 100 | 1713628 | $134.97M |
| 2 | The Dark Knight | (2008) | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 152 min | Action, Crime, Drama | 9.0 | 84 | 2435522 | $534.86M |
| 3 | The Godfather: Part II | (1974) | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 202 min | Crime, Drama | 9.0 | 90 | 1190311 | $57.30M |
| 4 | 12 Angry Men | (1957) | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 96 min | Crime, Drama | 9.0 | 96 | 734346 | $4.36M |
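With thirteen parallel lists, one missed `append` leaves them with different lengths and `pd.DataFrame` raises a `ValueError`. An alternative sketch, not what this notebook does: append one dict per film to a single list, so each value stays attached to its film and any key left out becomes NaN. The film names and values below are placeholders.

```python
import pandas as pd

# Placeholder rows: one dict per film, as it could be built inside the scraping loop
rows = []
rows.append({"Films": "Film A", "Année": "(1994)", "Note": 9.3, "Metascore": "80"})
rows.append({"Films": "Film B", "Année": "(1957)", "Note": 9.0})  # no Metascore scraped

movies_from_rows = pd.DataFrame(rows)  # the missing Metascore becomes NaN
print(movies_from_rows)
```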


🎬 Data types

In [215]:
movies.dtypes

Films         object
Année         object
Directeur     object
Stars 1       object
Stars 2       object
Stars 3       object
Stars 4       object
Durée         object
Genre         object
Note         float64
Metascore     object
Votes         object
Prix          object
dtype: object

🎬 Removing the parentheses around the year

In [216]:
movies['Année'] = movies['Année'].str.extract(r'(\d+)', expand=False).astype(int)
movies.head()

| | Films | Année | Directeur | Stars 1 | Stars 2 | Stars 3 | Stars 4 | Durée | Genre | Note | Metascore | Votes | Prix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 142 min | Drama | 9.3 | 80 | 2481165 | $28.34M |
| 1 | The Godfather | 1972 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 175 min | Crime, Drama | 9.2 | 100 | 1713628 | $134.97M |
| 2 | The Dark Knight | 2008 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 152 min | Action, Crime, Drama | 9.0 | 84 | 2435522 | $534.86M |
| 3 | The Godfather: Part II | 1974 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 202 min | Crime, Drama | 9.0 | 90 | 1190311 | $57.30M |
| 4 | 12 Angry Men | 1957 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 96 min | Crime, Drama | 9.0 | 96 | 734346 | $4.36M |
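A `(\d+)` pattern keeps the first run of digits, which works for `(1994)`. IMDb also shows disambiguated years such as `(I) (2010)`; since `I` contains no digit, the first run is still the year, but anchoring on four digits inside parentheses states the intent more precisely. A standard-library sketch of that stricter pattern; `parse_year` is a hypothetical helper.

```python
import re
from typing import Optional

# Four digits enclosed in parentheses, e.g. "(1994)"
YEAR_RE = re.compile(r"\((\d{4})\)")


def parse_year(raw: str) -> Optional[int]:
    """Return the four-digit year found in parentheses, or None if absent."""
    match = YEAR_RE.search(raw)
    return int(match.group(1)) if match else None
```

In pandas, the same pattern would be `movies['Année'].str.extract(r'\((\d{4})\)', expand=False)`.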


🎬 Removing "min" from Durée

In [217]:
movies['Durée'] = movies['Durée'].str.extract(r'(\d+)', expand=False).astype(int)
movies.head()

| | Films | Année | Directeur | Stars 1 | Stars 2 | Stars 3 | Stars 4 | Durée | Genre | Note | Metascore | Votes | Prix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 142 | Drama | 9.3 | 80 | 2481165 | $28.34M |
| 1 | The Godfather | 1972 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 175 | Crime, Drama | 9.2 | 100 | 1713628 | $134.97M |
| 2 | The Dark Knight | 2008 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 152 | Action, Crime, Drama | 9.0 | 84 | 2435522 | $534.86M |
| 3 | The Godfather: Part II | 1974 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 202 | Crime, Drama | 9.0 | 90 | 1190311 | $57.30M |
| 4 | 12 Angry Men | 1957 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 96 | Crime, Drama | 9.0 | 96 | 734346 | $4.36M |


🎬 Removing the thousands separators from Votes

In [218]:
movies['Votes'] = movies['Votes'].str.replace(',', '').astype(int)
movies.head()

| | Films | Année | Directeur | Stars 1 | Stars 2 | Stars 3 | Stars 4 | Durée | Genre | Note | Metascore | Votes | Prix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 142 | Drama | 9.3 | 80 | 2481165 | $28.34M |
| 1 | The Godfather | 1972 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 175 | Crime, Drama | 9.2 | 100 | 1713628 | $134.97M |
| 2 | The Dark Knight | 2008 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 152 | Action, Crime, Drama | 9.0 | 84 | 2435522 | $534.86M |
| 3 | The Godfather: Part II | 1974 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 202 | Crime, Drama | 9.0 | 90 | 1190311 | $57.30M |
| 4 | 12 Angry Men | 1957 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 96 | Crime, Drama | 9.0 | 96 | 734346 | $4.36M |


🎬 Converting Metascore to float

In [219]:
movies['Metascore'] = movies['Metascore'].str.extract(r'(\d+)', expand=False)
movies['Metascore'] = pd.to_numeric(movies['Metascore'], errors='coerce')
movies.head()

| | Films | Année | Directeur | Stars 1 | Stars 2 | Stars 3 | Stars 4 | Durée | Genre | Note | Metascore | Votes | Prix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 142 | Drama | 9.3 | 80.0 | 2481165 | $28.34M |
| 1 | The Godfather | 1972 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 175 | Crime, Drama | 9.2 | 100.0 | 1713628 | $134.97M |
| 2 | The Dark Knight | 2008 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 152 | Action, Crime, Drama | 9.0 | 84.0 | 2435522 | $534.86M |
| 3 | The Godfather: Part II | 1974 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 202 | Crime, Drama | 9.0 | 90.0 | 1190311 | $57.30M |
| 4 | 12 Angry Men | 1957 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 96 | Crime, Drama | 9.0 | 96.0 | 734346 | $4.36M |


🎬 Cleaning the Prix column (gross, in millions of dollars)

In [220]:
# '$28.34M' -> '28.34' (millions of dollars); '-' cannot be parsed and becomes NaN
movies['Prix'] = movies['Prix'].map(lambda x: x.lstrip('$').rstrip('M'))
movies['Prix'] = pd.to_numeric(movies['Prix'], errors='coerce')
movies

| | Films | Année | Directeur | Stars 1 | Stars 2 | Stars 3 | Stars 4 | Durée | Genre | Note | Metascore | Votes | Prix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 142 | Drama | 9.3 | 80.0 | 2481165 | 28.34 |
| 1 | The Godfather | 1972 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 175 | Crime, Drama | 9.2 | 100.0 | 1713628 | 134.97 |
| 2 | The Dark Knight | 2008 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 152 | Action, Crime, Drama | 9.0 | 84.0 | 2435522 | 534.86 |
| 3 | The Godfather: Part II | 1974 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 202 | Crime, Drama | 9.0 | 90.0 | 1190311 | 57.30 |
| 4 | 12 Angry Men | 1957 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 96 | Crime, Drama | 9.0 | 96.0 | 734346 | 4.36 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | Once Upon a Time in the West | 1968 | Sergio Leone | Henry Fonda | Charles Bronson | Claudia Cardinale | Jason Robards | 165 | Western | 8.5 | 80.0 | 315893 | 5.32 |
| 246 | Psycho | 1960 | Alfred Hitchcock | Anthony Perkins | Janet Leigh | Vera Miles | John Gavin | 109 | Horror, Mystery, Thriller | 8.5 | 97.0 | 635037 | 32.00 |
| 247 | Pather Panchali | 1955 | Satyajit Ray | Kanu Bannerjee | Karuna Bannerjee | Subir Banerjee | Chunibala Devi | 125 | Drama | 8.5 | | 29010 | 0.54 |
| 248 | Rear Window | 1954 | Alfred Hitchcock | James Stewart | Grace Kelly | Wendell Corey | Thelma Ritter | 112 | Mystery, Thriller | 8.5 | 100.0 | 468090 | 36.76 |
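The Prix column holds IMDb's gross box-office figure written as `$28.34M`, i.e. millions of dollars, with `'-'` when IMDb lists none. The same cleaning as the cell above, written as a plain function so it can be unit-tested; `parse_gross_millions` is a hypothetical name.

```python
from typing import Optional


def parse_gross_millions(raw: str) -> Optional[float]:
    """Turn '$28.34M' into 28.34 (millions of dollars); return None when unparseable."""
    cleaned = raw.strip().lstrip("$").rstrip("M")
    try:
        return float(cleaned)
    except ValueError:
        # '-' (no figure on IMDb) and any other unexpected text become None,
        # much as pd.to_numeric(..., errors='coerce') yields NaN
        return None
```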


### 🟨 Final IMDb movies DataFrame

In [221]:
movies

| | Films | Année | Directeur | Stars 1 | Stars 2 | Stars 3 | Stars 4 | Durée | Genre | Note | Metascore | Votes | Prix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 142 | Drama | 9.3 | 80.0 | 2481165 | 28.34 |
| 1 | The Godfather | 1972 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 175 | Crime, Drama | 9.2 | 100.0 | 1713628 | 134.97 |
| 2 | The Dark Knight | 2008 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 152 | Action, Crime, Drama | 9.0 | 84.0 | 2435522 | 534.86 |
| 3 | The Godfather: Part II | 1974 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 202 | Crime, Drama | 9.0 | 90.0 | 1190311 | 57.30 |
| 4 | 12 Angry Men | 1957 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 96 | Crime, Drama | 9.0 | 96.0 | 734346 | 4.36 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | Once Upon a Time in the West | 1968 | Sergio Leone | Henry Fonda | Charles Bronson | Claudia Cardinale | Jason Robards | 165 | Western | 8.5 | 80.0 | 315893 | 5.32 |
| 246 | Psycho | 1960 | Alfred Hitchcock | Anthony Perkins | Janet Leigh | Vera Miles | John Gavin | 109 | Horror, Mystery, Thriller | 8.5 | 97.0 | 635037 | 32.00 |
| 247 | Pather Panchali | 1955 | Satyajit Ray | Kanu Bannerjee | Karuna Bannerjee | Subir Banerjee | Chunibala Devi | 125 | Drama | 8.5 | | 29010 | 0.54 |
| 248 | Rear Window | 1954 | Alfred Hitchcock | James Stewart | Grace Kelly | Wendell Corey | Thelma Ritter | 112 | Mystery, Thriller | 8.5 | 100.0 | 468090 | 36.76 |


🎬 Data types

In [18]:
movies.dtypes

Films         object
Année          int32
Directeur     object
Durée          int32
Genre         object
Note         float64
Metascore    float64
Votes          int32
Prix         float64
dtype: object

🎬 Number of columns per data type

In [19]:
movies_dtype = movies.dtypes
movies_dtype.value_counts()

int32      3
object     3
float64    3
dtype: int64

🎬 NaN values

In [20]:
df_nan = pd.DataFrame({'Nan':movies.isna().sum()})
df_nan['%nan'] = df_nan['Nan']/movies.shape[0]*100
round(df_nan,2).sort_values(by='%nan' , ascending=False)

| | Nan | %nan |
|---|---|---|
| Prix | 15 | 6.0 |
| Metascore | 5 | 2.0 |
| Films | 0 | 0.0 |
| Année | 0 | 0.0 |
| Directeur | 0 | 0.0 |
| Durée | 0 | 0.0 |
| Genre | 0 | 0.0 |
| Note | 0 | 0.0 |
| Votes | 0 | 0.0 |
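The `%nan` column divides each column's NaN count $n_c$ by the number of rows $N$ = `movies.shape[0]`. For Prix, the 15 missing values reported above give 6.0 %, which implies the frame had 250 rows at that point:

```latex
\%\mathrm{nan}_c = 100 \cdot \frac{n_c}{N}
\qquad\text{e.g.}\qquad
\%\mathrm{nan}_{\mathrm{Prix}} = 100 \cdot \frac{15}{250} = 6.0
```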


🎬 Dropping rows with NaN values

In [21]:
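# reset_index keeps the row labels at 0..n-1, which the index-based merges further down rely on
movies = movies.dropna(axis=0).reset_index(drop=True)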

In [22]:
df_nan = pd.DataFrame({'Nan':movies.isna().sum()})
df_nan['%nan'] = df_nan['Nan']/movies.shape[0]*100
round(df_nan,2).sort_values(by='%nan' , ascending=False)

| | Nan | %nan |
|---|---|---|
| Films | 0 | 0.0 |
| Année | 0 | 0.0 |
| Directeur | 0 | 0.0 |
| Durée | 0 | 0.0 |
| Genre | 0 | 0.0 |
| Note | 0 | 0.0 |
| Metascore | 0 | 0.0 |
| Votes | 0 | 0.0 |
| Prix | 0 | 0.0 |


### 💡 New CSV file

In [23]:
movies.to_csv('movies.csv')

## 🍅 Rotten Tomatoes movies

📼 Rotten Tomatoes base URL

In [222]:
uri = "https://www.rottentomatoes.com/m/"

In [223]:
titres = movies['Films'].replace(' ','_', regex=True)

uri_list = []

for i in titres:
    response = uri + i
    uri_list.append(response)
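Replacing spaces with underscores keeps punctuation such as `:` and `'`, and several such titles (for example `The_Godfather:_Part_II` and `Schindler's_List`) come back as `Not found` in the score lists below. Rotten Tomatoes slugs usually look lowercase, ASCII-only and free of punctuation; the normalisation below is an assumption based on that observed pattern, not a documented rule, and some films use a different slug altogether, so a miss is still possible. `rt_slug` and `rt_url` are hypothetical helpers.

```python
import re
import unicodedata

RT_BASE = "https://www.rottentomatoes.com/m/"


def rt_slug(title: str) -> str:
    """Guess a Rotten Tomatoes slug: ASCII, lowercase, punctuation dropped, spaces -> '_'."""
    # 'é' decomposes into 'e' + a combining accent, which the ASCII encode then drops
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Keep only letters, digits and whitespace
    cleaned = re.sub(r"[^a-z0-9\s]", "", ascii_title.lower())
    return re.sub(r"\s+", "_", cleaned.strip())


def rt_url(title: str) -> str:
    return RT_BASE + rt_slug(title)
```

With this helper, the list would be built as `uri_list = [rt_url(t) for t in movies['Films']]`.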

In [None]:
# Alternative: build the list of candidate links for each of the 250 films
# uri = 'https://www.rottentomatoes.com/m/'
# urilist = []
# for i in data2["title"]:
#     response = (f'{uri}{str(i)}')  
#     urilist.append(response)

In [224]:
uri_list

['https://www.rottentomatoes.com/m/The_Shawshank_Redemption',
 'https://www.rottentomatoes.com/m/The_Godfather',
 'https://www.rottentomatoes.com/m/The_Dark_Knight',
 'https://www.rottentomatoes.com/m/The_Godfather:_Part_II',
 'https://www.rottentomatoes.com/m/12_Angry_Men',
 'https://www.rottentomatoes.com/m/The_Lord_of_the_Rings:_The_Return_of_the_King',
 'https://www.rottentomatoes.com/m/Pulp_Fiction',
 "https://www.rottentomatoes.com/m/Schindler's_List",
 'https://www.rottentomatoes.com/m/Inception',
 'https://www.rottentomatoes.com/m/Fight_Club',
 'https://www.rottentomatoes.com/m/The_Lord_of_the_Rings:_The_Fellowship_of_the_Ring',
 'https://www.rottentomatoes.com/m/Forrest_Gump',
 'https://www.rottentomatoes.com/m/The_Good,_the_Bad_and_the_Ugly',
 'https://www.rottentomatoes.com/m/The_Lord_of_the_Rings:_The_Two_Towers',
 'https://www.rottentomatoes.com/m/The_Matrix',
 'https://www.rottentomatoes.com/m/Goodfellas',
 'https://www.rottentomatoes.com/m/Star_Wars:_Episode_V_-_The_Empi

🎬 Pop score (audience score)

In [225]:
scores_data = []

# Loop over the URLs built above
for url in uri_list:
    # Download the page
    pag = requests.get(url, timeout=10)

    if pag.status_code == 200:
        # Parse the HTML with BeautifulSoup
        soup1 = BeautifulSoup(pag.content, "html.parser")
        # The audience score is an attribute of the <score-board> tag
        board = soup1.find("score-board")
        audience = board.get("audiencescore", "") if board else "Not found"
    else:
        audience = "Not found"

    scores_data.append(audience)

scores_data

['98',
 '98',
 '94',
 'Not found',
 'Not found',
 'Not found',
 '96',
 'Not found',
 '91',
 '96',
 'Not found',
 '95',
 'Not found',
 'Not found',
 '85',
 '97',
 'Not found',
 'Not found',
 '',
 '86',
 '97',
 '96',
 '95',
 '94',
 'Not found',
 'Not found',
 '95',
 'Not found',
 'Not found',
 'Not found',
 'Not found',
 'Not found',
 '93',
 'Not found',
 '94',
 '33',
 '87',
 '96',
 'Not found',
 'Not found',
 '93',
 'Not found',
 '97',
 '95',
 '94',
 '95',
 '95',
 '94',
 '95',
 'Not found',
 '98',
 '98',
 '94',
 'Not found',
 'Not found',
 'Not found',
 '96',
 'Not found',
 '91',
 '96',
 'Not found',
 '95',
 'Not found',
 'Not found',
 '85',
 '97',
 'Not found',
 'Not found',
 '',
 '86',
 '97',
 '96',
 '95',
 '94',
 'Not found',
 'Not found',
 '95',
 'Not found',
 'Not found',
 'Not found',
 'Not found',
 'Not found',
 '93',
 'Not found',
 '94',
 '33',
 '87',
 '96',
 'Not found',
 'Not found',
 '93',
 'Not found',
 '97',
 '95',
 '94',
 '95',
 '95',
 '94',
 '95',
 'Not found',
 '98',
 '9

In [226]:
df_scores_pop = pd.DataFrame(scores_data, columns=['pop_score'])
df_scores_pop

| | pop_score |
|---|---|
| 0 | 98 |
| 1 | 98 |
| 2 | 94 |
| 3 | Not found |
| 4 | Not found |
| ... | ... |
| 245 | 95 |
| 246 | 95 |
| 247 | 94 |
| 248 | 95 |


🎬 Tomatometer score

In [227]:
tomato_data = []

# Loop over the URLs built above
for url in uri_list:
    # Download the page
    pag = requests.get(url, timeout=10)

    if pag.status_code == 200:
        # Parse the HTML with BeautifulSoup
        soup1 = BeautifulSoup(pag.content, "html.parser")
        # The Tomatometer (critics) score is another attribute of <score-board>
        board = soup1.find("score-board")
        tomatometer = board.get("tomatometerscore", "") if board else "Not found"
    else:
        tomatometer = "Not found"

    tomato_data.append(tomatometer)

tomato_data

['91',
 '97',
 '94',
 'Not found',
 'Not found',
 'Not found',
 '92',
 'Not found',
 '87',
 '79',
 'Not found',
 '71',
 'Not found',
 'Not found',
 '88',
 '96',
 'Not found',
 'Not found',
 '',
 '72',
 '91',
 '97',
 '93',
 '78',
 'Not found',
 'Not found',
 '96',
 'Not found',
 'Not found',
 'Not found',
 'Not found',
 'Not found',
 '75',
 'Not found',
 '90',
 '',
 '77',
 '83',
 'Not found',
 'Not found',
 '93',
 'Not found',
 '90',
 '100',
 '96',
 '95',
 '96',
 '97',
 '98',
 'Not found',
 '91',
 '97',
 '94',
 'Not found',
 'Not found',
 'Not found',
 '92',
 'Not found',
 '87',
 '79',
 'Not found',
 '71',
 'Not found',
 'Not found',
 '88',
 '96',
 'Not found',
 'Not found',
 '',
 '72',
 '91',
 '97',
 '93',
 '78',
 'Not found',
 'Not found',
 '96',
 'Not found',
 'Not found',
 'Not found',
 'Not found',
 'Not found',
 '75',
 'Not found',
 '90',
 '',
 '77',
 '83',
 'Not found',
 'Not found',
 '93',
 'Not found',
 '90',
 '100',
 '96',
 '95',
 '96',
 '97',
 '98',
 'Not found',
 '91',
 '97'

In [228]:
df_scores_tomato = pd.DataFrame(tomato_data, columns=['tomato_score'])
df_scores_tomato

| | tomato_score |
|---|---|
| 0 | 91 |
| 1 | 97 |
| 2 | 94 |
| 3 | Not found |
| 4 | Not found |
| ... | ... |
| 245 | 95 |
| 246 | 96 |
| 247 | 97 |
| 248 | 98 |
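The two scoring cells above request every Rotten Tomatoes page twice, once per attribute of the same `<score-board>` tag. A sketch that downloads each page once and reads both attributes in a single pass. To stay dependency-free it swaps BeautifulSoup for the standard-library `html.parser.HTMLParser`; `fetch` is any callable returning `(status_code, html_text)`. The attribute names `tomatometerscore` and `audiencescore` are the ones this notebook reads and may change if the site is redesigned; `ScoreBoardParser`, `parse_scores` and `scrape_scores` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable, Optional

NOT_FOUND = "Not found"


class ScoreBoardParser(HTMLParser):
    """Keep the attributes of the first <score-board> tag of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.board_attrs: Optional[dict] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case
        if tag == "score-board" and self.board_attrs is None:
            self.board_attrs = dict(attrs)


def parse_scores(html: str) -> tuple[str, str]:
    """Return (tomatometer, audience) as strings, NOT_FOUND if there is no <score-board>."""
    parser = ScoreBoardParser()
    parser.feed(html)
    parser.close()
    if parser.board_attrs is None:
        return NOT_FOUND, NOT_FOUND
    # An absent or empty attribute becomes '', like the empty scores in the lists above
    return (parser.board_attrs.get("tomatometerscore") or "",
            parser.board_attrs.get("audiencescore") or "")


def scrape_scores(urls: Iterable[str],
                  fetch: Callable[[str], tuple[int, str]]) -> tuple[list[str], list[str]]:
    """Fetch every URL once; return tomatometer and audience lists aligned with urls."""
    tomato, audience = [], []
    for url in urls:
        status, html = fetch(url)
        t, a = parse_scores(html) if status == 200 else (NOT_FOUND, NOT_FOUND)
        tomato.append(t)
        audience.append(a)
    return tomato, audience

# Usage with requests (network access required):
#   def fetch(url):
#       r = requests.get(url, timeout=10)
#       return r.status_code, r.text
#   tomato_data, scores_data = scrape_scores(uri_list, fetch)
```

Passing `fetch` in as a parameter also lets the parsing be tested on saved HTML without touching the network.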


🎬 Merging all the DataFrames

In [229]:
df_inner = movies.merge(df_scores_tomato, how='inner', left_index=True, right_index=True)

In [230]:
df_to = df_inner.merge(df_scores_pop, how='inner', left_index=True, right_index=True)
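Both merges join on the row index (`left_index=True, right_index=True`). The score frames get a fresh 0..n-1 index, so the join is only correct while `movies` carries that same index; if `movies` has gaps in its labels (for example after `dropna` without `reset_index`), films are matched with another film's scores and the last ones are dropped by the inner join. A small pandas sketch of the pitfall and of one fix, building the score frame on the index of the films it was scraped for; the four-film frame is made up for the illustration.

```python
import pandas as pd

# Made-up example: film B has no Metascore, so dropna removes it
films = pd.DataFrame({"Films": ["A", "B", "C", "D"],
                      "Metascore": [80.0, None, 90.0, 70.0]})
films = films.dropna(axis=0)          # index labels are now 0, 2, 3
scores = ["91", "97", "88"]           # scraped in the order of films["Films"]: A, C, D

# Pitfall: the score frame has labels 0, 1, 2, so C receives D's score and D is dropped
naive = films.merge(pd.DataFrame(scores, columns=["tomato_score"]),
                    how="inner", left_index=True, right_index=True)

# Fix: build the score frame on the same index as the films it was scraped for
aligned = films.merge(pd.DataFrame(scores, columns=["tomato_score"], index=films.index),
                      how="inner", left_index=True, right_index=True)
```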

## ✨ Final dataset

In [231]:
df_to

| | Films | Année | Directeur | Stars 1 | Stars 2 | Stars 3 | Stars 4 | Durée | Genre | Note | Metascore | Votes | Prix | tomato_score | pop_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 142 | Drama | 9.3 | 80.0 | 2481165 | 28.34 | 91 | 98 |
| 1 | The Godfather | 1972 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 175 | Crime, Drama | 9.2 | 100.0 | 1713628 | 134.97 | 97 | 98 |
| 2 | The Dark Knight | 2008 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 152 | Action, Crime, Drama | 9.0 | 84.0 | 2435522 | 534.86 | 94 | 94 |
| 3 | The Godfather: Part II | 1974 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 202 | Crime, Drama | 9.0 | 90.0 | 1190311 | 57.30 | Not found | Not found |
| 4 | 12 Angry Men | 1957 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 96 | Crime, Drama | 9.0 | 96.0 | 734346 | 4.36 | Not found | Not found |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | Once Upon a Time in the West | 1968 | Sergio Leone | Henry Fonda | Charles Bronson | Claudia Cardinale | Jason Robards | 165 | Western | 8.5 | 80.0 | 315893 | 5.32 | 95 | 95 |
| 246 | Psycho | 1960 | Alfred Hitchcock | Anthony Perkins | Janet Leigh | Vera Miles | John Gavin | 109 | Horror, Mystery, Thriller | 8.5 | 97.0 | 635037 | 32.00 | 96 | 95 |
| 247 | Pather Panchali | 1955 | Satyajit Ray | Kanu Bannerjee | Karuna Bannerjee | Subir Banerjee | Chunibala Devi | 125 | Drama | 8.5 | | 29010 | 0.54 | 97 | 94 |
| 248 | Rear Window | 1954 | Alfred Hitchcock | James Stewart | Grace Kelly | Wendell Corey | Thelma Ritter | 112 | Mystery, Thriller | 8.5 | 100.0 | 468090 | 36.76 | 98 | 95 |
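In the final dataset, `tomato_score` and `pop_score` are still strings that mix numbers, `'Not found'` and empty strings, so they cannot yet be compared numerically with `Note` or `Metascore`. A sketch of the conversion with `pd.to_numeric(errors='coerce')`, the same approach the notebook uses for Metascore; the `demo` frame holds sample strings of the three kinds found in the scraped lists, not real rows.

```python
import pandas as pd

# Sample strings of the three kinds found in the scraped score lists
demo = pd.DataFrame({"tomato_score": ["91", "Not found", "", "100"],
                     "pop_score": ["98", "Not found", "", "33"]})

for col in ["tomato_score", "pop_score"]:
    # 'Not found' and '' are not numbers: errors='coerce' turns them into NaN
    demo[col] = pd.to_numeric(demo[col], errors="coerce")
```

On the notebook's data, the same loop would run on `df_to`.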
