# Web Scrapping Project - Movies screening

The goal of this project is to allow users to enter their address, to check movie theaters near them (restricted to UGC, CGR, MK2 and Pathé-Gaumont) and to see some of the movies that are currently being shown.

The movie information (title, director, genres, global ratings) are taken from Letterboxd.com, while the screenings date and hour are taken from the respective movie theater's website.

The user can enter a few genres of movies that he likes, and restrict / sort the movies by their preference. The genres on Letterboxd are shown in order of importance for said movie, making it easier to give a movie a score of potential from its info, although totally arbitrary.

Another goal of this project is to promote smaller movies, French movies, art house films etc... in order to make the user want to add to their cinematographic culture.

### Getting the list of movie theaters near a specific address

In [17]:
from geopy.geocoders import Nominatim
from geopy.distance import geodesic
import pandas as pd
import numpy as np
import time
import selenium
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import Select


In [18]:
cinemasFrance = pd.read_excel("C:\\Users\\utilisateur\\Documents\\SEMESTRE 9\\Webscraping et Data Processing\\DonnéesCartographie2022Cinemas.xlsx")

In [19]:
cinemasFrance["code INSEE"] = cinemasFrance["code INSEE"].apply(lambda x: x[:2] + "0" + x[3:] if x[:2] == "75" else x)

In [20]:
cinemasFrance = cinemasFrance[(cinemasFrance["programmateur"] == "CGR") |
              (cinemasFrance["programmateur"] == "UGC") |
              (cinemasFrance["programmateur"] == "MK2") |
              (cinemasFrance["programmateur"] == "PATHE-GAUMONT")]

In [21]:
url_mk2 = pd.DataFrame(columns=['nom', 'url'])

# Data to be added line by line
rows = [
    ['MK2 BEAUBOURG', "https://www.mk2.com/salle/mk2-beaubourg"],
    ['MK2 ODEON COTE SAINT-MICHEL', "https://www.mk2.com/salle/mk2-odeon-st-germain-st-michel"],
    ['MK2 ODEON COTE SAINT-GERMAIN', "https://www.mk2.com/salle/mk2-odeon-st-germain-st-michel"],
    ['MK2 PARNASSE', "https://www.mk2.com/salle/mk2-parnasse"],
    ['MK2 QUAI DE SEINE', "https://www.mk2.com/salle/mk2-quai-seine-quai-loire"],
    ['MK2 GAMBETTA', "https://www.mk2.com/salle/mk2-gambetta"],
    ['MK2 BASTILLE COTE FAUBOURG SAINT-ANTOINE', "https://www.mk2.com/salle/mk2-bastille-beaumarchais-fg-st-antoine"],
    ['MK2 BASTILLE COTE BEAUMARCHAIS', "https://www.mk2.com/salle/mk2-bastille-beaumarchais-fg-st-antoine"],
    ['MK2 NATION', "https://www.mk2.com/salle/mk2-nation"],
    ['MK2 BIBLIOTHEQUE', "https://www.mk2.com/salle/mk2-bibliotheque"],
    ['MK2 A&E', "https://www.mk2.com/salle/mk2-bibliotheque"],
    ['MK2 QUAI DE LOIRE', "https://www.mk2.com/salle/mk2-quai-seine-quai-loire"]

]

# Add data to the DataFrame
for i, row in enumerate(rows):
    url_mk2.loc[i] = row

In [22]:
url_mk2

Unnamed: 0,nom,url
0,MK2 BEAUBOURG,https://www.mk2.com/salle/mk2-beaubourg
1,MK2 ODEON COTE SAINT-MICHEL,https://www.mk2.com/salle/mk2-odeon-st-germain...
2,MK2 ODEON COTE SAINT-GERMAIN,https://www.mk2.com/salle/mk2-odeon-st-germain...
3,MK2 PARNASSE,https://www.mk2.com/salle/mk2-parnasse
4,MK2 QUAI DE SEINE,https://www.mk2.com/salle/mk2-quai-seine-quai-...
5,MK2 GAMBETTA,https://www.mk2.com/salle/mk2-gambetta
6,MK2 BASTILLE COTE FAUBOURG SAINT-ANTOINE,https://www.mk2.com/salle/mk2-bastille-beaumar...
7,MK2 BASTILLE COTE BEAUMARCHAIS,https://www.mk2.com/salle/mk2-bastille-beaumar...
8,MK2 NATION,https://www.mk2.com/salle/mk2-nation
9,MK2 BIBLIOTHEQUE,https://www.mk2.com/salle/mk2-bibliotheque


In [23]:
cinemasFrance[cinemasFrance["programmateur"] == "MK2"]

Unnamed: 0,régionCNC,N° auto,nom,région administrative,adresse,code INSEE,commune,population de la commune,DEP,N°UU,...,nombre de films en semaine 1,PdM en entrées des films français,PdM en entrées des films américains,PdM en entrées des films européens,PdM en entrées des autres films,films Art et Essai,part des séances de films Art et Essai,PdM en entrées des films Art et Essai,latitude,longitude
11,1,531,MK2 BEAUBOURG,ILE-DE-FRANCE,50 RUE RAMBUTEAU,75003,Paris 3e Arrondissement,33651,75,851,...,110,54.524733,9.285017,14.927946,21.262304,183,96.674203,94.919975,48.861555,2.352217
13,1,721,MK2 ODEON COTE SAINT-MICHEL,ILE-DE-FRANCE,7 RUE HAUTEFEUILLE,75006,Paris 6e Arrondissement,40452,75,851,...,73,60.638438,10.993648,13.352315,15.015599,150,87.954134,86.738836,48.852188,2.342788
14,1,731,MK2 ODEON COTE SAINT-GERMAIN,ILE-DE-FRANCE,113 BD ST GERMAIN,75006,Paris 6e Arrondissement,40452,75,851,...,105,59.64864,20.457385,13.573458,6.320517,88,66.68017,63.210473,48.852437,2.338266
18,1,801,MK2 PARNASSE,ILE-DE-FRANCE,11 RUE JULES CHAPLAIN,75006,Paris 6e Arrondissement,40452,75,851,...,6,66.194284,9.21652,13.010764,11.578432,160,81.760248,81.243041,48.842813,2.330525
37,1,4731,MK2 QUAI DE SEINE,ILE-DE-FRANCE,14 QUAI DE LA SEINE,75019,Paris 19e Arrondissement,184156,75,851,...,126,70.792512,8.255949,10.558312,10.393226,177,91.530032,92.00511,48.885073,2.371493
38,1,5691,MK2 GAMBETTA,ILE-DE-FRANCE,6 RUE BELGRAND,75020,Paris 20e Arrondissement,193044,75,851,...,107,56.158032,23.422453,13.570892,6.848624,167,68.529057,70.616069,48.864763,2.399555
39,1,5841,MK2 BASTILLE COTE FAUBOURG SAINT-ANTOINE,ILE-DE-FRANCE,5 RUE DU FAUBOURG ST ANTOINE,75011,Paris 11e Arrondissement,145124,75,851,...,58,64.323535,9.279556,8.367289,18.02962,86,94.584172,94.731949,48.853157,2.370719
41,1,5861,MK2 BASTILLE COTE BEAUMARCHAIS,ILE-DE-FRANCE,3 BD R LENOIR 4/6 BD BEAUMARCHAI,75011,Paris 11e Arrondissement,145124,75,851,...,65,62.58056,18.357678,12.321959,6.739802,111,87.507769,83.881851,48.854674,2.369381
42,1,6212,MK2 NATION,ILE-DE-FRANCE,133 BOULEVARD DIDEROT,75012,Paris 12e Arrondissement,141275,75,851,...,86,63.445312,20.094993,11.755308,4.704387,127,50.871758,50.22783,48.848213,2.39311
68,1,9600,MK2 BIBLIOTHEQUE,ILE-DE-FRANCE,128 A 162 AVENUE DE FRANCE,75013,Paris 13e Arrondissement,179013,75,851,...,294,28.853266,45.848722,16.889863,8.408149,217,38.028376,28.865777,48.833718,2.373922


In [24]:
adress_user = "12 avenue Leonard de vinci, 92400, Courbevoie"


loc = Nominatim(user_agent="Geopy Library")
getLoc = loc.geocode(adress_user)
print(getLoc.address)

# printing latitude and longitude
print("Latitude = ", getLoc.latitude, "\n")
print("Longitude = ", getLoc.longitude)

12, Avenue Léonard de Vinci, Faubourg de l'Arche, Quartier du Faubourg de l'Arche, Courbevoie, Nanterre, Hauts-de-Seine, Île-de-France, France métropolitaine, 92400, France
Latitude =  48.8964618 

Longitude =  2.2363532


In [25]:
dist_max_km = 10


coord_user = (getLoc.latitude, getLoc.longitude)

#print(geodesic(coord_user, coords_2).km)

cinemasCloseBool = cinemasFrance.apply(
    lambda x: geodesic(
        coord_user,
        (x["latitude"], x["longitude"])
        ).km < dist_max_km,
    axis = 1
    )

In [26]:
cinemasProches = cinemasFrance[cinemasCloseBool]

In [27]:
cinemasProchesUGC = cinemasProches[cinemasProches["programmateur"] == "UGC"]
cinemasProchesCGR = cinemasProches[cinemasProches["programmateur"] == "CGR"]
cinemasProchesPathe = cinemasProches[cinemasProches["programmateur"] == "PATHE-GAUMONT"]
cinemasProchesMK2 = cinemasProches[cinemasProches["programmateur"] == "MK2"]

In [28]:
cinemasProchesCGR

Unnamed: 0,régionCNC,N° auto,nom,région administrative,adresse,code INSEE,commune,population de la commune,DEP,N°UU,...,nombre de films en semaine 1,PdM en entrées des films français,PdM en entrées des films américains,PdM en entrées des films européens,PdM en entrées des autres films,films Art et Essai,part des séances de films Art et Essai,PdM en entrées des films Art et Essai,latitude,longitude
151,2,148324,MEGA CGR,ILE-DE-FRANCE,5 AVENUE DU MARECHAL JOFFRE,93031,Épinay-sur-Seine,54569,93,851,...,187,18.727098,51.203756,10.479666,19.589481,34,5.933682,1.715426,48.957859,2.301996
215,2,284502,CAP CINEMA NANTERRE,ILE-DE-FRANCE,200 ALLEE DE CORSE,92050,Nanterre,96402,92,851,...,161,18.43528,59.333328,11.56418,10.667212,19,4.608783,1.537109,48.900167,2.21295


In [29]:
cinemasProchesMK2

Unnamed: 0,régionCNC,N° auto,nom,région administrative,adresse,code INSEE,commune,population de la commune,DEP,N°UU,...,nombre de films en semaine 1,PdM en entrées des films français,PdM en entrées des films américains,PdM en entrées des films européens,PdM en entrées des autres films,films Art et Essai,part des séances de films Art et Essai,PdM en entrées des films Art et Essai,latitude,longitude
11,1,531,MK2 BEAUBOURG,ILE-DE-FRANCE,50 RUE RAMBUTEAU,75003,Paris 3e Arrondissement,33651,75,851,...,110,54.524733,9.285017,14.927946,21.262304,183,96.674203,94.919975,48.861555,2.352217
13,1,721,MK2 ODEON COTE SAINT-MICHEL,ILE-DE-FRANCE,7 RUE HAUTEFEUILLE,75006,Paris 6e Arrondissement,40452,75,851,...,73,60.638438,10.993648,13.352315,15.015599,150,87.954134,86.738836,48.852188,2.342788
14,1,731,MK2 ODEON COTE SAINT-GERMAIN,ILE-DE-FRANCE,113 BD ST GERMAIN,75006,Paris 6e Arrondissement,40452,75,851,...,105,59.64864,20.457385,13.573458,6.320517,88,66.68017,63.210473,48.852437,2.338266
18,1,801,MK2 PARNASSE,ILE-DE-FRANCE,11 RUE JULES CHAPLAIN,75006,Paris 6e Arrondissement,40452,75,851,...,6,66.194284,9.21652,13.010764,11.578432,160,81.760248,81.243041,48.842813,2.330525
37,1,4731,MK2 QUAI DE SEINE,ILE-DE-FRANCE,14 QUAI DE LA SEINE,75019,Paris 19e Arrondissement,184156,75,851,...,126,70.792512,8.255949,10.558312,10.393226,177,91.530032,92.00511,48.885073,2.371493


### The Letterboxd functions

In [60]:
def get_letterboxd_info(driver, movie):
    #We set the current URL of the driver to the Letterboxd main page
    url_letterboxd = "https://letterboxd.com/"
    driver.get(url_letterboxd)
    
    #We need to let the browser load everything, so we buffer the code for a second.
    #Somehow, the implicit_wait function does not work in this case...
    driver.implicitly_wait(1)
    time.sleep(1)
    
    #We need to discard the pop-ups of the cookies on the page
    pop_up = driver.find_elements(By.CLASS_NAME, 'fc-cta-do-not-consent')
    if pop_up:
        pop_up[0].click()

    #Now, we search for the movie in question
    search_bar = driver.find_element(By.ID, 'search-q')
    search_bar.clear()
    search_bar.send_keys(movie)
    search_bar.send_keys(Keys.RETURN)
    link_element = driver.find_element(By.XPATH, '//a[text()="Films"]')
    link_element.click()

    #We click on the first movie in the list
    results = driver.find_elements(By.CLASS_NAME, "results")
    if not results:
        return {}
    movies_details = results[0].find_elements(By.CLASS_NAME, "film-detail-content")
    header = movies_details[0].find_element(By.CLASS_NAME, "headline-2")
    movie_link = header.find_element(By.TAG_NAME, "a")
    movie_link.click()

    #Now we need to get the info on the movie: the director's name, the ratings and the genre
    #We scroll down the page to avoid the ads.
    time.sleep(1)
    footer = driver.find_element(By.TAG_NAME, "footer")
    delta_y = footer.rect['y']
    ActionChains(driver)\
        .scroll_by_amount(0, 500)\
        .perform()

    driver.find_element(By.XPATH, '//*[@id="crew"]').click()
    directors = driver.find_elements(By.XPATH, '//*[@id="tab-crew"]/div[1]/p/a')
    directors = list(map(lambda x: x.text, directors))

    ratings = driver.find_elements(By.CLASS_NAME, "average-rating")
    if ratings:
        rating = ratings[0].find_element(By.TAG_NAME, "a").text
    else:
        rating = np.nan
    
    driver.find_element(By.XPATH, '//*[@id="tabbed-content"]/header/ul/li[4]/a').click()
    genres = driver.find_elements(By.ID, "tab-genres")
    if genres:
        genres = genres[0].find_elements(By.TAG_NAME, "a")
        genres = list(map(lambda x: x.text, genres))
    else:
        genres = []

    return {'ratings': rating, 'genres': genres, 'directors': directors}

In [31]:
def update_list_of_screenings(driver, list):
    for i in range(len(list)):
        info = get_letterboxd_info(driver, list[i]['title'])
        list[i].update(info)
    return list

### Getting the info of UGC

Now, we will scrap some data from the UGC site. We'll try to find the first cinema for now, just to test.

In [10]:
#!pip install selenium

In [22]:
driver = webdriver.Chrome()

In [13]:
def find_UGC_cinema(driver, theater_name):
    #We set the current URL of the driver to the search page for UCG movie theaters
    url_ugc = "https://www.ugc.fr/cinemas.html"
    driver.get(url_ugc)
    #We need to let the browser load everything, so we buffer the code for a second.
    #Somehow, the implicit_wait function does not work in this case...
    time.sleep(1)

    #We use the search bar to search for a specific theater, which name is contained in the previous dataframes
    search_bar = driver.find_element(By.ID, 'search-cinemas-field')
    search_bar.clear()
    driver.implicitly_wait(1)
    search_bar.send_keys(theater_name)
    driver.implicitly_wait(1)
    
    #We search for the list of theaters displayed
    cinema_list = driver.find_element(By.ID, "nav-cinemas")
    
    #We then find the list of all theaters. The UGC website is engineered in a way that all theaters are still loaded after the query, but
    #only those that match the query are displayed. The others are simply hidden by modifying the style attribute.
    cinema_list_items = cinema_list.find_elements(By.CLASS_NAME, 'component--cinema-list-item')
    visible_elements = [element for element in cinema_list_items if element.get_attribute('style') == ""]
    #Now that we have the list of movie theaters, there should only be one item in the list. Either way, we take the first one, and take its website link
    #There we will be able to find all the movies that have screenings.
    if visible_elements:
        first_cinema = visible_elements[0]
        first_link = first_cinema.find_element(By.TAG_NAME, 'a')
        href_value = first_link.get_attribute('href')
        print(href_value)
        return href_value
    else:
        print("Aucun cinéma")
        return("")

In [14]:
find_UGC_cinema(driver, cinemasProchesUGC.loc[0,"nom"])

https://www.ugc.fr/cinema.html?id=2


'https://www.ugc.fr/cinema.html?id=2'

In [36]:
def get_movies_UGC(driver, theaterpage):
    #We set the current URL of the driver to the wanted UGC theater
    driver.get(theaterpage)
    #We need to let the browser load everything, so we buffer the code for a second.
    #Somehow, the implicit_wait function does not work in this case...
    time.sleep(5)
    pop_ups = driver.find_elements(By.ID, 'didomi-notice-disagree-button')
    if pop_ups:
        pop_ups[0].click()
        
    #We search for the container for all movies in the page
    movie_container = driver.find_element(By.CLASS_NAME, "dates-content")
    
    #Next, we get the list of all the containers of movie info
    list_of_movies = movie_container.find_elements(By.CLASS_NAME, 'slider-item')
    #Now that we have the list of movies, we will go in each one of them, and access their title (only their title for now)
    if list_of_movies:
        list_of_movies_screenings = []
        for movie in list_of_movies:
            if (len(movie.find_elements(By.CLASS_NAME, 'component--screening-cards')) == 0):
                continue
            if (movie.find_elements(By.CLASS_NAME, 'film-tag')) and (movie.find_elements(By.CLASS_NAME, 'film-tag')[0].text in [" Opéra ", " Ballet "]):
                print("DAS NOT A FILM")
                continue
                
            movie_info = movie.find_element(By.CLASS_NAME, 'block--title')
            movie_title = movie_info.find_element(By.CSS_SELECTOR, "a[data-film-label]")
            title = movie_title.text
            print(title)
    
            screenings = []
            screenings_list = driver.find_element(By.CLASS_NAME, 'component--screening-cards')
            screenings_list = screenings_list.find_elements(By.TAG_NAME, 'button')
    
            for s in screenings_list:
                lang = s.find_element(By.TAG_NAME, 'span').text
                start = s.find_element(By.CLASS_NAME, 'screening-start').text
                end = s.find_element(By.CLASS_NAME, 'screening-end').text
                room = s.find_element(By.CLASS_NAME, 'screening-detail').text
                screenings.append({'lang': lang, 'start': start, 'end': end, 'room': room})
    
            list_of_movies_screenings.append({'title': title, 'screenings': screenings})
        return list_of_movies_screenings
    else:
        print("Pas de film pour aujourd'hui")
        return([])

In [37]:
get_movies_UGC(driver, "https://www.ugc.fr/cinema.html?id=2")






[{'title': '',
  'screenings': [{'lang': '', 'start': '', 'end': '', 'room': ''}]},
 {'title': '',
  'screenings': [{'lang': '', 'start': '', 'end': '', 'room': ''}]},
 {'title': '',
  'screenings': [{'lang': '', 'start': '', 'end': '', 'room': ''}]}]

In [40]:
films = get_movies_UGC(driver, "https://www.ugc.fr/cinema.html?id=20")
films_updated = update_list_of_screenings(driver, films)

PRISCILLA
IRIS ET LES HOMMES
MOI CAPITAINE
NIGHT SWIM
LES TROIS MOUSQUETAIRES : MILADY
WONKA
VERMINES
HUNGER GAMES: LA BALLADE DU SERPENT ET DE L'OISEAU CHANTEUR
AQUAMAN ET LE ROYAUME PERDU
CHASSE GARDÉE
DREAM SCENARIO
LES SEGPA AU SKI
PAST LIVES - NOS VIES D'AVANT


In [41]:
pd.DataFrame(films_updated)

Unnamed: 0,title,screenings,ratings,genres,directors
0,PRISCILLA,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",3.7,"[Romance, Drama, Moving Relationship Stories, ...",[Sofia Coppola]
1,IRIS ET LES HOMMES,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",3.1,"[Drama, Comedy]",[Caroline Vignal]
2,MOI CAPITAINE,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",3.7,"[Adventure, Drama]",[Matteo Garrone]
3,NIGHT SWIM,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",,,
4,LES TROIS MOUSQUETAIRES : MILADY,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",3.3,"[Adventure, Action, Drama]",[Martin Bourboulon]
5,WONKA,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",3.4,"[Comedy, Fantasy, Family, Song And Dance, Holi...",[Paul King]
6,VERMINES,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",3.5,[Horror],[Sébastien Vaniček]
7,HUNGER GAMES: LA BALLADE DU SERPENT ET DE L'OI...,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",3.6,"[Science Fiction, Drama, Action, Song And Danc...",[Francis Lawrence]
8,AQUAMAN ET LE ROYAUME PERDU,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",2.3,"[Fantasy, Action, Adventure, Epic Heroes, Supe...",[James Wan]
9,CHASSE GARDÉE,"[{'lang': 'VOSTF', 'start': '22:00', 'end': '(...",3.1,[Comedy],"[Frédéric Forestier, Antonin Fourlon]"


### Getting the info of CGR

In [72]:
driver = webdriver.Chrome()

In [84]:
def find_CGR_cinema(driver, theater_name):
    #We set the current URL of the driver to the search page for UCG movie theaters
    url_ugc = "https://www.cgrcinemas.fr/cinema/"
    driver.get(url_ugc)
    #We need to let the browser load everything, so we buffer the code for a second.
    #Somehow, the implicit_wait function does not work in this case...
    time.sleep(1)

    #We use the search bar to search for a specific theater, which name is contained in the previous dataframes
    pop_up = driver.find_elements(By.ID, "didomi-notice-disagree-button")
    if pop_up:
        pop_up[0].click()
    
    search_bar = driver.find_element(By.CLASS_NAME, 'css-4bl6n9')
    search_bar.clear()
    driver.implicitly_wait(1)
    search_bar.send_keys(theater_name)
    driver.implicitly_wait(1)
    
    #We search for the list of theaters displayed
    cinema_list = driver.find_element(By.XPATH, "/html/body/div[2]/div[1]/div/div[1]/div[3]/div/div/div/div[3]/div/div")
    
    #We then find the list of all theaters. The UGC website is engineered in a way that all theaters are still loaded after the query, but
    #only those that match the query are displayed. The others are simply hidden by modifying the style attribute.
    cinema_list_items = cinema_list.find_elements(By.CLASS_NAME, 'css-fd6b40')
    #Now that we have the list of movie theaters, there should only be one item in the list. Either way, we take the first one, and take its website link
    #There we will be able to find all the movies that have screenings.
    if cinema_list_items:
        first_cinema = cinema_list_items[0]
        first_link = first_cinema.find_element(By.CLASS_NAME, 'css-xe0135')
        href_value = first_link.get_attribute('href')
        print(href_value)
        return href_value
    else:
        print("Aucun cinéma")
        return("")

In [85]:
#Some cinemas have names that are registered under another name
#In this case, I made it so it enters the name of the city instead of the name of the theater.
#An execption shauld be made for Paris

find_CGR_cinema(driver, CGR.loc[155,"commune"])

https://www.cgrcinemas.fr/films-a-l-affiche/?theater=B0059


'https://www.cgrcinemas.fr/films-a-l-affiche/?theater=B0059'

In [92]:
def get_movies_CGR(driver, theaterpage):
    #We set the current URL of the driver to the wanted UGC theater
    driver.get(theaterpage)
    #We need to let the browser load everything, so we buffer the code for a second.
    #Somehow, the implicit_wait function does not work in this case...
    time.sleep(5)
    pop_ups = driver.find_elements(By.ID, 'didomi-notice-disagree-button')
    if pop_ups:
        pop_ups[0].click()
        
    #We search for the container for all movies in the page
    movie_container = driver.find_element(By.CLASS_NAME, "css-1axjb46")
    films_a_l_affiche = []
    
    #Next, we get the list of all the containers of movie info
    list_of_movies = movie_container.find_elements(By.CLASS_NAME, 'css-1acoij0')
    
    #Now that we have the list of movies, we will go in each one of them, and access their title (only their title for now)
    if list_of_movies:
        for movie in list_of_movies:
            movie_title = movie.find_element(By.CLASS_NAME, "css-efkg2u")
            title = movie_title.text
            print(title)
            films_a_l_affiche.append({'title': title})

    
    #Now, we do this all over again for the next container of movies on the page
    movie_container = driver.find_element(By.CLASS_NAME, "css-eqwlce")
    list_of_movies = movie_container.find_elements(By.CLASS_NAME, 'css-1acoij0')
    
    #Now that we have the list of movies, we will go in each one of them, and access their title (only their title for now)
    if list_of_movies:
        for movie in list_of_movies:
            movie_title = movie.find_element(By.CLASS_NAME, "css-efkg2u")
            title = movie_title.text
            print(title)
            films_a_l_affiche.append({'title': title})
    
    if films_a_l_affiche:
        return films_a_l_affiche
        
    else:
        print("Pas de film à l'affiche")
        return([])

In [93]:
get_movies_CGR(driver, 'https://www.cgrcinemas.fr/films-a-l-affiche/?theater=B0059')

Night Swim
Kina & Yuk : renards de la banquise
Les SEGPA au ski
Vermines
Aquaman et le Royaume perdu
Chasse gardée
Jeff Panacloc - À la poursuite de Jean-Marc
Les Inséparables
Les Trois Mousquetaires: Milady
Wonka
Bâtiment 5
Migration
Thanksgiving : la semaine de l'horreur
Wish - Asha et la bonne étoile
Napoléon


[{'title': 'Night Swim'},
 {'title': 'Kina & Yuk : renards de la banquise'},
 {'title': 'Les SEGPA au ski'},
 {'title': 'Vermines'},
 {'title': 'Aquaman et le Royaume perdu'},
 {'title': 'Chasse gardée'},
 {'title': 'Jeff Panacloc - À la poursuite de Jean-Marc'},
 {'title': 'Les Inséparables'},
 {'title': 'Les Trois Mousquetaires: Milady'},
 {'title': 'Wonka'},
 {'title': 'Bâtiment 5'},
 {'title': 'Migration'},
 {'title': "Thanksgiving : la semaine de l'horreur"},
 {'title': 'Wish - Asha et la bonne étoile'},
 {'title': 'Napoléon'}]

In [95]:
films = get_movies_CGR(driver, "https://www.cgrcinemas.fr/films-a-l-affiche/?theater=B0059")
films_updated = update_list_of_screenings(driver, films)

Night Swim
Kina & Yuk : renards de la banquise
Les SEGPA au ski
Vermines
Aquaman et le Royaume perdu
Chasse gardée
Jeff Panacloc - À la poursuite de Jean-Marc
Les Inséparables
Les Trois Mousquetaires: Milady
Wonka
Bâtiment 5
Migration
Thanksgiving : la semaine de l'horreur
Wish - Asha et la bonne étoile
Napoléon


In [96]:
films_updated

[{'title': 'Night Swim'},
 {'title': 'Kina & Yuk : renards de la banquise',
  'ratings': nan,
  'genres': ['Family', 'Adventure'],
  'directors': ['Guillaume Maidatchevsky']},
 {'title': 'Les SEGPA au ski',
  'ratings': '3.0',
  'genres': ['Comedy'],
  'directors': ['Ali Bougheraba', 'Hakim Bougheraba']},
 {'title': 'Vermines',
  'ratings': '3.5',
  'genres': ['Horror'],
  'directors': ['Sébastien Vaniček']},
 {'title': 'Aquaman et le Royaume perdu',
  'ratings': '2.3',
  'genres': ['Fantasy',
   'Action',
   'Adventure',
   'Epic Heroes',
   'Superheroes In Action-Packed Battles With Villains',
   'Fantasy Adventure, Heroism, And Swordplay',
   'Action Comedy And Silly Heroics',
   'Sci-Fi Monster And Dinosaur Adventures',
   'Intense Combat And Martial Arts',
   'Show All…'],
  'directors': ['James Wan']},
 {'title': 'Chasse gardée',
  'ratings': '3.1',
  'genres': ['Comedy'],
  'directors': ['Frédéric Forestier', 'Antonin Fourlon']},
 {'title': 'Jeff Panacloc - À la poursuite de Jea

### Getting the info of MK2

In [51]:
driver = webdriver.Chrome()

In [52]:
def find_mk2_cinema(nom):
    result = url_mk2[url_mk2["nom"] == nom]["url"]
    return result[0] if result.any() else None

In [53]:
#Some cinemas have names that are registered under another name
#In this case, I made it so it enters the name of the city instead of the name of the theater.
#An execption shauld be made for Paris

find_mk2_cinema(cinemasProchesMK2.loc[11,"nom"])

'https://www.mk2.com/salle/mk2-beaubourg'

In [57]:
def get_movies_mk2(driver, theaterpage):
    #We set the current URL of the driver to the wanted UGC theater
    driver.get(theaterpage)
    #We need to let the browser load everything, so we buffer the code for a second.
    #Somehow, the implicit_wait function does not work in this case...
    time.sleep(1)
    pop_ups = driver.find_elements(By.ID, 'CybotCookiebotDialogBodyButtonDecline')
    if pop_ups:
        pop_ups[0].click()
    time.sleep(2)
    select_element = driver.find_elements(By.ID, "cinema-group-picker")
    if select_element:
        select = Select(select_element[0])
        select.select_by_index(1)
        time.sleep(2)
        valider_button = driver.find_element(By.XPATH, "/html/body/div[2]/div/div[2]/div/div/form/button")
        valider_button.click()

    #Next, we get the list of all the containers of movie info
    list_of_movies_selector = driver.find_element(By.XPATH, '/html/body/div[1]/div[1]/main/section[2]/div[2]/section')
    list_of_movies = list_of_movies_selector.find_elements(By.TAG_NAME, 'section')
    
    #Now that we have the list of movies, we will go in each one of them, and access their title (only their title for now)
    if list_of_movies:
        films_a_l_affiche = []
        for movie in list_of_movies:
            movie_title = movie.find_element(By.TAG_NAME, "h4")
            title = movie_title.text
            print(title)
            screenings = []
            screenings_list = driver.find_element(By.TAG_NAME, 'ol')
            screenings_list = screenings_list.find_elements(By.TAG_NAME, "a")
            for s in screenings_list:
                lang = s.find_element(By.TAG_NAME, 'h6').text
                start = s.find_element(By.TAG_NAME, 'h5').text
                screenings.append({'lang': lang, 'start': start})
    
            films_a_l_affiche.append({'title': title, 'screenings': screenings})
    
    if films_a_l_affiche:
        return films_a_l_affiche
        
    else:
        print("Pas de film à l'affiche")
        return([])

In [61]:
get_movies_mk2(driver, "https://www.mk2.com/salle/mk2-beaubourg")

Anatomie d'une chute
Chungking Express
Happy Together
L'innocence
Les Anges déchus
May December
Pauvres créatures
Perfect Days
Primadonna
Priscilla
SHTTL
Un été afghan


[{'title': "Anatomie d'une chute",
  'screenings': [{'lang': 'VF', 'start': '10:20'},
   {'lang': 'VF', 'start': '14:40'},
   {'lang': 'VF', 'start': '21:15'}]},
 {'title': 'Chungking Express',
  'screenings': [{'lang': 'VF', 'start': '10:20'},
   {'lang': 'VF', 'start': '14:40'},
   {'lang': 'VF', 'start': '21:15'}]},
 {'title': 'Happy Together',
  'screenings': [{'lang': 'VF', 'start': '10:20'},
   {'lang': 'VF', 'start': '14:40'},
   {'lang': 'VF', 'start': '21:15'}]},
 {'title': "L'innocence",
  'screenings': [{'lang': 'VF', 'start': '10:20'},
   {'lang': 'VF', 'start': '14:40'},
   {'lang': 'VF', 'start': '21:15'}]},
 {'title': 'Les Anges déchus',
  'screenings': [{'lang': 'VF', 'start': '10:20'},
   {'lang': 'VF', 'start': '14:40'},
   {'lang': 'VF', 'start': '21:15'}]},
 {'title': 'May December',
  'screenings': [{'lang': 'VF', 'start': '10:20'},
   {'lang': 'VF', 'start': '14:40'},
   {'lang': 'VF', 'start': '21:15'}]},
 {'title': 'Pauvres créatures',
  'screenings': [{'lang': 

In [62]:
films = get_movies_mk2(driver, "https://www.mk2.com/salle/mk2-beaubourg")
films_updated = update_list_of_screenings(driver, films)
df = pd.DataFrame(films_updated)

Anatomie d'une chute
Chungking Express
Happy Together
L'innocence
Les Anges déchus
May December
Pauvres créatures
Perfect Days
Primadonna
Priscilla
SHTTL
Un été afghan


In [63]:
df

Unnamed: 0,title,screenings,ratings,genres,directors
0,Anatomie d'une chute,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",4.2,"[Mystery, Drama, Thrillers And Murder Mysterie...",[Justine Triet]
1,Chungking Express,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",4.3,"[Comedy, Drama, Romance, Crime, Drugs And Gang...",[Wong Kar-wai]
2,Happy Together,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",4.2,"[Romance, Drama, Moving Relationship Stories, ...",[Wong Kar-wai]
3,L'innocence,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",4.3,"[Thriller, Drama, Moving Relationship Stories,...",[Hirokazu Kore-eda]
4,Les Anges déchus,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",4.2,"[Crime, Romance, Action, Crime, Drugs And Gang...",[Wong Kar-wai]
5,May December,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",3.8,"[Drama, Comedy, Moving Relationship Stories, I...",[Todd Haynes]
6,Pauvres créatures,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",4.2,"[Romance, Science Fiction, Comedy, Humanity An...",[Yorgos Lanthimos]
7,Perfect Days,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",4.2,[Drama],[Wim Wenders]
8,Primadonna,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",,[],[Kalju Kurepõld]
9,Priscilla,"[{'lang': 'VF', 'start': '10:20'}, {'lang': 'V...",3.6,"[Romance, Drama, Moving Relationship Stories, ...",[Sofia Coppola]
