**Selenium EXERCISE 1:** <BR>
<ul>
<li> Open a browser 

<li>Go to tripadvisor/Restaurants

<li>Find the search text box

<li>Clear it, input the query "Sant Cugat" and send it

<li>Go to "Restaurants" and get the links and names of the top 10 restaurants in Sant Cugat
<li> Store them in a list of dictionaries {name, link}
</ul>


In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

In [2]:
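# Open browser using Chrome driver (Selenium 4 expects the driver path in a Service object)
from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service('./chromedriver'))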

In [3]:
# Open url
tripadvisor_url = 'https://www.tripadvisor.com/Restaurants'

driver.get(tripadvisor_url)

In [4]:
# Close cookies warning
try:
    # Wait for the 'Okay' button to become clickable. The banner only appears
    # once per driver session, so it may already have been dismissed
    ok_button = WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.XPATH, '//*[@id="_evidon-accept-button"]'))
    )
    
    ok_button.click()
    print('Cookies accepted')
except TimeoutException as e:
    print(e)

Cookies accepted
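`WebDriverWait(driver, 5).until(condition)` calls the condition repeatedly (Selenium polls every 0.5 s by default) until it returns a truthy value, and raises `TimeoutException` once the timeout expires. The pure-Python sketch below mimics that polling idea without a browser so it can be run on its own; `wait_until`, `WaitTimeout` and `fake_locate` are hypothetical names for illustration, not Selenium's implementation.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for Selenium's TimeoutException in this sketch."""


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Returns that truthy value, similar to how WebDriverWait.until returns the
    located element.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout(f'condition not met after {timeout} s')
        time.sleep(poll)


# Usage: a fake "element" that only shows up on the third poll
attempts = {'n': 0}


def fake_locate():
    attempts['n'] += 1
    return 'ok_button' if attempts['n'] >= 3 else None


print(wait_until(fake_locate, timeout=2, poll=0.01))  # prints "ok_button"
```

This is why the cookie cell does not fail when the banner was already closed: the wait simply times out and the exception is caught.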


In [5]:
search_text = 'Sant Cugat'

# Get search bar
search_bar = driver.find_element(
    By.XPATH, '//*[@id="component_6"]/div/div/form/input[1]'
)

# Clear the search bar in case it already contains text, then type the query
search_bar.clear()
search_bar.send_keys(search_text)

In [6]:
# Click on first option
try:
    # Wait until the first option contains 'Sant Cugat'; otherwise 'Nearby' would be selected
    WebDriverWait(driver, 5).until(
        EC.text_to_be_present_in_element((By.XPATH, '//*[@id="typeahead_results"]/a[1]'), search_text)
    )
    
    # Click on 'Sant Cugat' option
    sant_cugat = driver.find_element(By.XPATH, '//*[@id="typeahead_results"]/a[1]')
    sant_cugat.click()
    print('Searching for best restaurants in Sant Cugat...')
except TimeoutException as e:
    print(e)

Searching for best restaurants in Sant Cugat...


In [7]:
# Get top restaurants
# The XPath expression selects the <a> descendant of every div whose data-test
# attribute contains '_list_item'
restaurants = driver.find_elements(
    By.XPATH,
    '//*[@id="component_2"]//div[contains(@data-test, "_list_item")]/span/div[1]/div[2]/div[1]/div/span/a'
)

In [8]:
# Get top 10 restaurants (slicing avoids an IndexError if fewer than 10 were found)
top_restaurants = [
    {'name': restaurant.text, 'link': restaurant.get_attribute('href')}
    for restaurant in restaurants[:10]
]

print(f'Top 10 restaurants in {search_text}')

for restaurant in top_restaurants:
    print(restaurant['name'])
    print(restaurant['link'])

Top 10 restaurants in Sant Cugat
1. Piaceri D’ Italia Ristorante Pizzeria
https://www.tripadvisor.com/Restaurant_Review-g1080422-d19084243-Reviews-Piaceri_D_Italia_Ristorante_Pizzeria-Sant_Cugat_del_Valles_Catalonia.html
2. Restaurant Brau
https://www.tripadvisor.com/Restaurant_Review-g1080422-d10195584-Reviews-Restaurant_Brau-Sant_Cugat_del_Valles_Catalonia.html
3. Sabatic
https://www.tripadvisor.com/Restaurant_Review-g1080422-d10167691-Reviews-Sabatic-Sant_Cugat_del_Valles_Catalonia.html
4. Nemesis
https://www.tripadvisor.com/Restaurant_Review-g1080422-d11892809-Reviews-Nemesis-Sant_Cugat_del_Valles_Catalonia.html
5. Dakidaya
https://www.tripadvisor.com/Restaurant_Review-g1080422-d4546707-Reviews-Dakidaya-Sant_Cugat_del_Valles_Catalonia.html
6. 9Reinas Sant Cugat
https://www.tripadvisor.com/Restaurant_Review-g1080422-d7155184-Reviews-9Reinas_Sant_Cugat-Sant_Cugat_del_Valles_Catalonia.html
7. Kitsune Sushi Bar
https://www.tripadvisor.com/Restaurant_Review-g1080422-d5966644-Reviews-Kit
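As the output shows, each name still carries its ranking prefix ("1. ", "2. ", ...). A minimal sketch of separating the rank from the name when building the list of dictionaries, assuming the prefix always has the form `<number>. `; `to_records` and the `example.com` links are hypothetical. In the notebook, the input would be `[(r.text, r.get_attribute('href')) for r in restaurants]`.

```python
import re

# Leading rank such as "1. " or "10. " (format assumed from the output above)
RANK_PREFIX = re.compile(r'^\s*(\d+)\.\s*(.*)$')


def to_records(pairs, limit=10):
    """Turn (text, href) pairs into {'rank', 'name', 'link'} dicts, keeping at most `limit`."""
    records = []
    for text, href in pairs[:limit]:
        match = RANK_PREFIX.match(text)
        # Entries without a rank prefix keep their full text as the name
        rank, name = (int(match[1]), match[2].strip()) if match else (None, text.strip())
        records.append({'rank': rank, 'name': name, 'link': href})
    return records


pairs = [
    ('1. Piaceri D’ Italia Ristorante Pizzeria', 'https://example.com/a'),
    ('2. Restaurant Brau', 'https://example.com/b'),
    ('Unranked entry', 'https://example.com/c'),
]
for record in to_records(pairs):
    print(record)
```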

In [9]:
# Close browser
driver.quit()

**Selenium EXERCISE 2:**

* Go to the Eix Macià Cinemes website ('http://www.cinemeseixmacia.com/').
* Find all the movies currently playing in the cinema.
* Keep the movies that start between a given time (e.g. 18:00) and an hour and a half later, and that are suitable for a given audience age (e.g. 18 years old).
* Search TheMovieDB for the average rating of those movies and select the best one.
* Play the movie trailer.

In [10]:
import requests
import re

In [11]:
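# Open browser using Chrome driver (Selenium 4 expects the driver path in a Service object)
from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service('./chromedriver'))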

In [12]:
# Open url
macia_url = 'http://www.cinemeseixmacia.com'

driver.get(macia_url)

In [13]:
# Get list of movie elements
movies = driver.find_elements(By.XPATH, '//*[@id="rt-mainbody"]/div/div[3]/div[contains(@class, "peli-item")]')

print(f'Retrieved {len(movies)} movies')

Retrieved 11 movies


In [14]:
# NOTE 1: In an XPath expression, a leading '.' makes the search relative to the
# current node (inside the loops below, the node containing one of the movies)
# instead of the whole document

# NOTE 2: get_attribute('textContent') is used instead of .text because .text
# only returns text that is currently visible on the page, and it did not
# return the expected text for these elements

# Get movie titles
movie_titles = [
    movie.find_element(By.XPATH, './/div/div[2]/div/a/h4').get_attribute('textContent')
    for movie in movies
]

# Get timetables for each movie
movie_times = [
    [
        time.get_attribute('textContent').strip()
        for time in movie.find_elements(By.XPATH, './/*[contains(@class, "horasessio")]/a/button')
    ]
    for movie in movies
]

# Get age ratings
movie_ages = [
    movie.find_element(By.XPATH, './/*[@id="dadespeli"]/p[2]/span').get_attribute('textContent')
    for movie in movies
]

# Transform age ratings to numerical values
movie_ages = list(map(lambda age: int(age_match[0]) if (age_match := re.match(r'\d{1,2}', age)) else 0, movie_ages))

# Generate billboard. Save reference to DOM element for further processing
billboard = [
    {'title': title, 'times': times, 'age': age, 'ref': element}
    for title, times, age, element in zip(movie_titles, movie_times, movie_ages, movies)
]

for movie in billboard:
    print(f"{movie['title']} {movie['times']} Age: {movie['age']}")

ATMOS VENOM: HABRA MATANZA ['16:00', '18:00', '20:15', '22:30'] Age: 12
DUNE ['16:00', '19:00', '22:00'] Age: 12
EL BUEN PATRON ['16:00', '19:00', '22:00'] Age: 12
LA FAMILIA ADDAMS 2 - LA GRAN ESCAPADA ['16:00', '18:15', '20:30'] Age: 0
LA PATRULLA CANINA:LA PELICULA ['16:00', '18:00'] Age: 0
LAS LEYES DE LA FRONTERA ['22:00'] Age: 16
MADRES PARALELAS ['16:00', '19:00', '22:00'] Age: 12
NO RESPIRES 2 ['22:45'] Age: 16
SIN TIEMPO PARA MORIR ['16:00', '18:15', '20:30', '22:00'] Age: 12
VENOM: HABRA MATANZA ['17:00', '19:00'] Age: 12
¿QUIÉN ES QUIÉN? ['20:15', '22:30'] Age: 7
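The walrus-operator lambda used to convert the age ratings is compact but hard to read. A sketch of the same rule as a named function (leading one- or two-digit number, otherwise 0); `parse_age_rating` is a hypothetical name, the sample labels are illustrative rather than scraped from the site, and the optional `\s*` for leading whitespace is an addition to the original pattern.

```python
import re


def parse_age_rating(raw):
    """Return the minimum age at the start of a rating label, or 0 if there is none."""
    match = re.match(r'\s*(\d{1,2})', raw)
    return int(match[1]) if match else 0


# Illustrative labels only, not taken from the cinema website
for raw in ['12', '16 años', 'TP']:
    print(repr(raw), '->', parse_age_rating(raw))
```

With it, the conversion becomes `movie_ages = [parse_age_rating(age) for age in movie_ages]`.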


In [15]:
def generate_preferred_times(time):
    '''
    Function used to generate the preferred time range in which the user
    can watch a movie.
    
    Given an initial time, it computes a new time, which is 1:30 hours after
    the first one. With these two times, it creates a range of hours in which
    the user can watch a movie.
    
    Parameters
    ----------
        time: str
            Initial time in which the user can watch a movie.
    
    Returns
    -------
        preferred_times: [str, str]
            List containing the preferred times by the user. The first
            element is the initial time. The second one is 1:30 hours later.
    '''
    time_hours, time_minutes = map(int, time.split(':'))
    
    margin_minutes = (time_minutes + 30) % 60
    margin_hours = (time_hours + 1 + (time_minutes + 30) // 60) % 24

    # Zero-pad hours and minutes (e.g. '09:05') so that the 'HH:MM' strings
    # can be compared as text when filtering movies
    preferred_times = [
        f'{time_hours:02d}:{time_minutes:02d}',
        f'{margin_hours:02d}:{margin_minutes:02d}',
    ]
    
    return preferred_times

# User data
age = 18
time = '18:00'
preferred_times = generate_preferred_times(time)

print(f'User is {age} years old and can watch a movie between {preferred_times[0]} and {preferred_times[1]}')

User is 18 years old and can watch a movie between 18:00 and 19:30
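The hour and minute arithmetic in `generate_preferred_times` can also be delegated to the standard library. A minimal sketch using `datetime.strptime` and `timedelta`, which handles the carry into the next hour, wraps past midnight and zero-pads both ends; `preferred_window` is a hypothetical name, and the 90-minute default mirrors the exercise statement.

```python
from datetime import datetime, timedelta


def preferred_window(start, minutes=90):
    """Return [start, start + minutes] as zero-padded 'HH:MM' strings."""
    begin = datetime.strptime(start, '%H:%M')
    end = begin + timedelta(minutes=minutes)
    return [begin.strftime('%H:%M'), end.strftime('%H:%M')]


print(preferred_window('18:00'))  # ['18:00', '19:30']
print(preferred_window('9:45'))   # ['09:45', '11:15']
print(preferred_window('23:15'))  # ['23:15', '00:45']
```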


In [16]:
def filter_movie(movie, age, preferred_times):
    '''
    Function used to filter movies out of the billboard.
    
    Checks whether a given movie satisfies the age and time restrictions
    imposed by the user.
    
    Parameters
    ----------
        movie: dict
            Dictionary containing information about the movie.
        age: int
            Age of the user.
        preferred_times: [str, str]
            List containing the preferred times by the user. These are the
            times when the user can watch a movie.
    
    Returns
    -------
        bool
            True if all requirements are satisfied and False otherwise.
    '''
    if age < movie['age']:
        return False
    
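    # Times are zero-padded 'HH:MM' strings, so comparing them as text gives
    # chronological order (a window that crosses midnight is not supported)
    for time in movie['times']:
        if preferred_times[0] <= time <= preferred_times[1]:
            return True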
    
    return False

# Filter those movies and select those that the user can watch
filtered_billboard = list(filter(lambda m: filter_movie(m, age, preferred_times), billboard))

for movie in filtered_billboard:
    print(f"{movie['title']} {movie['times']} Age: {movie['age']}")

ATMOS VENOM: HABRA MATANZA ['16:00', '18:00', '20:15', '22:30'] Age: 12
DUNE ['16:00', '19:00', '22:00'] Age: 12
EL BUEN PATRON ['16:00', '19:00', '22:00'] Age: 12
LA FAMILIA ADDAMS 2 - LA GRAN ESCAPADA ['16:00', '18:15', '20:30'] Age: 0
LA PATRULLA CANINA:LA PELICULA ['16:00', '18:00'] Age: 0
MADRES PARALELAS ['16:00', '19:00', '22:00'] Age: 12
SIN TIEMPO PARA MORIR ['16:00', '18:15', '20:30', '22:00'] Age: 12
VENOM: HABRA MATANZA ['17:00', '19:00'] Age: 12
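`filter_movie` compares 'HH:MM' strings as text, which only works while both ends of the window are zero-padded and the window stays within one day. A sketch of a more robust check that converts times to minutes since midnight and also accepts windows crossing midnight; `to_minutes`, `in_window` and `can_watch` are hypothetical helpers applying the same age and time rules as `filter_movie`.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' or 'HH:MM' string to minutes since midnight."""
    hours, minutes = map(int, hhmm.split(':'))
    return hours * 60 + minutes


def in_window(hhmm, start, end):
    """True if hhmm lies in [start, end]; the window may cross midnight."""
    t, s, e = to_minutes(hhmm), to_minutes(start), to_minutes(end)
    if s <= e:
        return s <= t <= e
    return t >= s or t <= e  # e.g. 23:15 -> 00:45


def can_watch(movie, age, window):
    """Same rule as filter_movie, but comparing minutes instead of strings."""
    return age >= movie['age'] and any(in_window(t, *window) for t in movie['times'])


movie = {'title': 'NO RESPIRES 2', 'times': ['22:45'], 'age': 16}
print(can_watch(movie, 18, ['18:00', '19:30']))  # False
print(can_watch(movie, 18, ['22:30', '00:00']))  # True
```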


In [17]:
# Read API key
with open('api_key') as f:
    api_key = f.read().strip()

tmdb_url = 'https://api.themoviedb.org/3/search/movie'
params = {'page': 1, 'api_key': api_key, 'region': 'ES', 'language': 'es-ES'}

# Fetch data from API to get movie scores
for movie in filtered_billboard:
    # Update params. Remove the VOSE substring as no results are returned when it's included
    params['query'] = movie['title'].replace('VOSE ', '')
    
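    # Fail loudly on HTTP errors (e.g. an invalid API key) instead of on a missing 'results' key
    response = requests.get(tmdb_url, params=params, timeout=10)
    response.raise_for_status()
    results = response.json()['results']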
    
    # Get score from the first result or give it a 0 score if the response is empty
    movie['score'] = results[0]['vote_average'] if results else 0

for movie in filtered_billboard:
    print(f"{movie['title']} {movie['times']} Age: {movie['age']}  Average user score: {movie['score']}")

ATMOS VENOM: HABRA MATANZA ['16:00', '18:00', '20:15', '22:30'] Age: 12  Average user score: 0
DUNE ['16:00', '19:00', '22:00'] Age: 12  Average user score: 8.1
EL BUEN PATRON ['16:00', '19:00', '22:00'] Age: 12  Average user score: 0
LA FAMILIA ADDAMS 2 - LA GRAN ESCAPADA ['16:00', '18:15', '20:30'] Age: 0  Average user score: 7.7
LA PATRULLA CANINA:LA PELICULA ['16:00', '18:00'] Age: 0  Average user score: 0
MADRES PARALELAS ['16:00', '19:00', '22:00'] Age: 12  Average user score: 6.6
SIN TIEMPO PARA MORIR ['16:00', '18:15', '20:30', '22:00'] Age: 12  Average user score: 7.3
VENOM: HABRA MATANZA ['17:00', '19:00'] Age: 12  Average user score: 7.2
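The scores above suggest that screening labels hurt the search: "VENOM: HABRA MATANZA" gets 7.2 while "ATMOS VENOM: HABRA MATANZA" gets no match. A sketch that generalizes the `replace('VOSE ', '')` step to a list of labels; `clean_title` is a hypothetical helper, and treating 'ATMOS' as a format label is an assumption based only on that output. It would be used as `params['query'] = clean_title(movie['title'])`.

```python
import re

# 'VOSE' comes from the notebook; 'ATMOS' is an assumed screening-format label
FORMAT_LABELS = ('VOSE', 'ATMOS')
LABEL_PATTERN = re.compile(r'\b(?:' + '|'.join(map(re.escape, FORMAT_LABELS)) + r')\b\s*')


def clean_title(title):
    """Remove format labels from a billboard title and collapse extra whitespace."""
    return ' '.join(LABEL_PATTERN.sub('', title).split())


# Illustrative titles; 'VOSE DUNE' is a made-up example of a subtitled screening
for title in ['ATMOS VENOM: HABRA MATANZA', 'VOSE DUNE', 'DUNE']:
    print(clean_title(title))
```

This does not explain the other titles that scored 0, which may fail for different reasons.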


In [18]:
# Find movie with highest score
best_movie = max(filtered_billboard, key=lambda x: x['score'])

print(f"Movie with highest score: {best_movie['title']} {best_movie['times']} Age: {best_movie['age']} Average user score: {best_movie['score']}")

Movie with highest score: DUNE ['16:00', '19:00', '22:00'] Age: 12 Average user score: 8.1


In [19]:
# Get button to open trailer and perform actions to click on it
trailer_button = best_movie['ref'].find_element(By.XPATH, './/div[contains(@class, "peli-botons")]/a[1]/button')

webdriver.ActionChains(driver).move_to_element(best_movie['ref']).click(trailer_button).perform()

In [20]:
# Switch to the trailer's iframe: elements inside an iframe can only be
# located (and the video played) after switching to it
driver.switch_to.frame(driver.find_element(By.XPATH, '/html/body/div[5]/div/div/div/div/iframe'))

try:
    play_button = WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.XPATH, '//*[@id="movie_player"]/div[4]/button'))
    )
        
    play_button.click()
except TimeoutException as e:
    print(e)