## Activity 2: Scraping Datacraft's agenda with Selenium (solution)

<div style="text-align: center;">
    <img src="../images/agenda_datacraft.png" width="600" height="300">
</div>

For this second activity, we're going to scrape the datacraft calendar: https://datacraft.paris/agenda/

We want to create a DataFrame with one row per event and four columns: the event title, its date, its time slot and its address (named `Event_title`, `Date`, `Time` and `Address` in the code below).

#### Importing modules

```python
# Import libraries
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options  # optional browser settings, e.g. headless mode
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import pandas as pd
```

#### Establishing a connection to the site

```python
# Download (or reuse a cached) ChromeDriver and wrap it in a Service object
service = Service(executable_path=ChromeDriverManager().install())
# Start Chrome through that service
driver = webdriver.Chrome(service=service)
# Open the page to scrape
url = "https://datacraft.paris/agenda/"
driver.get(url)
```

#### Element identification and selection

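```python
# Each call returns a list of WebElements (an empty list if nothing matches)
# Title link inside each event heading
event_title = driver.find_elements(By.CSS_SELECTOR, 'h3.tribe-events-calendar-list__event-title a')
# <time> tags whose "datetime" attribute holds the event date
event_date = driver.find_elements(By.CSS_SELECTOR, 'time.tribe-events-calendar-list__event-date-tag-datetime')
# Time slot shown next to the date
event_hour = driver.find_elements(By.CSS_SELECTOR, '.tribe-events-calendar-list__event-datetime-wrapper .timeshed')
# Venue address
event_address = driver.find_elements(By.CSS_SELECTOR, '.tribe-events-calendar-list__event-venue-address')
```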
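`find_elements` returns an empty list instead of raising when a selector matches nothing, and each of the four lists above is collected independently. A quick check, sketched below, can flag a selector that matched too few or too many elements before the DataFrame is built; `check_same_length` is a hypothetical helper, not part of Selenium.

```python
# Hypothetical helper: compare how many elements each selector matched
def check_same_length(**scraped):
    """Return {name: number of matches}; raise ValueError if the counts differ."""
    counts = {name: len(items) for name, items in scraped.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"Selectors matched different numbers of elements: {counts}")
    return counts
```

For example: `check_same_length(title=event_title, date=event_date, hour=event_hour, address=event_address)`.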

#### DataFrame creation

```python
# Collect data: read an attribute or the visible text of each element
title = [elem.get_attribute("title") for elem in event_title]   # link's title attribute
date = [elem.get_attribute("datetime") for elem in event_date]  # ISO date, e.g. 2024-06-26
hour = [elem.text for elem in event_hour]
address = [elem.text for elem in event_address]

# Create the DataFrame (all four lists must have the same length)
data = {
    'Event_title': title,
    'Date': date,
    'Time': hour,
    'Address': address
}

df = pd.DataFrame(data)

# Show the DataFrame
df
```

|   | Event_title | Date | Time | Address |
|---|---|---|---|---|
| 0 | Atelier – Causalité : L’inférence causale | 2024-06-26 | 14:00 - 17:00 | 3 Rue Rossini, 75009 Paris |
| 1 | ATELIER – DONNEZ VIE À VOTRE IMAGINATION GRÂCE... | 2024-06-27 | 16:00 - 18:00 | 3 Rue Rossini, 75009 Paris |
| 2 | IA Frugale – Spécification AFNOR : découverte,... | 2024-06-27 | 18:00 - 19:30 | 3 Rue Rossini, 75009 Paris |
| 3 | Workshop – Exploring datasets for RAG and fine... | 2024-06-28 | 13:00 - 17:00 | 3 Rue Rossini, 75009 Paris |
| 4 | REX – Mise en place d’une app d’optimisation d... | 2024-07-01 | 18:00 - 19:00 | 3 Rue Rossini, 75009 Paris |
| 5 | Table ronde GenAI : derrière la tendance, quel... | 2024-07-02 | 08:30 - 10:30 | 36 Rue La Fayette, Paris |
| 6 | FORMATION – INITIATION À L’IA GÉNÉRATIVE : DU ... | 2024-07-02 | 09:00 - 17:30 | 4 place Jussieu, Paris |
| 7 | Club CDO – Comment déployer sa stratégie dat... | 2024-07-04 | 19:00 - 21:00 | 3 Rue Rossini, 75009 Paris |
| 8 | REX-Détection et qualification d’objets pour u... | 2024-07-09 | 18:00 - 19:00 | 3 Rue Rossini, 75009 Paris |
| 9 | Atelier – Déployez et partagez votre projet d... | 2024-07-11 | 15:00 - 18:00 | 3 Rue Rossini, 75009 Paris |
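The `Time` column holds a range such as `14:00 - 17:00` as plain text. A sketch, assuming every value follows that `HH:MM - HH:MM` pattern and that `Date` is the ISO string read from the `datetime` attribute, that turns each row into real start and end timestamps with pandas; `split_time_range` is a hypothetical helper.

```python
import pandas as pd


# Hypothetical helper: derive Start/End datetimes from the Date and Time columns
def split_time_range(df, date_col="Date", time_col="Time"):
    """Add Start/End columns from 'YYYY-MM-DD' dates and 'HH:MM - HH:MM' ranges."""
    # Split "14:00 - 17:00" into two columns: "14:00" and "17:00"
    parts = df[time_col].str.split(" - ", n=1, expand=True)
    out = df.copy()  # leave the original DataFrame unchanged
    fmt = "%Y-%m-%d %H:%M"
    out["Start"] = pd.to_datetime(out[date_col] + " " + parts[0], format=fmt)
    out["End"] = pd.to_datetime(out[date_col] + " " + parts[1], format=fmt)
    return out
```

Once `Start` and `End` are datetimes, the agenda can be sorted or filtered (for example, events after 18:00) without string parsing.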


###### Export to CSV

```python
# Uncomment to save the DataFrame to a CSV file, without the row index
#df.to_csv('example.csv', index=False)
```
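The titles contain accented characters and typographic apostrophes. `to_csv` writes UTF-8 by default; one option, sketched below, is the `utf-8-sig` encoding, which prepends a byte-order mark that helps Excel recognise the file as UTF-8. `export_agenda` and the file name in the usage comment are hypothetical.

```python
import pandas as pd


# Hypothetical helper: write the agenda to CSV in an Excel-friendly encoding
def export_agenda(df, path):
    # index=False drops the 0..n-1 row index; utf-8-sig adds a BOM for Excel
    df.to_csv(path, index=False, encoding="utf-8-sig")


# Usage (hypothetical file name):
# export_agenda(df, "agenda_datacraft.csv")
```

Reading the file back with `pd.read_csv(path, encoding="utf-8-sig")` strips the byte-order mark again.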

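###### Other ways to select elements

The same elements can also be located with XPath expressions or by class name instead of CSS selectors.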

```python
# With XPath: @class='...' only matches elements whose class attribute is exactly this string
event_title = driver.find_elements(By.XPATH, "//*[@class='tribe-events-calendar-list__event-title-link tribe-common-anchor-thin']")
event_date = driver.find_elements(By.XPATH, "//*[@class='dateshed']")
event_hour = driver.find_elements(By.XPATH, "//*[@class='timeshed']")
event_address = driver.find_elements(By.XPATH, "//*[@class='tribe-events-calendar-list__event-venue-address']")

# With By.CLASS_NAME: pass a single class name (these reassign the variables above)
event_address = driver.find_elements(By.CLASS_NAME, "tribe-events-calendar-list__event-venue-address")
event_date = driver.find_elements(By.CLASS_NAME, "dateshed")
event_hour = driver.find_elements(By.CLASS_NAME, "timeshed")
```