# Web Scraping with Python + Selenium

Before starting, I’m obliged to mention that web scraping is a grey area, legally and ethically, in many circumstances. Please consider the positive and negative effects of what you scrape before doing so!

Warning over.

## Now, let's go for it

In this project I use **Python + Selenium** to get a **.csv** file with the 21/22 season's matches from Spain's La Liga (first division football league). We need this data for future analysis.

## What we are going to do:

We are going to scrape the data from here: https://www.adamchoi.co.uk/teamgoals/detailed

Our process for extracting the data is going to go something like this:

1. Import the libraries for the project.
2. Load the webpage containing the data.
3. Locate the data within the page and extract it.
4. Organise the data into a dataframe.
5. Export the data to a .csv file.

Let's do it!

## 1. Importing the libraries for the project

```python
from selenium import webdriver
from selenium.webdriver.support.ui import Select

import pandas as pd
import time
```


## 2. Load the webpage containing the data.

```python
website = 'https://www.adamchoi.co.uk/teamgoals/detailed'
# Local path to chromedriver.exe; adjust it to your own machine
path = r'C:/Users/Ronald C/Downloads/Data Analyst Portfolio Porjects/Project 5 Web Scrapping/chromedriver.exe'
```


## 3. Locate the data within the page and extract it.

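```python
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the driver path through a Service object
# (recent Selenium 4 releases can also locate chromedriver automatically)
driver = webdriver.Chrome(service=Service(path))
driver.get(website)

# Switch the table to the "All matches" view
all_matches_button = driver.find_element(By.XPATH, '//label[@analytics-event="All matches"]')
all_matches_button.click()

# Pick the country and the season from the dropdowns
drop_country = Select(driver.find_element(By.ID, 'country'))
drop_country.select_by_visible_text('Spain')

drop_season = Select(driver.find_element(By.ID, 'season'))
drop_season.select_by_visible_text('21/22')

# Give the page time to reload the table
time.sleep(5)

# Creating a list with all the matches (one string per table row)
matches = driver.find_elements(By.TAG_NAME, 'tr')

match_list = []
for match in matches:
    match_list.append(match.text)

driver.quit()
```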
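The `time.sleep(5)` in the cell above always waits five seconds, whether the table reloads in one second or needs more than five. A common alternative is an explicit wait that polls until a condition holds. Selenium provides this as `WebDriverWait(driver, timeout).until(condition)` in `selenium.webdriver.support.ui`; the stdlib-only sketch below (the name `wait_until` is hypothetical, not part of Selenium) just illustrates the polling idea so it can run without a browser. Which condition reliably signals that the 21/22 table has loaded on this page is an assumption left open here.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Hypothetical helper: call `condition()` every `poll` seconds until it
    returns a truthy value, then return that value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

With Selenium itself, the equivalent would be something like `WebDriverWait(driver, 10).until(lambda d: d.find_elements(By.TAG_NAME, 'tr'))`, which returns the non-empty row list. Note that rows from the previous selection may already be on the page, so a non-empty list alone may not prove the new season has loaded.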


## 4 & 5. Organise the data into a dataframe and export it to a .csv
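Before building the dataframe, it may be worth cleaning `match_list`. La Liga has 20 teams playing 38 matches each, so a season has 380 fixtures, and the 760 rows printed below are exactly twice that. This suggests each match is listed once under each of the two teams. The sketch below assumes both copies of a match have identical text; it drops blank rows and repeats while keeping the original order (the function name `clean_match_rows` is hypothetical).

```python
def clean_match_rows(rows):
    """Drop blank rows and repeated rows, keeping first-seen order.

    Assumes a match listed under both teams produces the same text twice.
    """
    seen = set()
    cleaned = []
    for text in rows:
        text = text.strip()
        if not text or text in seen:  # skip empty rows and duplicates
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```

It would be applied as `match_list = clean_match_rows(match_list)` right before `pd.DataFrame(...)`; if the assumption holds, the result should contain 380 rows instead of 760.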

```python
# Using pandas to save the data for future analysis
df = pd.DataFrame({'Matches': match_list})
print(df)
df.to_csv('LaLiga.csv', index=False)
```

```
                                   Matches
0      14-08-2021 Alaves 1 - 4 Real Madrid
1         21-08-2021 Alaves 0 - 1 Mallorca
2         27-08-2021 Valencia 3 - 0 Alaves
3          18-09-2021 Alaves 0 - 2 Osasuna
4          22-09-2021 Espanol 1 - 0 Alaves
..                                     ...
755     30-04-2022 Alaves 2 - 1 Villarreal
756    08-05-2022 Villarreal 1 - 1 Sevilla
757  12-05-2022 Vallecano 1 - 5 Villarreal
758   15-05-2022 Villarreal 1 - 2 Sociedad
759  22-05-2022 Barcelona 0 - 2 Villarreal

[760 rows x 1 columns]
```
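Each row is still a single string holding the date, both teams and the score. A possible next step, sketched below with Python's `re` module, splits it into separate fields. The pattern `DD-MM-YYYY Home X - Y Away` is inferred from the printed rows above, and the field names are hypothetical; rows that don't fit (for example header rows or matches without a score) come back as `None`.

```python
import re

# date, home team, home goals, away goals, away team
MATCH_RE = re.compile(r"^(\d{2}-\d{2}-\d{4}) (.+?) (\d+) - (\d+) (.+)$")


def parse_match(text):
    """Split one scraped row into a dict, or return None if it doesn't match."""
    m = MATCH_RE.match(text.strip())
    if m is None:
        return None
    date, home, home_goals, away_goals, away = m.groups()
    return {
        "Date": date,
        "Home": home,
        "HomeGoals": int(home_goals),
        "AwayGoals": int(away_goals),
        "Away": away,
    }
```

The lazy `(.+?)` lets multi-word home teams such as "Real Madrid" work, since it stops at the first ` X - Y ` score. The parsed rows could then feed a dataframe with `pd.DataFrame([r for r in map(parse_match, match_list) if r])`, and `pd.to_datetime(df['Date'], format='%d-%m-%Y')` would turn the date strings into real dates.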


That's it! In the next project, I'm going to use SQL to explore and organize the data and prepare it for an amazing visualization.