# Data scraping workflow
This workflow is organized by data provider. Each section lists the functions used to collect data such as teams, players, standings, stats, shot maps, and positions, and explains how the resulting data is generated and stored so that every provider follows the same conventions.

In [1]:
# Import libraries for data scraping
import pvd_Sofascore as sofascore
import pvd_Fotmob as fotmob
import pvd_Fbref as fbref

## Sofascore
The league URL is defined first; it includes the ID of the league and the current season. Several functions are then run to obtain specific data, and some of them take the output of a previous function as input. Each function generates a .csv file in the *data* folder, with a filename starting with *sofascore_*.

In [2]:
# League URL to be used
sofascore_league_url = 'https://www.sofascore.com/es/torneo/futbol/argentina/liga-profesional-de-futbol/155#id:57478'
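
As a rough illustration of how the two identifiers could be read from the link, the sketch below splits the URL with the standard library. It assumes that the last path segment is the league ID and that the number after `id:` in the fragment identifies the season; `parse_sofascore_league_url` is a hypothetical helper, not part of `pvd_Sofascore`.

```python
from urllib.parse import urlparse


def parse_sofascore_league_url(url):
    """Split a Sofascore league URL into (league_id, season_id).

    Assumption: the last path segment is the league ID and the number
    after 'id:' in the fragment is the season ID.
    """
    parsed = urlparse(url)
    league_id = parsed.path.rstrip('/').split('/')[-1]
    season_id = parsed.fragment.split(':')[-1]
    return int(league_id), int(season_id)


sofascore_league_url = 'https://www.sofascore.com/es/torneo/futbol/argentina/liga-profesional-de-futbol/155#id:57478'
league_id, season_id = parse_sofascore_league_url(sofascore_league_url)
print(league_id, season_id)  # 155 57478
```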

### Get teams
The teams in the league are obtained, a .csv file is exported, and the resulting dictionary is assigned to the variable *sofascore_teams*.

In [3]:
sofascore_teams = sofascore.get_teams_from_league(sofascore_league_url)
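
To check which files a provider has written, the exported files can be listed by prefix. This is a minimal sketch using only the standard library; `list_provider_files` is a hypothetical helper, and the file names created in the temporary folder are placeholders for illustration (only the *sofascore_* prefix is taken from the text above).

```python
import tempfile
from pathlib import Path


def list_provider_files(data_dir, prefix):
    """Return the sorted names of .csv files in data_dir that start with prefix."""
    return sorted(p.name for p in Path(data_dir).glob(f'{prefix}*.csv'))


# Demonstration on a temporary folder with placeholder file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('sofascore_teams.csv', 'sofascore_events.csv', 'fotmob_teams.csv'):
        (Path(tmp) / name).touch()
    found = list_provider_files(tmp, 'sofascore_')

print(found)  # ['sofascore_events.csv', 'sofascore_teams.csv']
```

In the notebook itself, the call would presumably be `list_provider_files('data', 'sofascore_')`.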

### Get events
Just like with the teams, the events from the league are obtained.

In [4]:
sofascore_events = sofascore.get_events_from_league(sofascore_league_url)

### Get players
For each team, the players on its roster are obtained. As with the teams, a .csv file is exported and the results are stored in a dictionary.

In [5]:
sofascore_players = sofascore.get_players_from_teams(sofascore_teams)

### Get heatmaps
For each player, the heatmaps for the most recent seasons (the current season and those from the previous two years) are obtained.

In [6]:
sofascore_heatmaps = sofascore.get_heatmap_from_players(sofascore_players)
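
The season window described above can be sketched as a simple filter. The data layout here is an assumption: each season is a dictionary with a hypothetical `year` field, and `recent_seasons` is an illustrative helper, not the function used inside `pvd_Sofascore`.

```python
def recent_seasons(seasons, current_year, years_back=2):
    """Keep seasons whose year lies in [current_year - years_back, current_year]."""
    earliest = current_year - years_back
    return [s for s in seasons if earliest <= s['year'] <= current_year]


# Placeholder seasons; the IDs and years are made up for illustration.
seasons = [
    {'season_id': 1, 'year': 2021},
    {'season_id': 2, 'year': 2022},
    {'season_id': 3, 'year': 2023},
    {'season_id': 4, 'year': 2024},
]
selected = recent_seasons(seasons, current_year=2024)
print([s['year'] for s in selected])  # [2022, 2023, 2024]
```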

### Get lineups
For each event, the formation of each team is obtained, including the starting lineup and substitutions. Each position is assigned a code based on the line of play and lateral position. This code will then be used to assign an ordered pair for graphical representation.

In [7]:
sofascore_lineups = sofascore.get_lineups_from_events(sofascore_events)
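
One way to turn a position code into an ordered pair is a pair of lookup tables, one per axis. The code scheme below (a line letter followed by a lateral letter) and the coordinates on a 100 x 100 pitch are assumptions made for illustration; the actual codes produced by `get_lineups_from_events` may differ.

```python
# Hypothetical scheme: first letter = line of play, second letter = lateral position.
LINE_X = {'G': 5, 'D': 25, 'M': 50, 'F': 80}   # goalkeeper, defence, midfield, forward
LATERAL_Y = {'L': 80, 'C': 50, 'R': 20}        # left, centre, right


def position_to_xy(code):
    """Map a two-letter position code to an (x, y) pair for plotting."""
    line, lateral = code[0], code[1]
    return LINE_X[line], LATERAL_Y[lateral]


print(position_to_xy('DL'))  # (25, 80)
print(position_to_xy('FC'))  # (80, 50)
```

Keeping the two axes in separate tables means a new line or lateral slot needs only one new entry rather than one per combination.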

### Get results
For each event, the result, goals, and the winning team (Home or Away) are obtained. Each event will have two records in the results, one for each involved team.

In [8]:
sofascore_results = sofascore.get_results_from_events(sofascore_events)
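
The "two records per event" layout can be sketched as follows. The field names and the use of `None` for a draw are assumptions for illustration; the text only states that the winner is recorded as Home or Away.

```python
def event_to_result_rows(event):
    """Expand one event into two result rows, one per team."""
    home_goals, away_goals = event['home_goals'], event['away_goals']
    if home_goals > away_goals:
        winner = 'Home'
    elif away_goals > home_goals:
        winner = 'Away'
    else:
        winner = None  # assumption: draws have no winner
    return [
        {'event_id': event['event_id'], 'team': event['home_team'], 'side': 'Home',
         'goals_for': home_goals, 'goals_against': away_goals, 'winner': winner},
        {'event_id': event['event_id'], 'team': event['away_team'], 'side': 'Away',
         'goals_for': away_goals, 'goals_against': home_goals, 'winner': winner},
    ]


# Placeholder event with made-up values.
example_event = {'event_id': 1, 'home_team': 'Team A', 'away_team': 'Team B',
                 'home_goals': 2, 'away_goals': 1}
rows = event_to_result_rows(example_event)
print(len(rows), rows[0]['winner'])  # 2 Home
```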

## Fotmob
Data retrieval from Fotmob starts with the provided league URL. Several functions are run to gather teams, players, shot maps, and positions, and some of them depend on data generated by others. The resulting data is exported as .csv files saved in the *data* folder, all prefixed with *fotmob_*.

In [9]:
fotmob_league_url = 'https://www.fotmob.com/es/leagues/112/overview/liga-profesional'

### Get teams
This function retrieves the teams from the specified league using the provided URL. A file named *fotmob_teams.csv* is generated and saved in the *data* folder. The resulting output is stored in a dictionary for further use.

In [10]:
fotmob_teams = fotmob.get_teams_from_league(fotmob_league_url)

### Get players
Based on the teams retrieved, this function gathers the players that make up each team’s roster. A .csv file is exported to the *data* folder with the filename *fotmob_players.csv*, and the results are stored in a dictionary.

In [11]:
fotmob_players = fotmob.get_players_from_teams(fotmob_teams)

### Get shotmaps
Using player data, this function retrieves the shot maps for each player. A .csv file is generated and saved in the *data* folder. The results are stored in a DataFrame.

In [12]:
fotmob_shotmaps = fotmob.get_shotmap_from_players(fotmob_players)

Error processing player https://www.fotmob.com/es/players/306387/frank-kudelka: 'NoneType' object is not subscriptable
Error processing player https://www.fotmob.com/es/players/146801/hernan-galindez: 'NoneType' object is not subscriptable
Error processing player https://www.fotmob.com/es/players/1209083/sebastian-meza: 'NoneType' object is not subscriptable
Error processing player https://www.fotmob.com/es/players/1510567/nazareno-duran: 'NoneType' object is not subscriptable
Error processing player https://www.fotmob.com/es/players/1403679/leandro-figueredo: 'NoneType' object is not subscriptable
Error processing player https://www.fotmob.com/es/players/933706/matias-gomez: 'NoneType' object is not subscriptable
Error processing player https://www.fotmob.com/es/players/1383667/franco-watson: 'NoneType' object is not subscriptable
Error processing player https://www.fotmob.com/es/players/1408581/franco-alfonso: 'NoneType' object is not subscriptable
Error processing player https://www
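
The `'NoneType' object is not subscriptable` messages above are what Python raises when code indexes into a value that is `None`, which typically happens when part of a response is missing (several of the listed players appear in the positions output below with no data either). A defensive lookup along these lines can return a default instead of raising. `get_nested` is a hypothetical helper and the key names are placeholders; this is not how `pvd_Fotmob` is known to work.

```python
def get_nested(data, *keys, default=None):
    """Walk nested dictionaries, returning default if any level is missing or None."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current


# Placeholder payloads with hypothetical keys.
without_shots = {'name': 'Example Player', 'shotmap': None}
with_shots = {'name': 'Example Player', 'shotmap': {'shots': [1, 2]}}

print(get_nested(without_shots, 'shotmap', 'shots', default=[]))  # []
print(get_nested(with_shots, 'shotmap', 'shots', default=[]))     # [1, 2]
```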

### Get positions
This function captures each player's on-field positions throughout various matches. The data is exported to a .csv file, and the resulting DataFrame is ready for further analysis.

In [13]:
fotmob_positions = fotmob.get_positions_from_players(fotmob_players)

No positions found for player Frank Kudelka with id 306387
No positions found for player Nazareno Duran with id 1510567
No positions found for player Leandro Figueredo with id 1403679
No positions found for player Facundo Sava with id 35328
No positions found for player Gustavo Lescano with id 1605746
Error processing player https://www.fotmob.com/es/players/1606616/alexis-rad with id 1606616: argument of type 'NoneType' is not iterable
Response data: {'id': 1606616, 'name': 'Alexis Rad', 'birthDate': {'utcTime': '2008-01-10T00:00:00.000Z', 'timezone': 'UTC'}, 'isCoach': False, 'isCaptain': False, 'primaryTeam': {'teamId': 161727, 'teamName': 'Atletico Tucuman', 'onLoan': False, 'teamColors': {'color': '#68b0de', 'colorAlternate': '#68b0de', 'colorAway': '#1B2036', 'colorAwayAlternate': '#ffffff'}}, 'positionDescription': None, 'injuryInformation': None, 'internationalDuty': None, 'playerInformation': [{'value': {'numberValue': 171, 'options': {'style': 'unit', 'unit': 'centimeter', 'u
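
The `argument of type 'NoneType' is not iterable` error is raised by a membership test (`in`) against `None`, and the response above shows `'positionDescription': None` for this player. A guard such as the one below avoids the exception. The `positions` key is a hypothetical name used only for illustration.

```python
def has_positions(player_data):
    """Return True only when a position description exists and contains positions."""
    description = player_data.get('positionDescription') or {}
    return 'positions' in description  # 'positions' is an assumed key name


print(has_positions({'id': 1606616, 'positionDescription': None}))  # False
print(has_positions({'positionDescription': {'positions': []}}))    # True
```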

## Fbref
Fbref data extraction also begins with a specific league URL. Multiple functions are executed to collect teams, standings, stats, players, and squads, and some of them use the output of others. The extracted data is saved in .csv files located in the *data* folder, with filenames starting with *fbref_*.

In [14]:
fbref_league_url = 'https://fbref.com/es/comps/21/historia/Temporadas-de-Liga-Profesional-Argentina'

### Get teams
This function extracts team information from the provided league URL. The data is saved in a .csv file named *fbref_teams.csv* within the *data* folder. The output is a dictionary for subsequent processing.

In [15]:
fbref_teams = fbref.get_teams_from_league(fbref_league_url)

### Get standings
This function retrieves the league standings, that is, the current position of each team. The information is exported to a .csv file for reference and further analysis.

In [16]:
fbref_standings = fbref.get_standings_from_league(fbref_league_url)

Error processing ID historia for league https://fbref.com/es/comps/21/historia/Temporadas-de-Liga-Profesional-Argentina: No tables found


### Get stats
This function gathers statistical data for the league, including metrics for each team. A .csv file is created and stored in the *data* folder. The resulting DataFrame is ready for use in data analysis or visualization tasks.

In [17]:
fbref_stats = fbref.get_stats_from_league(fbref_league_url)

Error processing ID stats_squads_standard_for: No tables found
Error processing ID stats_squads_standard_against: No tables found
Error processing ID stats_squads_keeper_for: No tables found
Error processing ID stats_squads_keeper_against: No tables found
Error processing ID stats_squads_keeper_adv_for: No tables found
Error processing ID stats_squads_keeper_adv_against: No tables found
Error processing ID stats_squads_shooting_for: No tables found
Error processing ID stats_squads_shooting_against: No tables found
Error processing ID stats_squads_passing_for: No tables found
Error processing ID stats_squads_passing_against: No tables found
Error processing ID stats_squads_passing_types_for: No tables found
Error processing ID stats_squads_passing_types_against: No tables found
Error processing ID stats_squads_gca_for: No tables found
Error processing ID stats_squads_gca_against: No tables found
Error processing ID stats_squads_defense_for: No tables found
Error processing ID stats_squa

ValueError: No objects to concatenate
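
`ValueError: No objects to concatenate` is what `pandas.concat` raises when it receives an empty list, which is consistent with every table lookup above having failed. A guard like the following returns an empty DataFrame instead; `concat_or_empty` is a hypothetical helper, not part of `pvd_Fbref`.

```python
import pandas as pd


def concat_or_empty(frames):
    """Concatenate non-empty DataFrames, or return an empty DataFrame if there are none."""
    usable = [f for f in frames if f is not None and not f.empty]
    if not usable:
        return pd.DataFrame()
    return pd.concat(usable, ignore_index=True)


print(concat_or_empty([]).empty)  # True
combined = concat_or_empty([pd.DataFrame({'a': [1]}), pd.DataFrame({'a': [2]})])
print(len(combined))  # 2
```

Returning an empty DataFrame keeps downstream cells running, but it also hides the underlying failure, so logging the skipped table IDs alongside it is probably worthwhile.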

### Get players
Using the team information, this function compiles a list of players from each team. The data is saved as a .csv file in the *data* folder and stored in a dictionary for further use.

In [None]:
fbref_players = fbref.get_players_from_teams(fbref_teams)

### Get squads
This function provides detailed squad information for each team, including player demographics and attributes. It outputs a .csv file for easy access, and the data is available in a DataFrame format for deeper analysis.

In [None]:
fbref_squads = fbref.get_squads_from_teams(fbref_teams)