# COMP47670 Assignment 1 - Task 1

Student ID: 24292215

Student Name: Lucas Sipos George

---

## Assignment 1 Summary
The objective of this assignment is to collect a dataset from [**API Sports**](https://api-sports.io) about the players of four teams (Real Madrid, Manchester City, Paris Saint Germain, Bayern München) that play in the UEFA Champions League. Python is then used to prepare, analyse, and derive insights from the collected data, both for each team individually and for all teams together.

---

### Task 1
This notebook covers **Task 1 - Data Collection**. The API returns the statistics of every player in a given team for a given league and season. All that is needed before fetching them is the _id_ of each team (Real Madrid, Manchester City, Paris Saint Germain, Bayern München) and the _id_ of the league (UEFA Champions League) for the _2022_ season.
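To illustrate where these _id_s end up, the query for one team's player statistics can be built with the standard library alone. This is a minimal sketch: the helper `build_players_url` is hypothetical and not part of the notebook. The parameter names `team`, `league` and `season` mirror the `/players` request made later on, and 541 (Real Madrid) and 2 (UEFA Champions League) are the values that the lookups further down return.

```python
from urllib.parse import urlencode

API_URL_PREFIX = "https://v3.football.api-sports.io"


def build_players_url(team_id: int, league_id: int, season: int) -> str:
    """Build the /players request URL for one team, league and season (hypothetical helper)."""
    # urlencode keeps the dict's insertion order and converts the ints to strings
    query = urlencode({"team": team_id, "league": league_id, "season": season})
    return f"{API_URL_PREFIX}/players?{query}"


# Real Madrid (541) in the UEFA Champions League (2), season 2022
print(build_players_url(541, 2, 2022))
# → https://v3.football.api-sports.io/players?team=541&league=2&season=2022
```

Without the two _id_s, no such query can be formed, which is why they are looked up first.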

```python
import json
from pathlib import Path
from pprint import pprint

import requests
```

Create a file named _**api_key.txt**_ in the same directory as this notebook and paste your **API key** into it:

```python
with open("api_key.txt") as file:
    API_KEY = file.read().strip()
```
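Keeping the key out of the notebook itself is a good habit. A variant that first checks an environment variable lets the notebook run without the text file. This is a minimal sketch: the variable name `API_SPORTS_KEY` and the helper `load_api_key` are hypothetical and not defined by API Sports.

```python
import os
from pathlib import Path


def load_api_key(path: str = "api_key.txt", env_var: str = "API_SPORTS_KEY") -> str:
    """Return the API key from an environment variable, falling back to a text file."""
    # hypothetical variable name, not something API Sports defines
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"Set {env_var} or create {key_file}")
    # strip the trailing newline most editors add
    return key_file.read_text().strip()
```

With this helper, `API_KEY = load_api_key()` would replace the `with open(...)` cell.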

```python
# prefix for API urls
API_URL_PREFIX = "https://v3.football.api-sports.io"
# headers needed for fetching data
HEADERS = {
    'x-rapidapi-key': API_KEY,
    'x-rapidapi-host': 'v3.football.api-sports.io'
}
```

```python
# the teams we want information on
FOOTBALL_TEAMS = ["Real Madrid", "Manchester City", "Paris Saint Germain", "Bayern München"]
# the league they played in
FOOTBALL_LEAGUE = "UEFA Champions League"
# the season we want information on
SEASON = 2022
# dictionary with key -> football team and value -> football team id (0 until fetched)
FOOTBALL_TEAMS_ID = dict.fromkeys(FOOTBALL_TEAMS, 0)
# dictionary with key -> league and value -> league id (0 until fetched)
FOOTBALL_LEAGUE_ID = {FOOTBALL_LEAGUE: 0}
```

Create a directory for raw data storage, if it does not already exist:

```python
raw_data_directory = Path("raw")
raw_data_directory.mkdir(parents=True, exist_ok=True)
```

### Data Collection

We first define a function that fetches a team's _id_ from API Sports...

```python
# find a team's id by its name
def fetch_team_id(team_name: str) -> int:
    print(f"Fetching team id for {team_name}")
    # construct url
    request_url = f"{API_URL_PREFIX}/teams"
    # query parameters sent with the request
    payload = {"name": team_name}
    # fetch the information and fail early on HTTP errors
    response = requests.get(request_url, headers=HEADERS, params=payload)
    response.raise_for_status()
    # return only the id of the first match as an int
    return response.json()["response"][0]["team"]["id"]
```
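The lookup indexes `response[0]` directly. If the key is invalid, the request quota is exhausted, or the name has no match, the problem may be reported in the JSON body rather than through the HTTP status, and would then surface only as an `IndexError`. Below is a minimal sketch of a checking helper. It assumes that the JSON envelope carries an `errors` field next to `response`. The helper `first_id` is hypothetical, and the sample payload is trimmed and illustrative, not a recorded API reply.

```python
from typing import Any


def first_id(payload: dict[str, Any], key: str) -> int:
    """Return payload["response"][0][key]["id"], failing loudly on API errors or no match."""
    # assumption: problems such as a bad key or rate limiting are listed under "errors"
    errors = payload.get("errors")
    if errors:
        raise RuntimeError(f"API returned errors: {errors}")
    results = payload.get("response") or []
    if not results:
        raise LookupError(f"No match found for {payload.get('parameters')}")
    return results[0][key]["id"]


# trimmed, illustrative /teams payload containing only the fields used here
sample = {
    "errors": [],
    "parameters": {"name": "Real Madrid"},
    "response": [{"team": {"id": 541, "name": "Real Madrid"}}],
}
print(first_id(sample, "team"))  # → 541
```

The same helper covers the league lookup by passing `key="league"`.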

...and one for fetching a league's id:

```python
# find a league's id by its name for the chosen season
def fetch_league_id(league: str) -> int:
    print(f"Fetching league id for {league}")
    # construct url
    request_url = f"{API_URL_PREFIX}/leagues"
    # query parameters sent with the request
    payload = {
        "name": league,
        "season": SEASON,
    }
    # fetch the information and fail early on HTTP errors
    response = requests.get(request_url, headers=HEADERS, params=payload)
    response.raise_for_status()
    # return only the id of the first match as an int
    return response.json()["response"][0]["league"]["id"]
```

Loop through the football teams, storing each team's _id_ in a dictionary, and then store the league's _id_:

```python
for team in FOOTBALL_TEAMS:
    FOOTBALL_TEAMS_ID[team] = fetch_team_id(team)
FOOTBALL_LEAGUE_ID[FOOTBALL_LEAGUE] = fetch_league_id(FOOTBALL_LEAGUE)
```

```
Fetching team id for Real Madrid
Fetching team id for Manchester City
Fetching team id for Paris Saint Germain
Fetching team id for Bayern München
Fetching league id for UEFA Champions League
```


For reference, print both dictionaries to verify that every team, as well as the league, has been found and assigned an _id_:

```python
pprint(FOOTBALL_TEAMS_ID)
pprint(FOOTBALL_LEAGUE_ID)
```

```
{'Bayern München': 157,
 'Manchester City': 50,
 'Paris Saint Germain': 85,
 'Real Madrid': 541}
{'UEFA Champions League': 2}
```
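Because every _id_ starts as the placeholder `0`, this visual check can also be automated. The following is a small sketch. The helper `missing_ids` is hypothetical, and the dictionaries copy the values printed above.

```python
def missing_ids(ids: dict[str, int]) -> list[str]:
    """Return the names whose id is still the placeholder value 0."""
    return [name for name, value in ids.items() if value == 0]


team_ids = {"Bayern München": 157, "Manchester City": 50,
            "Paris Saint Germain": 85, "Real Madrid": 541}
league_ids = {"UEFA Champions League": 2}

# collect every entry that was never filled in and stop if there are any
problems = missing_ids(team_ids) + missing_ids(league_ids)
assert not problems, f"No id found for: {problems}"
print("All ids resolved")
```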


Next, we define a function that fetches the statistics of every **player** in a team that played in the **UEFA Champions League** in season **2022**, and stores them as a raw JSON file:

```python
def fetch_team_players_statistics(team_name: str, team_id: int) -> None:
    print(f"Fetching team players statistics for {team_name} – {team_id}")
    # construct url
    request_url = f"{API_URL_PREFIX}/players"
    # the /players endpoint returns results page by page, so collect every page before writing
    players = []
    page = 1
    while True:
        # query parameters sent with the request
        payload = {
            "team": team_id,
            "league": FOOTBALL_LEAGUE_ID[FOOTBALL_LEAGUE],
            "season": SEASON,
            "page": page,
        }
        # fetch one page of results and fail early on HTTP errors
        response = requests.get(request_url, headers=HEADERS, params=payload)
        response.raise_for_status()
        data = response.json()
        players.extend(data["response"])
        # stop after the last page reported by the API
        if data["paging"]["current"] >= data["paging"]["total"]:
            break
        page += 1
    # construct file name and output path
    file_name = f"{team_name}_{team_id}_{FOOTBALL_LEAGUE}_{FOOTBALL_LEAGUE_ID[FOOTBALL_LEAGUE]}.json"
    output_path = raw_data_directory / file_name
    print(f"Writing to {output_path}")
    with open(output_path, "w", encoding="utf-8") as o_file:
        # write the raw player records from all pages
        json.dump(players, o_file, indent=4, sort_keys=True)
```

Loop through all the football teams to fetch and store their players' statistics:

```python
for team, t_id in FOOTBALL_TEAMS_ID.items():
    fetch_team_players_statistics(team, t_id)
```

```
Fetching team players statistics for Real Madrid – 541
Writing to raw/Real Madrid_541_UEFA Champions League_2.json
Fetching team players statistics for Manchester City – 50
Writing to raw/Manchester City_50_UEFA Champions League_2.json
Fetching team players statistics for Paris Saint Germain – 85
Writing to raw/Paris Saint Germain_85_UEFA Champions League_2.json
Fetching team players statistics for Bayern München – 157
Writing to raw/Bayern München_157_UEFA Champions League_2.json
```
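As a quick sanity check before the later tasks, the raw files can be loaded back and counted. The following is a minimal sketch. The helper `load_raw_players` is hypothetical. It relies only on the `{team}_{team_id}_{league}_{league_id}.json` naming used above and on each file holding a JSON list of player records.

```python
import json
from pathlib import Path


def load_raw_players(directory: Path) -> dict[str, list]:
    """Load every raw JSON file, keyed by the team name parsed from the file name."""
    players_by_team = {}
    for path in sorted(directory.glob("*.json")):
        # file names follow "{team}_{team_id}_{league}_{league_id}.json";
        # splitting from the right keeps team names with spaces intact
        team_name = path.stem.rsplit("_", 3)[0]
        with open(path, encoding="utf-8") as f:
            players_by_team[team_name] = json.load(f)
    return players_by_team


# print how many player records were saved for each team
for team, players in load_raw_players(Path("raw")).items():
    print(f"{team}: {len(players)} player records")
```

Splitting on the last three underscores works here because none of the team or league names contain an underscore.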
