## Step 1: Imports
In this step, all the essential libraries are imported:
- `asyncio`: To run asynchronous operations.
- `Understat`: The client class from the `understat` package, a Python library that scrapes data from the Understat website.
- `nest_asyncio`: Allows nested event loops (required to run async code inside Jupyter).
- `pandas`: For handling and analyzing the data.
- `aiohttp`: Asynchronous HTTP client to make network requests.

In [9]:
import asyncio
from understat import Understat
import nest_asyncio
import pandas as pd
import aiohttp

---
## Step 2: Prepare Event Loop
Jupyter Notebook already runs its own event loop, so calling `asyncio.run()` in a cell raises a `RuntimeError`, unlike in a normal Python script.
To fix this, `nest_asyncio.apply()` patches asyncio to allow nested event loops, so asynchronous tasks can run inside cells.

In [10]:
nest_asyncio.apply()
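
The restriction that `nest_asyncio` works around can be reproduced outside Jupyter. The sketch below uses only the standard library and calls `asyncio.run()` from inside an already running loop, which is roughly the situation a notebook cell is in. The names `fetch_value` and `call_run_inside_loop` are hypothetical and exist only for this illustration.

```python
import asyncio


async def fetch_value():
    # Stand-in for any coroutine, such as a scraper call.
    return 42


async def call_run_inside_loop():
    # We are already inside a running event loop here, as a notebook cell would be.
    coro = fetch_value()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return None


message = asyncio.run(call_run_inside_loop())
print(message)  # the RuntimeError text raised by the nested asyncio.run()
```

Without the patch, the inner call fails; once `nest_asyncio.apply()` has run, the same nested call is allowed.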

---
## Step 3: Define Scraper Function
An **asynchronous function** scrapes player-level data from the Premier League for the 2024/25 season.
This function:
- Starts an `aiohttp` session.
- Creates an `Understat` client on that session.
- Retrieves player data for the **EPL** in the **2024 season** (Understat labels a season by its starting year, so this is the 2024/25 season).

In [11]:
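async def get_understat_data():
    # Open one HTTP session and close it automatically when done
    async with aiohttp.ClientSession() as session:
        understat = Understat(session)
        # Fetch every EPL player's stats for the season starting in 2024
        players = await understat.get_league_players(
            league_name="EPL",
            season="2024"
        )
        return players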
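
Because the `season` argument is the starting year rather than the familiar "2024/25" form, a small formatter can help when labelling files or charts. This is a minimal sketch; `season_label` is a hypothetical helper and not part of the `understat` package.

```python
def season_label(start_year: int) -> str:
    """Format a season's starting year as a 'YYYY/YY' label."""
    end_short = (start_year + 1) % 100  # last two digits of the following year
    return f"{start_year}/{end_short:02d}"


print(season_label(2024))  # 2024/25
```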

---
## Step 4: Run Scraper
`asyncio.run()` executes the async function and stores the returned player data in `players_data`. The call works inside the notebook only because `nest_asyncio.apply()` was run in Step 2.

In [12]:
players_data = asyncio.run(get_understat_data())

---
## Step 5: Convert to DataFrame
The scraped data is a list of dictionaries, one per player.
Converting it into a pandas **DataFrame** makes it easy to inspect, clean, and analyze.

In [13]:
df = pd.DataFrame(players_data)
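
To illustrate how the conversion maps records to rows, here is a small sketch on hand-written data. The two records are hypothetical stand-ins shaped like the scraper output; their field names and values are taken from the first two rows of the preview in Step 6, and storing `goals` as a string is an assumption about the raw response.

```python
import pandas as pd

# Hypothetical records shaped like the scraper output: one dict per player.
sample_players = [
    {"player_name": "Mohamed Salah", "team_title": "Liverpool", "goals": "29"},
    {"player_name": "Alexander Isak", "team_title": "Newcastle United", "goals": "23"},
]

# Each dictionary becomes one row; each key becomes one column.
sample_df = pd.DataFrame(sample_players)

print(sample_df.shape)           # (number of rows, number of columns)
print(list(sample_df.columns))   # column names in key order
```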

---
## Step 6: Preview the Dataset
Here, the **first five rows** are displayed to check the structure.

In [14]:
df.head()

| | id | player_name | games | time | goals | xG | assists | xA | shots | key_passes | yellow_cards | red_cards | position | team_title | npg | npxG | xGChain | xGBuildup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1250 | Mohamed Salah | 38 | 3392 | 29 | 27.70626749098301 | 18 | 15.858334187418222 | 130 | 89 | 1 | 0 | F M | Liverpool | 20 | 20.855747912079096 | 48.53588879853487 | 16.20675839856267 |
| 1 | 5232 | Alexander Isak | 34 | 2822 | 23 | 22.356988068670034 | 6 | 5.44870379474014 | 99 | 42 | 1 | 0 | F | Newcastle United | 19 | 19.312312599271536 | 26.209551103413105 | 3.579237926751375 |
| 2 | 8260 | Erling Haaland | 31 | 2749 | 22 | 23.95459282770753 | 3 | 3.5812273556366563 | 109 | 29 | 2 | 0 | F | Manchester City | 19 | 20.90991743281484 | 22.845245644450188 | 3.5353690683841705 |
| 3 | 4456 | Chris Wood | 36 | 3024 | 20 | 15.638655036687853 | 3 | 3.044111367315054 | 68 | 22 | 1 | 0 | F S | Nottingham Forest | 17 | 13.355148404836656 | 14.72016467526555 | 1.6692094188183546 |
| 4 | 6552 | Bryan Mbeumo | 38 | 3419 | 20 | 13.63216146454215 | 7 | 10.376488702371716 | 86 | 70 | 3 | 0 | D F M | Brentford | 15 | 9.06514835730195 | 24.377113293856382 | 9.351834732107818 |
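
A preview shows values but not their types, so it is worth checking `df.dtypes` before doing any arithmetic. The sketch below assumes the numeric fields arrive as strings (an assumption about the raw response, not something the preview confirms) and converts them with `pd.to_numeric`. The small `preview` frame is a hypothetical sample built from the first two rows above.

```python
import pandas as pd

# Assumption: raw values are strings, as in this hypothetical sample.
preview = pd.DataFrame(
    {
        "player_name": ["Mohamed Salah", "Alexander Isak"],
        "goals": ["29", "23"],
        "xG": ["27.70626749098301", "22.356988068670034"],
    }
)

numeric_cols = ["goals", "xG"]  # columns expected to hold numbers
preview[numeric_cols] = preview[numeric_cols].apply(pd.to_numeric)

print(preview.dtypes)  # goals becomes an integer column, xG a float column
```

If the columns are already numeric, the conversion leaves them unchanged, so the step is safe to run either way.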


---
## Step 7: Save Dataset Locally
Finally, the dataset is saved as a **CSV file**. Passing `index=False` leaves the row index out of the file.
This makes it easy to use in future analysis without needing to scrape again.

In [15]:
df.to_csv('EPL_2024_25_Understat.csv', index=False)
print('Dataset saved successfully!')

Dataset saved successfully!
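
A later session can load the file with `pd.read_csv` instead of scraping again. The sketch below shows the round trip on a small hypothetical frame written to a temporary directory; the file name `sample_players.csv` is an assumption for the example, and the real notebook would read `EPL_2024_25_Understat.csv` instead.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame standing in for the scraped dataset.
sample_df = pd.DataFrame(
    {"player_name": ["Mohamed Salah", "Alexander Isak"], "goals": [29, 23]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample_players.csv")
    # index=False keeps the row index out of the file, so reloading
    # does not produce an extra "Unnamed: 0" column.
    sample_df.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

print(list(reloaded.columns))
print(reloaded.shape)
```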
