# **Hearthstone Battlegrounds: How Has the Top Ladder Evolved Since Season 2?**

## **Introduction**

This Jupyter notebook aims to trace the history of each Hearthstone Battlegrounds season by analyzing players who have achieved a rating of at least **8000 MMR**.

> By fetching these leaderboards via the Blizzard API and displaying them side by side, we will be able to:
>
> * Compare the distribution of top‑tier players across seasons
> * Track the number of participants crossing the **8000 MMR** threshold
> * Analyze regional dominance within the top ladder
> * Highlight additional dynamics and evolutions of the ladder
>

This holistic approach will provide a clear view of how the top ladder has evolved since Season 2, identifying key trends and standout moments from each period.


## **Part 1: Data Concatenation**

### **Objective**

In this section, we will implement reusable functions to load, clean and merge our leaderboard data into a single consolidated dataset. By the end of this step, all regional files from each season will be concatenated in a uniform format, ready for downstream analysis.

### **Directory Structure**
 
Our raw data are organized under a top‑level `data/` folder, with one subfolder per season (for example `s3`, `s8`, `s9`) and, within each season folder, one subfolder per region (`EU`, `NA`, `AP`):

> ```text
> data/
> ├── s2/
> │   ├── EU/
> │   ├── NA/
> │   └── AP/
> .
> .
> .
> ├── s8/
> │   ├── EU/
> │   ├── NA/
> │   └── AP/
> └── s9/
>     ├── EU/
>     ├── NA/
>     └── AP/
> ```

We will walk through the creation of functions that traverse this structure, read each file, and concatenate them into a master DataFrame.
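
Before loading anything, it can help to check which season and region folders actually contain a leaderboard file. The sketch below is one possible way to do that, assuming each region folder is expected to hold a file named `battlegrounds.csv`. The helper `list_leaderboard_files` and the temporary toy tree are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path


def list_leaderboard_files(data_root, filename="battlegrounds.csv"):
    """Return (season, region, file_exists) triples found under data_root."""
    found = []
    for season_dir in sorted(Path(data_root).glob("s*")):
        if not season_dir.is_dir():
            continue  # ignore stray files whose names start with "s"
        for region_dir in sorted(season_dir.iterdir()):
            if region_dir.is_dir():
                has_file = (region_dir / filename).exists()
                found.append((season_dir.name, region_dir.name, has_file))
    return found


# Build a throwaway tree that mimics the layout above (toy data only).
with tempfile.TemporaryDirectory() as tmp:
    for season, region in [("s8", "EU"), ("s8", "NA"), ("s9", "EU")]:
        (Path(tmp) / season / region).mkdir(parents=True)
    csv_file = Path(tmp) / "s8" / "EU" / "battlegrounds.csv"
    csv_file.write_text("rank,accountid,rating\n")
    layout = list_leaderboard_files(tmp)

for season, region, has_file in layout:
    print(season, region, "ok" if has_file else "missing battlegrounds.csv")
```

On the real data, calling `list_leaderboard_files("data")` would list every season and region pair and flag the folders without a CSV.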

In the final DataFrame we want to keep each player's region and season, so we will add two columns derived from the folder names. We will also filter the data to keep only players rated at least 8000 MMR (in `s3`, the leaderboard included every player).
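
The filter and the two extra columns can be sketched on a tiny in-memory table first. The column names `rank`, `accountid` and `rating` are the ones that appear in the leaderboard files; the account names, the ratings, and the `"s3"` / `"eu"` labels are made-up values for illustration.

```python
import pandas as pd

# Toy leaderboard for one region; account names and ratings are invented.
raw = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4],
        "accountid": ["alpha", "bravo", "charlie", "delta"],
        "rating": [12500, 8000, 7999, 3100],
    }
)

# Keep players at or above the threshold, then record where the rows came from.
tagged = raw[raw["rating"] >= 8000].assign(season="s3", region="eu")

print(f"{len(raw)} rows before filtering, {len(tagged)} after")
print(tagged)
```

Using `assign` returns a new DataFrame, so the original `raw` table is left unchanged.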


In [4]:
import pandas as pd
from pathlib import Path

In [5]:
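def load_all_seasons_filtered(data_root: str, min_rating: int = 8000) -> pd.DataFrame:
    """Load every regional leaderboard under data_root, keeping rows rated >= min_rating."""
    root = Path(data_root)
    all_dfs = []

    for season_dir in root.glob("s*"):
        if not season_dir.is_dir():
            continue  # skip stray files whose names start with "s"
        season = season_dir.name  # e.g. "s3", "s8", "s9"
        for region_dir in season_dir.iterdir():
            region = region_dir.name  # e.g. "eu", "na", "ap"
            csv_path = region_dir / "battlegrounds.csv"
            if csv_path.exists():
                df = pd.read_csv(csv_path)
                # .copy() so the column assignments below act on an independent frame
                df = df[df["rating"] >= min_rating].copy()
                df["season"] = season
                df["region"] = region
                all_dfs.append(df)

    # pd.concat raises ValueError when all_dfs is empty (no CSV found)
    return pd.concat(all_dfs, ignore_index=True)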

In [6]:
df_top = load_all_seasons_filtered("data", min_rating=8000)
df_top.head()

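| Unnamed: 0 | rank | accountid | rating | season | region |
|---|---|---|---|---|---|
| 0 | 1 | LOUDER | 17527 | s3 | eu |
| 1 | 2 | huyagaoshou | 17245 | s3 | eu |
| 2 | 3 | Sevel | 16688 | s3 | eu |
| 3 | 4 | douyumxjf | 16054 | s3 | eu |
| 4 | 5 | wtybill | 16034 | s3 | eu |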
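
A quick sanity check on the consolidated table is to count the rows kept per season and region and to confirm that nothing falls below the threshold. The sketch below runs on a small stand-in frame named `sample` (hypothetical, with invented rows) that has the same `rating`, `season` and `region` columns as `df_top`.

```python
import pandas as pd

# Stand-in for df_top with the same columns; the rows are invented.
sample = pd.DataFrame(
    {
        "accountid": ["a", "b", "c", "d", "e"],
        "rating": [17000, 9000, 8500, 12000, 8100],
        "season": ["s3", "s3", "s3", "s8", "s8"],
        "region": ["eu", "eu", "na", "eu", "ap"],
    }
)

# One row per season, one column per region, cell = number of players kept.
counts = sample.groupby(["season", "region"]).size().unstack(fill_value=0)
print(counts)

# The rating filter should leave no row under the threshold.
lowest_rating = sample["rating"].min()
print("lowest rating kept:", lowest_rating)
```

The same two checks can be applied to `df_top` directly by replacing `sample` with it.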


## **Part 2: Evolution of the Number of Players on the Top Leaderboard**