# Fetching charts for May 2023

In this notebook, we use our custom script for fetching Spotify charts to retrieve recent charts for the 50 regions that also have good data coverage in the Kaggle Spotify Charts dataset (for details, see the notebooks `0_process_initial_data.ipynb` and `1_analyze_data_completeness.ipynb`).

## Load region names

In [1]:
from helpers import create_data_path

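def read_lines_from_file(filename):
    """Return the stripped lines of a file in the data directory."""
    # create_data_path is applied here, so callers pass a bare filename
    with open(create_data_path(filename), "r") as f:
        return [line.strip() for line in f]

regions = read_lines_from_file("regions.txt")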
regions[:5]

['Slovakia', 'Netherlands', 'Portugal', 'Panama', 'Finland']

In [2]:
len(regions)

50

## Convert region names to region codes

We need two-letter ISO codes (ISO 3166-1 alpha-2) for the countries instead of the raw region names. The "global region" is the exception: it uses the code `global`.

In [3]:
import pandas as pd
iso_codes = pd.read_csv("https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv")

In [4]:
iso_codes.head()

Unnamed: 0,name,alpha-2,alpha-3,country-code,iso_3166-2,region,sub-region,intermediate-region,region-code,sub-region-code,intermediate-region-code
0,Afghanistan,AF,AFG,4,ISO 3166-2:AF,Asia,Southern Asia,,142.0,34.0,
1,Åland Islands,AX,ALA,248,ISO 3166-2:AX,Europe,Northern Europe,,150.0,154.0,
2,Albania,AL,ALB,8,ISO 3166-2:AL,Europe,Southern Europe,,150.0,39.0,
3,Algeria,DZ,DZA,12,ISO 3166-2:DZ,Africa,Northern Africa,,2.0,15.0,
4,American Samoa,AS,ASM,16,ISO 3166-2:AS,Oceania,Polynesia,,9.0,61.0,


In [5]:
iso_alpha2 = iso_codes[["name", "alpha-2"]].rename(columns={"alpha-2": "iso_alpha2"})
iso_alpha2

Unnamed: 0,name,iso_alpha2
0,Afghanistan,AF
1,Åland Islands,AX
2,Albania,AL
3,Algeria,DZ
4,American Samoa,AS
...,...,...
244,Wallis and Futuna,WF
245,Western Sahara,EH
246,Yemen,YE
247,Zambia,ZM


In [6]:
# Left merge keeps every region; names without an ISO match get NaN
region_mappings = (
    pd.merge(pd.DataFrame(regions, columns=["name"]), iso_alpha2, on="name", how="left")
    .rename(columns={"iso_alpha2": "mapping"})
    .set_index("name")
)
region_mappings

name,mapping
Slovakia,SK
Netherlands,NL
Portugal,PT
Panama,PA
Finland,FI
Germany,DE
Peru,PE
Poland,PL
Chile,CL
Norway,NO


In [7]:
# Store the names of regions without a mapping so we can test the manual fixes later
regions_without_mapping = region_mappings[region_mappings.mapping.isna()].index.tolist()
regions_without_mapping

['Taiwan',
 'Czech Republic',
 'United Kingdom',
 'Global',
 'United States',
 'Bolivia']
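
The way a left merge exposes unmatched names can be seen in a small, self-contained sketch. The toy frames below are made-up stand-ins for `regions` and `iso_alpha2`, not the real data; `indicator=True` is an optional extra that the cells above do not use.

```python
import pandas as pd

# Toy stand-ins for the real data (illustrative values only).
toy_regions = pd.DataFrame({"name": ["Finland", "Global", "Peru"]})
toy_iso = pd.DataFrame({"name": ["Finland", "Peru"], "iso_alpha2": ["FI", "PE"]})

# how="left" keeps every region; indicator=True adds a "_merge" column
# that says whether a row was found in both frames or only in the left one.
merged = pd.merge(toy_regions, toy_iso, on="name", how="left", indicator=True)

# Rows marked "left_only" had no matching ISO name, so their code is NaN.
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```

Checking `_merge` gives the same list as filtering on `isna()`, but it stays correct even if the right-hand table itself contained missing codes.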

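These regions received no code because their names do not match any name in the ISO dataset (and `Global` is not a country at all). Let's fill in the missing mappings manually: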

In [8]:
region_mappings.loc["Taiwan", "mapping"] = "tw"
region_mappings.loc["Czech Republic", "mapping"] = "cz"
region_mappings.loc["United Kingdom", "mapping"] = "gb"
region_mappings.loc["Global", "mapping"] = "global"
region_mappings.loc["United States", "mapping"] = "us"
region_mappings.loc["Bolivia", "mapping"] = "bo"
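
As an alternative sketch, the manual fixes could be kept in one dictionary and applied in a single step. The frame `toy_mappings` and the dictionary `manual_fixes` below are hypothetical names with toy values. One reason to consider this variant: assigning through `.loc` with a misspelled region name would silently append a new row, whereas here unknown names are rejected up front.

```python
import pandas as pd

# Toy stand-in for region_mappings (illustrative values only).
toy_mappings = pd.DataFrame(
    {"mapping": ["SK", None, None]},
    index=pd.Index(["Slovakia", "Taiwan", "Global"], name="name"),
)

# Hypothetical dictionary holding all manual fixes in one place.
manual_fixes = {"Taiwan": "tw", "Global": "global"}

# Guard against typos: every key must already be a region in the index.
unknown_names = sorted(set(manual_fixes) - set(toy_mappings.index))
if unknown_names:
    raise KeyError(f"not in region list: {unknown_names}")

# fillna aligns on the index, so only the missing entries are filled.
toy_mappings["mapping"] = toy_mappings["mapping"].fillna(pd.Series(manual_fixes))
print(toy_mappings)
```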


The codes taken from the ISO dataset are uppercase, so we now convert all mappings to lowercase:

In [9]:
region_mappings.mapping = region_mappings.mapping.str.lower()
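
Before any network call, a cheap format check can catch leftover uppercase or missing codes. The following is a minimal sketch on toy data; the series `toy_codes` and the rule "two lowercase letters or `global`" are assumptions derived from the description above, not a specification of what the chart download accepts.

```python
import pandas as pd

# Toy mappings (illustrative values, including two deliberate mistakes).
toy_codes = pd.Series(
    ["sk", "global", "GB", None],
    index=["Slovakia", "Global", "United Kingdom", "Bolivia"],
)

# A code passes if it is exactly two lowercase letters or the word "global".
# na=False treats missing codes as invalid instead of propagating NaN.
is_valid = toy_codes.str.fullmatch(r"(?:[a-z]{2}|global)", na=False)

invalid_regions = toy_codes[~is_valid].index.tolist()
print(invalid_regions)
```

An empty list would mean that every region has a well-formed code. It would not prove that the code is the right one for that region.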

To test whether the manually added mappings are correct, we call our data-fetching helper once for each of these regions (same date, daily charts each time):

In [15]:
from helpers import setup_webdriver_for_download, download_region_chart_csv, create_data_path

driver = setup_webdriver_for_download(create_data_path("scraper_downloads"))

In [16]:
try:
    for region in regions_without_mapping:
        mapping = region_mappings.loc[region, "mapping"]
        download_region_chart_csv(driver, "2023-05-01", mapping, "daily")
finally:
    # Close the browser even if one of the downloads fails
    driver.quit()

If no errors were raised up to this point, the manually added mappings are fine.
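
A loop like the one above stops at the first failing region. If a full report is preferred, the failures can be collected instead. The sketch below shows the idea with a hypothetical stand-in, `fake_download`, because the behavior of the real helper on a bad code is not shown here; it is assumed to raise an exception.

```python
def check_region_codes(download, codes, date="2023-05-01"):
    """Call download(date, code) for every region and collect failures."""
    failures = {}
    for name, code in codes.items():
        try:
            download(date, code)
        except Exception as exc:  # broad on purpose: we only want a report
            failures[name] = repr(exc)
    return failures


# Hypothetical stand-in for the real download helper (assumption: a bad
# region code makes the helper raise an exception).
def fake_download(date, code):
    if code not in {"tw", "cz", "gb", "global", "us", "bo"}:
        raise ValueError(f"unknown region code: {code}")


failures = check_region_codes(fake_download, {"Taiwan": "tw", "Bolivia": "xx"})
print(failures)
```

With the real helper, `download` could be a small wrapper such as `lambda date, code: download_region_chart_csv(driver, date, code, "daily")`.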

Let's store the region mappings:

In [22]:
mapping_out_path = create_data_path("region_mappings.csv")
region_mappings.to_csv(mapping_out_path)

We also write the list of region codes to a file so that we can pass it to our `download_charts.py` command line script.

In [23]:
# Without index and header, only the "mapping" column is written: one code per line
region_mappings.to_csv(create_data_path("region_codes.txt"), index=False, header=False)
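
To illustrate what ends up in such a file, here is a minimal round-trip sketch on toy data. It writes to a temporary directory rather than the real data path, and the frame `toy_mappings` is a made-up stand-in for `region_mappings`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for region_mappings (illustrative values only).
toy_mappings = pd.DataFrame(
    {"mapping": ["sk", "nl", "global"]},
    index=pd.Index(["Slovakia", "Netherlands", "Global"], name="name"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    codes_path = Path(tmp_dir) / "region_codes.txt"
    # Same options as in the cell above: no index, no header row.
    toy_mappings.to_csv(codes_path, index=False, header=False)
    # Read the file back as plain lines, the way a script might consume it.
    codes = codes_path.read_text().splitlines()

print(codes)
```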