https://retepelyod2.files.wordpress.com/2024/01/right-primary.pdf
Peter Doyle Algo <br> 

1. Find the average GDP from 1990 to 2017, the average annual growth of GDP per capita, and the average primary balance for 1990-2019, in 2017 international dollars.
2. Find the 15 GDP neighbors on each side of each country.
3. Of the neighbors above, take the top 5 and drop the top one.
4. Of the neighbors below, take the bottom 5 and drop the bottom one.
5. The 2nd highest is the higher band.
6. The 2nd lowest is the lower band.
7. The average of all 8 is the synthete for each metric.
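
The steps above can be sketched as one function over a list of values already ordered by ascending average GDP. This is a minimal sketch and not the author's implementation: the names `synthete_and_bands`, `window` and `keep` are hypothetical, and it assumes "top 5" and "bottom 5" mean the five most distant of the 15 neighbors on each side.

```python
from statistics import mean


def synthete_and_bands(values, position, window=15, keep=5):
    """Return (lower_band, synthete, higher_band) for values[position].

    `values` holds one metric per country, ordered by ascending average GDP.
    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if position - window < 0 or position + window >= len(values):
        return None  # not enough neighbors on one side

    above = values[position + 1 : position + window + 1]  # 15 richer neighbors
    below = values[position - window : position]          # 15 poorer neighbors

    top = above[-keep:][:-1]   # top 5 by rank, with the very top dropped
    bottom = below[:keep][1:]  # bottom 5 by rank, with the very bottom dropped

    higher_band = top[-1]      # 2nd highest neighbor by rank
    lower_band = bottom[0]     # 2nd lowest neighbor by rank
    synthete = mean(list(top) + list(bottom))  # average of the 8 remaining peers
    return lower_band, synthete, higher_band
```

For example, `synthete_and_bands(list(range(100)), 50)` returns `(36, 50, 64)`: the peers are ranks 36-39 and 61-64, which matches the offsets of 11 to 14 used later in `find_synthete_average`.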

the 2nd highest map

Question: <br>
1. Why does the author omit oil-producing countries?
2. Which primary balance should be used: pb or GGCBP_G01_PGDP_PT?

TODO: <br>
1. read https://www2.econ.iastate.edu/tesfatsi/Auyang.ComplexSystemsTheories.htm 

Ideas <br>
- Can we do this kind of analysis for every metric?
- What if, instead of dropping the top and bottom, we just use the 99th and 1st percentile for each span of 15?
- What if we form the neighbors in a different way: instead of 15 each way, we just use percentiles?
- In general, what is the correlation between primary balance and GDP?
- Could aim to filter out outliers better.
- What is the relationship between growth rates and primary balance?
- Use the k-means algorithm to find clusters in GDP and growth rate, then look at the primary balance within those countries.
- Divide the data into four quadrants and look for clusters in the quadrants.
- scikit-learn has useful functions for dealing with imputation.
- This might be a nearest-neighbor problem.
- Need to fix the colors.
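
The correlation questions in the list above could be explored with a small helper like the one below. It is a sketch under stated assumptions: `pearson_correlation` and `correlate_metrics` are hypothetical names, and the inputs are assumed to be `{country_code: average}` dicts of the kind built later in this notebook.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlate_metrics(metric_a, metric_b):
    """Correlate two {country_code: average} dicts over the countries they share."""
    shared = sorted(metric_a.keys() & metric_b.keys())
    return pearson_correlation(
        [metric_a[code] for code in shared],
        [metric_b[code] for code in shared],
    )
```

Matching on country code matters here because the IMF series do not all cover the same set of countries.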


Notes<br>
- The analytic notion underlying the best peer/synthete analysis applied to Jamaica above is that at each
    level of GDP per capita there is some optimal balance between borrowing and taxation to deliver the
    quantum of public goods necessary for development at that level of income.
- Thus, the analysis applies the best-peer framework illustrated for Jamaica to every country for which
    data are available for 1990-2019 from the Fall 2023 IMF WEO, and aggregates the results globally.
- If I want to use clustering, I need to clean up the data first: https://towardsdatascience.com/common-mistakes-in-cluster-analysis-and-how-to-avoid-them-eb960116d773

In [2]:
import requests
import numpy as np
import polars as pl
import polars.selectors as cs
from dataclasses import dataclass
from typing import Optional

In [3]:
all_countries = requests.get(
    "https://www.imf.org/external/datamapper/api/v1/countries"
).json()

In [4]:
indicators = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/indicators"
    ).json()["indicators"]

In [5]:
indicators

{'NGDP_RPCH': {'label': 'Real GDP growth',
  'description': "Gross domestic product is the most commonly used single measure of a country's overall economic activity. It represents the total value at constant prices of final goods and services produced within a country during a specified time period, such as one year.",
  'source': 'World Economic Outlook (April 2025)',
  'unit': 'Annual percent change',
  'dataset': 'WEO'},
 'NGDPD': {'label': 'GDP, current prices',
  'description': "Gross domestic product is the most commonly used single measure of a country's overall economic activity. It represents the total value at current prices of final goods and services produced within a country during a specified time period, such as one year.",
  'source': 'World Economic Outlook (April 2025)',
  'unit': 'Billions of U.S. dollars',
  'dataset': 'WEO'},
 'NGDPDPC': {'label': 'GDP per capita, current prices\n',
  'description': "Gross domestic product is the most commonly used single measure 

In [43]:
groups = requests.get("https://www.imf.org/external/datamapper/api/v1/groups").json()['groups']

In [1]:
METRICS_FOR_ANALYSIS = {
    "PPPGDP", # GDP, current prices (PPP, international dollars)
    "NGDPDPC", # GDP per capita, current prices
    "NGDPRPC_PCH", # Real Per Capita GDP Growth
    "NGDP_RPCH", # Real GDP growth
    "pb", # Government primary balance, percent of GDP
    "GGXONLB_G01_GDP_PT", # Primary net lending/borrowing (also referred as primary balance)
    "GGXWDG_GDP", # Government Debt (% of GDP)
    "GGXWDG_NGDP", # General government gross debt
    'd', # Gross public debt, percent of GDP
}

In [4]:
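# Might want to generalize this to any metric instead of hard-coding the fields.
@dataclass
class Country:
    name: str
    abbreviation: str
    average_gdp_per_capita: float
    # The remaining fields are filled in after construction, so they default to None.
    average_gdp_growth_rate: Optional[float] = None
    average_primary_balance: Optional[float] = None
    synthete_average_gdp_per_capita: Optional[float] = None
    synthete_average_gdp_growth_rate: Optional[float] = None
    synthete_average_primary_balance: Optional[float] = None
    higher_band: Optional['Country'] = None
    lower_band: Optional['Country'] = None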

In [5]:
def get_metric(metric: str) -> dict[str, dict[str,float]]:
    return requests.get(f"https://www.imf.org/external/datamapper/api/v1/{metric}").json()['values'][metric]

In [6]:
def get_mean_from_imf_dict(imf_dict: dict[str, dict[str, float]]) -> dict[str, float]:
    # Skip countries with no observations to avoid dividing by zero.
    return {k: sum(v.values()) / len(v) for k, v in imf_dict.items() if v}
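
The notes target 1990-2019, while the IMF series shown in this notebook run from 1980 into forecast years. A possible variant of the mean helper that restricts the window without mutating the input is sketched below. The name `mean_over_years` and its parameters are hypothetical, not part of the original notebook.

```python
def mean_over_years(imf_dict, first_year=1990, last_year=2019):
    """Average each country's series over [first_year, last_year], inclusive.

    Countries with no observations in the window are skipped rather than
    raising ZeroDivisionError. The input dict is not modified.
    """
    averages = {}
    for country, series in imf_dict.items():
        window = [
            value
            for year, value in series.items()
            if first_year <= int(year) <= last_year and value is not None
        ]
        if window:
            averages[country] = sum(window) / len(window)
    return averages
```

Used in place of the plain mean, this would apply the same year range to GDP per capita, growth and the fiscal balance, and would make the separate year-removal step unnecessary.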

In [7]:
# Likely could generalize this for any metric.
# The synthete is the mean over eight peers: the countries ranked 11 to 14
# places below and above `position` in the GDP-per-capita ordering.
def find_synthete_average(countries: list[Country], position: int) -> tuple[float, float, float]:
    offsets = (-14, -13, -12, -11, 14, 13, 12, 11)
    peers = [countries[position + offset] for offset in offsets]
    return (
        sum(peer.average_gdp_per_capita for peer in peers) / len(peers),
        sum(peer.average_gdp_growth_rate for peer in peers) / len(peers),
        sum(peer.average_primary_balance for peer in peers) / len(peers),
    )

In [18]:
gdp_per_capita = get_metric('PPPPC')

In [19]:
# Years outside 1990-2019; the IMF query could likely be narrowed to avoid this step.
remove_years = [str(year) for year in (*range(1980, 1990), *range(2020, 2029))]

In [20]:
# Drop the unwanted years in place; pop(year, None) ignores years a country lacks.
for data in gdp_per_capita.values():
    for year in remove_years:
        data.pop(year, None)

In [21]:
# sorted() returns a list of (country, mean) pairs, so wrap it in dict() to keep a mapping ordered by ascending GDP per capita
average_gdp_per_capita = dict(sorted(get_mean_from_imf_dict(gdp_per_capita).items(), key=lambda x: x[1]))

In [22]:
# since the gdps are sorted, we use this list to establish the rank in gdp 
countries = [
    Country(
        name=all_countries['countries'][abbreviation]['label'],
        abbreviation=abbreviation,
        average_gdp_per_capita=gdp,
        average_gdp_growth_rate=None,
        average_primary_balance=None,
        synthete_average_gdp_per_capita=None,
        synthete_average_gdp_growth_rate=None,
        synthete_average_primary_balance=None,
        higher_band=None,
        lower_band=None,
    )
    for abbreviation, gdp in average_gdp_per_capita.items()
    if abbreviation in all_countries['countries']
]

In [23]:
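gdp_growth_rate = get_metric('NGDP_RPCH')
# Note: GGXCNL_NGDP is general government net lending/borrowing (the overall balance),
# which includes interest payments, so it is not the primary balance.
primary_balance = get_metric('GGXCNL_NGDP')
# These averages cover every year the API returns, not just 1990-2019.
average_gdp_growth_rate = get_mean_from_imf_dict(gdp_growth_rate)
average_primary_balance = get_mean_from_imf_dict(primary_balance)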

In [31]:
primary_balance['JAM']

{'1990': 2.1,
 '1991': 3.2,
 '1992': 2.9,
 '1993': 2.4,
 '1994': 2.5,
 '1995': 1.6,
 '1996': -5.4,
 '1997': -6.6,
 '1998': -5.9,
 '1999': -3.5,
 '2000': -0.8,
 '2001': -4.9,
 '2002': -6.7,
 '2003': -5.6,
 '2004': -4.7,
 '2005': -3.3,
 '2006': -4.9,
 '2007': -3.8,
 '2008': -7.5,
 '2009': -11.1,
 '2010': -6.3,
 '2011': -6.4,
 '2012': -4.1,
 '2013': 0.1,
 '2014': -0.5,
 '2015': -0.3,
 '2016': -0.2,
 '2017': 0.5,
 '2018': 1.2,
 '2019': 0.9,
 '2020': -3.1,
 '2021': 0.9,
 '2022': 0.3,
 '2023': 0.3,
 '2024': 0.3,
 '2025': 0.6,
 '2026': 1,
 '2027': 1,
 '2028': 1.4}

In [24]:
for country in countries:
    country.average_gdp_growth_rate = average_gdp_growth_rate[country.abbreviation]
    country.average_primary_balance = average_primary_balance[country.abbreviation]

In [25]:
for i,country in enumerate(countries):
    if (i - 14) < 0 or (i + 14) >= len(countries):
        continue
    country.higher_band = countries[i+14] 
    country.lower_band = countries[i-14]
    country.synthete_average_gdp_per_capita, country.synthete_average_gdp_growth_rate, country.synthete_average_primary_balance = find_synthete_average(countries, i)

In [26]:
for i, country in enumerate(countries):
    if country.name=='Jamaica':
        jamaica = country
        print(i)
        break

91


In [28]:
jamaica.average_primary_balance 

-1.8564102564102563

- Issue: the average GDP, annual growth, and primary balance computed here for Jamaica differ from the values in the report.
- The primary balance is government revenue minus non-interest expenditure.
- Can't find the author's data source: https://www.imf.org/external/datamapper/datasets/WEO

In [None]:
jamaica.synthete_average_gdp_growth_rate - jamaica.average_gdp_growth_rate 

In [None]:
jamaica.synthete_average_gdp_per_capita - jamaica.average_gdp_per_capita 

In [None]:
jamaica.synthete_average_primary_balance - jamaica.average_primary_balance

# POLARS DRAFT

In [None]:
df = (
    pl.from_dicts(
        data=[{"Country": country, **gdp_per_capita[country]} for country in gdp_per_capita],
        # One column per year, 1980 through 2023.
        schema=["Country", *(str(year) for year in range(1980, 2024))],
    )
    .melt(
        id_vars="Country",
        value_vars=cs.numeric(),
        variable_name='Year',
        value_name='GDP per cap',
    )
    .group_by("Country", maintain_order=True)
    .agg(pl.col("GDP per cap").mean())
    .sort("GDP per cap")
)

In [None]:
df

# Clustering Draft

In [None]:
import requests
import numpy as np
import polars as pl
import polars.selectors as cs
import sklearn.cluster as cluster
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.palettes import Category10
output_notebook()

In [None]:
indicators = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/indicators"
    ).json()["indicators"]

In [None]:
indicators

In [None]:
# Skip countries with no observations to avoid dividing by zero.
average_gdp_per_capita = {country: sum(gdp_values.values()) / len(gdp_values) for country, gdp_values in gdp_per_capita.items() if gdp_values}


In [None]:
growth_rate = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/NGDP_RPCH"
    ).json()['values']['NGDP_RPCH']

In [None]:
# Skip countries with no observations to avoid dividing by zero.
average_growth_rate = {country: sum(growth_rates.values()) / len(growth_rates) for country, growth_rates in growth_rate.items() if growth_rates}

In [None]:
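# Pair the two metrics by country code. Zipping the raw value lists would
# misalign countries, because the two dicts need not share keys or order.
common_countries = sorted(average_gdp_per_capita.keys() & average_growth_rate.keys())
features = np.array([
    (average_gdp_per_capita[country], average_growth_rate[country])
    for country in common_countries
])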

In [None]:
X = np.array(features)

In [None]:
kmeans = cluster.KMeans(n_clusters=15, algorithm='elkan')  # Adjust the number of clusters as needed
kmeans.fit(X)
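
One data-cleaning step worth considering before clustering: GDP per capita is in the thousands while growth rates are single digits, so Euclidean distance in k-means is dominated by GDP unless the columns are put on a common scale. A minimal standard-library sketch of z-scoring follows. The helper name `standardize_columns` is hypothetical.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each column of a list of equal-length numeric rows."""
    columns = list(zip(*rows))
    centers = [mean(column) for column in columns]
    spreads = [pstdev(column) for column in columns]
    return [
        [
            # A constant column has zero spread; map it to 0.0 instead of dividing.
            (value - center) / spread if spread else 0.0
            for value, center, spread in zip(row, centers, spreads)
        ]
        for row in rows
    ]
```

The result of `standardize_columns(X.tolist())` could be passed to `kmeans.fit` in place of `X`. scikit-learn's `StandardScaler` performs the same transformation and is probably the better choice inside a pipeline.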

In [None]:
labels = kmeans.labels_

In [None]:
# Category10 has at most 10 colors, so cycle the palette to cover all 15 clusters.
palette = Category10[10]
colors = [palette[label % len(palette)] for label in labels]
p = figure(title='K-Means Clustering', x_axis_label='GDP per capita', y_axis_label='Growth Rate', width=400, height=400)
# legend_field needs a column in a ColumnDataSource; plain arrays have no 'Cluster' column, so it is dropped here.
p.scatter(X[:, 0], X[:, 1], color=colors, size=10)
show(p)

In [None]:
primary_balance = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/GGXONLB_G01_GDP_PT"
    ).json()['values']['GGXONLB_G01_GDP_PT']

In [None]:
all_countries = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/countries"
    ).json()

In [None]:
all_countries

In [None]:
mapping_countries = {k : v["label"] for k, v in all_countries["countries"].items()}

In [None]:
mapping_countries

In [None]:
formatted_gdp_per_capita = {mapping_countries.get(k, None): v for k, v in gdp_per_capita.items() if mapping_countries.get(k, None)}


In [None]:
formatted_growth_rate = {mapping_countries.get(k, None): v for k, v in growth_rate.items() if mapping_countries.get(k, None)}


In [None]:
formatted_primary_balance = {mapping_countries.get(k, None): v for k, v in primary_balance.items() if mapping_countries.get(k, None)}


In [None]:
gdp_per_capita_df = pl.from_dicts(
    data=[{"Country": country, **formatted_gdp_per_capita[country]} for country in formatted_gdp_per_capita],
    # 1990-2019 only; 1980-1989 and 2020-2023 are left out.
    schema=["Country", *(str(year) for year in range(1990, 2020))],
)

In [None]:
growth_rate_df = pl.from_dicts(
    data=[{"Country": country, **formatted_growth_rate[country]} for country in formatted_growth_rate],
    # One column per year, 1980 through 2023.
    schema=["Country", *(str(year) for year in range(1980, 2024))],
)

In [None]:
primary_balance_df = pl.from_dicts(
    data=[{"Country": country, **formatted_primary_balance[country]} for country in formatted_primary_balance],
    # One column per year, 1980 through 2023.
    schema=["Country", *(str(year) for year in range(1980, 2024))],
)

In [None]:
gdp_per_capita_df

In [None]:
growth_rate_df

In [None]:
primary_balance_df

In [None]:
gdp_per_capita_df.melt(
            id_vars="Country",
            value_vars=cs.numeric()
).group_by('Country', maintain_order=True).agg(pl.col("value")
.mean()).sort("value").filter(pl.col("Country") == "Jamaica")

In [None]:
growth_rate_df.melt(
            id_vars="Country",
            value_vars=cs.numeric()
).group_by('Country', maintain_order=True).agg(pl.col("value")
.mean()).sort("value").filter(pl.col("Country") == "Jamaica")

In [None]:
primary_balance_df.melt(
            id_vars="Country",
            value_vars=cs.numeric()
).group_by('Country', maintain_order=True).agg(pl.col("value")
.mean()).sort("value").filter(pl.col("Country") == "Jamaica")

In [None]:
pl.Config(tbl_rows=-1)

In [None]:
cluster.KMeans([4, 

In [32]:
pip install pydrive

Collecting pydrive
  Using cached PyDrive-1.3.1-py3-none-any.whl
Collecting google-api-python-client>=1.2 (from pydrive)
  Downloading google_api_python_client-2.116.0-py2.py3-none-any.whl.metadata (6.6 kB)
Collecting oauth2client>=4.0.0 (from pydrive)
  Downloading oauth2client-4.1.3-py2.py3-none-any.whl (98 kB)
Collecting httplib2<1.dev0,>=0.15.0 (from google-api-python-client>=1.2->pydrive)
  Downloading httplib2-0.22.0-py3-none-any.whl (96 kB)
Collecting google-auth<3.0.0.dev0,>=1.19.0 (from google-api-python-client>=1.2->pydrive)
  Downloading google_auth-2.27.0-py2.py3-none-any.whl.metadata (4.7 kB)
Collecting google-auth-httplib2>=0.1.0 (from google-api-python-client>=1.2->pydrive)
  Downloading google_auth_httplib2-0.2.0-py2.py3-none-an