https://retepelyod2.files.wordpress.com/2024/01/right-primary.pdf
Peter Doyle Algo <br> 

1. find average gdp from 1990 to 2017, average annual growth of gdp per capita, average primary balance 1990 - 2019 in 2017 international dollars 
2. find 15 gdp neighbors each side of each country 
3. take the top 5, drop the top, 
4. take the bottom 5, drop the bottom
5. the 2nd highest is the higher band
6. the 2nd lowest is the lower band
7. the average of all 8 is the synthete per metric

the 2nd highest map

Question: <br>
1. why does the author omit oil producing countries? 
2. which primary balance to use? pb , GGCBP_G01_PGDP_PT

TODO: <br>
1. read https://www2.econ.iastate.edu/tesfatsi/Auyang.ComplexSystemsTheories.htm 

Ideas <br>
- can we do this kind of analysis for every metric?
- what if instead of droping the top, and bottom we just use the 99th and 1st percentile for each span of 15 
-  what if we form the neighbors in a different way, instead of 15 each way we just use percentiles 
- in general what is the correlation btw primary balance and gdp ? 
- could aim to filter out outliers better 
- relationship between growth rates and primary balance?
- take k means algo find clusters btw gdp and growth rate then look at the primary balance within those countries
- divide the data into four quadrants and look for clusters in the quandrants 
- scikit lear has cool functions to deal with imputation 
- this might be a nearest neighbor problem
- need to fix the colors


Notes<br>
- The analytic notion underlying the best peer/synthete analysis applied to Jamaica above is that at each
    level of GDP per capita there is some optimal balance between borrowing and taxation to deliver the
    quantum of public goods necessary for development at that level of income.
- Thus, the analysis applies the best-peer framework illustrated for Jamaica to every country for which
    data are available for 1990-2019 from the Fall 2023 IMF WEO, and aggregates the results globally.
- If I want to use clustering https://towardsdatascience.com/common-mistakes-in-cluster-analysis-and-how-to-avoid-them-eb960116d773  need to clean up the data 

In [1]:
import requests
import numpy as np
import polars as pl
import polars.selectors as cs
from dataclasses import dataclass
from typing import Optional

In [2]:
ALL_COUNTRIES = requests.get(
    "https://www.imf.org/external/datamapper/api/v1/countries"
).json()

In [3]:
INDICATORS = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/indicators"
    ).json()["indicators"]

In [8]:
# might want to generalize this any metrics instead of being hard coded
@dataclass
class Country:
    name: str
    abbreviation: str
    average_gdp_per_capita: float
    average_gdp_growth_rate: float
    average_primary_balance: float
    synthete_average_gdp_per_capita: float
    synthete_average_gdp_growth_rate: float
    synthete_average_primary_balance: float
    higher_band: 'Country'
    lower_band: 'Country'

In [4]:
def get_metric(metric: str) -> dict[str, dict[str,float]]:
    return requests.get(f"https://www.imf.org/external/datamapper/api/v1/{metric}").json()['values'][metric]

In [5]:
def get_mean_from_imf_dict(imf_dict: dict[str, dict[str,float]]) -> dict[str, float]:
    return {k: sum(v.values()) / len(v) for k, v in imf_dict.items()}

In [9]:
# likely could generalize this for any metric.
def find_synthete_average(countries: list[Country], position: int) -> tuple[float]:
    return (
        sum([
            countries[position-14].average_gdp_per_capita, 
            countries[position-13].average_gdp_per_capita, 
            countries[position-12].average_gdp_per_capita, 
            countries[position-11].average_gdp_per_capita,
            countries[position+14].average_gdp_per_capita, 
            countries[position+13].average_gdp_per_capita, 
            countries[position+12].average_gdp_per_capita, 
            countries[position+11].average_gdp_per_capita
        ]) / 8), (sum([
            countries[position-14].average_gdp_growth_rate, 
            countries[position-13].average_gdp_growth_rate, 
            countries[position-12].average_gdp_growth_rate, 
            countries[position-11].average_gdp_growth_rate,
            countries[position+14].average_gdp_growth_rate, 
            countries[position+13].average_gdp_growth_rate, 
            countries[position+12].average_gdp_growth_rate, 
            countries[position+11].average_gdp_growth_rate
        ]) / 8),(sum([
            countries[position-14].average_primary_balance, 
            countries[position-13].average_primary_balance, 
            countries[position-12].average_primary_balance, 
            countries[position-11].average_primary_balance,
            countries[position+14].average_primary_balance, 
            countries[position+13].average_primary_balance, 
            countries[position+12].average_primary_balance, 
            countries[position+11].average_primary_balance
        ]) / 8)

In [10]:
gdp_per_capita = get_metric('PPPPC')

In [11]:
average_gdp_per_capita = dict(sorted(get_mean_from_imf_dict(gdp_per_capita).items(),key =lambda x:x[1])) # sorted always returns a list

In [12]:
# since the gdps are sorted, we use this list to establish the rank in gdp 
countries = [
    Country(
        name= ALL_COUNTRIES['countries'][abbreviation]['label'],
        abbreviation=abbreviation,
        average_gdp_per_capita=average_gdp_per_capita,
        average_gdp_growth_rate=None,
        average_primary_balance=None,
        synthete_average_gdp_per_capita=None,
        synthete_average_gdp_growth_rate=None,
        synthete_average_primary_balance=None,
        higher_band=None,
        lower_band=None
    ) for abbreviation, average_gdp_per_capita in average_gdp_per_capita.items() if abbreviation in ALL_COUNTRIES['countries']]

In [13]:
gdp_growth_rate = get_metric('NGDP_RPCH')
primary_balance = get_metric('GGXCNL_NGDP')
average_gdp_growth_rate = get_mean_from_imf_dict(gdp_growth_rate)
average_primary_balance = get_mean_from_imf_dict(primary_balance)

In [15]:
for country in countries:
    country.average_gdp_growth_rate = average_gdp_growth_rate[country.abbreviation]
    country.average_primary_balance = average_primary_balance[country.abbreviation]

In [16]:
for i,country in enumerate(countries):
    if (i - 14) < 0 or (i + 14) >= len(countries):
        continue
    country.higher_band = countries[i+14]
    country.lower_band = countries[i-14]
    country.synthete_average_gdp_per_capita, country.synthete_average_gdp_growth_rate,  country.synthete_average_primary_balance = find_synthete_average(countries, i)

In [17]:
countries

[Country(name='Burundi', abbreviation='BDI', average_gdp_per_capita=632.3505306122448, average_gdp_growth_rate=2.4387755102040813, average_primary_balance=-5.087179487179488, synthete_average_gdp_per_capita=None, synthete_average_gdp_growth_rate=None, synthete_average_primary_balance=None, higher_band=None, lower_band=None),
 Country(name='Mozambique', abbreviation='MOZ', average_gdp_per_capita=806.6956122448981, average_gdp_growth_rate=5.210204081632654, average_primary_balance=-3.951020408163265, synthete_average_gdp_per_capita=None, synthete_average_gdp_growth_rate=None, synthete_average_primary_balance=None, higher_band=None, lower_band=None),
 Country(name='Central African Republic', abbreviation='CAF', average_gdp_per_capita=824.6955714285715, average_gdp_growth_rate=1.5408163265306116, average_primary_balance=-2.2170731707317075, synthete_average_gdp_per_capita=None, synthete_average_gdp_growth_rate=None, synthete_average_primary_balance=None, higher_band=None, lower_band=None),

# POLARS DRAFT

In [None]:
df = pl.from_dicts(
            data=[{"Country": country, **gdp_per_capita[country]} for country in gdp_per_capita],
            schema=[
                "Country",
                "1980",
                "1981",
                "1982",
                "1983",
                "1984",
                "1985",
                "1986",
                "1987",
                "1988",
                "1989",
                "1990",
                "1991",
                "1992",
                "1993",
                "1994",
                "1995",
                "1996",
                "1997",
                "1998",
                "1999",
                "2000",
                "2001",
                "2002",
                "2003",
                "2004",
                "2005",
                "2006",
                "2007",
                "2008",
                "2009",
                "2010",
                "2011",
                "2012",
                "2013",
                "2014",
                "2015",
                "2016",
                "2017",
                "2018",
                "2019",
                "2020",
                "2021",
                "2022",
                "2023",
            ],
        ).melt(
            id_vars="Country",
            value_vars=cs.numeric(),
            variable_name='Year',
            value_name='GDP per cap',
        ).group_by("Country", maintain_order=True).agg(pl.col("GDP per cap").mean()).sort("GDP per cap")

In [None]:
df

# Clustering Draft

In [None]:
import requests
import numpy as np
import polars as pl
import polars.selectors as cs
import sklearn.cluster as cluster
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.palettes import Category10
output_notebook()

In [None]:
indicators = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/indicators"
    ).json()["indicators"]

In [None]:
indicators

In [None]:
average_gdp_per_capita = {country: sum(gdp_values.values()) / len(gdp_values) for country, gdp_values in gdp_per_capita.items()}


In [None]:
growth_rate = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/NGDP_RPCH"
    ).json()['values']['NGDP_RPCH']

In [None]:
average_growth_rate = {country: sum(growth_rates.values()) / len(growth_rates) for country, growth_rates in growth_rate.items()}

In [None]:
average_gdp_per_capitas = list(average_gdp_per_capita.values())
average_growth_rates = list(average_growth_rate.values())
features = np.array(list(zip(average_gdp_per_capitas, average_growth_rates)))

In [None]:
X = np.array(features)

In [None]:
kmeans = cluster.KMeans(n_clusters=15, algorithm='elkan')  # Adjust the number of clusters as needed
kmeans.fit(X)

In [None]:
labels = kmeans.labels_

In [None]:
p = figure(output_backend="webgl", title='test', width=400, height=400)
colors = np.array([x for x in ('#00f', '#0f0', '#f00', '#0ff', '#f0f', '#ff0')])
colors = np.hstack([colors] * 20)
p = figure(title='K-Means Clustering', x_axis_label='GDP', y_axis_label='Growth Rate')
p.scatter(X[:, 0], X[:, 1], color=colors[labels].tolist(), legend_field='Cluster', size=10)
show(p)

In [None]:
primary_balance = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/GGXONLB_G01_GDP_PT"
    ).json()['values']['GGXONLB_G01_GDP_PT']

In [None]:
all_countries = requests.get(
        "https://www.imf.org/external/datamapper/api/v1/countries"
    ).json()

In [None]:
all_countries

In [None]:
mapping_countries = {k : v["label"] for k, v in all_countries["countries"].items()}

In [None]:
mapping_countries

In [None]:
formatted_gdp_per_capita = {mapping_countries.get(k, None): v for k, v in gdp_per_capita.items() if mapping_countries.get(k, None)}


In [None]:
formatted_growth_rate = {mapping_countries.get(k, None): v for k, v in growth_rate.items() if mapping_countries.get(k, None)}


In [None]:
formatted_primary_balance = {mapping_countries.get(k, None): v for k, v in primary_balance.items() if mapping_countries.get(k, None)}


In [None]:
gdp_per_capita_df = pl.from_dicts(
    data=[{"Country": country, **formatted_gdp_per_capita[country]} for country in formatted_gdp_per_capita],
    schema=[
        "Country",
        # "1980",
        # "1981",
        # "1982",
        # "1983",
        # "1984",
        # "1985",
        # "1986",
        # "1987",
        # "1988",
        # "1989",
        "1990",
        "1991",
        "1992",
        "1993",
        "1994",
        "1995",
        "1996",
        "1997",
        "1998",
        "1999",
        "2000",
        "2001",
        "2002",
        "2003",
        "2004",
        "2005",
        "2006",
        "2007",
        "2008",
        "2009",
        "2010",
        "2011",
        "2012",
        "2013",
        "2014",
        "2015",
        "2016",
        "2017",
        "2018",
        "2019",
        # "2020",
        # "2021",
        # "2022",
        # "2023",
    ],
)

In [None]:
growth_rate_df = pl.from_dicts(
    data=[{"Country": country, **formatted_growth_rate[country]} for country in formatted_growth_rate],
    schema=[
        "Country",
        "1980",
        "1981",
        "1982",
        "1983",
        "1984",
        "1985",
        "1986",
        "1987",
        "1988",
        "1989",
        "1990",
        "1991",
        "1992",
        "1993",
        "1994",
        "1995",
        "1996",
        "1997",
        "1998",
        "1999",
        "2000",
        "2001",
        "2002",
        "2003",
        "2004",
        "2005",
        "2006",
        "2007",
        "2008",
        "2009",
        "2010",
        "2011",
        "2012",
        "2013",
        "2014",
        "2015",
        "2016",
        "2017",
        "2018",
        "2019",
        "2020",
        "2021",
        "2022",
        "2023",
    ],
)

In [None]:
primary_balance_df = pl.from_dicts(
    data=[{"Country": country, **formatted_primary_balance[country]} for country in formatted_primary_balance],
    schema=[
        "Country",
        "1980",
        "1981",
        "1982",
        "1983",
        "1984",
        "1985",
        "1986",
        "1987",
        "1988",
        "1989",
        "1990",
        "1991",
        "1992",
        "1993",
        "1994",
        "1995",
        "1996",
        "1997",
        "1998",
        "1999",
        "2000",
        "2001",
        "2002",
        "2003",
        "2004",
        "2005",
        "2006",
        "2007",
        "2008",
        "2009",
        "2010",
        "2011",
        "2012",
        "2013",
        "2014",
        "2015",
        "2016",
        "2017",
        "2018",
        "2019",
        "2020",
        "2021",
        "2022",
        "2023",
    ],
)

In [None]:
gdp_per_capita_df

In [None]:
growth_rate_df

In [None]:
primary_balance_df

In [None]:
gdp_per_capita_df.melt(
            id_vars="Country",
            value_vars=cs.numeric()
).group_by('Country', maintain_order=True).agg(pl.col("value")
.mean()).sort("value").filter(pl.col("Country") == "Jamaica")

In [None]:
growth_rate_df.melt(
            id_vars="Country",
            value_vars=cs.numeric()
).group_by('Country', maintain_order=True).agg(pl.col("value")
.mean()).sort("value").filter(pl.col("Country") == "Jamaica")

In [None]:
primary_balance_df.melt(
            id_vars="Country",
            value_vars=cs.numeric()
).group_by('Country', maintain_order=True).agg(pl.col("value")
.mean()).sort("value").filter(pl.col("Country") == "Jamaica")

In [None]:
pl.Config(tbl_rows=-1)

In [None]:
cluster.KMeans([4, 