---
title: "imfp demo"
---





# imfp demo
This data analysis project explores the relationship between economic growth and gender equality using `imfp`, which allows us to download data from the International Monetary Fund (IMF).

In this project, we explored the following:

1. **Data Fetching**
* Make API calls to fetch four datasets: the Gender Inequality Index (GII), Nominal GDP, the GDP Deflator Index, and the Population series

2. **Feature Engineering**
* Cleaning: Convert the GDP Deflator Index to a yearly basis and convert variables to numeric types
* Dependent Variable: Percent Change of Gender Inequality Index
* Independent Variable: Percent Change of Real GDP per Capita 
* Transform variables to display magnitude of change 
* Merge the datasets

3. **Data Visualization**
* Scatterplot
* Time Series Line Plots
* Barplot
* Boxplot
* Heatmap

4. **Statistical Analysis**
* Descriptive Statistics
* Regression Analysis
* Time Series Analysis

Ready for some insights into whether a stronger economy means more equal opportunities for all genders? Let’s dive in!
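
The feature-engineering steps listed above can be sketched on a toy example. This is a minimal illustration, not the notebook's actual pipeline: the column names and numbers are made up for a single hypothetical country, and the deflator is assumed to be an index with base 100.

```python
import pandas as pd

# Made-up numbers for one hypothetical country; not IMF data.
df = pd.DataFrame({
    "year": [2000, 2001, 2002],
    "nominal_gdp": [1000.0, 1100.0, 1210.0],
    "deflator": [100.0, 105.0, 110.0],   # assumed index, base = 100
    "population": [10.0, 10.0, 11.0],
    "gii": [0.50, 0.48, 0.45],
})

# Real GDP: nominal GDP divided by the deflator index, rescaled by 100.
df["real_gdp"] = df["nominal_gdp"] / df["deflator"] * 100

# Real GDP per capita.
df["real_gdp_per_capita"] = df["real_gdp"] / df["population"]

# Year-over-year percent change; the first row has no prior year, so it is NaN.
df["real_gdp_pc_pct"] = df["real_gdp_per_capita"].pct_change() * 100
df["gii_pct"] = df["gii"].pct_change() * 100

print(df[["year", "real_gdp_pc_pct", "gii_pct"]])
```

With several countries in one frame, the percent changes would need to be computed per country, for example with `groupby` on a country column, so that one country's last year is not compared with the next country's first year.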

## Suggested packages
`imfp` can be integrated with other Python tools to streamline the computational process.

* `pandas`: view and manipulate data frames
* `matplotlib.pyplot`: make plots
* `seaborn`: make plots
* `numpy`: numerical computation
* `LinearRegression` (from `sklearn.linear_model`): implement linear regression
* `tabulate`: format data into tables

`utils` is a custom module that contains helper functions, reusable code, and general-purpose tools that simplify certain tasks. `load_or_fetch_databases`, `load_or_fetch_parameters`, and `load_or_fetch_dataset` load or retrieve the databases, parameters, and datasets from a local or remote source. `view_dataframe_in_browser` displays a data frame in a web browser.
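
The source of `utils` is not shown here, so the following is only a sketch of the general "load from a local cache, otherwise fetch and save" pattern that helpers like these typically follow. The function name `load_or_fetch`, the pickle format, and the cache path are all hypothetical choices for illustration.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_fetch(cache_path: str, fetch: Callable[[], Any]) -> Any:
    """Return the cached object if the file exists; otherwise fetch and cache it.

    Hypothetical helper: `fetch` is any zero-argument callable that
    retrieves the data from a remote source.
    """
    path = Path(cache_path)
    if path.exists():
        # Local copy found: skip the remote call entirely.
        with path.open("rb") as f:
            return pickle.load(f)

    # No local copy: fetch once, then store the result for next time.
    result = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

In this notebook, the `fetch` callable would presumably wrap a call into `imfp`; that wiring is an assumption. Caching this way avoids repeating API requests each time the notebook is re-run.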

`statsmodels.api`, `adfuller`, `ARIMA`, `VAR`, `plot_acf`, `plot_pacf`, `mean_absolute_error`, `mean_squared_error`, and `grangercausalitytests` are used specifically for the time series analysis.


In [None]:
import pandas as pd
from utils import load_or_fetch_databases, view_dataframe_in_browser
from utils import load_or_fetch_parameters
from utils import load_or_fetch_dataset
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.linear_model import LinearRegression
from tabulate import tabulate
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.vector_ar.var_model import VAR
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.stattools import grangercausalitytests

## Data Fetching


In [None]:
# Load or fetch databases
databases = load_or_fetch_databases()

# Filter out databases that contain a year in the description
databases[
  ~databases['description'].str.contains(r"[\d]{4}", regex=True)
]

# view_dataframe_in_browser(databases)
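
To see what the filter above does, here is a small sketch on a made-up data frame. The rows are invented for illustration and are not the real IMF database list; `na=False` is an added safeguard that treats missing descriptions as non-matches.

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "database_id": ["AAA", "BBB2019", "CCC"],
    "description": [
        "Alpha Statistics",
        "Beta Outlook, October 2019",
        "Gamma Indicators",
    ],
})

# True where the description contains four consecutive digits (a year-like token).
has_year = toy["description"].str.contains(r"\d{4}", regex=True, na=False)

# `~` inverts the boolean mask, keeping only rows without a year.
no_year = toy[~has_year]
print(no_year)
```

The pattern matches any run of four digits, so it would also drop a description containing a four-digit number that is not a year.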

In [None]:
datasets = ["GENDER_EQUALITY", "IFS"]
params = {}

# Fetch valid parameters for two datasets
for dataset in datasets:
    params[dataset] = load_or_fetch_parameters(dataset)

    valid_keys = list(params[dataset].keys())
    print(f"Parameters for {dataset}: ", valid_keys)

# view_dataframe_in_browser(params.get("IFS").get("indicator"))

In [None]:
datasets = {}
dsets = [("GENDER_EQUALITY", "GE_GII"), ("IFS", "NGDP_D_SA_IX"), ("IFS", "NGDP_XDC"), ("IFS", "LP_PE_NUM")]

for dset in dsets:
    datasets[dset[0] + "." + dset[1]] = load_or_fetch_dataset(dset[0], dset[1])

In [None]:
# "Gender Inequality Index"
GII = "GENDER_EQUALITY.GE_GII"

# "Gross Domestic Product, Deflator, Seasonally Adjusted, Index"
GDP_deflator = "IFS.NGDP_D_SA_IX"

# "Gross Domestic Product, Nominal, Domestic Currency"
GDP_nominal = "IFS.NGDP_XDC"

# "Population, Persons, Number of"
GDP_population = "IFS.LP_PE_NUM"

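# Copy the datasets into new variables so later changes don't affect the originals
# (plain assignment would only create another name for the same data frame)
GII_data = datasets[GII].copy()
GDP_deflator_data = datasets[GDP_deflator].copy()
GDP_nominal_data = datasets[GDP_nominal].copy()
GDP_population_data = datasets[GDP_population].copy()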
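
A note on keeping the originals unchanged: in Python, assigning a data frame to a new name does not duplicate it, so in-place edits made through either name affect the same object. The sketch below, using a made-up data frame, shows the difference between a plain assignment and an explicit `.copy()`.

```python
import pandas as pd

# Made-up frame for illustration.
original = pd.DataFrame({"value": [1, 2, 3]})

alias = original               # second name for the same object
independent = original.copy()  # separate data frame with its own data

# An in-place edit through the alias is visible in the original...
alias.loc[0, "value"] = 99

# ...while an edit to the copy is not.
independent.loc[1, "value"] = -1

print(original)
```

Operations that return a new data frame, such as `merge` or boolean filtering, leave the source untouched either way; the distinction matters mainly for in-place assignments to columns or cells.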