In [128]:
import requests
import pandas as pd
import sqlite3
import os


📊 Macroeconomic Indicators (U.S.)
To enrich the credit scoring model and capture the economic context at the time loans were issued, official macroeconomic indicators from the United States were incorporated.

The following variables were used:

- Inflation (CPI): Consumer Price Index, monthly frequency. It reflects changes in the general price level, which lets the model account for the erosion of borrowers’ purchasing power when assessing credit risk.

- Gross Domestic Product (GDP): Quarterly frequency (resampled to monthly using forward fill). It indicates the overall economic health of the country. Sustained economic growth is typically associated with lower credit risk.

- Unemployment Rate: Monthly frequency. Measures the share of the labor force that is unemployed. Higher unemployment is typically associated with a higher probability of loan default.

- Federal Funds Rate (FEDFUNDS): Monthly frequency. Represents the cost of money in the economy and influences loan interest rates. It helps model how changes in monetary policy affect borrowers’ repayment capacity.

- Personal Consumption Expenditures (PCE): Monthly frequency. Measures household spending on goods and services. The FRED `PCE` series used here is in nominal dollars, not adjusted for inflation. It is an indicator of domestic demand and consumer solvency.

- Total Consumer Credit (TOTALSL): Monthly frequency in FRED (the code still passes it through the same month-start resampling as GDP, which leaves a monthly series unchanged). Reflects household indebtedness. High levels of total consumer credit may indicate increased exposure to default risk.
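
The forward-fill resampling mentioned in the list above can be illustrated with a small sketch. The values below are made up for illustration and are not FRED data; only the `resample("MS").ffill()` pattern matches what the notebook does later.

```python
import pandas as pd

# Toy quarterly series (illustrative values, not real GDP figures).
quarterly = pd.DataFrame(
    {
        "date": pd.to_datetime(["2010-01-01", "2010-04-01", "2010-07-01"]),
        "gdp": [100.0, 110.0, 120.0],
    }
)

# Upsample to month-start frequency; each new month inherits the latest quarterly value.
monthly = quarterly.set_index("date").resample("MS").ffill().reset_index()

print(monthly)
```

Note that the resampled index stops at the last quarterly observation (July in this toy example), so the remaining months of the final quarter are not created by this step alone.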

Data Source

The data were obtained through the FRED (Federal Reserve Economic Data) API, managed by the Federal Reserve Bank of St. Louis, which provides official and up-to-date economic time series from the United States government.

Data Collection Process

- Access to the FRED API was established using a personal API key.

- Historical time series corresponding to the analysis period of the Lending Club dataset were downloaded.

- The data were processed and transformed into tabular format, adapting all series to a monthly frequency when necessary (using forward fill for quarterly variables).

- Finally, the macroeconomic indicators were integrated into the project to be later joined with loan-level data based on the loan issue date.
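
The final step, joining the indicators to loans by issue date, might look like the sketch below. The loan column name `issue_d`, its `Mon-YYYY` string format, and the `loan_id` column are assumptions about the Lending Club data, and both tables here are toy examples.

```python
import pandas as pd

# Hypothetical loan-level rows; "issue_d" and its "Mon-YYYY" format are assumptions.
loans = pd.DataFrame(
    {"loan_id": [1, 2, 3], "issue_d": ["Jan-2010", "Mar-2010", "Mar-2010"]}
)

# Toy macro table keyed by month start (illustrative subset of columns).
macro = pd.DataFrame(
    {
        "date": pd.to_datetime(["2010-01-01", "2010-02-01", "2010-03-01"]),
        "unemployment_rate": [9.8, 9.8, 9.9],
    }
)

# A month-only string parses to the first day of that month,
# which lines up with the month-start dates of the macro table.
loans["issue_month"] = pd.to_datetime(loans["issue_d"], format="%b-%Y")

# Left join keeps every loan; validate guards against duplicate months in macro.
loans_macro = loans.merge(
    macro,
    left_on="issue_month",
    right_on="date",
    how="left",
    validate="many_to_one",
).drop(columns="date")

print(loans_macro)
```

A left join is used so that a loan with no matching macro month is kept with missing values rather than silently dropped.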

Justification for Inclusion

Incorporating these variables allows the model to account for the overall economic environment, capturing external factors that influence borrowers’ repayment capacity and financial behavior. The aim is to improve both the explanatory power and the predictive performance of the credit risk analysis.

In [None]:
api_key = os.getenv("FRED_API_KEY")  # an API key is required to retrieve data from FRED

The date range follows the main DataFrame used in the EDA: the first year is 2010 and the last year is 2018.
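
One way to pin the download to that window is to pass an end date alongside the start date. The sketch below only builds the query parameters and makes no network call. The helper name `build_fred_params` and the default end date `2018-12-31` are assumptions for illustration; `observation_start` and `observation_end` are parameters of the FRED `series/observations` endpoint.

```python
import os


def build_fred_params(series_id, start_date="2010-01-01", end_date="2018-12-31"):
    """Hypothetical helper: query parameters for one FRED series over a fixed window."""
    api_key = os.getenv("FRED_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of sending a request without a key.
        raise RuntimeError("FRED_API_KEY is not set")
    return {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start_date,
        "observation_end": end_date,
    }
```

The resulting dictionary could be passed as `params` to `requests.get`, in the same way the download function in the next cell does.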


In [130]:
API_KEY = os.getenv("FRED_API_KEY")
BASE_URL = "https://api.stlouisfed.org/fred/series/observations"

# Download an economic series from the FRED API (Federal Reserve Bank of St. Louis)
# and return it as a pandas DataFrame with 'date' and 'value' columns.

def get_fred_series(series_id, start_date="2010-01-01"):
    params = {
        "series_id": series_id,
        "api_key": API_KEY,
        "file_type": "json",
        "observation_start": start_date
    }
    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors instead of a KeyError below
    data = response.json()["observations"]
    
    df = pd.DataFrame(data)
    df["date"] = pd.to_datetime(df["date"])
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    
    return df[["date", "value"]]

# Download the series

inflation = get_fred_series("CPIAUCSL")
inflation.rename(columns={"value": "inflation_cpi"}, inplace=True)

gdp = get_fred_series("GDP")
gdp.rename(columns={"value": "gdp"}, inplace=True)

unemployment = get_fred_series("UNRATE")
unemployment.rename(columns={"value": "unemployment_rate"}, inplace=True)

fedfunds = get_fred_series("FEDFUNDS")
fedfunds.rename(columns={"value": "fed_funds_rate"}, inplace=True)

pce = get_fred_series("PCE")
pce.rename(columns={"value": "pce"}, inplace=True)

total_credit = get_fred_series("TOTALSL")
total_credit.rename(columns={"value": "total_consumer_credit"}, inplace=True)

# Resample GDP and total consumer credit to a month-start index
gdp_monthly = gdp.set_index("date").resample("MS").ffill().reset_index()
total_credit_monthly = total_credit.set_index("date").resample("MS").ffill().reset_index()

# Merge using the monthly versions
macro_df = inflation.merge(gdp_monthly, on="date", how="left")
macro_df = macro_df.merge(unemployment, on="date", how="left")
macro_df = macro_df.merge(fedfunds, on="date", how="left")
macro_df = macro_df.merge(pce, on="date", how="left")
macro_df = macro_df.merge(total_credit_monthly, on="date", how="left")

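# Finally, align to a month-start index and forward-fill any remaining NaN.
# resample().ffill() only fills rows created by upsampling, so use asfreq() + DataFrame.ffill().
macro_df = macro_df.set_index("date").asfreq("MS").ffill().reset_index()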

print(macro_df.head(12))


         date  inflation_cpi        gdp  unemployment_rate  fed_funds_rate  \
0  2010-01-01        217.488  14764.610                9.8            0.11   
1  2010-02-01        217.281  14764.610                9.8            0.13   
2  2010-03-01        217.353  14764.610                9.9            0.16   
3  2010-04-01        217.403  14980.193                9.9            0.20   
4  2010-05-01        217.290  14980.193                9.6            0.20   
5  2010-06-01        217.199  14980.193                9.4            0.18   
6  2010-07-01        217.605  15141.607                9.4            0.18   
7  2010-08-01        217.923  15141.607                9.5            0.19   
8  2010-09-01        218.275  15141.607                9.5            0.19   
9  2010-10-01        219.035  15309.474                9.4            0.19   
10 2010-11-01        219.590  15309.474                9.8            0.19   
11 2010-12-01        220.472  15309.474                9.3      

In [None]:
db_path = "/workspaces/final_project_creditscoring/Data/credit_scoring.db"  # or wherever your DB is located
conn = sqlite3.connect(db_path)

# Save the DataFrame as table 'macro_data', replacing it if it already exists
macro_df.to_sql("macro_data", conn, if_exists="replace", index=False)

# Close the connection
conn.close()

print("Macro data saved correctly in credit_scoring.db -> table 'macro_data'")

Macro data saved correctly in credit_scoring.db -> table 'macro_data'
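
To check that the table round-trips as expected, it can be read back with pandas. The sketch below is a minimal illustration that uses an in-memory database and a two-row stand-in for `macro_df` rather than the project file.

```python
import sqlite3

import pandas as pd

# Toy frame standing in for macro_df (two rows, one indicator column).
macro_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2010-01-01", "2010-02-01"]),
        "fed_funds_rate": [0.11, 0.13],
    }
)

# An in-memory database is used here so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
try:
    macro_df.to_sql("macro_data", conn, if_exists="replace", index=False)
    # SQLite has no native datetime type, so ask pandas to parse the column on read.
    restored = pd.read_sql("SELECT * FROM macro_data", conn, parse_dates=["date"])
finally:
    conn.close()

print(restored.dtypes)
```

Without `parse_dates`, the `date` column would come back as plain strings, which would break a later date-based join with the loan data.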
