In [2]:
import requests
import pandas as pd
import sqlite3
import os
from datetime import datetime

📊 Macroeconomic Indicators (U.S.)
To enrich the credit scoring model and capture the economic context at the time loans were issued, official macroeconomic indicators from the United States were incorporated.

The following variables were used:

- Inflation (CPI): Consumer Price Index, monthly frequency. It reflects changes in the general price level. This allows the model to adjust credit risk by accounting for the loss of borrowers’ purchasing power.

- Gross Domestic Product (GDP): Quarterly frequency (resampled to monthly using forward fill). It indicates the overall economic health of the country. Sustained economic growth is typically associated with lower credit risk.

- Unemployment Rate: Monthly frequency. Measures the proportion of the active population that is unemployed. Higher unemployment levels increase the probability of loan default.

- Federal Funds Rate (FEDFUNDS): Monthly frequency. Represents the cost of money in the economy and influences loan interest rates. It helps model how changes in monetary policy affect borrowers’ repayment capacity.

- Personal Consumption Expenditures (PCE): Monthly frequency. Measures household spending on goods and services (the PCE series used here is in nominal terms; the inflation-adjusted counterpart is a separate series). It is an indicator of internal demand and consumer solvency.

- Total Consumer Credit (TOTALSL): Monthly frequency on FRED (the notebook also passes it through the monthly resampling step, which leaves a monthly series unchanged). Reflects household indebtedness. High levels of total consumer credit may indicate increased exposure to default risk.

Data Source

The data were obtained through the FRED (Federal Reserve Economic Data) API, managed by the Federal Reserve Bank of St. Louis, which provides official and up-to-date economic time series from the United States government.

Data Collection Process

- Access to the FRED API was established using a personal API key.

- Historical time series corresponding to the analysis period of the Lending Club dataset were downloaded.

- The data were processed and transformed into tabular format, adapting all series to a monthly frequency when necessary (using forward fill for quarterly variables).

- Finally, the macroeconomic indicators were integrated into the project to be later joined with loan-level data based on the loan issue date.
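
The frequency alignment described in the steps above can be illustrated with a minimal sketch. The values below are synthetic placeholders (not real FRED observations), and the column name `gdp` simply mirrors the one used later in this notebook.

```python
import pandas as pd

# Synthetic quarterly series (placeholder values, not real GDP figures)
quarterly = pd.DataFrame(
    {
        "date": pd.to_datetime(["2009-01-01", "2009-04-01", "2009-07-01"]),
        "gdp": [100.0, 101.0, 102.0],
    }
)

# Upsample to month-start frequency; each newly created month inherits
# the most recent quarterly value (forward fill)
monthly = quarterly.set_index("date").resample("MS").ffill().reset_index()

print(monthly)
```

Note that the resampled range ends at the last quarterly observation, so the two months that follow the final quarter start are not generated by this step.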

Justification for Inclusion

Incorporating these variables allows the model to account for the overall economic environment, capturing external factors that influence borrowers’ repayment capacity and financial behavior. This improves both the explanatory power and predictive performance of the credit risk analysis.

In [2]:
api_key = os.getenv("FRED_API_KEY")  # the FRED_API_KEY environment variable must hold our API key to retrieve the data

We confirm the years for which data will be downloaded. According to the main DataFrame used in the EDA, the first year is 2009 and the last year is 2018.
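
One possible way to encode this date range directly in the request is sketched below. The helper name `build_fred_params` and the default end date are assumptions made for illustration; `observation_start` and `observation_end` are query parameters of the FRED `series/observations` endpoint. The notebook itself only sets the start date.

```python
def build_fred_params(series_id, api_key, start_date="2009-01-01", end_date="2018-12-31"):
    """Hypothetical helper: query parameters for fred/series/observations."""
    return {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start_date,  # first year in the loan data
        "observation_end": end_date,      # last year in the loan data (assumed cutoff)
    }


# "YOUR_KEY" is a placeholder, not a real credential
params = build_fred_params("UNRATE", api_key="YOUR_KEY")
print(params["observation_start"], params["observation_end"])
```

A caveat with this approach: a quarterly series bounded this way ends at its last quarter start (2018-10-01), so the trailing months would still need to be filled after merging.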


In [3]:
API_KEY = os.getenv("FRED_API_KEY")
BASE_URL = "https://api.stlouisfed.org/fred/series/observations"
conn = sqlite3.connect("/workspaces/final_project_creditscoring/Data/credit_scoring.db")

# Function to download economic series from the FRED API (Federal Reserve Bank of St. Louis)
# and convert them into pandas DataFrames with 'date' and 'value' columns.

def get_fred_series(series_id, start_date="2009-01-01"):
    params = {
        "series_id": series_id,
        "api_key": API_KEY,
        "file_type": "json",
        "observation_start": start_date
    }
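    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors (e.g. missing or invalid API key)
    data = response.json()["observations"]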
    
    df = pd.DataFrame(data)
    df["date"] = pd.to_datetime(df["date"])
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    
    return df[["date", "value"]]

# Download the series

inflation = get_fred_series("CPIAUCSL")
inflation.rename(columns={"value": "inflation_cpi"}, inplace=True)

gdp = get_fred_series("GDP")
gdp.rename(columns={"value": "gdp"}, inplace=True)

unemployment = get_fred_series("UNRATE")
unemployment.rename(columns={"value": "unemployment_rate"}, inplace=True)

fedfunds = get_fred_series("FEDFUNDS")
fedfunds.rename(columns={"value": "fed_funds_rate"}, inplace=True)

pce = get_fred_series("PCE")
pce.rename(columns={"value": "pce"}, inplace=True)

total_credit = get_fred_series("TOTALSL")
total_credit.rename(columns={"value": "total_consumer_credit"}, inplace=True)

# GDP and total consumer credit resampled to monthly (month start)
gdp_monthly = gdp.set_index("date").resample("MS").ffill().reset_index()
total_credit_monthly = total_credit.set_index("date").resample("MS").ffill().reset_index()

# Merge using the monthly versions
macro_df = inflation.merge(gdp_monthly, on="date", how="left")
macro_df = macro_df.merge(unemployment, on="date", how="left")
macro_df = macro_df.merge(fedfunds, on="date", how="left")
macro_df = macro_df.merge(pce, on="date", how="left")
macro_df = macro_df.merge(total_credit_monthly, on="date", how="left")

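# Finally, align dates to month start. Note: resample().ffill() only fills the rows it creates,
# so NaNs already present after the merges are left as-is (see the null check below)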
macro_df = macro_df.set_index("date").resample("MS").ffill().reset_index()

print(macro_df.head(12))


         date  inflation_cpi        gdp  unemployment_rate  fed_funds_rate  \
0  2009-01-01        211.933  14430.902                7.8            0.15   
1  2009-02-01        212.705  14430.902                8.3            0.22   
2  2009-03-01        212.495  14430.902                8.7            0.18   
3  2009-04-01        212.709  14381.236                9.0            0.15   
4  2009-05-01        213.022  14381.236                9.4            0.18   
5  2009-06-01        214.790  14381.236                9.5            0.21   
6  2009-07-01        214.726  14448.882                9.5            0.16   
7  2009-08-01        215.445  14448.882                9.6            0.16   
8  2009-09-01        215.861  14448.882                9.8            0.15   
9  2009-10-01        216.509  14651.249               10.0            0.12   
10 2009-11-01        217.234  14651.249                9.9            0.12   
11 2009-12-01        217.347  14651.249                9.9      

In [4]:
macro_df.isnull().sum()  # Check for null values in the DataFrame

date                     0
inflation_cpi            1
gdp                      5
unemployment_rate        1
fed_funds_rate           0
pce                      1
total_consumer_credit    1
dtype: int64

In [5]:
macro_df[macro_df.isna().any(axis=1)]  # Show rows with null values, if any

Unnamed: 0,date,inflation_cpi,gdp,unemployment_rate,fed_funds_rate,pce,total_consumer_credit
199,2025-08-01,323.364,,4.3,4.33,21123.8,5059896.38
200,2025-09-01,324.368,,4.4,4.22,21202.4,5071365.99
201,2025-10-01,,,,4.09,21301.0,5080601.87
202,2025-11-01,325.031,,4.5,3.88,21409.7,5084831.24
203,2025-12-01,326.03,,4.4,3.72,,


We can drop the rows with null values, since they are all from 2025 and we only need data up to 2018.
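
Before dropping rows, it may be worth confirming that no nulls fall inside the analysis window. The sketch below uses a small synthetic stand-in for `macro_df` (the name `macro_demo` and its values are placeholders), with the 2009 to 2018 window taken from the text above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for macro_df: monthly rows with a gap in the latest month
macro_demo = pd.DataFrame(
    {
        "date": pd.to_datetime(["2018-11-01", "2018-12-01", "2025-12-01"]),
        "gdp": [1.0, 2.0, np.nan],
    }
)

# Keep only the rows inside the analysis window (both bounds inclusive)
in_window = macro_demo["date"].between("2009-01-01", "2018-12-31")
macro_window = macro_demo.loc[in_window]

# Count nulls that remain inside the window
nulls_in_window = int(macro_window.isna().sum().sum())
print(nulls_in_window)
```

If this count is zero on the real data, dropping the incomplete rows cannot affect any month that is later joined to a loan.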


In [6]:
macro_df = macro_df.dropna()  # Drop rows with null values

In [7]:
db_path = "/workspaces/final_project_creditscoring/Data/credit_scoring.db" 
conn = sqlite3.connect(db_path)

# Save the DataFrame as table 'macro_data', replacing it if it already exists
macro_df.to_sql("macro_data", conn, if_exists="replace", index=False)

print("Macro data saved correctly in credit_scoring.db -> table 'macro_data'")

Macro data saved correctly in credit_scoring.db -> table 'macro_data'


## Merging the 'macro_data' table into 'main_table'

In [8]:

# Read the main_table table
main_table = pd.read_sql_query("SELECT * FROM main_table", conn)

# Convert issue_d to datetime
main_table['issue_d'] = pd.to_datetime(main_table['issue_d'])


# Merge main_table with macro_df
merged_df = main_table.merge(
    macro_df, 
    left_on='issue_d', 
    right_on='date', 
    how='left'
)

# Drop the duplicated 'date' column coming from macro_df
merged_df.drop(columns=['date'], inplace=True)

# Save the resulting DataFrame to the database
merged_df.to_sql("main_table", conn, if_exists="replace", index=False)

# Close the connection
conn.close()
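
The join above only matches when `issue_d` falls exactly on a month-start date. A minimal sketch of how that assumption could be checked is shown below, using synthetic stand-ins for `main_table` and `macro_df` (the names `loans`, `macro`, `issue_month` and all values are placeholders for illustration).

```python
import pandas as pd

# Synthetic stand-ins for main_table and macro_df (placeholder values)
loans = pd.DataFrame(
    {
        "loan_id": [1, 2, 3],
        "issue_d": pd.to_datetime(["2015-03-01", "2015-03-01", "2015-04-01"]),
    }
)
macro = pd.DataFrame(
    {
        "date": pd.to_datetime(["2015-03-01", "2015-04-01"]),
        "unemployment_rate": [5.0, 6.0],
    }
)

# Snap issue dates to the first day of their month so they line up with
# the month-start dates produced by resample("MS")
loans["issue_month"] = loans["issue_d"].dt.to_period("M").dt.to_timestamp()

merged = loans.merge(
    macro,
    left_on="issue_month",
    right_on="date",
    how="left",
    validate="many_to_one",  # raises if macro contains duplicate dates
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Loans that found no macro row would be marked 'left_only'
unmatched = int((merged["_merge"] == "left_only").sum())
print(unmatched)

# Remove the helper columns once the check is done
merged = merged.drop(columns=["date", "issue_month", "_merge"])
```

Because the notebook writes the merged result back to `main_table`, re-running the merge cell on a table that already contains the macro columns would produce suffixed duplicates (`_x`, `_y`); writing to a separate table is one way to avoid that.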
