## Economic Indicators

In my analysis, I collect many of the economic and financial indicators from FRED (Federal Reserve Economic Data), the database maintained by the Federal Reserve Bank of St. Louis. FRED limits how many observations a single API request can return, so I wrote a custom function that retrieves each indicator in several smaller requests and combines the results into one series.

```python
import requests
import pandas as pd
import time
```

Using the FRED API is straightforward: request an API key through this [link](https://fred.stlouisfed.org/docs/api/api_key.html), and you gain access to the data. Existing API wrappers can simplify retrieval without writing custom functions (the [FRED API documentation](https://fred.stlouisfed.org/docs/api/fred/) describes the available endpoints). In my case, however, writing my own function let me tailor the retrieval process to the specific needs of this analysis.
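
Every call the notebook makes is an HTTP GET to the `fred/series/observations` endpoint, with the series ID, key, output format, and date range passed as query parameters. As a minimal sketch (the helper name `build_observations_url` is hypothetical and `YOUR_API_KEY` is a placeholder), the request URL can be assembled with the standard library alone, without sending anything:

```python
from urllib.parse import urlencode

# Observations endpoint used by the fred function in this notebook
ENDPOINT = "https://api.stlouisfed.org/fred/series/observations"


def build_observations_url(series_id, api_key, start, end, limit=2000):
    """Return the encoded GET URL for one date window (hypothetical helper, no network call)."""
    params = {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "limit": limit,
        "observation_start": start,
        "observation_end": end,
    }
    return f"{ENDPOINT}?{urlencode(params)}"


url = build_observations_url("DFF", "YOUR_API_KEY", "1993-01-01", "1997-12-31")
print(url)
```

Opening such a URL (with a real key) in a browser is a quick way to inspect the raw JSON before writing any parsing code.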

I am grateful to Christina Levengood for her insightful [post](https://lvngd.com/blog/fred-api-python/) on accessing the FRED API with Python, which greatly helped me understand the API and develop my custom function for data retrieval.


```python
from credentials import fred_api_key  # local module that keeps the key out of the notebook

api_key = fred_api_key
```
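
As an alternative to a local `credentials` module, the key can be read from an environment variable. The sketch below assumes a variable named `FRED_API_KEY`; that name and the helper `load_fred_api_key` are hypothetical choices made here, not something FRED requires.

```python
import os


def load_fred_api_key(env_var="FRED_API_KEY"):
    """Read the FRED API key from an environment variable (variable name is an assumption)."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending requests without a key
        raise RuntimeError(f"Set the {env_var} environment variable to your FRED API key")
    return key
```

With the variable set in the shell, `api_key = load_fred_api_key()` would replace the import above.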

```python
def fred(series_id, api_key, date_list):
    """
    Retrieve economic data from the Federal Reserve Economic Data (FRED) API and return it as a DataFrame.

    Parameters:
        series_id (str): The unique identifier of the economic series on FRED.
        api_key (str): Your personal API key to access the FRED API. Get it from https://fred.stlouisfed.org/.
        date_list (list): A list of tuples containing start and end dates for data retrieval.
                          Each tuple should be in the format ('YYYY-MM-DD', 'YYYY-MM-DD').
                          Each window must hold at most 2000 observations (the 'limit' below);
                          any observations beyond that are not returned.

    Returns:
        pandas.DataFrame: A DataFrame indexed by date with one column named after series_id.

    Raises:
        ValueError: If the API response status code is not 200.
    """
    endpoint = 'https://api.stlouisfed.org/fred/series/observations'
    data_frames = []  # One DataFrame per date window

    for start_date, end_date in date_list:
        params = {
            'series_id': series_id,
            'api_key': api_key,
            'file_type': 'json',
            'limit': 2000,  # Maximum number of observations returned per request
            'observation_start': start_date,
            'observation_end': end_date,
        }
        response = requests.get(endpoint, params=params, timeout=30)

        if response.status_code != 200:
            # Raise an error if the API request is unsuccessful
            raise ValueError(f"Error: {response.status_code} - {response.text}")

        observations = response.json()['observations']  # List of dicts, one per observation
        if observations:  # Skip windows that contain no observations
            data_frames.append(pd.DataFrame(observations))

    if not data_frames:
        # No data retrieved: return an empty DataFrame with the expected column and index
        return pd.DataFrame(columns=[series_id], index=pd.DatetimeIndex([], name='date'))

    data = pd.concat(data_frames, axis=0, ignore_index=True)  # Stack all windows
    data = data.drop(columns=['realtime_start', 'realtime_end'])  # Vintage metadata not needed here
    data['date'] = pd.to_datetime(data['date'])  # Parse dates
    data['value'] = pd.to_numeric(data['value'], errors='coerce')  # Missing values ('.') become NaN
    data = data.set_index('date').rename(columns={'value': series_id})  # Name the column after the series

    return data
```
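
The post-processing inside `fred` can be checked offline. The sketch below feeds a small, hypothetical payload shaped like an observations response through the same steps. FRED reports missing observations as the string `"."`, which `pd.to_numeric(..., errors='coerce')` turns into `NaN`; the column name `EXAMPLE` is a placeholder.

```python
import pandas as pd

# Hypothetical payload shaped like a FRED observations response (values are made up)
sample = {
    "observations": [
        {"realtime_start": "2024-01-01", "realtime_end": "2024-01-01", "date": "1993-01-01", "value": "6.60"},
        {"realtime_start": "2024-01-01", "realtime_end": "2024-01-01", "date": "1993-01-04", "value": "."},
        {"realtime_start": "2024-01-01", "realtime_end": "2024-01-01", "date": "1993-01-05", "value": "6.55"},
    ]
}

df = pd.DataFrame(sample["observations"])
df = df.drop(columns=["realtime_start", "realtime_end"])  # keep only date and value
df["date"] = pd.to_datetime(df["date"])  # strings -> Timestamps
df["value"] = pd.to_numeric(df["value"], errors="coerce")  # "." -> NaN
df = df.set_index("date").rename(columns={"value": "EXAMPLE"})
print(df)
```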


Each request in `fred` asks for at most 2000 observations (`'limit': 2000`; the FRED API documentation lists 100,000 as the maximum for this endpoint). To collect a full series anyway, I split the sample period into date windows small enough that no single request is truncated, and then stack the results.

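```python
# Five-year windows: a series observed every calendar day (such as DFF) has at most
# 1,827 observations per window, below the 2000-observation request limit.
# A six-year window (e.g., 1993-1998) would hold 2,191 daily observations and be truncated.
# Extend the list with further five-year windows as new data arrive.
date_list = [
    ('1993-01-01', '1997-12-31'), ('1998-01-01', '2002-12-31'),
    ('2003-01-01', '2007-12-31'), ('2008-01-01', '2012-12-31'),
    ('2013-01-01', '2017-12-31'), ('2018-01-01', '2022-12-31'),
    ('2023-01-01', '2027-12-31'),
]
```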
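
Rather than typing the windows by hand, they can be generated. The sketch below uses two hypothetical helpers: `make_date_windows` splits a span of calendar years into fixed-size windows, and `calendar_days` checks the worst case. A series observed every calendar day (as `DFF` is) has one observation per day, so a window's day count is an upper bound on its observation count.

```python
from datetime import date


def make_date_windows(start_year, end_year, years_per_window=5):
    """Split [start_year, end_year] into consecutive windows of whole calendar years."""
    windows = []
    for first in range(start_year, end_year + 1, years_per_window):
        last = min(first + years_per_window - 1, end_year)  # last window may be shorter
        windows.append((f"{first}-01-01", f"{last}-12-31"))
    return windows


def calendar_days(window):
    """Number of calendar days in an inclusive ('YYYY-MM-DD', 'YYYY-MM-DD') window."""
    start, end = (date.fromisoformat(d) for d in window)
    return (end - start).days + 1


windows = make_date_windows(1993, 2022, years_per_window=5)
print(windows[0], windows[-1])
print(max(calendar_days(w) for w in windows))  # 1827 (2008-2012 contains two leap years)
```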

Next, I select the indicators to include in my sample. FRED hosts a great many useful indicators; the following proved especially useful for this project:

- Federal Funds Effective Rate (ffr): "The federal funds rate is the interest rate at which depository institutions trade federal funds (balances held at Federal Reserve Banks) with each other overnight. When a depository institution has surplus balances in its reserve account, it lends to other banks in need of larger balances." ([link](https://fred.stlouisfed.org/series/DFF))

- Unemployment Rate (unemployment_rate): "The unemployment rate represents the number of unemployed as a percentage of the labor force. Labor force data are restricted to people 16 years of age and older, who currently reside in 1 of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces." ([link](https://fred.stlouisfed.org/series/UNRATE))

- Median Consumer Price Index (median_cpi): "Median Consumer Price Index (CPI) is a measure of core inflation calculated by the Federal Reserve Bank of Cleveland and the Ohio State University. Median CPI was created as a different way to get a 'Core CPI' measure, or a better measure of underlying inflation trends." ([link](https://fred.stlouisfed.org/series/MEDCPIM158SFRBCLE))

- Retail Sales: Retail Trade and Food Services (retail_sales): "The most recent month's value of the advance estimate based on data from a subsample of firms" ([link](https://fred.stlouisfed.org/series/MRTSSM44X72USS))

- 10-Year Government Bond Yields (10_year_treasury_yeild): "The return on capital invested in 10-year treasury bonds" (monthly series; [link](https://fred.stlouisfed.org/series/IRLTLT01USM156N))

- 10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity (10year_3month_yield_spread): "The spread between 10-Year Treasury Constant Maturity and 3-Month Treasury Constant Maturity" ([link](https://fred.stlouisfed.org/series/T10Y3M))

- CBOE Volatility Index (vix): "VIX measures market expectation of near term volatility conveyed by stock index option prices." ([link](https://fred.stlouisfed.org/series/VIXCLS))

- Chinese Yuan Renminbi to U.S. Dollar Spot Exchange Rate (us_china_exchange_rate): "Noon buying rates in New York City for cable transfers payable in foreign currencies." ([link](https://fred.stlouisfed.org/series/DEXCHUS))

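- Japanese Yen to U.S. Dollar Spot Exchange Rate (us_japan_exchange_rate): "Noon buying rates in New York City for cable transfers payable in foreign currencies." ([link](https://fred.stlouisfed.org/series/DEXJPUS))

- Real-time Sahm Rule Recession Indicator (sahm): signals the start of a recession when the three-month moving average of the national unemployment rate rises by 0.50 percentage points or more relative to its low during the previous 12 months ([link](https://fred.stlouisfed.org/series/SAHMREALTIME))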

- QoQ Real GDP Growth (growth): "Quarterly growth in real gross domestic product (GDP)", expressed as the percent change from the previous quarter, not annualized (calculated by the author using real GDP data from [FRED](https://fred.stlouisfed.org/series/GDPC1))

- MoM Inflation (inflation): "Monthly inflation" (calculated by author using data on consumer price index from [FRED](https://fred.stlouisfed.org/series/CPIAUCSL)) 
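
The `SAHMREALTIME` series pulled in the next cell is FRED's real-time Sahm rule indicator. To show what it measures, here is a minimal sketch that computes a Sahm-style indicator from a monthly unemployment-rate series. It assumes the comparison window is the 12 months before the current month. As I understand it, FRED's real-time series is built from the unemployment data available at each date, so recomputing it from today's `UNRATE` vintage would not reproduce it exactly. The helper name `sahm_indicator` and the toy data are hypothetical.

```python
import pandas as pd


def sahm_indicator(unrate: pd.Series) -> pd.Series:
    """Sahm-style indicator: 3-month average unemployment rate minus its
    lowest 3-month average over the previous 12 months (assumed window)."""
    three_month_avg = unrate.rolling(3).mean()
    prior_low = three_month_avg.shift(1).rolling(12).min()  # excludes the current month
    return three_month_avg - prior_low


# Toy data: flat 4.0% unemployment for 15 months, then a jump to 5.5%
unrate = pd.Series(
    [4.0] * 15 + [5.5],
    index=pd.date_range("2000-01-01", periods=16, freq="MS"),
)
sahm = sahm_indicator(unrate)
print(sahm.tail(2))  # the last value is about 0.5, the usual recession-signal threshold
```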

```python
start_time = time.time()

# Unique identifiers of the FRED series to download
series_id_list = ['DFF', 'UNRATE', 'MEDCPIM158SFRBCLE', 'MRTSSM44X72USS', 'IRLTLT01USM156N',
                  'T10Y3M', 'VIXCLS', 'DEXCHUS', 'DEXJPUS', 'SAHMREALTIME']

# Descriptive variable names, in the same order as series_id_list
variable_names = ['ffr', 'unemployment_rate', 'median_cpi', 'retail_sales', '10_year_treasury_yeild',
                  '10year_3month_yield_spread', 'vix', 'us_china_exchange_rate', 'us_japan_exchange_rate', 'sahm']

# Retrieve each series with the fred function
df_list = [fred(series_id, api_key, date_list) for series_id in series_id_list]

# Combine the series side by side; concat aligns them on the date index
economic_data = pd.concat(df_list, axis=1)

# Replace the FRED identifiers with the descriptive variable names
economic_data.rename(columns=dict(zip(series_id_list, variable_names)), inplace=True)

execution_time = time.time() - start_time
print("Execution time:", execution_time, "seconds")
```

Execution time: 21.197555780410767 seconds


```python
economic_data.head(3)
```

| date | ffr | unemployment_rate | median_cpi | retail_sales | 10_year_treasury_yeild | 10year_3month_yield_spread | vix | us_china_exchange_rate | us_japan_exchange_rate | sahm |
|---|---|---|---|---|---|---|---|---|---|---|
| 1993-01-01 | 2.66 | 7.3 | 3.442924 | 175108.0 | 6.6 | NaN | NaN | NaN | NaN | 0.2 |
| 1993-01-02 | 2.66 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1993-01-03 | 2.66 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
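
The NaN cells come from mixing frequencies. `pd.concat(..., axis=1)` takes the union of all dates: `DFF` has an observation for every calendar day (including the weekend of 1993-01-02 and 1993-01-03), while monthly series only have a value dated on the first day of each month. The toy sketch below (hypothetical values) reproduces this and shows forward-filling as one optional way to carry a monthly value across the following days. The original notebook does not do this, and because FRED dates a monthly value at the start of the period even though it is released later, forward-filling can introduce look-ahead bias.

```python
import pandas as pd

# Toy data (hypothetical values): a daily series and a monthly series
daily = pd.DataFrame(
    {"daily_rate": [2.66, 2.66, 2.66, 2.70]},
    index=pd.to_datetime(["1993-01-01", "1993-01-02", "1993-01-03", "1993-01-04"]),
)
monthly = pd.DataFrame({"monthly_rate": [7.3]}, index=pd.to_datetime(["1993-01-01"]))

# axis=1 concatenation aligns on the index (outer join), leaving NaN where a series has no value
combined = pd.concat([daily, monthly], axis=1)

# Optional: carry each monthly value forward until the next one appears
filled = combined.ffill()
print(combined)
print(filled)
```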


```python
gdp = fred('GDPC1', api_key, date_list)
gdp.head()
```

| date | GDPC1 |
|---|---|
| 1993-01-01 | 9857.185 |
| 1993-04-01 | 9914.565 |
| 1993-07-01 | 9961.873 |
| 1993-10-01 | 10097.362 |
| 1994-01-01 | 10195.338 |


#### Calculating real GDP growth

```python
# 'lag' holds the previous quarter's real GDP (values shifted down by one row)
gdp['lag'] = gdp['GDPC1'].shift()

# Quarter-over-quarter growth in percent:
# (Current GDP - Previous GDP) / Previous GDP * 100
gdp['growth'] = (gdp['GDPC1'] - gdp['lag']) / gdp['lag'] * 100
gdp.rename(columns={'GDPC1': 'real_gdp'}, inplace=True)

gdp.head()
```

| date | real_gdp | lag | growth |
|---|---|---|---|
| 1993-01-01 | 9857.185 | NaN | NaN |
| 1993-04-01 | 9914.565 | 9857.185 | 0.582113 |
| 1993-07-01 | 9961.873 | 9914.565 | 0.477157 |
| 1993-10-01 | 10097.362 | 9961.873 | 1.360076 |
| 1994-01-01 | 10195.338 | 10097.362 | 0.970313 |
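
The lag-based formula is the same calculation as pandas' `pct_change()`. Headline GDP growth is often quoted at an annualized rate instead, which compounds the quarterly change over four quarters. A short sketch using the first three real GDP values from the table above (the annualized column is an illustration, not part of this notebook's data):

```python
import pandas as pd

# First three real GDP observations from the table above (billions of chained dollars)
real_gdp = pd.Series(
    [9857.185, 9914.565, 9961.873],
    index=pd.to_datetime(["1993-01-01", "1993-04-01", "1993-07-01"]),
)

# Quarter-over-quarter growth in percent, identical to the lag-based formula
qoq = real_gdp.pct_change() * 100

# Annualized rate: ((1 + quarterly change) ** 4 - 1) * 100
annualized = ((1 + real_gdp.pct_change()) ** 4 - 1) * 100
print(pd.DataFrame({"qoq": qoq, "annualized": annualized}))
```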


```python
gdp[gdp['growth'] < 0]
```

| date | real_gdp | lag | growth |
|---|---|---|---|
| 2001-01-01 | 13219.251 | 13262.25 | -0.324221 |
| 2001-07-01 | 13248.142 | 13301.394 | -0.400349 |
| 2008-01-01 | 15702.906 | 15767.146 | -0.407429 |
| 2008-07-01 | 15709.562 | 15792.773 | -0.526893 |
| 2008-10-01 | 15366.607 | 15709.562 | -2.183097 |
| 2009-01-01 | 15187.475 | 15366.607 | -1.165723 |
| 2009-04-01 | 15161.772 | 15187.475 | -0.169238 |
| 2011-01-01 | 15769.911 | 15807.995 | -0.240916 |
| 2011-07-01 | 15870.684 | 15876.839 | -0.038767 |
| 2014-01-01 | 16654.247 | 16712.76 | -0.35011 |


```python
# Retrieve the Consumer Price Index for All Urban Consumers (CPIAUCSL)
cpi = fred('CPIAUCSL', api_key, date_list)

# 'lag' holds the previous month's CPI (values shifted down by one row)
cpi['lag'] = cpi['CPIAUCSL'].shift()

# Month-over-month inflation in percent:
# (Current CPI - Previous CPI) / Previous CPI * 100
cpi['inflation'] = (cpi['CPIAUCSL'] - cpi['lag']) / cpi['lag'] * 100
cpi.rename(columns={'CPIAUCSL': 'cpi'}, inplace=True)

cpi.head()
```

| date | cpi | lag | inflation |
|---|---|---|---|
| 1993-01-01 | 142.8 | NaN | NaN |
| 1993-02-01 | 143.1 | 142.8 | 0.210084 |
| 1993-03-01 | 143.3 | 143.1 | 0.139762 |
| 1993-04-01 | 143.8 | 143.3 | 0.348918 |
| 1993-05-01 | 144.2 | 143.8 | 0.278164 |
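
Month-over-month changes are small and noisy, so inflation is also commonly reported as the 12-month (year-over-year) change. The same `pct_change()` call handles both via its `periods` argument. A sketch with hypothetical CPI levels (13 months, so exactly one year-over-year value exists):

```python
import pandas as pd

# Hypothetical monthly CPI levels, dated at the start of each month
cpi_levels = pd.Series(
    [100.0, 100.2, 100.5, 100.7, 101.0, 101.1, 101.3, 101.6, 101.8, 102.0, 102.1, 102.4, 103.0],
    index=pd.date_range("2000-01-01", periods=13, freq="MS"),
)

mom = cpi_levels.pct_change() * 100             # month-over-month, as computed above
yoy = cpi_levels.pct_change(periods=12) * 100   # change versus the same month a year earlier
print(pd.DataFrame({"mom": mom, "yoy": yoy}).tail(2))
```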


```python
# Drop these columns first so that re-running this cell does not duplicate them
new_columns = ['cpi', 'inflation', 'real_gdp', 'growth']
economic_data = economic_data.drop(columns=new_columns, errors='ignore')

# Append CPI/inflation and real GDP/growth, aligned on the date index
economic_data = pd.concat([economic_data, cpi[['cpi', 'inflation']], gdp[['real_gdp', 'growth']]], axis=1)
```

```python
economic_data.head(3)
```

| date | ffr | unemployment_rate | median_cpi | retail_sales | 10_year_treasury_yeild | 10year_3month_yield_spread | vix | us_china_exchange_rate | us_japan_exchange_rate | sahm | cpi | inflation | real_gdp | growth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1993-01-01 | 2.66 | 7.3 | 3.442924 | 175108.0 | 6.6 | NaN | NaN | NaN | NaN | 0.2 | 142.8 | NaN | 9857.185 | NaN |
| 1993-01-02 | 2.66 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1993-01-03 | 2.66 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
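
Because notebook cells can be run more than once, a `pd.concat(..., axis=1)` cell may append columns that already exist, leaving duplicate labels such as `inflation` and `growth`. A quick safeguard before saving is to keep only the first occurrence of each label. The helper name `drop_duplicate_columns` and the toy frame are hypothetical.

```python
import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first occurrence of each column label."""
    return df.loc[:, ~df.columns.duplicated()]


# Toy example: the same column concatenated twice, as re-running a cell can do
a = pd.DataFrame({"cpi": [142.8], "inflation": [0.21]})
twice = pd.concat([a, a[["inflation"]]], axis=1)
clean = drop_duplicate_columns(twice)
print(list(twice.columns), list(clean.columns))
```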


```python
economic_data.to_pickle('../data/fred.pkl')
```