# Data Gathering and EDA - Monthly Timeframes

This notebook deals exclusively with monthly data, which is our target frequency: series published monthly, plus weekly and daily series aggregated up to monthly.

In [1]:
# Standard Library Modules
import csv
import json
import logging
import os
from pathlib import Path
import time

# Pip Modules
import pandas as pd
import requests

# Custom Modules
from src.utilities import new_logger, fetch_with_cache

In [2]:
# Start the logging object
logger = new_logger('monthly_series_eda', 'logs/eda')

In [3]:
# A JSON config file at the project root is used for EDA only;
# the actual ML pipeline will use config.yaml with Hydra instead
config_path = Path('fred_api.conf')
abs_config_path = config_path.resolve()

# API Configuration
if os.path.exists(abs_config_path):
    logger.debug(f"Discovered {abs_config_path}, attempting to read...")
    with open(abs_config_path, 'r') as json_fp:
        logger.debug(f"Opened {abs_config_path}, attempting to load.")
        config = json.load(json_fp)
    logger.debug(f"Loaded {abs_config_path}, checking attributes...")
    # Use .get() so a missing key is logged instead of raising a KeyError
    for attr in ('api_uri', 'api_key'):
        value = config.get(attr)
        if not (isinstance(value, str) and len(value) > 0):
            logger.error(f"The JSON config is missing the attribute '{attr}' (or it is empty), please make sure it exists.")
            break
    else:
        # Runs only if the loop finished without hitting `break`
        logger.info("All attributes found, you may continue.")
else:
    logger.error(f"Could not find {abs_config_path}, make sure it exists before continuing.")
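The cell above only logs problems, so later cells can still fail with a `NameError` (no `config`) or a `KeyError` (missing attribute). A minimal sketch of a stricter alternative, assuming the same `fred_api.conf` JSON layout with `api_uri` and `api_key` keys. `load_fred_config` is a hypothetical helper, not part of `src.utilities`:

```python
import json
from pathlib import Path

REQUIRED_KEYS = ('api_uri', 'api_key')  # keys this notebook reads from the config


def load_fred_config(path):
    """Load the JSON config and raise immediately if it is unusable.

    Hypothetical helper: raising (rather than logging) stops the notebook
    at this cell instead of failing later with a NameError or KeyError.
    """
    config_file = Path(path).resolve()
    if not config_file.exists():
        raise FileNotFoundError(f"Could not find {config_file}")
    with open(config_file, 'r') as json_fp:
        config = json.load(json_fp)
    for key in REQUIRED_KEYS:
        value = config.get(key)
        # Each required attribute must be a non-empty string
        if not (isinstance(value, str) and value):
            raise ValueError(f"Config attribute '{key}' is missing or empty")
    return config
```

Used as `config = load_fred_config('fred_api.conf')`, the rest of the notebook can assume both keys are present.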

## Monthly Series

The monthly series are the easiest to deal with, since they are already at our target frequency. For each series, the goal is to download the original API response and save it as a Feather file named after the series ID.

If the Feather file is older than 30 days, the data is refreshed from the API; otherwise, the Feather file on disk is used. This reduces the number of API calls.
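The internals of `fetch_with_cache` in `src.utilities` are not shown here. As a rough sketch of the 30-day rule, assuming freshness is judged by the file's last-modified time (the real helper may track it differently), the check could look like this. `is_stale` is a hypothetical name:

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 30  # refresh threshold described above


def is_stale(path, max_age_days=MAX_AGE_DAYS):
    """Return True if the cached file is missing or older than max_age_days.

    Assumes age is measured from the file's last-modified time (st_mtime).
    """
    cache_file = Path(path)
    if not cache_file.exists():
        return True  # nothing cached yet, so the API must be called
    age_seconds = time.time() - cache_file.stat().st_mtime
    return age_seconds > max_age_days * 24 * 60 * 60
```

Using the modification time keeps the cache stateless: no separate metadata file has to be maintained alongside each Feather file.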

In [4]:
monthly_series = [
    'HSN1FNSA',
    'PERMIT1NSA',
    'HOUST1FNSA',
    'UNDCON1UNSA',
    'COMPU1UNSA',
    'ACTLISCOUUS',
    'NEWLISCOUUS',
    'MEDDAYONMARUS',
    'MNMFS',
    'EXSFHSUSM495N',
    'HOSINVUSM495N',
    'MSACSRNSA',
    'PRRESCON',
    'WPU80',
    'PPIACO',
    'WPU101',
    'WPU102',
    'WPU081',
    'WPU139902094',
    'FMNHSHPSIUS',
    'FIXHAI',
    'UNRATE',
    'ADPMINDCONNERNSA',
    'ADPMNUSNERNSA',
    'CSUSHPINSA',
    'UMCSENT',
    'CUUR0000SEHA'
]

The approach here is relatively straightforward:

1. Make an API GET request for the series in question.
2. Save the resulting JSON response in CSV format under `data/orig/`.
3. Parse the JSON response object:
    1. The index is found in `resp['observations'][i]['date']`, where `i` is the index of the observation.
    2. The value is found in `resp['observations'][i]['value']`, where `i` is the index of the observation.
4. Create a DataFrame, convert the `date` column to `datetime64[ns]`, and make it the index.
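Steps 3 and 4 can be sketched as follows, assuming `resp` is the decoded JSON dict from the FRED observations endpoint. FRED returns each `value` as a string and uses `"."` for a missing observation, so the sketch coerces values with `pd.to_numeric(..., errors='coerce')`, which turns `"."` into `NaN`. `observations_to_frame` is a hypothetical helper and not necessarily what `fetch_with_cache` does internally:

```python
import pandas as pd


def observations_to_frame(resp, series_id):
    """Turn a FRED observations response into a date-indexed DataFrame."""
    observations = resp['observations']
    df = pd.DataFrame(
        {
            'date': [obs['date'] for obs in observations],
            series_id: [obs['value'] for obs in observations],
        }
    )
    # Dates arrive as 'YYYY-MM-DD' strings; parse them to datetimes
    df['date'] = pd.to_datetime(df['date'])
    # Values arrive as strings; '.' marks a missing observation and becomes NaN
    df[series_id] = pd.to_numeric(df[series_id], errors='coerce')
    return df.set_index('date')
```

Naming the value column after the series ID means several series can later be joined on the date index without renaming.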

In [5]:
# proof of concept using a single series
series_id = 'HSN1FNSA'
request_uri = f"{config['api_uri']}?series_id={series_id}&api_key={config['api_key']}&file_type=json"

series_df = fetch_with_cache(series_id=series_id, request_uri=request_uri, logger=logger, dest="data/orig")
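The proof of concept builds the query string by hand. A sketch of the same request built with the standard library's `urllib.parse.urlencode`, which escapes values and makes extra parameters easy to add. `build_request_uri` is a hypothetical helper, and `api_uri` is assumed to be the bare endpoint URL with no query string, as the f-string above implies:

```python
from urllib.parse import urlencode


def build_request_uri(api_uri, api_key, series_id, **extra_params):
    """Return a FRED request URI for one series, asking for JSON output."""
    params = {'series_id': series_id, 'api_key': api_key, 'file_type': 'json'}
    params.update(extra_params)  # optional extras, e.g. observation_start
    return f"{api_uri}?{urlencode(params)}"
```

With it, the single-series call generalizes to a loop over `monthly_series`, passing `build_request_uri(config['api_uri'], config['api_key'], series_id)` as `request_uri` to `fetch_with_cache`.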

## Weekly and Daily Series

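We can use FRED's frequency aggregation feature to convert higher-frequency series (weekly and daily) into monthly series.

This is done with two parameters in the API request: `frequency` sets the target frequency (`m` for monthly), and `aggregation_method` chooses how observations are combined (`avg`, `sum`, or `eop` for end of period).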
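A sketch of the query parameters for one of the weekly or daily series, assuming monthly averages are wanted (`avg`, which is also FRED's default `aggregation_method`). `monthly_aggregation_params` is a hypothetical helper, and whether averaging suits each rate series is a modelling decision rather than something the API settles:

```python
from urllib.parse import urlencode


def monthly_aggregation_params(series_id, api_key, method='avg'):
    """Query parameters asking FRED to aggregate a series to monthly."""
    if method not in ('avg', 'sum', 'eop'):
        raise ValueError(f"Unsupported aggregation_method: {method}")
    return {
        'series_id': series_id,
        'api_key': api_key,
        'file_type': 'json',
        'frequency': 'm',              # target frequency: monthly
        'aggregation_method': method,  # how weekly/daily values are combined
    }


# Example: query string for one of the weekly mortgage-rate series
query = urlencode(monthly_aggregation_params('MORTGAGE30US', 'YOUR_API_KEY'))
```

`YOUR_API_KEY` is a placeholder; in the notebook it would come from `config['api_key']`.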

In [None]:
hf_series = [
    'MORTGAGE30US',
    'MORTGAGE15US',
    'OBMMIVA30YF',
    'OBMMIJUMBO30YF',
    'OBMMIFHA30YF',
    'OBMMIC30YF',
    'OBMMIUSDA30YF',
    'OBMMIC30YFNA',
    'OBMMIC15YF'
]