# Data Gathering and EDA - Monthly Timeframes

This notebook deals exclusively with the series that have monthly time scales, which is our target time scale.

In [4]:
# Standard Library Modules
import json
import os

# External Modules
import requests
import wandb

# Custom Modules
from src.utilities import new_logger

In [2]:
# Start the logging object
logger = new_logger('monthly_series_eda', '../logs/eda', redirect_streams=False)

In [2]:
# Start the Weights & Biases run
run = wandb.init(
    project="wgu_capstone",
    job_type="data_gathering_cleaning",
    group="eda",
    save_code=True
)

[34m[1mwandb[0m: Currently logged in as: [33mdallas-taylor96[0m ([33mdallas-taylor96-western-governors-university[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [5]:
# For EDA, a JSON configuration file is stored at the root of the project.
# The actual ML pipeline will use config.yaml with Hydra instead.
config_path = '../fred_api.conf'

# API Configuration
# config_path is already relative to the notebook, so it is used as-is
# (prefixing it with another '../' would look one directory too high).
if os.path.exists(config_path):
    logger.debug(f"Discovered {config_path}, attempting to read...")
    with open(config_path, 'r') as json_fp:
        logger.debug(f"Opened {config_path}, attempting to load.")
        config = json.load(json_fp)
        logger.debug(f"Loaded {config_path}, checking attributes...")
    # .get() returns None for a missing key instead of raising KeyError
    if not (isinstance(config.get('api_uri'), str) and len(config['api_uri']) > 0):
        logger.error("The JSON config is missing the attribute 'api_uri', please make sure it exists.")
    elif not (isinstance(config.get('api_key'), str) and len(config['api_key']) > 0):
        logger.error("The JSON config is missing the attribute 'api_key', please make sure it exists.")
    else:
        logger.info("All attributes found, you may continue.")
else:
    logger.error(f"Could not find {config_path}, make sure it exists before continuing.")


## Monthly Series

The monthly series are the easiest to deal with. For each series, the goal is to download the original response and save it as a Feather file named after the series.

If the Feather file on disk is older than 30 days, the data is refreshed from the API. Otherwise, the Feather file on disk is used. This reduces the number of API calls.
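
The 30-day refresh rule could be checked with a small helper along these lines. This is a minimal sketch rather than the notebook's actual implementation: `is_stale`, `max_age_days`, and the `now` argument are hypothetical names, and it assumes the file's modification time reflects when the data was last downloaded.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def is_stale(path, max_age_days=30, now=None):
    """Return True if `path` is missing or was modified more than `max_age_days` ago."""
    # A file that does not exist yet always needs to be fetched
    if not os.path.exists(path):
        return True
    # `now` can be injected for testing; default to the current time
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds > max_age_days * SECONDS_PER_DAY
```

A cell could then call the API only when `is_stale(...)` returns `True` for a series' Feather file, and read the file from disk otherwise.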

In [None]:
monthly_series = [
    'HSN1FNSA',
    'PERMIT1NSA',
    'HOUST1FNSA',
    'UNDCON1UNSA',
    'COMPU1UNSA',
    'ACTLISCOUUS',
    'NEWLISCOUUS',
    'MEDDAYONMARUS',
    'MNMFS',
    'EXSFHSUSM495N',
    'HOSINVUSM495N',
    'MSACSRNSA',
    'PRRESCON',
    'WPU80',
    'PPIACO',
    'WPU101',
    'WPU102',
    'WPU081',
    'WPU139902094',
    'FMNHSHPSIUS',
    'FIXHAI',
    'UNRATE',
    'ADPMINDCONNERNSA',
    'ADPMNUSNERNSA',
    'CSUSHPINSA',
    'UMCSENT',
    'CUUR0000SEHA'
]

## Weekly and Daily Series

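We can use the FRED frequency aggregation feature to convert higher-frequency series (weekly or daily) into lower-frequency ones, in our case monthly.

This is done by setting the `frequency` parameter of the API request to the target frequency and the `aggregation_method` parameter to the aggregation rule (`avg`, `sum`, or `eop`).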
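
As an illustration, the query parameters for such a request might be assembled as below. This is a sketch under assumptions: `build_observation_params` is a hypothetical helper, the API key is a placeholder, and averaging (`avg`) is only one possible choice of aggregation. The `frequency`, `aggregation_method`, `series_id`, `api_key`, and `file_type` names are FRED's own query parameters for series observations.

```python
# Aggregation rules accepted by FRED: average, sum, end of period
ALLOWED_METHODS = {"avg", "sum", "eop"}


def build_observation_params(series_id, api_key, frequency="m", aggregation_method="avg"):
    """Build query parameters asking FRED to aggregate a series to a lower frequency."""
    if aggregation_method not in ALLOWED_METHODS:
        raise ValueError(f"Unsupported aggregation_method: {aggregation_method!r}")
    return {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "frequency": frequency,  # "m" requests monthly observations
        "aggregation_method": aggregation_method,
    }


# Placeholder key for illustration only
params = build_observation_params("MORTGAGE30US", api_key="YOUR_API_KEY")
```

The resulting dictionary would be passed as `params=` to `requests.get`, together with the observations endpoint URL taken from the configuration.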

In [None]:
hf_series = [
    'MORTGAGE30US',
    'MORTGAGE15US',
    'OBMMIVA30YF',
    'OBMMIJUMBO30YF',
    'OBMMIFHA30YF',
    'OBMMIC30YF',
    'OBMMIUSDA30YF',
    'OBMMIC30YFNA',
    'OBMMIC15YF'
]