# GIS Analysis of Traffic Volume and Economic Indicators

# API DATA DOWNLOAD MODULE


- [Prabin Raj Shrestha](https://prbn.info/)
- Arunava Das
- Heeyoon Shin


<a href="https://colab.research.google.com/drive/ivwRw6czRWyseeRZu_mgAhDBPwZX9Za7" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

In [None]:
!pip install full-fred
!pip install gdown

Collecting full-fred
  Downloading full_fred-0.0.9a3-py3-none-any.whl (47 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.6/47.6 kB 1.1 MB/s eta 0:00:00
Installing collected packages: full-fred
Successfully installed full-fred-0.0.9a3


In [None]:
import requests
import json
from full_fred.fred import Fred
from google.colab import files
import pandas as pd
from tqdm import tqdm  # Import tqdm library

The API classes below use the `full_fred` library to download data.

full_fred is a Python interface to FRED (Federal Reserve Economic Data). full_fred's API translates to Python every type of request FRED supports: each query for Categories, Releases, Series, Sources, and Tags found within FRED's web service has a method associated with it in full_fred.

Reference:
https://pypi.org/project/full-fred/


### API Key
Queries to the FRED web service require an API key. FRED offers [free API keys with an account (also free)](https://research.stlouisfed.org/useraccount/apikey).

Create at least three API keys and add them to the list below.

In [None]:
API_Key_l = ['<api key1>', '<api key2>', '<api key3>']
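
Keys typed directly into a shared notebook are visible to anyone who can open it. A minimal sketch of an alternative, assuming the keys are stored comma-separated in an environment variable called `FRED_API_KEYS` (a hypothetical name chosen for this example, not something FRED or `full_fred` defines), with `load_api_keys` as an illustrative helper:

```python
import os


def load_api_keys(env_var: str = "FRED_API_KEYS") -> list:
    """Read a comma-separated list of API keys from an environment variable."""
    raw = os.environ.get(env_var, "")
    # Drop empty entries and surrounding whitespace
    keys = [key.strip() for key in raw.split(",") if key.strip()]
    if not keys:
        raise ValueError(f"No API keys found in environment variable {env_var}")
    return keys


# Usage (assumes the variable has been set beforehand):
# API_Key_l = load_api_keys()
```

The resulting list can be passed to `api_manager` in the same way as the hand-written list.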

In [None]:
# @title API manager

from full_fred.fred import Fred  # Import Fred class from full_fred package

class api_manager():
  def __init__(self, api_l: list):
    # Initialize api_manager with a list of API keys
    self._update_api_l(api_l)
    # Set the filename for storing the API key
    self.api_fn = 'api_key.txt'

  def _update_api_l(self, api_l: list):
    # Update list of API keys and set counters
    self.api_l = api_l
    self.n_api = len(api_l)
    self.n = 0
    self.cycle_api()

  def n_up(self):
    # Increment counter for cycling through API keys
    self.n += 1
    # Reset counter to 0 if it reaches the end of the list
    if self.n == self.n_api:
      self.n = 0

  def get_api(self, n = None):
    # Get API key by index, or return current API key if index is None
    if n is None:
      return self.api
    else:
      return self.api_l[n]

  def update_api_f(self):
    # Update the API key file with the current API key
    api = self.get_api()
    with open(self.api_fn, 'w') as file:
      file.write(api)
    return self.api_fn

  def cycle_api(self):
    # Cycle to the next API key and return it
    self.api = self.get_api(self.n)
    self.n_up()
    return self.api

  def cycle_api_file(self):
    # Cycle to the next API key and update the API key file
    self.cycle_api()
    return self.update_api_f()

  def update_fred(self):
    # Update Fred object with the current API key from the API key file
    return Fred(self.update_api_f())

  def cycle_fred(self):
    # Cycle to the next API key and update Fred object
    return Fred(self.cycle_api_file())


In [None]:
# @title Data Download Object

class fred_df_manager:
  def __init__(self, config_dict):
    # Initialize config_dict
    self.config_dict = {}
    # Update config_dict with provided configuration
    self.update_config(config_dict)

  def update_config(self, config_dict):
    # Check if anything important is missing
    check_l = ['api_url', 'date', 'series_id', 'metric_name', 'api_manager']
    missing_keys = [key for key in check_l if key not in list(self.config_dict.keys()) + list(config_dict.keys())]

    # If missing keys are found, raise ValueError
    if len(missing_keys) != 0:
      for key in missing_keys:
        print(f'Missing: {key} from config')
      raise ValueError('Missing keys in config')

    # Update config
    for key in config_dict:
      self.config_dict[key] = config_dict[key]

    # URL to get series id
    self.api_url_s = self.config_dict['api_url']
    # Date
    self.date = self.config_dict['date']
    # Series
    self.series_id = self.config_dict['series_id']
    # Metric Name
    self.metric_name = self.config_dict['metric_name']
    # api_manager
    self.api_manager = self.config_dict['api_manager']

  def api_json(self, url_s, payload):
    # Send API request and return JSON response
    r = requests.get(url_s, params = payload)
    return r.json()


  def get_info(self):
    # Payload
    payload = {'api_key': self.api_manager.get_api()
              , 'file_type': 'json'
              , 'series_id': self.series_id
              , 'date': self.date}

    # info dictionary
    info_dict = self.api_json(self.api_url_s, payload)['meta']
    for key in info_dict.keys():
      v = info_dict[key]
      if type(v) in [str, int, float]:
        print(f'{key}: {v}')
    return info_dict

  def get_data_fred(self, sid, tries: int = 3):
    # Attempt to download data with specified number of tries
    for n_try in range(tries):
      try:
        # Get series data DataFrame
        data_df = self.fred.get_series_df(sid)
        return data_df
      except Exception:
        # Cycle to the next API key if the download fails (e.g. rate limit exceeded)
        self.fred = self.api_manager.cycle_fred()
        print(f"Switching Keys: {n_try}")
    raise ValueError('Unable to download data')

  def get_data(self, sid, tries: int = 3):
    # Update fred attribute and return data
    self.fred = self.api_manager.update_fred()
    return self.get_data_fred(sid, tries)

  def download_data(self, rename_dict= None, tries: int = 3):
    # Retrieve information dictionary
    self.info_dict = self.get_info()
    self.fred = self.api_manager.update_fred()

    # Define column list for DataFrame
    col_l = ['date', 'region', self.metric_name]

    # Define dictionary to rename columns in DataFrame
    rename_dict = {} if rename_dict is None else rename_dict

    # Initialize list to store DataFrames
    data_l = []

    # Get the total number of iterations
    total_iterations = len(self.info_dict['data'][self.date])

    # Iterate over data in information dictionary with tqdm for progress bar
    # for data_dict in self.info_dict['data'][self.date]:
    for data_dict in tqdm(self.info_dict['data'][self.date], total=total_iterations, desc='Downloading data'):
      # Download data using series_id and specified number of tries
      data_df = self.get_data_fred(data_dict['series_id'], tries)

      # Rename columns in DataFrame
      data_df = data_df.rename(columns=rename_dict)

      # Add additional information from data_dict to DataFrame
      for key in data_dict.keys():
          data_df[key] = data_dict[key]

      # Append DataFrame to list
      data_l.append(data_df)

    # Concatenate the list of DataFrames into a single DataFrame
    combined_df = pd.concat(data_l, ignore_index=True)

    # return
    return combined_df




**Class: api_manager**

*Methods:*

`__init__(api_l: list):`
- Constructor taking a list of API keys (`api_l`) as input.
- Calls `_update_api_l` method to update the list of API keys and set counters.
- Sets the filename `api_fn` to 'api_key.txt' for storing the current API key.

`_update_api_l(api_l: list):`
- Updates the list of API keys (`self.api_l`) with the provided `api_l`.
- Sets the number of API keys (`self.n_api`) and initializes the counter (`self.n`) to 0.
- Calls `cycle_api` method to set the initial API key.

`n_up():`
- Increments the counter (`self.n`) for cycling through the API keys.
- If the counter reaches the end of the list (`self.n == self.n_api`), resets the counter to 0.

`get_api(n=None):`
- Returns the API key at the specified index `n` from the list `self.api_l`.
- If `n` is None, returns the current API key (`self.api`).

`update_api_f():`
- Updates the `api_key.txt` file with the current API key (`self.api`).
- Opens the file in write mode and writes the current API key to it.
- Returns the filename (`self.api_fn`).

`cycle_api():`
- Cycles to the next API key in the list.
- Calls `get_api` with the current counter value (`self.n`) to get the next API key.
- Calls `n_up` to increment the counter for the next cycle.
- Returns the new API key (`self.api`).

`cycle_api_file():`
- Cycles to the next API key and updates the `api_key.txt` file with the new API key.
- Calls `cycle_api` to get the next API key.
- Calls `update_api_f` to update the `api_key.txt` file with the new API key.
- Returns the filename (`self.api_fn`).

`update_fred():`
- Creates a Fred object from the `full_fred` package using the current API key.
- Calls `update_api_f` to update the `api_key.txt` file with the current API key.
- Returns a new Fred object initialized with the updated API key file.

`cycle_fred():`
- Cycles to the next API key and creates a new Fred object with the new API key.
- Calls `cycle_api_file` to cycle to the next API key and update the `api_key.txt` file.
- Returns a new Fred object initialized with the updated API key file.
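
The round-robin behaviour described above can be sketched without the `full_fred` dependency or the key file. This is a simplified illustration, not the notebook's class: `KeyRotator` and the key strings are hypothetical.

```python
class KeyRotator:
    """Round-robin over a fixed list of API keys (hypothetical helper)."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("At least one API key is required")
        self._keys = list(keys)
        self._index = 0

    @property
    def current(self):
        # Key currently in use
        return self._keys[self._index]

    def cycle(self):
        # Advance to the next key, wrapping to the first after the last
        self._index = (self._index + 1) % len(self._keys)
        return self.current


rotator = KeyRotator(["key-a", "key-b", "key-c"])
print(rotator.current)   # key-a
print(rotator.cycle())   # key-b
print(rotator.cycle())   # key-c
print(rotator.cycle())   # key-a (wrapped around)
```

As in `api_manager`, the first key is active right after construction and each cycle moves one position forward.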

---

**Class: fred_df_manager**

*Methods:*

`__init__(self, config_dict):`
- Constructor initializing an empty `config_dict` dictionary.
- Calls the `update_config` method with the provided `config_dict`.

`update_config(self, config_dict):`
- Checks if `config_dict` contains required keys: 'api_url', 'date', 'series_id', 'metric_name', and 'api_manager'.
- Raises a `ValueError` if any of these keys are missing.
- Updates `self.config_dict` with the provided `config_dict`.
- Assigns values from `config_dict` to class attributes: `api_url_s`, `date`, `series_id`, `metric_name`, and `api_manager`.

`api_json(self, url_s, payload):`
- Sends an API request to the specified `url_s` with the provided `payload` parameters.
- Returns the JSON response from the API.

`get_info(self):`
- Constructs a payload dictionary with the current API key, file type, series ID, and date.
- Calls the `api_json` method with the `api_url_s` and the constructed payload to retrieve the metadata information.
- Prints the key-value pairs from the metadata information dictionary.
- Returns the metadata information dictionary.

`get_data_fred(self, sid, tries: int = 3):`
- Attempts to download data for the specified series ID (`sid`) using the `fred.get_series_df` method.
- If download fails, cycles to the next API key using `api_manager.cycle_fred()` and retries the download.
- Number of retries controlled by the `tries` parameter (default is 3).
- If download is successful, returns the data DataFrame.
- If download fails after all retries, raises a `ValueError`.

`get_data(self, sid, tries: int = 3):`
- Updates the `fred` attribute with a new Fred object using `api_manager.update_fred()`.
- Calls the `get_data_fred` method with the specified series ID (`sid`) and number of tries (`tries`).
- Returns the data DataFrame.

`download_data(self, rename_dict=None, tries: int = 3):`
- Retrieves the information dictionary by calling `get_info()`.
- Updates the `fred` attribute with a new Fred object using `api_manager.update_fred()`.
- Defines a list of column names (`col_l`) for the resulting DataFrame.
- Initializes an empty dictionary `rename_dict` if none is provided.
- Initializes an empty list `data_l` to store the DataFrames.
- Iterates over the data in the information dictionary using tqdm for a progress bar.
- For each data dictionary, downloads the data using `get_data_fred` with the specified number of tries (`tries`).
- Renames the columns in the data DataFrame using the `rename_dict`.
- Adds additional information from the data dictionary to the DataFrame.
- Appends the DataFrame to the `data_l` list.
- After iterating over all data, concatenates the DataFrames in `data_l` into a single DataFrame `combined_df`.
- Returns the `combined_df`.

The `fred_df_manager` class provides a convenient way to download data from the FRED API by managing API key cycling, handling exceptions, and combining retrieved data into a single pandas DataFrame. It requires a configuration dictionary with specific keys ('api_url', 'date', 'series_id', 'metric_name', and 'api_manager') to function correctly.
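
The retry-and-switch pattern used by `get_data_fred` can be illustrated in isolation. The sketch below is a simplification under stated assumptions: `fetch_with_rotation` and `fake_fetch` are hypothetical names, the download is simulated rather than sent to FRED, and rotation restarts from the first key on every call (the notebook's class keeps its position between calls).

```python
def fetch_with_rotation(fetch, keys, tries=3):
    """Call fetch(key), moving to the next key after each failure."""
    if not keys:
        raise ValueError("At least one API key is required")
    last_error = None
    for attempt in range(tries):
        # Pick the key for this attempt, wrapping around the list
        key = keys[attempt % len(keys)]
        try:
            return fetch(key)
        except Exception as err:
            # Remember the failure and try again with the next key
            last_error = err
    raise ValueError("Unable to download data") from last_error


def fake_fetch(key):
    # Stand-in for a real download: only 'key-b' succeeds
    if key != "key-b":
        raise RuntimeError("Too Many Requests")
    return {"key": key, "rows": 3}


result = fetch_with_rotation(fake_fetch, ["key-a", "key-b"])
print(result)  # {'key': 'key-b', 'rows': 3}
```

Chaining the final `ValueError` to the last underlying error keeps the original failure visible in the traceback.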

---

## Simple Explanation:

The `fred_df_manager` class does the following:

1. Takes in the parameters.

  These include:
  * URL of the data endpoint
  * Snapshot date
  * Metric series ID
  * Metric name
  * An `api_manager` instance

2. Retrieves metadata using the FRED API.

  The data has to be downloaded separately for each county, and the metadata lists the series ID for each county's data.
  It does the following:
  - Constructs a payload with the necessary parameters and sends a request to the FRED API.
  - Parses the response into a dictionary containing metadata about the available data series.

3. Creates an instance of the `Fred` class with the current API key.

  Because there is a limit on how often data can be retrieved, the API key is cycled when a download fails.

  This includes:
  - Cycling the API key if necessary.
  - Creating a new instance of the `Fred` class using the updated API key.

4. Iterates over the metadata and downloads the data.

  * Iterates over each county series in the metadata.
  * For each county, it:
    - Downloads the data using the `Fred` instance.
    - Renames columns of the resulting DataFrame using `rename_dict`.
    - Adds the county's metadata (region, code, snapshot value, series ID) as columns.
    - Collects the retrieved DataFrame.

5. Collates the DataFrames into a single DataFrame.

  Combines the individual DataFrames into a single DataFrame.
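
Steps 4 and 5 can be sketched with made-up data. Everything in this example is illustrative: `meta`, `fake_download`, the county names, and the `metric` column name are hypothetical stand-ins for the metadata records and the per-series download.

```python
import pandas as pd

# Hypothetical metadata entries, one per county series
meta = [
    {"region": "County A, VA", "code": 1, "series_id": "SERIES_A"},
    {"region": "County B, ME", "code": 2, "series_id": "SERIES_B"},
]


def fake_download(series_id):
    # Stand-in for the per-series download; returns a small time series
    return pd.DataFrame({"date": ["2024-01-01", "2024-02-01"], "value": [1.0, 2.0]})


frames = []
for record in meta:
    df = fake_download(record["series_id"]).rename(columns={"value": "metric"})
    # Attach the county's metadata to every row of its time series
    for key, val in record.items():
        df[key] = val
    frames.append(df)

# Stack the per-county frames into one long DataFrame
combined_df = pd.concat(frames, ignore_index=True)
print(combined_df.shape)  # (4, 5)
```

Renaming the time-series column before attaching the metadata avoids a name clash if the metadata record also carries a `value` field.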


---

## Datasets

**Traffic Volume**
1. Portland Traffic Volume

  Link: https://shorturl.at/aiDJL

2. Virginia State Traffic Volume

  Link: https://shorturl.at/cIS89

3. New York State Traffic Volume

  Link: https://shorturl.at/yzDH4

**Econometric Data**
4. Market Hotness

  Link: https://fred.stlouisfed.org/series/MELIPRCOUNTY6059

5. Equifax Subprime Credit Population

  Link: https://fred.stlouisfed.org/series/EQFXSUBPRIME036061

6. Unemployment Rate

  Link: https://fred.stlouisfed.org/series/UNRATE

7. Annual Population

  Link: https://www.census.gov/popclock/


---

# Unemployment Rate

In [None]:
# Settings
snapshot_date = '2024-02-01'
series_id = 'WVMERC0URN'
metric_name = 'unemployment_rate'


# API Manager
am = api_manager(API_Key_l)

# CONFIG
config_dict = {'api_url': 'https://api.stlouisfed.org/geofred/series/data?'
               , 'date': snapshot_date
               , 'series_id': series_id
               , 'metric_name': metric_name
               , 'api_manager': am
              }

# download manager
dl_manager = fred_df_manager(config_dict)

# download
data_df = dl_manager.download_data({'value': metric_name})
data_df.head()

title: 2024 February Unemployment Rate by County (Percent)
region: county
seasonality: Not Seasonally Adjusted
units: Percent
frequency: Monthly


Downloading data:   4%|▍         | 119/3139 [00:33<13:12,  3.81it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data: 100%|██████████| 3139/3139 [13:26<00:00,  3.89it/s]


| | realtime_start | realtime_end | date | unemployment_rate | region | code | value | series_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-04-21 | 2024-04-21 | 1990-01-01 | 1.8 | Manassas Park City, VA | 51685 | 2.5 | VAMANA5URN |
| 1 | 2024-04-21 | 2024-04-21 | 1990-02-01 | 2.0 | Manassas Park City, VA | 51685 | 2.5 | VAMANA5URN |
| 2 | 2024-04-21 | 2024-04-21 | 1990-03-01 | 1.7 | Manassas Park City, VA | 51685 | 2.5 | VAMANA5URN |
| 3 | 2024-04-21 | 2024-04-21 | 1990-04-01 | 2.1 | Manassas Park City, VA | 51685 | 2.5 | VAMANA5URN |
| 4 | 2024-04-21 | 2024-04-21 | 1990-05-01 | 2.0 | Manassas Park City, VA | 51685 | 2.5 | VAMANA5URN |
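
The `Too Many Requests` messages in the download log show the run hitting the rate limit and switching keys. A complement to key cycling is to pace requests on the client side. A minimal sketch, assuming a budget of 120 requests per 60 seconds (an assumption made for this example; check the FRED API terms for the actual limit) and a hypothetical `RateLimiter` helper:

```python
import time
from collections import deque


class RateLimiter:
    """Block until a request slot is free within a sliding time window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls

    def wait(self):
        now = self._clock()
        # Forget calls that have left the window
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)


limiter = RateLimiter(max_calls=120, period=60.0)
# Call limiter.wait() before each request, e.g. at the top of the download loop
```

The `clock` and `sleep` parameters exist only so the limiter can be exercised with a fake clock instead of real waiting.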


In [None]:
# Cleaning
col_l = ['date', 'year', 'quarter', 'month', 'country', 'region', 'state', 'county', 'code', metric_name]

data_df['date'] = pd.to_datetime(data_df['date'])
data_df['year'] = data_df['date'].dt.year
data_df['quarter'] = data_df['date'].dt.quarter
data_df['month'] = data_df['date'].dt.month
data_df['date'] = data_df['date'].dt.date

data_df['country'] = 'USA'
data_df['state'] = data_df['region'].apply(lambda x: str(x).split(',')[-1].strip())
data_df['county'] = data_df['region'].apply(lambda x: str(x).split(',')[0].strip())

unemployment_rate_df = data_df[col_l].drop_duplicates()

unemployment_rate_df.head()

| | date | year | quarter | month | country | region | state | county | code | unemployment_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990-01-01 | 1990 | 1 | 1 | USA | Manassas Park City, VA | VA | Manassas Park City | 51685 | 1.8 |
| 1 | 1990-02-01 | 1990 | 1 | 2 | USA | Manassas Park City, VA | VA | Manassas Park City | 51685 | 2.0 |
| 2 | 1990-03-01 | 1990 | 1 | 3 | USA | Manassas Park City, VA | VA | Manassas Park City | 51685 | 1.7 |
| 3 | 1990-04-01 | 1990 | 2 | 4 | USA | Manassas Park City, VA | VA | Manassas Park City | 51685 | 2.1 |
| 4 | 1990-05-01 | 1990 | 2 | 5 | USA | Manassas Park City, VA | VA | Manassas Park City | 51685 | 2.0 |
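
The `state` and `county` columns come from splitting the `region` string at a comma. A small variant of that split, written as a standalone function, is sketched below. `split_region` is a hypothetical helper; unlike the cleaning cell, it splits at the last comma only and returns an empty state when no comma is present.

```python
def split_region(region):
    """Split 'County Name, ST' into (county, state) at the last comma."""
    text = str(region)
    county, sep, state = text.rpartition(",")
    if not sep:
        # No comma: treat the whole string as the county name
        return text.strip(), ""
    return county.strip(), state.strip()


print(split_region("Manassas Park City, VA"))  # ('Manassas Park City', 'VA')
```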


In [None]:
# download data
fn = f'{metric_name}.csv'
unemployment_rate_df.to_csv(fn, index=False)
files.download(fn)

---

# Market Hotness: Median Listing Price (Percent Change)

In [None]:
# Settings
snapshot_date = '2024-02-01'
series_id = 'MELIPRMMCOUNTY6059'
metric_name = 'market_hotness_prec_change'


# API Manager
am = api_manager(API_Key_l)

# CONFIG
config_dict = {'api_url': 'https://api.stlouisfed.org/geofred/series/data?'
               , 'date': snapshot_date
               , 'series_id': series_id
               , 'metric_name': metric_name
               , 'api_manager': am
              }

# download manager
dl_manager = fred_df_manager(config_dict)

# # test
# data_df = dl_manager.get_data(series_id)


# # download
data_df = dl_manager.download_data({'value': metric_name})

# PREVIEW
data_df.head()

title: 2024 February Market Hotness: Median Listing Price by County (Percent Change)
region: county
seasonality: Not Seasonally Adjusted
units: Percent Change
frequency: Monthly


Downloading data:  20%|██        | 203/995 [00:48<03:02,  4.33it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data: 100%|██████████| 995/995 [03:56<00:00,  4.21it/s]


| | realtime_start | realtime_end | date | market_hotness_prec_change | region | code | value | series_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-04-21 | 2024-04-21 | 2017-07-01 | . | Cumberland County, ME | 23005 | 0.900901 | MELIPRMMCOUNTY23005 |
| 1 | 2024-04-21 | 2024-04-21 | 2017-08-01 | -0.6941327 | Cumberland County, ME | 23005 | 0.900901 | MELIPRMMCOUNTY23005 |
| 2 | 2024-04-21 | 2024-04-21 | 2017-09-01 | -2.6290166 | Cumberland County, ME | 23005 | 0.900901 | MELIPRMMCOUNTY23005 |
| 3 | 2024-04-21 | 2024-04-21 | 2017-10-01 | -0.1571429 | Cumberland County, ME | 23005 | 0.900901 | MELIPRMMCOUNTY23005 |
| 4 | 2024-04-21 | 2024-04-21 | 2017-11-01 | -1.2734297 | Cumberland County, ME | 23005 | 0.900901 | MELIPRMMCOUNTY23005 |


In [None]:
# Cleaning
col_l = ['date', 'year', 'quarter', 'month', 'country', 'region', 'state', 'county', 'code', metric_name]

data_df['date'] = pd.to_datetime(data_df['date'])
data_df['year'] = data_df['date'].dt.year
data_df['quarter'] = data_df['date'].dt.quarter
data_df['month'] = data_df['date'].dt.month
data_df['date'] = data_df['date'].dt.date

data_df['country'] = 'USA'
data_df['state'] = data_df['region'].apply(lambda x: str(x).split(',')[-1].strip())
data_df['county'] = data_df['region'].apply(lambda x: str(x).split(',')[0].strip())

market_hotness_rate_df = data_df[col_l].drop_duplicates()

market_hotness_rate_df.head()

| | date | year | quarter | month | country | region | state | county | code | market_hotness_prec_change |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-07-01 | 2017 | 3 | 7 | USA | Cumberland County, ME | ME | Cumberland County | 23005 | . |
| 1 | 2017-08-01 | 2017 | 3 | 8 | USA | Cumberland County, ME | ME | Cumberland County | 23005 | -0.6941327 |
| 2 | 2017-09-01 | 2017 | 3 | 9 | USA | Cumberland County, ME | ME | Cumberland County | 23005 | -2.6290166 |
| 3 | 2017-10-01 | 2017 | 4 | 10 | USA | Cumberland County, ME | ME | Cumberland County | 23005 | -0.1571429 |
| 4 | 2017-11-01 | 2017 | 4 | 11 | USA | Cumberland County, ME | ME | Cumberland County | 23005 | -1.2734297 |
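
The `.` entries in the metric column appear to mark missing observations, which would leave the column stored as text. One way to obtain a numeric column is `pandas.to_numeric` with `errors="coerce"`. A minimal sketch on three values copied from the preview (the `raw` and `numeric` names are illustrative):

```python
import pandas as pd

# Sample values from the preview; '.' is assumed to mark a missing observation
raw = pd.Series([".", "-0.6941327", "-2.6290166"])

# errors='coerce' turns anything non-numeric (here '.') into NaN
numeric = pd.to_numeric(raw, errors="coerce")

print(numeric.isna().tolist())  # [True, False, False]
print(numeric.dtype)            # float64
```

Applied to the notebook's data, the same call could be run on `data_df[metric_name]` before saving the CSV.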


In [None]:
# download data
fn = f'{metric_name}.csv'
market_hotness_rate_df.to_csv(fn, index=False)
files.download(fn)

---

# Market Hotness: Median Listing Price (U.S. Dollars)

In [None]:
# Settings
snapshot_date = '2024-02-01'
series_id = 'MELIPRCOUNTY6059'
metric_name = 'market_hotness'


# API Manager
am = api_manager(API_Key_l)

# CONFIG
config_dict = {'api_url': 'https://api.stlouisfed.org/geofred/series/data?'
               , 'date': snapshot_date
               , 'series_id': series_id
               , 'metric_name': metric_name
               , 'api_manager': am
              }

# download manager
dl_manager = fred_df_manager(config_dict)

# # test
# market_hotness_rate_df = dl_manager.get_data(series_id)


# # download
data_df = dl_manager.download_data({'value': metric_name})

# PREVIEW
data_df.head()

title: 2024 February Market Hotness: Median Listing Price by County (U.S. Dollars)
region: county
seasonality: Not Seasonally Adjusted
units: U.S. Dollars
frequency: Monthly


Downloading data:  20%|██        | 203/995 [00:49<03:08,  4.21it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data: 100%|██████████| 995/995 [04:07<00:00,  4.02it/s]


| | realtime_start | realtime_end | date | market_hotness | region | code | value | series_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-04-21 | 2024-04-21 | 2016-07-01 | . | Sebastian County, AR | 5131 | 279950.0 | MELIPRCOUNTY5131 |
| 1 | 2024-04-21 | 2024-04-21 | 2016-08-01 | . | Sebastian County, AR | 5131 | 279950.0 | MELIPRCOUNTY5131 |
| 2 | 2024-04-21 | 2024-04-21 | 2016-09-01 | . | Sebastian County, AR | 5131 | 279950.0 | MELIPRCOUNTY5131 |
| 3 | 2024-04-21 | 2024-04-21 | 2016-10-01 | . | Sebastian County, AR | 5131 | 279950.0 | MELIPRCOUNTY5131 |
| 4 | 2024-04-21 | 2024-04-21 | 2016-11-01 | . | Sebastian County, AR | 5131 | 279950.0 | MELIPRCOUNTY5131 |


In [None]:
# Cleaning
col_l = ['date', 'year', 'quarter', 'month', 'country', 'region', 'state', 'county', 'code', metric_name]

data_df['date'] = pd.to_datetime(data_df['date'])
data_df['year'] = data_df['date'].dt.year
data_df['quarter'] = data_df['date'].dt.quarter
data_df['month'] = data_df['date'].dt.month
data_df['date'] = data_df['date'].dt.date

data_df['country'] = 'USA'
data_df['state'] = data_df['region'].apply(lambda x: str(x).split(',')[-1].strip())
data_df['county'] = data_df['region'].apply(lambda x: str(x).split(',')[0].strip())


market_hotness_df = data_df[col_l].drop_duplicates()

market_hotness_df.head()

| | date | year | quarter | month | country | region | state | county | code | market_hotness |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-07-01 | 2016 | 3 | 7 | USA | Sebastian County, AR | AR | Sebastian County | 5131 | . |
| 1 | 2016-08-01 | 2016 | 3 | 8 | USA | Sebastian County, AR | AR | Sebastian County | 5131 | . |
| 2 | 2016-09-01 | 2016 | 3 | 9 | USA | Sebastian County, AR | AR | Sebastian County | 5131 | . |
| 3 | 2016-10-01 | 2016 | 4 | 10 | USA | Sebastian County, AR | AR | Sebastian County | 5131 | . |
| 4 | 2016-11-01 | 2016 | 4 | 11 | USA | Sebastian County, AR | AR | Sebastian County | 5131 | . |


In [None]:
# download data
fn = f'{metric_name}.csv'
market_hotness_df.to_csv(fn, index=False)
files.download(fn)

---

# Annual Population

In [None]:
# Settings
snapshot_date = '2023-01-01'
series_id = 'CASANF0POP'
metric_name = 'population_annual'


# API Manager
am = api_manager(API_Key_l)

# CONFIG
config_dict = {'api_url': 'https://api.stlouisfed.org/geofred/series/data?'
               , 'date': snapshot_date
               , 'series_id': series_id
               , 'metric_name': metric_name
               , 'api_manager': am
              }

# download manager
dl_manager = fred_df_manager(config_dict)


# # test
# data_df = dl_manager.get_data(series_id)


# # download
data_df = dl_manager.download_data({'value': metric_name})

# PREVIEW
data_df.head()

title: 2023 Resident Population by County (Thousands of Persons)
region: county
seasonality: Not Seasonally Adjusted
units: Thousands of Persons
frequency: Annual


Downloading data:   5%|▌         | 163/3127 [00:40<15:13,  3.24it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data: 100%|██████████| 3127/3127 [12:34<00:00,  4.14it/s]


| | realtime_start | realtime_end | date | population_annual | region | code | value | series_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-04-21 | 2024-04-21 | 1970-01-01 | 5.199 | Rappahannock County, VA | 51157 | 7.414 | VARAPP7POP |
| 1 | 2024-04-21 | 2024-04-21 | 1971-01-01 | 5.2 | Rappahannock County, VA | 51157 | 7.414 | VARAPP7POP |
| 2 | 2024-04-21 | 2024-04-21 | 1972-01-01 | 5.3 | Rappahannock County, VA | 51157 | 7.414 | VARAPP7POP |
| 3 | 2024-04-21 | 2024-04-21 | 1973-01-01 | 5.2 | Rappahannock County, VA | 51157 | 7.414 | VARAPP7POP |
| 4 | 2024-04-21 | 2024-04-21 | 1974-01-01 | 5.4 | Rappahannock County, VA | 51157 | 7.414 | VARAPP7POP |


In [None]:
# Cleaning
col_l = ['date', 'year'
          #, 'quarter'
          #, 'month'
          , 'country', 'region', 'state', 'county', 'code', metric_name]

data_df['date'] = pd.to_datetime(data_df['date'])
data_df['year'] = data_df['date'].dt.year
# data_df['quarter'] = data_df['date'].dt.quarter
# data_df['month'] = data_df['date'].dt.month
data_df['date'] = data_df['date'].dt.date

data_df['country'] = 'USA'
data_df['state'] = data_df['region'].apply(lambda x: str(x).split(',')[-1].strip())
data_df['county'] = data_df['region'].apply(lambda x: str(x).split(',')[0].strip())


population_df = data_df[col_l].drop_duplicates()

population_df.head()

Unnamed: 0,date,year,country,region,state,county,code,population_annual
0,1970-01-01,1970,USA,"Rappahannock County, VA",VA,Rappahannock County,51157,5.199
1,1971-01-01,1971,USA,"Rappahannock County, VA",VA,Rappahannock County,51157,5.2
2,1972-01-01,1972,USA,"Rappahannock County, VA",VA,Rappahannock County,51157,5.3
3,1973-01-01,1973,USA,"Rappahannock County, VA",VA,Rappahannock County,51157,5.2
4,1974-01-01,1974,USA,"Rappahannock County, VA",VA,Rappahannock County,51157,5.4


In [None]:
# download data
fn = f'{metric_name}.csv'
population_df.to_csv(fn, index=False)
files.download(fn)


---
---
---

In [None]:
# Settings
snapshot_date = r'2023-10-01'
series_id = 'EQFXSUBPRIME036061'
metric_name = 'Equifax_Subprime_Credit_Population'


# API Manager
am = api_manager(API_Key_l)

# CONFIG
config_dict = {'api_url': 'https://api.stlouisfed.org/geofred/series/data?'
               , 'date': snapshot_date
               , 'series_id': series_id
               , 'metric_name': metric_name
               , 'api_manager': am
              }

# download manager
dl_manager = fred_df_manager(config_dict)


# # test
# data_df = dl_manager.get_data(series_id)


# # download
data_df = dl_manager.download_data({'value': metric_name})

# PREVIEW
data_df.head()

title: 2023 Q4 Equifax Subprime Credit Population by County (Percent)
region: county
seasonality: Not Seasonally Adjusted
units: Percent
frequency: Quarterly


Downloading data:   4%|▍         | 119/3131 [00:27<13:06,  3.83it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:   8%|▊         | 266/3131 [01:03<12:08,  3.93it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  12%|█▏        | 385/3131 [01:32<12:53,  3.55it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  16%|█▋        | 509/3131 [02:02<09:24,  4.65it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  20%|██        | 629/3131 [02:30<08:43,  4.78it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  24%|██▍       | 763/3131 [03:01<09:07,  4.33it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  28%|██▊       | 883/3131 [03:28<09:13,  4.06it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  33%|███▎      | 1026/3131 [04:05<08:52,  3.95it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  37%|███▋      | 1146/3131 [04:33<08:45,  3.78it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  40%|████      | 1266/3131 [05:02<08:43,  3.56it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  44%|████▍     | 1386/3131 [05:30<06:50,  4.25it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  49%|████▊     | 1519/3131 [06:01<05:54,  4.55it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  52%|█████▏    | 1640/3131 [06:32<06:30,  3.82it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  56%|█████▋    | 1769/3131 [07:02<05:37,  4.04it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  60%|██████    | 1889/3131 [07:32<05:15,  3.94it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  64%|██████▍   | 2015/3131 [08:02<04:55,  3.77it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  68%|██████▊   | 2134/3131 [08:29<03:56,  4.22it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  73%|███████▎  | 2272/3131 [09:01<03:57,  3.61it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  76%|███████▋  | 2392/3131 [09:30<02:35,  4.76it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  81%|████████  | 2525/3131 [10:01<02:07,  4.77it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  84%|████████▍ | 2645/3131 [10:28<01:42,  4.73it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  89%|████████▉ | 2786/3131 [11:02<01:14,  4.63it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  93%|█████████▎| 2906/3131 [11:30<00:48,  4.62it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  97%|█████████▋| 3043/3131 [12:01<00:23,  3.77it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data: 100%|██████████| 3131/3131 [12:23<00:00,  4.21it/s]


Unnamed: 0,realtime_start,realtime_end,date,Equifax_Subprime_Credit_Population,region,code,value,series_id
0,2024-04-21,2024-04-21,1999-01-01,.,"Martin County, TX",48317,32.105263,EQFXSUBPRIME048317
1,2024-04-21,2024-04-21,1999-04-01,.,"Martin County, TX",48317,32.105263,EQFXSUBPRIME048317
2,2024-04-21,2024-04-21,1999-07-01,.,"Martin County, TX",48317,32.105263,EQFXSUBPRIME048317
3,2024-04-21,2024-04-21,1999-10-01,.,"Martin County, TX",48317,32.105263,EQFXSUBPRIME048317
4,2024-04-21,2024-04-21,2000-01-01,.,"Martin County, TX",48317,32.105263,EQFXSUBPRIME048317


In [None]:
# Cleaning
col_l = ['date', 'year', 'quarter'
          #, 'month'
          , 'country', 'region', 'state', 'county', 'code', metric_name]

data_df['date'] = pd.to_datetime(data_df['date'])
data_df['year'] = data_df['date'].dt.year
data_df['quarter'] = data_df['date'].dt.quarter
# data_df['month'] = data_df['date'].dt.month
data_df['date'] = data_df['date'].dt.date

data_df['country'] = 'USA'
data_df['state'] = data_df['region'].apply(lambda x: str(x).split(',')[-1].strip())
data_df['county'] = data_df['region'].apply(lambda x: str(x).split(',')[0].strip())


Equifax_Subprime_Credit_df = data_df[col_l].drop_duplicates()

Equifax_Subprime_Credit_df.head()

Unnamed: 0,date,year,quarter,country,region,state,county,code,Equifax_Subprime_Credit_Population
0,1999-01-01,1999,1,USA,"Martin County, TX",TX,Martin County,48317,.
1,1999-04-01,1999,2,USA,"Martin County, TX",TX,Martin County,48317,.
2,1999-07-01,1999,3,USA,"Martin County, TX",TX,Martin County,48317,.
3,1999-10-01,1999,4,USA,"Martin County, TX",TX,Martin County,48317,.
4,2000-01-01,2000,1,USA,"Martin County, TX",TX,Martin County,48317,.
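
The preview above shows `.` in the metric column for the earliest quarters, which appears to be FRED's placeholder for a missing observation. Left as is, the column stays a string column. A minimal sketch of converting it to numbers follows; the small `sample_df` and its values are made up for illustration and stand in for the real `data_df`.

```python
import pandas as pd

metric_col = 'Equifax_Subprime_Credit_Population'

# Illustrative stand-in for the downloaded frame (values are made up).
sample_df = pd.DataFrame({
    'date': ['1999-01-01', '1999-04-01', '2005-01-01'],
    metric_col: ['.', '.', '32.105263'],
})

# errors='coerce' turns anything that is not a number (here '.') into NaN.
sample_df[metric_col] = pd.to_numeric(sample_df[metric_col], errors='coerce')

# Optionally keep only the rows that carry an observation.
observed_df = sample_df.dropna(subset=[metric_col])
```

Whether to drop the missing rows or keep them as `NaN` depends on how the later GIS joins treat gaps.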


In [None]:
# download data
fn = f'{metric_name}.csv'
Equifax_Subprime_Credit_df.to_csv(fn, index=False)
files.download(fn)


---
---
---

In [None]:
# Settings
snapshot_date = r'2024-02-01'
series_id = 'ORMULT1LFN'
metric_name = 'Civilian_Labor_Force'


# API Manager
am = api_manager(API_Key_l)

# CONFIG
config_dict = {'api_url': 'https://api.stlouisfed.org/geofred/series/data?'
               , 'date': snapshot_date
               , 'series_id': series_id
               , 'metric_name': metric_name
               , 'api_manager': am
              }

# download manager
dl_manager = fred_df_manager(config_dict)


# # test
# data_df = dl_manager.get_data(series_id)


# # download
data_df = dl_manager.download_data({'value': metric_name})

# PREVIEW
data_df.head()

title: 2024 February Civilian Labor Force by County (Persons)
region: county
seasonality: Not Seasonally Adjusted
units: Persons
frequency: Monthly


Downloading data:   5%|▍         | 151/3139 [00:36<10:34,  4.71it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:   9%|▊         | 271/3139 [01:07<13:54,  3.44it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  12%|█▏        | 392/3139 [01:36<13:01,  3.51it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  16%|█▋        | 512/3139 [02:06<11:17,  3.88it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  20%|██        | 636/3139 [02:40<11:44,  3.55it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  27%|██▋       | 860/3139 [03:39<10:02,  3.79it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  35%|███▍      | 1088/3139 [04:39<08:23,  4.07it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  42%|████▏     | 1313/3139 [05:38<07:17,  4.17it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  49%|████▉     | 1535/3139 [06:40<07:32,  3.55it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  56%|█████▋    | 1766/3139 [07:37<06:11,  3.70it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  60%|██████    | 1886/3139 [08:07<04:55,  4.25it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  64%|██████▍   | 2007/3139 [08:37<04:36,  4.09it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  68%|██████▊   | 2127/3139 [09:07<04:03,  4.16it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  72%|███████▏  | 2248/3139 [09:38<03:28,  4.28it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  79%|███████▉  | 2487/3139 [10:38<03:00,  3.62it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  87%|████████▋ | 2719/3139 [11:38<01:50,  3.79it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data:  94%|█████████▍| 2955/3139 [12:40<00:51,  3.59it/s]

Error Message: Too Many Requests.  Exceeded Rate Limit
Switching Keys: 0


Downloading data: 100%|██████████| 3139/3139 [13:28<00:00,  3.88it/s]


Unnamed: 0,realtime_start,realtime_end,date,Civilian_Labor_Force,region,code,value,series_id
0,2024-04-21,2024-04-21,1990-01-01,7121,"Westmoreland County, VA",51193,9616,VAWEST3LFN
1,2024-04-21,2024-04-21,1990-02-01,7022,"Westmoreland County, VA",51193,9616,VAWEST3LFN
2,2024-04-21,2024-04-21,1990-03-01,7257,"Westmoreland County, VA",51193,9616,VAWEST3LFN
3,2024-04-21,2024-04-21,1990-04-01,7362,"Westmoreland County, VA",51193,9616,VAWEST3LFN
4,2024-04-21,2024-04-21,1990-05-01,7712,"Westmoreland County, VA",51193,9616,VAWEST3LFN


In [None]:
# Cleaning
col_l = ['date', 'year', 'quarter', 'month', 'country', 'region', 'state', 'county', 'code', metric_name]

data_df['date'] = pd.to_datetime(data_df['date'])
data_df['year'] = data_df['date'].dt.year
data_df['quarter'] = data_df['date'].dt.quarter
data_df['month'] = data_df['date'].dt.month
data_df['date'] = data_df['date'].dt.date

data_df['country'] = 'USA'
data_df['state'] = data_df['region'].apply(lambda x: str(x).split(',')[-1].strip())
data_df['county'] = data_df['region'].apply(lambda x: str(x).split(',')[0].strip())


Civilian_Labor_Force_df = data_df[col_l].drop_duplicates()

Civilian_Labor_Force_df.head()

Unnamed: 0,date,year,quarter,month,country,region,state,county,code,Civilian_Labor_Force
0,1990-01-01,1990,1,1,USA,"Westmoreland County, VA",VA,Westmoreland County,51193,7121
1,1990-02-01,1990,1,2,USA,"Westmoreland County, VA",VA,Westmoreland County,51193,7022
2,1990-03-01,1990,1,3,USA,"Westmoreland County, VA",VA,Westmoreland County,51193,7257
3,1990-04-01,1990,2,4,USA,"Westmoreland County, VA",VA,Westmoreland County,51193,7362
4,1990-05-01,1990,2,5,USA,"Westmoreland County, VA",VA,Westmoreland County,51193,7712
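
The three cleaning cells differ only in which date parts they keep (year for annual data, plus quarter and month for higher frequencies). A sketch of one shared helper is shown below; the function name `clean_fred_df` and its `date_parts` parameter are hypothetical and do not exist in the notebook.

```python
import pandas as pd


def clean_fred_df(data_df, metric_name, date_parts=('year',)):
    """Add date parts and location columns, then return the de-duplicated subset.

    Hypothetical helper. `date_parts` is any combination of
    'year', 'quarter' and 'month'.
    """
    out = data_df.copy()  # leave the caller's frame untouched
    dates = pd.to_datetime(out['date'])
    for part in date_parts:
        out[part] = getattr(dates.dt, part)
    out['date'] = dates.dt.date

    out['country'] = 'USA'
    # 'Westmoreland County, VA' -> county 'Westmoreland County', state 'VA'
    region = out['region'].astype(str)
    out['county'] = region.str.split(',').str[0].str.strip()
    out['state'] = region.str.split(',').str[-1].str.strip()

    col_l = ['date', *date_parts, 'country', 'region', 'state', 'county', 'code', metric_name]
    return out[col_l].drop_duplicates()
```

For the monthly labor force data this would be called as `clean_fred_df(data_df, metric_name, ('year', 'quarter', 'month'))`, and for the annual population data with the default `('year',)`.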


In [None]:
# download data
fn = f'{metric_name}.csv'
Civilian_Labor_Force_df.to_csv(fn, index=False)
files.download(fn)


---
---
---

# Traffic Volume Data

In [None]:
import time
from tqdm import tqdm
import pandas as pd
import requests
import json

In [None]:
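def api_json(url_s, payload):
    # Send the API request and return the parsed JSON response
    r = requests.get(url_s, params=payload)
    r.raise_for_status()  # Surface HTTP errors so the caller's retry logic can handle them
    return r.json()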

In [None]:
import time
from tqdm import tqdm
import pandas as pd

# Set the API endpoint URL
url_s = r'https://data.cityofnewyork.us/resource/7ym2-wayt.json'

# Control variables
i = 50_000  # Batch size (limit) for fetching data
n = 1_600_000  # MAX number of records to fetch
n = 500_000  # Test number of records to fetch # for demo

# Define the expected data types for the columns
data_dict = {'boro': str
             , 'yr': str
             , 'm': str
             , 'd': str
             , 'segmentid': str
             , 'street': str
             , 'vol': int}

# List of column names for grouping the data
group_col_l = ['boro', 'yr', 'm', 'd', 'segmentid', 'street']

r = {}  # Initialize an empty dictionary to store the API response
data_l = []  # Initialize an empty list to store the DataFrames

# Iterate over the range of offsets
for offset in tqdm(range(0, n, i)):
    print(f'{offset} to {offset+i}')  # Print the current offset range

    # Make up to 4 attempts per batch (the initial request plus 3 retries)
    for retry in range(4):
        try:
            payload = {'$limit': i, '$offset': offset}  # Set the payload for the API request
            r = api_json(url_s, payload)  # Fetch one batch with api_json (defined above)
            data_df = pd.DataFrame(r).astype(data_dict)  # Create a DataFrame from the API response and set the data types
            data_df = data_df.groupby(group_col_l).agg(vol=('vol', 'sum')).reset_index()  # Sum 'vol' per group within this batch
            data_l.append(data_df)  # Append the resulting DataFrame to the list
            break  # Break out of the retry loop if the data fetch is successful

        except Exception as e:
            if (retry + 1) > 3:  # All 3 retries are used up
                raise  # Re-raise the last exception
            print(e)  # Print the exception
            print(f'retry: {retry+1}/3 in {20*(retry+1)} seconds.')  # Print the retry count and delay time
            time.sleep(20 * (retry + 1))  # Wait before the next attempt (20, 40, then 60 seconds)

# Concatenate all the DataFrames in the data_l list into a single DataFrame
data_df = pd.concat(data_l, ignore_index=True)

# # Display the first five rows of the resulting DataFrame
data_df.head(5)


100%|██████████| 10/10 [00:00<00:00, 2336.92it/s]

0 to 50000
50000 to 100000
100000 to 150000
150000 to 200000
200000 to 250000
250000 to 300000
300000 to 350000
350000 to 400000
400000 to 450000
450000 to 500000





Unnamed: 0,boro,yr,m,d,segmentid,street,vol
0,Bronx,2008,5,18,154955,N/B HUTCHINSON RIVER PKWY BRIDGE,0
1,Bronx,2008,5,19,154955,N/B HUTCHINSON RIVER PKWY BRIDGE,5
2,Bronx,2009,3,27,87499,BX PARK EAST NB EXIT TO ALLERTON AVE,2496
3,Bronx,2009,3,28,87499,BX PARK EAST NB EXIT TO ALLERTON AVE,1306
4,Bronx,2009,3,30,87499,BX PARK EAST NB EXIT TO ALLERTON AVE,2892


In [None]:
# download data
fn = 'NY_traffic_data.csv'
data_df.to_csv(fn, index=False)
files.download(fn)

**Explanation:**

The algorithm fetches a large dataset from an API in batches, handles potential exceptions during the data fetch process, and creates a pandas DataFrame with the retrieved data. The DataFrame is grouped by specific columns, and the 'vol' column is aggregated by summing the values within each group. The algorithm also includes retry logic with increasing delays to handle temporary failures or errors during the data fetch process.

The code implements a nested loop:

a. The outer loop iterates over offsets from 0 to n in steps of i.

b. The inner loop makes up to 4 attempts per batch (the initial request plus 3 retries):

  i. Set the payload for the API request with the current \$limit and \$offset.

  ii. Fetch the data from the API endpoint using the api_json function defined above.

  iii. If the data fetch is successful:
  1. Create a DataFrame from the API response.
  2. Set the data types for the columns.
  3. Group the data by the specified columns and aggregate the 'vol' column by summing the values within each group.
  4. Append the resulting DataFrame to the list.
  5. Break out of the inner loop.

  iv. If an exception occurs during the fetch or the processing:
  1. If this was the fourth attempt, re-raise the exception, which stops the download.
  2. Otherwise, print the error and wait before the next attempt. The delay grows with each retry (20, 40, then 60 seconds).
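
The retry steps above can also be sketched as a small standalone function, which keeps the download loop itself short. The name `fetch_with_retry` and its parameters are hypothetical; the defaults mirror the notebook's 3 retries and 20-second base delay.

```python
import time


def fetch_with_retry(fetch, max_retries=3, base_delay=20, sleep=time.sleep):
    """Call fetch() and retry on failure, waiting base_delay * attempt seconds.

    Hypothetical helper; defaults mirror the notebook's retry settings.
    """
    for attempt in range(1, max_retries + 2):  # 1 initial attempt + max_retries retries
        try:
            return fetch()
        except Exception as e:
            if attempt > max_retries:
                raise  # out of retries: propagate the last error
            delay = base_delay * attempt
            print(f'{e} | retry {attempt}/{max_retries} in {delay} seconds.')
            sleep(delay)
```

Inside the loop this could be used as `r = fetch_with_retry(lambda: api_json(url_s, payload))`. Passing `sleep` as a parameter makes the function easy to exercise without real waiting.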

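After all the data has been fetched and processed, concatenate all the DataFrames in the list into a single DataFrame. Because the aggregation runs per batch, a group whose rows fall into two different batches appears once per batch in the result, each row holding a partial sum.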
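
Since each batch is grouped on its own, a group whose rows straddle a batch boundary ends up as two rows with partial sums after the concatenation. A minimal sketch of a final re-aggregation is shown below; the two batch frames and their values are made up for illustration, and `daily_df` is a hypothetical name.

```python
import pandas as pd

group_col_l = ['boro', 'yr', 'm', 'd', 'segmentid', 'street']

# Illustrative per-batch results (made-up values): the first group was
# split across the batch boundary and therefore appears in both batches.
batch_1 = pd.DataFrame({'boro': ['Bronx'], 'yr': ['2009'], 'm': ['3'], 'd': ['27'],
                        'segmentid': ['1'], 'street': ['A ST'], 'vol': [10]})
batch_2 = pd.DataFrame({'boro': ['Bronx', 'Bronx'], 'yr': ['2009', '2009'],
                        'm': ['3', '3'], 'd': ['27', '28'],
                        'segmentid': ['1', '1'], 'street': ['A ST', 'A ST'],
                        'vol': [5, 7]})

combined_df = pd.concat([batch_1, batch_2], ignore_index=True)

# Summing the partial sums gives the same totals as summing the raw rows.
daily_df = combined_df.groupby(group_col_l, as_index=False).agg(vol=('vol', 'sum'))
```

In the notebook, the same `groupby` could be applied to `data_df` right after `pd.concat`. Separately, Socrata's paging guidance recommends an explicit `$order` (for example `':id'`) in the payload so that offsets refer to a stable ordering; that parameter is not used in the original cell.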

---