### Focusing on: https://aqs.epa.gov/aqsweb/documents/data_api.html  

Convenient because the API returns its output as JSON.

RATE LIMITING:
The API has the following limits imposed on request size:

* Length of time. All services (except Monitor) must have the end date (edate field) be in the same year as the begin date (bdate field).
* Number of parameters. Most services allow for the selection of multiple parameter codes (param field). A maximum of 5 parameter codes may be listed in a single request.
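The two size limits above can be handled before any request is sent. The following is a minimal sketch, not part of the original notebook: it splits a multi-year date range into same-year pieces and groups parameter codes into chunks of at most five. The helper names (`split_by_year`, `chunk_params`, `plan_requests`), the `YYYYMMDD` date format, and the comma-separated `param` value are assumptions for illustration; the parameter codes are example values only.

```python
from datetime import date
from itertools import product


def split_by_year(bdate: date, edate: date) -> list[tuple[date, date]]:
    """Split [bdate, edate] into sub-ranges that each stay within one calendar year."""
    if edate < bdate:
        raise ValueError("edate must not be earlier than bdate")
    ranges = []
    for year in range(bdate.year, edate.year + 1):
        start = max(bdate, date(year, 1, 1))
        end = min(edate, date(year, 12, 31))
        ranges.append((start, end))
    return ranges


def chunk_params(codes: list[str], size: int = 5) -> list[list[str]]:
    """Group parameter codes into lists of at most `size` codes."""
    return [codes[i:i + size] for i in range(0, len(codes), size)]


def plan_requests(codes: list[str], bdate: date, edate: date) -> list[dict]:
    """Build one query fragment per (year range, parameter chunk) pair."""
    plans = []
    for (start, end), chunk in product(split_by_year(bdate, edate), chunk_params(codes)):
        plans.append({
            "param": ",".join(chunk),            # assumed: comma-separated codes
            "bdate": start.strftime("%Y%m%d"),   # assumed: YYYYMMDD format
            "edate": end.strftime("%Y%m%d"),
        })
    return plans


# Example: six codes across a year boundary yield 2 ranges x 2 chunks = 4 requests.
example_codes = ["88101", "44201", "42401", "42101", "42602", "81102"]
plans = plan_requests(example_codes, date(2022, 11, 1), date(2023, 2, 15))
print(len(plans))
```

Each dictionary can be merged with the `email` and `key` parameters before being passed to `session.get`.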

Please adhere to the following when using the API.
* Limit the size of queries. Our database contains billions of values and you may request more than you intend. If you are unsure of the amount of data, start small and work your way up. We request that you limit queries to 1,000,000 rows of data each. You can use the "observation count" field on the annualData service to determine how much data exists for a time-parameter-geography combination. If you have any questions or need advice, please contact us.
* Limit the frequency of queries. Our system can process a limited load. If scripting requests, please wait for one request to complete before submitting another and do not make more than 10 requests per minute. We also request a pause of 5 seconds between requests, adjusted as needed for response time and size.
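One way to respect the frequency guidance in a script is a small throttle that runs requests one at a time and sleeps between them. This is a sketch under stated assumptions rather than anything prescribed by the API: the `Throttle` class is hypothetical, and it interprets the guidance as "at least 5 seconds after the previous request finished, and at least 6 seconds (60 / 10) after it started".

```python
import time


class Throttle:
    """Run calls sequentially with a pause between them (hypothetical helper)."""

    def __init__(self, pause=5.0, per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.pause = pause                      # seconds to wait after a request finishes
        self.min_spacing = 60.0 / per_minute    # seconds between request starts
        self._clock = clock
        self._sleep = sleep
        self._last_start = None
        self._last_end = None

    def call(self, func, *args, **kwargs):
        """Wait as long as needed, then call func(*args, **kwargs) and return its result."""
        now = self._clock()
        wait = 0.0
        if self._last_start is not None:
            wait = max(wait, self._last_start + self.min_spacing - now)
        if self._last_end is not None:
            wait = max(wait, self._last_end + self.pause - now)
        if wait > 0:
            self._sleep(wait)
        self._last_start = self._clock()
        try:
            return func(*args, **kwargs)
        finally:
            # Record completion time even if the call raised.
            self._last_end = self._clock()
```

With the cached session from this notebook, a request would then be issued as `throttle.call(session.get, endpoint, params=param)`, where `throttle = Throttle()` is created once and reused for every request.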

In [7]:
import requests
import requests_cache
import html

# Requires ipykernel in this specific environment

In [2]:
# Troubleshooting

import os
print(os.getcwd())

# List files in the current directory to ensure personal.py is present
print(os.listdir())

# Identified an issue with a stale __pycache__ directory; resolved by removing it

c:\Users\maxha\Documents\GitHub\air_quality_prediction
['.git', '.gitignore', '.venv', 'data', 'EDA.ipynb', 'EPA_air_quality.sqlite', 'personal.py', 'README.md', 'requirements.txt']


In [8]:
# Creating the Cache
session = requests_cache.CachedSession('EPA_air_quality')

# Install cache globally
# requests_cache.install_cache('EPA_air_quality')

In [2]:
from personal import email

In [None]:
# https://aqs.epa.gov/data/api/signup?email=myemail@example.com
# Request that the access key (signup token) be sent to the given email address

endpoint = "https://aqs.epa.gov/data/api/signup"
param = {"email" : email}

response = session.get(endpoint, params=param)
response.raise_for_status()


In [3]:
from personal import EPA_API_KEY

Relevant packages to add to the project  
**Packages will be added as they become necessary; nothing redundant is installed into the environment yet.**

### What are the relevant endpoints

* __list/__ lookup lists of internal values and codes
* __monitors/__ for operational information about the samplers (monitors) used to collect the data. Includes identifying information, operational dates, operating organizations
* __sampleData/__

DATA:
* __dailyData/__
* __quarterlyData/__
* __annualData/__
* __qaAnnualPerformanceEvaluations/__ pairs of data (known and measured values) at several concentration levels for gaseous criteria pollutants
* __qaCollocatedAssessments/__ paired measurements from collocated samplers
* __qaFlowRateVerifications/__ flow rate checks performed by monitoring agencies
* __qaFlowRateAudits/__ flow rate audit data
* __qaOnePointQcRawData/__ measured versus actual concentration of 1 point QC checks
* __qaPepAudits/__ data related to PM2.5 monitoring system audits
* __transactionsSample/__ sample data in the submission (transaction) format for AQS.
* __transactionsQaAnnualPerformanceEvaluations/__ pairs of data QA at several concentration levels in the submission (transaction) format for AQS

~ blank samples?
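Every service listed above is addressed by appending a service name and a filter to the same base URL, as in `list/states`. The helper below is a minimal sketch of that pattern; `build_url` is a hypothetical name, and the `byState` filter in the usage comment is an assumption that should be checked against the API documentation.

```python
from urllib.parse import urlencode

# Base URL taken from the signup and list/states calls in this notebook.
BASE_URL = "https://aqs.epa.gov/data/api"


def build_url(service: str, filter_name: str, **query) -> str:
    """Join base URL, service, and filter; append an encoded query string if given."""
    url = f"{BASE_URL}/{service.strip('/')}/{filter_name.strip('/')}"
    return f"{url}?{urlencode(query)}" if query else url


print(build_url("list/", "states"))
# Assumed filter name, for illustration only:
print(build_url("dailyData/", "byState"))
```

In practice the query string can be left to `session.get(url, params=...)`, so the keyword arguments are mainly useful for logging or debugging a request.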


## Identification of Relevant Data to Extract

In [9]:
# endpoint list/states with parameters email and key

state_endpoint = "https://aqs.epa.gov/data/api/list/states"
param = {"email" : email, "key" : EPA_API_KEY}

states = session.get(state_endpoint, params = param)
states.raise_for_status()
# Next step: list/countiesByState
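The `list/states` response carries a `Header` list and a `Data` list whose rows have `code` and `value_represented` fields. A small sketch for turning that payload into lookups follows; `to_lookup` is a hypothetical helper, and the sample payload is a shortened copy of the structure returned by `states.json()`, not a full response.

```python
def to_lookup(payload: dict) -> dict[str, str]:
    """Map each code to the value it represents for a list-service response."""
    return {row["code"]: row["value_represented"] for row in payload["Data"]}


# Shortened sample with the same shape as states.json()
sample = {
    "Header": [{"status": "Success", "rows": 3}],
    "Data": [
        {"code": "01", "value_represented": "Alabama"},
        {"code": "02", "value_represented": "Alaska"},
        {"code": "06", "value_represented": "California"},
    ],
}

code_to_state = to_lookup(sample)
# Reverse mapping, useful for finding the code to pass to other services.
state_to_code = {name: code for code, name in code_to_state.items()}
print(state_to_code["California"])
```

In the notebook itself the call would be `to_lookup(states.json())`. Codes are kept as strings so that leading zeros such as `'06'` survive.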

In [11]:
states.json()

{'Header': [{'status': 'Success',
   'request_time': '2025-01-21T21:56:28-05:00',
   'url': 'https://aqs.epa.gov/data/api/list/states?email=maxvo%40ucdavis.edu&key=rubyhare78',
   'rows': 56}],
 'Data': [{'code': '01', 'value_represented': 'Alabama'},
  {'code': '02', 'value_represented': 'Alaska'},
  {'code': '04', 'value_represented': 'Arizona'},
  {'code': '05', 'value_represented': 'Arkansas'},
  {'code': '06', 'value_represented': 'California'},
  {'code': '08', 'value_represented': 'Colorado'},
  {'code': '09', 'value_represented': 'Connecticut'},
  {'code': '10', 'value_represented': 'Delaware'},
  {'code': '11', 'value_represented': 'District Of Columbia'},
  {'code': '12', 'value_represented': 'Florida'},
  {'code': '13', 'value_represented': 'Georgia'},
  {'code': '15', 'value_represented': 'Hawaii'},
  {'code': '16', 'value_represented': 'Idaho'},
  {'code': '17', 'value_represented': 'Illinois'},
  {'code': '18', 'value_represented': 'Indiana'},
  {'code': '19', 'value_repr