# API Data Extraction Process and Code

---

This notebook explores processes and code for accessing the data relevant to the study.

### Project Hypothesis:
Socioeconomic status, as indicated by income level, educational attainment, and race/ethnicity, is a significant predictor of air quality and health outcomes. Communities with lower socioeconomic status are hypothesized to experience poorer air quality, which in turn leads to a higher prevalence of adverse health outcomes. This relationship is expected to persist even when controlling for potential confounding variables such as geographic location and access to healthcare services.

### Defining Data Collection Parameters
- **Geographic Scope:** Define countries or cities of interest
- **Time Frame:** Define time period coverage
- **Socioeconomic Indicators:** Define indicators of interest (e.g., median income, education level)
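
One way to pin these parameters down before writing any extraction code is to collect them in a single configuration object. The sketch below is illustrative only: the class name `CollectionParams`, its fields, and every example value are hypothetical placeholders, not choices made by this project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CollectionParams:
    """Hypothetical container for the three parameters listed above."""
    country_codes: tuple      # Geographic scope, e.g. two-letter country codes
    start: date               # Time frame: first day covered
    end: date                 # Time frame: last day covered
    indicators: tuple = ()    # Socioeconomic indicators of interest

    def __post_init__(self):
        # Reject an inverted time frame early
        if self.start > self.end:
            raise ValueError("start must not be after end")


# Placeholder values for illustration only
collection_params = CollectionParams(
    country_codes=("us", "br"),
    start=date(2020, 1, 1),
    end=date(2020, 12, 31),
    indicators=("median_income", "education_level"),
)
```

Keeping the scope, time frame, and indicators in one place makes it easier to reuse the same settings across every API call that follows.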



In [1]:
# Dependencies

import requests
import pandas as pd
import numpy as np

# Import the relevant API keys (you will need your own, stored in api_keys.py)
from api_keys import weather_api_key
from api_keys import geoapify_key
from api_keys import aqicn_api_key
from api_keys import gho_who_api_key

# Import citipy to determine the cities based on latitude and longitude
from citipy import citipy

#### Geographic scope


In [8]:
# Empty list for holding the latitude and longitude combinations
lat_lngs = []

# Empty dictionary for holding the city names and country codes
city_details = {}

# Range of latitudes and longitudes
lat_range = (-90, 90) # Min and Max bounds for latitude range
lng_range = (-180, 180) # Min and Max bounds for longitude range

# Create a set of random lat and lng combinations
lats = np.random.uniform(lat_range[0], lat_range[1], size=1500)
lngs = np.random.uniform(lng_range[0], lng_range[1], size=1500)
lat_lngs = zip(lats, lngs) # Pair each latitude with a longitude (lazy iterator of tuples)

# Identify the nearest city and country for each lat, lng combination
for lat, lng in lat_lngs:
    city = citipy.nearest_city(lat, lng)
    city_name = city.city_name
    country_code = city.country_code
    # Note: these are the randomly sampled coordinates, not the city's own location
    coords = (lat, lng)

    # If the city name is new, add it along with the country code and coordinates
    if city_name not in city_details:
        city_details[city_name] = (country_code, coords)

# Print the city count to confirm sufficient count
print(f"Number of unique cities in the list: {len(city_details)}")

Number of unique cities in the list: 564
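
Drawing latitude uniformly between -90 and 90, as above, places as many sample points per degree of latitude near the poles as near the equator, even though a one-degree band near the poles covers far less area. High-latitude points are therefore over-represented relative to surface area. If area-uniform sampling is preferred, one common approach is to draw sin(latitude) uniformly instead. The sketch below uses only the standard library; the function name `sample_lat_lngs` and its `seed` parameter are illustrative and not part of the original notebook.

```python
import math
import random


def sample_lat_lngs(n, seed=None):
    """Return n (lat, lng) pairs distributed uniformly over the sphere's surface."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # sin(lat) uniform on [-1, 1] gives equal probability per unit area
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        lng = rng.uniform(-180.0, 180.0)
        points.append((lat, lng))
    return points


lat_lngs = sample_lat_lngs(1500, seed=42)
```

The resulting pairs can be passed to `citipy.nearest_city` in the same loop as before. Whether this changes the set of cities in a way that matters for the study is a separate question that the sketch does not answer.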


In [9]:
# Create a DataFrame from the collected data
cities_selected_df = pd.DataFrame({
    'City': list(city_details.keys()),
    'Country': [v[0] for v in city_details.values()],
    'Coords': [v[1] for v in city_details.values()]
})

# Reset index to make sure it starts from 0 and acts as an index column
cities_selected_df.reset_index(inplace=True)
cities_selected_df.rename(columns={'index': 'Index'}, inplace=True)

# Print the DataFrame
print(cities_selected_df)

     Index            City Country                                     Coords
0        0  puerto natales      cl  (-74.30957051941022, -103.02341849042918)
1        1        kingston      nf   (-33.81848833131593, 168.60250925144567)
2        2    richards bay      za    (-30.21006947474273, 36.95953974714848)
3        3        waitangi      nz  (-78.44858286390013, -171.88729439578165)
4        4         samitah      sa     (16.50889833950835, 43.22043051900749)
..     ...             ...     ...                                        ...
559    559         sogndal      no     (61.60917050712115, 7.120701656781932)
560    560      walvis bay      na    (-24.580680499512837, 9.86992685320618)
561    561       kavaratti      in    (12.612625912719878, 67.71123086167287)
562    562         caarapo      br   (-22.82789077622583, -54.78460994903298)
563    563      north elba      us      (44.1890496912815, -73.9008585042401)

[564 rows x 4 columns]
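
Two details of the table above can matter downstream. Cities are deduplicated by name alone, so two different cities that share a name in different countries collapse into one row, and `Coords` holds the random sample point as a tuple instead of separate numeric columns. A possible alternative, sketched here in plain Python, keys on the (city, country) pair and emits flat records. The function name `unique_city_rows` and the sample input are hypothetical; in the notebook the input would come from the `citipy` loop.

```python
def unique_city_rows(records):
    """Deduplicate (city, country, lat, lng) records on the (city, country) pair.

    The first record seen for each pair is kept, mirroring the loop above.
    """
    seen = {}
    for city_name, country_code, lat, lng in records:
        key = (city_name, country_code)
        if key not in seen:
            seen[key] = {
                "City": city_name,
                "Country": country_code,
                "Lat": lat,
                "Lng": lng,
            }
    return list(seen.values())


# Made-up sample input for illustration
sample = [
    ("kingston", "nf", -33.8, 168.6),
    ("kingston", "jm", 17.9, -76.8),
    ("kingston", "nf", -30.0, 167.9),  # duplicate pair, ignored
]
rows = unique_city_rows(sample)
```

Passing `rows` to `pd.DataFrame` would give one column per field, with `Lat` and `Lng` as ordinary numeric columns.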


#### Time frame

#### Socio-economic indicators

## API Data Extraction

### AQICN API

In [None]:
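# Use the key imported from api_keys above (or replace with your own key string)
api_key = aqicn_api_key

# Define the endpoint and parameters for your request
endpoint = 'https://api.waqi.info/feed'
city = 'city_name'  # Placeholder: replace with a real city name
params = {
    'token': api_key
}

# Make the request and collect the data
response = requests.get(f'{endpoint}/{city}/', params=params, timeout=10)
response.raise_for_status()
data = response.json()

# Extract data and convert to DataFrame (the API reports failures via 'status')
if data.get('status') != 'ok':
    raise ValueError(f"AQICN request failed: {data.get('data')}")
aq_data = data['data']['iaqi']
df_aq = pd.DataFrame(aq_data).transpose()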
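
Each entry under `iaqi` is expected to be a small object whose `v` field holds the reading (for example `{"pm25": {"v": 12}}`), which is why the transpose above yields one row per pollutant. When collecting many cities, it may be easier to flatten each response into a single record first. The sketch below does that without any network access. `flatten_feed` and the sample payloads are hypothetical, and the payload shape is an assumption based on the fields used above, not a guaranteed schema.

```python
def flatten_feed(city, payload):
    """Turn one feed response into a flat dict, or None if the request failed."""
    if payload.get("status") != "ok":
        return None
    data = payload.get("data", {})
    record = {"City": city, "aqi": data.get("aqi")}
    for pollutant, reading in data.get("iaqi", {}).items():
        # Each reading is assumed to look like {"v": <number>}
        record[pollutant] = reading.get("v")
    return record


# Made-up payloads for illustration, not real measurements
ok_payload = {
    "status": "ok",
    "data": {"aqi": 42, "iaqi": {"pm25": {"v": 42}, "o3": {"v": 7.5}}},
}
bad_payload = {"status": "error", "data": "Unknown station"}

candidates = (
    flatten_feed("city_a", ok_payload),
    flatten_feed("city_b", bad_payload),
)
records = [r for r in candidates if r is not None]
```

A list of such records can be handed to `pd.DataFrame` to get one row per city and one column per pollutant, with missing pollutants left empty.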
