# Extracting data from NHTSA's APIs
[NHTSA Documentation](https://www.nhtsa.gov/nhtsa-datasets-and-apis)

Purpose: To extract vehicle safety data from NHTSA and store it in Delta tables for further analysis.

Approach: The NHTSA APIs build on each other, so I will follow these steps:
1. Pull the years that NHTSA provides
1. Pull the makes by year
1. Pull the models by year and make
1. Pull the trims by year, make, and model
1. Pull the safety ratings by year, make, model, and trim
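
The steps above map onto a chain of URL paths, each one extending the previous. A minimal sketch of that nesting follows. The helper names `safety_ratings_url` and `vehicle_url` are hypothetical, the sample values are only illustrative, and the `/model/{model}` and `/VehicleId/{id}` path segments are assumptions that extend the pattern of the year and make endpoints used later in this notebook, so check them against the NHTSA documentation before relying on them.

```python
from urllib.parse import quote

BASE_URL = "https://api.nhtsa.gov/SafetyRatings"


def safety_ratings_url(year=None, make=None, model=None):
    """Build the URL for one level of the hierarchy (hypothetical helper)."""
    url = BASE_URL
    if year is not None:
        url += f"/modelyear/{year}"
        if make is not None:
            # Percent-encode values such as "ASTON MARTIN" so they are path-safe
            url += f"/make/{quote(str(make), safe='')}"
            if model is not None:
                # Assumed path segment, not confirmed by this notebook
                url += f"/model/{quote(str(model), safe='')}"
    return url


def vehicle_url(vehicle_id):
    """URL for the ratings of one trim (assumed path; hypothetical helper)."""
    return f"{BASE_URL}/VehicleId/{vehicle_id}"


# Print one URL per level of the hierarchy
print(safety_ratings_url())
print(safety_ratings_url(2013))
print(safety_ratings_url(2013, "ASTON MARTIN"))
print(safety_ratings_url(2013, "Acura", "RDX"))
print(vehicle_url(7520))
```

Building the path in one place keeps the year, make, and model loops from each repeating their own string formatting.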

This code block pulls the model years for which NHTSA has safety ratings data.

In [0]:
import requests

# Define the endpoint
url = "https://api.nhtsa.gov/SafetyRatings"

# Make the request to the NHTSA API
response = requests.get(url)
data = response.json()

# Extract the years from the response
years = [item['ModelYear'] for item in data['Results']]

# Convert the years to a DataFrame
df_years = spark.createDataFrame(years, "int").toDF("Year")

# Write the DataFrame to a Delta table
df_years.write.format("delta").mode("overwrite").saveAsTable("nhtsa_safety_ratings_years")

# Display the DataFrame
display(df_years)

Next, pass each year to the API to get the makes available for that year, and write the combined result to a Delta table.
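
Because the makes are requested one year at a time, each returned row needs to carry the year it was requested for before the rows from all years are combined. A minimal sketch of that tagging step follows, using a hand-written sample payload in place of a live response. The helper name `tag_results` and the sample values are illustrative assumptions; only the `Results` key is taken from the code in this notebook.

```python
def tag_results(data, **parent_keys):
    """Return the rows under 'Results', each copied and extended with parent_keys."""
    rows = []
    for item in data.get("Results", []):
        row = dict(item)         # copy so the parsed response is left untouched
        row.update(parent_keys)  # e.g. Year=2013
        rows.append(row)
    return rows


# Hand-written sample shaped like a parsed JSON response (values are made up)
sample = {"Count": 2, "Results": [{"Make": "ACURA"}, {"Make": "AUDI"}]}

rows = tag_results(sample, Year=2013)
print(rows)

# A response with no 'Results' key yields an empty list instead of raising
print(tag_results({}, Year=2013))
```

The same helper would apply at the model level by passing both the make and the model year as keyword arguments.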

In [0]:
import requests

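# Base endpoint for listing makes by model year
base_url = "https://api.nhtsa.gov/SafetyRatings/modelyear/"

# Collect the years from the DataFrame built in the previous step
years = [row['Year'] for row in df_years.collect()]

# Request the makes for each year and tag every row with that year
results = []
for year in years:
    url = f"{base_url}{year}"
    response = requests.get(url)
    data = response.json()
    for item in data.get('Results', []):
        item['Year'] = year
        results.append(item)

# Convert the combined rows to a DataFrame and display it
df_results = spark.createDataFrame(results)
display(df_results)

# Write the DataFrame to a Delta table
df_results.write.format("delta").mode("overwrite").saveAsTable("nhtsa_safety_ratings_make_year")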

Now we need to get the models for each make and year combination in the nhtsa_safety_ratings_make_year table.

In [0]:
import requests

# Read the table into a DataFrame
df_make_year = spark.table("nhtsa_safety_ratings_make_year")

# Collect the make and year combinations
make_year_combinations = df_make_year.select("Make", "ModelYear").distinct().collect()

base_url = "https://api.nhtsa.gov/SafetyRatings/modelyear/"

results = []
for row in make_year_combinations:
    make = row['Make']
    modelyear = row['ModelYear']
    url = f"{base_url}{modelyear}/make/{make}"
    response = requests.get(url)
    data = response.json()
    for item in data.get('Results', []):
        item['Make'] = make
        item['ModelYear'] = modelyear
        results.append(item)

df_make_year_results = spark.createDataFrame(results)
display(df_make_year_results)

df_make_year_results.write.format("delta").mode("overwrite").saveAsTable("nhtsa_safety_ratings_make_year_model")

Now we need to get the trims for each make, model, and year combination in the nhtsa_safety_ratings_make_year_model table.