<a href="https://colab.research.google.com/github/JdavidRamirez/nasa-etl-project/blob/main/colab_notebooks/nasa_data_extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧹 NASA NEO Data Transformation

This notebook fetches raw JSON data from the NASA Near-Earth Object (NEO) API and transforms it into a structured tabular format using pandas. The main steps are:

1. **Fetch the raw JSON** containing asteroid data and save it to a file.
2. **Extract relevant features** such as:
   - Name of the asteroid
   - Date of close approach
   - Estimated diameter (min & max in meters)
   - Whether it is potentially hazardous
   - Relative velocity (km/h)
   - Miss distance from Earth (km)
   - Orbiting body (usually Earth)
3. **Normalize nested JSON** structures into flat columns.
4. **Save the clean data** as a CSV file for further use (e.g., PostgreSQL ingestion, dashboarding).
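
As an illustration of step 3, the sketch below flattens nested records with `pd.json_normalize`. It is an alternative to a hand-written loop, not necessarily what this notebook does. The `sample_neos` list is hypothetical, hand-written data shaped like the entries under `near_earth_objects`, and its values are made up.

```python
import pandas as pd

# Hypothetical sample shaped like the feed's NEO entries (values are made up).
sample_neos = [
    {
        "name": "(Sample A)",
        "is_potentially_hazardous_asteroid": False,
        "estimated_diameter": {
            "meters": {"estimated_diameter_min": 10.0, "estimated_diameter_max": 22.0}
        },
        "close_approach_data": [
            {
                "close_approach_date": "2025-04-05",
                "relative_velocity": {"kilometers_per_hour": "50000.5"},
                "miss_distance": {"kilometers": "1234567.8"},
                "orbiting_body": "Earth",
            }
        ],
    },
    {
        "name": "(Sample B)",
        "is_potentially_hazardous_asteroid": True,
        "estimated_diameter": {
            "meters": {"estimated_diameter_min": 120.0, "estimated_diameter_max": 270.0}
        },
        "close_approach_data": [
            {
                "close_approach_date": "2025-04-06",
                "relative_velocity": {"kilometers_per_hour": "72000.0"},
                "miss_distance": {"kilometers": "9876543.2"},
                "orbiting_body": "Earth",
            }
        ],
    },
]

flat = pd.json_normalize(
    sample_neos,
    record_path="close_approach_data",  # one output row per close approach
    meta=[
        "name",
        "is_potentially_hazardous_asteroid",
        ["estimated_diameter", "meters", "estimated_diameter_min"],
        ["estimated_diameter", "meters", "estimated_diameter_max"],
    ],
)

# Velocity and distance are stored as strings in the sample, so cast them explicitly.
numeric_cols = ["relative_velocity.kilometers_per_hour", "miss_distance.kilometers"]
flat[numeric_cols] = flat[numeric_cols].astype(float)

print(flat.columns.tolist())
```

Nested keys become dot-separated column names (for example `miss_distance.kilometers`), so a rename step would still be needed to get short names such as `miss_distance_km`.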




In [1]:
# Main libraries to fetch data from the NASA API
import requests
import pandas as pd
import json
from datetime import date, timedelta

In [2]:
# Replace with your own API key; avoid committing real keys to a public notebook
API_KEY = "DEMO_KEY"  # NASA's rate-limited demo key, or your real key
BASE_URL = "https://api.nasa.gov/neo/rest/v1/feed"


# Set your date range (max 7 days per request)
start_date = date.today() - timedelta(days=3)
end_date = date.today()

params = {
    "start_date": start_date.isoformat(),
    "end_date": end_date.isoformat(),
    "api_key": API_KEY
}

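# Make the request (the timeout avoids hanging on a stalled connection)
response = requests.get(BASE_URL, params=params, timeout=30)

# Check status
if response.status_code == 200:
    data = response.json()
    print(" Data retrieved successfully!")
else:
    # Stop here: the following steps need `data`, which is undefined on failure
    raise RuntimeError(f"Failed to fetch data: {response.status_code}")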

# Save raw data to file
with open("nasa_raw_data.json", "w") as f:
    json.dump(data, f)

# Preview top-level keys
print("\nTop-level keys:", data.keys())



 Data retrieved successfully!

Top-level keys: dict_keys(['links', 'element_count', 'near_earth_objects'])
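
Because the feed accepts at most 7 days per request, a longer period has to be split into several requests. The sketch below shows one way to generate the windows. The helper name `date_windows` is hypothetical, and it conservatively assumes each window may cover at most 7 calendar days, counting both endpoints.

```python
from datetime import date, timedelta


def date_windows(start, end, max_days=7):
    """Yield (window_start, window_end) pairs covering start..end inclusive.

    Assumption: a window may span at most `max_days` calendar days,
    counting both endpoints.
    """
    span = timedelta(days=max_days - 1)
    cursor = start
    while cursor <= end:
        window_end = min(cursor + span, end)
        yield cursor, window_end
        # The next window starts the day after this one ends, so no day repeats.
        cursor = window_end + timedelta(days=1)


windows = list(date_windows(date(2025, 4, 1), date(2025, 4, 20)))
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each pair can then be passed as `start_date` and `end_date` in `params`, with the resulting `near_earth_objects` dictionaries merged before flattening.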


In [3]:

# Mapping of date string -> list of NEOs with a close approach on that date
neo_data = data["near_earth_objects"]

# Flatten into one record per close approach, across all dates
records = []

for date_str, neos in neo_data.items():
    for neo in neos:
        for approach in neo["close_approach_data"]:
            records.append({
                "name": neo["name"],
                "close_approach_date": approach["close_approach_date"],
                "is_hazardous": neo["is_potentially_hazardous_asteroid"],
                "estimated_diameter_min_m": neo["estimated_diameter"]["meters"]["estimated_diameter_min"],
                "estimated_diameter_max_m": neo["estimated_diameter"]["meters"]["estimated_diameter_max"],
                "relative_velocity_kph": float(approach["relative_velocity"]["kilometers_per_hour"]),
                "miss_distance_km": float(approach["miss_distance"]["kilometers"]),
                "orbiting_body": approach["orbiting_body"]
            })

df = pd.DataFrame(records)
df.head()


|   | name | close_approach_date | is_hazardous | estimated_diameter_min_m | estimated_diameter_max_m | relative_velocity_kph | miss_distance_km | orbiting_body |
|---|------|---------------------|--------------|--------------------------|--------------------------|-----------------------|------------------|---------------|
| 0 | (2005 GR33) | 2025-04-05 | False | 99.209892 | 221.840062 | 91815.570036 | 60596020.0 | Earth |
| 1 | (2007 SQ6) | 2025-04-05 | False | 96.506147 | 215.794305 | 23674.835128 | 4192048.0 | Earth |
| 2 | (2012 FT35) | 2025-04-05 | False | 3.841979 | 8.590926 | 49217.32982 | 44568210.0 | Earth |
| 3 | (2014 WU202) | 2025-04-05 | False | 11.602591 | 25.944182 | 21340.915303 | 63275610.0 | Earth |
| 4 | (2014 WP362) | 2025-04-05 | False | 60.891262 | 136.157002 | 45774.178498 | 59369120.0 | Earth |
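
The overview lists saving the clean data as a CSV file as the final step. A minimal sketch of that step follows, assuming a hypothetical output file name `nasa_neo_clean.csv` and a tiny made-up frame standing in for `df`. Passing `index=False` keeps the positional index out of the file, so reloading it does not produce an extra `Unnamed: 0` column.

```python
import os
import tempfile

import pandas as pd


def save_clean_csv(frame, path):
    """Write the frame to CSV without its index, then read it back as a check."""
    frame.to_csv(path, index=False)
    return pd.read_csv(path)


# Made-up stand-in for the flattened DataFrame; in the notebook this would be `df`.
demo = pd.DataFrame(
    [{"name": "(Sample A)", "miss_distance_km": 1234567.8, "orbiting_body": "Earth"}]
)

# A temporary directory keeps the demonstration from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_clean_csv(demo, os.path.join(tmp, "nasa_neo_clean.csv"))

print(reloaded.columns.tolist())
```

In the notebook itself, the equivalent call would be `df.to_csv("nasa_neo_clean.csv", index=False)`.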
