# 🚀 Project Title: NASA Space Object Index Data Extraction + EDA 

## 📌 One-Liner
Automates the extraction, cleaning, and exploratory analysis of launch and mission data from a NASA infinite-scroll web resource to support space-tech market intelligence and infrastructure strategy.

---

## TL;DR Executive Summary
**3 Findings**:  
1. Built an automated scraper for NASA’s mission listings that handles infinite-scroll loading.  
2. Cleaned and structured the dataset into consistent formats for mission names, launch dates, locations, and mission objectives.  
3. Conducted exploratory data analysis revealing patterns in mission frequency, geographic launch distribution, and thematic mission types.

**2 Implications**:  
1. Enables ongoing, low-effort tracking of NASA’s mission portfolio for competitive intelligence.  
2. Establishes a reusable pipeline for other space agencies’ open portals.

**1 Recommendation**:  
Extend this workflow to include launch success metrics, payload mass, and satellite type for richer strategic correlation.

---

## 🎯 Problem Statement & Decision Context
- **Business Question**: How can we systematically collect, clean, and analyze launch mission data from NASA to identify patterns relevant for market positioning in the space-tech and Earth observation (EO) sectors?  
- **Scope**: NASA missions page, infinite scroll, structured tabular extraction, CSV/GeoJSON storage, EDA in Python.  
- **Out of Scope**: Real-time mission tracking, orbital mechanics calculations, and EO raster integration (covered in later projects).  
- **Success Criteria**: Fully automated data extraction script + cleaned dataset + EDA visualizations revealing at least three actionable patterns.

---

## 👥 Stakeholders & Use Cases
- **Primary Stakeholders**:  
  - Aerospace startups (Skyroot, Pixxel) for competitive benchmarking  
  - Space policy think tanks for mission diversity analysis  
  - Infrastructure planners for launch site capacity planning

- **Use Cases**:  
  - Regular reporting on NASA mission pipeline  
  - Comparative analysis with other agencies  
  - Foundation for EO mission overlay and downstream analytics

---

## 🗂 Data Card
- **Source**: NASA Launch/Mission website (infinite scroll endpoint)  
- **Method**: Selenium or Python `requests`, with handling for dynamically loaded content  
- **License**: Public domain (US Government works)  
- **Update Frequency**: Daily/Weekly (can be scheduled)  
- **Key Attributes**:  
  - `mission_name` (string)  
  - `launch_date` (datetime)  
  - `launch_location` (string)  
  - `mission_type` (categorical)  
  - `mission_summary` (text)

- **Known Limitations**:  
  - Data may omit classified missions  
  - Inconsistent mission type labels require standardization
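
The label standardization mentioned above could look roughly like the following. This is a minimal sketch: the raw labels, the canonical names in `LABEL_MAP`, and the `"Other"` fallback are hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical raw-to-canonical label mapping, for illustration only.
LABEL_MAP = {
    "eo": "Earth Observation",
    "earth observation": "Earth Observation",
    "comms": "Communications",
    "communications": "Communications",
}

def standardize_labels(series: pd.Series, label_map: dict) -> pd.Series:
    """Trim and lowercase labels, then map them to canonical names.

    Labels that are missing or not in the map fall back to "Other".
    """
    cleaned = series.str.strip().str.lower()
    return cleaned.map(label_map).fillna("Other").astype("category")

raw = pd.Series([" EO", "Earth Observation ", "comms", "Unknown thing", None])
mission_type = standardize_labels(raw, LABEL_MAP)
print(mission_type.value_counts())
```

Keeping the mapping in a single dictionary makes it easy to review which raw labels were merged and to extend the list as new variants appear.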

---

## 🔍 Method Overview
1. **Data Extraction**  
   - Automated infinite-scroll loading until all records are loaded  
   - HTML parsing & structured field extraction  
2. **Data Cleaning**  
   - Standardizing dates, normalizing location names, deduplicating records  
3. **Exploratory Data Analysis**  
   - Launch frequency by year/quarter  
   - Launch site distribution mapping  
   - Mission type breakdown  
4. **Output Preparation**  
   - CSV for tabular use  
   - GeoJSON for GIS integration
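
The infinite-scroll step in the extraction stage can be sketched as a loop that keeps scrolling until the page height stops growing. This is an assumption about how the loading could be handled, not the project's actual scraper. The function name `scroll_until_stable` and its parameters are hypothetical; the only Selenium call relied on is `execute_script`.

```python
import time

def scroll_until_stable(driver, pause_s: float = 1.0, max_rounds: int = 200) -> int:
    """Scroll to the bottom of the page until its height stops growing.

    `driver` is assumed to expose Selenium's `execute_script` method.
    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_s)  # give the page time to fetch the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # no new content appeared, so assume the end was reached
        last_height = new_height
    return max_rounds  # safety cap so the loop cannot run forever
```

With a real browser session this would be called as `scroll_until_stable(driver)` after `driver.get(...)`, and the fully loaded page source would then be passed to the HTML parser. A height check can stop early on slow connections, so a longer `pause_s` may be needed.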

---

## ⚙️ Environment & Reproducibility
- **Python Version**: 3.10+  
- **Key Libraries**: pandas, requests, BeautifulSoup, Selenium, geopandas, matplotlib/seaborn  
- **Runtime**: ~5–8 minutes end-to-end  
- **File Structure**:


In [32]:
df_unoosa.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21289 entries, 0 to 21288
Data columns (total 25 columns):
 #   Column                                                                          Non-Null Count  Dtype         
---  ------                                                                          --------------  -----         
 0   id                                                                              21289 non-null  object        
 1   uri                                                                             21289 non-null  object        
 2   international_designator                                                        21289 non-null  object        
 3   international_designator_off                                                    21289 non-null  object        
 4   national_designator                                                             7576 non-null   object        
 5   space_object_name                                                         

In [None]:
import pandas as pd
df_unoosa = pd.read_csv("/Users/aaeush/Desktop/Drive/Drive/Academics/Py Project/MyCode/OrbitIQ/exports/unoosa_index_of_objects_launched_into_space.csv")

df_unoosa.head()

In [None]:
df_unoosa.isnull().sum()

In [None]:

# 5. Summary Statistics
print('\nSummary statistics:')
display(df_unoosa.describe(include='all'))


In [None]:
unoosa_rename_map = {
    "id": "id",
    "uri": "uri",

    "values.object.internationalDesignator_s1": "international_designator", #ID of object
    "values.object.internationalDesignator@official_s1": "international_designator_off", #True or False
    "values.object.nationalDesignator_s1": "national_designator",

    "values.object.nameOfSpaceObjectIno_s1": "space_object_name",

    "values.object.nameOfSpaceObjectO_s1": "space_object_name_2",

    "values.object.launch.stateOfRegistry_s1": "state_of_registry",
    "values.object.launch.stateOfRegistry@official_s1": "state_of_registry_off",

    "values.object.launch.dateOfLaunch_s1": "date_of_launch",
    "values.object.status.gsoLocation_s1": "gso_location",
    "values.object.unRegistration.unRegistered_s1": "un_registered",
    "values.en#object.status.objectStatus_s1": "status",
    "values.object.status@official_s1": "status_off",
    "values.object.status.dateOfDecay_s1": "date_of_decay",

    "values.object.launch.dateOfLaunch@official_s1":"date_of_launch_off" ,
    "values.object.status.dateOfDecay@official_s1":"date_of_decay_off" ,

    "values.object.functionOfSpaceObject_s1": "function",
    "values.object.remark_s1": "remarks",

    "values.object.status.webSite_s1": "external_website",

    "values.object.unRegistration.registrationDocuments.document@uri_s": "values.object.unRegistration.registrationDocuments.document@uri_s",
    
    "values.object.unRegistration.registrationDocuments.document..document.symbol_s": "values.object.unRegistration.registrationDocuments.document..document.symbol_s",
    "values.object.status.gsoLocation@official_s1": "gso_location_off",
    "values.object.unRegistration.decayDocuments.document@uri_s": "decay_document_uri",
    "values.object.unRegistration.decayDocuments.document..document.symbol_s": "symbol",
}
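
`DataFrame.rename` silently skips keys that do not match any column, so a typo in a long map like this one goes unnoticed. A small check before renaming can surface such keys. The helper name `unmapped_keys` and the stand-in frame below are hypothetical.

```python
import pandas as pd

def unmapped_keys(df: pd.DataFrame, rename_map: dict) -> list:
    """Return rename-map keys that are not columns of df."""
    return [key for key in rename_map if key not in df.columns]

# Tiny stand-in frame; the real column names come from the exported CSV.
sample = pd.DataFrame({"id": ["a"], "values.object.remark_s1": ["------"]})
sample_map = {
    "values.object.remark_s1": "remarks",
    "values.object.functionOfSpaceObject_s1": "function",
}
missing = unmapped_keys(sample, sample_map)
print("Keys with no matching column:", missing)
```

Running the same check as `unmapped_keys(df_unoosa, unoosa_rename_map)` before the rename would list any source column names that are absent from the CSV.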

In [None]:
df_unoosa.rename(columns=unoosa_rename_map, inplace=True)
print("Successfully mapped columns")
print(list(df_unoosa.columns))

In [None]:
unoosa_delete_list = []

In [None]:
# Find columns where all values are NaN, then drop them
removed_columns = df_unoosa.columns[df_unoosa.isna().all()].tolist()
df_unoosa = df_unoosa.drop(columns=removed_columns)
print("Removed columns:", removed_columns)

In [None]:

# 3. Convert date_of_launch to datetime (unparseable values become NaT)
df_unoosa['date_of_launch'] = pd.to_datetime(df_unoosa['date_of_launch'], errors='coerce')
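
Because `errors='coerce'` turns unparseable values into `NaT` without warning, it can help to compare the raw and parsed columns and see which values were lost. A minimal sketch on made-up sample values, assuming a copy of the raw column is kept before conversion:

```python
import pandas as pd

# Made-up sample values; in the notebook this would be a copy of the raw column.
raw_dates = pd.Series(["2019-04-17", "not a date", None, "2022-12-08"])
parsed = pd.to_datetime(raw_dates, errors="coerce")

# Values that were present in the raw data but could not be parsed.
failed = raw_dates[raw_dates.notna() & parsed.isna()]
print(f"{len(failed)} of {raw_dates.notna().sum()} non-empty values failed to parse")
print(failed.tolist())
```

This separates rows that were empty in the source from rows whose date string was malformed, which need different handling.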


In [None]:

# 4. Replace NaN with None (null); datetime columns such as date_of_launch still show NaT
df_unoosa = df_unoosa.where(pd.notnull(df_unoosa), None)


In [None]:
df_unoosa.info()

In [None]:
df_unoosa.describe(include='all')

In [None]:
df_unoosa.head()

In [30]:
empty_dates = df_unoosa[df_unoosa['date_of_launch'].isnull()]
empty_dates.head(10)

Unnamed: 0,id,uri,international_designator,international_designator_off,national_designator,space_object_name,space_object_name_2,state_of_registry,state_of_registry_off,date_of_launch,...,date_of_launch_off,date_of_decay_off,function,remarks,external_website,values.object.unRegistration.registrationDocuments.document@uri_s,values.object.unRegistration.registrationDocuments.document..document.symbol_s,gso_location_off,decay_document_uri,symbol
2020,"102,en,/osoindex/data/objects/2024/2022-144k_2...",/osoindex/data/objects/2024/2022-144k_21922.html,2022-144K,True,,,USA 399,USA,True,NaT,...,False,False,Spacecraft engaged in practical applications a...,Date of launch is approximate date of deployment.,,"[""/osoindex/data/documents/us/st/stsgser.e1227...","[""ST/SG/SER.E/1227""]",,,
6540,"102,en,/osoindex/data/objects/2023/2022-144f_1...",/osoindex/data/objects/2023/2022-144f_16349.html,2022-144F,True,,,USA 341,USA,True,NaT,...,False,False,Spacecraft engaged in practical applications a...,------,,"[""/osoindex/data/documents/us/st/stsgser.e1110...","[""ST/SG/SER.E/1110""]",,,
6541,"102,en,/osoindex/data/objects/2023/2022-144g_1...",/osoindex/data/objects/2023/2022-144g_16430.html,2022-144G,True,,,LINUSS1,USA,True,NaT,...,False,False,Spacecraft engaged in practical applications a...,------,,"[""/osoindex/data/documents/us/st/stsgser.e1110...","[""ST/SG/SER.E/1110""]",,,
6542,"102,en,/osoindex/data/objects/2023/2022-144h_1...",/osoindex/data/objects/2023/2022-144h_16431.html,2022-144H,True,,,LINUSS2,USA,True,NaT,...,False,False,Spacecraft engaged in practical applications a...,------,,"[""/osoindex/data/documents/us/st/stsgser.e1110...","[""ST/SG/SER.E/1110""]",,,
6543,"102,en,/osoindex/data/objects/2023/2022-144e_1...",/osoindex/data/objects/2023/2022-144e_16348.html,2022-144E,True,,,USA 340,USA,True,NaT,...,False,False,Spacecraft engaged in practical applications a...,------,,"[""/osoindex/data/documents/us/st/stsgser.e1110...","[""ST/SG/SER.E/1110""]",,,
10275,"102,en,/osoindex/data/objects/2021/2019-022p_1...",/osoindex/data/objects/2021/2019-022p_12635.html,2019-022P,True,,AC 10A PROBE (JACKIE),,USA,True,NaT,...,False,True,------,Date of launch is date of deployment of AEROCU...,,"[""/osoindex/data/documents/us/st/stsgser.e1024...","[""ST/SG/SER.E/1024""]",,"[""/osoindex/data/documents/us/st/stsgser.e1024...","[""ST/SG/SER.E/1024""]"
11891,"102,en,/osoindex/data/objects/2020/2019-022m_1...",/osoindex/data/objects/2020/2019-022m_10998.html,2019-022M,True,,AC 10 PROBE (FULLER),,USA,True,NaT,...,False,True,------,------,,"[""/osoindex/data/documents/us/st/stsgser.e964....","[""ST/SG/SER.E/964""]",,"[""/osoindex/data/documents/us/st/stsgser.e964....","[""ST/SG/SER.E/964""]"
11892,"102,en,/osoindex/data/objects/2020/2019-022n_1...",/osoindex/data/objects/2020/2019-022n_11692.html,2019-022N,True,,,AC 10 Probe (Golf),USA,True,NaT,...,False,True,Spacecraft engaged in practical applications a...,Date of launch is date of deployment of AEROCU...,,"[""/osoindex/data/documents/us/st/stsgser.e967....","[""ST/SG/SER.E/967""]",,"[""/osoindex/data/documents/us/st/stsgser.e983....","[""ST/SG/SER.E/983""]"
12453,"102,en,/osoindex/data/objects/2019/2019-022j_1...",/osoindex/data/objects/2019/2019-022j_10392.html,2019-022J,True,,AEROCUBE 10PRB,AC 10 Probe (Venturini),USA,True,NaT,...,False,,Spacecraft engaged in practical applications a...,Date of launch is date of deployment from AERO...,,"[""/osoindex/data/documents/us/st/stsgser.e928....","[""ST/SG/SER.E/928""]",,"[""/osoindex/data/documents/us/st/stsgser.e942....","[""ST/SG/SER.E/942""]"
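
For rows like these, where `date_of_launch` is `NaT`, the international designator still carries a four-digit year prefix (for example `2022-144K`). One possible fallback, sketched below on a small stand-in frame, is to derive a launch year from that prefix. The `launch_year` name is hypothetical. The designator year refers to the launch the object is cataloged under, which may differ from a later deployment date mentioned in the remarks.

```python
import pandas as pd

# Stand-in rows mirroring the columns used above; values are illustrative.
sample = pd.DataFrame({
    "international_designator": ["2022-144K", "2019-022P", "2021-001A", None],
    "date_of_launch": pd.to_datetime([None, None, "2021-01-08", None]),
})

# Year prefix of the designator, as a number (NaN where it does not match).
designator_year = pd.to_numeric(
    sample["international_designator"].str.extract(r"^(\d{4})-", expand=False),
    errors="coerce",
)

# Prefer the parsed launch date; fall back to the designator year.
launch_year = (
    sample["date_of_launch"].dt.year.fillna(designator_year).astype("Int64")
)
print(launch_year.value_counts().sort_index())
```

The resulting year column can feed a launch-frequency count even when the full date is missing, while the original `date_of_launch` values stay untouched.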


## EDA Results

1. 