### DSCI 511 Term Project: Mission Parameters Extraction

**Author:** Phillip Roman

**Purpose:** Extract and clean mission-related attributes from raw launch data  

**Date:** November 2025

---

This notebook extracts 10 key mission attributes from the Launch Library 2 API dataset and exports them as a clean TSV file for merging.

In [20]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Import Modules

In [21]:
import json
import pandas as pd
import csv
from pprint import pprint

## Load Raw Data File

Loading the big file `raw_baseline_launches_Phillip.json` from my Google Drive.

I'm wrapping the file-opening part in a `try...except` so that a missing file produces a clear error message. I'm also using `.get('launches', [])` to safely extract the launches list from the JSON structure; it returns an empty list if the 'launches' key isn't found.

**Note:** Setting the `USE_SAMPLE` flag to `True` processes only the last 50 launches in the file for sample testing. It is currently set to `False`, so the full dataset is processed.

In [29]:
data_path = '/content/drive/MyDrive/DSCI511/Term Project/raw_baseline_launches_Phillip.json'

try:
    with open(data_path, 'r', encoding='utf-8') as f:
        full_data = json.load(f)
except FileNotFoundError:
    print(f"ERROR: {data_path} not found.")
    raise  # stop this cell here; exit() is unreliable inside a notebook

# list of all launches is inside the 'launches' key
all_launches = full_data.get('launches', [])
print(f"Loaded {len(all_launches)} total launches.")

USE_SAMPLE = False

if USE_SAMPLE:
    data_source = all_launches[-50:] # Sample last 50
else:
    data_source = all_launches # Full dataset

print(f"Processing {len(data_source)} launches...")

Loaded 7333 total launches.
Processing 7333 launches...


## Data Reconnaissance
Quick peek at the first launch record to see which keys are available.

In [30]:
# print the first launch in the sample to see all the keys
pprint(data_source[0])

{'agency_launch_attempt_count': 1,
 'agency_launch_attempt_count_year': 1,
 'failreason': '',
 'flightclub_url': None,
 'hashtag': None,
 'id': 'e3df2ecd-c239-472f-95e4-2b89b4f75800',
 'image': {'credit': None,
           'id': 1844,
           'image_url': 'https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/sputnik_8k74ps_image_20210830185541.jpg',
           'license': {'id': 1, 'link': None, 'name': 'Unknown', 'priority': 9},
           'name': '[AUTO] Sputnik 8K74PS - image',
           'single_use': True,
           'thumbnail_url': 'https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/255bauto255d__image_thumbnail_20240305193923.jpeg',
           'variants': []},
 'info_urls': [],
 'infographic': None,
 'last_updated': '2024-03-17T19:17:35Z',
 'launch_designator': '1957-001',
 'launch_service_provider': {'abbrev': 'CCCP',
                             'administrator': None,
                             'attempted_landings': 0,
                        

## Testing Data Paths - doing the Mapping
Mapping paths for 7 attributes on a single launch record before running the full loop on the sample.

I used the following reference for safely unnesting dictionaries with chained `.get()` calls, plus a little experimenting:

 https://stackoverflow.com/questions/25833613/safe-method-to-get-value-of-nested-dictionary

In [31]:
#testing paths on the first launch record
test_launch = data_source[0]

print("Testing 'Mission' fields:")
print(f"Launch ID: {test_launch.get('id')}")
# `or {}` also covers keys that are present but null, which .get(key, {}) does not
mission = test_launch.get('mission') or {}
orbit = mission.get('orbit') or {}
print(f"Mission Name: {mission.get('name')}")
print(f"Mission Type: {mission.get('type')}")
print(f"Orbit Name: {orbit.get('name')}")

print("\nTesting 'Crew/Program' field:")
print(f"Program: {test_launch.get('program')}") # shows it's a list

print("\nTesting 'Agency/LSP' fields:")
lsp = test_launch.get('launch_service_provider') or {}
lsp_type_data = lsp.get('type') or {}
print(f"LSP Name: {lsp.get('name')}")
print(f"LSP Type: {lsp_type_data.get('name')}")

Testing 'Mission' fields:
Launch ID: e3df2ecd-c239-472f-95e4-2b89b4f75800
Mission Name: Sputnik 1
Mission Type: Test Flight
Orbit Name: Low Earth Orbit

Testing 'Crew/Program' field:
Program: []

Testing 'Agency/LSP' fields:
LSP Name: Soviet Space Program
LSP Type: Government
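
One pitfall with chained `.get()` lookups: a default such as `{}` is only used when the key is absent. If the key is present but holds `None`, the next `.get()` in the chain raises `AttributeError`. The sketch below shows one way to guard against that. The helper name `safe_get` and the two sample records are hypothetical, made up for illustration rather than taken from the dataset.

```python
def safe_get(record, *keys, default=None):
    """Walk nested dicts key by key; return `default` if any level is missing or null."""
    current = record
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


# Hypothetical records shaped like a launch entry
with_mission = {"mission": {"name": "Sputnik 1", "orbit": {"name": "Low Earth Orbit"}}}
null_mission = {"mission": None}  # key exists, value is null

print(safe_get(with_mission, "mission", "orbit", "name"))
print(safe_get(null_mission, "mission", "orbit", "name"))
```

A helper like this keeps each path on one line while behaving the same way as an explicit `if` check at every level.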


## Extracting Mission Data

Looping through launches to extract 10 Mission attributes into a table.

Using if/else to handle missing data safely.

Note: Only the first 'program' name is kept for now. TODO: handle launches that list multiple programs.

In [32]:
mission_table = []
mission_header = [ # column names
    "launch_id",
    "mission_id",
    "mission_name",
    "mission_type",
    "mission_description",
    "orbit_name",
    "orbit_abbrev",
    "program_name",
    "lsp_name",
    "lsp_type"
]

for launch in data_source:

    launch_id = launch.get('id') # most important attribute for merging later

    # mission and orbit data
    mission_data = launch.get('mission')
    if mission_data:
        mission_id = mission_data.get('id')
        mission_name = mission_data.get('name')
        mission_type = mission_data.get('type')
        mission_desc = mission_data.get('description')

        # orbit data nested inside mission
        orbit_data = mission_data.get('orbit')
        if orbit_data:
            orbit_name = orbit_data.get('name')
            orbit_abbrev = orbit_data.get('abbrev')
        else:
            orbit_name = None
            orbit_abbrev = None
    else:
        # handle missing Mission data
        mission_id = None
        mission_name = None
        mission_type = None
        mission_desc = None
        orbit_name = None
        orbit_abbrev = None

    # extracts program data - looking for "crewed" indicator
    # stores as a list
    program_list = launch.get('program', [])
    if program_list:
        program_name = program_list[0].get('name') # TODO: handle multiple programs
    else:
        program_name = None

    # launch service provider data - aka "agency"
    lsp_data = launch.get('launch_service_provider')
    if lsp_data:
        lsp_name = lsp_data.get('name')

        # goes one more level down for nested, nested info
        lsp_type_data = lsp_data.get('type')
        if lsp_type_data:
            lsp_type = lsp_type_data.get('name')
        else:
            lsp_type = None
    else:
        lsp_name = None
        lsp_type = None

    # builds the row for this launch
    row = [
        launch_id,
        mission_id,
        mission_name,
        mission_type,
        mission_desc,
        orbit_name,
        orbit_abbrev,
        program_name,
        lsp_name,
        lsp_type
    ]
    mission_table.append(row)

print(f"Finished processing this sample. Created table with {len(mission_table)} rows.")

Finished processing this sample. Created table with 7333 rows.
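
For the multiple-programs TODO, one possible approach is to join every program name into a single cell instead of keeping only the first. This is a minimal sketch, not the approach the team has settled on. The function name `join_program_names`, the `"; "` separator, and the sample program lists are all assumptions made for illustration.

```python
def join_program_names(program_list, sep="; "):
    """Return all program names joined by `sep`, or None when there are none."""
    names = [
        p.get("name")
        for p in (program_list or [])
        if isinstance(p, dict) and p.get("name")
    ]
    return sep.join(names) if names else None


# Hypothetical 'program' values shaped like the API field
two_programs = [{"name": "International Space Station"}, {"name": "Commercial Resupply Services"}]

print(join_program_names(two_programs))
print(join_program_names([]))
```

A semicolon separator avoids clashing with the tab delimiter used in the TSV output.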


## Creates DataFrame and Saves Output

Converts the mission table to a pandas DataFrame and prints a quick preview.  

Checks data quality with `.info()` to identify missing values.

Saves as TSV instead of CSV since mission descriptions may contain commas.

**Group discussion topic:** What to do with missing values? (Drop rows? Fill nulls w/ NA or 0 or unknown? Keep as is?)
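
To support that discussion, here is a small sketch of the options on a toy table. Note that the `mission_type` counts in this notebook include a blank label, which suggests some values are empty strings rather than nulls. Empty strings are not counted as missing by `.info()`, so the sketch converts them first. The three sample rows and the `"Unknown"` fill value are hypothetical.

```python
import pandas as pd

# Hypothetical rows: one real value, one empty string, one null
sample = pd.DataFrame({
    "launch_id": ["a", "b", "c"],
    "mission_type": ["Communications", "", None],
    "program_name": ["Starlink", None, None],
})

# Treat empty or whitespace-only strings as missing
cleaned = sample.copy()
blank = cleaned["mission_type"].str.strip() == ""
cleaned["mission_type"] = cleaned["mission_type"].mask(blank)

# Count nulls per column after the conversion
null_counts = cleaned.isna().sum()
print(null_counts)

# Option 1: fill nulls with a label; Option 2: drop rows missing the value
filled = cleaned.fillna({"mission_type": "Unknown"})
dropped = cleaned.dropna(subset=["mission_type"])
print(filled)
print(dropped)
```

Keeping the nulls as-is is the third option and needs no code. Filling or dropping could also be left until after the merge, so each analysis can pick what it needs.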

In [33]:
# convert extracted data to DataFrame
df_mission = pd.DataFrame(mission_table, columns=mission_header)
print("Successfully created Mission DataFrame:")

# summary of categorical columns
print("\nMission Type Distribution:")
print(df_mission['mission_type'].value_counts())

print("\nLSP Type Distribution:")
print(df_mission['lsp_type'].value_counts())

# preview DataFrame
print("\nPrinting first 3 rows:")
print(df_mission.head(3)) # displays plain text with labels

# checks data types and null counts - an important cleaning step
print("\nDataFrame Info (Checking for nulls)")
df_mission.info()

# saving as TSV because mission_description might have commas
output_filename = 'clean_mission_data.tsv'
df_mission.to_csv(output_filename, sep='\t', index=False)

print(f"\nSuccessfully saved data to {output_filename}")

Successfully created Mission DataFrame:

Mission Type Distribution:
mission_type
Government/Top Secret          2253
Communications                 1568
Earth Science                   935
Navigation                      407
Test Flight                     342
Human Exploration               314
Test Target                     205
Astrophysics                    167
Resupply                        124
Lunar Exploration                88
Robotic Exploration              88
Dedicated Rideshare              67
Planetary Science                59
Technology                       43
Heliophysics                     37
                                 26
Tourism                          25
Materials Science                24
Suborbital                       22
Biology                          18
Unknown                           3
Space Situational Awareness       1
Name: count, dtype: int64

LSP Type Distribution:
lsp_type
Government       5307
Commercial       1942
Private            72
Mu

## Preview First 10 Rows

Displaying directly as an HTML table for better readability.

In [27]:
df_mission.head(10)

| | launch_id | mission_id | mission_name | mission_type | mission_description | orbit_name | orbit_abbrev | program_name | lsp_name | lsp_type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 08fbd641-9f53-4fc8-88be-41fe7c115123 | 7293 | Starlink Group 17-10 | Communications | A batch of 24 satellites for the Starlink mega... | Low Earth Orbit | LEO | | SpaceX | Commercial |
| 1 | 3bfed6d5-d65e-4133-b51a-b664bbd9d006 | 6270 | Cygnus CRS-2 NG-23 (S.S. William “Willie” C. M... | Resupply | This is the 23rd flight of the Northrop Grumma... | Low Earth Orbit | LEO | International Space Station | SpaceX | Commercial |
| 2 | 57b5b880-7892-42b1-a327-1c919c033cb6 | 7298 | SatNet test satellites | Communications | Officially described as "Satellite-Internet Te... | Low Earth Orbit | LEO | | China Aerospace Science and Technology Corpora... | Government |
| 3 | 09678f8e-d3e5-4e26-87c1-d9b9141c1f61 | 7299 | Starlink Group 10-61 | Communications | A batch of 28 satellites for the Starlink mega... | Low Earth Orbit | LEO | Starlink | SpaceX | Commercial |
| 4 | 92ec4610-4576-4077-b538-65272a5d6491 | 7277 | NS-35 | Suborbital | NS-35 is the 35th flight for the New Shepard p... | Suborbital | Sub | | Blue Origin | Commercial |
| 5 | 64b19cd9-9b2a-4724-890b-6a7001eb3644 | 7297 | Starlink Group 17-12 | Communications | A batch of 24 satellites for the Starlink mega... | Low Earth Orbit | LEO | | SpaceX | Commercial |
| 6 | 161ca0fe-fef2-40a3-88f8-de777c7fc895 | 7300 | Starlink Group 10-27 | Communications | A batch of 28 satellites for the Starlink mega... | Low Earth Orbit | LEO | Starlink | SpaceX | Commercial |
| 7 | 86d62815-9266-481b-951e-e53670d27341 | 7113 | NROL-48 | Government/Top Secret | Eleventh batch of satellites for a reconnaissa... | Unknown | | | SpaceX | Commercial |
| 8 | 10cb03bb-53df-462f-be12-e8e4b8dcca82 | 7304 | JENNA | Government/Top Secret | Sub-orbital launch under Rocket Lab’s Hyperson... | Suborbital | Sub | | Rocket Lab | Commercial |
| 9 | b563b8fa-9f6c-4883-9a89-9cb5a073864f | 6997 | Geely Constellation Group 06 | Communications | 12 LEO communications satellites for Chinese c... | Low Earth Orbit | LEO | | China Rocket Co. Ltd. | Commercial |


## Next Steps

Mission parameters have been extracted and saved as `clean_mission_data.tsv`.

**MISSION PARAMETERS TODOs:**
- Merge with other team members' extracted data
- Handle launches with multiple 'programs' (currently only taking first)
- Validate data quality across full dataset
- Create data dictionary documenting each column
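
As a starting point for the merge step, the sketch below reads a TSV back in and joins it to a second table on `launch_id`. It is a sketch under assumptions, not the team's agreed workflow. The `rocket` table, its `rocket_name` column, and the `id-1`/`id-2` keys are hypothetical stand-ins for a teammate's output, and an in-memory buffer replaces the real file so the example runs anywhere.

```python
import io
import pandas as pd

# Toy stand-in for the mission table
mission = pd.DataFrame({
    "launch_id": ["id-1", "id-2"],
    "mission_name": ["Sputnik 1", "Example Mission"],
})

# Hypothetical teammate table keyed on the same launch_id
rocket = pd.DataFrame({
    "launch_id": ["id-1", "id-2"],
    "rocket_name": ["Example Rocket A", "Example Rocket B"],
})

# Round-trip through TSV; index=False avoids an extra "Unnamed: 0" column on read
buffer = io.StringIO()
mission.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)
mission_back = pd.read_csv(buffer, sep="\t")

# validate= raises if launch_id is duplicated; indicator= adds a _merge column
merged = mission_back.merge(
    rocket, on="launch_id", how="left", validate="one_to_one", indicator=True
)
print(merged)
```

Rows where `_merge` is `left_only` would be launches with no match in the other table, which is a quick way to validate data quality after merging.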