# Raw Data Filtering Notebook

In this notebook we carry out the initial exploration of the raw data to locate and select the specific variables that we wish to include in our future cleaned dataset.



## Imports and Installs

In [1]:
import json
from pprint import pprint
import zipfile

## Loading the Data

In [2]:
#The path to the zipfile
zip_path = "../raw data/raw_launch_data.json.zip"
launch_data_filename = 'raw_launch_data.json'

#loading the raw launch data 
with zipfile.ZipFile(zip_path, "r") as z:
    with z.open(launch_data_filename) as f:
        raw_launch_data = json.load(f)

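#First, check what type the loaded data is
type(raw_launch_data)

#Optional: print part of the data to confirm it was loaded correctly
#Left commented out because slicing fails when the top level is a dict rather than a list
#pprint(raw_launch_data[:5])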

dict

The top level of the data is a dictionary, so to find out what is nested inside it we need to look at its keys.

In [3]:
#What are the keys for the raw_launch_data dictionary
raw_launch_data.keys()

dict_keys(['collector', 'total_launches', 'collection_date', 'launches'])
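
To get a quick overview of each top-level key without printing the whole payload, a small helper along these lines could report the type and size of every value. This is only a sketch: `summarize_top_level` is a hypothetical helper name, and `sample` is made-up stand-in data that mirrors the four keys shown above, not the real file contents.

```python
def summarize_top_level(data):
    """Return {key: (type name, length or None)} for a dictionary."""
    summary = {}
    for key, value in data.items():
        # Only containers report a length; strings are treated as scalars here.
        has_len = isinstance(value, (list, dict, tuple, set))
        summary[key] = (type(value).__name__, len(value) if has_len else None)
    return summary


# Stand-in data with the same top-level keys as raw_launch_data (values are made up).
sample = {
    "collector": "example",
    "total_launches": 2,
    "collection_date": "2025-11-07 05:20:58",
    "launches": [{"id": "a"}, {"id": "b"}],
}

for key, (type_name, size) in summarize_top_level(sample).items():
    print(f"{key}: {type_name}, length={size}")
```

In the notebook itself the call would be `summarize_top_level(raw_launch_data)`.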

In [4]:
raw_launch_data['collection_date']

'2025-11-07 05:20:58'

In [5]:
type(raw_launch_data['launches'])

list

In [6]:
#What is the length of the raw launches list?
len(raw_launch_data['launches'])

7336

So we can treat each entry in the list as an individual launch.
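
Before relying on the list, it may be worth confirming that its length agrees with the `total_launches` value stored alongside it. The sketch below assumes that `total_launches` is meant to equal the number of entries in `launches`, which this notebook has not verified. `check_launch_count` is a hypothetical helper and `sample` is made-up data.

```python
def check_launch_count(data):
    """Compare the reported total with the number of entries actually present."""
    reported = data.get("total_launches")
    actual = len(data.get("launches", []))
    return reported == actual, reported, actual


# Made-up stand-in for raw_launch_data with a matching count.
sample = {"total_launches": 3, "launches": [{"id": "a"}, {"id": "b"}, {"id": "c"}]}

matches, reported, actual = check_launch_count(sample)
print(f"reported={reported}, actual={actual}, match={matches}")
```

A mismatch would suggest that the collection step dropped or duplicated records, which is worth knowing before any filtering.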

In [7]:
#Access a single launch record to inspect its structure
#We use the last entry (the most recent launch) rather than the first
raw_launch_data['launches'][-1]

{'id': '6602c88f-cbff-4495-b417-a184ddb0a426',
 'url': 'https://ll.thespacedevs.com/2.3.0/launches/6602c88f-cbff-4495-b417-a184ddb0a426/',
 'name': 'Falcon 9 Block 5 | Starlink Group 11-14',
 'response_mode': 'detailed',
 'slug': 'falcon-9-block-5-starlink-group-11-14',
 'launch_designator': '2025-254',
 'status': {'id': 3,
  'name': 'Launch Successful',
  'abbrev': 'Success',
  'description': 'The launch vehicle successfully inserted its payload(s) into the target orbit(s).'},
 'last_updated': '2025-11-07T03:05:58Z',
 'net': '2025-11-06T21:13:50Z',
 'net_precision': {'id': 0,
  'name': 'Second',
  'abbrev': 'SEC',
  'description': 'The T-0 is accurate to the second.'},
 'window_end': '2025-11-07T00:56:00Z',
 'window_start': '2025-11-06T20:56:00Z',
 'image': {'id': 1296,
  'name': 'Starlink night fairing',
  'image_url': 'https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/falcon2520925_image_20221009234147.png',
  'thumbnail_url': 'https://thespacedevs-prod.nyc3.digital

In [8]:
launch_data = raw_launch_data['launches']

In [9]:
launch_data[-1]['rocket']['configuration'].keys()

dict_keys(['response_mode', 'id', 'url', 'name', 'families', 'full_name', 'variant', 'active', 'is_placeholder', 'manufacturer', 'program', 'reusable', 'image', 'info_url', 'wiki_url', 'description', 'alias', 'min_stage', 'max_stage', 'length', 'diameter', 'maiden_flight', 'launch_cost', 'launch_mass', 'leo_capacity', 'gto_capacity', 'geo_capacity', 'sso_capacity', 'to_thrust', 'apogee', 'total_launch_count', 'consecutive_successful_launches', 'successful_launches', 'failed_launches', 'pending_launches', 'attempted_landings', 'successful_landings', 'failed_landings', 'consecutive_successful_landings', 'fastest_turnaround'])
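
The keys above come from a single launch, so other records may be missing some of them or hold empty lists where this one has entries. A defensive lookup such as the following sketch avoids a `KeyError` or `IndexError` when walking deep paths. `dig` is a hypothetical helper, and `sample_launch` is a trimmed, made-up record.

```python
def dig(record, path, default=None):
    """Follow a sequence of dict keys / list indexes, returning default on any miss."""
    current = record
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, index out of range, or a None/scalar where a container was expected.
            return default
    return current


# Made-up record: the configuration has a name but an empty families list.
sample_launch = {
    "rocket": {"configuration": {"name": "Falcon 9", "families": []}}
}

print(dig(sample_launch, ["rocket", "configuration", "name"]))
print(dig(sample_launch, ["rocket", "configuration", "families", 0, "manufacturer", 0, "name"]))
```

The second lookup returns the default instead of raising, which makes it safer to apply the same path to every launch in the list.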

## Initial Data Capture

**In this section we capture the variables we wish to use for our dataset.**

We will start this process by following the brief guideline from our project proposal on the data we planned to collect.



In [10]:
### Rocket Data

# Rocket Name and Company Name
print("Unique Launch identifier : ", launch_data[-1]['id']) #This is the launch id, not a rocket id; will be useful if we plan to break the data up
print("Rocket Name : ", launch_data[-1]['rocket']['configuration']['name'])
print("Company Name : ", launch_data[-1]['rocket']['configuration']['families'][0]['manufacturer'][0]['name'])
print("Company Founding Year : ", launch_data[-1]['rocket']['configuration']['families'][0]['manufacturer'][0]['founding_year'])
print("Country Affiliation : ", launch_data[-1]['rocket']['configuration']['families'][0]['manufacturer'][0]['country'][0]['name'])
print("Company Type : ", launch_data[-1]['rocket']['configuration']['families'][0]['manufacturer'][0]['type']['name'])

Unique Launch identifier :  6602c88f-cbff-4495-b417-a184ddb0a426
Rocket Name :  Falcon 9
Company Name :  SpaceX
Company Founding Year :  2002
Country Affiliation :  United States of America
Company Type :  Commercial


In [11]:
#Rocket Related Parameters
print("Rocket related parameters")
print("---------------")
print("Is the Rocket Reusable : ", launch_data[-1]['rocket']['configuration']['reusable'])
print("Min. No. Stages : ", launch_data[-1]['rocket']['configuration']['min_stage'])
print("Max No. Stages : ", launch_data[-1]['rocket']['configuration']['max_stage'])
print("Rocket Length (meters): ", launch_data[-1]['rocket']['configuration']['length'])
print("Rocket Diameter (meters): ", launch_data[-1]['rocket']['configuration']['diameter'])
print("Rocket Launch Cost : ", launch_data[-1]['rocket']['configuration']['launch_cost'])
print("Rocket Liftoff Mass (Tons): ", launch_data[-1]['rocket']['configuration']['launch_mass'])
print("Rocket Payload Mass to LEO (kg): ", launch_data[-1]['rocket']['configuration']['leo_capacity'])
print("Rocket Payload Mass to GTO (kg): ", launch_data[-1]['rocket']['configuration']['gto_capacity'])
print("Rocket Payload Mass to GEO (kg): ", launch_data[-1]['rocket']['configuration']['geo_capacity'])  #Could be removed if all null
print("Rocket Payload Mass to SSO (kg): ", launch_data[-1]['rocket']['configuration']['sso_capacity'])  #Could be removed if all null
print("Rocket Liftoff Thrust (kN): ", launch_data[-1]['rocket']['configuration']['to_thrust'])
print("Rocket Apogee: ", launch_data[-1]['rocket']['configuration']['apogee']) #Need to figure out what unit is used for apogee


 #'geo_capacity', 'sso_capacity', 'to_thrust', 'apogee'

Rocket related parameters
---------------
Is the Rocket Reusable :  True
Min. No. Stages :  1
Max No. Stages :  2
Rocket Length (meters):  70.0
Rocket Diameter (meters):  3.65
Rocket Launch Cost :  52000000
Rocket Liftoff Mass (Tons):  549.0
Rocket Payload Mass to LEO (kg):  22800.0
Rocket Payload Mass to GTO (kg):  8300.0
Rocket Payload Mass to GEO (kg):  None
Rocket Payload Mass to SSO (kg):  None
Rocket Liftoff Thrust (kN):  7607.0
Rocket Apogee:  200.0
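
Since the GEO and SSO capacities are `None` for this launch, one way to decide whether a field is worth keeping is to count how often it is missing across all launches. The following is a rough sketch: `count_missing` is a hypothetical helper, it assumes each field lives under `rocket` then `configuration` as in the cell above, and `sample_launches` is made-up data.

```python
def count_missing(launches, fields):
    """Count, per field, how many launches have no value under rocket.configuration."""
    missing = {field: 0 for field in fields}
    for launch in launches:
        # Tolerate records where 'rocket' or 'configuration' is absent or None.
        config = (launch.get("rocket") or {}).get("configuration") or {}
        for field in fields:
            if config.get(field) is None:
                missing[field] += 1
    return missing


# Two made-up launches with different gaps.
sample_launches = [
    {"rocket": {"configuration": {"geo_capacity": None, "sso_capacity": None, "leo_capacity": 22800.0}}},
    {"rocket": {"configuration": {"geo_capacity": None, "sso_capacity": 1500.0, "leo_capacity": None}}},
]

missing = count_missing(sample_launches, ["geo_capacity", "sso_capacity", "leo_capacity"])
for field, n in missing.items():
    print(f"{field}: {n} of {len(sample_launches)} missing")
```

In the notebook this would be called as `count_missing(launch_data, [...])`. A field whose count equals `len(launch_data)` is null everywhere and is a candidate for removal.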


In [12]:
# Launch related parameters
print("Launch related parameters")
print("---------------")
print("Date of Launch : ", launch_data[-1]['net'])
print("Launch Status : ", launch_data[-1]['status']['name'])
print("Short Form Status:", launch_data[-1]['status']['abbrev'])
print("Launch pad location name:", launch_data[-1]['pad']['location']['name'])
print("Launch pad name:", launch_data[-1]['pad']['name'])
print("Launch pad country name:", launch_data[-1]['pad']['country']['name'])
print("Launch pad country id:", launch_data[-1]['pad']['country']['id'])
print("Launch pad latitude:", launch_data[-1]['pad']['latitude'])
print("Launch pad longitude:", launch_data[-1]['pad']['longitude'])

print()

Launch related parameters
---------------
Date of Launch :  2025-11-06T21:13:50Z
Launch Status :  Launch Successful
Short Form Status: Success
Launch pad location name: Vandenberg SFB, CA, USA
Launch pad name: Space Launch Complex 4E
Launch pad country name: United States of America
Launch pad country id: 2
Launch pad latitude: 34.632
Launch pad longitude: -120.611
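
Once the fields of interest are identified, they could be gathered into one flat row per launch, with the launch date parsed into a proper timestamp. The sketch below is illustrative only. `parse_net` and `launch_to_row` are hypothetical helpers, the output column names are our own choice, and the parser assumes every `net` value uses the second-precision `...Z` format seen above (any other format would raise a `ValueError`). `sample_launch` is a trimmed copy of the values printed above.

```python
from datetime import datetime, timezone


def parse_net(net):
    """Parse a timestamp such as '2025-11-06T21:13:50Z' into an aware UTC datetime."""
    return datetime.strptime(net, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def launch_to_row(launch):
    """Collect a few launch-level fields into one flat dictionary."""
    pad = launch.get("pad") or {}
    status = launch.get("status") or {}
    return {
        "launch_id": launch.get("id"),
        "launch_date": parse_net(launch["net"]) if launch.get("net") else None,
        "status": status.get("abbrev"),
        "pad_name": pad.get("name"),
        "pad_latitude": pad.get("latitude"),
        "pad_longitude": pad.get("longitude"),
    }


# Trimmed record using the values printed in the cells above.
sample_launch = {
    "id": "6602c88f-cbff-4495-b417-a184ddb0a426",
    "net": "2025-11-06T21:13:50Z",
    "status": {"abbrev": "Success"},
    "pad": {"name": "Space Launch Complex 4E", "latitude": 34.632, "longitude": -120.611},
}

row = launch_to_row(sample_launch)
print(row["launch_date"].isoformat())
print(row["status"], row["pad_name"])
```

Applying `launch_to_row` to every entry in `launch_data` would give a list of dictionaries that can be loaded into a table for cleaning.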

