# SpaceX Launch Data Collection (API)

## Objective
Collect historical SpaceX launch data from the public SpaceX REST API.
The endpoint returns launches for every SpaceX vehicle, so Falcon 9 records
must be isolated in later steps. The resulting dataset is the foundation for
exploratory data analysis, interactive visualization, and machine learning
models that predict first-stage landing success.

## Why this matters
SpaceX reduces launch costs by recovering and reusing first-stage boosters.
Knowing which factors influence a successful landing therefore matters
for cost estimation, operational planning, and competitive analysis
in the aerospace industry.


## Data Source

- **API**: https://api.spacexdata.com/v4/launches
- **Provider**: SpaceX
- **Access**: Public (no authentication required)
- **Format**: JSON

The endpoint returns a JSON array with one nested object per launch.
These objects must be flattened before they can be analyzed with pandas or SQL.


In [2]:
import requests
import pandas as pd
from pathlib import Path

In [3]:
SPACEX_API_URL = "https://api.spacexdata.com/v4/launches"

response = requests.get(SPACEX_API_URL, timeout=30)
response.raise_for_status()  # fail fast if request fails

launches = response.json()

print(f"Number of launch records retrieved: {len(launches)}")


Number of launch records retrieved: 205
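
A single `requests.get` call gives up on the first transient network error. The sketch below shows one way to add automatic retries, assuming the `Retry` helper from urllib3 (installed alongside requests). `build_session` is a hypothetical helper name, and the retry count, backoff, and status codes are illustrative choices rather than values taken from this notebook.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

SPACEX_API_URL = "https://api.spacexdata.com/v4/launches"


def build_session(total_retries: int = 3, backoff: float = 1.0) -> requests.Session:
    """Return a Session that retries transient HTTP failures (hypothetical helper)."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,
        # Illustrative set of status codes worth retrying.
        status_forcelist=(429, 500, 502, 503, 504),
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


session = build_session()
# To fetch the data with retries enabled:
# launches = session.get(SPACEX_API_URL, timeout=30).json()
```

The session is a drop-in replacement for the module-level `requests.get` call, so `timeout` and `raise_for_status()` can be used exactly as before.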


## Data Normalization

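The API response contains deeply nested JSON structures.
`pd.json_normalize` flattens nested objects into dotted column names
(for example `links.reddit.media`), while list-valued fields such as
`failures`, `payloads`, and `cores` stay as Python lists inside a single column.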
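
A minimal sketch of the flattening on a toy record may make this concrete. The record below is a heavily trimmed stand-in for a launch object, not a real API response; it only illustrates that nested dicts turn into dotted column names while lists are left untouched.

```python
import pandas as pd

# Toy record shaped like a trimmed launch object; not a full API response.
sample = [
    {
        "name": "FalconSat",
        "success": False,
        "links": {
            "webcast": "https://www.youtube.com/watch?v=0a_00nJ_Y88",
            "reddit": {"media": None},
        },
        "cores": [{"core": "5e9e289df35918033d3b2623"}],
    }
]

flat = pd.json_normalize(sample)

# Nested dicts become dotted column names; the list under "cores" is kept as a list.
print(sorted(flat.columns))
print(type(flat.loc[0, "cores"]))
```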


In [4]:
df_raw = pd.json_normalize(launches)
df_raw.shape

(205, 43)

In [5]:
df_raw.head(3)

| | static_fire_date_utc | static_fire_date_unix | net | window | rocket | success | failures | details | crew | ships | ... | links.reddit.media | links.reddit.recovery | links.flickr.small | links.flickr.original | links.presskit | links.webcast | links.youtube_id | links.article | links.wikipedia | fairings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2006-03-17T00:00:00.000Z | 1142554000.0 | False | 0.0 | 5e9d0d95eda69955f709d1eb | False | [{'time': 33, 'altitude': None, 'reason': 'mer... | Engine failure at 33 seconds and loss of vehicle | [] | [] | ... | | | [] | [] | | https://www.youtube.com/watch?v=0a_00nJ_Y88 | 0a_00nJ_Y88 | https://www.space.com/2196-spacex-inaugural-fa... | https://en.wikipedia.org/wiki/DemoSat | |
| 1 | | | False | 0.0 | 5e9d0d95eda69955f709d1eb | False | [{'time': 301, 'altitude': 289, 'reason': 'har... | Successful first stage burn and transition to ... | [] | [] | ... | | | [] | [] | | https://www.youtube.com/watch?v=Lk4zQ2wP-Nc | Lk4zQ2wP-Nc | https://www.space.com/3590-spacex-falcon-1-roc... | https://en.wikipedia.org/wiki/DemoSat | |
| 2 | | | False | 0.0 | 5e9d0d95eda69955f709d1eb | False | [{'time': 140, 'altitude': 35, 'reason': 'resi... | Residual stage 1 thrust led to collision betwe... | [] | [] | ... | | | [] | [] | | https://www.youtube.com/watch?v=v0w9p3U8860 | v0w9p3U8860 | http://www.spacex.com/news/2013/02/11/falcon-1... | https://en.wikipedia.org/wiki/Trailblazer_(sat... | |
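
The first three rows share one `rocket` ID, and these early missions were Falcon 1 flights, so the endpoint evidently covers more than Falcon 9. A sketch of filtering by rocket ID follows. `filter_by_rocket` is a hypothetical helper; the Falcon 9 ID does not appear in this notebook and would have to be looked up (for instance from the API's rockets endpoint), so the example uses the Falcon 1 ID from the output plus an obvious placeholder.

```python
from typing import Any

# Rocket ID copied from the head(3) output; those launches were Falcon 1 flights.
FALCON_1_ID = "5e9d0d95eda69955f709d1eb"


def filter_by_rocket(
    launches: list[dict[str, Any]], rocket_id: str
) -> list[dict[str, Any]]:
    """Keep only launches whose `rocket` field equals rocket_id (hypothetical helper)."""
    return [launch for launch in launches if launch.get("rocket") == rocket_id]


# Toy records; "placeholder-rocket-id" is not a real ID.
toy = [
    {"name": "FalconSat", "rocket": FALCON_1_ID},
    {"name": "Some later mission", "rocket": "placeholder-rocket-id"},
]

falcon_1_only = filter_by_rocket(toy, FALCON_1_ID)
print([launch["name"] for launch in falcon_1_only])
```

The same function would select Falcon 9 launches once the correct ID is substituted.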


In [6]:
df_raw.columns.tolist()[:15]


['static_fire_date_utc',
 'static_fire_date_unix',
 'net',
 'window',
 'rocket',
 'success',
 'failures',
 'details',
 'crew',
 'ships',
 'capsules',
 'payloads',
 'launchpad',
 'flight_number',
 'name']

In [7]:
df_raw.isna().mean().sort_values(ascending=False).head(10)


fairings                     1.000000
launch_library_id            0.648780
fairings.recovered           0.585366
links.reddit.media           0.570732
links.presskit               0.556098
fairings.reused              0.546341
links.reddit.recovery        0.536585
fairings.recovery_attempt    0.478049
window                       0.429268
static_fire_date_unix        0.409756
dtype: float64

At this stage, the dataset contains raw launch-level information.
The `rocket`, `launchpad`, and `payloads` columns hold ID references rather
than descriptive values, and per-booster details, including landing
information, are nested inside the `cores` list.

Missing values and nested fields are expected and will be addressed
during data wrangling and feature engineering in subsequent notebooks.
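
As a preview of that wrangling, the sketch below drops columns whose share of missing values exceeds a threshold, which would remove a column like `fairings` that is 100% missing in the output above. `drop_sparse_columns` and its 0.99 threshold are hypothetical choices for illustration, not the approach used in the later notebooks.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.99) -> pd.DataFrame:
    """Drop columns whose missing share is above max_missing (hypothetical helper)."""
    missing_share = df.isna().mean()
    keep = missing_share[missing_share <= max_missing].index
    return df[keep]


# Toy frame: `fairings` is entirely missing, `window` only partly.
toy = pd.DataFrame(
    {
        "flight_number": [1, 2, 3],
        "fairings": [None, None, None],
        "window": [0.0, None, 0.0],
    }
)

trimmed = drop_sparse_columns(toy)
print(trimmed.columns.tolist())
```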


In [8]:
columns_of_interest = [
    "flight_number",
    "name",
    "date_utc",
    "success",
    "rocket",
    "launchpad",
    "payloads",
    "cores"
]

df_raw[columns_of_interest].head()


| | flight_number | name | date_utc | success | rocket | launchpad | payloads | cores |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | FalconSat | 2006-03-24T22:30:00.000Z | False | 5e9d0d95eda69955f709d1eb | 5e9e4502f5090995de566f86 | [5eb0e4b5b6c3bb0006eeb1e1] | [{'core': '5e9e289df35918033d3b2623', 'flight'... |
| 1 | 2 | DemoSat | 2007-03-21T01:10:00.000Z | False | 5e9d0d95eda69955f709d1eb | 5e9e4502f5090995de566f86 | [5eb0e4b6b6c3bb0006eeb1e2] | [{'core': '5e9e289ef35918416a3b2624', 'flight'... |
| 2 | 3 | Trailblazer | 2008-08-03T03:34:00.000Z | False | 5e9d0d95eda69955f709d1eb | 5e9e4502f5090995de566f86 | [5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006e... | [{'core': '5e9e289ef3591814873b2625', 'flight'... |
| 3 | 4 | RatSat | 2008-09-28T23:15:00.000Z | True | 5e9d0d95eda69955f709d1eb | 5e9e4502f5090995de566f86 | [5eb0e4b7b6c3bb0006eeb1e5] | [{'core': '5e9e289ef3591855dc3b2626', 'flight'... |
| 4 | 5 | RazakSat | 2009-07-13T03:35:00.000Z | True | 5e9d0d95eda69955f709d1eb | 5e9e4502f5090995de566f86 | [5eb0e4b7b6c3bb0006eeb1e6] | [{'core': '5e9e289ef359184f103b2627', 'flight'... |
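
Each `cores` cell holds a list of dicts, one per booster. One way to get a row per core is `pd.json_normalize` with `record_path` and `meta`, sketched below on toy data. Only the `core` and `flight` keys are visible in the truncated output above; the `landing_success` key is an assumption about the v4 schema and should be checked against a real record.

```python
import pandas as pd

# Toy launches; IDs are placeholders and `landing_success` is an assumed key.
toy_launches = [
    {
        "flight_number": 1,
        "name": "Toy A",
        "cores": [{"core": "c1", "flight": 1, "landing_success": None}],
    },
    {
        "flight_number": 2,
        "name": "Toy B",
        "cores": [
            {"core": "c2", "flight": 1, "landing_success": True},
            {"core": "c3", "flight": 2, "landing_success": False},
        ],
    },
]

# One output row per element of `cores`; launch-level fields are repeated via `meta`.
cores = pd.json_normalize(
    toy_launches,
    record_path="cores",
    meta=["flight_number", "name"],
)
print(cores.shape)
```

A launch with several boosters yields several rows, so any later aggregation back to launch level needs an explicit rule for combining them.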


## Persisting Raw Data

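The raw dataset is saved to disk to ensure reproducibility
and to decouple data collection from downstream processing steps.
Note that CSV stores list-valued columns such as `cores` and `failures`
as plain strings, so they must be parsed again when the file is reloaded.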


In [9]:
output_dir = Path("../data/raw")
output_dir.mkdir(parents=True, exist_ok=True)

output_path = output_dir / "spacex_launches_raw.csv"
df_raw.to_csv(output_path, index=False)

print(f"Raw data saved to: {output_path.resolve()}")


Raw data saved to: /Users/razs/Desktop/RAZS/spacex-falcon9-landing-prediction/data/raw/spacex_launches_raw.csv
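
Because a CSV file stores list-valued columns as text, one option is to keep the original JSON next to it so the nested structure survives a reload. The sketch below round-trips a toy record through a temporary directory; the file name `spacex_launches_raw.json` is a hypothetical choice, and in the notebook the target would be `output_dir` rather than a temp folder.

```python
import json
import tempfile
from pathlib import Path

# Toy stand-in for the `launches` list returned by response.json().
launches = [
    {"name": "FalconSat", "cores": [{"core": "5e9e289df35918033d3b2623"}]},
]

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "spacex_launches_raw.json"  # hypothetical file name
    json_path.write_text(json.dumps(launches, indent=2), encoding="utf-8")

    # Reloading returns real lists and dicts, not their string representations.
    restored = json.loads(json_path.read_text(encoding="utf-8"))

print(restored == launches)
```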


## Next Steps

In the next stage of the pipeline, this dataset will be enriched
with information scraped from Wikipedia, including payload mass,
orbit type, and mission outcome details.

This enrichment enables deeper exploratory analysis
and the development of predictive models.
