# Step 1: Data Collection
In this notebook, I document and test the data collection process. The collection logic lives in executable scripts in the `src/` folder; here I import and run those functions to inspect their output.

Key tasks:

- Fetch historical launch data from the SpaceX REST API.
- Scrape a table of launch data from Wikipedia.

### 1.1: Setup and Imports

In [1]:
import sys
import os
import pandas as pd
from IPython.display import display  # available implicitly in Jupyter; imported here for clarity

# Add the project root to the Python path to allow importing from `src`
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Import my custom functions
from src.fetch_api import fetch_spacex_launch_data, save_data_as_json
from src.scrape_wiki import scrape_launch_data, save_data_as_csv
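The setup cell assumes the notebook lives exactly one directory below the project root. A sketch of a less position-dependent alternative, assuming the project root is the nearest ancestor directory that contains a `src/` folder, walks upward from the current working directory instead. `find_project_root` and `add_to_sys_path` are hypothetical helpers, not functions from `src/`.

```python
import sys
from pathlib import Path


def find_project_root(start=None, marker="src"):
    """Return the nearest directory at or above `start` that has a `marker` subdirectory.

    Assumption: the project root is identified by its `src/` folder.
    """
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No directory at or above {current} contains '{marker}/'")


def add_to_sys_path(path):
    """Prepend `path` to sys.path only once, mirroring the check in the setup cell."""
    path_str = str(path)
    if path_str not in sys.path:
        sys.path.insert(0, path_str)
```

In the notebook, `add_to_sys_path(find_project_root())` would take the place of the hard-coded `'..'` join.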

### 1.2: Fetch Data from SpaceX API

In [2]:
# Define constants for file paths
RAW_DATA_PATH = '../data/raw'
API_FILENAME = 'spacex_api_data.json'

# Fetch the launch data
spacex_df = fetch_spacex_launch_data()

# Save the raw data
save_data_as_json(spacex_df, RAW_DATA_PATH, API_FILENAME)

Fetching data from https://api.spacexdata.com/v4/launches...
Data fetched successfully.
Data saved to ../data/raw\spacex_api_data.json
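The contents of `src/fetch_api.py` are not shown in this notebook. As a rough sketch only, assuming the endpoint returns a JSON array of launch objects, fetching and saving could look like the following. `fetch_launches` and `save_dataframe` are hypothetical stand-ins for `fetch_spacex_launch_data` and `save_data_as_json`, and the standard-library `urllib.request` is used here in place of whatever HTTP client the real script uses.

```python
import json
import os
import urllib.request

import pandas as pd

# Endpoint taken from the notebook output above.
SPACEX_LAUNCHES_URL = "https://api.spacexdata.com/v4/launches"


def fetch_launches(url=SPACEX_LAUNCHES_URL, timeout=30):
    """Download all launches and flatten nested objects into dotted column names."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        records = json.load(response)  # assumed: a JSON array of launch objects
    return pd.json_normalize(records)


def save_dataframe(df, directory, filename):
    """Write `df` to directory/filename as JSON (*.json) or CSV (any other name)."""
    if df is None or df.empty:
        print("No data to save.")
        return None
    os.makedirs(directory, exist_ok=True)  # create data/raw on the first run
    path = os.path.join(directory, filename)
    if filename.endswith(".json"):
        df.to_json(path, orient="records", indent=2)
    else:
        # index=False keeps the row index from reappearing as an "Unnamed: 0" column on re-read
        df.to_csv(path, index=False)
    print(f"Data saved to {path}")
    return path
```

Returning `None` and printing "No data to save." for a missing or empty DataFrame mirrors the message the notebook's save helpers print.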


In [3]:
# Display the first few rows of the API data
if spacex_df is not None:
    print("--- SpaceX API Data ---")
    print(f"Shape of the DataFrame: {spacex_df.shape}")
    display(spacex_df.head())

--- SpaceX API Data ---
Shape of the DataFrame: (205, 43)


| | static_fire_date_utc | static_fire_date_unix | net | window | rocket | success | failures | details | crew | ships | ... | links.reddit.media | links.reddit.recovery | links.flickr.small | links.flickr.original | links.presskit | links.webcast | links.youtube_id | links.article | links.wikipedia | fairings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2006-03-17T00:00:00.000Z | 1142554000.0 | False | 0.0 | 5e9d0d95eda69955f709d1eb | False | [{'time': 33, 'altitude': None, 'reason': 'mer... | Engine failure at 33 seconds and loss of vehicle | [] | [] | ... | | | [] | [] | | https://www.youtube.com/watch?v=0a_00nJ_Y88 | 0a_00nJ_Y88 | https://www.space.com/2196-spacex-inaugural-fa... | https://en.wikipedia.org/wiki/DemoSat | |
| 1 | | | False | 0.0 | 5e9d0d95eda69955f709d1eb | False | [{'time': 301, 'altitude': 289, 'reason': 'har... | Successful first stage burn and transition to ... | [] | [] | ... | | | [] | [] | | https://www.youtube.com/watch?v=Lk4zQ2wP-Nc | Lk4zQ2wP-Nc | https://www.space.com/3590-spacex-falcon-1-roc... | https://en.wikipedia.org/wiki/DemoSat | |
| 2 | | | False | 0.0 | 5e9d0d95eda69955f709d1eb | False | [{'time': 140, 'altitude': 35, 'reason': 'resi... | Residual stage 1 thrust led to collision betwe... | [] | [] | ... | | | [] | [] | | https://www.youtube.com/watch?v=v0w9p3U8860 | v0w9p3U8860 | http://www.spacex.com/news/2013/02/11/falcon-1... | https://en.wikipedia.org/wiki/Trailblazer_(sat... | |
| 3 | 2008-09-20T00:00:00.000Z | 1221869000.0 | False | 0.0 | 5e9d0d95eda69955f709d1eb | True | [] | Ratsat was carried to orbit on the first succe... | [] | [] | ... | | | [] | [] | | https://www.youtube.com/watch?v=dLQ2tZEH6G0 | dLQ2tZEH6G0 | https://en.wikipedia.org/wiki/Ratsat | https://en.wikipedia.org/wiki/Ratsat | |
| 4 | | | False | 0.0 | 5e9d0d95eda69955f709d1eb | True | [] | | [] | [] | ... | | | [] | [] | http://www.spacex.com/press/2012/12/19/spacexs... | https://www.youtube.com/watch?v=yTaIDooc8Og | yTaIDooc8Og | http://www.spacex.com/news/2013/02/12/falcon-1... | https://en.wikipedia.org/wiki/RazakSAT | |
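The dotted column names in this preview (for example `links.reddit.media`) match what `pandas.json_normalize` produces when it flattens nested JSON objects, and several columns (`failures`, `crew`, `ships`, `links.flickr.small`) still hold Python lists, which `json_normalize` leaves as-is. Only part of the 43 columns is displayed, so a quick check can list every list-valued column before cleaning. This sketch assumes list cells arrive as Python `list` objects (true right after `json_normalize`, but not after a CSV round trip, where they become strings); `list_valued_columns` and the sample records are hypothetical.

```python
import pandas as pd


def list_valued_columns(df):
    """Return the columns in which at least one cell is a Python list."""
    return [
        column
        for column in df.columns
        if df[column].map(lambda value: isinstance(value, list)).any()
    ]


# Hypothetical records shaped like the API preview above (not real launch data).
sample = [
    {"success": False, "failures": [{"time": 33, "reason": "example"}], "crew": [],
     "links": {"reddit": {"media": None}, "flickr": {"small": []}}},
    {"success": True, "failures": [], "crew": [],
     "links": {"reddit": {"media": None}, "flickr": {"small": []}}},
]
launches = pd.json_normalize(sample)  # nested dicts become dotted columns; lists stay lists
print(list_valued_columns(launches))
```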


### 1.3: Scrape Data from Wikipedia

In [4]:
# Define the output filename for the scraped table
WIKI_FILENAME = 'wiki_falcon9_table.csv'

# Scrape the launch data
wiki_df = scrape_launch_data()

# Save the raw scraped data
save_data_as_csv(wiki_df, RAW_DATA_PATH, WIKI_FILENAME)

Scraping data from https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches...
Error scraping data: name 'url' is not defined
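No data to save.

The scrape failed: `scrape_launch_data` hit `NameError: name 'url' is not defined`, so no Wikipedia data was returned and `save_data_as_csv` printed "No data to save." instead of writing `wiki_falcon9_table.csv`. The function in `src/scrape_wiki.py` refers to a name `url` that is not defined where it is used, and that needs to be fixed before the Wikipedia table can be collected.
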
In [5]:
# Display the first few rows of the scraped Wikipedia data
if wiki_df is not None:
    print("--- Wikipedia Scraped Data ---")
    print(f"Shape of the DataFrame: {wiki_df.shape}")
    display(wiki_df.head())
else:
    print("No Wikipedia data to display; the scrape in the previous cell returned no data.")
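Because `src/scrape_wiki.py` is not shown, the exact cause of the `NameError` can't be confirmed here, but one straightforward fix is to make the page URL an explicit parameter with a default, so the function never refers to an undefined name. The sketch below assumes the launch data sits in HTML tables with the `wikitable` class and that those tables are not nested inside one another. It uses the standard-library `html.parser` rather than whatever parser the real script uses, and `WikitableParser`, `parse_first_wikitable`, and `scrape_wikitable` are hypothetical names. Rows come back as lists of strings because merged cells (`rowspan`/`colspan`) can give rows different lengths, which would need handling before building a DataFrame.

```python
import urllib.request
from html.parser import HTMLParser

# URL taken from the notebook output above.
WIKI_URL = "https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches"


class WikitableParser(HTMLParser):
    """Collect the text of every cell in <table class="wikitable ..."> elements.

    Assumes wikitables are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per wikitable; each entry is a list of rows
        self._in_table = False
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "wikitable" in classes:
            self._in_table = True
            self.tables.append([])
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            # Collapse newlines and repeated spaces into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False


def parse_first_wikitable(html):
    """Return (header, rows) for the first wikitable in `html`, or None if there is none."""
    parser = WikitableParser()
    parser.feed(html)
    if not parser.tables or not parser.tables[0]:
        return None
    header, *rows = parser.tables[0]
    return header, rows


def scrape_wikitable(url=WIKI_URL, timeout=30):
    """Download `url` and parse its first wikitable; `url` is always bound here."""
    request = urllib.request.Request(url, headers={"User-Agent": "spacex-data-collection-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        html = response.read().decode("utf-8")
    return parse_first_wikitable(html)
```

Separating the network call (`scrape_wikitable`) from the parsing (`parse_first_wikitable`) lets the parser be tested on a saved HTML string without hitting Wikipedia.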