
# SpaceX Data Wrangling

This notebook cleans the SpaceX launch data collected in the previous steps (API and web scraping) and prepares it for analysis and modeling.

## Objectives
- Load data from previous steps (API and Web Scraping).
- Clean and standardize columns.
- Handle missing values and data types.
- Prepare both datasets for a later merge into a unified dataset.


## Import Required Libraries

In [12]:

import pandas as pd


## Load Datasets from CSV

In [13]:

df_api = pd.read_csv("spacex_launch_data.csv")
df_scraped = pd.read_csv("spacex_wiki_launch_data.csv")

print("API data shape:", df_api.shape)
print("Scraped data shape:", df_scraped.shape)


API data shape: (205, 43)
Scraped data shape: (642, 10)
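
Before cleaning, it can help to see which columns have missing values and which still need a type conversion. The following is a minimal sketch on a made-up stand-in frame, not the real CSV. The helper name `summarize_columns` is hypothetical; the column names `flight_number`, `date_utc`, and `success` come from the API data, but the values are invented for illustration.

```python
import pandas as pd

def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return the dtype and the number of missing values for every column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
    })

# Illustrative values only (assumption: date_utc holds ISO 8601 strings)
sample = pd.DataFrame({
    "flight_number": [1, 2, 3],
    "date_utc": ["2020-01-01T00:00:00.000Z", "2020-02-01T12:30:00.000Z", None],
    "success": [True, False, None],
})

# Convert the date strings to timezone-aware timestamps; bad or missing values become NaT
sample["date_utc"] = pd.to_datetime(sample["date_utc"], utc=True, errors="coerce")

summary = summarize_columns(sample)
print(summary)
```

The same helper could be called on `df_api` and `df_scraped` right after loading to decide which columns need attention.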


## Data Cleaning and Processing

In [14]:
# Clean API data
df_api_clean = df_api.copy()
df_api_clean = df_api_clean.dropna(axis=1, how='all')                         # Drop all-NaN columns
df_api_clean = df_api_clean.loc[:, ~df_api_clean.columns.duplicated()]       # Drop duplicate columns

# Clean Scraped data
df_scraped_clean = df_scraped.copy()
df_scraped_clean.columns = [c.lower().strip().replace(" ", "_") for c in df_scraped_clean.columns]
df_scraped_clean = df_scraped_clean.dropna(how="all")                        # Drop all-NaN rows
df_scraped_clean = df_scraped_clean.loc[:, ~df_scraped_clean.columns.duplicated()]  # Drop duplicate columns

# Optionally inspect both cleaned datasets
print("API Cleaned Columns:", df_api_clean.columns.tolist())
print("Scraped Cleaned Columns:", df_scraped_clean.columns.tolist())


API Cleaned Columns: ['static_fire_date_utc', 'static_fire_date_unix', 'net', 'window', 'rocket', 'success', 'failures', 'details', 'crew', 'ships', 'capsules', 'payloads', 'launchpad', 'flight_number', 'name', 'date_utc', 'date_unix', 'date_local', 'date_precision', 'upcoming', 'cores', 'auto_update', 'tbd', 'launch_library_id', 'id', 'fairings.reused', 'fairings.recovery_attempt', 'fairings.recovered', 'fairings.ships', 'links.patch.small', 'links.patch.large', 'links.reddit.campaign', 'links.reddit.launch', 'links.reddit.media', 'links.reddit.recovery', 'links.flickr.small', 'links.flickr.original', 'links.presskit', 'links.webcast', 'links.youtube_id', 'links.article', 'links.wikipedia']
Scraped Cleaned Columns: ['flight_no.', 'date_and_time_(utc)', 'version,_booster[h]', 'launch_site', 'payload[i]', 'payload_mass', 'orbit', 'customer', 'launch_outcome', 'booster_landing']
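
The scraped column names above still carry footnote markers (`[h]`, `[i]`) and punctuation (`flight_no.`, `version,_booster[h]`), which makes them awkward to reference. One possible follow-up step, sketched here under the assumption that bracketed markers never carry meaning, is a small normalizer. The function name `normalize_column_name` is hypothetical and not part of the notebook.

```python
import re

def normalize_column_name(name: str) -> str:
    """Lowercase a header, drop footnote markers like [h], and collapse punctuation to underscores."""
    name = re.sub(r"\[[^\]]*\]", "", name)     # remove bracketed footnote markers
    name = name.strip().lower()
    name = re.sub(r"[^a-z0-9]+", "_", name)    # any run of non-alphanumerics becomes "_"
    return name.strip("_")

# A few of the scraped headers printed above
raw_columns = ["flight_no.", "date_and_time_(utc)", "version,_booster[h]", "payload[i]", "payload_mass"]
normalized = [normalize_column_name(c) for c in raw_columns]
print(normalized)
```

Two different headers could normalize to the same name, so it is worth checking for duplicates before assigning the result to `df_scraped_clean.columns`.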


## Save Final Dataset

In [15]:
# Save only cleaned SCRAPED data
df_scraped_clean.to_csv("spacex_clean_data.csv", index=False)
print("Final cleaned scraped data saved to 'spacex_clean_data.csv'")

# Preview
df_scraped_clean.head()

Final cleaned scraped data saved to 'spacex_clean_data.csv'


|  | flight_no. | date_and_time_(utc) | version,_booster[h] | launch_site | payload[i] | payload_mass | orbit | customer | launch_outcome | booster_landing |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 195 | January 3, 2023 14:56[17] | F9 B5 B1060‑15 | Cape Canaveral, SLC‑40 | Transporter-6 (115 payload smallsat rideshare) | Unknown[j] | SSO | Various | Success | Success (LZ‑1) |
| 1 | 195 | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... |
| 2 | 196 | January 10, 2023 04:50[23] | F9 B5 B1076‑2 | Cape Canaveral, SLC‑40 | OneWeb 16 (40 satellites) | 6,000 kg (13,000 lb) | Polar LEO | OneWeb | Success | Success (LZ‑1) |
| 3 | 196 | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... |
| 4 | FH 5 | January 15, 2023 22:56[29] | Falcon Heavy B5 B1070 (core) | Kennedy, LC‑39A | USSF-67 (CBAS-2 & LDPE-3A) | ~3,750 kg (8,270 lb) | GEO | USSF | Success | No attempt |
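
The preview shows three leftovers from scraping: description rows in which every column repeats the same text, citation markers such as `[17]` and `[j]`, and payload masses stored as strings. The sketch below shows one way these could be handled. It runs on a small frame abridged from the preview, and the helper names `drop_repeated_text_rows`, `strip_citations`, and `parse_mass_kg` are hypothetical. It assumes that a row whose non-key columns all hold the same value is a description row, which needs at least two non-key columns to be meaningful.

```python
import re
import pandas as pd

def drop_repeated_text_rows(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Drop rows in which every column other than `key` holds the same value."""
    others = df.drop(columns=[key])
    is_detail = others.nunique(axis=1, dropna=False) == 1
    return df.loc[~is_detail].reset_index(drop=True)

def strip_citations(text) -> str:
    """Remove bracketed citation markers such as [17] or [j]."""
    return re.sub(r"\[[^\]]*\]", "", str(text)).strip()

def parse_mass_kg(text) -> float:
    """Extract the kilogram figure from e.g. '~3,750 kg (8,270 lb)'; NaN if none is found."""
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*kg", str(text))
    return float(match.group(1).replace(",", "")) if match else float("nan")

# Rows abridged from the preview above (subset of columns)
detail = "Dedicated SmallSat Rideshare mission to Sun-sy..."
sample = pd.DataFrame({
    "flight_no.": ["195", "195", "196", "FH 5"],
    "date_and_time_(utc)": ["January 3, 2023 14:56[17]", detail,
                            "January 10, 2023 04:50[23]", "January 15, 2023 22:56[29]"],
    "payload_mass": ["Unknown[j]", detail, "6,000 kg (13,000 lb)", "~3,750 kg (8,270 lb)"],
    "launch_outcome": ["Success", detail, "Success", "Success"],
})

cleaned = drop_repeated_text_rows(sample, key="flight_no.")
cleaned["date_and_time_(utc)"] = cleaned["date_and_time_(utc)"].map(strip_citations)
cleaned["payload_mass_kg"] = cleaned["payload_mass"].map(parse_mass_kg)
print(cleaned)
```

Keeping the original `payload_mass` column next to the parsed `payload_mass_kg` makes it easy to audit values such as `Unknown[j]` that end up as NaN.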
