
# SpaceX Web Scraping

In this notebook, we scrape Falcon 9 and Falcon Heavy launch records from Wikipedia to collect details that may not be available through the SpaceX API. The scraped tables will enrich our dataset for later analysis and model building.

## Objectives:
- Scrape Falcon 9 and Falcon Heavy launch tables from Wikipedia.
- Parse and structure the data using BeautifulSoup.
- Store the final result as a CSV for further analysis.


## Import Required Libraries

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup
```



## Perform Web Scraping

We will scrape SpaceX historical launch data from Wikipedia.
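On this page the launch records sit in HTML tables styled with Wikipedia's `wikitable` class, which is what the scraping cell filters on. The offline sketch below uses a made-up HTML fragment (not the real page markup) to show how that filter behaves: BeautifulSoup treats `class` as a multi-valued attribute, so a table with `class="wikitable plainrowheaders"` still matches, while a table with a different class is skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the Wikipedia page
html = """
<table class="wikitable plainrowheaders">
  <tr><th>Flight No.</th><th>Launch outcome</th></tr>
  <tr><td>195</td><td>Success</td></tr>
</table>
<table class="infobox"><tr><td>Not a launch table</td></tr></table>
<table class="wikitable">
  <tr><th>Flight No.</th><th>Launch outcome</th></tr>
  <tr><td>196</td><td>Success</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# "class" is multi-valued, so this matches any table whose class list
# contains "wikitable", including "wikitable plainrowheaders".
tables = soup.find_all("table", {"class": "wikitable"})

# First data cell (the flight number) of each matched table
flight_numbers = [t.find("td").get_text(strip=True) for t in tables]
print(len(tables), flight_numbers)  # 2 ['195', '196']
```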


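```python
from io import StringIO

url = "https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches"
# Identify the client and fail fast on network or HTTP errors.
# The User-Agent below is a placeholder; replace it with your own contact details.
headers = {"User-Agent": "spacex-scraping-notebook/0.1 (educational use)"}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")

# Extract all launch tables (styled with the "wikitable" class)
tables = soup.find_all("table", {"class": "wikitable"})
print(f"Found {len(tables)} tables")

# Parse the first few tables as an example; adjust the slice as needed.
# Wrapping the HTML in StringIO avoids the pandas FutureWarning about
# passing literal HTML strings to read_html.
html = "".join(str(table) for table in tables[:3])
dfs = pd.read_html(StringIO(html))
df_combined = pd.concat(dfs, ignore_index=True)
df_combined.head()
```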


Found 6 tables


|  | Flight No. | Date and time (UTC) | Version, booster[h] | Launch site | Payload[i] | Payload mass | Orbit | Customer | Launch outcome | Booster landing |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 195 | January 3, 2023 14:56[17] | F9 B5 B1060‑15 | Cape Canaveral, SLC‑40 | Transporter-6 (115 payload smallsat rideshare) | Unknown[j] | SSO | Various | Success | Success (LZ‑1) |
| 1 | 195 | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... | Dedicated SmallSat Rideshare mission to Sun-sy... |
| 2 | 196 | January 10, 2023 04:50[23] | F9 B5 B1076‑2 | Cape Canaveral, SLC‑40 | OneWeb 16 (40 satellites) | 6,000 kg (13,000 lb) | Polar LEO | OneWeb | Success | Success (LZ‑1) |
| 3 | 196 | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... | Following the Russian invasion of Ukraine, One... |
| 4 | FH 5 | January 15, 2023 22:56[29] | Falcon Heavy B5 B1070 (core) | Kennedy, LC‑39A | USSF-67 (CBAS-2 & LDPE-3A) | ~3,750 kg (8,270 lb) | GEO | USSF | Success | No attempt |
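The preview shows two quirks worth cleaning before analysis. Each launch is followed by a row in which one description (for example "Dedicated SmallSat Rideshare mission to Sun-sy...") repeats across every column except `Flight No.`; these come from description cells spanning the full table width, which `read_html` copies into each column. Headers and cells also carry footnote markers such as `[17]` and `[h]`. The sketch below is a minimal approach, assuming a row whose non-`Flight No.` columns hold a single distinct value is a description row (a heuristic, not something the page guarantees). `clean_launch_table` is a hypothetical helper, and the small frame is a hand-built stand-in using a few columns and shortened values from the preview.

```python
import re

import pandas as pd

# Bracketed footnote markers such as "[17]", "[h]" or "[j]"
FOOTNOTE = re.compile(r"\[[^\]]*\]")


def clean_launch_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop description rows and strip footnote markers."""
    # Heuristic: a description row repeats one value in every column but "Flight No."
    is_description = df.drop(columns="Flight No.").nunique(axis=1) == 1
    out = df.loc[~is_description].copy()

    # Remove footnote markers from headers, e.g. "Payload[i]" -> "Payload"
    out.columns = [FOOTNOTE.sub("", str(c)).strip() for c in out.columns]

    # Remove footnote markers from string cells; leave other values untouched
    for col in out.columns:
        out[col] = out[col].map(
            lambda v: FOOTNOTE.sub("", v).strip() if isinstance(v, str) else v
        )
    return out.reset_index(drop=True)


# Stand-in for df_combined (subset of columns, description text shortened)
raw = pd.DataFrame({
    "Flight No.": ["195", "195", "196", "196"],
    "Date and time (UTC)": [
        "January 3, 2023 14:56[17]",
        "Dedicated SmallSat Rideshare mission",
        "January 10, 2023 04:50[23]",
        "Following the Russian invasion of Ukraine",
    ],
    "Payload[i]": [
        "Transporter-6 (115 payload smallsat rideshare)",
        "Dedicated SmallSat Rideshare mission",
        "OneWeb 16 (40 satellites)",
        "Following the Russian invasion of Ukraine",
    ],
    "Launch outcome": [
        "Success",
        "Dedicated SmallSat Rideshare mission",
        "Success",
        "Following the Russian invasion of Ukraine",
    ],
})

cleaned = clean_launch_table(raw)
print(cleaned)
```

In the notebook itself, the same call would be `df_clean = clean_launch_table(df_combined)` before saving; spot-check the result, since a genuine launch row with identical values in every column would also be dropped.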


## Save Scraped Data to CSV

```python
# Save scraped data
df_combined.to_csv("spacex_wiki_launch_data.csv", index=False)
print("Scraped data saved to 'spacex_wiki_launch_data.csv'")
```


Scraped data saved to 'spacex_wiki_launch_data.csv'
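To confirm the CSV round-trips cleanly, a small sketch can write a stand-in frame to a temporary directory and read it back. The frame and the temporary path are illustrative assumptions; the notebook writes to the working directory. Passing `index=False`, as the save cell does, matters here: without it, pandas writes the row index as an extra leading column, which `read_csv` then names `Unnamed: 0`. Reading with `dtype=str` keeps flight numbers such as `195` and `FH 5` as text.

```python
import os
import tempfile

import pandas as pd

# Stand-in frame; in the notebook this would be df_combined
df = pd.DataFrame({
    "Flight No.": ["195", "196", "FH 5"],
    "Launch outcome": ["Success", "Success", "Success"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spacex_wiki_launch_data.csv")
    df.to_csv(path, index=False)
    # dtype=str stops pandas from re-inferring "195" and "196" as integers
    reloaded = pd.read_csv(path, dtype=str)

# Raises an AssertionError if columns, values or dtypes differ
pd.testing.assert_frame_equal(df, reloaded)
print(reloaded.shape)
```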
