# **SpaceX Falcon 9 First Stage Landing Prediction**

In this segment, I will predict whether the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars each. Much of the savings comes from SpaceX being able to reuse the first stage. Therefore, if I can determine whether the first stage will land, I can estimate the cost of a launch. This information can be used if another company wants to bid against SpaceX for a rocket launch. In this notebook, I will collect the launch data by scraping Wikipedia and make sure it is in the correct format.

### Web Scraping Falcon 9 and Falcon Heavy Launch Records from Wikipedia

I will use web scraping to collect Falcon 9 historical launch records from the Wikipedia page titled `List of Falcon 9 and Falcon Heavy launches`:

<b> https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches </b>

![](https://raw.githubusercontent.com/ANKITKUMAR-10/SpaceX-Falcon9-Landing-Prediction/main/landing_1.gif)

<b>Several examples of unsuccessful landings are shown here:</b>

![Falcon 9 Crash](https://raw.githubusercontent.com/ANKITKUMAR-10/SpaceX-Falcon9-Landing-Prediction/main/crash.gif)

<b>Most unsuccessful landings are planned. SpaceX performs a controlled landing in the ocean.</b>

## Objectives

Web scrape Falcon 9 launch records with `BeautifulSoup`:
- Extract the Falcon 9 launch records HTML table from Wikipedia
- Parse the table and convert it into a Pandas data frame
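
The two objectives can be tried end to end on a tiny inline table before touching the real page. This is only a sketch: the HTML string and its values are made up for illustration, and the real Wikipedia table needs the extra cleaning done by the helper functions later in this notebook.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Made-up HTML standing in for a Wikipedia launch table
sample_html = """
<table class="wikitable">
  <tr><th>Flight No.</th><th>Payload</th></tr>
  <tr><td>1</td><td>Demo payload A</td></tr>
  <tr><td>2</td><td>Demo payload B</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table")

# Header cells give the column names
columns = [th.get_text(strip=True) for th in table.find_all("th")]

# Every later row becomes one record keyed by column name
records = []
for tr in table.find_all("tr")[1:]:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    records.append(dict(zip(columns, cells)))

sample_df = pd.DataFrame(records)
print(sample_df)
```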

In [64]:
pip install beautifulsoup4 requests

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [65]:
import sys
import requests
from bs4 import BeautifulSoup
import re
import unicodedata
import pandas as pd

In [90]:
# ---------- Helper functions ----------
def date_time(table_cells):
    # First two text fragments of the cell: the date and the time
    return [data_time.strip() for data_time in list(table_cells.strings)][0:2]

def booster_version(table_cells):
    # Join the even-indexed text fragments, leaving out the last of them
    out=''.join([part for i,part in enumerate(table_cells.strings) if i%2==0][0:-1])
    return out

def landing_status(table_cells):
    # First text fragment of the cell
    out=[i for i in table_cells.strings][0]
    return out

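def get_mass(table_cells):
    # Normalize non-breaking spaces, then keep the text up to and including "kg"
    mass=unicodedata.normalize("NFKD", table_cells.text).strip()
    kg_index=mass.find("kg")
    if kg_index != -1:
        new_mass=mass[0:kg_index+2]
    else:
        # Empty cell, or no "kg" unit in the text
        new_mass=0
    return new_mass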

def extract_column_from_header(row):
    # Drop the first line break, link, and footnote marker from the header cell
    if row.br:
        row.br.extract()
    if row.a:
        row.a.extract()
    if row.sup:
        row.sup.extract()
    column_name = ' '.join(row.contents)
    # Purely numeric headers are skipped; the function then returns None
    if not column_name.strip().isdigit():
        return column_name.strip()
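
The helpers lean on two behaviors: BeautifulSoup's `.strings` generator, which yields each text fragment of a cell separately, and Unicode NFKD normalization, which turns non-breaking spaces into plain ones. A small sketch of both follows; the cell contents are made up to resemble a date cell and a payload-mass cell.

```python
import unicodedata
from bs4 import BeautifulSoup

# Made-up cells shaped like a date cell and a payload-mass cell
cell_html = "<table><tr><td>4 June 2010,<br/>18:45</td><td>4,700\xa0kg</td></tr></table>"
date_cell, mass_cell = BeautifulSoup(cell_html, "html.parser").find_all("td")

# .strings yields the text before and after the <br/> as separate fragments
fragments = [s.strip() for s in date_cell.strings]
print(fragments)

# NFKD normalization replaces the non-breaking space (U+00A0) with a plain space
raw_mass = mass_cell.text
clean_mass = unicodedata.normalize("NFKD", raw_mass).strip()
print(repr(raw_mass), "->", repr(clean_mass))
```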

To keep the results consistent, scrape the data from a snapshot of the `List of Falcon 9 and Falcon Heavy launches` Wikipedia page, last updated on `9th June 2021`.

In [91]:
# ---------- Step 1: Request page ----------
static_url = "https://en.wikipedia.org/w/index.php?title=List_of_Falcon_9_and_Falcon_Heavy_launches&oldid=1027686922"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/91.0.4472.124 Safari/537.36"
}

In [92]:
response = requests.get(static_url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
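
The request above assumes the page comes back successfully. A more defensive fetch might look like the sketch below. `fetch_html` and its `get` parameter are hypothetical names introduced here (the parameter exists only so the function can be exercised without network access), while `timeout` and `raise_for_status()` are standard `requests` features.

```python
import requests

def fetch_html(url, headers=None, timeout=30, get=requests.get):
    """Return the page body as text, raising on HTTP errors or timeouts.

    `get` is a hypothetical hook for swapping in a stand-in during tests.
    """
    response = get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    return response.text
```

In this notebook it could be called as `fetch_html(static_url, headers=headers)`, with the returned text passed to `BeautifulSoup(..., 'html.parser')`.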

In [93]:
soup.title

<title>List of Falcon 9 and Falcon Heavy launches - Wikipedia</title>

In [94]:
# ---------- Step 2: Find tables ----------
html_tables = soup.find_all('table')

In [95]:
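# Assume the first table with the "wikitable" class is the main Falcon 9 launch table;
# html_tables[0] on its own may be a layout or navigation table
wikitables = [t for t in html_tables if 'wikitable' in (t.get('class') or [])]
first_launch_table = wikitables[0] if wikitables else html_tables[0]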
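
Selecting a table by its position can pick the wrong one if the page layout differs from what is expected. One alternative is to look for a table whose header row contains a known column name. `find_table_by_header` is a hypothetical helper, not part of BeautifulSoup, and the HTML is made up for the demonstration.

```python
from bs4 import BeautifulSoup

def find_table_by_header(soup, header_text):
    """Return the first <table> with a <th> containing header_text, else None."""
    for table in soup.find_all("table"):
        headers = [th.get_text(" ", strip=True) for th in table.find_all("th")]
        if any(header_text in h for h in headers):
            return table
    return None

# Made-up page: a layout table first, then a launch-style table
demo_html = """
<table><tr><td>navigation box</td></tr></table>
<table class="wikitable"><tr><th>Flight No.</th><th>Payload</th></tr>
<tr><td>1</td><td>Demo payload</td></tr></table>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")
launch_table = find_table_by_header(demo_soup, "Flight No.")
print(launch_table["class"])
```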

In [96]:
# ---------- Step 3: Extract column names ----------
column_names = []
for th in first_launch_table.find_all('th'):
    name = extract_column_from_header(th)
    if name is not None and len(name) > 0:
        column_names.append(name)

In [98]:
# ---------- Step 4: Initialize launch_dict ----------
# Keys are listed explicitly rather than taken from column_names
expected_keys = ['Flight No.', 'Date', 'Time', 'Launch site', 'Payload', 'Payload mass', 'Orbit',
                 'Customer', 'Launch outcome', 'Version Booster', 'Booster landing']

# Give each key its own empty list
launch_dict = {key: [] for key in expected_keys}
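
Building the dictionary with a comprehension matters because `dict.fromkeys(keys, [])` hands the same list object to every key, so an append through one key shows up under all of them. The key names below are just two of the columns, used for a short demonstration.

```python
keys = ["Flight No.", "Payload"]

# dict.fromkeys reuses the SAME list object for every key
shared = dict.fromkeys(keys, [])
shared["Flight No."].append("1")
print(shared)

# A dict comprehension creates a fresh list per key
separate = {key: [] for key in keys}
separate["Flight No."].append("1")
print(separate)
```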

In [99]:
# ---------- Step 5: Parse each row ----------
# Assumed row layout for this snapshot: the flight number is in the row's <th>
# cell, and the <td> cells hold, in order, date/time, booster version, launch
# site, payload, payload mass, orbit, customer, launch outcome, booster landing.
# Adjust the indices below if the table is laid out differently.
for row in first_launch_table.find_all('tr')[1:]:  # skip header row
    cols = row.find_all('td')
    flight_no = row.th.get_text(strip=True) if row.th else ''

    # Skip note rows and rows without a numeric flight number
    if len(cols) < 8 or not flight_no.isdigit():
        continue

    # Flight No.
    launch_dict['Flight No.'].append(flight_no)

    # Date and Time (strip a trailing comma from the date if present)
    try:
        date, time = date_time(cols[0])
        date = date.strip(',')
    except ValueError:
        date, time = None, None
    launch_dict['Date'].append(date)
    launch_dict['Time'].append(time)

    # Version Booster
    launch_dict['Version Booster'].append(booster_version(cols[1]))

    # Launch site
    launch_dict['Launch site'].append(cols[2].text.strip())

    # Payload
    launch_dict['Payload'].append(cols[3].text.strip())

    # Payload mass
    launch_dict['Payload mass'].append(get_mass(cols[4]))

    # Orbit
    launch_dict['Orbit'].append(cols[5].text.strip())

    # Customer
    launch_dict['Customer'].append(cols[6].text.strip())

    # Launch outcome
    launch_dict['Launch outcome'].append(cols[7].text.strip())

    # Booster landing (the cell may be missing or empty)
    try:
        launch_dict['Booster landing'].append(landing_status(cols[8]))
    except IndexError:
        launch_dict['Booster landing'].append(None)

In [103]:
# ---------- Step 6: Convert to DataFrame ----------
df = pd.DataFrame(launch_dict)

In [104]:
df.to_csv('spacex_web_scraped.csv', index=False)
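
Before relying on the saved file, it can help to confirm that every column has the same number of entries and that the CSV reads back with the expected shape. The sketch below uses a made-up two-row dictionary and a temporary folder instead of the real `launch_dict` and `spacex_web_scraped.csv`.

```python
import os
import tempfile
import pandas as pd

# Made-up rows standing in for the scraped launch_dict
demo_dict = {
    "Flight No.": ["1", "2"],
    "Payload mass": [0, "525 kg"],
}

# pd.DataFrame raises ValueError when the lists differ in length,
# so checking up front gives a clearer message about which column is off
lengths = {key: len(values) for key, values in demo_dict.items()}
assert len(set(lengths.values())) == 1, f"uneven columns: {lengths}"

demo_df = pd.DataFrame(demo_dict)

# Write to a temporary folder and read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "demo_scraped.csv")
    demo_df.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

print(reloaded.shape)
print(reloaded.dtypes)
```

Column types can change on the way through CSV, since everything is written out as text and re-inferred on load, so the printed dtypes are worth a glance.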