# **SpaceX Falcon 9 First Stage Landing Prediction**


## Exploring and Preparing Data


## Objectives

Perform exploratory data analysis and feature engineering using `Pandas` and `Matplotlib`.

*   Exploratory Data Analysis
*   Data Preparation and Feature Engineering


### Import Libraries and Define Auxiliary Functions


We will import the following libraries for the lab:


In [1]:
import piplite
await piplite.install(['numpy'])
await piplite.install(['pandas'])
await piplite.install(['seaborn'])

In [None]:
# pandas is a Python library for data manipulation and analysis.
import pandas as pd
# NumPy adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.
import numpy as np
# Matplotlib is a plotting library for Python, and pyplot gives us a MATLAB-like plotting interface. We use it to plot the data.
import matplotlib.pyplot as plt
# seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
import seaborn as sns

## Exploratory Data Analysis


First, let's read the SpaceX dataset into a Pandas DataFrame and preview its first and last rows.


In [None]:
from js import fetch
import io

URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/dataset_part_2.csv"
resp = await fetch(URL)
dataset_part_2_csv = io.BytesIO((await resp.arrayBuffer()).to_py())
df=pd.read_csv(dataset_part_2_csv)
df.head(5)

In [None]:
df.tail(5)
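
Beyond previewing rows, a quick structural summary can reveal missing values and column types before plotting. The sketch below uses a small made-up DataFrame named `toy` (a hypothetical stand-in, not the real dataset) so that it runs on its own; the same calls could be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for the SpaceX dataset; the values are made up for illustration.
toy = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4],
    "PayloadMass": [500.0, 3170.0, None, 5300.0],
    "Orbit": ["LEO", "GTO", "ISS", "GTO"],
    "Class": [0, 0, 1, 1],
})

# Shape, column types, and missing-value counts give a quick structural summary.
n_rows, n_cols = toy.shape
missing = toy.isnull().sum()
print(toy.dtypes)
print(missing)

# describe() reports count, mean, std, and quartiles for the numeric columns.
summary = toy.describe()
print(summary)
```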

First, let's see how the `FlightNumber` (indicating the sequence of launch attempts) and `PayloadMass` variables affect the launch outcome.

We can plot <code>FlightNumber</code> vs. <code>PayloadMass</code> and overlay the outcome of the launch. <code>Class=1</code> means success and <code>Class=0</code> means failure.


In [None]:
sns.catplot(y="PayloadMass", x="FlightNumber", hue="Class", data=df, aspect = 5)
plt.xlabel("Flight Number",fontsize=20)
plt.ylabel("Payload Mass (kg)",fontsize=20)
plt.show()

We see that as the flight number increases, the first stage is more likely to land successfully. The payload mass also appears to be a factor; even with more massive payloads, the first stage often returns successfully.
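
One way to check this visual impression numerically is to bin the flight numbers and compare the success rate per bin. The following is a minimal sketch on made-up data; the `toy` DataFrame, the bin edges, and the bin labels are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: twelve flights with made-up outcomes (1 = success, 0 = failure).
toy = pd.DataFrame({
    "FlightNumber": list(range(1, 13)),
    "Class": [0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1],
})

# Split the flights into three bins of four flights each.
toy["FlightBin"] = pd.cut(
    toy["FlightNumber"], bins=[0, 4, 8, 12], labels=["1-4", "5-8", "9-12"]
)

# The mean of a 0/1 column is the fraction of successes in each bin.
rate_by_bin = toy.groupby("FlightBin", observed=True)["Class"].mean()
print(rate_by_bin)
```

If the rate rises from the first bin to the last, the data supports the trend seen in the plot.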

Next, let's drill down to each site and visualize its detailed launch records.


### Visualizing the relationship between Flight Number and Launch Site


We use the function <code>catplot</code> to plot <code>FlightNumber</code> vs. <code>LaunchSite</code>: set the <code>x</code> parameter to <code>FlightNumber</code>, the <code>y</code> parameter to <code>LaunchSite</code>, and the <code>hue</code> parameter to <code>Class</code>.


In [None]:
# Plot a scatter point chart with x axis to be Flight Number and y axis to be the launch site, and hue to be the class value
sns.catplot(
    x="FlightNumber",       # X-axis: Flight Number
    y="LaunchSite",         # Y-axis: Launch Site
    hue="Class",            # Color (hue) based on class (Success/Failure)
    data=df,                # Your DataFrame
    kind="strip",           # or "swarm", "point", "box", etc.
    height=6,
    aspect=2
)

plt.title("Flight Number vs Launch Site grouped by Class")
plt.show()


As flight numbers increase, the success rate (Class 1) improves, especially at **CCAFS SLC 40**, indicating growing reliability over time. **KSC LC 39A** shows high success even with fewer launches, while **VAFB SLC 4E** has limited but mostly successful launches. 
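
A running success rate per site is one way to put a number on "growing reliability over time". The sketch below is an illustration on made-up data, not the notebook's method; the `toy` DataFrame and the `RunningRate` column name are assumptions.

```python
import pandas as pd

# Hypothetical launches at two sites; outcomes are made up for illustration.
toy = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4, 5, 6],
    "LaunchSite": [
        "CCAFS SLC 40", "CCAFS SLC 40", "KSC LC 39A",
        "CCAFS SLC 40", "KSC LC 39A", "CCAFS SLC 40",
    ],
    "Class": [0, 0, 1, 1, 1, 1],
}).sort_values("FlightNumber")

# For each site, compute the success rate over all of its launches so far.
toy["RunningRate"] = toy.groupby("LaunchSite")["Class"].transform(
    lambda s: s.expanding().mean()
)
print(toy)
```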


### Visualising the relationship between Payload Mass and Launch Site


We also want to observe if there is any relationship between launch sites and their payload mass.


In [None]:
# Plot a scatter point chart with x axis to be Pay Load Mass (kg) and y axis to be the launch site, and hue to be the class value
plt.figure(figsize=(10,6))
sns.scatterplot(
    x="PayloadMass",      # X-axis: Payload Mass
    y="LaunchSite",             # Y-axis: Launch Site
    hue="Class",                # Color by success/failure class
    data=df
)

plt.title("Payload Mass vs Launch Site")
plt.xlabel("Payload Mass (kg)")
plt.ylabel("Launch Site")
plt.show()

In the Payload Mass vs. Launch Site scatter plot, you will find that no rockets with a heavy payload mass (greater than 10000 kg) were launched from the VAFB SLC 4E launch site.
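
The same observation can be checked without a plot by looking at the maximum payload and the count of heavy launches per site. This is a sketch on made-up values; `toy` and the constant `HEAVY_KG` are hypothetical names, with the threshold taken from the 10000 figure in the text.

```python
import pandas as pd

HEAVY_KG = 10000  # threshold for a "heavy" payload, as used in the text above

# Hypothetical payload masses per site; values are made up for illustration.
toy = pd.DataFrame({
    "LaunchSite": [
        "CCAFS SLC 40", "CCAFS SLC 40",
        "KSC LC 39A", "KSC LC 39A",
        "VAFB SLC 4E", "VAFB SLC 4E",
    ],
    "PayloadMass": [3000.0, 15600.0, 6000.0, 12000.0, 500.0, 9600.0],
})

# Largest payload launched from each site.
max_payload = toy.groupby("LaunchSite")["PayloadMass"].max()

# Number of launches above the threshold at each site.
heavy_counts = (toy["PayloadMass"] > HEAVY_KG).groupby(toy["LaunchSite"]).sum()

print(max_payload)
print(heavy_counts)
```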


### Visualising the relationship between success rate and orbit type


Next, we want to visually check if there is any relationship between success rate and orbit type.


Let's create a `bar chart` for the success rate of each orbit.


In [None]:
# HINT use groupby method on Orbit column and get the mean of Class column
orbit_success = df.groupby('Orbit')['Class'].mean().sort_values(ascending=False).reset_index()
plt.figure(figsize=(12, 6))
sns.barplot(x='Orbit', y='Class', data=orbit_success, palette='viridis')

plt.title('Success Rate by Orbit Type')
plt.xlabel('Orbit Type')
plt.ylabel('Success Rate')
plt.xticks(rotation=45)
plt.ylim(0, 1)  # Since success rate is between 0 and 1
plt.grid(True, linestyle='--', alpha=0.6)

plt.show()

The chart shows that the ES-L1, GEO, HEO, and SSO orbits have a 100% success rate.
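
A success rate is easier to interpret next to the number of launches behind it, since a 100% rate based on a single launch says little. The sketch below shows one way to report both on made-up data; `toy`, `MIN_LAUNCHES`, and the `small_sample` column are assumptions for illustration only.

```python
import pandas as pd

MIN_LAUNCHES = 3  # hypothetical cut-off below which a rate is flagged as a small sample

# Hypothetical outcomes per orbit; values are made up for illustration.
toy = pd.DataFrame({
    "Orbit": ["ES-L1", "SSO", "SSO", "GTO", "GTO", "GTO", "GTO"],
    "Class": [1, 1, 1, 1, 0, 1, 0],
})

# Report the launch count alongside the success rate for each orbit.
orbit_stats = (
    toy.groupby("Orbit")["Class"]
    .agg(launches="count", success_rate="mean")
    .sort_values("success_rate", ascending=False)
)
orbit_stats["small_sample"] = orbit_stats["launches"] < MIN_LAUNCHES
print(orbit_stats)
```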


### Visualising the relationship between FlightNumber and Orbit type


For each orbit, we want to see if there is any relationship between FlightNumber and Orbit type.


In [None]:
# Plot a scatter point chart with x axis to be FlightNumber and y axis to be the Orbit, and hue to be the class value
plt.figure(figsize=(12, 6))
sns.scatterplot(
    x='FlightNumber',        # X-axis: Flight Number
    y='Orbit',               # Y-axis: Orbit type
    hue='Class',             # Hue: success (1) or failure (0)
    data=df,
)

plt.title('Flight Number vs Orbit Type (Colored by Class)')
plt.xlabel('Flight Number')
plt.ylabel('Orbit Type')
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(title='Launch Success (class)')
plt.show()


You can observe that in the LEO orbit, success seems to be related to the number of flights. Conversely, in the GTO orbit, there appears to be no relationship between flight number and success.
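
A rough numeric counterpart to this reading is the correlation between flight number and outcome within each orbit. The sketch below uses made-up data constructed so that one orbit shows a trend and the other does not; it is an illustration of the technique, not a result from the real dataset.

```python
import pandas as pd

# Hypothetical flights: LEO outcomes improve over time, GTO outcomes do not.
toy = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4, 5, 6, 7, 8],
    "Orbit": ["LEO", "LEO", "LEO", "LEO", "GTO", "GTO", "GTO", "GTO"],
    "Class": [0, 0, 1, 1, 1, 0, 0, 1],
})

# Pearson correlation between flight number and the 0/1 outcome, per orbit.
corr_by_orbit = {
    orbit: group["FlightNumber"].corr(group["Class"])
    for orbit, group in toy.groupby("Orbit")
}
print(corr_by_orbit)
```

A value near 1 suggests that later flights succeed more often, while a value near 0 suggests no linear relationship. With only a handful of launches per orbit, such numbers should be read cautiously.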


### Visualising the relationship between Payload Mass and Orbit type


Similarly, we can plot a Payload Mass vs. Orbit scatter plot to reveal the relationship between payload mass and orbit type.


In [None]:
# Plot a scatter point chart with x axis to be Payload Mass and y axis to be the Orbit, and hue to be the class value
plt.figure(figsize=(12, 6))
sns.scatterplot(
    x='PayloadMass',     # X-axis: Payload Mass (in kg)
    y='Orbit',                 # Y-axis: Orbit type
    hue='Class',               # Color by success/failure
    data=df,
)

plt.title('Payload Mass vs Orbit Type (Colored by Launch Class)')
plt.xlabel('Payload Mass (kg)')
plt.ylabel('Orbit Type')
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(title='Launch Success (class)')
plt.show()


With heavy payloads, the successful landing rate is higher for the Polar, LEO, and ISS orbits.

However, for GTO, it is difficult to distinguish between successful and unsuccessful landings, as both outcomes are present.
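
To compare light and heavy payloads per orbit in a table, a pivot of the success rate can help. The text does not define "heavy", so the threshold `HEAVY_KG` below is purely an assumption, as are the `toy` data and the `PayloadBand` column.

```python
import pandas as pd

HEAVY_KG = 6000  # hypothetical threshold; the notebook does not define "heavy"

# Hypothetical launches; values are made up for illustration.
toy = pd.DataFrame({
    "Orbit": ["ISS", "ISS", "ISS", "GTO", "GTO", "GTO"],
    "PayloadMass": [2000.0, 9000.0, 10000.0, 3000.0, 6500.0, 7000.0],
    "Class": [0, 1, 1, 1, 0, 1],
})

# Label each launch as "light" or "heavy" relative to the threshold.
toy["PayloadBand"] = toy["PayloadMass"].gt(HEAVY_KG).map({True: "heavy", False: "light"})

# Rows are orbits, columns are payload bands, cells are success rates.
rates = toy.pivot_table(index="Orbit", columns="PayloadBand", values="Class", aggfunc="mean")
print(rates)
```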


### Visualising the launch success yearly trend


We can plot a line chart with <code>Year</code> on the x-axis and the average success rate on the y-axis to see the launch success trend.


The following function extracts the year from the date:


In [None]:
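# A function to extract the year from each date string
# (assumes the year is the first "-"-separated field, as in this dataset)
def Extract_year(dates):
    # Build a fresh list on every call so re-running the cell does not accumulate values
    return [date.split("-")[0] for date in dates]

# Overwrite the Date column with the extracted year (kept as a string)
df['Date'] = Extract_year(df["Date"])
df.head()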

In [None]:
# Plot a line chart with x axis to be the extracted year and y axis to be the success rate
yearly_success = df.groupby('Date')['Class'].mean().reset_index()
plt.figure(figsize=(10, 5))
sns.lineplot(x='Date', y='Class', data=yearly_success, marker='o')

plt.title('Yearly Launch Success Rate')
plt.xlabel('Year')
plt.ylabel('Average Success Rate')
plt.ylim(0, 1)
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()


You can observe that the success rate kept increasing from 2013 until 2020.
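
As an alternative to splitting the date string by hand, pandas can parse the dates and expose the year directly. The sketch below runs on made-up data; note that `.dt.year` yields integers rather than the strings produced by the function above, and the `toy` DataFrame and its `Year` column are assumptions.

```python
import pandas as pd

# Hypothetical launch dates and outcomes; values are made up for illustration.
toy = pd.DataFrame({
    "Date": ["2010-06-04", "2013-03-01", "2013-09-29", "2020-11-05"],
    "Class": [0, 0, 1, 1],
})

# Parse the strings as datetimes, then take the year component as an integer.
toy["Year"] = pd.to_datetime(toy["Date"]).dt.year

# Average success rate per year.
yearly_rate = toy.groupby("Year")["Class"].mean()
print(yearly_rate)
```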


## Feature Engineering


We will select the features that will be used in success prediction in the future module.


In [None]:
features = df[['FlightNumber', 'PayloadMass', 'Orbit', 'LaunchSite', 'Flights', 'GridFins', 'Reused', 'Legs', 'LandingPad', 'Block', 'ReusedCount', 'Serial']]
features.head()

### Creating dummy variables for categorical columns


Use the function <code>get_dummies</code> on the <code>features</code> dataframe to one-hot encode the columns <code>Orbit</code>, <code>LaunchSite</code>, <code>LandingPad</code>, and <code>Serial</code>. Assign the result to the variable <code>features_one_hot</code> and display it using the method <code>head</code>. The resulting dataframe must include all features, including the encoded ones.


In [None]:
#'features' is our original dataframe and it's already defined
features_one_hot = pd.get_dummies(features, columns=['Orbit', 'LaunchSite', 'LandingPad', 'Serial'])

# Display the first few rows of the one-hot encoded dataframe
features_one_hot.head()
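
To see what `get_dummies` does to a single categorical column, here is a minimal sketch on a three-row made-up DataFrame. The `toy` data is hypothetical; only the `columns=` usage mirrors the cell above.

```python
import pandas as pd

# Hypothetical three-row frame with one categorical column.
toy = pd.DataFrame({
    "PayloadMass": [500.0, 3170.0, 5300.0],
    "Orbit": ["LEO", "GTO", "LEO"],
    "GridFins": [False, True, True],
})

# Only the listed column is encoded; the other columns are kept as they are.
encoded = pd.get_dummies(toy, columns=["Orbit"])
print(encoded.columns.tolist())
print(encoded)
```

Each distinct value of `Orbit` becomes its own indicator column, named with the original column name as a prefix, and the original `Orbit` column is dropped.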

### Casting all numeric columns to `float64`


Now that our <code>features_one_hot</code> dataframe contains only numeric values, cast the entire dataframe to type <code>float64</code>.


In [None]:
# Cast the entire DataFrame to float64
features_one_hot = features_one_hot.astype('float64')

features_one_hot
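
A quick check after the cast is to confirm that every column now has the `float64` dtype. The sketch below illustrates this on a made-up frame that mixes integer and boolean columns; `toy` and `as_float` are hypothetical names.

```python
import pandas as pd

# Hypothetical frame mixing integer and boolean columns.
toy = pd.DataFrame({
    "Flights": [1, 2],
    "GridFins": [False, True],
    "Orbit_LEO": [True, False],
})

# Casting converts integers to floats and booleans to 0.0 / 1.0.
as_float = toy.astype("float64")

# Every column should now report the float64 dtype.
all_float = bool((as_float.dtypes == "float64").all())
print(as_float)
print(all_float)
```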


In [None]:
# Export to CSV
features_one_hot.to_csv('dataset_part_3.csv', index=False)
