# IBM Data Science Capstone Project

## Lab 4: Exploring and Preparing Data
## Project Title: SpaceX Falcon 9 First Stage Landing Prediction
## Assignment: Exploring and Preparing Data

### Objective
In this lab, we explore and prepare the data used to predict whether the Falcon 9 first stage will land successfully. SpaceX advertises its Falcon 9 launches at a cost of $62 million, while other providers often charge over $165 million. Much of that saving comes from reusing the first-stage booster.

You will perform:

- Exploratory Data Analysis (EDA)
- Feature Engineering

## Tasks Overview

### Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


### Optional: Install Packages (for online environments like JupyterLite; run before the imports above)

In [None]:
import piplite
await piplite.install(['numpy', 'pandas', 'seaborn'])
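
`piplite` is specific to JupyterLite, so the install cell would typically fail with an import error in a regular Python environment. One way to decide whether to run it is to check the platform string. This is a minimal sketch: the helper name `in_pyodide` is hypothetical, and the check assumes that Pyodide (the runtime behind JupyterLite) reports its platform as `emscripten`.

```python
import sys


def in_pyodide() -> bool:
    """Return True when running under Pyodide, e.g. inside JupyterLite."""
    # Assumption: Pyodide sets sys.platform to "emscripten".
    return sys.platform == "emscripten"


# Only run the piplite install cell when this prints True.
print(in_pyodide())
```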


### Load Dataset

In [None]:
from js import fetch
import io

URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/dataset_part_2.csv"
resp = await fetch(URL)
dataset_part_2_csv = io.BytesIO((await resp.arrayBuffer()).to_py())

df = pd.read_csv(dataset_part_2_csv)
df.head()
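
Before plotting, it can help to confirm that the frame loaded as expected. The sketch below runs the usual checks (shape, missing values, dtypes) on a tiny stand-in frame. The rows are made up for illustration; only the column names come from this lab, and `Class` is assumed to be 1 for a successful landing and 0 otherwise.

```python
import pandas as pd

# Hypothetical stand-in rows; only the column names are taken from the lab.
sample = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4],
    "PayloadMass": [6104.0, 525.0, None, 3170.0],
    "Orbit": ["LEO", "LEO", "ISS", "GTO"],
    "Class": [0, 0, 1, 1],
})

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()

# Because Class is 0/1, its mean is the overall share of successful landings.
success_share = sample["Class"].mean()

print(sample.shape)
print(missing_per_column)
print(sample.dtypes)
print(f"Success share: {success_share:.2f}")
```

The same three calls can be run on `df` directly once it is loaded.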


## 📊 Exploratory Data Analysis
### 🔸 Task 1: Flight Number vs. Launch Site

In [None]:
sns.catplot(y="LaunchSite", x="FlightNumber", hue="Class", data=df, aspect=5)
plt.xlabel("Flight Number", fontsize=20)
plt.ylabel("Launch Site", fontsize=20)
plt.show()
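
A numeric companion to this plot is the running success rate as flight numbers increase. The following is a sketch on made-up rows, not the lab's data: it sorts by `FlightNumber` and takes the expanding mean of `Class`. The column name `RunningSuccessRate` is introduced here for illustration.

```python
import pandas as pd

# Hypothetical rows, deliberately out of order.
sample = pd.DataFrame({
    "FlightNumber": [3, 1, 2, 4],
    "Class": [1, 0, 0, 1],
})

# Order the launches chronologically by flight number.
ordered = sample.sort_values("FlightNumber").reset_index(drop=True)

# Expanding mean: success rate over all flights up to and including each row.
ordered["RunningSuccessRate"] = ordered["Class"].expanding().mean()

print(ordered)
```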


### 🔸 Task 2: Payload Mass vs. Launch Site

In [None]:
sns.catplot(y="LaunchSite", x="PayloadMass", hue="Class", data=df, aspect=5)
plt.xlabel("Payload Mass (kg)", fontsize=20)
plt.ylabel("Launch Site", fontsize=20)
plt.show()
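
The scatter plot can be complemented with a table of success rates per launch site and payload range. The sketch below assumes nothing about the real data: the site names, payload values, and bin edges are all placeholders chosen for the example.

```python
import pandas as pd

# Hypothetical rows; "Site A" and "Site B" are placeholder site names.
sample = pd.DataFrame({
    "LaunchSite": ["Site A", "Site A", "Site B", "Site B", "Site B"],
    "PayloadMass": [500.0, 9600.0, 3000.0, 4500.0, 12000.0],
    "Class": [0, 1, 1, 0, 1],
})

# Bucket payloads into coarse ranges (arbitrary edges, for illustration only).
bins = [0, 5000, 10000, 20000]
labels = ["0-5k", "5k-10k", "10k-20k"]
sample["PayloadBin"] = pd.cut(sample["PayloadMass"], bins=bins, labels=labels)

# Mean of Class per (site, payload bin) is the success rate in that cell;
# combinations with no launches show up as NaN after unstacking.
rate_table = (
    sample.groupby(["LaunchSite", "PayloadBin"], observed=True)["Class"]
    .mean()
    .unstack()
)

print(rate_table)
```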


### 🔸 Task 3: Orbit vs. Success Rate

In [None]:
df.groupby('Orbit')['Class'].mean().plot(kind='bar')
plt.ylabel("Success Rate")
plt.title("Success Rate per Orbit")
plt.show()
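
Since `Class` is 0 or 1, its mean within each orbit is that orbit's success rate. A rate computed from only one or two launches is easy to over-read, so it can be useful to show the launch count next to it. This is a sketch on made-up rows; the result column names `launches` and `success_rate` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical rows; the orbit labels are used only as example categories.
sample = pd.DataFrame({
    "Orbit": ["GTO", "GTO", "GTO", "GTO", "ISS", "ISS", "LEO"],
    "Class": [0, 1, 0, 1, 1, 1, 1],
})

# Named aggregation: count the launches and average the 0/1 outcome per orbit.
orbit_summary = (
    sample.groupby("Orbit")["Class"]
    .agg(launches="count", success_rate="mean")
    .sort_values("success_rate", ascending=False)
)

print(orbit_summary)
```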


### 🔸 Task 4: Flight Number vs. Orbit

In [None]:
sns.catplot(y="Orbit", x="FlightNumber", hue="Class", data=df, aspect=5)
plt.xlabel("Flight Number", fontsize=20)
plt.ylabel("Orbit", fontsize=20)
plt.show()


### 🔸 Task 5: Payload Mass vs. Orbit

In [None]:
sns.catplot(y="Orbit", x="PayloadMass", hue="Class", data=df, aspect=5)
plt.xlabel("Payload Mass (kg)", fontsize=20)
plt.ylabel("Orbit", fontsize=20)
plt.show()


### 🔸 Task 6: Success Rate by Year

In [None]:
def extract_year(dates):
    """Return the year part of each ISO-formatted (YYYY-MM-DD) date string."""
    return [date.split("-")[0] for date in dates]

df['Year'] = extract_year(df["Date"])
df_grouped = df.groupby('Year')['Class'].mean()

df_grouped.plot(kind='line', marker='o')
plt.xlabel("Year")
plt.ylabel("Success Rate")
plt.title("Launch Success Rate by Year")
plt.show()
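
As an alternative to splitting the date string, pandas can parse the column and expose the year as an integer, which gives the line plot a numeric x-axis. This is a sketch assuming `Date` holds ISO-formatted strings, as the `split("-")` call above implies; the dates and outcomes below are invented for the example.

```python
import pandas as pd

# Hypothetical rows with ISO-formatted date strings.
sample = pd.DataFrame({
    "Date": ["2010-06-04", "2012-05-22", "2012-10-08", "2013-03-01"],
    "Class": [0, 0, 1, 1],
})

# Parse the strings into datetimes, then read the year off the .dt accessor.
sample["Year"] = pd.to_datetime(sample["Date"]).dt.year

# Mean of the 0/1 outcome per year is that year's success rate.
yearly_rate = sample.groupby("Year")["Class"].mean()

print(yearly_rate)
```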


## ⚙️ Feature Engineering

In [None]:
features = df[['FlightNumber', 'PayloadMass', 'Orbit', 'LaunchSite', 'Flights', 
               'GridFins', 'Reused', 'Legs', 'LandingPad', 'Block', 'ReusedCount', 'Serial']]


### 🔸 Task 7: Create Dummy Variables

In [None]:
features_one_hot = pd.get_dummies(features, columns=['Orbit', 'LaunchSite', 'LandingPad', 'Serial'])
features_one_hot.head()
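
To see what `pd.get_dummies` does, it helps to run it on a frame small enough to read in full. In this sketch (made-up rows), the `Orbit` column is replaced by one indicator column per distinct value, named `<column>_<value>`, while the untouched column is carried over as-is.

```python
import pandas as pd

# Hypothetical rows with one categorical column.
sample = pd.DataFrame({
    "FlightNumber": [1, 2, 3],
    "Orbit": ["LEO", "GTO", "LEO"],
})

# One-hot encode only the listed column.
encoded = pd.get_dummies(sample, columns=["Orbit"])

print(encoded.columns.tolist())  # ['FlightNumber', 'Orbit_GTO', 'Orbit_LEO']
print(encoded)
```

Each distinct value adds a column, so a categorical feature with many distinct values widens the encoded frame considerably.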


### 🔸 Task 8: Cast All Numeric Columns to Float64

In [None]:
features_one_hot = features_one_hot.astype('float64')
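
The whole-frame cast works only because every column is numeric or boolean after one-hot encoding; a leftover string column would make `astype('float64')` raise an error. The sketch below shows the effect on a small frame. Treating `GridFins` as a boolean column is an assumption made for the example.

```python
import pandas as pd

# Hypothetical rows mixing integer and boolean columns.
sample = pd.DataFrame({
    "Flights": [1, 2],
    "GridFins": [True, False],
    "Orbit_LEO": [True, False],
})

# Booleans become 1.0 / 0.0 and integers become floats.
as_float = sample.astype("float64")

print(as_float.dtypes)
print(as_float)
```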


### 💾 Save Final Dataset

In [None]:
features_one_hot.to_csv('dataset_part_3.csv', index=False)
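
Passing `index=False` keeps the row index out of the file, so reading it back does not produce an extra unnamed column. A quick round-trip check can be sketched as follows; it writes to an in-memory buffer instead of a file and uses made-up values.

```python
import io

import pandas as pd

# Hypothetical all-float frame, like the output of the cast step.
sample = pd.DataFrame({"FlightNumber": [1.0, 2.0], "Orbit_LEO": [1.0, 0.0]})

# Write the CSV into memory rather than onto disk.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Rewind and read it back, then compare with the original frame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.equals(sample))
```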


### ✅ Summary
- Performed EDA using Seaborn and Matplotlib.
- Visualized relationships across payload, flight number, orbit, and launch site.
- Engineered features for machine learning models.
- Exported the processed dataset for future prediction tasks.

#### Completed Lab 4: Exploring and Preparing Data
#### Md. Anwar Hossain