<h1>SpaceX Falcon 9 First Stage Landing Prediction</h1>

---


# 2. Data wrangling

The data set records several distinct landing outcomes, some where the booster landed successfully and some where it did not.

* Landing in Ocean Pad
> * True Ocean - 1
> * False Ocean - 0

* Landing in Ground Pad
> * True RTLS - 1
> * False RTLS - 0

* Landing in Drone Ship
> * True ASDS - 1
> * False ASDS - 0

* Fail in Landing
> * None ASDS - 0
> * None None - 0

These outcomes are converted into training labels: `1` means the booster landed successfully and `0` means it did not.
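
The mapping listed above can be sketched in plain Python. This is only an illustration: `label_outcome` is a hypothetical helper, not part of the notebook, and it assumes every outcome string has the form `"<landed> <location>"`.

```python
# Hypothetical helper: map a raw outcome string to a binary training label.
def label_outcome(outcome: str) -> int:
    """Return 1 if the outcome starts with 'True' (successful landing), else 0."""
    landed, _, _location = outcome.partition(" ")
    return 1 if landed == "True" else 0


examples = ["True ASDS", "False Ocean", "None None", "None ASDS"]
labels = {outcome: label_outcome(outcome) for outcome in examples}
print(labels)
```

Only the first token decides the label; the landing location (`Ocean`, `RTLS`, `ASDS`) does not affect it.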


### Objectives

- Determine Training Labels


----


In [1]:
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

## Data Analysis


In [2]:
# Load the previously cleaned data set.
df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/dataset_part_1.csv")
df.head(3)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2010-06-04,Falcon 9,6104.959412,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
1,2,2012-05-22,Falcon 9,525.0,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
2,3,2013-03-01,Falcon 9,677.0,ISS,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857


In [3]:
# Calculate the percentage of rows with a missing LandingPad.
# isnull().mean() divides by the total row count; count() would exclude NaNs.
nans = df['LandingPad'].isnull().mean() * 100
print( f'LandingPad null percentage: {nans:.3f}% \n' )

# Identify which columns are numerical and categorical.
print(df.dtypes)

LandingPad null percentage: 28.889% 

FlightNumber        int64
Date               object
BoosterVersion     object
PayloadMass       float64
Orbit              object
LaunchSite         object
Outcome            object
Flights             int64
GridFins             bool
Reused               bool
Legs                 bool
LandingPad         object
Block             float64
ReusedCount         int64
Serial             object
Longitude         float64
Latitude          float64
dtype: object
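
The same checks can be run for every column at once. The sketch below uses a small made-up frame (`toy`, with invented values) rather than the real launch data, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the launch data (values are made up).
toy = pd.DataFrame({
    "LandingPad": ["pad_a", np.nan, "pad_b", np.nan],
    "PayloadMass": [525.0, 677.0, np.nan, 500.0],
    "Orbit": ["LEO", "ISS", "GTO", "PO"],
})

# isnull() yields booleans; their column-wise mean is the fraction of missing rows.
missing_pct = toy.isnull().mean() * 100
print(missing_pct)

# Split columns into numeric and non-numeric by dtype.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, non_numeric_cols)
```

Dividing by the total number of rows (which is what `mean()` does here) gives the share of all rows that are missing, whereas dividing by `count()` would give the ratio of missing to non-missing rows.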


### 1. Calculate the number of launches on each site

The data contains launches from several SpaceX launch facilities: <a href='https://en.wikipedia.org/wiki/List_of_Cape_Canaveral_and_Merritt_Island_launch_sites?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork865-2023-01-01'>Cape Canaveral Space Launch Complex 40</a> <b>(CCAFS SLC 40)</b>, Vandenberg Air Force Base Space Launch Complex 4E <b>(VAFB SLC-4E)</b>, and Kennedy Space Center Launch Complex 39A <b>(KSC LC 39A)</b>.

The location of each launch is stored in the <code>LaunchSite</code> column.


In [4]:
df['LaunchSite'].value_counts()

CCAFS SLC 40    55
KSC LC 39A      22
VAFB SLC 4E     13
Name: LaunchSite, dtype: int64
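
To read these counts as shares of all launches, a minimal sketch is shown below. It rebuilds the counts by hand from the output above so that it runs without the data set; on the real frame, `df['LaunchSite'].value_counts(normalize=True)` should return the same proportions as fractions.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
site_counts = pd.Series({"CCAFS SLC 40": 55, "KSC LC 39A": 22, "VAFB SLC 4E": 13})

# Convert absolute counts to percentages of all launches.
site_share = (site_counts / site_counts.sum() * 100).round(1)
print(site_share)
```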

Each launch targets a dedicated orbit:

<b>LEO</b>, <b>VLEO</b>, <b>GTO</b>, <b>SSO (or SO)</b>, <b>ES-L1</b>, <b>HEO</b>, <b>ISS</b>, <b>MEO</b>, <b>GEO</b>, <b>PO</b>

![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/Images/Orbits.png)


### 2. Calculate the number of launches for each orbit


In [5]:
df['Orbit'].value_counts()

GTO      27
ISS      21
VLEO     14
PO        9
LEO       7
SSO       5
MEO       3
ES-L1     1
HEO       1
SO        1
GEO       1
Name: Orbit, dtype: int64
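
The counts list `SSO` and `SO` separately, although the orbit list above treats them as the same orbit. If that reading is correct (an assumption, not something the notebook does), the two labels could be merged as in this sketch, which rebuilds the counts by hand from the output above.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
orbit_counts = pd.Series({
    "GTO": 27, "ISS": 21, "VLEO": 14, "PO": 9, "LEO": 7, "SSO": 5,
    "MEO": 3, "ES-L1": 1, "HEO": 1, "SO": 1, "GEO": 1,
})

# Assumption: "SO" is an alias of "SSO", so relabel it and sum the duplicates.
merged = (
    orbit_counts.rename(index={"SO": "SSO"})
    .groupby(level=0)
    .sum()
    .sort_values(ascending=False)
)
print(merged)
```

On the raw frame, the equivalent step would be to replace the label before counting, for example with `df['Orbit'].replace({'SO': 'SSO'})`.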

### 3. Calculate the number of occurrences of each mission outcome


In [6]:
outcomes = df['Outcome'].value_counts()
keys = outcomes.keys()

for i, key in enumerate(keys):
    print(i, key, outcomes[key])

# Set of outcomes where the first stage landed successfully
good_outcomes = {key for key in keys if 'True' in key}
print( f'\nGood Outcomes: {good_outcomes}' )

0 True ASDS 41
1 None None 19
2 True RTLS 14
3 False ASDS 6
4 True Ocean 5
5 False Ocean 2
6 None ASDS 2
7 False RTLS 1

Good Outcomes: {'True ASDS', 'True RTLS', 'True Ocean'}


### 4. Create a landing outcome label


In [7]:
lst = df['Outcome'].tolist()
print(lst)

['None None', 'None None', 'None None', 'False Ocean', 'None None', 'None None', 'True Ocean', 'True Ocean', 'None None', 'None None', 'False Ocean', 'False ASDS', 'True Ocean', 'False ASDS', 'None None', 'None ASDS', 'True RTLS', 'False ASDS', 'False ASDS', 'True ASDS', 'True ASDS', 'True ASDS', 'True RTLS', 'True ASDS', 'None ASDS', 'True ASDS', 'True RTLS', 'None None', 'True ASDS', 'True RTLS', 'None None', 'True RTLS', 'True ASDS', 'True ASDS', 'None None', 'True RTLS', 'True ASDS', 'True RTLS', 'True ASDS', 'True ASDS', 'True ASDS', 'True RTLS', 'True Ocean', 'True RTLS', 'True Ocean', 'None None', 'None None', 'None None', 'True ASDS', 'True ASDS', 'None None', 'None None', 'True ASDS', 'True ASDS', 'True ASDS', 'True ASDS', 'True RTLS', 'True ASDS', 'True ASDS', 'False RTLS', 'None None', 'True ASDS', 'True ASDS', 'True ASDS', 'True ASDS', 'True RTLS', 'True RTLS', 'None None', 'True ASDS', 'True ASDS', 'True ASDS', 'True ASDS', 'None None', 'True ASDS', 'False ASDS', 'True RTL

In [8]:
# Class = 1 if the outcome is in good_outcomes, 0 otherwise

# Vectorized alternative:
# df['Class'] = df['Outcome'].isin(good_outcomes).astype(int)

df['Class'] = df['Outcome'].apply(lambda x: 1 if x in good_outcomes else 0)
df[['Outcome', 'Class']].head(10)

Unnamed: 0,Outcome,Class
0,None None,0
1,None None,0
2,None None,0
3,False Ocean,0
4,None None,0
5,None None,0
6,True Ocean,1
7,True Ocean,1
8,None None,0
9,None None,0


Success rate:


In [9]:
print( f'{round( df["Class"].mean() * 100, 3 )} %' )

66.667 %
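
Because `Class` is a 0/1 label, its mean is the success rate, and the same idea extends to per-group rates. The sketch below uses a small made-up frame (`toy`) to show the pattern; the values are invented and do not reflect the real launch data.

```python
import pandas as pd

# Toy data (made up) with the same column names as the notebook's DataFrame.
toy = pd.DataFrame({
    "LaunchSite": ["CCAFS SLC 40", "CCAFS SLC 40", "KSC LC 39A", "KSC LC 39A", "VAFB SLC 4E"],
    "Class": [0, 1, 1, 1, 0],
})

# The mean of a 0/1 column within each group is that group's success rate.
rate_by_site = toy.groupby("LaunchSite")["Class"].mean() * 100
print(rate_by_site)
```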


Export the DataFrame to a <b>CSV</b> file


In [10]:
df.to_csv("df_2.csv", index=False)
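
A quick way to see what `index=False` does is to round-trip a small frame through CSV. This sketch writes to an in-memory buffer instead of a file and uses made-up rows; `toy` and `restored` are illustrative names only.

```python
import io

import pandas as pd

# Toy rows (made up) with two of the notebook's columns.
toy = pd.DataFrame({"Outcome": ["True ASDS", "None None"], "Class": [1, 0]})

# Write without the row index, then read the CSV text back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.columns.tolist())
```

Without `index=False`, the row index is written as an unnamed first column, which pandas typically reads back as `Unnamed: 0`.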