We'll perform Exploratory Data Analysis (EDA) on a dataset of rocket booster landings. Our goal is to create a binary label for supervised learning models, where:

* 1 represents a successful landing.
* 0 represents an unsuccessful landing.

We'll be working with the following landing outcome categories:

* True Ocean (successful ocean landing)
* False Ocean (unsuccessful ocean landing)
* True RTLS (successful ground pad landing)
* False RTLS (unsuccessful ground pad landing)
* True ASDS (successful drone ship landing)
* False ASDS (unsuccessful drone ship landing)

Essentially, we'll need to map these string values to numerical labels (1 or 0).
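The mapping described above can be sketched with an explicit lookup table. This is a minimal illustration, not the notebook's own method: the names `OUTCOME_TO_LABEL` and `label_outcomes` are hypothetical, and treating any string outside the six listed categories (for example `None None`) as 0 is an assumption.

```python
import pandas as pd

# Explicit lookup table for the six categories listed above.
OUTCOME_TO_LABEL = {
    "True Ocean": 1,
    "False Ocean": 0,
    "True RTLS": 1,
    "False RTLS": 0,
    "True ASDS": 1,
    "False ASDS": 0,
}


def label_outcomes(outcomes: pd.Series) -> pd.Series:
    """Map outcome strings to 1/0; unlisted strings become 0 (assumption)."""
    return outcomes.map(OUTCOME_TO_LABEL).fillna(0).astype(int)


sample = pd.Series(["True ASDS", "False Ocean", "None None"])
labels = label_outcomes(sample)
print(labels.tolist())  # [1, 0, 0]
```

A lookup table makes the label for each category visible at a glance, at the cost of having to list every category by hand.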

In [1]:
# Pandas is a software library written for the Python programming language for data manipulation and analysis.
import pandas as pd
#NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
import numpy as np


## DATA ANALYSIS

In [2]:
#Load SpaceX dataset
df=pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/dataset_part_1.csv")
df.head(10)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2010-06-04,Falcon 9,6104.959412,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
1,2,2012-05-22,Falcon 9,525.0,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
2,3,2013-03-01,Falcon 9,677.0,ISS,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
3,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
4,5,2013-12-03,Falcon 9,3170.0,GTO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857
5,6,2014-01-06,Falcon 9,3325.0,GTO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B1005,-80.577366,28.561857
6,7,2014-04-18,Falcon 9,2296.0,ISS,CCAFS SLC 40,True Ocean,1,False,False,True,,1.0,0,B1006,-80.577366,28.561857
7,8,2014-07-14,Falcon 9,1316.0,LEO,CCAFS SLC 40,True Ocean,1,False,False,True,,1.0,0,B1007,-80.577366,28.561857
8,9,2014-08-05,Falcon 9,4535.0,GTO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B1008,-80.577366,28.561857
9,10,2014-09-07,Falcon 9,4428.0,GTO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B1011,-80.577366,28.561857


In [3]:
# Identify the percentage of missing values in each attribute
df.isnull().sum()/len(df)*100


FlightNumber       0.000000
Date               0.000000
BoosterVersion     0.000000
PayloadMass        0.000000
Orbit              0.000000
LaunchSite         0.000000
Outcome            0.000000
Flights            0.000000
GridFins           0.000000
Reused             0.000000
Legs               0.000000
LandingPad        28.888889
Block              0.000000
ReusedCount        0.000000
Serial             0.000000
Longitude          0.000000
Latitude           0.000000
dtype: float64
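The percentage calculation above can be checked on a small frame. The sketch below uses a made-up `toy` DataFrame (its values are illustrative, not taken from the dataset) and shows that `isnull().mean() * 100` gives the same result as dividing the null count by the row count.

```python
import numpy as np
import pandas as pd

# Toy frame: two of the four LandingPad entries are missing.
toy = pd.DataFrame({
    "LandingPad": ["pad_a", np.nan, np.nan, "pad_b"],
    "PayloadMass": [525.0, 677.0, 500.0, 3170.0],
})

# Same computation as in the cell above.
pct_by_sum = toy.isnull().sum() / len(toy) * 100

# Equivalent shorthand: the mean of a boolean mask is the fraction of True values.
pct_by_mean = toy.isnull().mean() * 100

print(pct_by_mean)
```

Read this way, the 28.888889 reported for LandingPad is the share of rows in which that column is empty; every other column is fully populated.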

In [4]:
#identify which columns are numerical
df.dtypes

FlightNumber        int64
Date               object
BoosterVersion     object
PayloadMass       float64
Orbit              object
LaunchSite         object
Outcome            object
Flights             int64
GridFins             bool
Reused               bool
Legs                 bool
LandingPad         object
Block             float64
ReusedCount         int64
Serial             object
Longitude         float64
Latitude          float64
dtype: object
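Rather than reading the dtype listing by eye, the numerical columns can be selected programmatically. The following is a small sketch on a made-up `toy` frame that reuses a few column names from the output above. Note that `select_dtypes(include="number")` does not pick up boolean columns, so those are listed separately.

```python
import pandas as pd

# Toy frame with one int, one float, one string and one bool column.
toy = pd.DataFrame({
    "FlightNumber": [1, 2],
    "PayloadMass": [525.0, 677.0],
    "Orbit": ["LEO", "ISS"],
    "GridFins": [False, True],
})

# Integer and float columns.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Boolean columns are not counted as "number" and need their own selection.
bool_cols = toy.select_dtypes(include="bool").columns.tolist()

print(numeric_cols)  # ['FlightNumber', 'PayloadMass']
print(bool_cols)     # ['GridFins']
```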

### Calculate the number of launches on each site

The data contains several SpaceX launch facilities: Cape Canaveral Space Launch Complex 40 (CCAFS SLC 40), Vandenberg Air Force Base Space Launch Complex 4E (VAFB SLC 4E), and Kennedy Space Center Launch Complex 39A (KSC LC 39A). The location of each launch is stored in the column LaunchSite.

In [5]:
# Identify numerical columns
print("\nData types of each column:")
print(df.dtypes)

# Apply value_counts() on column LaunchSite
print("\nValue counts for LaunchSite column:")
print(df['LaunchSite'].value_counts())


Data types of each column:
FlightNumber        int64
Date               object
BoosterVersion     object
PayloadMass       float64
Orbit              object
LaunchSite         object
Outcome            object
Flights             int64
GridFins             bool
Reused               bool
Legs                 bool
LandingPad         object
Block             float64
ReusedCount         int64
Serial             object
Longitude         float64
Latitude          float64
dtype: object

Value counts for LaunchSite column:
LaunchSite
CCAFS SLC 40    55
KSC LC 39A      22
VAFB SLC 4E     13
Name: count, dtype: int64
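The raw counts are easier to compare as shares of all launches. The sketch below rebuilds the counts printed above as a Series (the variable names are illustrative) and converts them to percentages. On the original column, `df['LaunchSite'].value_counts(normalize=True)` would give the same fractions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
site_counts = pd.Series(
    {"CCAFS SLC 40": 55, "KSC LC 39A": 22, "VAFB SLC 4E": 13},
    name="count",
)

total_launches = int(site_counts.sum())

# Share of launches per site, in percent, rounded to one decimal place.
site_share = (site_counts / total_launches * 100).round(1)

print(total_launches)  # 90
print(site_share)
```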


## Orbit Types and Descriptions

### LEO (Low Earth Orbit)
* Altitude: ≤ 2,000 km
* Characteristics: High orbital speed, shorter orbital period.
* Reference: [1]
* Bookmark: LEO - Low Earth Orbit

### VLEO (Very Low Earth Orbit)
* Altitude: < 450 km
* Characteristics: Closer to Earth for better observation.
* Reference: [2]
* Bookmark: VLEO - Very Low Earth Orbit

### GTO (Geosynchronous Transfer Orbit)
* Altitude: apogee near 35,786 km (geosynchronous altitude)
* Characteristics: Elliptical transfer orbit used to reach geosynchronous orbit.
* Reference: [3]
* Bookmark: GTO - Geosynchronous Transfer Orbit

### SSO (Sun-Synchronous Orbit)
* Characteristics: Nearly polar, constant local solar time.
* Reference: [4]
* Bookmark: SSO - Sun-Synchronous Orbit

### ES-L1 (Earth-Sun Lagrange Point 1)
* Characteristics: Equilibrium point between Earth and Sun.
* Reference: [5]
* Bookmark: ES-L1 - Earth-Sun Lagrange Point 1

### HEO (Highly Elliptical Orbit)
* Characteristics: High eccentricity.
* Reference: [6]
* Bookmark: HEO - Highly Elliptical Orbit

### ISS (International Space Station)
* Orbit: LEO
* Characteristics: Multinational space station.
* Reference: [7]
* Bookmark: ISS - International Space Station

### MEO (Medium Earth Orbit)
* Altitude: 2,000 km to 35,786 km
* Characteristics: Intermediate altitude orbits.
* Reference: [8]
* Bookmark: MEO - Medium Earth Orbit

### GEO (Geosynchronous Equatorial Orbit)
* Altitude: 35,786 km
* Characteristics: Circular, geosynchronous, equatorial.
* Reference: [10]
* Bookmark: GEO - Geosynchronous Equatorial Orbit

### PO (Polar Orbit)
* Characteristics: Passes over or near poles.
* Reference: [11]
* Bookmark: PO - Polar Orbit

## Calculate the number and occurrence of each orbit

In [6]:
# Calculate the value counts of the 'Orbit' column
orbit_counts = df['Orbit'].value_counts()

# Print the results
print("Orbit Value Counts:\n", orbit_counts)

Orbit Value Counts:
 Orbit
GTO      27
ISS      21
VLEO     14
PO        9
LEO       7
SSO       5
MEO       3
ES-L1     1
HEO       1
SO        1
GEO       1
Name: count, dtype: int64
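It can be useful to confirm that every orbit code in the data has a matching entry in the descriptions above. The sketch below hard-codes both lists from this notebook (the variable names are illustrative) and reports any code that appears in the counts without a description.

```python
# Orbit codes that have a description in the section above.
described = {"LEO", "VLEO", "GTO", "SSO", "ES-L1", "HEO", "ISS", "MEO", "GEO", "PO"}

# Orbit codes that appear in the value_counts() output above.
observed = ["GTO", "ISS", "VLEO", "PO", "LEO", "SSO", "MEO", "ES-L1", "HEO", "SO", "GEO"]

# Codes present in the data but missing from the descriptions.
undescribed = sorted(set(observed) - described)
print(undescribed)  # ['SO']
```

The single `SO` launch has no entry in the descriptions, so its meaning should be confirmed against the data source before it is used as a feature.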


## Calculate the number and occurrence of each mission outcome

In [8]:
# Calculate the value counts of the 'Outcome' column
landing_outcomes = df['Outcome'].value_counts()

# Print the results
print("Landing Outcomes Value Counts:\n", landing_outcomes)

Landing Outcomes Value Counts:
 Outcome
True ASDS      41
None None      19
True RTLS      14
False ASDS      6
True Ocean      5
False Ocean     2
None ASDS       2
False RTLS      1
Name: count, dtype: int64


In [9]:
for i,outcome in enumerate(landing_outcomes.keys()):
    print(i,outcome)

0 True ASDS
1 None None
2 True RTLS
3 False ASDS
4 True Ocean
5 False Ocean
6 None ASDS
7 False RTLS


In [10]:
bad_outcomes=set(landing_outcomes.keys()[[1,3,5,6,7]])
bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}
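Selecting the bad outcomes by position works only while the order of `value_counts()` stays the same. As a hedged alternative, the same set can be built from the strings themselves, assuming every successful outcome starts with `True`. The list below is copied from the enumeration above and the variable names are illustrative.

```python
# Outcome strings as enumerated above.
outcomes = [
    "True ASDS", "None None", "True RTLS", "False ASDS",
    "True Ocean", "False Ocean", "None ASDS", "False RTLS",
]

# Anything that does not start with "True" is treated as a bad outcome (assumption).
bad_by_prefix = {o for o in outcomes if not o.startswith("True")}

# Positional selection used in the cell above, for comparison.
bad_by_position = {outcomes[i] for i in [1, 3, 5, 6, 7]}

print(bad_by_prefix == bad_by_position)  # True
```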

## Create a landing outcome label from Outcome column

In [12]:
# Use the set of bad outcomes built above (bad_outcomes); its values match the strings in df['Outcome']
# Create the landing_class list: 0 for a bad outcome, 1 otherwise
landing_class = [0 if outcome in bad_outcomes else 1 for outcome in df['Outcome']]

# Assign the list to the 'Class' column of the DataFrame
df['Class'] = landing_class

# Verify the result by printing the first few rows of 'Outcome' and 'Class'
print(df[['Outcome', 'Class']].head(10))

# Verify the value counts of the new Class column
print("\nValue counts for Class column:")
print(df['Class'].value_counts())

       Outcome  Class
0    None None      0
1    None None      0
2    None None      0
3  False Ocean      0
4    None None      0
5    None None      0
6   True Ocean      1
7   True Ocean      1
8    None None      0
9    None None      0

Value counts for Class column:
Class
1    60
0    30
Name: count, dtype: int64


In [13]:
df['Class']=landing_class
df[['Class']].head(8)

Unnamed: 0,Class
0,0
1,0
2,0
3,0
4,0
5,0
6,1
7,1


In [14]:
df.head(5)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude,Class
0,1,2010-06-04,Falcon 9,6104.959412,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857,0
1,2,2012-05-22,Falcon 9,525.0,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857,0
2,3,2013-03-01,Falcon 9,677.0,ISS,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857,0
3,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093,0
4,5,2013-12-03,Falcon 9,3170.0,GTO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857,0


In [15]:
df["Class"].mean()

0.6666666666666666
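Because Class holds only 0 and 1, its mean is the fraction of launches labeled successful. As a cross-check, that fraction can be derived independently from the outcome counts printed earlier. The sketch below hard-codes those counts and the bad-outcome set (the variable names are illustrative), so it runs without the dataset.

```python
import pandas as pd

# Counts copied from the 'Outcome' value_counts() output earlier in the notebook.
outcome_counts = pd.Series({
    "True ASDS": 41, "None None": 19, "True RTLS": 14, "False ASDS": 6,
    "True Ocean": 5, "False Ocean": 2, "None ASDS": 2, "False RTLS": 1,
})

# The set of outcomes treated as unsuccessful, as built earlier.
bad = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}

successes = int(outcome_counts[~outcome_counts.index.isin(bad)].sum())
total = int(outcome_counts.sum())
expected_rate = successes / total

print(successes, total, round(expected_rate, 4))  # 60 90 0.6667
```

A Class mean of exactly 1.0 or 0.0 would mean every launch received the same label, which usually signals that the strings in the bad-outcome set do not match the values in the Outcome column.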

In [17]:
# Save the DataFrame to a CSV file
df.to_csv('spacex_labeled.csv', index=False)  # index=False prevents saving the index as a column

print("DataFrame saved to spacex_labeled.csv")

DataFrame saved to spacex_labeled.csv
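A quick round trip confirms that a file written with `index=False` reloads without an extra index column. The sketch below uses a made-up two-row frame and a temporary directory. The file name `labeled_sample.csv` is hypothetical and chosen so that the real output file is not touched.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the labeled DataFrame.
toy = pd.DataFrame({"Outcome": ["True ASDS", "False Ocean"], "Class": [1, 0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labeled_sample.csv")
    toy.to_csv(path, index=False)  # no index column is written
    reloaded = pd.read_csv(path)

print(reloaded.columns.tolist())  # ['Outcome', 'Class']
print(reloaded["Class"].tolist())  # [1, 0]
```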
