***
# SpaceX Falcon 9 First Stage Landing Prediction
***




In this project, I developed a model to predict whether the Falcon 9 rocket's first stage will land successfully. SpaceX offers Falcon 9 launches at a significantly lower cost (62 million USD) than other providers (165 million USD), primarily because the first stage can be reused. Predicting whether the first stage will land therefore helps estimate the cost of a launch, which could be valuable for companies bidding against SpaceX for rocket launches. I collected and pre-processed data from the SpaceX API so that it is in a suitable format for analysis.

![Alt Text](https://media1.giphy.com/media/v1.Y2lkPTc5MGI3NjExcDl4Zjg3NXVrazM5MHp1ZnZ4NHppYjdhazUwaWU5cmM4ZWo3OXJlciZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/Ls1Lgm3ThzsY91Ltfk/giphy.webp)

![Alt Text](https://media2.giphy.com/media/v1.Y2lkPTc5MGI3NjExZzlla2JvbXAwNWJxOXc0ZGR4ajVjcmdobnNkMnlxeHgyN3cxMGt5NCZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/xT39CRup15MdJgjLy0/giphy.webp)
***

## <u>Collecting and Cleansing Data</u>

#### Importing Required Libraries

This section imports the libraries used for data collection and processing:

- `requests`: For making API calls to retrieve data.
- `pandas`: A powerful data manipulation and analysis library that allows us to work with dataframes.
- `numpy`: A library for numerical computations that provides support for arrays and matrices.
- `datetime`: A module to handle date and time operations.

In [31]:
import requests
import pandas as pd
import numpy as np
import datetime

# Show all columns when printing a DataFrame
pd.set_option('display.max_columns', None)
# Show the full contents of each cell instead of truncating them
pd.set_option('display.max_colwidth', None)

#### <u>Data Source</u>

The data used in this analysis is collected from the [SpaceX API](https://api.spacexdata.com/).


In [2]:
# Request past launch data from the SpaceX API
spacex_url = "https://api.spacexdata.com/v4/launches/past"
response = requests.get(spacex_url)

In [4]:
json_data = response.json()

data = pd.json_normalize(json_data)

data.head()

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,net,window,rocket,success,failures,details,crew,ships,capsules,payloads,launchpad,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,auto_update,tbd,launch_library_id,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/94/f2/NN6Ph45r_o.png,https://images2.imgbox.com/5b/02/QcxHUb5V_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 301, 'altitude': 289, 'reason': 'harmonic oscillation leading to premature engine shutdown'}]","Successful first stage burn and transition to second stage, maximum altitude 289 km, Premature engine shutdown at T+7 min 30 s, Failed to reach orbit, Failed to recover first stage",[],[],[],[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,2,DemoSat,2007-03-21T01:10:00.000Z,1174439400,2007-03-21T13:10:00+12:00,hour,False,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdaffd86e000604b32b,False,False,False,[],https://images2.imgbox.com/f9/4a/ZboXReNb_o.png,https://images2.imgbox.com/80/a2/bkWotCIS_o.png,,,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-rocket-fails-reach-orbit.html,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 140, 'altitude': 35, 'reason': 'residual stage-1 thrust led to collision between stage 1 and stage 2'}]",Residual stage 1 thrust led to collision between stage 1 and stage 2,[],[],[],"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006eeb1e4]",5e9e4502f5090995de566f86,3,Trailblazer,2008-08-03T03:34:00.000Z,1217734440,2008-08-03T15:34:00+12:00,hour,False,"[{'core': '5e9e289ef3591814873b2625', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdbffd86e000604b32c,False,False,False,[],https://images2.imgbox.com/6c/cb/na1tzhHs_o.png,https://images2.imgbox.com/4a/80/k1oAkY0k_o.png,,,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1-flight-3-mission-summary,https://en.wikipedia.org/wiki/Trailblazer_(satellite),
3,2008-09-20T00:00:00.000Z,1221869000.0,False,0.0,5e9d0d95eda69955f709d1eb,True,[],"Ratsat was carried to orbit on the first successful orbital launch of any privately funded and developed, liquid-propelled carrier rocket, the SpaceX Falcon 1",[],[],[],[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,4,RatSat,2008-09-28T23:15:00.000Z,1222643700,2008-09-28T11:15:00+12:00,hour,False,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdbffd86e000604b32d,False,False,False,[],https://images2.imgbox.com/95/39/sRqN7rsv_o.png,https://images2.imgbox.com/a3/99/qswRYzE8_o.png,,,,,[],[],,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,
4,,,False,0.0,5e9d0d95eda69955f709d1eb,True,[],,[],[],[],[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,5,RazakSat,2009-07-13T03:35:00.000Z,1247456100,2009-07-13T15:35:00+12:00,hour,False,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdcffd86e000604b32e,False,False,False,[],https://images2.imgbox.com/ab/5a/Pequxd5d_o.png,https://images2.imgbox.com/92/e4/7Cf6MLY0_o.png,,,,,[],[],http://www.spacex.com/press/2012/12/19/spacexs-falcon-1-successfully-delivers-razaksat-satellite-orbit,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,http://www.spacex.com/news/2013/02/12/falcon-1-flight-5,https://en.wikipedia.org/wiki/RazakSAT,
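
The column names in the output above (for example `links.patch.small`) come from how `pd.json_normalize` flattens nested JSON. The following is a minimal sketch on a hypothetical, heavily trimmed record shaped like the API response; the variable names `sample_launches` and `flat` are illustrative only.

```python
import pandas as pd

# Hypothetical, trimmed launch record with the same shape as the API response
sample_launches = [
    {
        "flight_number": 1,
        "name": "FalconSat",
        "rocket": "5e9d0d95eda69955f709d1eb",
        "links": {"patch": {"small": "https://images2.imgbox.com/94/f2/NN6Ph45r_o.png"}},
        "cores": [{"core": "5e9e289df35918033d3b2623", "flight": 1}],
    }
]

flat = pd.json_normalize(sample_launches)

# Nested dictionaries become dotted column names; lists stay as Python lists
print(sorted(flat.columns))
print(type(flat.loc[0, "cores"]))
```

This is why `cores` and `payloads` still hold lists after normalization and need the extra unpacking steps later on.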


## <u>Data Enrichment with API Calls</u>

Upon reviewing the dataset, we observe that many of the columns contain only IDs. For instance, the `rocket` column provides an identification number without additional information about the rocket itself.

To enrich the dataset, we query the API again, this time looking up each ID to retrieve the details behind it. Specifically, we retrieve data for the following columns:
- `rocket`
- `payloads`
- `launchpad`
- `cores`



In [5]:
# Takes the dataset and uses the rocket column to call the API and append the rocket name to the list
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/"+str(x)).json()
            BoosterVersion.append(response['name'])
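
Because many launches share the same rocket ID, a lookup like the one above repeats identical requests. One possible refinement, sketched here under assumptions, is to cache responses per ID. The helper `lookup_with_cache` and the stand-in `fake_fetch` are hypothetical names, and the stand-in replaces the real API so the sketch runs offline.

```python
from typing import Callable, Dict, Iterable, List


def lookup_with_cache(ids: Iterable[str], fetch: Callable[[str], dict], field: str) -> List:
    """Return `field` for every ID, calling `fetch` only once per distinct ID."""
    cache: Dict[str, dict] = {}
    values = []
    for item_id in ids:
        if not item_id:
            continue  # mirrors the `if x:` guard in the original helper
        if item_id not in cache:
            cache[item_id] = fetch(item_id)
        values.append(cache[item_id][field])
    return values


# Stand-in for the API (hypothetical IDs and responses)
calls = []

def fake_fetch(rocket_id: str) -> dict:
    calls.append(rocket_id)
    return {"name": {"r1": "Falcon 1", "r9": "Falcon 9"}[rocket_id]}


names = lookup_with_cache(["r1", "r1", "r9", "r9", "r9"], fake_fetch, "name")
print(names)       # ['Falcon 1', 'Falcon 1', 'Falcon 9', 'Falcon 9', 'Falcon 9']
print(len(calls))  # 2
```

With the real API, `fetch` could be a small function that wraps `requests.get` on the rockets endpoint and returns `.json()`.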

In [6]:
# Takes the dataset and uses the payloads column to call the API and append the data to the lists
def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get("https://api.spacexdata.com/v4/payloads/"+load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

In [7]:
# Takes the dataset and uses the launchpad column to call the API and append the data to the lists
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/"+str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

#### Note:
From `cores` we want to learn the outcome of the landing, the type of landing, the number of flights with that core, whether grid fins were used, whether the core was reused, whether legs were used, the landing pad used, the block of the core (a number used to distinguish core versions), the number of times this specific core has been reused, and the serial of the core.

In [8]:
# Takes the dataset and uses the cores column to call the API and append the data to the lists
def getCoreData(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/"+core['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            # No core ID: append None to each list to mark the data as unavailable
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
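
The `Outcome` value is built by joining two fields as strings, which is where labels such as `None None` come from. A small sketch with hypothetical core records (only the two relevant keys are included):

```python
# Hypothetical core records using the same keys as the `cores` entries
sample_cores = [
    {"landing_success": True, "landing_type": "ASDS"},
    {"landing_success": False, "landing_type": "Ocean"},
    {"landing_success": None, "landing_type": None},  # nothing recorded for the landing
]

# Same string concatenation as in getCoreData
outcomes = [str(c["landing_success"]) + " " + str(c["landing_type"]) for c in sample_cores]
print(outcomes)  # ['True ASDS', 'False Ocean', 'None None']
```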

## <u>Data Filtering and Subsetting</u>

Here, we focus on creating a refined subset of our DataFrame by selecting only the relevant features, along with the flight number and UTC date.

**Selecting Relevant Columns**:  
   We extract a subset of the DataFrame that includes only the columns of interest:
   - `rocket`: The rocket utilized for the launch
   - `payloads`: The payload(s) carried by the rocket
   - `launchpad`: The location of the launch
   - `cores`: The core stage(s) of the rocket
   - `flight_number`: The unique identifier for the flight
   - `date_utc`: The date and time of the launch in UTC format

In [9]:
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

**Removing Rows with Multiple Cores and Payloads**:

We filter out rows containing multiple cores, as these correspond to Falcon rockets equipped with additional boosters. We also exclude rows with multiple payloads attached to a single rocket.

In [10]:
data = data[data['cores'].map(len) == 1]
data = data[data['payloads'].map(len) == 1]

**Extracting Single Values from Lists:**

Since both payloads and cores now contain lists of size 1, we extract the sole value from each list and replace the original feature with this value.

In [11]:
data['cores'] = data['cores'].map(lambda x: x[0])
data['payloads'] = data['payloads'].map(lambda x: x[0])
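
The filtering and unpacking steps can be tried on a toy DataFrame. This is only a sketch; the frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Toy data: flight 2 has two cores, flight 3 has two payloads
toy = pd.DataFrame({
    "flight_number": [1, 2, 3],
    "cores": [[{"core": "a"}], [{"core": "b"}, {"core": "c"}], [{"core": "d"}]],
    "payloads": [["p1"], ["p2"], ["p3", "p4"]],
})

# Keep rows whose lists have exactly one element
single = toy[(toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)].copy()

# Replace each single-element list with the element itself
single["cores"] = single["cores"].map(lambda x: x[0])
single["payloads"] = single["payloads"].map(lambda x: x[0])
print(single)
```

Only flight 1 survives, and its `cores` cell is now a dictionary rather than a list.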

**Convert UTC Date and Restrict Launch Dates**

In this step, we parse the `date_utc` column as datetimes, keep only the calendar date, and then restrict the dataset to launches that occurred on or before November 13, 2020.

In [12]:
data['date'] = pd.to_datetime(data['date_utc']).dt.date
data = data[data['date'] <= datetime.date(2020, 11, 13)]
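
A minimal sketch of the same conversion and cutoff on two illustrative timestamps (the values in `toy_dates` are examples, not taken from the dataset above):

```python
import datetime
import pandas as pd

toy_dates = pd.DataFrame({
    "date_utc": ["2020-11-05T23:24:23.000Z", "2020-11-16T00:27:00.000Z"],
})

# Parse the ISO 8601 strings, then keep only the calendar date
toy_dates["date"] = pd.to_datetime(toy_dates["date_utc"]).dt.date

cutoff = datetime.date(2020, 11, 13)
kept = toy_dates[toy_dates["date"] <= cutoff]
print(kept)
```

The second row falls after the cutoff and is dropped.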


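The data returned by these requests is stored in the global lists below, which the helper functions append to. The lists must exist before the helpers are called, and they are later used to create a new DataFrame.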

In [13]:
# Global lists filled by the helper functions
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

In [14]:
# Call getBoosterVersion
getBoosterVersion(data)

In [15]:
BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

In [16]:
# Call getLaunchSite
getLaunchSite(data)

# Call getPayloadData
getPayloadData(data)

# Call getCoreData
getCoreData(data)

### Constructing the Launch Dictionary

In this step, we construct a dictionary called `launch_dict` to organize the relevant information about each SpaceX launch. Each key in the dictionary represents a specific feature of the launch, while the associated values hold the corresponding data.

In [17]:
launch_dict = {
    'FlightNumber': list(data['flight_number']),       # Flight numbers from the DataFrame
    'Date': list(data['date']),                        # Launch dates extracted from UTC
    'BoosterVersion': BoosterVersion,                  # Rocket names from the API responses
    'PayloadMass': PayloadMass,                        # Payload mass in kg (from mass_kg)
    'Orbit': Orbit,                                    # Orbit of each payload
    'LaunchSite': LaunchSite,                          # Launch site name for each launch
    'Outcome': Outcome,                                # Landing outcome: "<landing_success> <landing_type>"
    'Flights': Flights,                                # Number of flights for each core
    'GridFins': GridFins,                              # Whether grid fins were used
    'Reused': Reused,                                  # Whether the core was reused
    'Legs': Legs,                                      # Whether legs were used
    'LandingPad': LandingPad,                          # Landing pad used for the core
    'Block': Block,                                    # Block number for the core
    'ReusedCount': ReusedCount,                        # Number of times the core has been reused
    'Serial': Serial,                                  # Serial number of the core
    'Longitude': Longitude,                            # Longitude of the launch site
    'Latitude': Latitude                               # Latitude of the launch site
}
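
`pd.DataFrame` raises a `ValueError` when the lists in such a dictionary differ in length, which could happen here because some helpers skip a row when its ID is empty. A quick check before building the DataFrame might look like the sketch below; `find_length_mismatches` and `toy_dict` are hypothetical names used only for illustration.

```python
def find_length_mismatches(columns: dict, reference: str) -> dict:
    """Return {column: length} for columns whose length differs from `reference`."""
    expected = len(columns[reference])
    return {name: len(values) for name, values in columns.items() if len(values) != expected}


# Toy dictionary in which one list is too short
toy_dict = {
    "FlightNumber": [1, 2, 3],
    "Orbit": ["LEO", "GTO"],
    "LaunchSite": ["A", "B", "C"],
}

problems = find_length_mismatches(toy_dict, "FlightNumber")
print(problems)  # {'Orbit': 2}
```

An empty result means every list lines up with the reference column.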

### Creating a Pandas DataFrame

Next, we create a Pandas DataFrame from the `launch_dict` dictionary:

In [18]:
# Create a DataFrame from launch_dict
launch_df = pd.DataFrame(launch_dict)

# Show a summary of the DataFrame
summary = launch_df.describe(include='all')  
print(summary)

        FlightNumber        Date BoosterVersion   PayloadMass Orbit  \
count      94.000000          94             94     88.000000    94   
unique           NaN          94              2           NaN    11   
top              NaN  2006-03-24       Falcon 9           NaN   GTO   
freq             NaN           1             90           NaN    27   
mean       54.202128         NaN            NaN   5919.165341   NaN   
std        30.589048         NaN            NaN   4909.689575   NaN   
min         1.000000         NaN            NaN     20.000000   NaN   
25%        28.250000         NaN            NaN   2406.250000   NaN   
50%        52.500000         NaN            NaN   4414.000000   NaN   
75%        81.500000         NaN            NaN   9543.750000   NaN   
max       106.000000         NaN            NaN  15600.000000   NaN   

          LaunchSite    Outcome    Flights GridFins Reused  Legs  \
count             94         94  94.000000       94     94    94   
unique     

In [19]:
launch_df.head() 

| | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2006-03-24 | Falcon 1 | 20.0 | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin1A | 167.743129 | 9.047721 |
| 1 | 2 | 2007-03-21 | Falcon 1 | | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin2A | 167.743129 | 9.047721 |
| 2 | 4 | 2008-09-28 | Falcon 1 | 165.0 | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin2C | 167.743129 | 9.047721 |
| 3 | 5 | 2009-07-13 | Falcon 1 | 200.0 | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin3C | 167.743129 | 9.047721 |
| 4 | 6 | 2010-06-04 | Falcon 9 | | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0003 | -80.577366 | 28.561857 |


### Filter the DataFrame to Only Include Falcon 9 Launches

Here we keep only the Falcon 9 launches and drop the Falcon 1 launches. We filter on the `BoosterVersion` column and save the result in a new DataFrame called `data_falcon9`.


In [20]:
# .copy() makes data_falcon9 independent of launch_df, so later column assignments are safe
data_falcon9 = launch_df[launch_df['BoosterVersion'] != 'Falcon 1'].copy()

In [21]:
# Renumber FlightNumber from 1 now that the Falcon 1 launches are removed
data_falcon9 = data_falcon9.reset_index(drop=True)
data_falcon9['FlightNumber'] = data_falcon9.index + 1
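
The effect of filtering and renumbering can be seen on a toy frame (the values in `toy_launches` are made up for illustration):

```python
import pandas as pd

toy_launches = pd.DataFrame({
    "FlightNumber": [1, 2, 6, 8],
    "BoosterVersion": ["Falcon 1", "Falcon 1", "Falcon 9", "Falcon 9"],
})

# Keep Falcon 9 rows, then rebuild the index and number the flights from 1
f9 = toy_launches[toy_launches["BoosterVersion"] != "Falcon 1"].copy()
f9 = f9.reset_index(drop=True)
f9["FlightNumber"] = f9.index + 1
print(f9)
```

The original flight numbers 6 and 8 become 1 and 2, so `FlightNumber` now counts Falcon 9 launches only.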

## <u>Data Wrangling</u>

As we analyze the dataset, we notice that some rows contain missing values. We can identify these missing values with the following code:


In [22]:
data_falcon9.isnull().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        5
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

### Dealing with Missing Values


In [23]:
# Calculate the mean value of the PayloadMass column
mean_payload_mass = data_falcon9['PayloadMass'].mean()

# Replace the missing values with the mean, assigning the result back to the column
data_falcon9['PayloadMass'] = data_falcon9['PayloadMass'].fillna(mean_payload_mass)
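
Mean imputation can be illustrated on a small series. This is a sketch with made-up masses; `masses` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

masses = pd.Series([500.0, np.nan, 1500.0, np.nan], name="PayloadMass")

# Series.mean() skips NaN values, so the mean is taken over 500.0 and 1500.0
mean_mass = masses.mean()
filled = masses.fillna(mean_mass)

print(mean_mass)        # 1000.0
print(filled.tolist())  # [500.0, 1000.0, 1500.0, 1000.0]
```

Note that this keeps the column mean unchanged but reduces its variance, which is worth keeping in mind when the column is later used as a model feature.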

### Calculate the number of launches on each site


In [24]:
launch_counts = data_falcon9['LaunchSite'].value_counts()

print(launch_counts)

LaunchSite
CCSFS SLC 40    55
KSC LC 39A      22
VAFB SLC 4E     13
Name: count, dtype: int64
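
To read the counts above as shares, `value_counts` also accepts `normalize=True`. The sketch below rebuilds a series with the same site counts (55, 22, 13) rather than using the real DataFrame; `sites` and `shares` are illustrative names.

```python
import pandas as pd

# Rebuild a LaunchSite column with the counts reported above
sites = pd.Series(
    ["CCSFS SLC 40"] * 55 + ["KSC LC 39A"] * 22 + ["VAFB SLC 4E"] * 13,
    name="LaunchSite",
)

# normalize=True returns fractions of the total instead of raw counts
shares = sites.value_counts(normalize=True).round(3)
print(shares)
```

Roughly 61% of the launches in this subset are from CCSFS SLC 40.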


### Calculate the number and occurrence of each orbit

In [25]:
orbit_counts = data_falcon9['Orbit'].value_counts()

print(orbit_counts)

Orbit
GTO      27
ISS      21
VLEO     14
PO        9
LEO       7
SSO       5
MEO       3
ES-L1     1
HEO       1
SO        1
GEO       1
Name: count, dtype: int64


#### Orbit Types

- **GTO**: Geostationary transfer orbit, an elliptical orbit for transferring satellites to geostationary orbit.
- **ISS**: Missions to the International Space Station, a modular space station in low Earth orbit for research and collaboration.
- **VLEO**: Very low Earth orbit, below 450 km, for close Earth observation.
- **PO**: Polar orbit; the satellite passes over both poles for global coverage.
- **LEO**: Low Earth orbit, up to 2,000 km, for communication and imaging satellites.
- **SSO**: Sun-synchronous orbit, a near-polar orbit with consistent solar exposure.
- **MEO**: Medium Earth orbit, geocentric orbits from 2,000 km to just below geostationary.
- **ES-L1**: Earth-Sun Lagrange point L1, a gravitational balance point between Earth and Sun where spacecraft can be stationed.
- **HEO**: Highly elliptical orbit, giving extended periods over certain regions.
- **SO**: Maintains consistent solar exposure over the Earth's surface.
- **GEO**: Geostationary orbit, a circular orbit 35,786 km above the equator in which a satellite appears stationary.


### Calculate the Number and Occurrence of Mission Outcomes

In [26]:
landing_outcomes = data_falcon9['Outcome'].value_counts()

print(landing_outcomes)

Outcome
True ASDS      41
None None      19
True RTLS      14
False ASDS      6
True Ocean      5
False Ocean     2
None ASDS       2
False RTLS      1
Name: count, dtype: int64


#### Outcome Descriptions

- **True Ocean**: The first stage successfully landed in a designated ocean area.
- **False Ocean**: The first stage failed to land in a designated ocean area.
- **True RTLS**: The first stage successfully landed on a ground pad (RTLS: return to launch site).
- **False RTLS**: The first stage failed to land on a ground pad.
- **True ASDS**: The first stage successfully landed on a drone ship (ASDS: autonomous spaceport drone ship).
- **False ASDS**: The first stage failed to land on a drone ship.
- **None ASDS**: No landing success was recorded for a drone ship landing; treated as a failure to land.
- **None None**: Neither a landing result nor a landing type was recorded (the raw records shown earlier have `landing_attempt: False` for such cores); treated as a failure to land.


The following code iterates over the keys of `landing_outcomes` and prints each outcome with its position. The positions of the unsuccessful outcomes are then used to build a set of "bad outcomes":

In [27]:
for i,outcome in enumerate(landing_outcomes.keys()):
    print(i,outcome)

0 True ASDS
1 None None
2 True RTLS
3 False ASDS
4 True Ocean
5 False Ocean
6 None ASDS
7 False RTLS


In [28]:
bad_outcomes=set(landing_outcomes.keys()[[1,3,5,6,7]])
bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}
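
Selecting by position depends on the order returned by `value_counts`, which changes if the counts change. An alternative, sketched here as an assumption rather than the approach used above, is to derive the set from the labels themselves: every outcome that does not start with `True` counts as bad. The series `outcome_counts` is rebuilt from the counts printed earlier.

```python
import pandas as pd

# Outcome counts as printed above
outcome_counts = pd.Series({
    "True ASDS": 41, "None None": 19, "True RTLS": 14, "False ASDS": 6,
    "True Ocean": 5, "False Ocean": 2, "None ASDS": 2, "False RTLS": 1,
})

# Any label that does not begin with "True" is an unsuccessful landing
bad_by_label = {label for label in outcome_counts.index if not label.startswith("True")}
print(sorted(bad_by_label))
```

This yields the same five labels as the index-based selection.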

### Creating a Landing Outcome Label

We will create a new list called `landing_class` based on the `Outcome` column. This list will classify each launch outcome as follows:

- **0**: If the corresponding row in `Outcome` is in the `bad_outcomes` set, indicating that the first stage did not land successfully.
- **1**: If the outcome is not in the `bad_outcomes` set, indicating that the first stage landed successfully.

This classification will help us analyse the success rate of the landings.


In [29]:
landing_class = [0 if outcome in bad_outcomes else 1 for outcome in data_falcon9['Outcome']]

data_falcon9['Class']=landing_class

df = data_falcon9  # Shorter alias for the same DataFrame (not a copy)

df.head()

| | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2010-06-04 | Falcon 9 | 6123.547647 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0003 | -80.577366 | 28.561857 | 0 |
| 1 | 2 | 2012-05-22 | Falcon 9 | 525.0 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0005 | -80.577366 | 28.561857 | 0 |
| 2 | 3 | 2013-03-01 | Falcon 9 | 677.0 | ISS | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0007 | -80.577366 | 28.561857 | 0 |
| 3 | 4 | 2013-09-29 | Falcon 9 | 500.0 | PO | VAFB SLC 4E | False Ocean | 1 | False | False | False | | 1.0 | 0 | B1003 | -120.610829 | 34.632093 | 0 |
| 4 | 5 | 2013-12-03 | Falcon 9 | 3170.0 | GTO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B1004 | -80.577366 | 28.561857 | 0 |
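
The same 0/1 label can also be computed without a Python loop by using `Series.isin`. A minimal sketch on a toy column (the frame `toy_outcomes` is illustrative; the set of bad outcomes is the one defined above):

```python
import pandas as pd

bad = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}

toy_outcomes = pd.DataFrame({
    "Outcome": ["True ASDS", "None None", "False Ocean", "True RTLS"],
})

# isin marks the bad outcomes; negating and casting gives 1 for a successful landing
toy_outcomes["Class"] = (~toy_outcomes["Outcome"].isin(bad)).astype(int)
print(toy_outcomes)
```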


### <u>Success Rate</u>:

In [30]:
df["Class"].mean()

0.6666666666666666
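
Because `Class` is 1 for a successful landing and 0 otherwise, its mean is the fraction of successful landings. The figure can be cross-checked from the outcome counts printed earlier (the dictionary below simply repeats those counts):

```python
# Outcome counts as printed in the mission outcomes section
outcome_counts = {
    "True ASDS": 41, "None None": 19, "True RTLS": 14, "False ASDS": 6,
    "True Ocean": 5, "False Ocean": 2, "None ASDS": 2, "False RTLS": 1,
}

successes = sum(n for label, n in outcome_counts.items() if label.startswith("True"))
total = sum(outcome_counts.values())
rate = successes / total

print(successes, total)  # 60 90
print(rate)
```

Sixty successful landings out of ninety Falcon 9 launches gives the two-thirds success rate reported above.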

***