# **SpaceX Falcon 9 First Stage Landing Prediction**

# Data Collection

In this project, we will predict whether the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars each. Much of the savings comes from SpaceX being able to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch. This information can be used if an alternate company wants to bid against SpaceX for a rocket launch. In this notebook, we will collect data from an API and make sure it is in the correct format. The following is an example of a successful landing:

![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/landing_1.gif)


Several examples of an unsuccessful landing are shown here:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/crash.gif)


Most unsuccessful landings are planned. SpaceX performs a controlled landing in the ocean.


## Objectives


In this lab, you will make a GET request to the SpaceX API. You will also do some basic data wrangling and formatting.

- Request to the SpaceX API.
- Clean the requested data.
- Perform Data Wrangling.

We will import the following libraries into the lab:

In [None]:
# Importing required libraries
import requests # Requests allows us to make HTTP requests which we will use to get data from an API
import pandas as pd # software library written for the Python programming language for data manipulation and analysis
import numpy as np # library adding support to multi-dimensional arrays, matrices and functions to operate on these arrays
import datetime # Datetime is a library that allows us to represent dates

# Setting this option will print all columns of a dataframe
pd.set_option('display.max_columns', None)
# Setting this option will print all of the data in a feature
pd.set_option('display.max_colwidth', None)

Below we will define a series of helper functions that will help us use the API to extract information using identification numbers in the launch data.

From the <code>rocket</code> column we would like to learn the booster name.


In [None]:
# Taking the dataset and using the rocket column to call the API and append the data to the list
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/"+str(x)).json()
            BoosterVersion.append(response['name'])

From the <code>launchpad</code> column we would like to know the name of the launch site being used, the longitude, and the latitude.


In [None]:
# Taking the dataset and using the launchpad column to call the API and append the data to the list
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/"+str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

From the <code>payload</code> we would like to learn the mass of the payload and the orbit that it is going to.


In [None]:
# Taking the dataset and using the payloads column to call the API and append the data to the lists
def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get("https://api.spacexdata.com/v4/payloads/"+load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

From <code>cores</code> we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.


In [None]:
# Taking the dataset and using the cores column to call the API and append the data to the lists
def getCoreData(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/"+core['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
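
Each helper above issues one HTTP request per row, even when many launches share the same rocket or launchpad ID. One possible way to avoid repeated calls is to cache results by ID. The following is a minimal sketch, not part of the lab: `make_cached_lookup` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real `requests.get(...).json()` call so that the sketch runs offline.

```python
from functools import lru_cache


def make_cached_lookup(fetch):
    """Wrap a fetch(item_id) function so each distinct ID is fetched only once."""
    @lru_cache(maxsize=None)
    def lookup(item_id):
        return fetch(item_id)
    return lookup


# Hypothetical stand-in for an API call; it records every invocation.
calls = []


def fake_fetch(rocket_id):
    calls.append(rocket_id)
    return {"id": rocket_id, "name": "rocket-" + rocket_id}


get_rocket = make_cached_lookup(fake_fetch)

# Three rows, but only two distinct IDs, so only two fetches happen.
names = [get_rocket(rid)["name"] for rid in ["a1", "b2", "a1"]]
print(names)
print(len(calls))
```

In the notebook, the wrapped function could replace the direct `requests.get` call inside a helper, assuming the IDs are plain strings (which are hashable, as `lru_cache` requires).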

Now let's start requesting rocket launch data from SpaceX API with the following URL:


In [None]:
spacex_url = "https://api.spacexdata.com/v4/launches/past"

In [None]:
response = requests.get(spacex_url)

Check the content of the response


In [None]:
print(response.content)

### Request and parse the SpaceX launch data using the GET request


To make the requested JSON results more consistent, we will use the following static response object for this project:


In [None]:
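static_json_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'
# Request the static copy so that the following cells work with it
response = requests.get(static_json_url)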

We should see that the request was successful, with a 200 status response code:


In [None]:
response.status_code

Now we decode the response content as JSON using <code>.json()</code> and turn it into a Pandas dataframe using <code>.json_normalize()</code>:


In [None]:
# Utilization of json_normalize() method to convert the json result into a dataframe
data = pd.json_normalize(response.json())
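
To illustrate what the normalization step does, here is a small sketch on made-up records. The field names below are hypothetical and only loosely shaped like nested API output; nested dictionaries become columns whose names are joined with a dot.

```python
import pandas as pd

# Hypothetical records with one nested field.
records = [
    {"flight_number": 1, "rocket": "r1", "links": {"patch": {"small": "a.png"}}},
    {"flight_number": 2, "rocket": "r2", "links": {"patch": {"small": "b.png"}}},
]

# Nested keys are flattened into dotted column names such as "links.patch.small".
flat = pd.json_normalize(records)
print(flat.columns.tolist())
print(flat)
```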

Using the dataframe <code>data</code>, print the first 5 rows:


In [None]:
# Getting the head of the dataframe
data.head()

We notice that a lot of the data are IDs. For example, the rocket column has no information about the rocket, just an identification number.

We will now use the API again to get information about the launches using the IDs given for each launch. Specifically we will be using columns <code>rocket</code>, <code>payloads</code>, <code>launchpad</code>, and <code>cores</code>.


In [None]:
# Taking a subset of our dataframe, keeping only the features we want and the flight number, and date_utc.
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# Removing rows with multiple cores (Falcon rockets with 2 extra boosters) and rows with multiple payloads in a single rocket
data = data[data['cores'].map(len)==1]
data = data[data['payloads'].map(len)==1]

# Since payloads and cores are lists of size 1 we will also extract the single value in the list and replace the feature
data['cores'] = data['cores'].map(lambda x : x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])

# Converting the date_utc to a datetime datatype and then, extracting the date leaving the time
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# Utilizing the date to restrict the dates of the launches
data = data[data['date'] <= datetime.date(2020, 11, 13)]
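
The list-handling steps in the cell above can be tried on a toy dataframe. This is a sketch with made-up values; it takes a `.copy()` of the filtered rows before assigning to them, which is one common way to avoid pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# Toy data: each cell in "cores" and "payloads" holds a list.
toy = pd.DataFrame({
    "flight_number": [1, 2, 3],
    "cores": [[{"core": "c1"}], [{"core": "c2"}, {"core": "c3"}], [{"core": "c4"}]],
    "payloads": [["p1"], ["p2"], ["p3", "p4"]],
})

# Keep only rows where both lists have exactly one element.
mask = (toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)
single = toy[mask].copy()

# Replace each one-element list with the element itself.
single["cores"] = single["cores"].map(lambda x: x[0])
single["payloads"] = single["payloads"].map(lambda x: x[0])
print(single)
```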

* From the <code>rocket</code> we would like to learn the booster name

* From the <code>payload</code> we would like to learn the mass of the payload and the orbit that it is going to

* From the <code>launchpad</code> we would like to know the name of the launch site being used, the longitude, and the latitude.

* **From <code>cores</code> we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.**

The data from these requests will be stored in lists and will be used to create a new dataframe.


In [None]:
# Global variables 
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

These functions will append their outputs to the global variables above. Let's take a look at the <code>BoosterVersion</code> variable. Before we apply <code>getBoosterVersion</code>, the list is empty:


In [None]:
BoosterVersion

Now, let's apply the <code>getBoosterVersion</code> function to get the booster version:


In [None]:
# Calling getBoosterVersion
getBoosterVersion(data)

The list has now been updated:


In [None]:
BoosterVersion[0:5]

We can apply the rest of the functions here:


In [None]:
# Calling getLaunchSite
getLaunchSite(data)

In [None]:
# Calling getPayloadData
getPayloadData(data)

In [None]:
# Calling getCoreData
getCoreData(data)

Finally, let's construct our dataset using the data we have obtained. We will combine the columns into a dictionary.


In [None]:
launch_dict = {'FlightNumber': list(data['flight_number']),
'Date': list(data['date']),
'BoosterVersion': BoosterVersion,
'PayloadMass': PayloadMass,
'Orbit': Orbit,
'LaunchSite': LaunchSite,
'Outcome': Outcome,
'Flights': Flights,
'GridFins': GridFins,
'Reused': Reused,
'Legs': Legs,
'LandingPad': LandingPad,
'Block': Block,
'ReusedCount': ReusedCount,
'Serial': Serial,
'Longitude': Longitude,
'Latitude': Latitude}
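
Building a dataframe from a dictionary of lists requires every list to have the same length; otherwise pandas raises a `ValueError`. Because some helpers above skip empty IDs (`if x:`), a list could in principle end up shorter than the others. The following sketch shows one way to check before constructing the dataframe. The helper name `mismatched_columns` and the toy dictionary are hypothetical.

```python
def mismatched_columns(columns, expected):
    """Return the names of columns whose length differs from `expected`."""
    return [name for name, values in columns.items() if len(values) != expected]


# Toy dictionary in which one list is one element short.
toy_dict = {
    "FlightNumber": [1, 2, 3],
    "BoosterVersion": ["Falcon 1", "Falcon 9", "Falcon 9"],
    "PayloadMass": [20.0, None],
}

bad_columns = mismatched_columns(toy_dict, expected=3)
print(bad_columns)
```

With the notebook's names, the same check would be `mismatched_columns(launch_dict, len(data))`; an empty result means the dictionary is safe to convert.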

Then, we need to create a Pandas data frame from the dictionary launch_dict.


In [None]:
# Creating a dataframe from launch_dict
launch_data = pd.DataFrame.from_dict(launch_dict)

We show the first rows of the dataframe.

In [None]:
# Showing the head of the dataframe
launch_data.head()

### Filter the dataframe to only include `Falcon 9` launches


Finally, we will remove the Falcon 1 launches keeping only the Falcon 9 launches. We filter the dataframe using the <code>BoosterVersion</code> column to only keep the Falcon 9 launches and then, save the filtered data to a new dataframe called <code>data_falcon9</code>.

In [None]:
launch_data['BoosterVersion'].value_counts()

In [None]:
# Removing Falcon 1 launches and keeping only Falcon 9 launches
data_falcon9 = launch_data[launch_data['BoosterVersion'] != 'Falcon 1']

In [None]:
data_falcon9['BoosterVersion'].value_counts()

Now that we have removed some values, we should reset the FlightNumber column.

In [None]:
data_falcon9.loc[:, 'FlightNumber'] = list(range(1, data_falcon9.shape[0]+1))
data_falcon9
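
As a small illustration of the filter-and-renumber step, the sketch below uses a made-up dataframe. It copies the filtered rows before modifying them, which is an assumption about good practice rather than what the cells above do, and it also resets the row index so that it matches the new numbering.

```python
import pandas as pd

# Toy launch table with two Falcon 1 rows to drop.
toy_launches = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4],
    "BoosterVersion": ["Falcon 1", "Falcon 9", "Falcon 1", "Falcon 9"],
})

# Filter, then copy so later assignments do not touch a view of the original.
toy_f9 = toy_launches[toy_launches["BoosterVersion"] != "Falcon 1"].copy()

# Renumber flights from 1 and reset the row index.
toy_f9["FlightNumber"] = list(range(1, len(toy_f9) + 1))
toy_f9 = toy_f9.reset_index(drop=True)
print(toy_f9)
```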

## Data Wrangling


We can see below that some of the rows have missing values in the dataset.

In [None]:
data_falcon9.isnull().sum()

Before we continue we must deal with these missing values. The <code>LandingPad</code> column will retain None values to represent when landing pads were not used.

### Dealing with Missing Values


Below we calculate the mean of <code>PayloadMass</code> using <code>.mean()</code>. Then, we use the <code>.fillna()</code> method to replace the `np.nan` values in the column with the mean we calculated.

In [None]:
# Calculating the mean value of PayloadMass column
payloadmass_mean = data_falcon9['PayloadMass'].mean()

# Replacing the np.nan values with its mean value
data_falcon9['PayloadMass'] = data_falcon9['PayloadMass'].fillna(payloadmass_mean)

We should see the number of missing values in <code>PayloadMass</code> change to zero.

Now, we should have no missing values in our dataset except for in <code>LandingPad</code>.
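
The mean imputation described above can be checked on a toy column. This sketch uses made-up values; note that `.mean()` skips missing values by default, so the mean is computed from the known masses only.

```python
import numpy as np
import pandas as pd

# Toy data: one missing payload mass, two unused landing pads.
toy_missing = pd.DataFrame({
    "PayloadMass": [1000.0, np.nan, 3000.0],
    "LandingPad": [None, "pad-1", None],
})

# Mean of the non-missing values only.
mean_mass = toy_missing["PayloadMass"].mean()

# Fill the gap in PayloadMass; LandingPad is left untouched on purpose.
toy_missing["PayloadMass"] = toy_missing["PayloadMass"].fillna(mean_mass)
print(toy_missing.isnull().sum())
```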

We can now export it to a <b>CSV</b> for the next section, but to make the answers consistent, in the next lab we will provide data in a pre-selected date range.

data_falcon9.to_csv('dataset_part_1.csv', index=False)

# Data Wrangling

Now we will perform some Exploratory Data Analysis (EDA) to find some patterns in the data and determine what would be the label for training supervised models.

In the data set, there are several different cases where the booster did not land successfully. Sometimes a landing was attempted but failed due to an accident. For example, <code>True Ocean</code> means the booster successfully landed in a specific region of the ocean, while <code>False Ocean</code> means the booster unsuccessfully landed in a specific region of the ocean. <code>True RTLS</code> means the booster successfully landed on a ground pad, while <code>False RTLS</code> means the booster unsuccessfully landed on a ground pad. <code>True ASDS</code> means the booster successfully landed on a drone ship, while <code>False ASDS</code> means the booster unsuccessfully landed on a drone ship.

Here we will mainly convert those outcomes into training labels, where `1` means the booster landed successfully and `0` means it was unsuccessful.
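
The cells in this section operate on a dataframe named `df`, which has to be loaded first. A minimal sketch of one way to do that is shown here, assuming the data comes from a CSV file like the one exported in the previous section. The function name `load_launch_data` is hypothetical, and the demonstration writes a tiny stand-in file to a temporary directory so that it runs on its own.

```python
import os
import tempfile

import pandas as pd


def load_launch_data(path):
    """Read a launch CSV file into a dataframe."""
    return pd.read_csv(path)


# Self-contained demonstration: write a tiny stand-in file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "dataset_part_1.csv")
    pd.DataFrame({
        "FlightNumber": [1, 2],
        "Outcome": ["True ASDS", "False Ocean"],
    }).to_csv(demo_path, index=False)
    demo_df = load_launch_data(demo_path)

print(demo_df)
```

In the notebook this would read `df = load_launch_data('dataset_part_1.csv')`, assuming that file is in the working directory. Since the lab says data in a pre-selected date range will be provided, the actual source file may differ.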

Below we identify and calculate the percentage of the missing values in each attribute.

In [None]:
df.isnull().sum()/len(df)*100

Then, we identify which columns are numerical and categorical:

In [None]:
df.dtypes
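
Beyond reading `df.dtypes` by eye, the numerical and non-numerical columns can also be separated programmatically. The sketch below uses `select_dtypes` on a made-up dataframe; the column names mirror the ones in this notebook but the values are invented. Boolean columns are not counted as numeric by this selection.

```python
import pandas as pd

toy_types = pd.DataFrame({
    "PayloadMass": [1000.0, 2000.0],
    "Flights": [1, 2],
    "Orbit": ["LEO", "GTO"],
    "GridFins": [True, False],
})

# Integer and float columns.
numeric_cols = toy_types.select_dtypes(include="number").columns.tolist()
# Everything else (text, booleans, ...).
other_cols = toy_types.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)
print(other_cols)
```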

### Calculate the number of launches on each site

The data contains several SpaceX launch facilities: <a href='https://en.wikipedia.org/wiki/List_of_Cape_Canaveral_and_Merritt_Island_launch_sites'>Cape Canaveral Space</a> Launch Complex 40, Vandenberg Air Force Base Space Launch Complex 4E (<b>VAFB SLC 4E</b>), and Kennedy Space Center Launch Complex 39A (<b>KSC LC 39A</b>). The location of each launch is placed in the column <code>LaunchSite</code>.

Next, we will see the number of launches for each site.

We use the method <code>value_counts()</code> on the column <code>LaunchSite</code> to determine the number of launches on each site: 

In [None]:
df['LaunchSite'].value_counts()
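
As a quick illustration of `value_counts()`, the sketch below counts made-up site names (the labels "Site A" and "Site B" are placeholders, not real launch sites). Passing `normalize=True` returns shares instead of raw counts.

```python
import pandas as pd

toy_sites = pd.Series(["Site A", "Site B", "Site A", "Site A"], name="LaunchSite")

# Absolute number of launches per site, most frequent first.
site_counts = toy_sites.value_counts()
# Same breakdown as a fraction of all launches.
site_shares = toy_sites.value_counts(normalize=True)
print(site_counts)
print(site_shares)
```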

Each launch aims for a dedicated orbit. Here are some common orbit types:


* <b>LEO</b>: Low Earth orbit (LEO) is an Earth-centred orbit with an altitude of 2,000 km (1,200 mi) or less (approximately one-third of the radius of Earth), or with at least 11.25 periods per day (an orbital period of 128 minutes or less) and an eccentricity less than 0.25. Most of the manmade objects in outer space are in LEO <a href='https://en.wikipedia.org/wiki/Low_Earth_orbit'>[1]</a>.


* <b>VLEO</b>: Very Low Earth Orbits (VLEO) can be defined as the orbits with a mean altitude below 450 km. Operating in these orbits can provide a number of benefits to Earth observation spacecraft as the spacecraft operates closer to the observation<a href='https://www.researchgate.net/publication/271499606_Very_Low_Earth_Orbit_mission_concepts_for_Earth_Observation_Benefits_and_challenges'>[2]</a>.


* <b>GTO</b>: A geosynchronous transfer orbit is an intermediate orbit used to reach geosynchronous orbit. A geosynchronous orbit is a high Earth orbit that allows satellites to match Earth's rotation. Located at 22,236 miles (35,786 kilometers) above Earth's equator, this position is a valuable spot for monitoring weather, communications and surveillance. "Because the satellite orbits at the same speed that the Earth is turning, the satellite seems to stay in place over a single longitude, though it may drift north to south," NASA wrote on its Earth Observatory website <a href="https://www.space.com/29222-geosynchronous-orbit.html">[3]</a>.


* <b>SSO (or SO)</b>: A Sun-synchronous orbit, also called a heliosynchronous orbit, is a nearly polar orbit around a planet in which the satellite passes over any given point of the planet's surface at the same local mean solar time <a href="https://en.wikipedia.org/wiki/Sun-synchronous_orbit">[4]</a>.


* <b>ES-L1</b>: At the Lagrange points the gravitational forces of the two large bodies cancel out in such a way that a small object placed in orbit there is in equilibrium relative to the center of mass of the large bodies. L1 is one such point between the sun and the earth <a href="https://en.wikipedia.org/wiki/Lagrange_point#L1_point">[5]</a>.


* <b>HEO</b>: A highly elliptical orbit is an elliptic orbit with high eccentricity, usually referring to one around Earth <a href="https://en.wikipedia.org/wiki/Highly_elliptical_orbit">[6]</a>.


* <b>ISS</b>: A modular space station (habitable artificial satellite) in low Earth orbit. It is a multinational collaborative project between five participating space agencies: NASA (United States), Roscosmos (Russia), JAXA (Japan), ESA (Europe), and CSA (Canada)<a href="https://en.wikipedia.org/wiki/International_Space_Station">[7]</a>.


* <b>MEO</b>: Geocentric orbits ranging in altitude from 2,000 km (1,200 mi) to just below geosynchronous orbit at 35,786 kilometers (22,236 mi). Also known as an intermediate circular orbit. These are most commonly at 20,200 kilometers (12,600 mi) or 20,650 kilometers (12,830 mi), with an orbital period of 12 hours <a href="https://en.wikipedia.org/wiki/List_of_orbits">[8]</a>.


* <b>HEO</b>: High Earth orbits are geocentric orbits above the altitude of geosynchronous orbit (35,786 km or 22,236 mi) <a href="https://en.wikipedia.org/wiki/List_of_orbits">[9]</a>.


* <b>GEO</b>: It is a circular geosynchronous orbit 35,786 kilometres (22,236 miles) above Earth's equator and following the direction of Earth's rotation <a href="https://en.wikipedia.org/wiki/Geostationary_orbit">[10]</a>.

* <b>PO</b>: A polar orbit is one in which a satellite passes above or nearly above both poles of the body being orbited (usually a planet such as the Earth) <a href="https://en.wikipedia.org/wiki/Polar_orbit">[11]</a>.

Some of these orbits are shown in the following plot:

![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/Images/Orbits.png)

### Calculation of the number and occurrence of each orbit

We use the method <code>.value_counts()</code> to determine the number and occurrence of each orbit in the column <code>Orbit</code>.

In [None]:
df['Orbit'].value_counts()

### Calculation of the number and occurrence of mission outcomes

We use the method <code>.value_counts()</code> on the column <code>Outcome</code> to determine the number of launches for each landing outcome. We then assign the result to the variable <code>landing_outcomes</code>.

In [None]:
df['Outcome'].value_counts()

In [None]:
# landing_outcomes = values on Outcome column
landing_outcomes = df['Outcome'].value_counts()
landing_outcomes

<code>True Ocean</code> means the booster successfully landed in a specific region of the ocean, while <code>False Ocean</code> means the booster unsuccessfully landed in a specific region of the ocean. <code>True RTLS</code> means the booster successfully landed on a ground pad, while <code>False RTLS</code> means the booster unsuccessfully landed on a ground pad. <code>True ASDS</code> means the booster successfully landed on a drone ship, while <code>False ASDS</code> means the booster unsuccessfully landed on a drone ship. <code>None ASDS</code> and <code>None None</code> represent a failure to land.

In [None]:
for i, outcome in enumerate(landing_outcomes.keys()):
    print(i, outcome)

We create a set of outcomes where the first stage did not land successfully:

In [None]:
bad_outcomes = set(landing_outcomes.keys()[[1, 3, 5, 6, 7]])
bad_outcomes

### Creation of a landing outcome label from Outcome column

Using the column <code>Outcome</code>, we create a label that is zero if the corresponding row in <code>Outcome</code> is in the set <code>bad_outcomes</code>. Otherwise, it is one. We then assign it to the column <code>Class</code>:

In [None]:
# Class = 0 if the outcome is in bad_outcomes
# Class = 1 otherwise
df['Class'] = df['Outcome'].apply(lambda x: 0 if x in bad_outcomes else 1)

This column is the classification variable that represents the outcome of each launch. If the value is zero, the first stage did not land successfully, whereas one means that the first stage landed successfully.
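
Selecting the bad outcomes by position depends on the order returned by `value_counts()`. As an alternative sketch, and not the lab's method, the set can be built by name: every outcome string that does not start with `True` is treated as unsuccessful. The toy outcomes below follow the format described above.

```python
import pandas as pd

toy_outcomes = pd.Series([
    "True ASDS", "None None", "False Ocean", "True RTLS",
    "None ASDS", "True Ocean", "False ASDS",
])

# Any outcome not beginning with "True" counts as an unsuccessful landing.
bad_by_name = {o for o in toy_outcomes.unique() if not o.startswith("True")}

# 0 for unsuccessful outcomes, 1 otherwise.
toy_labels = toy_outcomes.map(lambda o: 0 if o in bad_by_name else 1)
print(sorted(bad_by_name))
print(toy_labels.tolist())
```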

In [None]:
# Showing the first rows of the Class column
df[['Class']].head()

In [None]:
df.head()

We use the following line of code to determine the success rate:

In [None]:
df['Class'].mean()
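
Because the labels are 0 and 1, their mean equals the fraction of successful landings. The sketch below shows this on made-up data and, as an extra illustration not in the lab, breaks the rate down per site with `groupby`. The site names are placeholders.

```python
import pandas as pd

toy_classes = pd.DataFrame({
    "LaunchSite": ["Site A", "Site A", "Site B", "Site B"],
    "Class": [1, 0, 1, 1],
})

# Overall success rate: share of rows with Class == 1.
overall_rate = toy_classes["Class"].mean()

# Success rate for each launch site.
rate_by_site = toy_classes.groupby("LaunchSite")["Class"].mean()
print(overall_rate)
print(rate_by_site)
```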

We can now export it to a CSV for the next section, but to make the answers consistent, in the next lab we will provide data in a pre-selected date range.

df.to_csv("dataset_part_2.csv", index=False)