<h1>SpaceX Falcon 9 First Stage Landing Prediction</h1>

---


# 1.1 Collecting the data


SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars per launch. Much of the savings comes from SpaceX's ability to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch. This information can be useful to a competing company that wants to bid against SpaceX for a rocket launch.


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/landing_1.gif)


### Objectives




- Request data from the SpaceX API
- Clean the requested data


----


### Auxiliary Functions


In [1]:
import requests
import pandas as pd
import numpy as np
import datetime

# Setting this option will print all columns of a dataframe
pd.set_option('display.max_columns', None)
# Setting this option will print all of the data in a feature
pd.set_option('display.max_colwidth', None)

The helper functions below use the API to look up information from the identification numbers in the launch data.


> From the <code>rocket</code> column we would like to learn the booster name.






In [2]:
# Takes the dataset and uses the rocket column to call the API and append the data to the list
def getBoosterVersion(data):
  for x in data['rocket']:
    if x:
      response = requests.get( f"https://api.spacexdata.com/v4/rockets/{str(x)}" ).json()
      BoosterVersion.append(response['name'])
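The helper above appends to a module-level list and assumes every lookup succeeds. As a hedged alternative sketch, the lookup can receive its fetch function as a parameter and return the names instead. `collect_booster_names`, `fetch_json`, and `fake_fetch` are hypothetical names that do not appear in the notebook, and the stub payload is made up so the example runs offline.

```python
from typing import Callable, Iterable, List, Optional

# Same endpoint pattern as getBoosterVersion.
ROCKET_URL = "https://api.spacexdata.com/v4/rockets/{}"


def collect_booster_names(rocket_ids: Iterable[Optional[str]],
                          fetch_json: Callable[[str], dict]) -> List[str]:
    """Return the booster name for every non-empty rocket ID (hypothetical helper)."""
    names = []
    for rocket_id in rocket_ids:
        if rocket_id:  # skip missing IDs, as the original helper does
            names.append(fetch_json(ROCKET_URL.format(rocket_id))["name"])
    return names


def fake_fetch(url: str) -> dict:
    """Offline stand-in for requests.get(url).json(); the response is invented."""
    return {"name": "Falcon 9" if url.endswith("abc") else "Falcon 1"}


names = collect_booster_names(["abc", None, "xyz"], fake_fetch)
print(names)
```

In the notebook itself, `fetch_json` could be `lambda url: requests.get(url).json()`, which keeps the network call in one place.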

> From the <code>launchpad</code> column we would like to know the name of the launch site being used, the longitude, and the latitude.


In [3]:
# Uses the launchpad column
def getLaunchSite(data):
  for x in data['launchpad']:
    if x :
      response = requests.get( f"https://api.spacexdata.com/v4/launchpads/{str(x)}").json()
      Longitude.append(response['longitude'])
      Latitude.append(response['latitude'])
      LaunchSite.append(response['name'])

> From the <code>payloads</code> column we would like to learn the mass of the payload and the orbit that it is going to.


In [4]:
# Uses the payloads column
def getPayloadData(data):
  for load in data['payloads']:
    if load :
      response = requests.get( f"https://api.spacexdata.com/v4/payloads/{load}" ).json()
      PayloadMass.append(response['mass_kg'])
      Orbit.append(response['orbit'])

> From the <code>cores</code> column we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.


In [5]:
# Uses the cores column to call the API
def getCoreData(data):
  for core in data['cores']:
    if core['core'] is not None:
      response = requests.get( f"https://api.spacexdata.com/v4/cores/{core['core']}" ).json()
      Block.append(response['block'])
      ReusedCount.append(response['reuse_count'])
      Serial.append(response['serial'])
    else:
      Block.append(None)
      ReusedCount.append(None)
      Serial.append(None)

    Outcome.append( str(core['landing_success'])+' '+str(core['landing_type']))
    Flights.append(core['flight'])
    GridFins.append(core['gridfins'])
    Reused.append(core['reused'])
    Legs.append(core['legs'])
    LandingPad.append(core['landpad'])
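The `Outcome` value is the string form of `landing_success` and `landing_type` joined by a space, so a launch with no landing attempt is recorded as `'None None'`. A minimal sketch of that step in isolation; `outcome_label` is a hypothetical name, and the second core dictionary is an illustrative example rather than a row from the data.

```python
def outcome_label(core: dict) -> str:
    """Join landing_success and landing_type the same way getCoreData does."""
    return str(core["landing_success"]) + " " + str(core["landing_type"])


# Fields taken from the first launch record: no landing was attempted.
no_attempt = {"landing_success": None, "landing_type": None}

# Illustrative core with a successful drone-ship landing (not from the data above).
drone_ship = {"landing_success": True, "landing_type": "ASDS"}

print(outcome_label(no_attempt))
print(outcome_label(drone_ship))
```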

### Load Data


In [6]:
spacex_url = "https://api.spacexdata.com/v4/launches/past"
response = requests.get(spacex_url)

#print(response.content)

#### 1. Request and parse the SpaceX launch data


In [7]:
# A static snapshot of the API response is available at the URL below for more consistent results; the cells that follow use the live response requested above.
static_json_url='https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'
response.status_code

200

In [8]:
# Use pd.json_normalize to convert the JSON result into a DataFrame
data = pd.json_normalize(response.json())

data.head(1)

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,net,window,rocket,success,failures,details,crew,ships,capsules,payloads,launchpad,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,auto_update,tbd,launch_library_id,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/94/f2/NN6Ph45r_o.png,https://images2.imgbox.com/5b/02/QcxHUb5V_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
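`pd.json_normalize` flattens nested dictionaries into dot-separated column names, which is where columns such as `links.patch.small` come from, while lists such as `cores` stay as list objects inside a single cell. A small sketch on a record trimmed down from the first row above; the variable names `record` and `flat` are illustrative.

```python
import pandas as pd

# A heavily trimmed version of the first launch record.
record = {
    "flight_number": 1,
    "cores": [{"core": "5e9e289df35918033d3b2623", "flight": 1}],
    "links": {"patch": {"small": "https://images2.imgbox.com/94/f2/NN6Ph45r_o.png"}},
}

flat = pd.json_normalize([record])

# Nested dict keys become dotted column names; the list under "cores" is left intact.
print(sorted(flat.columns))
print(type(flat.loc[0, "cores"]))
```

This is why the later cells still need `map(len)` and `x[0]` to unpack `cores` and `payloads`.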


Most of the relevant information is encoded as an ID. We will now use the API again to retrieve details about each launch from the IDs in the <code>rocket</code>, <code>payloads</code>, <code>launchpad</code>, and <code>cores</code> columns.
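Many launches share the same rocket and launchpad IDs, so looking up each row separately repeats identical requests. One possible way to avoid that, sketched here under the assumption that responses do not change between calls, is to memoize lookups by URL. `make_cached_lookup` and `fake_fetch` are hypothetical names, and the stub replaces the real network call.

```python
from typing import Callable, Dict, List


def make_cached_lookup(fetch_json: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap a fetch function so each URL is requested at most once."""
    cache: Dict[str, dict] = {}

    def lookup(url: str) -> dict:
        if url not in cache:
            cache[url] = fetch_json(url)
        return cache[url]

    return lookup


calls: List[str] = []


def fake_fetch(url: str) -> dict:
    """Offline stand-in for requests.get(url).json(); records every call."""
    calls.append(url)
    return {"name": "Kwajalein Atoll"}


lookup = make_cached_lookup(fake_fetch)
url = "https://api.spacexdata.com/v4/launchpads/5e9e4502f5090995de566f86"
first = lookup(url)
second = lookup(url)  # served from the cache, no second fetch
print(len(calls))
```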


In [9]:
# Only relevant features
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]


# Remove rows with multiple cores (Falcon rockets with 2 extra rocket boosters) and rows with multiple payloads in a single rocket.
data = data[data['cores'].map(len) == 1]
data = data[data['payloads'].map(len) == 1]
# Both columns now hold lists of size 1; extract the single value in each list and replace the feature.
data['cores'] = data['cores'].map(lambda x : x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])


# Convert date_utc to a datetime datatype, then keep only the date and drop the time
data['date'] = pd.to_datetime(data['date_utc']).dt.date
# Restrict launches to consider by date
data = data[data['date'] <= datetime.date(2020, 11, 13)]
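The filtering steps in this cell can be checked on a toy frame. The sketch below uses three invented rows (the IDs and dates are made up for illustration): one is kept, one is dropped for having two cores, and one is dropped for falling after the cutoff date. The explicit `.copy()` is an addition of this sketch, so that the later column assignments act on an independent frame.

```python
import datetime

import pandas as pd

# Invented rows: single core, two cores, and a launch after the cutoff.
toy = pd.DataFrame({
    "cores": [[{"core": "a"}], [{"core": "b"}, {"core": "c"}], [{"core": "d"}]],
    "payloads": [["p1"], ["p2"], ["p3"]],
    "date_utc": [
        "2006-03-24T22:30:00.000Z",
        "2018-02-06T20:45:00.000Z",
        "2021-01-08T02:15:00.000Z",
    ],
})

# Keep rows whose list columns hold exactly one element, then unwrap the lists.
mask = (toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)
single = toy[mask].copy()
single["cores"] = single["cores"].map(lambda x: x[0])
single["payloads"] = single["payloads"].map(lambda x: x[0])

# Parse the timestamp, keep the calendar date, and apply the same cutoff.
single["date"] = pd.to_datetime(single["date_utc"]).dt.date
kept = single[single["date"] <= datetime.date(2020, 11, 13)]
print(kept[["payloads", "date"]])
```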

Now, let's apply the auxiliary functions.

In [10]:
#Global variables
BoosterVersion = []

PayloadMass = []; Orbit = []

LaunchSite = []; Longitude = []; Latitude = []

Outcome = []; Flights = []; GridFins = []; Reused = []
Legs = []; LandingPad = []; Block = []; ReusedCount = []; Serial = []

In [11]:
getBoosterVersion(data)
getLaunchSite(data)
getPayloadData(data)
getCoreData(data)

In [12]:
BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

Finally, let's construct our dataset using the data obtained.

In [13]:
launch_dict = { 'FlightNumber': list(data['flight_number']),
                'Date': list(data['date']),
                'BoosterVersion':BoosterVersion,
                'PayloadMass':PayloadMass,
                'Orbit':Orbit,
                'LaunchSite':LaunchSite,
                'Outcome':Outcome,
                'Flights':Flights,
                'GridFins':GridFins,
                'Reused':Reused,
                'Legs':Legs,
                'LandingPad':LandingPad,
                'Block':Block,
                'ReusedCount':ReusedCount,
                'Serial':Serial,
                'Longitude': Longitude,
                'Latitude': Latitude
               }

df = pd.DataFrame(launch_dict)

print(df.shape)
df.head()

(94, 17)


Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1A,167.743129,9.047721
1,2,2007-03-21,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2A,167.743129,9.047721
2,4,2008-09-28,Falcon 1,165.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2C,167.743129,9.047721
3,5,2009-07-13,Falcon 1,200.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin3C,167.743129,9.047721
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
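`pd.DataFrame` raises a `ValueError` when the lists in the dictionary have different lengths. That could happen here because some helpers skip rows with a missing ID (`if x:`) while `getCoreData` always appends. A hedged sketch of a pre-check that reports which column is short; `check_equal_lengths` is a hypothetical helper and the sample dictionaries are invented.

```python
from typing import Dict, Sequence


def check_equal_lengths(columns: Dict[str, Sequence]) -> int:
    """Return the shared length of all columns, or raise naming the lengths found."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    # An empty dictionary has no columns, so report zero rows.
    return next(iter(lengths.values()), 0)


# Invented example: consistent columns pass the check.
good = {"FlightNumber": [1, 2], "BoosterVersion": ["Falcon 1", "Falcon 9"]}
print(check_equal_lengths(good))

# Invented example: one list is short, as if a row had been skipped.
bad = {"FlightNumber": [1, 2], "BoosterVersion": ["Falcon 9"]}
try:
    check_equal_lengths(bad)
except ValueError as err:
    print(err)
```

Calling it as `check_equal_lengths(launch_dict)` before `pd.DataFrame(launch_dict)` would surface the mismatch with column names attached.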


#### 2. Filter by `Falcon 9` launches


In [14]:
df = df[ df['BoosterVersion']!='Falcon 1' ]
df.reset_index(drop = True, inplace = True)

rows = df.shape[0]
df['FlightNumber'] = list( range(1, rows + 1) )

print(df.shape)

(90, 17)


### Data Wrangling


The <code>LandingPad</code> column will retain None values to represent when landing pads were not used.


#### 3. Cleaning Missing Values


Calculate the mean of <code>PayloadMass</code> using <code>.mean()</code>, then use that mean with <code>.fillna()</code> to replace the `np.nan` values in the column.


In [15]:
avg_load = df['PayloadMass'].mean()
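# Assign the result back; calling fillna with inplace=True on a selected column is deprecated in recent pandas
df['PayloadMass'] = df['PayloadMass'].fillna(avg_load)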
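As a worked example of the imputation, `.mean()` skips missing values by default, so the mean is computed from the known masses only and then written into the gaps. The numbers below are invented for illustration and are not taken from the launch data.

```python
import numpy as np
import pandas as pd

# Invented payload masses with one missing value.
toy = pd.DataFrame({"PayloadMass": [1000.0, np.nan, 3000.0]})

# NaN is ignored: (1000 + 3000) / 2
avg = toy["PayloadMass"].mean()

# Fill the gap and assign the result back to the column.
toy["PayloadMass"] = toy["PayloadMass"].fillna(avg)
print(avg)
print(toy["PayloadMass"].tolist())
```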

In [16]:
df.isnull().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        0
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

Export the DataFrame to a <b>CSV</b> file.

In [17]:
df.to_csv('df_1.csv', index = False)