# SpaceX Falcon 9 First Stage Landing Prediction
## Part 1: Data Collection

In this lab, we will make GET requests to the SpaceX API and clean the returned data.

----

#### Importing our libraries

In [1]:
import requests # allows us to make HTTP requests
import datetime # allows us to represent dates
import pandas as pd
import numpy as np
pd.options.mode.chained_assignment = None

#### Defining our helper functions
These will aid in the use of the API to extract information using identification numbers.

From the <code>rocket</code> column we would like to learn the booster name.

In [2]:
# looks up each rocket ID in the dataframe and appends the booster name to the global BoosterVersion list
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/"+str(x)).json()
            BoosterVersion.append(response['name'])
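
Many launches share the same rocket ID, so a loop like the one above repeats identical requests. A minimal sketch of one way to avoid that is to resolve each distinct ID once. The helper `lookup_names` and the injected `fetch` callable are hypothetical names for illustration, not part of this notebook, and the API is replaced by a stand-in so the example runs offline.

```python
def lookup_names(ids, fetch):
    """Return the name for each ID, calling `fetch` once per distinct ID.

    `fetch` is a hypothetical callable that maps an ID to a decoded JSON dict;
    in the notebook it would wrap requests.get(...).json().
    """
    cache = {}
    names = []
    for item_id in ids:
        if not item_id:
            continue  # same guard as the notebook's `if x:`
        if item_id not in cache:
            cache[item_id] = fetch(item_id)['name']
        names.append(cache[item_id])
    return names


# Offline stand-in for the API; the IDs and names are made up for the demo.
calls = []

def fake_fetch(rocket_id):
    calls.append(rocket_id)
    return {'name': 'Falcon 9' if rocket_id == 'f9' else 'Falcon 1'}

names = lookup_names(['f1', 'f1', 'f9', None, 'f9'], fake_fetch)
print(names)       # ['Falcon 1', 'Falcon 1', 'Falcon 9', 'Falcon 9']
print(len(calls))  # 2
```

As in the notebook's helper, empty IDs are skipped rather than recorded, so the result can be shorter than the input.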

From the <code>launchpad</code> we would like to know the name, longitude, and latitude of the launch site being used.

In [3]:
# looks up each launchpad ID in the dataframe and appends the site's longitude, latitude, and name to the global lists
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/"+str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

From the <code>payload</code> we would like to learn the mass of the payload and the orbit that it is going to.

In [4]:
# looks up each payload ID in the dataframe and appends the payload mass and target orbit to the global lists
def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get("https://api.spacexdata.com/v4/payloads/"+load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

From <code>cores</code> we would like to learn many things including:
1. Outcome of the landing
2. Type of landing
3. Number of flights with that core
4. Whether grid fins were used
5. Whether the core is reused
6. Whether legs were used
7. Landing pad used
8. Block of the core (a number used to distinguish core versions)
9. Number of times the core has been reused
10. Serial number of core

In [5]:
# takes dataframe and appends core details and landing data to the global lists
def getCoreData(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/"+core['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            # no core ID: append placeholders so the lists stay aligned
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        # landing details come directly from the launch record
        Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
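
The `Outcome` value is a string built from two fields of the core record, which is why launches with no landing data show up as `None None`. A small sketch of that expression follows. The function name `outcome_label` and both sample dictionaries are illustrative assumptions, not real API output.

```python
def outcome_label(core):
    # Same expression the notebook uses to build each Outcome entry.
    return str(core['landing_success']) + ' ' + str(core['landing_type'])


# Made-up core records for illustration.
landed = {'landing_success': True, 'landing_type': 'ASDS'}
no_attempt = {'landing_success': None, 'landing_type': None}

print(outcome_label(landed))      # True ASDS
print(outcome_label(no_attempt))  # None None
```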

----

### 1.  Requesting and Parsing Launch Data

Verifying request status code

In [6]:
spacex_url = "https://api.spacexdata.com/v4/launches/past"
response = requests.get(spacex_url)
response.status_code

200

Decoding response as JSON and converting it into a Pandas dataframe, <code>data</code>.

In [7]:
response_json = response.json()
data = pd.json_normalize(response_json)
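
To illustrate what `pd.json_normalize` does to nested records, here is a small offline sketch. The records are made up and far smaller than the API response. Nested dictionaries become dotted column names such as `links.webcast`, while lists such as `cores` are kept as single cell values.

```python
import pandas as pd

# Made-up records that mimic the nesting of the launch data.
records = [
    {'flight_number': 1,
     'cores': [{'core': 'x'}],
     'links': {'webcast': 'https://example.com/a', 'reddit': {'media': None}}},
    {'flight_number': 2,
     'cores': [{'core': 'y'}],
     'links': {'webcast': 'https://example.com/b', 'reddit': {'media': None}}},
]

flat = pd.json_normalize(records)
print(sorted(flat.columns))
# ['cores', 'flight_number', 'links.reddit.media', 'links.webcast']
print(flat.loc[0, 'cores'])  # [{'core': 'x'}]
```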

In [8]:
data.head()

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,net,window,rocket,success,failures,details,crew,ships,...,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 33, 'altitude': None, 'reason': 'mer...",Engine failure at 33 seconds and loss of vehicle,[],[],...,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-fa...,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 301, 'altitude': 289, 'reason': 'har...",Successful first stage burn and transition to ...,[],[],...,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-roc...,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 140, 'altitude': 35, 'reason': 'resi...",Residual stage 1 thrust led to collision betwe...,[],[],...,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1...,https://en.wikipedia.org/wiki/Trailblazer_(sat...,
3,2008-09-20T00:00:00.000Z,1221869000.0,False,0.0,5e9d0d95eda69955f709d1eb,True,[],Ratsat was carried to orbit on the first succe...,[],[],...,,,[],[],,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,
4,,,False,0.0,5e9d0d95eda69955f709d1eb,True,[],,[],[],...,,,[],[],http://www.spacex.com/press/2012/12/19/spacexs...,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,http://www.spacex.com/news/2013/02/12/falcon-1...,https://en.wikipedia.org/wiki/RazakSAT,


#### Initial formatting:

Extracting subset with only the features we want along with <code>flight_number</code> and <code>date_utc</code>.

In [9]:
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

Removing rows with multiple <code>cores</code>.

In [10]:
data = data[data['cores'].map(len)==1]

<code>payloads</code> and <code>cores</code> hold lists; only <code>cores</code> was filtered to a single item above, so here we take the first element of each list and replace the feature with it.

In [11]:
data['cores'] = data['cores'].map(lambda x : x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])
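
The filter and extraction steps can be checked on a toy frame. This is only a sketch with made-up values: the second row stands in for a launch with three cores, and the third row has two payloads, of which only the first is kept.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    'cores': [[{'core': 'a'}],
              [{'core': 'b'}, {'core': 'c'}, {'core': 'd'}],
              [{'core': 'e'}]],
    'payloads': [['p1'], ['p2'], ['p3', 'p4']],
})

# Keep single-core launches, then unwrap the one-element lists.
df = df[df['cores'].map(len) == 1].copy()
df['cores'] = df['cores'].map(lambda x: x[0])
df['payloads'] = df['payloads'].map(lambda x: x[0])

print(df['cores'].tolist())     # [{'core': 'a'}, {'core': 'e'}]
print(df['payloads'].tolist())  # ['p1', 'p3']
```

Note that `x[0]` would raise an `IndexError` on an empty list, so this pattern assumes every remaining launch has at least one payload.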

Converting <code>date_utc</code> to datetime datatype and extracting date.

In [12]:
data['date'] = pd.to_datetime(data['date_utc']).dt.date

Restricting launches to those on or before 13 November 2020 using <code>date</code>.

In [13]:
# keep only launches dated on or before 2020-11-13
data = data[data['date'] <= datetime.date(2020, 11, 13)]
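
As a quick check of the date handling, the sketch below applies the same conversion and cutoff to two made-up timestamps in the ISO format used by `date_utc`. `.dt.date` yields plain `datetime.date` objects, which is why the comparison is made against `datetime.date` rather than a string.

```python
import datetime
import pandas as pd

# Made-up timestamps, one on each side of the cutoff.
df = pd.DataFrame({'date_utc': ['2020-11-05T23:24:23.000Z',
                                '2020-11-21T17:17:00.000Z']})

df['date'] = pd.to_datetime(df['date_utc']).dt.date
cutoff = datetime.date(2020, 11, 13)
kept = df[df['date'] <= cutoff]

print(kept['date'].tolist())  # [datetime.date(2020, 11, 5)]
```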

#### Requesting the API again to get further information about launches using the IDs given for each launch.

Data returned from these requests will be stored in lists and used to create a new dataframe.

In [14]:
# global lists populated by the helper functions
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

Calling previously defined helper functions to populate our lists.

In [15]:
getBoosterVersion(data)

In [16]:
getLaunchSite(data)

In [17]:
getPayloadData(data)

In [18]:
getCoreData(data)

### Constructing our dataset:

Combining our columns into a dictionary.

In [19]:
launch_dict = {'FlightNumber': list(data['flight_number']),
               'Date': list(data['date']),
               'BoosterVersion': BoosterVersion,
               'PayloadMass': PayloadMass,
               'Orbit': Orbit,
               'LaunchSite': LaunchSite,
               'Outcome': Outcome,
               'Flights': Flights,
               'GridFins': GridFins,
               'Reused': Reused,
               'Legs': Legs,
               'LandingPad': LandingPad,
               'Block': Block,
               'ReusedCount': ReusedCount,
               'Serial': Serial,
               'Longitude': Longitude,
               'Latitude': Latitude}
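
Every list in the dictionary must have the same length, otherwise pandas cannot build the dataframe. Lengths can drift apart if a helper skips an empty ID or if a helper cell is run twice. A small sketch of a pre-check follows; the helper name `mismatched` and the demo values are hypothetical and not part of this notebook.

```python
def mismatched(columns):
    """Return {name: length} for columns shorter than the longest one."""
    lengths = {name: len(values) for name, values in columns.items()}
    expected = max(lengths.values())
    return {name: n for name, n in lengths.items() if n != expected}


# Made-up columns: PayloadMass is one entry short.
demo = {
    'FlightNumber': [1, 2, 3],
    'BoosterVersion': ['Falcon 1', 'Falcon 9', 'Falcon 9'],
    'PayloadMass': [20.0, 525.0],
}

print(mismatched(demo))  # {'PayloadMass': 2}
```

An empty result means all columns line up and the dictionary can be passed on safely.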

In [20]:
data = pd.DataFrame.from_dict(launch_dict)

In [21]:
data.head(5)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1A,167.743129,9.047721
1,2,2007-03-21,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2A,167.743129,9.047721
2,3,2008-08-03,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1C,167.743129,9.047721
3,4,2008-09-28,Falcon 1,165.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2C,167.743129,9.047721
4,5,2009-07-13,Falcon 1,200.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin3C,167.743129,9.047721


### 2. Filtering Dataframe to only include Falcon 9 launches

In [22]:
data_falcon9 = data[data['BoosterVersion']!='Falcon 1']
data_falcon9.head(2)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
5,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
6,7,2010-12-08,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0004,-80.577366,28.561857


Renumbering <code>FlightNumber</code> from 1 and setting it as the index.

In [23]:
data_falcon9.loc[:,'FlightNumber'] = list(range(1,data_falcon9.shape[0]+1))
data_falcon9.set_index('FlightNumber', inplace=True)
data_falcon9.head()

FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
1,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
2,2010-12-08,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0004,-80.577366,28.561857
3,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
4,2012-10-08,Falcon 9,400.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0006,-80.577366,28.561857
5,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
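
The filtering and renumbering step can also be written without in-place modification of a slice. The sketch below is one possible variant on made-up rows, not the notebook's own code: it takes an explicit `.copy()` of the filtered frame, so it does not rely on the chained-assignment warning being switched off.

```python
import pandas as pd

# Made-up rows: two Falcon 1 launches followed by two Falcon 9 launches.
df = pd.DataFrame({
    'FlightNumber': [1, 2, 6, 7],
    'BoosterVersion': ['Falcon 1', 'Falcon 1', 'Falcon 9', 'Falcon 9'],
})

f9 = df[df['BoosterVersion'] != 'Falcon 1'].copy()
f9['FlightNumber'] = list(range(1, len(f9) + 1))  # renumber from 1
f9 = f9.set_index('FlightNumber')

print(f9.index.tolist())              # [1, 2]
print(f9['BoosterVersion'].tolist())  # ['Falcon 9', 'Falcon 9']
```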


### 3. Data Wrangling

Checking for missing values:

In [24]:
data_falcon9.isnull().sum()

Date               0
BoosterVersion     0
PayloadMass        6
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        31
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

<code>LandingPad</code> and <code>PayloadMass</code> are the only columns with missing values.

<code>LandingPad</code> will be left as is, because its null values indicate that no landing pad was used.

<code>PayloadMass</code>'s missing values, however, will need to be dealt with.

#### Replacing <code>PayloadMass</code>'s missing values with its mean.

In [25]:
mean = data_falcon9['PayloadMass'].mean()
# assign the result back instead of modifying a column selection in place
data_falcon9['PayloadMass'] = data_falcon9['PayloadMass'].fillna(mean)
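
A worked example of mean imputation on a made-up series: `Series.mean()` skips missing values by default, so the mean is computed from the known masses only, and `fillna` then writes that value into the gaps.

```python
import numpy as np
import pandas as pd

# Made-up payload masses with two missing entries.
masses = pd.Series([500.0, np.nan, 1500.0, np.nan])

mean = masses.mean()          # NaN is skipped: (500 + 1500) / 2
filled = masses.fillna(mean)

print(mean)             # 1000.0
print(filled.tolist())  # [500.0, 1000.0, 1500.0, 1000.0]
```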

In [26]:
# PayloadMass should now show 0 nulls
data_falcon9.isnull().sum()

Date               0
BoosterVersion     0
PayloadMass        0
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        31
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

### Now that our data has been collected and formatted, let's save our progress and export it to CSV for the next steps.

In [27]:
# FlightNumber is the index, so write the index out; index=False would drop it from the file
data_falcon9.to_csv('dataset_part_1.csv', index=True)
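
Because `FlightNumber` was set as the index earlier, the `index` argument of `to_csv` decides whether it reaches the file. The sketch below uses a made-up two-row frame and compares the header line written in each case. Calling `to_csv` without a path returns the CSV text as a string.

```python
import pandas as pd

# Made-up frame with FlightNumber as the index, as in the notebook.
df = pd.DataFrame({'FlightNumber': [1, 2],
                   'Orbit': ['LEO', 'ISS']}).set_index('FlightNumber')

without_index = df.to_csv(index=False)
with_index = df.to_csv()  # index=True is the default

print(without_index.splitlines()[0])  # Orbit
print(with_index.splitlines()[0])     # FlightNumber,Orbit
```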