## SpaceX Data Analysis of First Stage Landing Predictions
### Data Wrangling
#### After web scraping and using the SpaceX API to gather data, we need to clean up what we collected. As always, we start by importing our libraries and loading the CSV file exported from the Data Collection via API section of this project. We then check what percentage of values is missing in each column and what data type each column holds. From there we count the launches at each launch site, count the occurrences of each orbit type, and create a class label marking whether the first stage of each launch landed successfully. Finally, we save the result to a new CSV for use in future projects.

In [1]:
!pip install pandas
!pip install numpy

In [2]:
import pandas as pd
import numpy as np

#### We start by loading the data from our previous API data collection step.

In [3]:
df = pd.read_csv("dataset_part_1.csv")
df.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
1,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
2,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
3,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
4,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857


We'll now check what percentage of values is missing in each column. Only the LandingPad column has missing values (about 28.9%).

In [4]:
df.isnull().sum()/len(df)*100

FlightNumber       0.000000
Date               0.000000
BoosterVersion     0.000000
PayloadMass        0.000000
Orbit              0.000000
LaunchSite         0.000000
Outcome            0.000000
Flights            0.000000
GridFins           0.000000
Reused             0.000000
Legs               0.000000
LandingPad        28.888889
Block              0.000000
ReusedCount        0.000000
Serial             0.000000
Longitude          0.000000
Latitude           0.000000
dtype: float64
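The same percentages can be computed a little more directly, because the mean of a boolean column is the fraction of `True` values. The sketch below uses a small made-up frame (`sample` and its values are illustrative, not the launch data) and also filters the result down to the columns that actually have gaps.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for the launch data
sample = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4],
    "LandingPad": [np.nan, "pad_a", np.nan, "pad_b"],
})

# isna() gives a boolean frame; the column mean of booleans is the fraction missing
missing_pct = sample.isna().mean() * 100

# Keep only the columns that actually have gaps, largest first
missing_only = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_only)
```

On the real dataset, replacing `sample` with `df` should leave only LandingPad in the filtered result.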

We also want to see the data type of each column.

In [5]:
df.dtypes

FlightNumber        int64
Date               object
BoosterVersion     object
PayloadMass       float64
Orbit              object
LaunchSite         object
Outcome            object
Flights             int64
GridFins             bool
Reused               bool
Legs                 bool
LandingPad         object
Block             float64
ReusedCount         int64
Serial             object
Longitude         float64
Latitude          float64
dtype: object
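The Date column is stored as `object`, meaning plain strings. If later steps need to sort or group by time, one option is to parse it into a datetime column. This is a minimal sketch on a hypothetical three-row frame (`dates` and the derived `Year` column are illustrative names, not part of the original notebook).

```python
import pandas as pd

# Hypothetical frame with dates in the same YYYY-MM-DD format as the dataset
dates = pd.DataFrame({"Date": ["2010-06-04", "2012-05-22", "2013-03-01"]})

# Parse the strings into a datetime column using an explicit format
dates["Date"] = pd.to_datetime(dates["Date"], format="%Y-%m-%d")

# Datetime columns expose the .dt accessor, e.g. to extract the launch year
dates["Year"] = dates["Date"].dt.year
print(dates.dtypes)
```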

#### Here we'll take a look to figure out how many launches occurred at each launch site in our dataset. 

In [6]:
# Apply value_counts() on column LaunchSite
launch_counts = df['LaunchSite'].value_counts()
print(launch_counts)

LaunchSite
CCSFS SLC 40    55
KSC LC 39A      22
VAFB SLC 4E     13
Name: count, dtype: int64
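Raw counts are easier to compare as shares of the total. As a sketch, `value_counts(normalize=True)` returns fractions instead of counts. The `sites` series below is rebuilt from the counts printed above (55, 22, 13) so the example runs on its own.

```python
import pandas as pd

# Rebuild a LaunchSite column with the same counts shown above (55, 22, 13)
sites = pd.Series(
    ["CCSFS SLC 40"] * 55 + ["KSC LC 39A"] * 22 + ["VAFB SLC 4E"] * 13,
    name="LaunchSite",
)

# normalize=True returns fractions of the total instead of raw counts
site_share = (sites.value_counts(normalize=True) * 100).round(1)
print(site_share)
```

With these counts, CCSFS SLC 40 accounts for roughly 61% of the launches in the dataset.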


#### Additionally, each launch is being conducted with a goal in mind. What kind of orbit is each launch after? Let's find out... 

In [7]:
# Apply value_counts on Orbit column
orbit_counts = df['Orbit'].value_counts()
print(orbit_counts)

Orbit
GTO      27
ISS      21
VLEO     14
PO        9
LEO       7
SSO       5
MEO       3
HEO       1
ES-L1     1
SO        1
GEO       1
Name: count, dtype: int64


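#### Now we'll create a class label indicating whether the first stage of each launch landed successfully. Each Outcome value pairs a landing result (True, False, or None) with a landing type (ASDS, RTLS, or Ocean), and any outcome that does not begin with True is treated as an unsuccessful landing.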

In [8]:
# Count how often each value appears in the Outcome column
landing_outcomes = df['Outcome'].value_counts()
print(landing_outcomes)

Outcome
True ASDS      41
None None      19
True RTLS      14
False ASDS      6
True Ocean      5
False Ocean     2
None ASDS       2
False RTLS      1
Name: count, dtype: int64


In [9]:
for i,outcome in enumerate(landing_outcomes.keys()):
    print(i,outcome)

0 True ASDS
1 None None
2 True RTLS
3 False ASDS
4 True Ocean
5 False Ocean
6 None ASDS
7 False RTLS


In [10]:
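# Positions 1, 3, 5, 6, 7 in the listing above are the outcomes that do not start with "True"
bad_outcomes = set(landing_outcomes.keys()[[1, 3, 5, 6, 7]])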
bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}
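The positional selection above depends on the order in which `value_counts()` happens to return the labels, so it could pick the wrong entries if the data changes. One alternative, sketched here under the assumption that every successful landing label starts with `True`, is to build the set from the label text itself. The `outcomes` series is a stand-in listing the eight labels seen above.

```python
import pandas as pd

# Outcome labels as they appear in the dataset: "<landed?> <landing type>"
outcomes = pd.Series([
    "True ASDS", "None None", "True RTLS", "False ASDS",
    "True Ocean", "False Ocean", "None ASDS", "False RTLS",
])

# Anything whose first token is not "True" is treated as an unsuccessful landing
bad_outcomes = {o for o in outcomes.unique() if not o.startswith("True")}

# Same 0/1 labelling rule as the notebook: 0 for bad outcomes, 1 otherwise
landing_class = [0 if o in bad_outcomes else 1 for o in outcomes]
print(sorted(bad_outcomes))
```

This produces the same five-element set as the index-based version, without relying on row order.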

In [11]:
# landing_class = 0 if bad_outcome, landing_class = 1 otherwise
landing_class = [0 if outcome in bad_outcomes else 1 for outcome in df['Outcome']]

In [12]:
df['Class']=landing_class
df[['Class']].head(8)

Unnamed: 0,Class
0,0
1,0
2,0
3,0
4,0
5,0
6,1
7,1
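The list comprehension works fine at this size. A vectorized equivalent is `Series.isin`, sketched below on a hypothetical four-row frame (`demo` is an illustrative name). The `bad_outcomes` set is copied from the output shown earlier.

```python
import pandas as pd

# The set of unsuccessful outcomes, as printed earlier in the notebook
bad_outcomes = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}

# Hypothetical sample of Outcome values
demo = pd.DataFrame({"Outcome": ["None None", "False Ocean", "True ASDS", "True RTLS"]})

# isin() flags bad outcomes; negating and casting to int gives 1 for success, 0 otherwise
demo["Class"] = (~demo["Outcome"].isin(bad_outcomes)).astype(int)
print(demo)
```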


In [13]:
df.head(5)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude,Class
0,1,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857,0
1,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857,0
2,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857,0
3,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093,0
4,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857,0


#### Based on the Class labels above, we can compute the overall first stage landing success rate across all Falcon 9 launches in our dataset.

In [14]:
df["Class"].mean()

np.float64(0.6666666666666666)
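As a sanity check, the same figure can be recovered from the outcome counts printed earlier: the three `True` outcomes add up to 41 + 14 + 5 = 60 successful landings out of 90 launches. The sketch below copies those counts into a standalone series (`outcome_counts` is an illustrative name).

```python
import pandas as pd

# Outcome counts copied from the value_counts() output earlier in the notebook
outcome_counts = pd.Series({
    "True ASDS": 41, "None None": 19, "True RTLS": 14, "False ASDS": 6,
    "True Ocean": 5, "False Ocean": 2, "None ASDS": 2, "False RTLS": 1,
})

# Successful landings are the labels that start with "True"
successes = outcome_counts[outcome_counts.index.str.startswith("True")].sum()
total = outcome_counts.sum()

success_rate = successes / total
print(successes, total, round(success_rate, 4))
```

Because Class is a 0/1 column, its mean is exactly this ratio of successes to total launches.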

Here we can see that the Falcon 9 has a successful landing rate of 66.67%. Overall, when you consider most landings are effectively trying to land a toothpick upright on top of a platform the size of a quarter rela