# **SpaceX Falcon 9 First Stage Landing Prediction**

## EDA with SQL (IBM DB2)

## Overview of the Dataset

SpaceX has gained worldwide attention for a series of historic milestones.

It is the only private company ever to return a spacecraft from low Earth orbit, which it first accomplished in December 2010.
SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars, whereas other providers charge upward of 165 million dollars each. Much of the savings comes from SpaceX's ability to reuse the first stage.

Therefore, if we can determine whether the first stage will land, we can determine the cost of a launch.

This information can be used by an alternate company that wants to bid against SpaceX for a rocket launch.

This dataset includes a record for each payload carried during a SpaceX mission into outer space.

### Download the dataset

We need to load the SpaceX dataset into an IBM DB2 database. The dataset is available on the IBM Cloud gallery as a CSV file, and it is a better version than the one we imported using the SpaceX public API.

Click the link below to download and save the dataset (.csv file):

 <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/data/Spacex.csv" target="_blank">Spacex DataSet</a>
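If you want to experiment with the CSV locally before setting up DB2, Python's standard library is enough to load it into an in-memory SQLite table. This is a minimal sketch under our own assumptions: the helper name `load_csv_into_sqlite` is hypothetical, every column is stored as TEXT, and SQLite's SQL dialect differs from DB2's in places. It is not part of the IBM Cloud workflow described below.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text: str, table: str = "spacex") -> sqlite3.Connection:
    """Load CSV text into a new in-memory SQLite table; every column is stored as TEXT."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Double-quote column names so headers containing spaces or parentheses stay valid
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # The remaining rows of the reader are inserted in one batch
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn


# With network access, the real file could be fetched first, for example:
#   from urllib.request import urlopen
#   csv_text = urlopen(URL).read().decode("utf-8")
# Here we use a hypothetical three-column excerpt (not the real header of Spacex.csv)
sample = (
    "Date,Launch_Site,Landing_Outcome\n"
    "2010-06-04,CCAFS LC-40,Failure (parachute)\n"
    "2012-05-22,CCAFS LC-40,No attempt\n"
)
local_conn = load_csv_into_sqlite(sample)
print(local_conn.execute("SELECT COUNT(*) FROM spacex").fetchone()[0])  # 2
```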

## Steps for creating an IBM DB2 database on IBM Cloud and loading the SpaceX CSV file into it

**Navigate to the Go to UI screen** 

* First, open the IBM Cloud DB2 service page at this <a href="https://cloud.ibm.com/catalog/services/db2">link</a> and go to the **Go to UI** screen.

* Then click the **Data** link (below **SQL**) in the Go to UI screen and open the **Load Data** tab.

* Browse for the downloaded SpaceX file.



<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/images/browsefile.png" width="800">


* Once done, select the schema and load the file.


 <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/images/spacexload3.png" width="800">


If you have trouble uploading the dataset as a CSV file, follow the steps below to upload the .sql file instead:

* Download the file <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/Spacex%20.sql">Spacex.sql</a>

* Then click **SQL** in the **Go to UI** screen.

* Use the **From file** option to browse for the **SQL** file and upload it.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/images/sqlfile.png">

* Once you upload the script, use the **Run All** option to run all the queries that insert the data.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/images/runall.png">

    


```python
# Install the packages needed for this section of the project
!pip install sqlalchemy==1.3.9
!pip install --force-reinstall ibm_db==3.1.0 ibm_db_sa==0.3.7
!pip install ipython-sql
```

### Connect to the database

Let's first load the SQL extension and establish a connection with the database.

```python
# Load the ipython-sql extension so that the %sql magic is available
%load_ext sql
```

```python
# Import the IBM DB2 driver, which we use to connect to the DB2 database on IBM Cloud
import ibm_db
```




**Connecting with the `%sql` magic using the service credentials shown in the UI**



<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/images/servicecredentials.png" width="600">  

* Use the following format, adding `security=SSL` at the end:

`%sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name?security=SSL`
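Passwords generated for service credentials can contain characters such as `$` or `@` that are not safe inside a URL. A minimal sketch, assuming you assemble the `%sql` connection string in Python: percent-encode the username and password with `urllib.parse.quote` before building the URL. The helper name `build_db2_sa_url` and all sample values are hypothetical.

```python
from urllib.parse import quote


def build_db2_sa_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble an ibm_db_sa connection URL with SSL enabled (hypothetical helper)."""
    # safe="" makes quote() escape every reserved character, including "/" and "@"
    return (
        f"ibm_db_sa://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}?security=SSL"
    )


# Hypothetical credentials: the "@" and "$" in the password become %40 and %24
url = build_db2_sa_url("abc12345", "p@ss$word", "example.databases.appdomain.cloud", 30120, "BLUDB")
print(url)
```

SQLAlchemy decodes percent-escapes in the password when it parses the URL, so the resulting string should be usable as `%sql $url` (IPython expands `$url` in line magics); we have not tested this against a live DB2 instance.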


```python
# Replace the placeholder values with your actual Db2 hostname, username, and password:
dsn_hostname = "_____"  # e.g.: "54a2f15b-5c0f-46df-8954-7e38e612c2bd.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud"
dsn_uid = "_____"       # e.g. "abc12345"
dsn_pwd = "_____"       # e.g. "7dBZ3wWt9XN6$o0J"

dsn_driver = "{IBM DB2 ODBC DRIVER}"
dsn_database = "BLUDB"  # e.g. "BLUDB"
dsn_port = "30120"      # e.g. "32733"
dsn_protocol = "TCPIP"  # i.e. "TCPIP"
dsn_security = "SSL"    # i.e. "SSL"
```

```python
# Create the DSN connection string
dsn = (
    "DRIVER={0};"
    "DATABASE={1};"
    "HOSTNAME={2};"
    "PORT={3};"
    "PROTOCOL={4};"
    "UID={5};"
    "PWD={6};"
    "SECURITY={7};").format(dsn_driver, dsn_database, dsn_hostname, dsn_port, dsn_protocol, dsn_uid, dsn_pwd, dsn_security)

# Print the connection string to check that the correct values are specified,
# masking the password so it does not end up in the notebook output
print(dsn.replace("PWD=" + dsn_pwd + ";", "PWD=********;"))
```

```
DRIVER={IBM DB2 ODBC DRIVER};DATABASE=BLUDB;HOSTNAME=<your-hostname>;PORT=30120;PROTOCOL=TCPIP;UID=<your-username>;PWD=********;SECURITY=SSL;
```


```python
# Create the database connection
try:
    conn = ibm_db.connect(dsn, "", "")
    print("Connected to database:", dsn_database, "as user:", dsn_uid, "on host:", dsn_hostname)
except Exception:
    # ibm_db.connect raises on failure; conn_errormsg() describes why
    print("Unable to connect:", ibm_db.conn_errormsg())
```

```python
import pandas as pd
import ibm_db_dbi

# Wrap the ibm_db connection in a DB-API connection that pandas.read_sql can use
pconn = ibm_db_dbi.Connection(conn)
```

### Query 1




##### Display the names of the unique launch sites in the space mission:

```python
# distinct is a keyword, not a function, so no parentheses are needed
selectQuery = "select distinct launch_site from spacex"
df = pd.read_sql(selectQuery, pconn)
df
```

The result has one row for each distinct value of `LAUNCH_SITE`.



### Query 2


##### Display 5 records where launch sites begin with the string 'CCA':

```python
selectQuery = "select * from spacex where launch_site like 'CCA%' limit 5"
df = pd.read_sql(selectQuery, pconn)
df
```

| | DATE | TIME_UTC | BOOSTER_VERSION | LAUNCH_SITE | PAYLOAD | PAYLOAD_MASS_KG_ | ORBIT | CUSTOMER | MISSION_OUTCOME | LANDING_OUTCOME |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-06-04 | 18:45:00 | F9 v1.0 B0003 | CCAFS LC-40 | Dragon Spacecraft Qualification Unit | 0 | LEO | SpaceX | Success | Failure (parachute) |
| 1 | 2010-12-08 | 15:43:00 | F9 v1.0 B0004 | CCAFS LC-40 | Dragon demo flight C1, two CubeSats, barrel of... | 0 | LEO (ISS) | NASA (COTS) NRO | Success | Failure (parachute) |
| 2 | 2012-05-22 | 7:44:00 | F9 v1.0 B0005 | CCAFS LC-40 | Dragon demo flight C2 | 525 | LEO (ISS) | NASA (COTS) | Success | No attempt |
| 3 | 2012-10-08 | 0:35:00 | F9 v1.0 B0006 | CCAFS LC-40 | SpaceX CRS-1 | 500 | LEO (ISS) | NASA (CRS) | Success | No attempt |
| 4 | 2013-03-01 | 15:10:00 | F9 v1.0 B0007 | CCAFS LC-40 | SpaceX CRS-2 | 677 | LEO (ISS) | NASA (CRS) | Success | No attempt |
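In `like 'CCA%'`, the `%` wildcard matches any run of characters, so every site name starting with `CCA` qualifies. Without an `order by`, `limit 5` returns whichever five matching rows the database produces first; sorting by date makes "the first five" well defined. A sketch on hypothetical rows in an in-memory SQLite table (note that SQLite's `LIKE` is case-insensitive for ASCII letters by default, while DB2's is case-sensitive):

```python
import sqlite3

# Hypothetical stand-in for the spacex table; only date and launch_site are modeled
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spacex (date TEXT, launch_site TEXT)")
demo_conn.executemany(
    "INSERT INTO spacex VALUES (?, ?)",
    [("2013-03-01", "CCAFS LC-40"), ("2017-02-19", "KSC LC-39A"),
     ("2010-06-04", "CCAFS LC-40"), ("2013-09-29", "VAFB SLC-4E")],
)

# Prefix match with %, ordered by date so LIMIT keeps the earliest matching launches
rows = demo_conn.execute(
    "SELECT date, launch_site FROM spacex WHERE launch_site LIKE 'CCA%' ORDER BY date LIMIT 5"
).fetchall()
print(rows)  # [('2010-06-04', 'CCAFS LC-40'), ('2013-03-01', 'CCAFS LC-40')]
```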


### Query 3




##### Display the total payload mass carried by boosters launched by NASA (CRS):

```python
selectQuery = "select sum(payload_mass_kg_) as SUM from spacex where customer = 'NASA (CRS)'"
df = pd.read_sql(selectQuery, pconn)
df
```

| | SUM |
|---|---|
| 0 | 45596 |
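Here the customer name is pasted directly into the SQL text. If the value ever comes from a variable, binding it as a query parameter avoids quoting mistakes and SQL injection. A minimal sketch using SQLite's `?` placeholders on hypothetical rows (the payload values are taken from the Query 2 output); the helper `total_payload_for` is our own name:

```python
import sqlite3

# Hypothetical in-memory stand-in for the spacex table
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spacex (customer TEXT, payload_mass_kg_ INTEGER)")
demo_conn.executemany(
    "INSERT INTO spacex VALUES (?, ?)",
    [("NASA (CRS)", 500), ("NASA (CRS)", 677), ("NASA (COTS)", 525)],
)


def total_payload_for(customer: str) -> int:
    # The customer name is bound to the ? placeholder instead of being pasted into the SQL text
    row = demo_conn.execute(
        "SELECT SUM(payload_mass_kg_) FROM spacex WHERE customer = ?", (customer,)
    ).fetchone()
    # SUM over zero rows is NULL (None in Python), so fall back to 0
    return row[0] or 0


print(total_payload_for("NASA (CRS)"))  # 1177
```

Against DB2, the equivalent would presumably be `pd.read_sql(query, pconn, params=["NASA (CRS)"])`, since `ibm_db_dbi` is a DB-API module that uses `?` placeholders; we have only run this sketch against SQLite.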


### Query 4




##### Display the average payload mass carried by booster version F9 v1.1:

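```python
# Without a wildcard, like matches only rows whose booster_version is exactly 'F9 v1.1'
selectQuery = "select avg(payload_mass_kg_) as AVG from spacex where booster_version like 'F9 v1.1'"
df = pd.read_sql(selectQuery, pconn)
df
```

| | AVG |
|---|---|
| 0 | 2928 |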
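A `LIKE` pattern without wildcards behaves like `=`, so boosters recorded with a serial suffix (for example `F9 v1.1 B1011`) are not averaged in; `'F9 v1.1%'` would include them. The whole-number result above is likely because DB2 returns an integer `AVG` for an integer column; casting to `DOUBLE` would keep the fraction. A sketch of the wildcard difference on hypothetical rows in SQLite, whose `AVG` always returns a float:

```python
import sqlite3

# Hypothetical rows: an exact "F9 v1.1", a v1.1 booster with a serial suffix, and a v1.0 booster
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spacex (booster_version TEXT, payload_mass_kg_ INTEGER)")
demo_conn.executemany(
    "INSERT INTO spacex VALUES (?, ?)",
    [("F9 v1.1", 3000), ("F9 v1.1 B1011", 4000), ("F9 v1.0 B0007", 677)],
)

query = "SELECT AVG(payload_mass_kg_) FROM spacex WHERE booster_version LIKE ?"
# No wildcard: only the exact string 'F9 v1.1' matches
exact = demo_conn.execute(query, ("F9 v1.1",)).fetchone()[0]
# Trailing %: 'F9 v1.1 B1011' matches as well
prefix = demo_conn.execute(query, ("F9 v1.1%",)).fetchone()[0]
print(exact, prefix)  # 3000.0 3500.0
```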


### Query 5

##### List the date when the first successful landing outcome on a ground pad was achieved:

```python
selectQuery = "select min(date) as min from spacex where landing_outcome = 'Success (ground pad)'"
df = pd.read_sql(selectQuery, pconn)
df
```

| | MIN |
|---|---|
| 0 | 2015-12-22 |
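Assuming the column was loaded as a DB2 `DATE`, `min()` compares the values chronologically. If the dates were stored as plain text instead (as in a quick SQLite copy), `min()` still works as long as they use the ISO-8601 `YYYY-MM-DD` form, because those strings sort in chronological order. A small check on hypothetical rows:

```python
import sqlite3

# Hypothetical rows with dates stored as ISO-8601 text
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spacex (date TEXT, landing_outcome TEXT)")
demo_conn.executemany(
    "INSERT INTO spacex VALUES (?, ?)",
    [("2016-07-18", "Success (ground pad)"),
     ("2015-12-22", "Success (ground pad)"),
     ("2015-04-14", "Failure (drone ship)")],
)
first = demo_conn.execute(
    "SELECT MIN(date) FROM spacex WHERE landing_outcome = 'Success (ground pad)'"
).fetchone()[0]
print(first)  # 2015-12-22

# The same trick breaks for MM/DD/YYYY text: '07/18/2016' sorts before '12/22/2015'
wrong = demo_conn.execute(
    "SELECT MIN(d) FROM (SELECT '12/22/2015' AS d UNION ALL SELECT '07/18/2016')"
).fetchone()[0]
print(wrong)  # 07/18/2016
```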


### Query 6

##### List the names of the boosters that landed successfully on a drone ship and carried a payload mass greater than 4000 kg but less than 6000 kg:

```python
selectQuery = (
    "select booster_version from spacex "
    "where landing_outcome = 'Success (drone ship)' "
    "and payload_mass_kg_ > 4000 and payload_mass_kg_ < 6000"
)
df = pd.read_sql(selectQuery, pconn)
df
```

| | BOOSTER_VERSION |
|---|---|
| 0 | F9 FT B1022 |
| 1 | F9 FT B1026 |
| 2 | F9 FT B1021.2 |
| 3 | F9 FT B1031.2 |
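The query uses strict `>` and `<`, so payloads of exactly 4000 kg or 6000 kg are excluded. SQL's `BETWEEN` is inclusive on both ends, so swapping it in would change the result at the boundaries. A quick comparison on hypothetical rows (the booster names are made up) in an in-memory SQLite table:

```python
import sqlite3

# Hypothetical boosters placed on and around the 4000/6000 kg boundaries
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spacex (booster_version TEXT, payload_mass_kg_ INTEGER)")
demo_conn.executemany(
    "INSERT INTO spacex VALUES (?, ?)",
    [("B-a", 4000), ("B-b", 4500), ("B-c", 6000), ("B-d", 6500)],
)

# Strict comparisons exclude both endpoints
strict = [r[0] for r in demo_conn.execute(
    "SELECT booster_version FROM spacex "
    "WHERE payload_mass_kg_ > 4000 AND payload_mass_kg_ < 6000 ORDER BY booster_version"
)]
# BETWEEN includes both endpoints
inclusive = [r[0] for r in demo_conn.execute(
    "SELECT booster_version FROM spacex "
    "WHERE payload_mass_kg_ BETWEEN 4000 AND 6000 ORDER BY booster_version"
)]
print(strict)     # ['B-b']
print(inclusive)  # ['B-a', 'B-b', 'B-c']
```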


### Query 7




##### List the total number of successful and failed mission outcomes:

```python
# Count how many missions fall under each distinct mission outcome
selectQuery = "select mission_outcome, count(*) as total from spacex group by mission_outcome"
df = pd.read_sql(selectQuery, pconn)
print(df)
```

```
                    MISSION_OUTCOME  TOTAL
0               Failure (in flight)      1
1                           Success     99
2  Success (payload status unclear)      1
```





So, we have 99 successful missions, one in-flight failure, and one mission recorded as a success whose payload status is unclear.
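To roll the three outcome labels up into just two totals, conditional aggregation with `CASE` works in a single pass. This sketch assumes every label starts with either `Success` or `Failure`, which holds for the three values listed above; on those counts it would give 100 successes and 1 failure. It runs on hypothetical rows in SQLite, but `CASE`, `SUM`, and `LIKE` are standard SQL that DB2 also supports:

```python
import sqlite3

# Hypothetical rows covering the three outcome labels
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spacex (mission_outcome TEXT)")
demo_conn.executemany(
    "INSERT INTO spacex VALUES (?)",
    [("Success",), ("Success",), ("Success (payload status unclear)",), ("Failure (in flight)",)],
)

# Each CASE yields 1 for a matching row and 0 otherwise, so SUM counts the matches
successes, failures = demo_conn.execute("""
    SELECT
        SUM(CASE WHEN mission_outcome LIKE 'Success%' THEN 1 ELSE 0 END),
        SUM(CASE WHEN mission_outcome LIKE 'Failure%' THEN 1 ELSE 0 END)
    FROM spacex
""").fetchone()
print(successes, failures)  # 3 1
```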

### Query 8



##### List the names of the booster versions that have carried the maximum payload mass:

```python
# The subquery finds the maximum payload; the outer query returns every booster that carried it
selectQuery = (
    "select booster_version from spacex "
    "where payload_mass_kg_ = (select max(payload_mass_kg_) from spacex)"
)
df = pd.read_sql(selectQuery, pconn)
df
```

| | BOOSTER_VERSION |
|---|---|
| 0 | F9 B5 B1048.4 |
| 1 | F9 B5 B1049.4 |
| 2 | F9 B5 B1051.3 |
| 3 | F9 B5 B1056.4 |
| 4 | F9 B5 B1048.5 |
| 5 | F9 B5 B1051.4 |
| 6 | F9 B5 B1049.5 |
| 7 | F9 B5 B1060.2 |
| 8 | F9 B5 B1058.3 |
| 9 | F9 B5 B1051.6 |
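Comparing against a `max()` subquery returns every booster tied at the maximum, which is why ten rows come back above. An `order by payload_mass_kg_ desc` with a one-row limit (`limit 1`, or `fetch first 1 rows only` in DB2) would silently keep only one of the tied boosters. A contrast on hypothetical rows in SQLite:

```python
import sqlite3

# Hypothetical boosters, two of them tied at the maximum payload
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spacex (booster_version TEXT, payload_mass_kg_ INTEGER)")
demo_conn.executemany(
    "INSERT INTO spacex VALUES (?, ?)",
    [("B-a", 9000), ("B-b", 15000), ("B-c", 15000), ("B-d", 12000)],
)

# Subquery approach: keeps all ties
with_subquery = [r[0] for r in demo_conn.execute("""
    SELECT booster_version FROM spacex
    WHERE payload_mass_kg_ = (SELECT MAX(payload_mass_kg_) FROM spacex)
    ORDER BY booster_version
""")]
# Sort-and-limit approach: returns a single, arbitrary one of the tied rows
with_limit = [r[0] for r in demo_conn.execute(
    "SELECT booster_version FROM spacex ORDER BY payload_mass_kg_ DESC LIMIT 1"
)]
print(with_subquery)    # ['B-b', 'B-c']
print(len(with_limit))  # 1
```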


### Query 9


##### List the failed landing outcomes on a drone ship, their booster versions, and launch site names in the year 2015:

```python
selectQuery = (
    "select date, landing_outcome, booster_version, launch_site from spacex "
    "where landing_outcome = 'Failure (drone ship)' and date like '2015%'"
)
df = pd.read_sql(selectQuery, pconn)
df
```

| | DATE | LANDING_OUTCOME | BOOSTER_VERSION | LAUNCH_SITE |
|---|---|---|---|---|
| 0 | 2015-01-10 | Failure (drone ship) | F9 v1.1 B1012 | CCAFS LC-40 |
| 1 | 2015-04-14 | Failure (drone ship) | F9 v1.1 B1015 | CCAFS LC-40 |
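Matching `date like '2015%'` relies on the date's text form. A filter that states the intent directly is to extract the year: in DB2 this could be written with the `YEAR()` scalar function, e.g. `year(date) = 2015`, assuming `date` holds a date value. SQLite has no `YEAR()`, so this sketch on hypothetical rows uses `strftime('%Y', date)` instead (the 2016 booster name is made up):

```python
import sqlite3

# Hypothetical rows: two 2015 drone-ship failures and one from another year
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spacex (date TEXT, landing_outcome TEXT, booster_version TEXT)")
demo_conn.executemany(
    "INSERT INTO spacex VALUES (?, ?, ?)",
    [("2015-01-10", "Failure (drone ship)", "F9 v1.1 B1012"),
     ("2015-04-14", "Failure (drone ship)", "F9 v1.1 B1015"),
     ("2016-03-04", "Failure (drone ship)", "B-hypothetical")],
)

# strftime('%Y', ...) extracts the year as text from an ISO-8601 date string
rows = demo_conn.execute("""
    SELECT date, booster_version FROM spacex
    WHERE landing_outcome = 'Failure (drone ship)'
      AND strftime('%Y', date) = '2015'
    ORDER BY date
""").fetchall()
print(rows)  # [('2015-01-10', 'F9 v1.1 B1012'), ('2015-04-14', 'F9 v1.1 B1015')]
```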


### Query 10

##### Rank the count of landing outcomes (such as Failure (drone ship) or Success (ground pad)) between 2010-06-04 and 2017-03-20, in descending order:

```python
selectQuery = (
    "select landing_outcome, count(*) as count from spacex "
    "where date between '2010-06-04' and '2017-03-20' "
    "group by landing_outcome order by count desc"
)
df = pd.read_sql(selectQuery, pconn)
df
```

| | LANDING_OUTCOME | COUNT |
|---|---|---|
| 0 | No attempt | 10 |
| 1 | Failure (drone ship) | 5 |
| 2 | Success (drone ship) | 5 |
| 3 | Controlled (ocean) | 3 |
| 4 | Success (ground pad) | 3 |
| 5 | Failure (parachute) | 2 |
| 6 | Uncontrolled (ocean) | 2 |
| 7 | Precluded (drone ship) | 1 |
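Several counts above are tied (5 and 5, 3 and 3, 2 and 2), and SQL does not define the order of rows that compare equal on the `order by` key, so tied outcomes may come back in a different order on another run. Adding a secondary sort key makes the ranking deterministic. A sketch on hypothetical rows in SQLite; the alias `cnt` is our choice, to avoid reusing the function name `count`:

```python
import sqlite3

# Hypothetical rows; the last one falls outside the date window
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE spacex (date TEXT, landing_outcome TEXT)")
demo_conn.executemany(
    "INSERT INTO spacex VALUES (?, ?)",
    [("2015-01-10", "Failure (drone ship)"),
     ("2015-12-22", "Success (ground pad)"),
     ("2016-04-08", "Success (drone ship)"),
     ("2016-07-18", "Success (ground pad)"),
     ("2017-06-03", "Success (ground pad)")],
)

# BETWEEN is inclusive; ties on cnt are broken alphabetically by landing_outcome
rows = demo_conn.execute("""
    SELECT landing_outcome, COUNT(*) AS cnt
    FROM spacex
    WHERE date BETWEEN '2010-06-04' AND '2017-03-20'
    GROUP BY landing_outcome
    ORDER BY cnt DESC, landing_outcome
""").fetchall()
print(rows)  # [('Success (ground pad)', 2), ('Failure (drone ship)', 1), ('Success (drone ship)', 1)]
```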


```python
# Close the connection to the database
ibm_db.close(conn)
```

```
True
```