# Introduction

In this section, I perform exploratory data analysis on the SpaceX launch data that I scraped from Wikipedia. This time the analysis is done in SQL. To query the data, I connect to a SQLite database file I created and use pandas to load the CSV file into it as a new table.

## Setting Up Environment

In [1]:
%load_ext sql

In [2]:
import sqlite3
import csv
import pandas as pd
import numpy as np
import sqlalchemy

con = sqlite3.connect("datasets/SpaceXDB.db")
cur = con.cursor()

In [3]:
%sql sqlite:///datasets/SpaceXDB.db

In [4]:
data = 'datasets/launch_data_falcon9_wiki.csv'
df = pd.read_csv(data)

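# convert the payload mass column to a numeric dtype
# (assigning through df.loc['Payload mass'] would append a row labelled 'Payload mass' instead)
df['Payload mass'] = pd.to_numeric(df['Payload mass'])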

# load the dataframe into SpaceXDB
df.to_sql("SPACEXTBL", con, if_exists='replace', index=False, method="multi")

122

## Querying the Database with SQL

Now the database is set up and the table has been loaded in.
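
Before running the tasks, it can help to confirm that the table landed with the expected columns and row count. The following is a minimal sketch of such a check, not part of the original notebook: it uses an in-memory database and a hypothetical two-row `sample` frame in place of the real CSV, so the values shown are assumptions for illustration only.

```python
import sqlite3
import pandas as pd

# Hypothetical two-row stand-in for the scraped launch data (not the real CSV).
sample = pd.DataFrame({
    "Flight No.": [1, 2],
    "Launch site": ["CCSFS SLC-40", "VAFB SLC-4E"],
    "Payload mass": [0.0, 500.0],
})

# In-memory database, so nothing is written to disk.
check_con = sqlite3.connect(":memory:")
sample.to_sql("SPACEXTBL", check_con, if_exists="replace", index=False)

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [row[1] for row in check_con.execute('PRAGMA table_info("SPACEXTBL")')]
row_count = check_con.execute("SELECT COUNT(*) FROM SPACEXTBL").fetchone()[0]

print(columns)
print(row_count)
check_con.close()
```

Pointing the same two queries at `datasets/SpaceXDB.db` would show whether the column names and row count match the CSV.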

### Task 1

##### Display the names of the unique launch sites in the space mission.

In [5]:
%%sql
SELECT DISTINCT "Launch site"
FROM SPACEXTBL;

 * sqlite:///datasets/SpaceXDB.db
Done.


Launch site
CCSFS SLC-40
VAFB SLC-4E
KSC LC-39A
""


### Task 2

##### Display 5 records where launch sites begin with the string 'CCA'

Cape Canaveral Space Force Station (CCSFS) used to be called Cape Canaveral Air Force Station (CCAFS). I renamed every occurrence to the Space Force Station name for consistency, so the query below matches the prefix 'CCS' rather than 'CCA'.

In [6]:
%%sql
SELECT *
FROM SPACEXTBL
WHERE "Launch site" LIKE 'CCS%'
LIMIT 5;

 * sqlite:///datasets/SpaceXDB.db
Done.


Flight No.,Launch site,Payload,Payload mass,Orbit,Customer,Launch outcome,Version Booster,Booster landing,Date,Time
1.0,CCSFS SLC-40,Dragon Spacecraft Qualification Unit,0.0,LEO,SpaceX,Success,F9 v1.0B0003.1,Failure,4 June 2010,18:45
2.0,CCSFS SLC-40,Dragon,0.0,LEO,NASA (COTS) NRO,Success,F9 v1.0B0004.1,Failure,8 December 2010,15:43
3.0,CCSFS SLC-40,Dragon,525.0,LEO,NASA (COTS),Success,F9 v1.0B0005.1,No attempt,22 May 2012,07:44
4.0,CCSFS SLC-40,SpaceX CRS-1,4700.0,LEO,NASA (CRS),Success,F9 v1.0B0006.1,No attempt,8 October 2012,00:35
5.0,CCSFS SLC-40,SpaceX CRS-2,4877.0,LEO,NASA (CRS),Success,F9 v1.0B0007.1,No attempt,1 March 2013,15:10
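
The renaming of the launch site mentioned above is not shown in this notebook. One way it might be done in pandas before loading the table is sketched below; the `sites` frame and its values are hypothetical, and the actual cleaning step may have differed.

```python
import pandas as pd

# Hypothetical launch-site labels as they might appear right after scraping.
sites = pd.DataFrame({"Launch site": ["CCAFS SLC-40", "CCSFS SLC-40", "KSC LC-39A"]})

# Swap the old abbreviation for the new one; regex=False matches the text literally.
sites["Launch site"] = sites["Launch site"].str.replace("CCAFS", "CCSFS", regex=False)

print(sites["Launch site"].unique())
```

After this step, a single `LIKE 'CCS%'` filter covers launches recorded under either name.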


### Task 3

##### Display the total payload mass carried by boosters launched by NASA (CRS)

In [11]:
%%sql
SELECT Customer, SUM("Payload mass") AS Total_Payload_Carried
FROM SPACEXTBL
WHERE Customer = 'NASA (CRS)'
GROUP BY Customer;

 * sqlite:///datasets/SpaceXDB.db
Done.


Customer,Total_Payload_Carried
NASA (CRS),59941.0


### Task 4

##### Display average payload mass carried by booster version F9 v1.1

In [13]:
%%sql
WITH Boosters AS (
    SELECT "Version Booster", "Payload mass" 
    FROM SPACEXTBL 
    WHERE "Version Booster"
    LIKE 'F9 v1.1%'
)

SELECT AVG("Payload mass") as Avg_Payload_Mass
FROM Boosters;

 * sqlite:///datasets/SpaceXDB.db
Done.


Avg_Payload_Mass
2534.6666666666665
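
As a cross-check, the same prefix filter and average can be expressed in pandas. This is a sketch on hypothetical rows (the `launches` frame is made up for illustration), so its result is not expected to match the figure above.

```python
import pandas as pd

# Hypothetical rows; the values are illustrative, not taken from the real table.
launches = pd.DataFrame({
    "Version Booster": ["F9 v1.0B0003.1", "F9 v1.1B1003", "F9 v1.1B1004", "F9 FTB1019"],
    "Payload mass": [0.0, 500.0, 3000.0, 2000.0],
})

# Equivalent of LIKE 'F9 v1.1%': keep rows whose booster label starts with the prefix.
is_v11 = launches["Version Booster"].str.startswith("F9 v1.1")

# mean() skips missing values, much as AVG skips NULLs in SQL.
avg_payload_mass = launches.loc[is_v11, "Payload mass"].mean()
print(avg_payload_mass)
```

One difference to keep in mind: SQLite's `LIKE` is case-insensitive for ASCII characters by default, while `str.startswith` is case-sensitive.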


### Task 5

##### List the date when the first successful landing outcome on a ground pad was achieved.

*Hint: Use the MIN function.*

In [15]:
%%sql
SELECT MIN(Date) AS First_Success_Ground_Pad
FROM SPACEXTBL
WHERE "Booster landing" = 'Success (ground pad)';

 * sqlite:///datasets/SpaceXDB.db
Done.


First_Success_Ground_Pad
""

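The empty result means `MIN` found no non-NULL dates, most likely because no row carries exactly that `Booster landing` label. Separately, the `Date` column holds text such as `4 June 2010`, so `MIN` compares strings rather than calendar dates. The sketch below, on hypothetical rows, shows how the two orderings can disagree and how parsing the dates in pandas avoids that; the `landings` frame and the `startswith("Success")` filter are assumptions, not the notebook's actual data or labels.

```python
import pandas as pd

# Hypothetical rows; dates and outcomes are illustrative only.
landings = pd.DataFrame({
    "Date": ["22 May 2013", "8 October 2012", "4 June 2010"],
    "Booster landing": ["Success", "Success", "Failure"],
})

# Parse day-month-year text (English month names) into real timestamps.
landings["Parsed date"] = pd.to_datetime(landings["Date"], format="%d %B %Y")

# Assumed filter: any outcome label that begins with "Success".
successes = landings[landings["Booster landing"].str.startswith("Success")]

text_min = successes["Date"].min()          # lexicographic comparison of strings
true_min = successes["Parsed date"].min()   # chronological comparison

print(text_min)
print(true_min.date())
```

Here the string minimum is `22 May 2013` because `'2'` sorts before `'8'`, while the chronological minimum is 8 October 2012. Storing the parsed dates in ISO format (`YYYY-MM-DD`) before loading the table would let SQL's `MIN` return the earliest date directly.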