# Exploratory Data Analysis (SQL)

## Objective
Perform exploratory analysis on SpaceX Falcon 9 launch data using SQL
to validate insights obtained during visual EDA and demonstrate
relational data querying skills.

SQL-based analysis complements Python-based exploration by enabling
structured aggregation, filtering, and grouping operations.


In [19]:
import pandas as pd
import sqlite3


In [20]:
df = pd.read_csv("../data/processed/spacex_api_clean.csv")
df.shape


(205, 10)

In [21]:
conn = sqlite3.connect(":memory:")
df.to_sql("launches", conn, index=False, if_exists="replace")


205

## SQL Table Overview

The dataset is loaded into an in-memory SQLite database to enable
SQL-based exploratory analysis.


In [22]:
query = """
PRAGMA table_info(launches);
"""
pd.read_sql(query, conn)


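|   | cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|---|
| 0 | 0 | flight_number | INTEGER | 0 |  | 0 |
| 1 | 1 | name | TEXT | 0 |  | 0 |
| 2 | 2 | date_utc | TEXT | 0 |  | 0 |
| 3 | 3 | success | INTEGER | 0 |  | 0 |
| 4 | 4 | rocket | TEXT | 0 |  | 0 |
| 5 | 5 | launchpad | TEXT | 0 |  | 0 |
| 6 | 6 | payloads | TEXT | 0 |  | 0 |
| 7 | 7 | cores | TEXT | 0 |  | 0 |
| 8 | 8 | landing_success | INTEGER | 0 |  | 0 |
| 9 | 9 | launch_date | TEXT | 0 |  | 0 |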


## Overall Landing Success Rate

Count launches per landing outcome with SQL aggregation; the overall
success rate is the share of rows where `landing_success` is 1.


In [23]:
query = """
SELECT
    landing_success,
    COUNT(*) AS count
FROM launches
GROUP BY landing_success;
"""
pd.read_sql(query, conn)


|   | landing_success | count |
|---|---|---|
| 0 | 0 | 24 |
| 1 | 1 | 181 |


The `landing_success` field is encoded as:
- 1 → Successful landing
- 0 → Failed landing

With 181 of 205 launches coded 1, the overall landing success rate is
about 0.883.
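
The overall rate can also be computed directly in SQL, since the average of a 0/1 column is the proportion of ones. The following is a minimal sketch, not a cell from the original notebook: it rebuilds only the `landing_success` column from the counts reported above (24 zeros, 181 ones) in a separate in-memory database named `demo`, so it runs without the CSV file.

```python
import sqlite3

# Assumption: a stand-in table holding only landing_success, rebuilt from
# the counts shown above (24 zeros, 181 ones). The notebook itself would
# run the same SELECT against its own `launches` table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE launches (landing_success INTEGER)")
demo.executemany(
    "INSERT INTO launches VALUES (?)",
    [(0,)] * 24 + [(1,)] * 181,
)

# AVG over a 0/1 column equals the proportion of rows equal to 1.
total, successes, rate = demo.execute(
    """
    SELECT
        COUNT(*) AS total_launches,
        SUM(landing_success) AS successful_landings,
        ROUND(AVG(landing_success), 3) AS success_rate
    FROM launches;
    """
).fetchone()

print(total, successes, rate)
```

In the notebook, the same `SELECT` could be passed to `pd.read_sql(query, conn)` to get a one-row DataFrame.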

## Landing Success by Launch Site

Analyze how landing success varies across different launch sites.


In [24]:
query = """
SELECT
    launchpad,
    COUNT(*) AS total_launches,
    ROUND(AVG(landing_success), 3) AS success_rate
FROM launches
GROUP BY launchpad
ORDER BY success_rate DESC;
"""
pd.read_sql(query, conn)


|   | launchpad | total_launches | success_rate |
|---|---|---|---|
| 0 | 5e9e4502f509094188566f88 | 58 | 0.948 |
| 1 | 5e9e4502f509092b78566f87 | 30 | 0.9 |
| 2 | 5e9e4501f509094ba4566f84 | 112 | 0.866 |
| 3 | 5e9e4502f5090995de566f86 | 5 | 0.4 |
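
The per-site rates rest on very different sample sizes (5 to 112 launches). One way to make that visible is to put a Wilson score interval around each rate. The sketch below is an illustration under stated assumptions, not part of the original notebook: `wilson_interval` is a hypothetical helper name, and the success counts are back-calculated from the rounded rates (for example, 0.948 × 58 ≈ 55), so they are approximate.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# launchpad id -> (total_launches, successes).
# Assumption: successes are reconstructed from the rounded success_rate
# column, not read from the underlying data.
sites = {
    "5e9e4502f509094188566f88": (58, 55),
    "5e9e4502f509092b78566f87": (30, 27),
    "5e9e4501f509094ba4566f84": (112, 97),
    "5e9e4502f5090995de566f86": (5, 2),
}

intervals = {}
for pad, (n, k) in sites.items():
    low, high = wilson_interval(k, n)
    intervals[pad] = (low, high)
    print(f"{pad}  n={n:<3d}  rate={k / n:.3f}  95% CI=({low:.3f}, {high:.3f})")
```

The interval for the five-launch site comes out much wider than the others, which is a reason to read its 0.4 rate cautiously.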


## Landing Success by Rocket Configuration

Evaluate whether different rocket configurations
are associated with different landing success rates.


In [25]:
query = """
SELECT
    rocket,
    COUNT(*) AS total_launches,
    ROUND(AVG(landing_success), 3) AS success_rate
FROM launches
GROUP BY rocket
ORDER BY success_rate DESC;
"""
pd.read_sql(query, conn)


|   | rocket | total_launches | success_rate |
|---|---|---|---|
| 0 | 5e9d0d95eda69973a809d1ec | 195 | 0.903 |
| 1 | 5e9d0d95eda69974db09d1ed | 5 | 0.6 |
| 2 | 5e9d0d95eda69955f709d1eb | 5 | 0.4 |


## Temporal Trends in Landing Success

Assess how landing success rates evolve over time using SQL.


In [26]:
query = """
SELECT
    SUBSTR(launch_date, 1, 4) AS launch_year,
    COUNT(*) AS total_launches,
    ROUND(AVG(landing_success), 3) AS success_rate
FROM launches
GROUP BY launch_year
ORDER BY launch_year;
"""
pd.read_sql(query, conn)


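|   | launch_year | total_launches | success_rate |
|---|---|---|---|
| 0 | 2006 | 1 | 0.0 |
| 1 | 2007 | 1 | 0.0 |
| 2 | 2008 | 2 | 0.5 |
| 3 | 2009 | 1 | 1.0 |
| 4 | 2010 | 2 | 1.0 |
| 5 | 2012 | 2 | 1.0 |
| 6 | 2013 | 3 | 1.0 |
| 7 | 2014 | 6 | 1.0 |
| 8 | 2015 | 7 | 0.857 |
| 9 | 2016 | 9 | 0.889 |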
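
Extracting the year with `SUBSTR` works only if every `launch_date` value starts with a four-digit year. If the column holds ISO-8601 text (an assumption; the source does not show the raw values), SQLite's `strftime('%Y', ...)` is a possible alternative that returns NULL for text it cannot parse, so malformed dates surface as their own group. The rows below are made up purely for illustration.

```python
import sqlite3

# Hypothetical sample rows, not taken from the real dataset.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE launches (launch_date TEXT, landing_success INTEGER)")
demo.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("2015-01-01", 0),
        ("2015-06-01", 1),
        ("2016-02-01", 1),
        ("2016-05-01", 0),
        ("2016-09-01", 1),
        ("not-a-date", 1),  # malformed on purpose
    ],
)

# strftime yields NULL for the malformed row; SUBSTR would have
# produced the bogus year 'not-' instead.
rows = demo.execute(
    """
    SELECT
        strftime('%Y', launch_date) AS launch_year,
        COUNT(*) AS total_launches,
        ROUND(AVG(landing_success), 3) AS success_rate
    FROM launches
    GROUP BY launch_year
    ORDER BY launch_year;
    """
).fetchall()

for row in rows:
    print(row)
```

SQLite sorts NULL first in ascending order, so any unparseable dates appear at the top of the result where they are easy to spot.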


## High-Confidence Launch Configurations

Identify rocket configurations with a high number of launches
and strong landing success rates.


In [27]:
query = """
SELECT
    rocket,
    COUNT(*) AS total_launches,
    ROUND(AVG(landing_success), 3) AS success_rate
FROM launches
GROUP BY rocket
HAVING total_launches >= 10
ORDER BY success_rate DESC;
"""
pd.read_sql(query, conn)


|   | rocket | total_launches | success_rate |
|---|---|---|---|
| 0 | 5e9d0d95eda69973a809d1ec | 195 | 0.903 |
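
The threshold of 10 launches is hard-coded in the query above, and `HAVING total_launches >= 10` relies on SQLite accepting a column alias in `HAVING`, which not every SQL engine does. A possible variant, sketched below, binds the threshold as a parameter and filters on `COUNT(*)` directly. The function name `configs_with_min_launches` is hypothetical, and the per-rocket success counts are back-calculated from the rounded rates in the earlier output (for example, 0.903 × 195 ≈ 176), so they are assumptions.

```python
import sqlite3

# Assumption: stand-in data rebuilt from the per-rocket totals and rounded
# success rates reported earlier; rocket id -> (launches, successes).
counts = {
    "5e9d0d95eda69973a809d1ec": (195, 176),
    "5e9d0d95eda69974db09d1ed": (5, 3),
    "5e9d0d95eda69955f709d1eb": (5, 2),
}

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE launches (rocket TEXT, landing_success INTEGER)")
for rocket, (n, k) in counts.items():
    demo.executemany(
        "INSERT INTO launches VALUES (?, ?)",
        [(rocket, 1)] * k + [(rocket, 0)] * (n - k),
    )


def configs_with_min_launches(conn, min_launches):
    """Return (rocket, total_launches, success_rate) rows meeting the threshold."""
    return conn.execute(
        """
        SELECT
            rocket,
            COUNT(*) AS total_launches,
            ROUND(AVG(landing_success), 3) AS success_rate
        FROM launches
        GROUP BY rocket
        HAVING COUNT(*) >= ?
        ORDER BY success_rate DESC;
        """,
        (min_launches,),
    ).fetchall()


for row in configs_with_min_launches(demo, 10):
    print(row)
```

With pandas, the same query text and parameter could be passed as `pd.read_sql(query, conn, params=(10,))`.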


## Key SQL Insights

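- Landing success rates differ by launch site, from 0.4 (5 launches) to 0.948 (58 launches).
- The rocket configuration with 195 launches reaches a 0.903 success rate; the other two have only 5 launches each (0.6 and 0.4), so their rates are weak evidence.
- In the yearly output shown, the rate is 0.0 in 2006 and 2007 and at least 0.857 from 2009 onward, which is consistent with operational learning, although every year before 2014 has at most three launches.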

These findings are consistent with the trends observed during
Python-based EDA. They support the feature selection decisions and
provide a structured foundation for feature engineering and
predictive modeling.

## Next Steps

The next phase of the project will focus on building and evaluating
machine learning models to predict landing success.
