# Falcon 9 Launch Data: Wrangling and Exploration

This notebook performs data cleaning and basic exploratory analysis on the Falcon 9 launch dataset collected from the SpaceX API.

In [None]:
import pandas as pd
import numpy as np

## Load Dataset

We start by loading the raw dataset, previously collected from the SpaceX API, from a local CSV file.

In [None]:
df = pd.read_csv("../data/raw/dataset_part_1.csv")

df.head(10)

## Check for Missing Values

We compute the percentage of missing values in each column, then check the data type of each column.

In [None]:
# Percentage of missing values per column
df.isnull().sum() / df.shape[0] * 100
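
The percentage above can also be read as the mean of a boolean mask. The following is a minimal sketch on a made-up toy frame (the values and the `toy` and `missing_pct` names are illustrative assumptions, not part of the launch dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the launch data (values are invented for illustration)
toy = pd.DataFrame({
    "PayloadMass": [6104.0, np.nan, 525.0, np.nan],
    "LandingPad": [None, None, "pad_a", None],
    "Orbit": ["LEO", "GTO", "ISS", "LEO"],
})

# isnull() yields booleans; the column mean is the fraction of missing rows
missing_pct = toy.isnull().mean() * 100

# Sort so the columns with the most missing data come first
print(missing_pct.sort_values(ascending=False))
```

This should give the same numbers as dividing the per-column sum by the row count; which form to use is a matter of taste.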

In [None]:
df.dtypes

## Most Frequent Launch Sites

We use `value_counts()` to explore the most common launch sites.

In [None]:
# Frequency of each launch site
df['LaunchSite'].value_counts()

## Most Common Orbits

Next, we examine the distribution of orbits used in the launches.

In [None]:
# Frequency of each orbit type
df['Orbit'].value_counts()

## Landing Outcome Analysis

The `Outcome` column records the landing result of the booster. We classify the outcomes into successful and failed landings.

In [None]:
# Frequency of each landing outcome
landing_outcomes = df['Outcome'].value_counts()
landing_outcomes

In [None]:
# Display each outcome alongside its index
for i, outcome in enumerate(landing_outcomes.keys()):
    print(i, outcome)

In [None]:
# Define which outcomes count as failures (based on the indices printed above)
bad_outcomes = set(landing_outcomes.keys()[[1, 3, 5, 6, 7]])
bad_outcomes
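
Selecting failures by position depends on the order returned by `value_counts()`, which can change if the data changes. A possible alternative, sketched below, selects them by label instead. It assumes the labels follow a `<result> <landing type>` pattern in which only labels starting with `True` are successes; the sample labels here are hypothetical and should be checked against the values printed above.

```python
import pandas as pd

# Hypothetical outcome labels in an assumed "<result> <landing type>" format
outcomes = pd.Series([
    "True ASDS", "None None", "True RTLS", "False ASDS",
    "True Ocean", "False Ocean", "None ASDS", "False RTLS",
])

# Assumption: a landing is successful only when its label starts with "True"
bad_outcomes = {
    label for label in outcomes.unique() if not label.startswith("True")
}

print(sorted(bad_outcomes))
```

If the column contains missing values, they would need to be handled before calling `startswith`, for example by dropping them first with `dropna()`.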

## Create Binary Landing Success Column

We create a new column `Class` that encodes landing success:
- 1 for successful landing
- 0 for failure

In [None]:
# Binary Class variable: 0 for a failed landing, 1 otherwise
landing_class = []

for _, value in df['Outcome'].items():
    if value in bad_outcomes:
        landing_class.append(0)
    else:
        landing_class.append(1)

df['Class'] = landing_class
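
The loop above can also be expressed as a single vectorized operation. This is a minimal sketch on a toy frame; the outcome labels and the contents of `bad_outcomes` are illustrative assumptions, not the notebook's actual values:

```python
import pandas as pd

# Toy data with hypothetical outcome labels
toy = pd.DataFrame(
    {"Outcome": ["True ASDS", "False Ocean", "None None", "True RTLS"]}
)
bad_outcomes = {"False Ocean", "None None"}

# isin() flags failed landings; negating and casting gives 1 = success, 0 = failure
toy["Class"] = (~toy["Outcome"].isin(bad_outcomes)).astype(int)

print(toy)
```

As in the loop, any value that is not in `bad_outcomes` is encoded as 1.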

In [None]:
df[['Class']].head(8)

In [None]:
# Proportion of successful landings
df["Class"].mean()

## Save Cleaned Dataset

We export the resulting dataframe with the new `Class` column to `dataset_part_2.csv` for future use in modeling.

In [None]:
# Save the dataset with the new Class column
df.to_csv("../data/processed/dataset_part_2.csv", index=False)
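
`to_csv` does not create missing directories, so the export fails if the target folder does not exist yet. The sketch below shows one way to guard against that and to confirm the file reads back cleanly. It uses a temporary directory and a toy frame as stand-ins; the `out_path` and `toy_part_2.csv` names are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame standing in for the cleaned dataset
toy = pd.DataFrame({"Outcome": ["True ASDS", "False Ocean"], "Class": [1, 0]})

# A temporary directory stands in for the project's data folder
out_path = Path(tempfile.mkdtemp()) / "processed" / "toy_part_2.csv"

# Create the target folder if needed, then write without the index column
out_path.parent.mkdir(parents=True, exist_ok=True)
toy.to_csv(out_path, index=False)

# Read the file back to confirm the columns survived the round trip
reloaded = pd.read_csv(out_path)
print(reloaded.columns.tolist())
```

Passing `index=False` keeps the row index out of the file, so reloading it does not introduce an extra unnamed column.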