<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


# **SpaceX Falcon 9 First Stage Landing Prediction**


# Lab 1: Collecting the data


Estimated time needed: **45** minutes


In this capstone, we will predict whether the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers cost upward of 165 million dollars each. Much of the savings comes from SpaceX's ability to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can determine the cost of a launch. This information can be used if an alternate company wants to bid against SpaceX for a rocket launch. In this lab, you will collect data from an API and make sure it is in the correct format. The following is an example of a successful landing.


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/landing_1.gif)


Several examples of an unsuccessful landing are shown here:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/crash.gif)


Most unsuccessful landings are planned. SpaceX performs a controlled landing in the ocean.


## Objectives


In this lab, you will make a GET request to the SpaceX API. You will also do some basic data wrangling and formatting.

- Request to the SpaceX API
- Clean the requested data


----


## Import Libraries and Define Auxiliary Functions


We will import the following libraries into the lab:


In [35]:
# Requests allows us to make HTTP requests which we will use to get data from an API
import requests
# Pandas is a software library written for the Python programming language for data manipulation and analysis.
import pandas as pd
# NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
import numpy as np
# Datetime is a library that allows us to represent dates
import datetime

# Setting this option will print all columns of a dataframe
pd.set_option('display.max_columns', None)
# Setting this option will print all of the data in a feature
pd.set_option('display.max_colwidth', None)

Below we will define a series of helper functions that will help us use the API to extract information using identification numbers in the launch data.

From the <code>rocket</code> column we would like to learn the booster name.


In [57]:
# Takes the dataset and uses the rocket column to call the API and append the data to the list
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/" + str(x)).json()
            BoosterVersion.append(response['name'])
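The helper above sends one request per row, even though many launches share the same rocket ID. A minimal sketch of one way to avoid repeated calls is to cache the result for each unique ID. The names `lookup_names` and `fake_fetch_name` are hypothetical and not part of the lab; `fake_fetch_name` stands in for the real API call so that the example runs offline.

```python
def lookup_names(ids, fetch_name):
    """Return one name per ID, calling fetch_name only once per unique ID."""
    cache = {}
    names = []
    for item_id in ids:
        if item_id not in cache:
            cache[item_id] = fetch_name(item_id)
        names.append(cache[item_id])
    return names


# Hypothetical stand-in for the API call; it records how often it is invoked.
calls = []

def fake_fetch_name(rocket_id):
    calls.append(rocket_id)
    return {"id-falcon1": "Falcon 1", "id-falcon9": "Falcon 9"}[rocket_id]


rocket_ids = ["id-falcon1", "id-falcon1", "id-falcon9", "id-falcon9", "id-falcon9"]
booster_names = lookup_names(rocket_ids, fake_fetch_name)
print(booster_names)
print(len(calls))  # number of simulated API calls
```

With the real API, `fetch_name` could wrap the same `requests.get(...).json()['name']` expression used in `getBoosterVersion`.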

From the <code>launchpad</code> we would like to know the name of the launch site being used, the longitude, and the latitude.


In [58]:
# Takes the dataset and uses the launchpad column to call the API and append the data to the list
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/" + str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

From the <code>payload</code> we would like to learn the mass of the payload and the orbit that it is going to.


In [59]:
# Takes the dataset and uses the payloads column to call the API and append the data to the lists
def getPayloadData(data):
    for load_list in data['payloads']:
        if load_list:
            load = load_list[0]  # take the first payload in the list
            response = requests.get("https://api.spacexdata.com/v4/payloads/" + load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])
        else:
            PayloadMass.append(None)
            Orbit.append(None)

From <code>cores</code> we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.


In [60]:
# Takes the dataset and uses the cores column to call the API and append the data to the lists
def getCoreData(data):
    for core_list in data['cores']:
        if core_list and len(core_list) > 0:
            core = core_list[0]  # take the first core in the list

            if core['core'] is not None:
                response = requests.get("https://api.spacexdata.com/v4/cores/" + core['core']).json()
                Block.append(response['block'])
                ReusedCount.append(response['reuse_count'])
                Serial.append(response['serial'])
            else:
                Block.append(None)
                ReusedCount.append(None)
                Serial.append(None)
            
            Outcome.append(str(core['landing_success']) + ' ' + str(core['landing_type']))
            Flights.append(core['flight'])
            GridFins.append(core['gridfins'])
            Reused.append(core['reused'])
            Legs.append(core['legs'])
            LandingPad.append(core['landpad'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
            Outcome.append(None)
            Flights.append(None)
            GridFins.append(None)
            Reused.append(None)
            Legs.append(None)
            LandingPad.append(None)


Now let's start requesting rocket launch data from the SpaceX API with the following URL:


In [61]:
spacex_url = "https://api.spacexdata.com/v4/launches/past"
response = requests.get(spacex_url)
print(f"Status: {response.status_code}")

data = response.json()
df = pd.DataFrame(data)
print(f"Launches: {len(df)}")


Status: 200
Launches: 187
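The request above prints the status code but continues even if the call fails. A hedged sketch of a small wrapper that sets a timeout and raises on HTTP errors follows. The name `fetch_json` and its `get` parameter are assumptions introduced for illustration; the `get` parameter lets a stand-in replace `requests.get` so the example runs without network access.

```python
def fetch_json(url, get=None, timeout=10):
    """GET a URL and return the decoded JSON body, raising on HTTP errors."""
    if get is None:
        import requests  # imported lazily so a stand-in can be used offline
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.json()


class FakeResponse:
    """Offline stand-in that mimics the parts of requests.Response used above."""
    status_code = 200

    def raise_for_status(self):
        pass

    def json(self):
        return [{"flight_number": 1}, {"flight_number": 2}]


def fake_get(url, timeout):
    return FakeResponse()


launches = fetch_json("https://api.spacexdata.com/v4/launches/past", get=fake_get)
print(len(launches))
```

Calling `fetch_json(spacex_url)` without the `get` argument would use `requests.get` against the live API.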


Check the content of the response


In [70]:
print("\n1️⃣ Extracting booster version...")
getBoosterVersion(df)
print(f"   ✅ {len(BoosterVersion)} boosters extracted")

print("\n2️⃣ Extracting launch site data...")
getLaunchSite(df)
print(f"   ✅ {len(LaunchSite)} launch sites extracted")

print("\n3️⃣ Extracting payload data...")
getPayloadData(df)
print(f"   ✅ {len(PayloadMass)} payloads extracted")

print("\n4️⃣ Extracting core data...")
getCoreData(df)
print(f"   ✅ {len(Outcome)} cores extracted")



1️⃣ Extracting booster version...
   ✅ 187 boosters extracted

2️⃣ Extracting launch site data...
   ✅ 187 launch sites extracted

3️⃣ Extracting payload data...
   ✅ 187 payloads extracted

4️⃣ Extracting core data...
   ✅ 187 cores extracted


In [73]:
print(f"DataFrame length: {len(df)}")
print(f"DataFrame shape: {df.shape}")
print(f"\nFirst 5 rows:")
print(df[['flight_number', 'rocket', 'date_utc']].head())
print(f"\nLast 5 rows:")
print(df[['flight_number', 'rocket', 'date_utc']].tail())


DataFrame length: 187
DataFrame shape: (187, 27)

First 5 rows:
   flight_number                    rocket                  date_utc
0              1  5e9d0d95eda69955f709d1eb  2006-03-24T22:30:00.000Z
1              2  5e9d0d95eda69955f709d1eb  2007-03-21T01:10:00.000Z
2              3  5e9d0d95eda69955f709d1eb  2008-08-03T03:34:00.000Z
3              4  5e9d0d95eda69955f709d1eb  2008-09-28T23:15:00.000Z
4              5  5e9d0d95eda69955f709d1eb  2009-07-13T03:35:00.000Z

Last 5 rows:
     flight_number                    rocket                  date_utc
182            183  5e9d0d95eda69973a809d1ec  2022-09-05T02:09:00.000Z
183            184  5e9d0d95eda69973a809d1ec  2022-09-11T01:10:00.000Z
184            185  5e9d0d95eda69973a809d1ec  2022-09-17T01:05:00.000Z
185            186  5e9d0d95eda69973a809d1ec  2022-09-24T23:30:00.000Z
186            187  5e9d0d95eda69973a809d1ec  2022-10-05T16:00:00.000Z


In [71]:
# Show the first 5 results
print("\nFirst 5 cores:")
for i in range(5):
    print(f"{i}: Block={Block[i]}, Serial={Serial[i]}, Outcome={Outcome[i]}")



First 5 cores:
0: Block=None, Serial=Merlin1A, Outcome=None None
1: Block=None, Serial=Merlin2A, Outcome=None None
2: Block=None, Serial=Merlin1C, Outcome=None None
3: Block=None, Serial=Merlin2C, Outcome=None None
4: Block=None, Serial=Merlin3C, Outcome=None None


You should see that the response contains a large amount of information about SpaceX launches. Next, let's try to discover some more relevant information for this project.


### Task 1: Request and parse the SpaceX launch data using the GET request


To make the requested JSON results more consistent, we will use the following static response object for this project:


In [11]:
# Use json_normalize method to convert the json result into a dataframe

We should see that the request was successful with the 200 status response code


In [15]:
# Get the head of the dataframe

Now we decode the response content as JSON using <code>.json()</code> and turn it into a Pandas dataframe using <code>.json_normalize()</code>


In [80]:
# Convert the response to JSON
data = response.json()
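To illustrate what `json_normalize` does with nested JSON, here is a minimal sketch on two hand-written records. The records are made up for illustration and only loosely resemble the API output; the point is that nested dictionaries become dotted column names.

```python
import pandas as pd

# Hand-written records (values are made up, not real API data).
records = [
    {"flight_number": 1, "rocket": "r1", "links": {"webcast": "a", "wikipedia": "b"}},
    {"flight_number": 2, "rocket": "r2", "links": {"webcast": "c", "wikipedia": "d"}},
]

# Nested keys are flattened into columns such as "links.webcast".
flat = pd.json_normalize(records)
print(flat.columns.tolist())
print(flat.head())
```

Fields that hold lists (such as `cores` and `payloads` in the launch data) are left as list objects in a single column, which is why they need separate handling later.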

Using the dataframe <code>data</code> print the first 5 rows


In [81]:
print("\n" + "="*70)
print("DATA ANALYSIS")
print("="*70)

# 1. Reuse rate
reused_count = df_final['Reused'].sum()
total = len(df_final)
reuse_rate = (reused_count / total) * 100

print(f"\n1️⃣ REUSE:")
print(f"   Reused boosters: {reused_count} / {total}")
print(f"   Reuse rate: {reuse_rate:.1f}%")

# 2. Booster types
print(f"\n2️⃣ BOOSTER TYPES:")
booster_counts = df_final['Booster Version'].value_counts()
for booster, count in booster_counts.items():
    pct = (count / total) * 100
    print(f"   • {booster}: {count} ({pct:.1f}%)")

# 3. Launch sites
print(f"\n3️⃣ LAUNCH SITES (TOP 5):")
launch_sites = df_final['Launch Site'].value_counts().head(5)
for site, count in launch_sites.items():
    print(f"   • {site}: {count}")

# 4. Most common orbits
print(f"\n4️⃣ MOST COMMON ORBITS:")
orbits = df_final['Orbit'].value_counts().head(5)
for orbit, count in orbits.items():
    print(f"   • {orbit}: {count}")

# 5. Mean payload mass
payload_mass_numeric = pd.to_numeric(df_final['Payload Mass'], errors='coerce')
print(f"\n5️⃣ PAYLOAD MASS:")
print(f"   Minimum: {payload_mass_numeric.min():.0f} kg")
print(f"   Maximum: {payload_mass_numeric.max():.0f} kg")
print(f"   Mean: {payload_mass_numeric.mean():.0f} kg")

# 6. Landing pads
print(f"\n6️⃣ LANDING PADS:")
landing_pads = df_final['LandingPad'].value_counts().head(5)
for pad, count in landing_pads.items():
    if pad and str(pad) != 'nan':
        print(f"   • {pad}: {count}")

print("\n" + "="*70)
print("✅ LAB COMPLETED SUCCESSFULLY!")
print("="*70)



DATA ANALYSIS

1️⃣ REUSE:
   Reused boosters: 115 / 187
   Reuse rate: 61.5%

2️⃣ BOOSTER TYPES:
   • Falcon 9: 179 (95.7%)
   • Falcon 1: 5 (2.7%)
   • Falcon Heavy: 3 (1.6%)

3️⃣ LAUNCH SITES (TOP 5):
   • CCSFS SLC 40: 99
   • KSC LC 39A: 55
   • VAFB SLC 4E: 28
   • Kwajalein Atoll: 5

4️⃣ MOST COMMON ORBITS:
   • VLEO: 58
   • GTO: 35
   • ISS: 33
   • LEO: 20
   • PO: 14

5️⃣ PAYLOAD MASS:
   Minimum: 20 kg
   Maximum: 15600 kg
   Mean: 7868 kg

6️⃣ LANDING PADS:
   • 5e9e3032383ecb6bb234e7ca: 64
   • 5e9e3033383ecbb9e534e7cc: 40
   • 5e9e3033383ecb075134e7cd: 21
   • 5e9e3032383ecb267a34e7c7: 18
   • 5e9e3032383ecb554034e7c9: 6

✅ LAB COMPLETED SUCCESSFULLY!


In [84]:
# Save the final DataFrame
df_final.to_csv('spacex_api_data.csv', index=False)

print("\n✅ File 'spacex_api_data.csv' saved successfully!")
print(f"   Total records: {len(df_final)}")
print(f"   Total columns: {len(df_final.columns)}")



✅ File 'spacex_api_data.csv' saved successfully!
   Total records: 187
   Total columns: 17


You will notice that a lot of the data are IDs. For example, the rocket column has no information about the rocket, just an identification number.

We will now use the API again to get information about the launches using the IDs given for each launch. Specifically we will be using columns <code>rocket</code>, <code>payloads</code>, <code>launchpad</code>, and <code>cores</code>.


In [85]:
import datetime

print("="*70)
print("DATA CLEANING AND FILTERING")
print("="*70)

# STEP 1: Back up the original DataFrame
df_raw = df.copy()
print(f"\n1️⃣ Backup created:")
print(f"   Original rows: {len(df_raw)}")

# STEP 2: Keep only the important columns
data = df[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']].copy()
print(f"\n2️⃣ Columns selected (6 columns)")

# STEP 3: Remove rows with multiple cores (Falcon Heavy has 2 extra cores)
print(f"\n3️⃣ Filtering multiple cores:")
print(f"   Before: {len(data)} rows")
data = data[data['cores'].map(len) == 1]
print(f"   After: {len(data)} rows")
print(f"   Removed: {len(df_raw) - len(data)} rows (Falcon Heavy)")

# STEP 4: Remove rows with multiple payloads
print(f"\n4️⃣ Filtering multiple payloads:")
print(f"   Before: {len(data)} rows")
# .copy() avoids a SettingWithCopyWarning when columns are reassigned below
data = data[data['payloads'].map(len) == 1].copy()
print(f"   After: {len(data)} rows")
print(f"   Removed: rows with multiple payloads")

# STEP 5: Extract the single value from each list
print(f"\n5️⃣ Extracting values from lists:")
data['cores'] = data['cores'].map(lambda x: x[0])
data['payloads'] = data['payloads'].map(lambda x: x[0])
print(f"   ✅ Cores extracted (from list to dict)")
print(f"   ✅ Payloads extracted (from list to string)")

# STEP 6: Convert dates
print(f"\n6️⃣ Converting dates:")
data['date'] = pd.to_datetime(data['date_utc']).dt.date
print(f"   ✅ Column 'date' created in YYYY-MM-DD format")
print(f"   First date: {data['date'].min()}")
print(f"   Last date: {data['date'].max()}")

# STEP 7: Filter by date (up to November 13, 2020)
print(f"\n7️⃣ Filtering by date (up to 2020-11-13):")
print(f"   Before: {len(data)} rows")
data = data[data['date'] <= datetime.date(2020, 11, 13)]
print(f"   After: {len(data)} rows")
print(f"   Removed: rows after 2020-11-13")

print(f"\n" + "="*70)
print(f"✅ CLEANING COMPLETE!")
print(f"="*70)
print(f"\n📊 Final summary:")
print(f"   Rows in the cleaned dataset: {len(data)}")
print(f"   Columns: {list(data.columns)}")
print(f"   Period: {data['date'].min()} to {data['date'].max()}")

# STEP 8: View the result
print(f"\n📋 First 5 rows of the cleaned dataset:")
print(data.head())

print(f"\n📈 Dataset information:")
data.info()  # info() prints directly and returns None


DATA CLEANING AND FILTERING

1️⃣ Backup created:
   Original rows: 187

2️⃣ Columns selected (6 columns)

3️⃣ Filtering multiple cores:
   Before: 187 rows
   After: 184 rows
   Removed: 3 rows (Falcon Heavy)

4️⃣ Filtering multiple payloads:
   Before: 184 rows
   After: 172 rows
   Removed: rows with multiple payloads

5️⃣ Extracting values from lists:
   ✅ Cores extracted (from list to dict)
   ✅ Payloads extracted (from list to string)

6️⃣ Converting dates:
   ✅ Column 'date' created in YYYY-MM-DD format
   First date: 2006-03-24
   Last date: 2022-10-05

7️⃣ Filtering by date (up to 2020-11-13):
   Before: 172 rows
   After: 94 rows
   Removed: rows after 2020-11-13

✅ CLEANING COMPLETE!

📊 Final summary:
   Rows in the cleaned dataset: 94
   Columns: ['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc', 'date']
   Period: 2006-03-24 to 2020-11-05

📋 First 5 rows of the

In [86]:
# Save the cleaned dataset
data.to_csv('spacex_data_cleaned.csv', index=False)
print("✅ Cleaned dataset saved as 'spacex_data_cleaned.csv'")

# Show statistics
print(f"\n📊 STATISTICS:")
print(f"   Data reduction: {(1 - len(data)/len(df_raw))*100:.1f}%")
print(f"   Rows kept: {len(data)} of {len(df_raw)}")


✅ Cleaned dataset saved as 'spacex_data_cleaned.csv'

📊 STATISTICS:
   Data reduction: 49.7%
   Rows kept: 94 of 187
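The cleaning steps above (keep single-core and single-payload rows, unwrap the lists, convert the date, filter by date) can be sketched on a small toy DataFrame. The rows below are invented for illustration; only the column names and the cutoff date come from the lab.

```python
import datetime
import pandas as pd

# Toy rows (made up): one valid row, one with two cores, one with two payloads,
# and one dated after the cutoff.
toy = pd.DataFrame({
    "cores": [[{"core": "c1"}], [{"core": "c2"}, {"core": "c3"}], [{"core": "c4"}], [{"core": "c5"}]],
    "payloads": [["p1"], ["p2"], ["p3", "p4"], ["p5"]],
    "date_utc": ["2019-01-01T00:00:00.000Z", "2019-06-01T00:00:00.000Z",
                 "2019-07-01T00:00:00.000Z", "2021-01-01T00:00:00.000Z"],
})

# Keep rows whose lists hold exactly one element, then unwrap the lists.
clean = toy[toy["cores"].map(len) == 1]
clean = clean[clean["payloads"].map(len) == 1].copy()
clean["cores"] = clean["cores"].map(lambda x: x[0])
clean["payloads"] = clean["payloads"].map(lambda x: x[0])

# Convert the timestamp string to a date and apply the cutoff.
clean["date"] = pd.to_datetime(clean["date_utc"]).dt.date
clean = clean[clean["date"] <= datetime.date(2020, 11, 13)]
print(clean)
```

Only the first toy row survives all three filters.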


* From the <code>rocket</code> we would like to learn the booster name

* From the <code>payload</code> we would like to learn the mass of the payload and the orbit that it is going to

* From the <code>launchpad</code> we would like to know the name of the launch site being used, the longitude, and the latitude.

* **From <code>cores</code> we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.**

The data from these requests will be stored in lists and will be used to create a new dataframe.


In [87]:
#Global variables 
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

These functions will apply the outputs globally to the above variables. Let's take a look at the <code>BoosterVersion</code> variable. Before we apply <code>getBoosterVersion</code> the list is empty:


In [21]:
BoosterVersion

[]

Now, let's apply the <code>getBoosterVersion</code> function to get the booster version


In [88]:
# Define getBoosterVersion (it is called in a later cell)
def getBoosterVersion(dataset):
    for x in dataset['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/" + str(x)).json()
            BoosterVersion.append(response['name'])

The list is updated once the function is called. The cell above only defines the function, so the list is still empty here:


In [89]:
BoosterVersion[0:5]

[]

We can apply the rest of the functions here:


In [90]:
# Define getLaunchSite (it is called in a later cell)
def getLaunchSite(dataset):
    for x in dataset['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/" + str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

In [92]:
# Redefine getCoreData for the cleaned data, where each 'cores' entry is a single dict
def getCoreData(dataset):
    for core_dict in dataset['cores']:
        if core_dict and core_dict['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/" + core_dict['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)

# Run the functions
print("\n1️⃣ Extracting booster versions...")
getBoosterVersion(data)
print(f"   ✅ {len(BoosterVersion)} extracted")

print("\n2️⃣ Extracting launch sites...")
getLaunchSite(data)
print(f"   ✅ {len(LaunchSite)} extracted")

print("\n3️⃣ Extracting payload data...")
getPayloadData(data)
print(f"   ✅ {len(PayloadMass)} extracted")

print("\n4️⃣ Extracting core data...")
getCoreData(data)
print(f"   ✅ {len(Block)} extracted")

print("\n✅ ALL FEATURES EXTRACTED!")


1️⃣ Extracting booster versions...
   ✅ 94 extracted

2️⃣ Extracting launch sites...
   ✅ 94 extracted

3️⃣ Extracting payload data...
   ✅ 94 extracted

4️⃣ Extracting core data...
   ✅ 94 extracted

✅ ALL FEATURES EXTRACTED!


In [26]:
# Call getCoreData
getCoreData(data)

Finally, let's construct our dataset using the data we have obtained. We combine the columns into a dictionary.


In [93]:
launch_dict = {'FlightNumber': list(data['flight_number']),
'Date': list(data['date']),
'BoosterVersion':BoosterVersion,
'PayloadMass':PayloadMass,
'Orbit':Orbit,
'LaunchSite':LaunchSite,
'Outcome':Outcome,
'Flights':Flights,
'GridFins':GridFins,
'Reused':Reused,
'Legs':Legs,
'LandingPad':LandingPad,
'Block':Block,
'ReusedCount':ReusedCount,
'Serial':Serial,
'Longitude': Longitude,
'Latitude': Latitude}


Then, we need to create a Pandas data frame from the dictionary launch_dict.
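Building a DataFrame from a dictionary of lists only works if every list has the same length; pandas raises an error otherwise. A minimal sketch of a length check that reports which column is off is shown here. The helper name `frame_from_lists` and the toy values are hypothetical.

```python
import pandas as pd


def frame_from_lists(columns):
    """Build a DataFrame from a dict of lists after checking the lengths match."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return pd.DataFrame(columns)


# Toy values (made up) using a few of the lab's column names.
toy_dict = {
    "FlightNumber": [1, 2, 3],
    "BoosterVersion": ["Falcon 1", "Falcon 9", "Falcon 9"],
    "PayloadMass": [20.0, None, 525.0],
}
toy_df = frame_from_lists(toy_dict)
print(toy_df.shape)
```

A check like this is useful when the lists are filled by separate helper functions, because an empty or double-filled list is then reported by name.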


In [95]:
print("\n" + "="*70)
print("CREATING FINAL DATAFRAME")
print("="*70)

# Create a DataFrame with all features
X = pd.DataFrame({
    'FlightNumber': data['flight_number'].values,
    'Date': data['date'].values,
    'BoosterVersion': BoosterVersion,
    'LaunchSite': LaunchSite,
    'Longitude': Longitude,
    'Latitude': Latitude,
    'PayloadMass': PayloadMass,
    'Orbit': Orbit,
    'Block': Block,
    'ReusedCount': ReusedCount,
    'Serial': Serial
})

print(f"\n✅ Final DataFrame created!")
print(f"   Dimensions: {X.shape[0]} rows x {X.shape[1]} columns")

# View the result
print(f"\n📋 First 10 rows:")
print(X.head(10))

print(f"\n📈 Information:")
X.info()  # info() prints directly and returns None

print(f"\n📊 Summary statistics:")
print(X.describe())



CREATING FINAL DATAFRAME

✅ Final DataFrame created!
   Dimensions: 94 rows x 11 columns

📋 First 10 rows:
   FlightNumber        Date BoosterVersion       LaunchSite   Longitude  \
0             1  2006-03-24       Falcon 1  Kwajalein Atoll  167.743129   
1             2  2007-03-21       Falcon 1  Kwajalein Atoll  167.743129   
2             4  2008-09-28       Falcon 1  Kwajalein Atoll  167.743129   
3             5  2009-07-13       Falcon 1  Kwajalein Atoll  167.743129   
4             6  2010-06-04       Falcon 9     CCSFS SLC 40  -80.577366   
5             8  2012-05-22       Falcon 9     CCSFS SLC 40  -80.577366   
6            10  2013-03-01       Falcon 9     CCSFS SLC 40  -80.577366   
7            11  2013-09-29       Falcon 9      VAFB SLC 4E -120.610829   
8            12  2013-12-03       Falcon 9     CCSFS SLC 40  -80.577366   
9            13  2014-01-06       Falcon 9     CCSFS SLC 40  -80.577366   

    Latitude  PayloadMass Orbit  Block  ReusedCount    

Show the summary of the dataframe


In [96]:
print("\n" + "="*70)
print("EXTRACTING TARGET (Landing Success)")
print("="*70)

# Create a list to store the landing result
LandingClass = []

for core_dict in data['cores']:
    if core_dict['landing_success'] == True:
        LandingClass.append(1)  # Success
    else:
        LandingClass.append(0)  # Failure (this branch also covers None values)

# Add as a column in X
X['LandingClass'] = LandingClass

print(f"\n✅ Target added to the DataFrame!")
print(f"\n📊 Landing distribution:")
print(f"   Successes (1): {sum(LandingClass)}")
print(f"   Failures (0): {len(LandingClass) - sum(LandingClass)}")
print(f"   Success rate: {(sum(LandingClass)/len(LandingClass)*100):.1f}%")

print(f"\n📋 Last 10 rows with target:")
print(X.tail(10))



EXTRACTING TARGET (Landing Success)

✅ Target added to the DataFrame!

📊 Landing distribution:
   Successes (1): 60
   Failures (0): 34
   Success rate: 63.8%

📋 Last 10 rows with target:
    FlightNumber        Date BoosterVersion    LaunchSite  Longitude  \
84            96  2020-06-13       Falcon 9  CCSFS SLC 40 -80.577366   
85            97  2020-06-30       Falcon 9  CCSFS SLC 40 -80.577366   
86            98  2020-07-20       Falcon 9  CCSFS SLC 40 -80.577366   
87           100  2020-08-18       Falcon 9  CCSFS SLC 40 -80.577366   
88           101  2020-08-30       Falcon 9  CCSFS SLC 40 -80.577366   
89           102  2020-09-03       Falcon 9    KSC LC 39A -80.603956   
90           103  2020-10-06       Falcon 9    KSC LC 39A -80.603956   
91           104  2020-10-18       Falcon 9    KSC LC 39A -80.603956   
92           105  2020-10-24       Falcon 9  CCSFS SLC 40 -80.577366   
93           106  2020-11-05       Falcon 9  CCSFS SLC 40 -80.577366   
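The target above is derived by mapping each core's `landing_success` value to 1 or 0. A minimal sketch of the same mapping on a toy Series is shown here; the three dictionaries are invented. It also makes visible that a `None` value (no recorded landing outcome) is counted as 0, the same as an explicit failure.

```python
import pandas as pd

# Toy core records (made up) covering the three possible values.
toy_cores = pd.Series([
    {"landing_success": True, "landing_type": "ASDS"},
    {"landing_success": False, "landing_type": "Ocean"},
    {"landing_success": None, "landing_type": None},
])

# 1 only when landing_success is exactly True; False and None both map to 0.
landing_class = toy_cores.map(lambda core: 1 if core["landing_success"] is True else 0)
print(landing_class.tolist())
```

Whether launches without a landing attempt should count as failures is a modeling choice; this sketch simply mirrors the loop in the lab.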

  

In [97]:
# Save the DataFrame
X.to_csv('spacex_data_final.csv', index=False)
print("\n✅ Final dataset saved as 'spacex_data_final.csv'")

print(f"\n📊 FINAL SUMMARY:")
print(f"   Data rows: {len(X)}")
print(f"   Features: {len(X.columns)-1}")
print(f"   Target (LandingClass): Yes")
print(f"   Period: 2006 - 2020")
print(f"   Ready for Machine Learning: ✅")



✅ Final dataset saved as 'spacex_data_final.csv'

📊 FINAL SUMMARY:
   Data rows: 94
   Features: 11
   Target (LandingClass): Yes
   Period: 2006 - 2020
   Ready for Machine Learning: ✅


In [98]:
# 1. Exploratory Data Analysis (EDA)
print("\n1️⃣ EXPLORATORY ANALYSIS:")
print(f"   Unique boosters: {X['BoosterVersion'].nunique()}")
print(f"   Launch sites: {X['LaunchSite'].nunique()}")
print(f"   Distinct orbits: {X['Orbit'].nunique()}")

# 2. Visualizations
print(f"\n2️⃣ VISUALIZATIONS:")
print("   Booster Version distribution")
print("   Launch Site distribution")
print("   Landing Success rate by LaunchSite")

# 3. Machine Learning
print(f"\n3️⃣ MACHINE LEARNING:")
print("   Train/Test split")
print("   Model training (Decision Tree, SVM, etc)")
print("   Model evaluation")



1️⃣ EXPLORATORY ANALYSIS:
   Unique boosters: 2
   Launch sites: 4
   Distinct orbits: 11

2️⃣ VISUALIZATIONS:
   Booster Version distribution
   Launch Site distribution
   Landing Success rate by LaunchSite

3️⃣ MACHINE LEARNING:
   Train/Test split
   Model training (Decision Tree, SVM, etc)
   Model evaluation


### Task 2: Filter the dataframe to only include `Falcon 9` launches


Finally we will remove the Falcon 1 launches keeping only the Falcon 9 launches. Filter the data dataframe using the <code>BoosterVersion</code> column to only keep the Falcon 9 launches. Save the filtered data to a new dataframe called <code>data_falcon9</code>.


In [None]:
# Hint data['BoosterVersion']!='Falcon 1'


Now that we have removed some values we should reset the FlightNumber column
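As a minimal sketch of this task, the filter, index reset, and renumbering can be tried on a toy DataFrame first. The four rows below are made up; the column names and the filter condition follow the lab.

```python
import pandas as pd

# Toy data (made up): two Falcon 1 rows followed by two Falcon 9 rows.
toy = pd.DataFrame({
    "FlightNumber": [1, 2, 6, 8],
    "BoosterVersion": ["Falcon 1", "Falcon 1", "Falcon 9", "Falcon 9"],
})

# Drop Falcon 1 rows and rebuild a 0-based index.
toy_falcon9 = toy[toy["BoosterVersion"] != "Falcon 1"].reset_index(drop=True)

# Renumber the flights consecutively, starting at 1.
toy_falcon9["FlightNumber"] = range(1, len(toy_falcon9) + 1)
print(toy_falcon9)
```

Using `reset_index(drop=True)` discards the old index instead of keeping it as an extra column.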


In [100]:
print("="*70)
print("FILTERING FALCON 9 ONLY")
print("="*70)

# STEP 1: View the booster distribution
print("\n1️⃣ Booster distribution before filtering:")
print(X['BoosterVersion'].value_counts())

# STEP 2: Keep only Falcon 9
print(f"\n2️⃣ Filtering data:")
print(f"   Before: {len(X)} rows")

# Use the hint: X['BoosterVersion'] != 'Falcon 1'
data_falcon9 = X[X['BoosterVersion'] != 'Falcon 1'].copy()

print(f"   After: {len(data_falcon9)} rows")
print(f"   Removed: {len(X) - len(data_falcon9)} rows (Falcon 1)")

# STEP 3: Reset the index
print(f"\n3️⃣ Resetting index:")
print(f"   Old index: {data_falcon9.index.tolist()[:10]}...")

data_falcon9 = data_falcon9.reset_index(drop=True)

print(f"   New index: {data_falcon9.index.tolist()[:10]}...")
print(f"   ✅ Index reset from 0 to {len(data_falcon9)-1}")

# STEP 4: Reset the FlightNumber column (renumber from 1)
print(f"\n4️⃣ Resetting FlightNumber column:")
print(f"   Old numbers: {data_falcon9['FlightNumber'].tolist()[:10]}")

# Renumber from 1 to the total number of rows
data_falcon9['FlightNumber'] = range(1, len(data_falcon9) + 1)

print(f"   New numbers: {data_falcon9['FlightNumber'].tolist()[:10]}")
print(f"   ✅ FlightNumber reset from 1 to {len(data_falcon9)}")

# STEP 5: Verify the result
print(f"\n5️⃣ Final check:")
print(f"   Unique boosters: {data_falcon9['BoosterVersion'].unique()}")
print(f"   Total rows: {len(data_falcon9)}")
print(f"   Index starts at: {data_falcon9.index[0]}")
print(f"   Index ends at: {data_falcon9.index[-1]}")

# STEP 6: View the result
print(f"\n📋 First 10 rows of data_falcon9:")
print(data_falcon9.head(10))

print(f"\n📋 Last 5 rows:")
print(data_falcon9.tail())

# STEP 7: Information
print(f"\n📈 Final dataset information:")
data_falcon9.info()  # info() prints directly and returns None

print("\n" + "="*70)
print("✅ FALCON 9 FILTERED AND READY!")
print("="*70)


FILTERING FALCON 9 ONLY

1️⃣ Booster distribution before filtering:
Falcon 9    90
Falcon 1     4
Name: BoosterVersion, dtype: int64

2️⃣ Filtering data:
   Before: 94 rows
   After: 90 rows
   Removed: 4 rows (Falcon 1)

3️⃣ Resetting index:
   Old index: [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]...
   New index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]...
   ✅ Index reset from 0 to 89

4️⃣ Resetting FlightNumber column:
   Old numbers: [6, 8, 10, 11, 12, 13, 14, 15, 16, 17]
   New numbers: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
   ✅ FlightNumber reset from 1 to 90

5️⃣ Final check:
   Unique boosters: ['Falcon 9']
   Total rows: 90
   Index starts at: 0
   Index ends at: 89

📋 First 10 rows of data_falcon9:
   FlightNumber        Date BoosterVersion    LaunchSite   Longitude  \
0             1  2010-06-04       Falcon 9  CCSFS SLC 40  -80.577366   
1             2  2012-05-22       Falcon 9  CCSFS SLC 40  -80.57736

In [101]:
# Save data_falcon9
data_falcon9.to_csv('data_falcon9.csv', index=False)
print("\n✅ DataFrame 'data_falcon9' saved as 'data_falcon9.csv'")

# Summary
print(f"\n📊 SUMMARY:")
print(f"   Rows: {len(data_falcon9)}")
print(f"   Columns: {len(data_falcon9.columns)}")
print(f"   Falcon 9 only: ✅")
print(f"   Index reset: ✅")
print(f"   FlightNumber corrected: ✅")
print(f"   Ready for next task: ✅")



✅ DataFrame 'data_falcon9' saved as 'data_falcon9.csv'

📊 SUMMARY:
   Rows: 90
   Columns: 12
   Falcon 9 only: ✅
   Index reset: ✅
   FlightNumber corrected: ✅
   Ready for next task: ✅


## Data Wrangling


We can see below that some of the rows are missing values in our dataset.


In [103]:
print("="*70)
print("DATA CLEANING - DATA WRANGLING")
print("="*70)

# Copy the data
data_processed = data_falcon9.copy()

print("\n📊 Missing values BEFORE:")
print(data_processed.isnull().sum())

# Fill missing numeric values with 0
# NOTE: filling PayloadMass with 0 here leaves no NaN values for the mean replacement in Task 3
data_processed['PayloadMass'] = data_processed['PayloadMass'].fillna(0)
data_processed['Block'] = data_processed['Block'].fillna(0)
data_processed['ReusedCount'] = data_processed['ReusedCount'].fillna(0)

print("\n✅ Missing values AFTER:")
print(data_processed.isnull().sum())

# Check the result
if data_processed.isnull().sum().sum() == 0:
    print("\n🎉 DATASET COMPLETELY CLEAN!")
    print(f"   Total rows: {len(data_processed)}")
    print(f"   Total columns: {len(data_processed.columns)}")
else:
    print(f"\n⚠️ There are still {data_processed.isnull().sum().sum()} missing values")

# Save
data_processed.to_csv('data_falcon9_processed.csv', index=False)
print("\n✅ Dataset saved as 'data_falcon9_processed.csv'")


DATA CLEANING - DATA WRANGLING

📊 Missing values BEFORE:
FlightNumber      0
Date              0
BoosterVersion    0
LaunchSite        0
Longitude         0
Latitude          0
PayloadMass       5
Orbit             0
Block             0
ReusedCount       0
Serial            0
LandingClass      0
dtype: int64

✅ Missing values AFTER:
FlightNumber      0
Date              0
BoosterVersion    0
LaunchSite        0
Longitude         0
Latitude          0
PayloadMass       0
Orbit             0
Block             0
ReusedCount       0
Serial            0
LandingClass      0
dtype: int64

🎉 DATASET COMPLETELY CLEAN!
   Total rows: 90
   Total columns: 12

✅ Dataset saved as 'data_falcon9_processed.csv'


Before we can continue we must deal with these missing values. The <code>LandingPad</code> column will retain None values to represent when landing pads were not used.


### Task 3: Dealing with Missing Values


Calculate the mean of the <code>PayloadMass</code> column below using <code>.mean()</code>. Then use the <code>.replace()</code> function to replace the `np.nan` values in that column with the mean you calculated.
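A minimal sketch of mean imputation on a toy DataFrame; the values are made up for illustration and are not taken from the SpaceX data. It also shows why the order of operations matters: if the missing entries are filled with 0 first, `.mean()` includes those zeros and `.replace()` has no `np.nan` left to fill.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({"PayloadMass": [500.0, np.nan, 1500.0, np.nan, 4000.0]})

# .mean() skips NaN by default, so this averages the three known values
mean_payload = toy["PayloadMass"].mean()

# Replace every NaN with that mean
imputed = toy["PayloadMass"].replace(np.nan, mean_payload)

# For comparison: zero-filling first drags the mean down
zero_filled_mean = toy["PayloadMass"].fillna(0).mean()

print(mean_payload)        # 2000.0
print(imputed.tolist())    # [500.0, 2000.0, 1500.0, 2000.0, 4000.0]
print(zero_filled_mean)    # 1200.0
```

Computing the mean before any other fill keeps the imputed value representative of the observed payloads.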


In [104]:
# Calculate the mean value of PayloadMass column
import numpy as np

print("="*70)
print("TASK 3: DEALING WITH MISSING VALUES")
print("="*70)

# STEP 1: Calculate the mean of PayloadMass
print("\n1️⃣ Calculating the mean of PayloadMass:")
mean_PayloadMass = data_processed['PayloadMass'].mean()
print(f"   Mean: {mean_PayloadMass:.2f}")

# Count how many NaN values there are
nan_count_before = data_processed['PayloadMass'].isnull().sum()
print(f"   Missing values (NaN): {nan_count_before}")

# STEP 2: Fill NaN with the mean using .replace()
print("\n2️⃣ Filling NaN with the mean:")
data_processed['PayloadMass'] = data_processed['PayloadMass'].replace(np.nan, mean_PayloadMass)

# Alternative: fillna()
# data_processed['PayloadMass'] = data_processed['PayloadMass'].fillna(mean_PayloadMass)

# STEP 3: Check the result
nan_count_after = data_processed['PayloadMass'].isnull().sum()
print(f"   Missing values after: {nan_count_after}")
print(f"   ✅ {nan_count_before} values filled!")

# STEP 4: Inspect the data
print("\n3️⃣ Data sample:")
print(data_processed['PayloadMass'].head(20))

print("\n4️⃣ PayloadMass statistics:")
print(f"   Minimum: {data_processed['PayloadMass'].min():.2f}")
print(f"   Maximum: {data_processed['PayloadMass'].max():.2f}")
print(f"   Mean: {data_processed['PayloadMass'].mean():.2f}")
print(f"   Median: {data_processed['PayloadMass'].median():.2f}")
print(f"   Standard deviation: {data_processed['PayloadMass'].std():.2f}")

# STEP 5: Final check across the whole dataset
print("\n5️⃣ Final check:")
print(f"   Total NaN in the whole dataset: {data_processed.isnull().sum().sum()}")

if data_processed.isnull().sum().sum() == 0:
    print("\n✅ DATASET COMPLETELY CLEAN!")
else:
    print("\n⚠️ There are still missing values in other columns")

print("\n" + "="*70)
print("✅ TASK 3 COMPLETE!")
print("="*70)

# STEP 6: Save the updated data
data_processed.to_csv('data_falcon9_processed.csv', index=False)
print("\n✅ Updated dataset saved as 'data_falcon9_processed.csv'")


TASK 3: DEALING WITH MISSING VALUES

1️⃣ Calculating the mean of PayloadMass:
   Mean: 5783.35
   Missing values (NaN): 0

2️⃣ Filling NaN with the mean:
   Missing values after: 0
   ✅ 0 values filled!

3️⃣ Data sample:
0        0.0
1      525.0
2      677.0
3      500.0
4     3170.0
5     3325.0
6     2296.0
7     1316.0
8     4535.0
9     4428.0
10    2216.0
11    2395.0
12     570.0
13    1898.0
14    4707.0
15    2477.0
16    2034.0
17     553.0
18    5271.0
19    3136.0
Name: PayloadMass, dtype: float64

4️⃣ PayloadMass statistics:
   Minimum: 0.00
   Maximum: 15600.00
   Mean: 5783.35
   Median: 4315.00
   Standard deviation: 4937.86

5️⃣ Final check:
   Total NaN in the whole dataset: 0

✅ DATASET COMPLETELY CLEAN!

✅ TASK 3 COMPLETE!

✅ Updated dataset saved as 'data_falcon9_processed.csv'


In [105]:
# Check the cleaned copy; data_falcon9 itself was never modified
data_processed['PayloadMass'].isnull().sum()


0

In [106]:
# Export the cleaned copy so the CSV contains no missing PayloadMass values
data_processed.to_csv('dataset_part_1.csv', index=False)

You should see the number of missing values in <code>PayloadMass</code> change to zero.


Now we should have no missing values in our dataset except for in <code>LandingPad</code>.
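One way to confirm this is to count missing values per column while leaving <code>LandingPad</code> out of the check. The sketch below uses a small made-up DataFrame; the rows and the pad name are hypothetical.

```python
import pandas as pd

# Hypothetical rows, for illustration only
toy = pd.DataFrame({
    "PayloadMass": [525.0, 677.0, 3170.0],
    "Orbit": ["LEO", "ISS", "GTO"],
    "LandingPad": [None, "pad_a", None],  # None marks launches with no landing pad
})

# Count missing values in every column except LandingPad
missing_elsewhere = toy.drop(columns=["LandingPad"]).isnull().sum()

print(missing_elsewhere.sum())            # 0
print(toy["LandingPad"].isnull().sum())   # 2
```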


We can now export it to a <b>CSV</b> for the next section, but to make the answers consistent, the next lab will provide data in a pre-selected date range.


<code>data_falcon9.to_csv('dataset_part_1.csv', index=False)</code>
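As a small sketch of what `index=False` does, the example below writes a made-up DataFrame to an in-memory buffer instead of a file and reads it back. The `io.StringIO` buffer and the toy values are assumptions for illustration.

```python
import io
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({"FlightNumber": [1, 2], "PayloadMass": [525.0, 677.0]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)   # index=False omits the row-index column

csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])   # FlightNumber,PayloadMass

# Reading the text back reproduces the same columns and values
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(toy))       # True
```

Without `index=False`, the CSV would gain an extra unnamed first column holding the row index.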


## Authors


<a href="https://www.linkedin.com/in/joseph-s-50398b136/">Joseph Santarcangelo</a> has a PhD in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.


<!--## Change Log
-->


<!--

|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2020-09-20|1.1|Joseph|get result each time you run|
|2020-09-20|1.1|Azim |Created Part 1 Lab using SpaceX API|
|2020-09-20|1.0|Joseph |Modified Multiple Areas|
-->


Copyright © IBM Corporation. All rights reserved.
