

<h1 style="text-align:left;padding:12px;border-radius:4px;color:#fff;margin-top:16px;background:linear-gradient(90deg,#114477,#114477,#000);box-shadow:0 0 0px #333;">Andrés Salamone Lacunza</h1>

> **Date Completed:** 2025-05-21
> **Note:** This notebook is an adaptation of the original work by JonathanMClark, modified to reflect my own contributions and updates for the IBM Data Science Professional Certificate Capstone Project.

# **SpaceX Falcon 9 First Stage Landing Prediction**
## Lab 1: Collecting the Data

Estimated time needed: **45** minutes


In this capstone, we will predict whether the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars each. Much of the savings comes from SpaceX being able to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch. This information is useful to a competing company that wants to bid against SpaceX for a rocket launch. In this lab, you will collect data from an API and make sure it is in the correct format. The following is an example of a successful launch.


#### Successful Landing Example
![Successful Landing](images/successful_landing.jpg)


Several examples of an unsuccessful landing are shown here:


![Unsuccessful Landing](images/unsuccessful_landing.jpg)


Most unsuccessful landings are planned: in those cases SpaceX performs a controlled landing in the ocean.


<h3 style="width:50%;padding:12px;border-radius:4px;color:#fff;margin-top:16px;background:linear-gradient(90deg,#992200,#000);box-shadow:0 0 0px #333;">Objectives</h3>

* Get data from SpaceX API
* Clean/Wrangle/Format the data set

<h3 style="width:50%;padding:12px;border-radius:4px;color:#fff;margin-top:16px;background:linear-gradient(90deg,#009922,#000);box-shadow:0 0 0px #333;">Import Libraries</h3>

In [1]:
import requests
import pandas as pd
import numpy as np
import datetime

# Configure pandas to display all columns and full column content for better visibility of the data
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

print("All libraries have been imported.")

All libraries have been imported.


<h3 style="width:50%;padding:12px;border-radius:4px;color:#fff;margin-top:16px;background:linear-gradient(90deg,#009922,#000);box-shadow:0 0 0px #333;">Define Useful Functions</h3>

> **Note** <br>
> These code bits were provided.

In [2]:
# NOTE: This code was provided.
# Takes the dataset and uses the rocket column to call the API and append the booster name to the global BoosterVersion list
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/"+str(x)).json()
            BoosterVersion.append(response['name'])

From the <code>launchpad</code> we would like to know the name of the launch site being used, the longitude, and the latitude.


In [3]:
# NOTE: This code was provided.
# Takes the dataset and uses the launchpad column to call the API and append the longitude, latitude, and site name to the global lists
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/"+str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

From the <code>payloads</code> column we would like to learn the mass of the payload and the orbit that it is going to.


In [4]:
# NOTE: This code was provided.
# Takes the dataset and uses the payloads column to call the API and append the payload mass and orbit to the global lists
def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get("https://api.spacexdata.com/v4/payloads/"+load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

From <code>cores</code> we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to distinguish versions of cores), the number of times this specific core has been reused, and the serial of the core.


In [5]:
# NOTE: This code was provided.
# Takes the dataset and uses the cores column to call the API and append the data about the cores to the global lists
def getCoreData(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/"+core['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            # No core ID: keep the lists aligned with placeholder values
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        # These fields come straight from the launch record, so no API call is needed
        Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
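
The `Outcome` label in `getCoreData` is built by joining `landing_success` and `landing_type` with a space. The following is a minimal sketch of that step using hand-written records; the sample values and the helper name `outcome_label` are illustrative assumptions, not API output or part of the provided code.

```python
# Hand-written core records that mimic the shape of the `cores` entries.
# The values are illustrative assumptions, not real API responses.
sample_cores = [
    {"landing_success": True, "landing_type": "ASDS"},
    {"landing_success": False, "landing_type": "Ocean"},
    {"landing_success": None, "landing_type": None},
]


def outcome_label(core):
    """Build the label the same way getCoreData does (hypothetical helper)."""
    return str(core["landing_success"]) + " " + str(core["landing_type"])


labels = [outcome_label(core) for core in sample_cores]
print(labels)  # ['True ASDS', 'False Ocean', 'None None']
```

Because `str(None)` is `'None'`, a launch with no landing attempt ends up with the label `None None` rather than a missing value.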

<h3 style="width:50%;padding:12px;border-radius:4px;color:#fff;margin-top:16px;background:linear-gradient(90deg,#0099bb,#000);box-shadow:0 0 0px #333;">Start Here</h3>

<h3 style="width:50%;padding:12px;border-radius:4px;color:#fff;margin-top:16px;background:linear-gradient(90deg,#009922,#000);box-shadow:0 0 0px #333;">DataFrame of Launch Data - All</h3>

<h3 style="width:15%;padding:4px 16px 4px 32px;border-radius:4px;color:#fff;margin-top:6px;background:linear-gradient(90deg,#ff5500,#551100,#ff5500);box-shadow:0 0 0px #333;text-align:center;">Task 1</h3>

### Task 1: Request and Parse the SpaceX Launch Data Using the GET Request


In [6]:
# Request the static JSON snapshot of the SpaceX launch data and convert it into a DataFrame
static_json_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'
response = requests.get(static_json_url)
response_json = response.json()
data_initial = pd.json_normalize(response_json)
data_initial.head(1)

In [7]:
data_initial.shape

In [8]:
# View column names
pd.DataFrame(data_initial.columns)

**Personal comment:** In this task, I used the provided API to collect SpaceX launch data and converted it into a DataFrame using `pd.json_normalize`. I learned how to handle JSON data and extract the information relevant for later analysis.
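
To illustrate what `pd.json_normalize` does with nested JSON, here is a small sketch on two hand-written records. The field names and values are assumptions chosen for illustration and do not reproduce the real API response.

```python
import pandas as pd

# Two hand-written records; field names and values are illustrative assumptions.
records = [
    {"flight_number": 1, "links": {"patch": {"small": "a.png"}}, "cores": [{"core": "abc"}]},
    {"flight_number": 2, "links": {"patch": {"small": "b.png"}}, "cores": [{"core": "def"}]},
]

# Nested dictionaries become dotted column names; lists are left as-is.
flat = pd.json_normalize(records)
print(sorted(flat.columns))
print(type(flat.loc[0, "cores"]))
```

Nested dictionaries are flattened into dotted column names such as `links.patch.small`, while list-valued fields such as `cores` stay as Python lists inside a single column.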


<h3 style="width:50%;padding:12px;border-radius:4px;color:#fff;margin-top:16px;background:linear-gradient(90deg,#009922,#000);box-shadow:0 0 0px #333;">DataFrame of Launch Data - Selected Information</h3>

You will notice that much of the data consists of IDs. For example, the rocket column has no information about the rocket, just an identification number.

We will now use the API again to get information about the launches using the IDs given for each launch. Specifically, we will be using the columns <code>rocket</code>, <code>payloads</code>, <code>launchpad</code>, and <code>cores</code>.


In [9]:
# Take a subset of the DataFrame, keeping only the features we want plus flight_number and date_utc.
data = data_initial[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# Remove rows with multiple cores (Falcon rockets with 2 extra boosters) and rows with multiple payloads in a single rocket.
data = data[data['cores'].map(len)==1]
# .copy() makes the subset independent, which avoids SettingWithCopyWarning on the assignments below
data = data[data['payloads'].map(len)==1].copy()

# Since payloads and cores are now lists of size 1, extract the single value from each list and replace the feature.
data['cores'] = data['cores'].map(lambda x : x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])

# Convert date_utc to a datetime and keep only the date, dropping the time
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# Using the date, restrict the launches to those on or before 2020-11-13
data = data[data['date'] <= datetime.date(2020, 11, 13)]
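
The single-core, single-payload filter above can be tried on a toy DataFrame. This is a sketch under assumed data: the three rows below are made up and only mimic the list-valued `cores` and `payloads` columns.

```python
import pandas as pd

# Made-up rows that mimic the list-valued columns; not real launch data.
toy = pd.DataFrame({
    "flight_number": [1, 2, 3],
    "cores": [[{"core": "a"}], [{"core": "b"}, {"core": "c"}], [{"core": "d"}]],
    "payloads": [["p1"], ["p2"], ["p3", "p4"]],
})

# Keep only rows where both lists have exactly one element.
mask = (toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)
single = toy[mask].copy()

# Unwrap the single element of each list.
single["cores"] = single["cores"].map(lambda x: x[0])
single["payloads"] = single["payloads"].map(lambda x: x[0])
print(single)
```

Only flight 1 survives: flight 2 has two cores and flight 3 has two payloads.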

* From the <code>rocket</code> we would like to learn the booster name.

* From the <code>payloads</code> we would like to learn the mass of the payload and the orbit that it is going to.

* From the <code>launchpad</code> we would like to know the name of the launch site being used, the longitude, and the latitude.

* From <code>cores</code> we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to distinguish versions of cores), the number of times this specific core has been reused, and the serial of the core.

The data from these requests will be stored in lists and will be used to create a new dataframe.
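
As a minimal sketch of this list-building pattern, the example below collects one name per ID into a list and caches repeated IDs so each one is fetched only once. The names `collect_names` and `fake_fetch` are hypothetical and not part of the provided helpers; the stub stands in for `requests.get(...).json()` so the example runs offline, and the IDs are made up.

```python
def collect_names(ids, fetch):
    """Return one name per non-empty ID, calling `fetch` once per distinct ID."""
    cache = {}
    names = []
    for item_id in ids:
        if not item_id:
            continue  # mirrors the `if x:` guard in the provided helpers
        if item_id not in cache:
            cache[item_id] = fetch(item_id)["name"]
        names.append(cache[item_id])
    return names


# Stub standing in for the network call; IDs and the lookup table are made up.
calls = []


def fake_fetch(item_id):
    calls.append(item_id)
    return {"name": {"r1": "Falcon 1", "r9": "Falcon 9"}[item_id]}


names = collect_names(["r1", "r9", "r9", None, "r9"], fake_fetch)
print(names)       # ['Falcon 1', 'Falcon 9', 'Falcon 9', 'Falcon 9']
print(len(calls))  # 2
```

Note that skipping empty IDs, as the provided helpers also do, makes the resulting list shorter than the input whenever an ID is missing, so the lists only line up with the DataFrame when every row has an ID.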


In [10]:
# Set global variables to be empty lists
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

In [11]:
# Confirm list to be empty
BoosterVersion

In [12]:
# Call getBoosterVersion
getBoosterVersion(data)

In [13]:
# Call getLaunchSite
getLaunchSite(data)

In [14]:
# Call getPayloadData
getPayloadData(data)

In [15]:
# Call getCoreData
getCoreData(data)

In [16]:
# The lists have now been updated
BoosterVersion[0:5]

In [17]:
# Combine the columns into a dictionary
launch_dict = {'FlightNumber': list(data['flight_number']),
               'Date': list(data['date']),
               'BoosterVersion': BoosterVersion,
               'PayloadMass': PayloadMass,
               'Orbit': Orbit,
               'LaunchSite': LaunchSite,
               'Outcome': Outcome,
               'Flights': Flights,
               'GridFins': GridFins,
               'Reused': Reused,
               'Legs': Legs,
               'LandingPad': LandingPad,
               'Block': Block,
               'ReusedCount': ReusedCount,
               'Serial': Serial,
               'Longitude': Longitude,
               'Latitude': Latitude}

In [18]:
# Create a DataFrame from launch_dict
launch_df = pd.DataFrame(launch_dict)
launch_df.head(3)

In [19]:
launch_df.shape

<h3 style="width:15%;padding:4px 16px 4px 32px;border-radius:4px;color:#fff;margin-top:6px;background:linear-gradient(90deg,#ff5500,#551100,#ff5500);box-shadow:0 0 0px #333;text-align:center;">Task 2</h3>

### Task 2: Filter the DataFrame to Only Include `Falcon 9` Launches


In [20]:
# Quantify types of booster versions.
launch_df['BoosterVersion'].value_counts()

In [21]:
# Filter launches to include only Falcon 9
data_falcon_9 = launch_df.loc[launch_df['BoosterVersion'].isin(['Falcon 9'])].copy()

# Reset the FlightNumber column
data_falcon_9['FlightNumber'] = list(range(1, data_falcon_9.shape[0]+1))
data_falcon_9.head(2)

In [22]:
# Confirm that only the Falcon 9 booster is included.
data_falcon_9['BoosterVersion'].value_counts()

**Personal comment:** I filtered the data to include only Falcon 9 launches, which let me focus on the objective of the project. I also renumbered `FlightNumber` so that it is sequential, which makes the launches easier to track.
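
The filter-and-renumber step can be checked on a small example. This sketch assumes a made-up five-row DataFrame; only the column names match the notebook.

```python
import pandas as pd

# Made-up rows; only the column names match the notebook.
toy = pd.DataFrame({
    "FlightNumber": [1, 2, 4, 5, 6],
    "BoosterVersion": ["Falcon 1", "Falcon 1", "Falcon 9", "Falcon 1", "Falcon 9"],
})

# Keep Falcon 9 rows and work on an independent copy.
falcon_9 = toy.loc[toy["BoosterVersion"] == "Falcon 9"].copy()

# Renumber flights 1..n; the DataFrame index itself is left unchanged.
falcon_9["FlightNumber"] = list(range(1, falcon_9.shape[0] + 1))
print(falcon_9)
```

The `FlightNumber` column becomes 1 and 2, while the index still shows the original row positions (2 and 4). Calling `reset_index(drop=True)` would renumber the index as well, if that is wanted.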


In [23]:
data_falcon_9.shape

In [24]:
data_falcon_9.describe()

## Data Wrangling


In [25]:
# There are some missing values in the dataset
data_falcon_9.isnull().sum()

<h3 style="width:15%;padding:4px 16px 4px 32px;border-radius:4px;color:#fff;margin-top:6px;background:linear-gradient(90deg,#ff5500,#551100,#ff5500);box-shadow:0 0 0px #333;text-align:center;">Task 3</h3>

### Task 3: Dealing with Missing Values


Below, calculate the mean of <code>PayloadMass</code> using <code>.mean()</code>. Then use the <code>.replace()</code> function to replace the `np.nan` values in that column with the mean you calculated. The <code>LandingPad</code> column will retain its None values, which represent launches where no landing pad was used.


In [26]:
# Calculate the mean value of the values in the PayloadMass column and replace the np.nan values with this mean value
mean = data_falcon_9['PayloadMass'].mean()
data_falcon_9['PayloadMass'] = data_falcon_9['PayloadMass'].replace(np.nan, mean)

In [27]:
# There are now no missing values for 'PayloadMass'. We keep the 'None' values in the 'LandingPad' column to represent when landing pads were not used.
data_falcon_9.isnull().sum()

**Personal comment:** Handling null values was an important step in ensuring data quality. I replaced the null values in `PayloadMass` with the mean, which let me keep the dataset complete for later analysis.
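
A small worked example of mean imputation, on made-up values. This sketch uses `fillna`, a swapped-in alternative to the `.replace(np.nan, mean)` call in the cell above; both fill the missing entries with the same value.

```python
import numpy as np
import pandas as pd

# Made-up values; only the column names match the notebook.
toy = pd.DataFrame({
    "PayloadMass": [500.0, np.nan, 1500.0],
    "LandingPad": [None, "pad-a", None],
})

# .mean() skips NaN by default, so this is (500 + 1500) / 2.
mean_mass = toy["PayloadMass"].mean()

# Fill only the PayloadMass column; LandingPad keeps its None values.
toy["PayloadMass"] = toy["PayloadMass"].fillna(mean_mass)
print(mean_mass)
print(toy.isnull().sum())
```

After the fill, `PayloadMass` has no missing values, while `LandingPad` still reports two, which is the intended result.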


<h3 style="width:50%;padding:12px;border-radius:4px;color:#fff;margin-top:16px;background:linear-gradient(90deg,#009922,#000);box-shadow:0 0 0px #333;">Export DataFrame to .CSV</h3>

> **Note** <br>
> dataset_part_1.csv

In [28]:
# Export DataFrame as .csv
data_falcon_9.to_csv('dataset_part_1.csv', index=False)
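
Since `LandingPad` keeps its None values, it can help to know how they survive the export. The sketch below writes a made-up two-row DataFrame to an in-memory buffer (an assumption made so that no file is created) and reads it back.

```python
import io

import pandas as pd

# Made-up rows; only the column names match the notebook.
toy = pd.DataFrame({"FlightNumber": [1, 2], "LandingPad": [None, "pad-a"]})

# Write to an in-memory buffer instead of a file; index=False omits the row index.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back to see how the missing value is restored.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.isnull().sum())
```

The missing landing pad is written as an empty field and is read back as NaN, so `isnull()` still detects it in the next lab.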