### Final Project Part 1

In this capstone, we will predict whether the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars each. Much of the savings comes from SpaceX being able to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch. This information can be used if an alternate company wants to bid against SpaceX for a rocket launch. In this lab, you will collect data from an API and make sure it is in the correct format.

Objectives
In this lab, you will make a GET request to the SpaceX API. You will also do some basic data wrangling and formatting.
+ Request to the SpaceX API
+ Clean the requested data

## Import Libraries and Define Auxiliary Functions


In [1]:
from mis_funciones import check_valores_nulos,calcular_media,remplazar_por_media
import pandas as pd  # Import the pandas library for data manipulation and analysis
import numpy as np  # Import the numpy library for numerical operations
import requests  # Import the requests library to make HTTP requests
import datetime  # Import the datetime module to work with date and time


In [2]:
# Configure pandas to not truncate columns when printing
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.expand_frame_repr', False)  # Prevent the DataFrame from expanding into multiple lines
pd.set_option('display.max_rows', None)  # Show all rows
pd.set_option('display.width', None)  # Automatically adjust display width
pd.set_option('display.max_colwidth', None)  # Maximum column width without truncation

# Configure pandas to not display scientific notation
pd.set_option('display.float_format', '{:.2f}'.format)  # Display numbers with two decimal places

Below we will define a series of helper functions that will help us use the API to extract information using identification numbers in the launch data.

+ From the <code>rocket</code> column we would like to learn the booster name.

In [3]:
# The following function uses the 'rocket' column of a dataset to obtain information
# from the SpaceX API about each rocket. To do this, it iterates over each rocket identifier,
# constructs a specific URL by concatenating the identifier with the base URL, sends an HTTP GET request,
# and finally extracts the rocket name from the obtained JSON to store it in the BoosterVersion list.
def getBoosterVersion(data):
	# Iterate over each rocket identifier in the 'rocket' column of the dataset.
	for x in data['rocket']:
		# Check that the identifier 'x' has a valid value (i.e., it is not None or empty).
		if x:
			# URL construction:
			# - A fixed base URL is used: "https://api.spacexdata.com/v4/rockets/"
			# - The identifier 'x' is converted to a string with str(x) to ensure it is text.
			# - The base URL is concatenated with the converted identifier to form a complete and specific URL.
			#   For example, if x is "5e9d0d95eda69955f709d1eb", the resulting URL will be:
			#   "https://api.spacexdata.com/v4/rockets/5e9d0d95eda69955f709d1eb"
			# This URL is used to request data for the rocket corresponding to that identifier.
			response = requests.get("https://api.spacexdata.com/v4/rockets/" + str(x)).json()
			
			# Response processing:
			# - The API returns a response in JSON format containing various rocket data.
			# - The .json() method converts that response into a Python dictionary.
			# - The value associated with the 'name' key is extracted, which contains the rocket name (e.g., "Falcon 9").
			# - This name is added to the global BoosterVersion list.
			BoosterVersion.append(response['name'])
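
Many launches share the same rocket identifier, so a loop like the one above repeats identical requests. The following is a minimal sketch of one way to fetch each distinct identifier only once. The names `lookup_names` and `fetch_json` are hypothetical and not part of the original notebook; `fetch_json` stands in for a call such as `requests.get(url).json()`, and a fake version is used here so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List, Optional


def lookup_names(ids: Iterable[Optional[str]],
                 fetch_json: Callable[[str], dict]) -> List[str]:
    """Return the 'name' field for each non-empty id, fetching each distinct id once."""
    cache: Dict[str, str] = {}
    names: List[str] = []
    for identifier in ids:
        if not identifier:
            continue  # same guard as `if x:` in the helper functions
        if identifier not in cache:
            cache[identifier] = fetch_json(identifier)["name"]
        names.append(cache[identifier])
    return names


# Hypothetical stand-in for the API: a dictionary lookup that records each call.
calls: List[str] = []


def fake_fetch(identifier: str) -> dict:
    calls.append(identifier)
    return {"name": {"r1": "Falcon 1", "r9": "Falcon 9"}[identifier]}


names = lookup_names(["r1", "r9", "r9", None, "r9"], fake_fetch)
print(names)       # ['Falcon 1', 'Falcon 9', 'Falcon 9', 'Falcon 9']
print(len(calls))  # 2
```

Note that skipping empty identifiers, as the original helpers also do, makes the result shorter than the input whenever an identifier is missing.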


From the launchpad we would like to know the name of the launch site being used, the longitude, and the latitude.

In [4]:
# This function receives a dataset (data) and uses the 'launchpad' column
# which contains unique identifiers for launchpads.
# For each identifier, it constructs a specific URL for the SpaceX API,
# sends a GET request to retrieve the corresponding data, and then extracts:
# - The longitude
# - The latitude
# - The name of the launch site
# Finally, it adds this information to three separate lists: Longitude, Latitude, and LaunchSite.

def getLaunchSite(data):
	# Iterate over each launchpad identifier in the 'launchpad' column
	for x in data['launchpad']:
		# Check that the identifier is not empty or null (i.e., it has a valid value)
		if x:
			# Dynamic URL construction:
			# - Start with the base URL for the SpaceX API for launchpads:
			#   "https://api.spacexdata.com/v4/launchpads/"
			# - Convert the identifier 'x' to text (in case it is a number or another type).
			# - Concatenate it with the base URL, forming a complete and unique URL for each launchpad.
			#   For example, if x = "5e9e4502f5090995de566f86", the final URL will be:
			#   "https://api.spacexdata.com/v4/launchpads/5e9e4502f5090995de566f86"
			response = requests.get("https://api.spacexdata.com/v4/launchpads/" + str(x)).json()

			# Process the response:
			# - The API response is converted into a Python dictionary using .json().
			# - The dictionary contains data about the launchpad.
			# - Extract the following specific fields:
			#   - 'longitude': geographical longitude of the site
			#   - 'latitude': geographical latitude of the site
			#   - 'name': name of the launch site (e.g., "Cape Canaveral", "VAFB SLC 4E")
			# - Add each field to its corresponding list.
			Longitude.append(response['longitude'])  # Store the longitude of the site
			Latitude.append(response['latitude'])    # Store the latitude of the site
			LaunchSite.append(response['name'])      # Store the name of the site


From the payload we would like to learn the mass of the payload and the orbit that it is going to.

In [5]:
# This function takes a dataset (data) as input, specifically using the 'payloads' column,
# which contains unique identifiers for payloads associated with each launch.
# For each identifier in the column:
# 1. Dynamically constructs a URL to access the SpaceX API.
# 2. Makes an HTTP GET request to retrieve the payload details.
# 3. Extracts two key pieces of data: the payload mass in kilograms and the target orbit.
# 4. Adds this information to the PayloadMass and Orbit lists, respectively.

def getPayloadData(data):
	# Iterate over each payload identifier in the 'payloads' column
	for load in data['payloads']:
		# Verify that the identifier exists and is not empty (i.e., it has a valid value)
		if load:
			# Dynamic URL construction:
			# - Start with the base URL of the SpaceX API for accessing payloads:
			#   "https://api.spacexdata.com/v4/payloads/"
			# - Then concatenate the unique identifier (load) at the end of this URL.
			#   This allows us to form a custom URL pointing to the information for that specific payload.
			#   For example, if load = "5eb0e4b5b6c3bb0006eeb1e1", the final URL will be:
			#   "https://api.spacexdata.com/v4/payloads/5eb0e4b5b6c3bb0006eeb1e1"
			response = requests.get("https://api.spacexdata.com/v4/payloads/" + load).json()

			# JSON response processing:
			# - The .json() function converts the response into a Python dictionary.
			# - From this dictionary, we extract:
			#     - 'mass_kg': which represents the total mass of the payload in kilograms.
			#     - 'orbit': which represents the type of orbit the payload was sent to (e.g., "LEO", "GTO", etc.).
			# - Then we save this data into their respective lists.
			PayloadMass.append(response['mass_kg'])  # Add the payload mass (in kg)
			Orbit.append(response['orbit'])         # Add the type of orbit the payload was sent to


From the cores (main boosters), we would like to learn the outcome of the landing, the type of landing, the number of flights made with that core, whether gridfins (aerodynamic control surfaces) were used, whether the core was reused (used in more than one launch), whether landing legs were used, the landing pad used, the block of the core (a number used to distinguish different versions of cores), the number of times this specific core has been reused, and the serial number of the core (unique identifier of the booster).

In [6]:
# ----------------------------------------------------------------------------------
# Function: getCoreData
# ----------------------------------------------------------------------------------
# This function receives a DataFrame (called 'data') and loops through the 'cores' column.
# Each entry in that column is a dictionary with information about the core (rocket booster)
# used in a SpaceX launch.
# The function extracts different data points from each core and stores them in
# lists that are defined outside this function (such as Block, ReusedCount, etc.).
# ----------------------------------------------------------------------------------

def getCoreData(data):
    
    # Loop through each row (each dictionary) in the 'cores' column
    for core in data['cores']:
        
        # If the value for 'core' ID is NOT None, it means it's a valid core
        if core['core'] is not None:
            
            # Make a request to the SpaceX API using the core ID
            # The response is a dictionary with detailed information about the core
            response = requests.get("https://api.spacexdata.com/v4/cores/" + core['core']).json()
            
            # Extract the 'block' number from the response and save it in the Block list
            Block.append(response['block'])
            
            # Save how many times the core was reused in the ReusedCount list
            ReusedCount.append(response['reuse_count'])
            
            # Save the serial number of the core in the Serial list
            Serial.append(response['serial'])
        
        else:
            # If the core ID is None (no data), store None in the corresponding lists
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        
        # Combine landing success ('landing_success') and landing type ('landing_type')
        # and store it as a single string in the Outcome list
        # Example: "True RTLS", "False Ocean", etc.
        Outcome.append(str(core['landing_success']) + ' ' + str(core['landing_type']))
        
        # Save the number of flights the core has made (from the current data) to the Flights list
        Flights.append(core['flight'])
        
        # Save whether the core used grid fins to the GridFins list
        GridFins.append(core['gridfins'])
        
        # Save whether the core was reused to the Reused list
        Reused.append(core['reused'])
        
        # Save whether the core had landing legs to the Legs list
        Legs.append(core['legs'])
        
        # Save the ID of the landing pad to the LandingPad list
        LandingPad.append(core['landpad'])
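
To make the `Outcome` format concrete, here is a small sketch of the string concatenation used in `getCoreData`. The three core dictionaries are illustrative values, not rows taken from the API.

```python
# Illustrative core entries; only the two keys used for the outcome are included.
core_examples = [
    {"landing_success": True, "landing_type": "RTLS"},
    {"landing_success": False, "landing_type": "Ocean"},
    {"landing_success": None, "landing_type": None},
]

# Same expression as in getCoreData: str() turns True/False/None into text.
outcomes = [
    str(core["landing_success"]) + " " + str(core["landing_type"])
    for core in core_examples
]
print(outcomes)  # ['True RTLS', 'False Ocean', 'None None']
```

The last case shows why launches without a landing attempt appear as `None None` in the outputs further down.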



Now let's start requesting rocket launch data from the SpaceX API with the following URL:

In [7]:
spacex_url="https://api.spacexdata.com/v4/launches/past"

In [8]:
response = requests.get(spacex_url)

In [27]:
print(response.content[:100])

b'[{"fairings": {"reused": false, "recovery_attempt": false, "recovered": false, "ships": []}, "links"'


### Task 1: Request and parse the SpaceX launch data using the GET request

To make the requested JSON results more consistent, we will use the following static response object for this project:

In [10]:
static_json_url='https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'

# We should see that the request was successful, with a 200 status response code
response=requests.get(static_json_url)
response.status_code




200

#### Now we decode the response content as JSON using <code>.json()</code> and turn it into a Pandas dataframe using <code>pd.json_normalize()</code>

In [11]:
data_json=response.json()
data=pd.json_normalize(data_json)
data.head(1)

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,tbd,net,window,rocket,success,details,crew,ships,capsules,payloads,launchpad,auto_update,failures,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142553600.0,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,True,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/3c/0e/T8iJcSN3_o.png,https://images2.imgbox.com/40/e3/GypSkayF_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
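
The dotted column names in the output above (for example `fairings.reused`) come from flattening nested dictionaries. A minimal sketch with a hand-written record, whose values are illustrative only, shows the behaviour: nested dictionaries become dotted columns, while lists are kept as single cell values.

```python
import pandas as pd

# A hand-written record shaped like one launch entry (values are illustrative).
record = [{
    "flight_number": 1,
    "fairings": {"reused": False, "recovered": False},
    "links": {"patch": {"small": "small.png"}},
    "payloads": ["5eb0e4b5b6c3bb0006eeb1e1"],
}]

flat = pd.json_normalize(record)

# Nested keys are joined with '.', lists stay as they are.
print(sorted(flat.columns))
print(flat.loc[0, "payloads"])
```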


You will notice that much of the data consists of IDs. For example, the rocket column has no information about the rocket, just an identification number.

We will now use the API again to get information about the launches using the IDs given for each launch. Specifically we will be using columns <code>rocket</code>, <code>payloads</code>, <code>launchpad</code>, and <code>cores</code>.


In [12]:
# ----------------------------------------------------------------------------------
# Selecting relevant columns:
# ----------------------------------------------------------------------------------
# Create a subset of the original DataFrame by retaining only the columns that contain
# the information relevant to the analysis. We keep:
#   - 'rocket'        : Identifier of the rocket used in the launch.
#   - 'payloads'      : List of identifiers of the payloads associated with the launch.
#   - 'launchpad'     : Identifier of the launchpad.
#   - 'cores'         : List of dictionaries describing the rocket cores (boosters) used in the launch.
#   - 'flight_number' : Unique flight number assigned to the launch.
#   - 'date_utc'      : Date and time of the launch in UTC format.
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]
data.head(1)

Unnamed: 0,rocket,payloads,launchpad,cores,flight_number,date_utc
0,5e9d0d95eda69955f709d1eb,[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",1,2006-03-24T22:30:00.000Z


In [13]:
# ----------------------------------------------------------------------------------
# Filtering rows to simplify the analysis:
# ----------------------------------------------------------------------------------
# In the original DataFrame, each cell of the 'cores' column is a list of dictionaries
# and each cell of the 'payloads' column is a list of payload identifiers. Since launches
# involving multiple cores or payloads can complicate the analysis, we keep only the
# launches that have exactly one core and one payload. We achieve this by filtering rows
# based on the length of the lists in these columns.

# ------------------------------------------------------------------------------
# Step 1: Filter launches with exactly one core ('cores')
# ------------------------------------------------------------------------------
# - data['cores'].map(len):
#     This applies the len() function to each cell in the 'cores' column.
#     Since each cell is a list, it returns the number of elements in that list 
#     (i.e., the number of cores for that launch).
#
# - data['cores'].map(len) == 1:
#     This compares the length to 1, producing a boolean Series where True indicates 
#     that the list contains exactly one element (a single core).
#
# - data[data['cores'].map(len) == 1]:
#     This filters the DataFrame to keep only the rows where the 'cores' column has 
#     exactly one core.
data = data[data['cores'].map(len) == 1]

# ------------------------------------------------------------------------------
# Step 2: Filter launches with exactly one payload ('payloads')
# ------------------------------------------------------------------------------
# The same process is applied to the 'payloads' column. This line filters the DataFrame
# to retain only the rows where the 'payloads' list has exactly one element.
# The result is a DataFrame where:
#   - Each launch has exactly one core in the 'cores' column.
#   - Each launch has exactly one payload in the 'payloads' column.
data = data[data['payloads'].map(len) == 1].copy()  # .copy() avoids SettingWithCopyWarning in the column assignments below

# ------------------------------------------------------------------------------
# Step 3: Extract the single element from the 'cores' and 'payloads' lists
# ------------------------------------------------------------------------------
# Since each cell in 'cores' and 'payloads' now contains a list with exactly one element,
# we extract that single value (a dictionary for 'cores', an identifier string for 'payloads')
# and replace the list with the value itself.
#
# - For 'cores', the lambda function lambda x: x[0] returns the first (and only) element of the list.
# - The same is done for 'payloads'.
data['cores'] = data['cores'].map(lambda x: x[0])
data['payloads'] = data['payloads'].map(lambda x: x[0])

# ------------------------------------------------------------------------------
# Step 4: Convert 'date_utc' to a datetime object and extract only the date
# ------------------------------------------------------------------------------
# The 'date_utc' column contains the launch date and time as a string.
# - pd.to_datetime(data['date_utc']) converts this string to a datetime object.
# - The dt.date accessor extracts just the date component (year, month, day), removing the time.
# A new column 'date' is created with this date information.
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# ------------------------------------------------------------------------------
# Step 5: Filter the DataFrame based on the launch date
# ------------------------------------------------------------------------------
# Using the newly created 'date' column, we filter the DataFrame to keep only the launches
# that occurred on or before November 13, 2020. This helps narrow the analysis to a specific
# timeframe.
data = data[data['date'] <= datetime.date(2020, 11, 13)]
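
The five steps can be tried on a small hand-made DataFrame. This is a sketch with illustrative values, not the real launch data; it combines the two length filters into one mask, which is a slight variation on the cell above.

```python
import datetime

import pandas as pd

# Four toy launches: only the first has one core, one payload, and an early date.
toy = pd.DataFrame({
    "cores": [[{"core": "a"}], [{"core": "b"}, {"core": "c"}], [{"core": "d"}], [{"core": "e"}]],
    "payloads": [["p1"], ["p2"], ["p3", "p4"], ["p5"]],
    "date_utc": [
        "2006-03-24T22:30:00.000Z",
        "2018-02-06T20:45:00.000Z",
        "2019-06-25T03:30:00.000Z",
        "2021-01-08T02:15:00.000Z",
    ],
})

# Steps 1 and 2: keep rows with exactly one core and exactly one payload.
mask = (toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)
toy = toy[mask].copy()

# Step 3: unwrap the single-element lists.
toy["cores"] = toy["cores"].map(lambda x: x[0])
toy["payloads"] = toy["payloads"].map(lambda x: x[0])

# Steps 4 and 5: derive the calendar date and apply the cutoff.
toy["date"] = pd.to_datetime(toy["date_utc"]).dt.date
toy = toy[toy["date"] <= datetime.date(2020, 11, 13)]

print(toy[["cores", "payloads", "date"]])
```

Only the first row survives: the second has two cores, the third has two payloads, and the fourth falls after the cutoff date.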


In [14]:
data.head(1)

Unnamed: 0,rocket,payloads,launchpad,cores,flight_number,date_utc,date
0,5e9d0d95eda69955f709d1eb,5eb0e4b5b6c3bb0006eeb1e1,5e9e4502f5090995de566f86,"{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",1,2006-03-24T22:30:00.000Z,2006-03-24


* From the <code>rocket</code> we would like to learn the booster name

* From the <code>payloads</code> we would like to learn the mass of the payload and the orbit that it is going to

* From the <code>launchpad</code> we would like to know the name of the launch site being used, the longitude, and the latitude.

* **From <code>cores</code> we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.**

The data from these requests will be stored in lists and will be used to create a new dataframe.


In [15]:
#Global variables 
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

### 🔧 Running the Functions to Collect Launch Data

Now let's execute the functions to retrieve and store detailed information about each launch from the SpaceX dataset. This step will populate the global lists with values such as booster name, launch site, payload mass, orbit, block number, reuse count, serial number, and landing details.

In [16]:
# Call getBoosterVersion to get the booster version
getBoosterVersion(data)
# Call getLaunchSite
getLaunchSite(data)

# Call getPayloadData
getPayloadData(data)

# Call getCoreData
getCoreData(data)

Finally, let's construct our dataset using the data we have obtained. We combine the columns into a dictionary.

In [17]:
# ----------------------------------------------------------------------------------
# Constructing the Final Dataset
# ----------------------------------------------------------------------------------
# Now that we have extracted all the required data, we will combine everything into
# a single dictionary. Each key in the dictionary represents a column in the final
# DataFrame, and the value is a list containing the corresponding column values.

# Note:
# We use list(data['column_name']) to convert each pandas Series into a list.
# This ensures consistency, as some of the other data (like BoosterVersion, PayloadMass, etc.)
# are already stored in lists. By converting all to lists, we can build the DataFrame
# without shape mismatches or inconsistencies.

launch_dict = {
    'FlightNumber': list(data['flight_number']),   # Convert column to list to ensure uniform data type
    'Date': list(data['date']),                    # Same here for launch date
    'BoosterVersion': BoosterVersion,              # Booster versions collected earlier
    'PayloadMass': PayloadMass,                    # Payload mass in kg
    'Orbit': Orbit,                                # Orbit type for each mission
    'LaunchSite': LaunchSite,                      # Launch site used for each flight
    'Outcome': Outcome,                            # Landing outcome (e.g., "True RTLS", "False Ocean")
    'Flights': Flights,                            # Number of flights the booster has made
    'GridFins': GridFins,                          # Whether grid fins were used
    'Reused': Reused,                              # Whether the booster was reused
    'Legs': Legs,                                  # Whether landing legs were present
    'LandingPad': LandingPad,                      # ID of the landing pad used
    'Block': Block,                                # Block number from API
    'ReusedCount': ReusedCount,                    # Number of times the booster was reused
    'Serial': Serial,                              # Serial number of the booster
    'Longitude': Longitude,                        # Launch site longitude
    'Latitude': Latitude                           # Launch site latitude
}
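
Because the helper functions skip empty identifiers, a collected list can end up shorter than the DataFrame columns, and `pd.DataFrame` raises a `ValueError` when the lists differ in length. A quick length check before building the DataFrame can locate the problem. The sketch below uses a small hypothetical dictionary rather than the real `launch_dict`.

```python
# Hypothetical stand-in for launch_dict, with one list deliberately too short.
columns = {
    "FlightNumber": [1, 2, 3],
    "BoosterVersion": ["Falcon 1", "Falcon 9", "Falcon 9"],
    "PayloadMass": [20.0, 525.0],
}

# Compare every list length against the reference column.
lengths = {name: len(values) for name, values in columns.items()}
expected = lengths["FlightNumber"]
mismatched = {name: n for name, n in lengths.items() if n != expected}

print(mismatched)  # {'PayloadMass': 2}
```

An empty `mismatched` dictionary means every column has the same number of entries.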



Then, we need to create a Pandas data frame from the dictionary launch_dict.

In [18]:
# ----------------------------------------------------------------------------------
# Creating a Pandas DataFrame from the launch_dict
# ----------------------------------------------------------------------------------
# We pass the dictionary (where keys are column names and values are lists) into 
# the pd.DataFrame() constructor. This creates a structured table with rows and columns.

df = pd.DataFrame(launch_dict)




In [19]:
df.head(1)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1A,167.74,9.05


### Task 2: Filter the dataframe to only include `Falcon 9` launches

Finally, we will remove the Falcon 1 launches, keeping only the Falcon 9 launches. Filter the <code>df</code> dataframe using the <code>BoosterVersion</code> column to only keep the Falcon 9 launches. Save the filtered data to a new dataframe called <code>data_falcon9</code>.


In [20]:
# Filter the launches to include only those with the 'Falcon 9' rocket.
# Use the 'BoosterVersion' column to identify Falcon 9 launches.
# Create a new DataFrame called 'data_falcon9' that contains only rows where 'BoosterVersion' equals 'Falcon 9'.
# .copy() makes it an independent DataFrame, so later column assignments do not trigger SettingWithCopyWarning.
data_falcon9 = df[df['BoosterVersion'] == 'Falcon 9'].copy()

# Summarize the filtered DataFrame to verify the operation was successful.
# .info() reports the number of rows (Falcon 9 launches) and the non-null count of each column.
data_falcon9.info()

<class 'pandas.core.frame.DataFrame'>
Index: 90 entries, 4 to 93
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   FlightNumber    90 non-null     int64  
 1   Date            90 non-null     object 
 2   BoosterVersion  90 non-null     object 
 3   PayloadMass     85 non-null     float64
 4   Orbit           90 non-null     object 
 5   LaunchSite      90 non-null     object 
 6   Outcome         90 non-null     object 
 7   Flights         90 non-null     int64  
 8   GridFins        90 non-null     bool   
 9   Reused          90 non-null     bool   
 10  Legs            90 non-null     bool   
 11  LandingPad      64 non-null     object 
 12  Block           90 non-null     float64
 13  ReusedCount     90 non-null     int64  
 14  Serial          90 non-null     object 
 15  Longitude       90 non-null     float64
 16  Latitude        90 non-null     float64
dtypes: bool(3), float64(4), int64(3), object(7)

Now that we have removed some rows, we should reset the FlightNumber column.

In [21]:
# ----------------------------------------------------------------------------------
# After removing some rows from the dataset, the FlightNumber column may have gaps 
# or be out of order. This line resets the FlightNumber column so it contains 
# consecutive numbers starting from 1.
# ----------------------------------------------------------------------------------

# Replace the entire 'FlightNumber' column with a new sequence of integers
# We use .loc to select all rows (:) in the 'FlightNumber' column
# range(1, data_falcon9.shape[0]+1) generates numbers from 1 to the total number of rows (inclusive)
# We convert the range to a list so it can be assigned directly to the column
data_falcon9.loc[:, 'FlightNumber'] = list(range(1, data_falcon9.shape[0] + 1)) 

# Display the updated DataFrame
data_falcon9.head()


Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,1,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.58,28.56
5,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.58,28.56
6,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.58,28.56
7,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.61,34.63
8,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.58,28.56
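
The renumbering can be checked on a toy DataFrame. This is a sketch with illustrative values: the filtered rows keep their original index labels, while `FlightNumber` becomes a consecutive sequence starting at 1.

```python
import pandas as pd

# Toy data: flight numbers have gaps once the Falcon 1 rows are removed.
df_toy = pd.DataFrame({
    "FlightNumber": [1, 2, 6, 8],
    "BoosterVersion": ["Falcon 1", "Falcon 9", "Falcon 1", "Falcon 9"],
})

# Filter on a copy so the assignment below modifies an independent DataFrame.
falcon9 = df_toy[df_toy["BoosterVersion"] == "Falcon 9"].copy()

# Overwrite FlightNumber with 1..n, where n is the number of remaining rows.
falcon9["FlightNumber"] = list(range(1, len(falcon9) + 1))

print(falcon9)
```

The index labels stay at 1 and 3, which matches the output above, where the Falcon 9 rows start at index 4.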


## Data Wrangling
We can see below that some of the rows are missing values in our dataset.

In [22]:
# ----------------------------------------------------------------------------------
# Null Values Analysis in the DataFrame
# ----------------------------------------------------------------------------------
# The .isnull() function is used to identify null (NaN) values in the DataFrame.
# It returns a DataFrame of the same size with boolean values:
# - True indicates that the value is null.
# - False indicates that the value is not null.
#
# Then, .sum() is applied to count the total number of null values in each column.
# This provides an overview of the columns with missing data and the number of missing values in each.
#
# This analysis is useful for identifying data quality issues and deciding
# how to handle missing values (e.g., removing them, imputing them, etc.).
data_falcon9.isnull().sum()


FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        5
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

Before we can continue we must deal with these missing values. The <code>LandingPad</code> column will retain None values to represent when landing pads were not used.

### Task 3: Dealing with Missing Values

Calculate the mean of <code>PayloadMass</code> using <code>.mean()</code>. Then use the mean to replace the `np.nan` values in the data, for example with <code>.fillna()</code> or <code>.replace()</code>.


In [23]:

# Calculate the mean value of PayloadMass once
# (calcular_media is a helper imported from the local mis_funciones module)
payload_mean = calcular_media(data_falcon9, 'PayloadMass', float)

# Replace the np.nan values with the mean value
data_falcon9.loc[:, "PayloadMass"] = data_falcon9['PayloadMass'].fillna(payload_mean)
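
For readers without the local helper module, the same imputation can be sketched with plain pandas. The values below are illustrative; `Series.mean()` skips `NaN` entries by default, so the mean is computed from the known masses only.

```python
import numpy as np
import pandas as pd

# Toy column with two missing payload masses.
masses = pd.DataFrame({"PayloadMass": [500.0, np.nan, 700.0, np.nan]})

# NaN values are ignored: (500 + 700) / 2 = 600
mean_mass = masses["PayloadMass"].mean()

# Fill the gaps with the mean and confirm no missing values remain.
masses["PayloadMass"] = masses["PayloadMass"].fillna(mean_mass)
print(mean_mass)                            # 600.0
print(masses["PayloadMass"].isnull().sum())  # 0
```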

You should see the number of missing values of <code>PayloadMass</code> change to zero.

Now we should have no missing values in our dataset except for in <code>LandingPad</code>.



In [24]:
data_falcon9.isnull().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        0
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

In [25]:
data_falcon9.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,1,2010-06-04,Falcon 9,6123.55,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.58,28.56
5,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.58,28.56
6,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.58,28.56
7,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.61,34.63
8,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.58,28.56


We can now export it to a <b>CSV</b> for the next section, but to make the answers consistent, the next lab will provide data in a pre-selected date range.

In [26]:
#data_falcon9.to_csv("dataset_part_1.csv",index=False)
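
The effect of `index=False` can be seen with a small round trip through an in-memory buffer. This sketch uses a toy DataFrame and `io.StringIO` instead of a file on disk, so nothing is written.

```python
import io

import pandas as pd

# Toy frame whose index labels (5, 6) mimic a filtered DataFrame.
toy = pd.DataFrame({"FlightNumber": [1, 2], "PayloadMass": [525.0, 677.0]}, index=[5, 6])

# Write without the index column, then read the text back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
restored = pd.read_csv(io.StringIO(buffer.getvalue()))

print(list(restored.columns))  # ['FlightNumber', 'PayloadMass']
print(list(restored.index))    # [0, 1]
```

The original index labels are not stored, so the restored DataFrame gets a fresh index starting at 0.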