# Project 3 - **SpaceX Falcon 9 first stage landing prediction**

## Part 1: Collecting the data

In this capstone, we will predict whether the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars, while other providers cost upward of 165 million dollars each; much of the savings comes from SpaceX's ability to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch. This information could be used by an alternate company that wants to bid against SpaceX for a rocket launch. In this part, we will collect data from an API and make sure it is in the correct format.

## Objectives

In this project, we will make a GET request to the SpaceX API. We will also do some basic data wrangling and formatting.

- Request to the SpaceX API
- Clean the requested data

In [34]:
#import all required libraries
import pandas as pd
import numpy as np
import requests  # requests lets us make HTTP requests, which we will use to get data from an API
import datetime

## Let's start requesting rocket launch data from the SpaceX API with the following URL
Let's fetch information about rocket launches from SpaceX's online database (API).
- SpaceX API – A public, community-maintained API (not run by SpaceX itself) where we can request details about SpaceX launches, rockets, and missions.
- Requesting Data – We will send a request to the API URL (web address) to get information.
- URL (Web Address) – This is the address of the API endpoint that serves the data, such as `https://api.spacexdata.com/v4/launches/past`. This URL returns data on all past launches (dates, rockets used, mission details, etc.).

With this data we can analyze past launches (dates, success rate, rockets used) and visualize launch history (graphs, tables); note that the `/past` endpoint returns only completed launches. In short, we are connecting to a SpaceX database over the internet and retrieving launch data in Python!

In [35]:
spacex_url = ('https://api.spacexdata.com/v4/launches/past')

In [36]:
response = requests.get(spacex_url)

After executing the code above, `response` holds a large amount of information about SpaceX launches. Next, let's narrow it down to the information relevant to this project.

## Request and parse the SpaceX launch data using the GET request

To make the requested JSON results more consistent, we will use the following static response object for this project:

What we are actually doing:
- Sending a GET request to the SpaceX API
- Receiving launch data in JSON format
- Extracting and parsing the relevant information

A static response object means that instead of calling the live API each time (whose data can change), we use a fixed, predefined JSON snapshot. This avoids API rate limits and ensures consistent results for testing, and if the snapshot is saved locally after the first download, later runs do not need an internet connection.
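The "fetch once, then work offline" benefit only holds if the snapshot is saved to disk; the notebook cells below download it again on every run. A minimal sketch of a fetch-once cache, assuming a local JSON file is acceptable; `load_or_fetch` and the cache file name are hypothetical, and the fetch step is passed in as a callable so the sketch runs without network access:

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_or_fetch(cache_path: Path, fetch: Callable[[], Any]) -> Any:
    """Return JSON from cache_path; call fetch() and save its result only if the file is missing."""
    if cache_path.exists():
        # Reuse the saved snapshot: no network call, identical data on every run
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = fetch()  # e.g. a function that performs the GET request and returns response.json()
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

In the notebook this could be called as `load_or_fetch(Path("API_call_spacex_api.json"), lambda: requests.get(static_json_url, timeout=30).json())`; after the first run the file is read instead of the URL.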

In [37]:
static_json_url = ('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json')

In [38]:
response = requests.get(static_json_url)

In [39]:
response.status_code

200

We should see that the request was successful, with a 200 status code.

### Decoding the Response and Converting to a Pandas DataFrame
Now we decode the response content as JSON using `.json()` and turn it into a pandas DataFrame using `pd.json_normalize()`.

After getting the JSON response, we need to convert it into a structured format for analysis. We will:
- Decode the JSON using `.json()`.
- Flatten the nested JSON into a table with `pd.json_normalize()`, which returns a DataFrame that is easy to manipulate.

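What happens in this process?
- `requests.get(url)` → Fetches the data from the URL (here, the static JSON snapshot).
- `response.json()` → Parses the response body into Python objects (here, a list of dictionaries, one per launch).
- `pd.json_normalize(data)` → Flattens the nested JSON into a table.

`json_normalize` turns nested dictionaries into dotted column names (for example, `fairings.reused`), which makes the data easier to analyze in pandas even when the JSON has several levels of nesting. Lists nested inside a record (such as `cores` and `payloads`) are not expanded; they stay as lists in a single cell.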
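A minimal sketch of what `pd.json_normalize()` does to nested records, using two made-up launch records that loosely mimic the shape of the API response (all field values are invented):

```python
import pandas as pd

# Two toy launch records; nested dicts and a nested list, like the real response
records = [
    {"name": "Launch A", "fairings": {"reused": False, "recovered": None},
     "links": {"patch": {"small": "a.png"}}, "cores": [{"core": "c1", "flight": 1}]},
    {"name": "Launch B", "fairings": {"reused": True, "recovered": True},
     "links": {"patch": {"small": "b.png"}}, "cores": [{"core": "c2", "flight": 2}]},
]

flat = pd.json_normalize(records)
# Nested dicts become dotted columns such as "fairings.reused" and "links.patch.small"
print(flat.columns.tolist())
# The "cores" list is not expanded: each cell still holds the original list
print(flat.loc[0, "cores"])
```

This is why the real `data` frame has columns like `links.patch.small` but still needs extra work on `cores` and `payloads`.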

In [40]:
json_data = response.json()
data = pd.json_normalize(json_data)
data.head()

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,tbd,net,window,rocket,success,details,crew,ships,capsules,payloads,launchpad,auto_update,failures,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,True,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/3c/0e/T8iJcSN3_o.png,https://images2.imgbox.com/40/e3/GypSkayF_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,False,0.0,5e9d0d95eda69955f709d1eb,False,"Successful first stage burn and transition to second stage, maximum altitude 289 km, Premature engine shutdown at T+7 min 30 s, Failed to reach orbit, Failed to recover first stage",[],[],[],[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,True,"[{'time': 301, 'altitude': 289, 'reason': 'harmonic oscillation leading to premature engine shutdown'}]",2,DemoSat,2007-03-21T01:10:00.000Z,1174439400,2007-03-21T13:10:00+12:00,hour,False,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdaffd86e000604b32b,False,False,False,[],https://images2.imgbox.com/4f/e3/I0lkuJ2e_o.png,https://images2.imgbox.com/be/e7/iNqsqVYM_o.png,,,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-rocket-fails-reach-orbit.html,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Residual stage 1 thrust led to collision between stage 1 and stage 2,[],[],[],"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006eeb1e4]",5e9e4502f5090995de566f86,True,"[{'time': 140, 'altitude': 35, 'reason': 'residual stage-1 thrust led to collision between stage 1 and stage 2'}]",3,Trailblazer,2008-08-03T03:34:00.000Z,1217734440,2008-08-03T15:34:00+12:00,hour,False,"[{'core': '5e9e289ef3591814873b2625', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdbffd86e000604b32c,False,False,False,[],https://images2.imgbox.com/3d/86/cnu0pan8_o.png,https://images2.imgbox.com/4b/bd/d8UxLh4q_o.png,,,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1-flight-3-mission-summary,https://en.wikipedia.org/wiki/Trailblazer_(satellite),
3,2008-09-20T00:00:00.000Z,1221869000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,True,"Ratsat was carried to orbit on the first successful orbital launch of any privately funded and developed, liquid-propelled carrier rocket, the SpaceX Falcon 1",[],[],[],[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,True,[],4,RatSat,2008-09-28T23:15:00.000Z,1222643700,2008-09-28T11:15:00+12:00,hour,False,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdbffd86e000604b32d,False,False,False,[],https://images2.imgbox.com/e9/c9/T8CfiSYb_o.png,https://images2.imgbox.com/e0/a7/FNjvKlXW_o.png,,,,,[],[],,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,
4,,,False,False,0.0,5e9d0d95eda69955f709d1eb,True,,[],[],[],[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,True,[],5,RazakSat,2009-07-13T03:35:00.000Z,1247456100,2009-07-13T15:35:00+12:00,hour,False,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdcffd86e000604b32e,False,False,False,[],https://images2.imgbox.com/a7/ba/NBZSw3Ho_o.png,https://images2.imgbox.com/8d/fc/0qdZMWWx_o.png,,,,,[],[],http://www.spacex.com/press/2012/12/19/spacexs-falcon-1-successfully-delivers-razaksat-satellite-orbit,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,http://www.spacex.com/news/2013/02/12/falcon-1-flight-5,https://en.wikipedia.org/wiki/RazakSAT,


In [41]:
data.columns

Index(['static_fire_date_utc', 'static_fire_date_unix', 'tbd', 'net', 'window',
       'rocket', 'success', 'details', 'crew', 'ships', 'capsules', 'payloads',
       'launchpad', 'auto_update', 'failures', 'flight_number', 'name',
       'date_utc', 'date_unix', 'date_local', 'date_precision', 'upcoming',
       'cores', 'id', 'fairings.reused', 'fairings.recovery_attempt',
       'fairings.recovered', 'fairings.ships', 'links.patch.small',
       'links.patch.large', 'links.reddit.campaign', 'links.reddit.launch',
       'links.reddit.media', 'links.reddit.recovery', 'links.flickr.small',
       'links.flickr.original', 'links.presskit', 'links.webcast',
       'links.youtube_id', 'links.article', 'links.wikipedia', 'fairings'],
      dtype='object')

We can notice that a lot of the data are IDs. For example, the `rocket` column contains no information about the rocket, only an identification number.

We will now use the API again to get information about the launches using the IDs given for each launch. Specifically, we will use the columns `rocket`, `payloads`, `launchpad`, and `cores`.

In [42]:
# Take a subset of our dataframe, keeping only the features we want plus flight_number and date_utc.
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# Remove rows with multiple cores (Falcon Heavy launches, which have two extra side boosters) and rows with multiple payloads on a single rocket.
data = data[data['cores'].map(len)==1]
data = data[data['payloads'].map(len)==1].copy()  # .copy() so the column assignments below modify an independent DataFrame

# Since payloads and cores are now lists of size 1, extract the single value in each list and replace the feature.
data['cores'] = data['cores'].map(lambda x: x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])

# Convert date_utc to a datetime type, then keep only the date (dropping the time).
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# Keep only launches on or before 13 November 2020.
data = data[data['date'] <= datetime.date(2020, 11, 13)]

In [43]:
data.columns

Index(['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc',
       'date'],
      dtype='object')

In [44]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 94 entries, 0 to 105
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   rocket         94 non-null     object
 1   payloads       94 non-null     object
 2   launchpad      94 non-null     object
 3   cores          94 non-null     object
 4   flight_number  94 non-null     int64 
 5   date_utc       94 non-null     object
 6   date           94 non-null     object
dtypes: int64(1), object(6)
memory usage: 5.9+ KB


**Code Explanation**

Imagine we work for SpaceX.

We are given a spreadsheet (DataFrame) that contains details of SpaceX rocket launches. However, it has a lot of information we don't need, and we only need specific details for our analysis.

Our task is to clean up the data by:
1. Keeping only useful information (like which rocket was used, the launchpad, and the launch date).
2. Removing complicated cases (rockets with multiple boosters or multiple payloads).
3. Formatting the data properly (extracting only the date from the timestamp).
4. Keeping only launches on or before November 13, 2020, so the dataset covers a fixed period of past missions.


1. Step 1: Keep Only Important Columns
`data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]` keeps only the essential information we need:
- `rocket`: The ID of the rocket (resolved later to a name such as Falcon 9).
- `payloads`: The IDs of the cargo sent to space (e.g., a satellite or a crewed spacecraft).
- `launchpad`: The ID of the launch site (e.g., Kennedy Space Center).
- `cores`: The first-stage booster(s) used, with landing details for each (Falcon Heavy has three).
- `flight_number`: The launch number for tracking (e.g., Flight 100).
- `date_utc`: The exact date and time of launch (e.g., 2020-05-30 19:22:00 UTC).

2. Step 2: Remove Rockets with Multiple Boosters
`data = data[data['cores'].map(len) == 1]`
- Some SpaceX rockets (like Falcon Heavy) have three boosters instead of one.
- We only want rockets with one booster, so we remove rows with multiple cores.

3. Step 3: Remove Rockets Carrying Multiple Payloads
`data = data[data['payloads'].map(len) == 1]`
- Some rockets carry multiple satellites at once.
- We only want rockets that carry a single satellite or spacecraft, so we filter out multi-payload launches.

4. Step 4: Extract Single Values from Lists
`data['cores'] = data['cores'].map(lambda x: x[0])` and `data['payloads'] = data['payloads'].map(lambda x: x[0])`
- In our dataset, the `cores` and `payloads` columns store their information in lists (even when there is just one value).
- We extract the single value from each list and store it directly.

5. Step 5: Convert Launch Date to Just the Date
`data['date'] = pd.to_datetime(data['date_utc']).dt.date`
- The `date_utc` column contains both date and time (e.g., 2020-05-30 19:22:00 UTC).
- We only need the date (2020-05-30), so we drop the time part and store the result in a new `date` column.

6. Step 6: Keep Only Launches On or Before November 13, 2020
`data = data[data['date'] <= datetime.date(2020, 11, 13)]`
- We only want past launches within a fixed window, so we remove all launches after November 13, 2020.
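The steps above can be sketched end to end on a toy DataFrame. The IDs and dates are made up: flight 2 mimics a three-core Falcon Heavy launch, flight 3 carries two payloads, and flight 4 falls after the cutoff, so only flight 1 should survive:

```python
import datetime
import pandas as pd

# Hypothetical launches with the same column layout as the subset kept in Step 1
toy = pd.DataFrame({
    "rocket": ["r9", "rH", "r9", "r9"],
    "payloads": [["p1"], ["p2"], ["p3", "p4"], ["p5"]],
    "launchpad": ["lp1", "lp1", "lp2", "lp1"],
    "cores": [[{"core": "c1"}], [{"core": "c2"}, {"core": "c3"}, {"core": "c4"}],
              [{"core": "c5"}], [{"core": "c6"}]],
    "flight_number": [1, 2, 3, 4],
    "date_utc": ["2019-01-11T15:31:00.000Z", "2019-04-11T22:35:00.000Z",
                 "2019-06-25T06:30:00.000Z", "2021-01-20T13:02:00.000Z"],
})

# Steps 2-3: keep rows with exactly one core and exactly one payload
clean = toy[(toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)].copy()
# Step 4: unwrap the single-element lists
clean["cores"] = clean["cores"].map(lambda x: x[0])
clean["payloads"] = clean["payloads"].map(lambda x: x[0])
# Step 5: keep only the calendar date (UTC)
clean["date"] = pd.to_datetime(clean["date_utc"]).dt.date
# Step 6: keep launches on or before the cutoff
clean = clean[clean["date"] <= datetime.date(2020, 11, 13)]
print(clean[["flight_number", "payloads", "date"]])
```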

#### Extracting Additional Data from SpaceX API
We will request and extract key details from the API using `rocket`, `payloads`, `launchpad`, and `cores` information. These details will be stored in lists and then converted into a Pandas DataFrame.

Data set we want to extract:

* From the <code>rocket</code> we would like to learn the booster name

* From the <code>payloads</code> we would like to learn the mass of the payload and the orbit that it is going to

* From the <code>launchpad</code> we would like to know the name of the launch site being used, the longitude, and the latitude.

* From <code>cores</code> we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.


| Category | Details to Extract |
|---|---|
| Rocket | Booster name |
| Payload | Payload mass (kg), Orbit type |
| Launchpad | Launch site name, Longitude, Latitude |
| Cores | Landing outcome, Landing type, Number of flights, Gridfins used, Core reused, Legs used, Landing pad, Core block version, Number of times reused, Core serial |



In [57]:
# Global lists that the helper functions below append to (one entry per launch)
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

### Defining Functions
Below let's define a series of helper functions that will help us use the API to extract information using identification numbers in the launch data.

From the `rocket` column we would like to learn the booster name.

In [58]:
# Takes the dataset, uses the rocket column to call the API, and appends the booster name to BoosterVersion

def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            url = f"https://api.spacexdata.com/v4/rockets/{x}"  # Build the rocket endpoint URL
            response = requests.get(url).json()  # Fetch rocket details from the API
            # .get() returns None if 'name' is missing, so every row still gets exactly one entry
            BoosterVersion.append(response.get('name'))
        else:
            BoosterVersion.append(None)  # Keep the list aligned with the rows of data


Explanation of the code above:

The function `getBoosterVersion(data)` takes a dataset `data` as input and looks at its `rocket` column. For each value (a rocket ID), it sends a request to the SpaceX API for details about that rocket, extracts the rocket's `name` from the API response, and appends it to the `BoosterVersion` list.

For example, for the `rocket` ID `5e9d0d95eda69955f709d1eb`, the function calls `https://api.spacexdata.com/v4/rockets/5e9d0d95eda69955f709d1eb`, gets a response with details about that rocket, extracts its name, and stores it in the `BoosterVersion` list. In this dataset that ID belongs to the early launches, so the stored name is 'Falcon 1'.
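Each helper sends one request per row, even though many rows share the same ID (this dataset contains only two booster names). A minimal sketch of memoising the lookup with `functools.lru_cache`, so each distinct ID is requested only once; `make_cached_lookup` and the injected `fetch_json` callable are hypothetical names, and passing the fetcher in lets the sketch run without network access:

```python
from functools import lru_cache
from typing import Callable, Optional


def make_cached_lookup(fetch_json: Callable[[str], dict]) -> Callable[[str], Optional[str]]:
    """Wrap a fetch function so each distinct ID triggers at most one call."""

    @lru_cache(maxsize=None)
    def lookup_name(object_id: str) -> Optional[str]:
        # Returns None when the response has no 'name' key, keeping list lengths aligned
        return fetch_json(object_id).get("name")

    return lookup_name
```

In the notebook, `fetch_json` could be `lambda rid: requests.get(f"https://api.spacexdata.com/v4/rockets/{rid}").json()`, and the names collected with `BoosterVersion.extend(lookup(x) for x in data['rocket'])`.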

From the `launchpad` we would like to know the name of the launch site being used, the longitude, and the latitude.

In [59]:
# Takes the dataset, uses the launchpad column to call the API, and appends the data to the lists
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            url = f"https://api.spacexdata.com/v4/launchpads/{x}"  # Build the launchpad endpoint URL
            response = requests.get(url).json()  # Fetch launchpad details from the API
            # .get() stores None for a missing field instead of silently skipping the row
            Longitude.append(response.get('longitude'))  # Store longitude
            Latitude.append(response.get('latitude'))    # Store latitude
            LaunchSite.append(response.get('name'))      # Store launch site name
        else:
            Longitude.append(None)
            Latitude.append(None)
            LaunchSite.append(None)

From the `payload` we would like to learn the mass of the payload and the orbit that it is going to.

In [60]:
# Takes the dataset, uses the payloads column to call the API, and appends mass and orbit to the lists
def getPayloadData(data):
    for load in data['payloads']:  # Loop through each payload ID
        if load:
            url = f"https://api.spacexdata.com/v4/payloads/{load}"  # Build the payload endpoint URL
            response = requests.get(url).json()  # Fetch payload details from the API

            # .get() returns None when 'mass_kg' or 'orbit' is missing, avoiding a KeyError
            PayloadMass.append(response.get('mass_kg'))
            Orbit.append(response.get('orbit'))
        else:
            PayloadMass.append(None)
            Orbit.append(None)

From `cores` we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.

In [61]:
# takes the dataset and uses the cores column to call the API and append the data to the lists
def getCoreData(data):
    for core in data['cores']:  # Loop through each core dictionary in the column
        if core['core']:  # Check if core ID is valid
            url = f"https://api.spacexdata.com/v4/cores/{core['core']}"  # Build the core endpoint URL
            response = requests.get(url).json()  # Fetch core details from the API

            # Extract data safely using .get()
            Block.append(response.get('block', None))
            ReusedCount.append(response.get('reuse_count', None))
            Serial.append(response.get('serial', None))
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)

        # Handle potential missing data
        Outcome.append(f"{core.get('landing_success', 'Unknown')} - {core.get('landing_type', 'Unknown')}")
        Flights.append(core.get('flight', None))
        GridFins.append(core.get('gridfins', None))
        Reused.append(core.get('reused', None))
        Legs.append(core.get('legs', None))
        LandingPad.append(core.get('landpad', None))
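The `Outcome` strings in the table below (`None - None`, `False - Ocean`, `True - ASDS`) come from the f-string in `getCoreData`. This small standalone function (hypothetical name `format_outcome`) reproduces that formatting to show one subtlety: the `'Unknown'` default of `dict.get` is used only when a key is absent, not when it is present with the value `None`:

```python
def format_outcome(core: dict) -> str:
    # Same f-string as in getCoreData; 'Unknown' appears only if a key is missing,
    # while a key that is present with value None is rendered as "None"
    return f"{core.get('landing_success', 'Unknown')} - {core.get('landing_type', 'Unknown')}"


print(format_outcome({"landing_success": None, "landing_type": None}))     # None - None
print(format_outcome({"landing_success": True, "landing_type": "ASDS"}))   # True - ASDS
print(format_outcome({"landing_success": False, "landing_type": "Ocean"})) # False - Ocean
print(format_outcome({}))                                                  # Unknown - Unknown
```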

These functions append their outputs to the global lists defined above. Let's take a look at the `BoosterVersion` variable. Before we apply `getBoosterVersion`, the list is empty:

In [62]:
BoosterVersion

[]

In [63]:
# Now let's apply the getBoosterVersion function to get the booster version
getBoosterVersion(data)

In [64]:
# The list has now been updated
BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

In [65]:
#we can apply the rest of the functions here

# Call getLaunchSite
getLaunchSite(data)

In [66]:
# Call getPayloadData
getPayloadData(data)

In [67]:
# Call getCoreData
getCoreData(data)

In [68]:
# Finally, let's construct our dataset using the data we have obtained. We combine the columns into a dictionary.
launch_dict = {'FlightNumber': list(data['flight_number']),
'Date': list(data['date']),
'BoosterVersion':BoosterVersion,
'PayloadMass':PayloadMass,
'Orbit':Orbit,
'LaunchSite':LaunchSite,
'Outcome':Outcome,
'Flights':Flights,
'GridFins':GridFins,
'Reused':Reused,
'Legs':Legs,
'LandingPad':LandingPad,
'Block':Block,
'ReusedCount':ReusedCount,
'Serial':Serial,
'Longitude': Longitude,
'Latitude': Latitude}
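Because the global lists are filled by separate functions, `pd.DataFrame(launch_dict)` only works if they all end up with one entry per launch; otherwise it raises a `ValueError` that does not say which list is off. A small hypothetical helper, `check_equal_lengths`, can be run on `launch_dict` first to report every list's length:

```python
def check_equal_lengths(columns: dict) -> int:
    """Raise ValueError listing each column's length if they differ; otherwise return the common length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    # All lengths agree (or the dict is empty): return the shared length
    return next(iter(lengths.values()), 0)
```

For example, `check_equal_lengths(launch_dict)` should return 94 for this dataset, matching the number of rows in `data`.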

**Now, let's create a pandas dataframe from the dictionary launch_dict.**

In [69]:
lauch_df = pd.DataFrame(launch_dict)
lauch_df.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None - None,1,False,False,False,,,0,Merlin1A,167.743129,9.047721
1,2,2007-03-21,Falcon 1,,LEO,Kwajalein Atoll,None - None,1,False,False,False,,,0,Merlin2A,167.743129,9.047721
2,4,2008-09-28,Falcon 1,165.0,LEO,Kwajalein Atoll,None - None,1,False,False,False,,,0,Merlin2C,167.743129,9.047721
3,5,2009-07-13,Falcon 1,200.0,LEO,Kwajalein Atoll,None - None,1,False,False,False,,,0,Merlin3C,167.743129,9.047721
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None - None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857


In [70]:
# Summary of the dataframe (lauch_df)
lauch_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 94 entries, 0 to 93
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   FlightNumber    94 non-null     int64  
 1   Date            94 non-null     object 
 2   BoosterVersion  94 non-null     object 
 3   PayloadMass     88 non-null     float64
 4   Orbit           94 non-null     object 
 5   LaunchSite      94 non-null     object 
 6   Outcome         94 non-null     object 
 7   Flights         94 non-null     int64  
 8   GridFins        94 non-null     bool   
 9   Reused          94 non-null     bool   
 10  Legs            94 non-null     bool   
 11  LandingPad      64 non-null     object 
 12  Block           90 non-null     float64
 13  ReusedCount     94 non-null     int64  
 14  Serial          94 non-null     object 
 15  Longitude       94 non-null     float64
 16  Latitude        94 non-null     float64
dtypes: bool(3), float64(4), int64(3), object(7)

In [71]:
lauch_df.describe()

| | FlightNumber | PayloadMass | Flights | Block | ReusedCount | Longitude | Latitude |
|---|---|---|---|---|---|---|---|
| count | 94.0 | 88.0 | 94.0 | 90.0 | 94.0 | 94.0 | 94.0 |
| mean | 54.202128 | 5919.165341 | 1.755319 | 3.5 | 3.053191 | -75.553302 | 28.581782 |
| std | 30.589048 | 4909.689575 | 1.197544 | 1.595288 | 4.153938 | 53.39188 | 4.639981 |
| min | 1.0 | 20.0 | 1.0 | 1.0 | 0.0 | -120.610829 | 9.047721 |
| 25% | 28.25 | 2406.25 | 1.0 | 2.0 | 0.0 | -80.603956 | 28.561857 |
| 50% | 52.5 | 4414.0 | 1.0 | 4.0 | 1.0 | -80.577366 | 28.561857 |
| 75% | 81.5 | 9543.75 | 2.0 | 5.0 | 4.0 | -80.577366 | 28.608058 |
| max | 106.0 | 15600.0 | 6.0 | 5.0 | 13.0 | 167.743129 | 34.632093 |


#### Key Insights:
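##### 1. Flight Numbers (`FlightNumber`)  
- Range: 1 to 106, but only 94 rows remain, so the sequence has gaps where launches were filtered out (for example, flight 3, Trailblazer, carried two payloads).
- Mean (54.2) and Std Dev (30.59): because flight numbers are sequential identifiers, these describe how the IDs are spread rather than launch activity; changes in the launch rate over time are better read from the `Date` column.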

##### 2. Payload Mass (`PayloadMass`)  
- Range: 20 kg to 15,600 kg, with 88 of 94 values present.  
- Mean: 5,919 kg, above the median of 4,414 kg, so the distribution is skewed toward heavier payloads.  
- Std Dev (4,909.69 kg): a large spread relative to the mean.  
- Minimum (20 kg): the first Falcon 1 flight (FalconSat).  
- Maximum (15,600 kg): the late-2020 Falcon 9 launches to VLEO shown in the table below (likely Starlink batches).  

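##### 3. Flights Per Core (`Flights`)  
- Range: 1 to 6; this is the core's flight number at that launch (1 = the booster's first flight).  
- Mean: 1.76; the median is 1 and the 75th percentile is 2, so at least half of the launches used a booster on its first flight, and at least three quarters used one on its first or second flight.  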

##### 4. Block Versions (`Block`)  
- Range: 1 to 5 (different Falcon 9 versions); the 4 missing values are the Falcon 1 launches.  
- Median 4 (mean 3.5): at least half of the launches with a known block used Block 4 or 5, the newer versions.  

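##### 5. Core Reuse Count (`ReusedCount`)  
- Range: 0 to 13.  
- This value comes from the `/cores` endpoint, so it is each core's total reuse count in the API snapshot, not the number of reuses at the time of each launch. For example, B1060 shows 12 on both its second and third flights in the table below.  
- The 25th percentile is 0 and the median is 1: at least a quarter of the launches used a core that was never reused, and at least half used a core that was reused at least once at some point. The mean (3.05) is pulled up by a few heavily reused cores.  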

##### 6. Launch Sites (`Longitude` & `Latitude`)  
- The coordinates in the data include:  
  - Cape Canaveral, CCSFS SLC 40 (~-80.58, 28.56)  
  - Kennedy Space Center, KSC LC 39A (~-80.60, 28.61)  
  - Vandenberg, VAFB SLC 4E (~-120.61, 34.63)  
  - Kwajalein Atoll, Marshall Islands (~167.74, 9.05), used by the Falcon 1 launches in this data  
- Std Dev in Longitude (53.39): driven mostly by the few Falcon 1 launches from Kwajalein, which lie far from the U.S. sites.  

##### Summary:  
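- Flight numbers run from 1 to 106, with gaps where multi-core or multi-payload launches were removed
- Payloads range from a 20 kg satellite to 15,600 kg launches
- Some cores have been reused up to 13 times (by the API's total reuse count)
- At least half of the launches with a known block used Block 4 or 5
- Launches come from sites in Florida and California, plus Kwajalein Atoll for Falcon 1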
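The point that `ReusedCount` is a per-core total can be checked directly: if it counted reuses up to each launch, a core's value would change between its flights. A sketch using four rows copied from the `data_falcon9` output later in this notebook (flights 86 to 89); on the full data the same check would be `data_falcon9.groupby('Serial')['ReusedCount'].nunique()`:

```python
import pandas as pd

# Four Falcon 9 rows copied from the data_falcon9 output (FlightNumber, Serial, Flights, ReusedCount)
rows = pd.DataFrame({
    "FlightNumber": [86, 87, 88, 89],
    "Serial": ["B1060", "B1058", "B1051", "B1060"],
    "Flights": [2, 3, 6, 3],
    "ReusedCount": [12, 13, 12, 12],
})

# One distinct ReusedCount per serial, even where Flights differs, means it is a per-core total
per_core = rows.groupby("Serial").agg(
    flights_seen=("Flights", "nunique"),
    reuse_values=("ReusedCount", "nunique"),
)
print(per_core)
```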

### Filter the dataframe to only include `Falcon 9` launches

In [72]:
lauch_df['BoosterVersion'].unique()

array(['Falcon 1', 'Falcon 9'], dtype=object)

In [81]:
lauch_df['BoosterVersion'].nunique()

2

Finally, we will remove the Falcon 1 launches, keeping only the Falcon 9 launches. Filter the dataframe using the <code>BoosterVersion</code> column to keep only the Falcon 9 launches, and save the filtered data to a new dataframe called <code>data_falcon9</code>.

In [73]:
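# .copy() gives an independent DataFrame, so later column assignments do not act on a slice of lauch_df
data_falcon9 = lauch_df[lauch_df['BoosterVersion'].str.contains('Falcon 9', na = False)].copy()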

In [74]:
data_falcon9.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None - None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,8,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None - None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
6,10,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None - None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
7,11,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False - Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
8,12,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None - None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857


In [82]:
data_falcon9.info()

<class 'pandas.core.frame.DataFrame'>
Index: 90 entries, 4 to 93
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   FlightNumber    90 non-null     int64  
 1   Date            90 non-null     object 
 2   BoosterVersion  90 non-null     object 
 3   PayloadMass     90 non-null     float64
 4   Orbit           90 non-null     object 
 5   LaunchSite      90 non-null     object 
 6   Outcome         90 non-null     object 
 7   Flights         90 non-null     int64  
 8   GridFins        90 non-null     bool   
 9   Reused          90 non-null     bool   
 10  Legs            90 non-null     bool   
 11  LandingPad      64 non-null     object 
 12  Block           90 non-null     float64
 13  ReusedCount     90 non-null     int64  
 14  Serial          90 non-null     object 
 15  Longitude       90 non-null     float64
 16  Latitude        90 non-null     float64
dtypes: bool(3), float64(4), int64(3), object(7)

In [76]:
#now that we have removed some values, we should reset the FlightNumber column
data_falcon9.loc[:,'FlightNumber'] = list(range(1, data_falcon9.shape[0] + 1))
data_falcon9

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,1,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None - None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None - None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
6,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None - None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
7,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False - Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
8,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None - None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89,86,2020-09-03,Falcon 9,15600.0,VLEO,KSC LC 39A,True - ASDS,2,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1060,-80.603956,28.608058
90,87,2020-10-06,Falcon 9,15600.0,VLEO,KSC LC 39A,True - ASDS,3,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,13,B1058,-80.603956,28.608058
91,88,2020-10-18,Falcon 9,15600.0,VLEO,KSC LC 39A,True - ASDS,6,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1051,-80.603956,28.608058
92,89,2020-10-24,Falcon 9,15600.0,VLEO,CCSFS SLC 40,True - ASDS,3,True,True,True,5e9e3033383ecbb9e534e7cc,5.0,12,B1060,-80.577366,28.561857
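The filtering and renumbering steps can be sketched on a toy frame (the values are illustrative, not from the API). This sketch uses exact equality instead of `str.contains`, which gives the same result here because the only booster names are 'Falcon 1' and 'Falcon 9', and takes a `.copy()` so the column assignment modifies an independent DataFrame rather than a slice of the original:

```python
import pandas as pd

# Toy stand-in for lauch_df (illustrative values)
df = pd.DataFrame({
    "FlightNumber": [1, 4, 6, 8],
    "BoosterVersion": ["Falcon 1", "Falcon 1", "Falcon 9", "Falcon 9"],
})

# Exact match; NaN rows would be dropped too, since NaN == "Falcon 9" is False
falcon9 = df[df["BoosterVersion"] == "Falcon 9"].copy()
# Renumber the remaining launches 1..n; the original index labels are kept
falcon9["FlightNumber"] = list(range(1, len(falcon9) + 1))
print(falcon9)
```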


## Data Wrangling
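From the data frame above we can see some missing values. (The `data_falcon9.info()` output in cell [82] reports 90 non-null `PayloadMass` values only because that cell was executed after the imputation step below.) Let's count the missing values in each column.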

In [77]:
data_falcon9.isnull().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        5
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

We can see that `PayloadMass` and `LandingPad` have missing values. Before proceeding, let's deal with `PayloadMass`: calculate its mean with `.mean()`, then replace the `NaN` values with that mean. We assign the result of `.fillna()` back with `.assign()` instead of calling `.replace(np.NaN, ..., inplace=True)` on the column: that chained in-place pattern raises a pandas `FutureWarning` (it stops working in pandas 3.0) and a `SettingWithCopyWarning`, and `np.NaN` was removed in NumPy 2.0 (use `np.nan`).

In [78]:
mean_payload_mass = data_falcon9['PayloadMass'].mean()
# fillna() returns a new column and assign() a new DataFrame, so nothing is modified through a chained copy
data_falcon9 = data_falcon9.assign(PayloadMass=data_falcon9['PayloadMass'].fillna(mean_payload_mass))


In [79]:
data_falcon9.isnull().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        0
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

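Now the only column with missing values is `LandingPad`. We keep those `None` values because they mark launches where no landing pad was used (for example, no landing attempt or an ocean landing).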
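To see what the mean imputation does, here is a sketch on three `PayloadMass` values from the first Falcon 9 rows above (the first one is missing). `Series.mean()` skips `NaN` by default, so the fill value is the mean of the observed values only, (525 + 677) / 2 = 601.0:

```python
import numpy as np
import pandas as pd

# First three Falcon 9 PayloadMass values from the table above; the first is missing
payload = pd.Series([np.nan, 525.0, 677.0], name="PayloadMass")

mean_mass = payload.mean()          # NaN is skipped by default (skipna=True)
filled = payload.fillna(mean_mass)  # returns a new Series; the original is unchanged
print(mean_mass, filled.tolist())
```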

We can now export it to a CSV for the next section, but to make the answers consistent, in the next lab we will provide data in a pre-selected date range.

In [80]:
data_falcon9.to_csv('dataset_spacex_part_1.csv', index = False)