# **Data Collection**
In this section, we collect the data needed to build the machine learning model.


## Libraries

There are several common libraries that serve different purposes:

1. **Getting data from a company's API endpoint**
   - **requests**: A library used to send and handle HTTP requests, commonly applied for interacting with RESTful API endpoints.

2. **Data processing**
   - **pandas**: A library that structures data into table-like formats (e.g., DataFrames) and provides powerful tools for data manipulation.
   - **NumPy**: A library that enables efficient numerical computations through vectorized operations on arrays, which are typically much faster than equivalent Python loops.

3. **Others**
   - **datetime**: A library for converting strings to date objects and vice versa, as well as handling date and time manipulations.


In [114]:
import requests

import pandas as pd

import numpy as np

import datetime


In this project, we aim to collect SpaceX's data to predict the reusability of rocket boosters. We can retrieve this data by sending a GET request to SpaceX's API endpoint, such as `https://api.spacexdata.com/v4/launches/past`, which contains records of past launches. However, because the live data and the way its features are organized change frequently, and since this project is for experimental purposes, we will use a fixed snapshot of the data. The following link provides a static snapshot (i.e., a historical copy) of this endpoint's data stored online.


In [115]:
SPACE_X_PAST_LAUNCH_API = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json"

In the following code block, we first retrieve the data with a GET request and then parse the JSON body of the response into Python objects (lists and dictionaries). Checking the request's status is optional but recommended: a status code of `200` indicates the request was successful; any other code means we may need to debug the request.

In [116]:
# Send a GET request to the endpoint
response = requests.get(SPACE_X_PAST_LAUNCH_API)

# Parse the JSON body of the response into Python objects for data processing
json_data = None

if response.status_code == 200:

    # The GET request succeeded, so the body can be decoded.
    json_data = response.json()

# Verify that json_data successfully retrieved data; otherwise, raise an informative error
assert json_data is not None, "Failed to retrieve JSON data. Check if the API endpoint is correct or reachable."


# data = pd.json_normalize(json_data, sep=".")
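As a possible alternative to the manual status check above, the `requests` library can raise an exception itself when the server answers with an error code. The sketch below wraps this in a helper; the name `fetch_json` and its `get` parameter are hypothetical and only exist so that the helper can be tried without network access.

```python
import requests


def fetch_json(url, timeout=10, get=requests.get):
    """Fetch `url` and return the decoded JSON body.

    `fetch_json` and the injectable `get` parameter are illustrative names,
    not part of the original notebook.
    """
    # A timeout prevents the call from hanging forever on an unreachable host
    response = get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of continuing silently
    response.raise_for_status()
    return response.json()


# Possible usage with the snapshot URL defined earlier:
# json_data = fetch_json(SPACE_X_PAST_LAUNCH_API)
```

Compared with an `assert`, the raised `HTTPError` carries the status code and URL, which tends to make debugging a failed request easier.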



Now, we would like to convert the JSON data into a pandas DataFrame to easily manipulate the data in a tabular format.

There are two common ways to achieve this:

1. **`pandas.DataFrame(json_data)`**
2. **`pandas.json_normalize(json_data)`**

Both methods transform `json_data` into a pandas DataFrame. However, `json_normalize` also flattens nested dictionaries: each nested key becomes its own column, named by joining the key path with a separator (for example, `links.patch.small`). By default, it does not expand nested lists; a cell that holds a list, such as `cores`, keeps the list as a single value.
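The difference between the two approaches can be seen on a small example. The records below are toy data shaped like the launch records (the values are shortened placeholders, not real API output).

```python
import pandas as pd

# Two toy records: `links` is a nested dictionary, `payloads` is a list
records = [
    {"name": "FalconSat", "links": {"webcast": "url-1", "patch": {"small": "s-1"}}, "payloads": ["p1"]},
    {"name": "DemoSat", "links": {"webcast": "url-2", "patch": {"small": "s-2"}}, "payloads": ["p2", "p3"]},
]

plain = pd.DataFrame(records)       # keeps each nested dict in a single `links` cell
flat = pd.json_normalize(records)   # spreads the dict keys over dotted columns

print(sorted(plain.columns))
print(sorted(flat.columns))
print(flat.loc[1, "payloads"])      # the list is still a list after normalization
```

With `pd.DataFrame`, every `links` cell holds a whole dictionary, whereas `json_normalize` produces the columns `links.webcast` and `links.patch.small`. In both cases `payloads` remains a list.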


In [117]:
# Flatten nested dictionaries; "." is the default separator and is passed explicitly for clarity.
data = pd.json_normalize(json_data, sep=".")


Now, let's view the first few rows of the dataset using the `head()` method provided by the pandas library:


In [118]:
data.head()

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,tbd,net,window,rocket,success,details,crew,ships,capsules,payloads,launchpad,auto_update,failures,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,True,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/3c/0e/T8iJcSN3_o.png,https://images2.imgbox.com/40/e3/GypSkayF_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,False,0.0,5e9d0d95eda69955f709d1eb,False,"Successful first stage burn and transition to second stage, maximum altitude 289 km, Premature engine shutdown at T+7 min 30 s, Failed to reach orbit, Failed to recover first stage",[],[],[],[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,True,"[{'time': 301, 'altitude': 289, 'reason': 'harmonic oscillation leading to premature engine shutdown'}]",2,DemoSat,2007-03-21T01:10:00.000Z,1174439400,2007-03-21T13:10:00+12:00,hour,False,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdaffd86e000604b32b,False,False,False,[],https://images2.imgbox.com/4f/e3/I0lkuJ2e_o.png,https://images2.imgbox.com/be/e7/iNqsqVYM_o.png,,,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-rocket-fails-reach-orbit.html,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Residual stage 1 thrust led to collision between stage 1 and stage 2,[],[],[],"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006eeb1e4]",5e9e4502f5090995de566f86,True,"[{'time': 140, 'altitude': 35, 'reason': 'residual stage-1 thrust led to collision between stage 1 and stage 2'}]",3,Trailblazer,2008-08-03T03:34:00.000Z,1217734440,2008-08-03T15:34:00+12:00,hour,False,"[{'core': '5e9e289ef3591814873b2625', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdbffd86e000604b32c,False,False,False,[],https://images2.imgbox.com/3d/86/cnu0pan8_o.png,https://images2.imgbox.com/4b/bd/d8UxLh4q_o.png,,,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1-flight-3-mission-summary,https://en.wikipedia.org/wiki/Trailblazer_(satellite),
3,2008-09-20T00:00:00.000Z,1221869000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,True,"Ratsat was carried to orbit on the first successful orbital launch of any privately funded and developed, liquid-propelled carrier rocket, the SpaceX Falcon 1",[],[],[],[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,True,[],4,RatSat,2008-09-28T23:15:00.000Z,1222643700,2008-09-28T11:15:00+12:00,hour,False,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdbffd86e000604b32d,False,False,False,[],https://images2.imgbox.com/e9/c9/T8CfiSYb_o.png,https://images2.imgbox.com/e0/a7/FNjvKlXW_o.png,,,,,[],[],,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,
4,,,False,False,0.0,5e9d0d95eda69955f709d1eb,True,,[],[],[],[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,True,[],5,RazakSat,2009-07-13T03:35:00.000Z,1247456100,2009-07-13T15:35:00+12:00,hour,False,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdcffd86e000604b32e,False,False,False,[],https://images2.imgbox.com/a7/ba/NBZSw3Ho_o.png,https://images2.imgbox.com/8d/fc/0qdZMWWx_o.png,,,,,[],[],http://www.spacex.com/press/2012/12/19/spacexs-falcon-1-successfully-delivers-razaksat-satellite-orbit,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,http://www.spacex.com/news/2013/02/12/falcon-1-flight-5,https://en.wikipedia.org/wiki/RazakSAT,


Sometimes, it's inconvenient when we cannot view all columns of a DataFrame because pandas substitutes some columns or cell contents with the notation `"..."`. We can address this issue by changing pandas' default display settings. Specifically, we use `pd.set_option()` to adjust settings and `pd.reset_option()` to revert them to default values. The settings we want to modify in this case are `"display.max_columns"` and `"display.max_colwidth"`.


In [119]:

# Setting this option to None displays all columns of a DataFrame
pd.set_option('display.max_columns', None)
# Setting this option to None displays the full content of each cell
pd.set_option('display.max_colwidth', None)
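If the wider display is only needed for one or two outputs, a temporary change may be more convenient than a global one. The following sketch uses `pd.option_context`, which restores the previous values when the `with` block ends, and `pd.reset_option` to undo a global change; the DataFrame `df` is a toy example.

```python
import pandas as pd

df = pd.DataFrame({"details": ["x" * 200]})

default_width = pd.get_option("display.max_colwidth")
default_columns = pd.get_option("display.max_columns")

# The settings apply only inside the with-block
with pd.option_context("display.max_columns", None, "display.max_colwidth", None):
    inside_width = pd.get_option("display.max_colwidth")
    print(df)

# Outside the block, the previous value is back in effect
restored_width = pd.get_option("display.max_colwidth")

# A global change can be reverted to the pandas default with reset_option
pd.set_option("display.max_columns", None)
pd.reset_option("display.max_columns")
restored_columns = pd.get_option("display.max_columns")
```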

Now we should be able to see all the columns and their entire content. Let's call `head()` again to verify this.

In [120]:
data.head()

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,tbd,net,window,rocket,success,details,crew,ships,capsules,payloads,launchpad,auto_update,failures,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,True,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/3c/0e/T8iJcSN3_o.png,https://images2.imgbox.com/40/e3/GypSkayF_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,False,0.0,5e9d0d95eda69955f709d1eb,False,"Successful first stage burn and transition to second stage, maximum altitude 289 km, Premature engine shutdown at T+7 min 30 s, Failed to reach orbit, Failed to recover first stage",[],[],[],[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,True,"[{'time': 301, 'altitude': 289, 'reason': 'harmonic oscillation leading to premature engine shutdown'}]",2,DemoSat,2007-03-21T01:10:00.000Z,1174439400,2007-03-21T13:10:00+12:00,hour,False,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdaffd86e000604b32b,False,False,False,[],https://images2.imgbox.com/4f/e3/I0lkuJ2e_o.png,https://images2.imgbox.com/be/e7/iNqsqVYM_o.png,,,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-rocket-fails-reach-orbit.html,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Residual stage 1 thrust led to collision between stage 1 and stage 2,[],[],[],"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006eeb1e4]",5e9e4502f5090995de566f86,True,"[{'time': 140, 'altitude': 35, 'reason': 'residual stage-1 thrust led to collision between stage 1 and stage 2'}]",3,Trailblazer,2008-08-03T03:34:00.000Z,1217734440,2008-08-03T15:34:00+12:00,hour,False,"[{'core': '5e9e289ef3591814873b2625', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdbffd86e000604b32c,False,False,False,[],https://images2.imgbox.com/3d/86/cnu0pan8_o.png,https://images2.imgbox.com/4b/bd/d8UxLh4q_o.png,,,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1-flight-3-mission-summary,https://en.wikipedia.org/wiki/Trailblazer_(satellite),
3,2008-09-20T00:00:00.000Z,1221869000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,True,"Ratsat was carried to orbit on the first successful orbital launch of any privately funded and developed, liquid-propelled carrier rocket, the SpaceX Falcon 1",[],[],[],[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,True,[],4,RatSat,2008-09-28T23:15:00.000Z,1222643700,2008-09-28T11:15:00+12:00,hour,False,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdbffd86e000604b32d,False,False,False,[],https://images2.imgbox.com/e9/c9/T8CfiSYb_o.png,https://images2.imgbox.com/e0/a7/FNjvKlXW_o.png,,,,,[],[],,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,
4,,,False,False,0.0,5e9d0d95eda69955f709d1eb,True,,[],[],[],[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,True,[],5,RazakSat,2009-07-13T03:35:00.000Z,1247456100,2009-07-13T15:35:00+12:00,hour,False,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdcffd86e000604b32e,False,False,False,[],https://images2.imgbox.com/a7/ba/NBZSw3Ho_o.png,https://images2.imgbox.com/8d/fc/0qdZMWWx_o.png,,,,,[],[],http://www.spacex.com/press/2012/12/19/spacexs-falcon-1-successfully-delivers-razaksat-satellite-orbit,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,http://www.spacex.com/news/2013/02/12/falcon-1-flight-5,https://en.wikipedia.org/wiki/RazakSAT,


After converting the JSON data into a DataFrame, we want to understand the meaning of each column. This will help us select the features most likely to contribute to predicting booster reusability. To understand the columns clearly, we need to consult the documentation provided by the SpaceX API.

Documentation link: https://docs.spacexdata.com/#bc65ba60-decf-4289-bb04-4ca9df01b9c1

The following table describes each column.

| **Param**                   | **Sample**                            | **Type**          | **Description**                                                                                         |
| --------------------------- | ------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------- |
| `static_fire_date_utc`      | `2020-05-27T20:22:00.000Z`            | string (ISO-8601) | UTC date-time of the Falcon 9/Heavy static-fire test that precedes launch                               |
| `static_fire_date_unix`     | `1590613320`                          | integer           | Same moment as Unix epoch seconds                                                                       |
| `tbd`                       | `false`                               | boolean           | **T**o **B**e **D**etermined – `true` when the exact launch day (or time) is still uncertain            |
| `net`                       | `false`                               | boolean           | **N**o **E**arlier **T**han flag – `true` when launch can slip but never go earlier than the given date |
| `window`                    | `0`                                   | integer           | Launch-window length (seconds); `0` means an instantaneous launch opportunity                           |
| `rocket`                    | `5e9d0d95eda69973a809d1ec`            | string (ID)       | Mongo-style ID of the rocket document (query `/v4/rockets/:id`)                                         |
| `success`                   | `true`                                | boolean           | High-level mission success flag                                                                         |
| `details`                   | `"Crewed test flight of Crew Dragon"` | string            | Human-readable summary of the mission                                                                   |
| `crew`                      | `["5ebf1b73…d65","5ebf1b73…d66"]`     | array \<string>   | IDs of Dragon crew-member documents                                                                     |
| `ships`                     | `["5ea6ed2e…c908"]`                   | array \<string>   | Support ships (drone ships, fairing catchers, etc.) used on the mission                                 |
| `capsules`                  | `["5e9e1d55…b379"]`                   | array \<string>   | Dragon capsule IDs flown                                                                                |
| `payloads`                  | `["5eb0e4b5…b253"]`                   | array \<string>   | IDs of payload documents carried to orbit                                                               |
| `launchpad`                 | `5e9e4502f5090995de566f86`            | string (ID)       | ID of the physical launchpad used                                                                       |
| `auto_update`               | `true`                                | boolean           | Internal flag indicating the launch entry is auto-synced from SpaceX systems                            |
| `failures`                  | `[]`                                  | array \<object>   | Zero or more failure records (`time`,`altitude`,`reason`) if the mission failed                         |
| `flight_number`             | `94`                                  | integer           | Sequential flight number counted by SpaceX                                                              |
| `name`                      | `"Crew Dragon Demo-2"`                | string            | Mission/launch name displayed publicly                                                                  |
| `date_utc`                  | `2020-05-30T19:22:45.000Z`            | string            | Planned/actual liftoff time in UTC                                                                      |
| `date_unix`                 | `1590866565`                          | integer           | Same as `date_utc` in Unix seconds                                                                      |
| `date_local`                | `2020-05-30T15:22:45-04:00`           | string            | Liftoff time in the launch-site’s local timezone                                                        |
| `date_precision`            | `"hour"`                              | string            | Smallest assured time unit (`year`,`half`,`quarter`,`month`,`day`,`hour`)                               |
| `upcoming`                  | `false`                               | boolean           | `true` for future launches, `false` for past launches                                                   |
| `cores`                     | `[ { core:"5e9e289d…",flight:1,… } ]` | array \<object>   | One entry per first-stage core with reuse & landing metadata                                            |
| `id`                        | `5eb87d46ffd86e000604b388`            | string            | Unique launch document ID                                                                               |
| `fairings.reused`           | `false`                               | boolean           | Whether the payload fairing halves were reused hardware                                                 |
| `fairings.recovery_attempt` | `true`                                | boolean           | Whether a recovery attempt was made                                                                     |
| `fairings.recovered`        | `true`                                | boolean           | Whether at least one fairing half was successfully recovered                                            |
| `fairings.ships`            | `["5ea6ed30…c910"]`                   | array \<string>   | Ship IDs involved in fairing recovery                                                                   |
| `links.patch.small`         | URL                                   | string            | 200 px mission-patch PNG                                                                                |
| `links.patch.large`         | URL                                   | string            | 400 px mission-patch PNG                                                                                |
| `links.reddit.campaign`     | URL                                   | string            | Reddit campaign thread                                                                                  |
| `links.reddit.launch`       | URL                                   | string            | Reddit launch discussion thread                                                                         |
| `links.reddit.media`        | URL                                   | string            | Reddit post with images/album                                                                           |
| `links.reddit.recovery`     | URL                                   | string            | Reddit post covering booster/fairing recovery                                                           |
| `links.flickr.small`        | `[URL,…]`                             | array \<string>   | Small (640 px) Flickr images                                                                            |
| `links.flickr.original`     | `[URL,…]`                             | array \<string>   | Full-resolution Flickr images                                                                           |
| `links.presskit`            | URL                                   | string            | PDF launch press kit (if published)                                                                     |
| `links.webcast`             | URL                                   | string            | Direct webcast link                                                                                     |
| `links.youtube_id`          | `"wBzERcWH4fU"`                       | string            | The YouTube video-ID of the webcast                                                                     |
| `links.article`             | URL                                   | string            | Post-launch article or press release                                                                    |
| `links.wikipedia`           | URL                                   | string            | Wikipedia page for the mission                                                                          |
| `fairings`                  | *object*                              | object            | Parent object that groups the `fairings.*` sub-fields shown above                                       |



From a non-domain expert's perspective, several key factors might determine whether the Falcon 9 rocket can successfully recover its first-stage booster. We can reasonably speculate on the following three points:

First is the "payloads." One might guess that if the satellite or equipment carried by the rocket on launch day is particularly heavy, Falcon 9 would have to consume more fuel to place them into orbit. Consequently, the rocket might have less fuel remaining for the recovery of its first-stage booster, increasing the difficulty of recovery. Conversely, a lighter payload might leave more spare fuel, facilitating an easier and safer landing.

The second factor is the "launchpad." Different launch locations might influence booster recovery success rates. For example, if the rocket is launched from a platform close to the ocean, the first-stage booster might have to land on a drone ship, a process potentially more challenging than returning directly to land. On the other hand, if the launchpad is closer to the recovery location, it might make booster recovery somewhat easier.

Finally, there are the "cores" (the first-stage rocket core). This factor might relate to the booster's experience or the number of times it has been reused. If a booster has already undergone many previous flights, it could exhibit some degree of wear or fatigue, potentially affecting the likelihood of successful recovery. In contrast, a newer booster might be in better condition, increasing its chances of successful recovery.

Although these speculations aren't based on expert analysis, they intuitively help us understand what factors might influence the Falcon 9 first-stage booster's recovery success rate.

Thus, we should focus on the following three variables:

1. payloads
2. launchpad
3. cores

However, since we are focusing on Falcon 9, how can we identify it? For this we need the `rocket` column, whose ID can be passed to another API endpoint to determine whether the rocket is a Falcon 9. This project originated from IBM, whose version also includes **flight\_number and date\_utc**. I will include them as well, although I think they serve mainly for sorting the data.


Let's focus on the following variables:

1. `rocket`
2. `payloads`
3. `launchpad`
4. `cores`
5. `flight_number`
6. `date_utc`


In [121]:
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]
data.head()

Unnamed: 0,rocket,payloads,launchpad,cores,flight_number,date_utc
0,5e9d0d95eda69955f709d1eb,[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",1,2006-03-24T22:30:00.000Z
1,5e9d0d95eda69955f709d1eb,[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",2,2007-03-21T01:10:00.000Z
2,5e9d0d95eda69955f709d1eb,"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006eeb1e4]",5e9e4502f5090995de566f86,"[{'core': '5e9e289ef3591814873b2625', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",3,2008-08-03T03:34:00.000Z
3,5e9d0d95eda69955f709d1eb,[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",4,2008-09-28T23:15:00.000Z
4,5e9d0d95eda69955f709d1eb,[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5,2009-07-13T03:35:00.000Z


To simplify, we initially consider only launches with a single payload. Also, note that Falcon 9 always has a single core. The IBM instructions additionally filter on the number of cores to simplify the project further; however, since Falcon 9 is already single-core, I won't perform this redundant step.


In [122]:
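# Keep launches with exactly one payload; .copy() avoids pandas' SettingWithCopyWarning on later column assignments
data = data[data['payloads'].map(len) == 1].copy()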
data.head()

Unnamed: 0,rocket,payloads,launchpad,cores,flight_number,date_utc
0,5e9d0d95eda69955f709d1eb,[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",1,2006-03-24T22:30:00.000Z
1,5e9d0d95eda69955f709d1eb,[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",2,2007-03-21T01:10:00.000Z
3,5e9d0d95eda69955f709d1eb,[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",4,2008-09-28T23:15:00.000Z
4,5e9d0d95eda69955f709d1eb,[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5,2009-07-13T03:35:00.000Z
5,5e9d0d95eda69973a809d1ec,[5eb0e4b7b6c3bb0006eeb1e7],5e9e4501f509094ba4566f84,"[{'core': '5e9e289ef359185f2b3b2628', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",6,2010-06-04T18:45:00.000Z


After that, we want to convert the arrays in the cells of the `cores` and `payloads` columns into single values. Note that if the `cores` array contains multiple elements, we will select only the first element to represent the core, as rows containing multiple cores will eventually be removed from the dataset because they indicate rockets other than Falcon 9.


In [123]:
data['cores'] = data['cores'].map(lambda x: x[0])
data['payloads'] = data['payloads'].map(lambda x: x[0])
data.head()

Unnamed: 0,rocket,payloads,launchpad,cores,flight_number,date_utc
0,5e9d0d95eda69955f709d1eb,5eb0e4b5b6c3bb0006eeb1e1,5e9e4502f5090995de566f86,"{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",1,2006-03-24T22:30:00.000Z
1,5e9d0d95eda69955f709d1eb,5eb0e4b6b6c3bb0006eeb1e2,5e9e4502f5090995de566f86,"{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",2,2007-03-21T01:10:00.000Z
3,5e9d0d95eda69955f709d1eb,5eb0e4b7b6c3bb0006eeb1e5,5e9e4502f5090995de566f86,"{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",4,2008-09-28T23:15:00.000Z
4,5e9d0d95eda69955f709d1eb,5eb0e4b7b6c3bb0006eeb1e6,5e9e4502f5090995de566f86,"{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",5,2009-07-13T03:35:00.000Z
5,5e9d0d95eda69973a809d1ec,5eb0e4b7b6c3bb0006eeb1e7,5e9e4501f509094ba4566f84,"{'core': '5e9e289ef359185f2b3b2628', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",6,2010-06-04T18:45:00.000Z
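Once each `cores` cell holds a single dictionary, its keys can be spread over separate columns if that is useful later. The sketch below is one way this might be done on a toy frame; the `core.` prefix is an arbitrary choice made here, and only a few of the keys are included to keep the example short.

```python
import pandas as pd

# Toy frame with the same shape as above; the index has gaps, as it does after filtering
toy = pd.DataFrame(
    {
        "flight_number": [1, 6],
        "cores": [
            {"core": "5e9e289df35918033d3b2623", "flight": 1, "reused": False, "landing_success": None},
            {"core": "5e9e289ef359185f2b3b2628", "flight": 1, "reused": False, "landing_success": None},
        ],
    },
    index=[0, 5],
)

# Turn the dictionaries into columns and prefix them to avoid name clashes
core_cols = pd.json_normalize(toy["cores"].tolist()).add_prefix("core.")
# json_normalize returns a fresh 0..n-1 index, so realign it with the original rows
core_cols.index = toy.index

expanded = toy.drop(columns="cores").join(core_cols)
print(expanded)
```

Realigning the index matters here: without it, the join would match rows by label and the row with index `5` would receive missing values.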


Later, we might develop models different from those suggested by IBM. However, it's still a good idea to follow their date-based filtering approach. This will allow us to create comparable conditions. Specifically, IBM includes only data up to and including 2020-11-13. We will use Python's `datetime` library to perform this filtering.


In [124]:
# Use pandas' built-in function to convert string-formatted dates from the 'date_utc' column into pandas datetime objects
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# Use the datetime library to filter the data, keeping only dates on or before November 13, 2020
data = data[data['date'] <= datetime.date(2020, 11, 13)]
data.head()

Unnamed: 0,rocket,payloads,launchpad,cores,flight_number,date_utc,date
0,5e9d0d95eda69955f709d1eb,5eb0e4b5b6c3bb0006eeb1e1,5e9e4502f5090995de566f86,"{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",1,2006-03-24T22:30:00.000Z,2006-03-24
1,5e9d0d95eda69955f709d1eb,5eb0e4b6b6c3bb0006eeb1e2,5e9e4502f5090995de566f86,"{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",2,2007-03-21T01:10:00.000Z,2007-03-21
3,5e9d0d95eda69955f709d1eb,5eb0e4b7b6c3bb0006eeb1e5,5e9e4502f5090995de566f86,"{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",4,2008-09-28T23:15:00.000Z,2008-09-28
4,5e9d0d95eda69955f709d1eb,5eb0e4b7b6c3bb0006eeb1e6,5e9e4502f5090995de566f86,"{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",5,2009-07-13T03:35:00.000Z,2009-07-13
5,5e9d0d95eda69973a809d1ec,5eb0e4b7b6c3bb0006eeb1e7,5e9e4501f509094ba4566f84,"{'core': '5e9e289ef359185f2b3b2628', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",6,2010-06-04T18:45:00.000Z,2010-06-04
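
As a complementary sketch, the same cutoff could also be applied without converting to Python `date` objects, by keeping the column timezone-aware and comparing against the start of the following day. The toy rows and the names `launch_ts` and `cutoff_exclusive` are assumptions made for illustration; they are not part of the IBM notebook.

```python
import pandas as pd

# Toy stand-in for the launches table (hypothetical rows, same 'date_utc' format)
toy = pd.DataFrame({
    "flight_number": [1, 2, 3],
    "date_utc": [
        "2010-06-04T18:45:00.000Z",
        "2020-11-13T23:00:00.000Z",  # late on the cutoff day: should be kept
        "2020-11-14T00:00:00.000Z",  # first instant after the cutoff day: should be dropped
    ],
})

# Parse as timezone-aware UTC timestamps (a datetime64 column, not Python objects)
toy["launch_ts"] = pd.to_datetime(toy["date_utc"], utc=True)

# "On or before 2020-11-13" is the same as "strictly before 2020-11-14 00:00 UTC"
cutoff_exclusive = pd.Timestamp("2020-11-14", tz="UTC")
filtered = toy[toy["launch_ts"] < cutoff_exclusive]

print(filtered["flight_number"].tolist())
```

Comparing against the start of the next day keeps launches that happened late on the cutoff date, which matches the "up to and including" rule used above.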


Up to this point, we have completed a significant amount of work. Let's quickly summarize what we still need to do:

- Use the `rocket` column to identify whether the rocket is a Falcon 9 or not.
- The `payloads` column contains IDs that can be used with the SpaceX API endpoint to retrieve information about the payloads for each launch.
- The `launchpad` column contains IDs, which we can use with another API endpoint to identify the launch locations.
- The `cores` column has a slightly complex structure, even after simplification.

Let's plan how to approach these remaining tasks step-by-step.


**Plan:**
1. Determine the appropriate endpoint URLs, understand the meanings of parameters, and identify the parameters of interest. In this step, we will temporarily ignore the `cores` data.

2. Flatten the `cores` column and use the Core API endpoint to extract additional useful information.
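
To make the second step of the plan more concrete, here is a minimal sketch of one way to flatten the `cores` column. It assumes each cell holds a single dictionary, or its string form if the table was reloaded from CSV. The helper name `flatten_cores` and the `cores_` column prefix are hypothetical choices, not something taken from the IBM project.

```python
import ast
import pandas as pd


def flatten_cores(df, column="cores"):
    """Return a copy of df with the dict in `column` expanded into one column per key."""
    # Cells reloaded from CSV are strings such as "{'core': '...', 'flight': 1}"; parse them safely
    parsed = df[column].apply(lambda c: ast.literal_eval(c) if isinstance(c, str) else c)
    # json_normalize turns a list of dicts into a DataFrame with one column per key
    expanded = pd.json_normalize(parsed.tolist()).add_prefix("cores_")
    expanded.index = df.index  # keep the original row labels so the concat stays aligned
    return pd.concat([df.drop(columns=[column]), expanded], axis=1)


# Toy rows: one real dict and one stringified dict, with non-contiguous row labels
toy = pd.DataFrame(
    {
        "flight_number": [1, 6],
        "cores": [
            {"core": "5e9e289df35918033d3b2623", "flight": 1, "gridfins": False, "landing_success": None},
            "{'core': '5e9e289ef359185f2b3b2628', 'flight': 1, 'gridfins': False, 'landing_success': None}",
        ],
    },
    index=[0, 5],
)

flat = flatten_cores(toy)
print(flat.columns.tolist())
```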


## Step 1:

Let's create a list of constant variables to store these endpoint URLs.

The following is the GET endpoint used to retrieve the name of the rocket.

The URL format is `https://api.spacexdata.com/v4/rockets/{rocket_id}`.

From this endpoint, we are interested in the parameter `name`, which indicates whether the rocket is a Falcon 9 or not.


In [125]:
ROCKET_GET_API = "https://api.spacexdata.com/v4/rockets/"

The following is the GET endpoint used to retrieve information about the launchpad used for each launch.

The URL format is `https://api.spacexdata.com/v4/launchpads/{launchpad_id}`.

From this endpoint, we are interested in the launch location. There are multiple ways to represent a location, such as using the `region` parameter. However, the most accurate approach is to use the parameters `latitude`, `longitude`, and the launchpad's name (i.e., the parameter `name`).

In [126]:
LAUNCHPAD_API = "https://api.spacexdata.com/v4/launchpads/"

The following is the GET endpoint used to retrieve information about the rocket's payloads.

The URL format is `https://api.spacexdata.com/v4/payloads/{payload_id}`.

From this endpoint, we are interested in the payload's mass and target orbit. Specifically, the parameters of interest are `mass_kg` and `orbit`.

In [127]:
PAYLOAD_API = "https://api.spacexdata.com/v4/payloads/"
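
All three endpoints follow the same pattern (base URL plus ID), and many launches share the same rocket and launchpad IDs. The sketch below shows one possible way to wrap that pattern in a small helper that caches responses, so each distinct ID is requested only once. The names `make_cached_fetcher`, `StubResponse`, and `stub_get` are hypothetical, and the stub stands in for `requests.get` so the example runs without network access.

```python
ROCKET_GET_API = "https://api.spacexdata.com/v4/rockets/"


def make_cached_fetcher(http_get=None):
    """Build fetch(base_url, item_id), which returns parsed JSON and caches it per URL."""
    if http_get is None:
        import requests  # imported lazily so the sketch also runs offline with a stub
        http_get = requests.get
    cache = {}

    def fetch(base_url, item_id):
        url = base_url + str(item_id)
        if url not in cache:
            response = http_get(url, timeout=10)
            response.raise_for_status()  # raises on 4xx/5xx responses
            cache[url] = response.json()
        return cache[url]

    return fetch


# Offline stand-in for requests.get (hypothetical), returning a fixed JSON body
class StubResponse:
    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass

    def json(self):
        return self._payload


calls = []


def stub_get(url, timeout=None):
    calls.append(url)
    return StubResponse({"name": "Falcon 9"})


fetch = make_cached_fetcher(http_get=stub_get)
first = fetch(ROCKET_GET_API, "5e9d0d95eda69973a809d1ec")
second = fetch(ROCKET_GET_API, "5e9d0d95eda69973a809d1ec")  # served from the cache
print(first["name"], len(calls))
```

Against the live API, calling `make_cached_fetcher()` with no argument would fall back to `requests.get`.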

## Step 2:
Now it is time to look at the keys of the `cores` column in detail.

| **Param**         | **Sample**                 | **Type**                | **Description**                                                                                                                                                                        |
| ----------------- | -------------------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `core`            | `5e9e289df35918033d3b2623` | string (Mongo-style ID) | Unique identifier for the first-stage booster. Use it with the **`/v4/cores/{id}`** endpoint to pull the core's full service history (reuse count, status, serial number B10xx, etc.). |
| `flight`          | `1`                        | integer                 | How many times **this core** has flown including the current mission (`1` = maiden flight). Can be used as a wear-and-tear or experience indicator.                                    |
| `gridfins`        | `false`                    | boolean                 | Whether the booster was equipped with grid fins for aerodynamic steering during descent. `false` implies the core is not configured for a controlled landing.                          |
| `legs`            | `false`                    | boolean                 | Whether landing legs were installed. Without legs a vertical landing is impossible (the booster will be expended or splash).                                                           |
| `reused`          | `false`                    | boolean                 | Indicates if the core had flown **before** the current mission (`true` ⇔ `flight > 1`). Serves as a simple “new vs. reused” flag.                                                      |
| `landing_attempt` | `false`                    | boolean                 | Did SpaceX intend to recover the booster on this flight? If `false`, the fields below are typically `null` and the mission should be excluded when modeling landing success.           |
| `landing_success` | `null`                     | boolean \| null         | Outcome of the landing attempt (`true`, `false`, or `null` when no attempt). This is usually the target variable for recovery prediction models.                                       |
| `landing_type`    | `null`                     | string \| null          | Method used for recovery when attempted: `"RTLS"` (Return-To-Launch-Site), `"ASDS"` (drone-ship), `"Ocean"` (soft splashdown), etc.                                                    |
| `landpad`         | `null`                     | string \| null          | ID of the landing zone or droneship (e.g., `5e9e3032383ecb6bb234e7ca` for “OCISLY”). Query **`/v4/landpads/{id}`** for coordinates and name.                                           |


A high `flight` count may increase the risk of failure through accumulated wear, so this variable is an important feature. The presence of `gridfins` enhances attitude control during descent, improving landing accuracy. The use of `legs` is indispensable for vertical landings. The `reused` flag indicates whether the booster has flown before; although it overlaps with `flight`, we will keep it for consistency with the IBM project.
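
Since `reused` is expected to be equivalent to `flight > 1`, a quick consistency check can confirm how much the two features overlap. The following is a small sketch on a toy table, where the last row is deliberately inconsistent. The column names match the keys of the `cores` dictionary, but the values are made up for illustration.

```python
import pandas as pd

# Toy values (hypothetical); the last row breaks the expected relationship on purpose
toy = pd.DataFrame({
    "flight": [1, 2, 6, 1],
    "reused": [False, True, True, True],
})

# Rows where the flag disagrees with what the flight count implies
inconsistent = toy[toy["reused"] != (toy["flight"] > 1)]
print(inconsistent)
```

If such a check returned no rows on the real data, `reused` would carry no information beyond `flight`.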

The parameter `landing_success` will serve as the target variable when training our model. Additionally, `landing_type` is significant because sea landings generally involve greater recovery challenges. Finally, `landpad` could also matter for reasons similar to those affecting launchpads, such as local weather conditions.

We will ignore `landing_attempt` because it merely states whether SpaceX intended to recover the booster before launch and does not directly influence the outcome.

The only key we have not covered yet is `core`. Core information can be retrieved from the Core API endpoint:

`https://api.spacexdata.com/v4/cores/{id}`

The crucial parameter here is `block`, representing the hardware version; generally, newer cores (with higher block numbers) perform better. The field `reuse_count` tells us how many times the core has been reused; this might appear redundant because we already have `flight`. The IBM project also records the core's `serial` number, possibly for future reference or searches.

I believe `rtls_attempts` and `asds_attempts` could also be valuable, even though they are not included in the IBM dataset. These fields reveal how many previous RTLS or ASDS landing attempts the core has made. For instance, if the current landing is ASDS and the core has many prior ASDS attempts, that history might indicate its suitability for sea landings. Of course, there is some trade-off with `reuse_count` when interpreting these values.
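
As a rough sketch of how these core-level fields could be pulled out of a Core API response, the helpers below select the fields discussed above and look up the attempt counter that matches the current landing type. The function names and the sample response are hypothetical, and the sketch assumes the counters describe the core's history relevant to the launch in question.

```python
# Maps a landing type to the core field counting attempts of that type
ATTEMPT_FIELD_BY_LANDING_TYPE = {"RTLS": "rtls_attempts", "ASDS": "asds_attempts"}


def extract_core_features(core_json):
    """Pick the core-level fields of interest; missing keys become None."""
    wanted = ("block", "reuse_count", "serial", "rtls_attempts", "asds_attempts")
    return {key: core_json.get(key) for key in wanted}


def same_type_attempts(core_json, landing_type):
    """Return the attempt count matching landing_type, or None for other types."""
    field = ATTEMPT_FIELD_BY_LANDING_TYPE.get(landing_type)
    return core_json.get(field) if field else None


# Hypothetical response body; the values are illustrative only
sample_core = {"serial": "B0000", "block": 5, "reuse_count": 2,
               "rtls_attempts": 0, "asds_attempts": 3, "status": "active"}

features = extract_core_features(sample_core)
print(features, same_type_attempts(sample_core, "ASDS"))
```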


In [128]:
CORES_API = "https://api.spacexdata.com/v4/cores/"

Now that we have a list of variables we want to focus on, we will first define them as empty lists and then retrieve the corresponding data to populate these lists.


In [129]:
#Global variables
# Rocket:
BoosterVersion = []

# Payloads:
PayloadMass = []
Orbit = []

# LaunchPads:
LaunchSite = []
Longitude = []
Latitude = []

# Cores
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []

Let's create a function to retrieve the data and append it to these lists.

In [130]:
def getExtraData(row):
    """Fetch rocket, launchpad, payload, and core details for one launch row
    and append them to the global lists defined above."""
    rocket = row['rocket']
    launchpad = row['launchpad']
    payload = row['payloads']
    cores_data = row['cores']

    if rocket:
        response = requests.get(ROCKET_GET_API + str(rocket))
        if response.status_code == 200:
            rocket_data = response.json()
            BoosterVersion.append(rocket_data['name'])
        else:
            # Raising stops the whole run; appending None instead would let
            # the remaining rows be processed if a single API call fails.
            raise Exception("Failed to fetch rocket data for ID: " + str(rocket))
    else:
        # Keep the list aligned with the rows when the ID is missing
        BoosterVersion.append(None)

    if launchpad:
        response = requests.get(LAUNCHPAD_API + str(launchpad))
        if response.status_code == 200:
            launchpad_data = response.json()
            Longitude.append(launchpad_data['longitude'])
            Latitude.append(launchpad_data['latitude'])
            LaunchSite.append(launchpad_data['name'])
        else:
            raise Exception("Failed to fetch launchpad data for ID: " + str(launchpad))
    else:
        Longitude.append(None)
        Latitude.append(None)
        LaunchSite.append(None)

    if payload:
        # Assumes row['payloads'] is already a single payload ID string due to previous filtering
        response = requests.get(PAYLOAD_API + str(payload))
        if response.status_code == 200:
            payload_data = response.json()
            # .get() returns None instead of raising if a key is absent
            PayloadMass.append(payload_data.get('mass_kg'))
            Orbit.append(payload_data.get('orbit'))
        else:
            raise Exception("Failed to fetch payload data for ID: " + str(payload))
    else:
        PayloadMass.append(None)
        Orbit.append(None)

    # Query the Core API only when the row carries a non-empty core ID
    if cores_data and cores_data.get('core'):
        core_id = cores_data['core']
        response = requests.get(CORES_API + str(core_id))

        if response.status_code == 200:
            core_api_data = response.json()
            Block.append(core_api_data.get('block'))
            ReusedCount.append(core_api_data.get('reuse_count'))
            Serial.append(core_api_data.get('serial'))
        else:
            raise Exception(f"Failed to fetch core API data for ID: {core_id}")
    else:
        # Append None for the core fields if the core ID is missing or None
        Block.append(None)
        ReusedCount.append(None)
        Serial.append(None)

    # These fields come directly from the 'cores' dictionary in the original data
    Outcome.append(f"{cores_data.get('landing_success')} {cores_data.get('landing_type')}")
    Flights.append(cores_data.get('flight'))
    GridFins.append(cores_data.get('gridfins'))
    Reused.append(cores_data.get('reused'))
    Legs.append(cores_data.get('legs'))
    LandingPad.append(cores_data.get('landpad'))

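Let's apply this function to each row of the dataframe. The function returns nothing and only appends to the global lists, so the output below is a column of empty values.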

In [131]:
data.apply(getExtraData, axis = 1)

Unnamed: 0,0
0,
1,
3,
4,
5,
...,...
101,
102,
103,
104,
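
An alternative design, shown here only as a sketch, is to have the row function return a `pd.Series` instead of appending to global lists. In that case `apply(..., axis=1)` returns a dataframe directly. The lookup table `ROCKET_NAMES` is a hypothetical offline stand-in for the rocket endpoint, and `describe_row` is not part of the IBM project.

```python
import pandas as pd

# Offline stand-in for the rocket endpoint (hypothetical lookup table)
ROCKET_NAMES = {"5e9d0d95eda69973a809d1ec": "Falcon 9"}


def describe_row(row):
    """Return the extra fields for one launch as a Series instead of mutating globals."""
    cores = row["cores"] or {}  # treat a missing cores entry as an empty dict
    return pd.Series({
        "BoosterVersion": ROCKET_NAMES.get(row["rocket"]),
        "Flights": cores.get("flight"),
        "Reused": cores.get("reused"),
    })


toy = pd.DataFrame({
    "rocket": ["5e9d0d95eda69973a809d1ec", "5e9d0d95eda69955f709d1eb"],
    "cores": [{"flight": 1, "reused": False}, None],
})

# Each returned Series becomes one row of the resulting dataframe
extra = toy.apply(describe_row, axis=1)
print(extra)
```

Returning values keeps every row's fields together, so re-running the cell cannot leave stale entries in shared lists.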


Now we can combine the collected lists into a new dataframe.

In [132]:
launch_dict = {
    'FlightNumber': list(data['flight_number']),
    'Date': list(data['date']),
    'BoosterVersion': BoosterVersion,
    'PayloadMass': PayloadMass,
    'Orbit': Orbit,
    'LaunchSite': LaunchSite,
    'Outcome': Outcome,
    'Flights': Flights,
    'GridFins': GridFins,
    'Reused': Reused,
    'Legs': Legs,
    'LandingPad': LandingPad,
    'Block': Block,
    'ReusedCount': ReusedCount,
    'Serial': Serial,
    'Longitude': Longitude,
    'Latitude': Latitude,
}
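
Because each column is filled by separate `append` calls, a skipped append or a re-run of the cell would leave the lists with different lengths, and building a dataframe from them would then fail. A small check like the following sketch could be run on `launch_dict` first. The helper `check_equal_lengths` and the example values are hypothetical.

```python
def check_equal_lengths(columns):
    """Return the common length of all lists in `columns`, or raise ValueError listing the lengths."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Toy dictionary with the same shape as launch_dict (illustrative values)
example = {
    "FlightNumber": [1, 2, 3],
    "BoosterVersion": ["Falcon 9", "Falcon 9", "Falcon 9"],
}
n_rows = check_equal_lengths(example)
print(n_rows)
```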


In [133]:
# Create a DataFrame from launch_dict
data = pd.DataFrame.from_dict(launch_dict)
data['Date'] = pd.to_datetime(launch_dict['Date'])



Finally, we can filter the dataframe to include only Falcon 9 launches.

In [134]:
data_falcon9 = data[data['BoosterVersion'] == 'Falcon 9'].reset_index(drop=True)
data_falcon9

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
1,8,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
2,10,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
3,11,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
4,12,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,102,2020-09-03,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,2,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1060,-80.603956,28.608058
86,103,2020-10-06,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,3,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,13,B1058,-80.603956,28.608058
87,104,2020-10-18,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,6,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1051,-80.603956,28.608058
88,105,2020-10-24,Falcon 9,15600.0,VLEO,CCSFS SLC 40,True ASDS,3,True,True,True,5e9e3033383ecbb9e534e7cc,5.0,12,B1060,-80.577366,28.561857



Now that we have removed some rows, we should renumber the `FlightNumber` column so that it runs consecutively from 1.

In [135]:
data_falcon9.loc[:,'FlightNumber'] = list(range(1, data_falcon9.shape[0]+1))
data_falcon9

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
1,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
2,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
3,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
4,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,86,2020-09-03,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,2,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1060,-80.603956,28.608058
86,87,2020-10-06,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,3,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,13,B1058,-80.603956,28.608058
87,88,2020-10-18,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,6,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1051,-80.603956,28.608058
88,89,2020-10-24,Falcon 9,15600.0,VLEO,CCSFS SLC 40,True ASDS,3,True,True,True,5e9e3033383ecbb9e534e7cc,5.0,12,B1060,-80.577366,28.561857


# Acknowledgments

This project was originally conceptualized and implemented by the following contributors:

- <a href="https://www.linkedin.com/in/yan-luo-96288783/">Yan Luo</a>  
- <a href="https://www.linkedin.com/in/nayefaboutayoun/">Nayef Abou Tayoun</a>  
- IBM Corporation
