# Collecting and Wrangling Data from the SpaceX API

### Project Description

In this capstone project, we aim to predict whether the Falcon 9 first stage will successfully land. 

SpaceX advertises Falcon 9 rocket launches at a cost of 62 million USD, whereas other providers charge upwards of 165 million USD per launch. A significant portion of SpaceX's cost savings comes from its ability to reuse the first stage of the rocket.

By predicting the success of the first stage landing, we can estimate the cost of a launch. This information can be valuable for alternative companies that may want to compete with SpaceX for launch contracts.

In this lab, we will:
- Collect data from the SpaceX API.
- Ensure the data is properly formatted for analysis.

## Goal / Objectives

### Objective
Estimate the cost of a launch by predicting whether SpaceX will be able to reuse the first stage of its rocket.

### Tasks
- Perform basic data wrangling and formatting.
- Retrieve data from the SpaceX API.
- Clean and preprocess the requested data.

## Import Libraries and Define Auxiliary Functions

In [None]:
# requests: send HTTP requests, used here to get data from the SpaceX API
import requests
# pandas: tabular data manipulation and analysis
import pandas as pd
# NumPy: multi-dimensional arrays and vectorized mathematical functions
import numpy as np
# datetime: represent and compare dates
import datetime

# Display every column of a DataFrame
pd.set_option('display.max_columns', None)
# Display the full contents of each cell instead of truncating it
pd.set_option('display.max_colwidth', None)

## Request and parse the SpaceX launch data using the GET request

In [None]:
# Helper functions: each one calls an individual API endpoint for every ID in a column
# and appends the results to the global lists defined further below.

# Uses the rocket column to call the API and append the booster name to BoosterVersion
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/" + str(x)).json()
            BoosterVersion.append(response['name'])
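Because many launches share the same rocket ID, the loop in `getBoosterVersion` sends identical requests over and over. The sketch below shows one way to cache lookups so each distinct ID is fetched only once. It is a minimal illustration under stated assumptions: `get_names_cached` is a hypothetical helper, and `fetch` is assumed to be any callable that takes an ID and returns the decoded JSON dict (for example a small wrapper around `requests.get(...).json()`). Unlike the original, it appends `None` for a missing ID so the output stays aligned with the rows of `data`.

```python
from typing import Callable, Dict, Iterable, List, Optional


def get_names_cached(ids: Iterable[Optional[str]],
                     fetch: Callable[[str], dict]) -> List[Optional[str]]:
    """Return the 'name' field for each ID, calling fetch once per distinct ID.

    Hypothetical helper, not part of the original lab. Missing IDs (None or
    empty) produce None so the result has one entry per input row.
    """
    cache: Dict[str, dict] = {}
    names: List[Optional[str]] = []
    for item_id in ids:
        if not item_id:
            names.append(None)
            continue
        if item_id not in cache:
            cache[item_id] = fetch(item_id)  # only the first sighting triggers a fetch
        names.append(cache[item_id]["name"])
    return names


# Example with a fake fetch function that records calls instead of using the network
calls = []

def fake_fetch(rocket_id):
    calls.append(rocket_id)
    return {"name": {"r1": "Falcon 1", "r9": "Falcon 9"}[rocket_id]}

names = get_names_cached(["r1", "r9", "r9", None, "r9"], fake_fetch)
print(names)       # ['Falcon 1', 'Falcon 9', 'Falcon 9', None, 'Falcon 9']
print(len(calls))  # 2
```

In the notebook this might be called as `get_names_cached(data['rocket'], lambda rid: requests.get("https://api.spacexdata.com/v4/rockets/" + rid).json())`. Passing `fetch` in as a parameter also makes the function testable without network access.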

In [None]:
# Takes the dataset and uses the launchpad column to call the API and append the data to the list
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/" + str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])
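`getLaunchSite` fills three global lists as a side effect, so re-running the cell appends duplicates. A hedged alternative, assuming the same kind of injected `fetch` callable as above, is to return the lists instead. The function name `get_launch_sites` and the launchpad records in the example are hypothetical; the field names `name`, `longitude` and `latitude` are the ones the original code reads.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def get_launch_sites(
    launchpad_ids: Iterable[Optional[str]],
    fetch: Callable[[str], dict],
) -> Tuple[List[Optional[str]], List[Optional[float]], List[Optional[float]]]:
    """Return (names, longitudes, latitudes), one entry per launchpad ID.

    Hypothetical helper: fetch(pad_id) is assumed to return the launchpad JSON.
    """
    names, longitudes, latitudes = [], [], []
    for pad_id in launchpad_ids:
        # A missing ID yields an empty dict, so .get() records None and the
        # three lists stay aligned with the input rows
        pad = fetch(pad_id) if pad_id else {}
        names.append(pad.get("name"))
        longitudes.append(pad.get("longitude"))
        latitudes.append(pad.get("latitude"))
    return names, longitudes, latitudes


# Example with made-up launchpad records instead of real API responses
fake_pads = {
    "pad-1": {"name": "Pad One", "longitude": -80.5, "latitude": 28.5},
    "pad-2": {"name": "Pad Two", "longitude": -120.6, "latitude": 34.6},
}
sites, lons, lats = get_launch_sites(["pad-1", None, "pad-2"],
                                     lambda pad_id: fake_pads[pad_id])
print(sites)  # ['Pad One', None, 'Pad Two']
```

Returning values rather than mutating globals means re-running a cell simply recomputes the lists instead of growing them.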

In [None]:
# Takes the dataset and uses the payloads column to call the API and append the data to the lists
def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get("https://api.spacexdata.com/v4/payloads/" + load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

In [None]:
# Takes the dataset and uses the cores column to call the API and append the data to the lists
def getCoreData(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/" + core['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        Outcome.append(str(core['landing_success']) + ' ' + str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
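Each `Outcome` entry is built as `str(core['landing_success']) + ' ' + str(core['landing_type'])`, so its first word is `True`, `False` or `None`. Since the project goal is to predict whether the first stage lands successfully, a binary label can be derived from that first word. This is a minimal sketch: `landing_class` is a hypothetical helper, and the sample strings are illustrative rather than taken from the dataset.

```python
def landing_class(outcome: str) -> int:
    """Return 1 if the Outcome string records a successful landing, else 0.

    Hypothetical helper. Outcome has the form '<landing_success> <landing_type>',
    so the first token is 'True', 'False' or 'None' (no recorded result).
    """
    first_token = outcome.split(" ", 1)[0]
    return 1 if first_token == "True" else 0


# Illustrative strings in the same format that getCoreData produces
outcomes = ["True ASDS", "False Ocean", "None None", "True RTLS"]
labels = [landing_class(o) for o in outcomes]
print(labels)  # [1, 0, 0, 1]
```

Treating `None` as 0 lumps "no landing result" together with failed landings; depending on the analysis, you may prefer to keep those rows separate.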

In [None]:
spacex_url = "https://api.spacexdata.com/v4/launches/past"

In [None]:
response = requests.get(spacex_url)
print(response.content[:300])

In [None]:
static_json_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'

In [None]:
response = requests.get(static_json_url)
data = pd.DataFrame(response.json())
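`pd.DataFrame(response.json())` keeps nested JSON objects as Python dicts inside single cells. If you want those nested fields as their own columns, pandas provides `pd.json_normalize`, which flattens nested dicts into dotted column names while leaving list-valued fields (such as `cores` and `payloads`) as lists. The records below are hand-written and only loosely shaped like launch objects; they are not real API data.

```python
import pandas as pd

# Two hand-written records (hypothetical values) with one nested dict and one list field
records = [
    {"flight_number": 1, "fairings": {"reused": False, "recovered": None},
     "cores": [{"core": "a1", "landing_success": None}]},
    {"flight_number": 2, "fairings": {"reused": True, "recovered": True},
     "cores": [{"core": "b2", "landing_success": True}]},
]

flat = pd.json_normalize(records)
# The nested 'fairings' dict becomes dotted columns; the 'cores' list stays a list cell
print(sorted(flat.columns))  # ['cores', 'fairings.recovered', 'fairings.reused', 'flight_number']
```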

In [None]:
data.head()

In [None]:
# Let's take a subset of the DataFrame, keeping only the features we need plus flight_number and date_utc.
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# Remove rows with multiple cores (Falcon Heavy launches, which carry two extra side boosters) and rows with multiple payloads on a single rocket.
data = data[data['cores'].map(len) == 1]
# .copy() gives an independent DataFrame, avoiding SettingWithCopyWarning on the assignments below
data = data[data['payloads'].map(len) == 1].copy()

# Since payloads and cores are now lists of size 1, replace each list with its single element.
data['cores'] = data['cores'].map(lambda x: x[0])
data['payloads'] = data['payloads'].map(lambda x: x[0])

# Convert date_utc to a datetime, then keep only the date and drop the time.
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# Restrict the launches to those on or before 13 November 2020.
data = data[data['date'] <= datetime.date(2020, 11, 13)]
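The filtering steps above can be tried on a tiny hand-made DataFrame to see which rows survive. The values are hypothetical; only the list lengths and dates matter. The sketch uses the name `sample` so it does not overwrite the notebook's `data`, and it calls `.copy()` after the row filter because assigning new columns to a filtered slice can trigger pandas' `SettingWithCopyWarning`.

```python
import datetime

import pandas as pd

# Hypothetical rows: only the shapes of 'cores'/'payloads' and the dates matter
sample = pd.DataFrame({
    "cores": [[{"core": "c1"}], [{"core": "c2"}, {"core": "c3"}, {"core": "c4"}],
              [{"core": "c5"}], [{"core": "c6"}]],
    "payloads": [["p1"], ["p2"], ["p3", "p4"], ["p5"]],
    "date_utc": ["2010-06-04T18:45:00.000Z", "2018-02-06T20:45:00.000Z",
                 "2019-01-11T15:31:00.000Z", "2021-01-20T13:02:00.000Z"],
})

# Row 1 has three cores and row 2 has two payloads, so both are dropped here
single = (sample["cores"].map(len) == 1) & (sample["payloads"].map(len) == 1)
sample = sample[single].copy()

# Unwrap the single-element lists
sample["cores"] = sample["cores"].map(lambda x: x[0])
sample["payloads"] = sample["payloads"].map(lambda x: x[0])

# Keep only the calendar date, then apply the cutoff (drops the 2021 launch)
sample["date"] = pd.to_datetime(sample["date_utc"]).dt.date
sample = sample[sample["date"] <= datetime.date(2020, 11, 13)]

print(sample[["cores", "payloads", "date"]])
```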

In [None]:
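# Function for calling the API with exception handling
import logging

def load_API_call_space(static_url):
    try:
        response = requests.get(static_url, timeout=30)
        # Raise an HTTPError for 4xx/5xx responses
        response.raise_for_status()
        return pd.DataFrame(response.json())
    except (requests.exceptions.RequestException, ValueError) as error:
        # RequestException covers connection, timeout and HTTP errors;
        # invalid JSON raises a ValueError subclass (json.JSONDecodeError)
        logging.error(f"Could not load {static_url}: {error}")
        return pd.DataFrame()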

In [37]:
# Global lists filled in by the helper functions
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

In [38]:
# Check the variable
BoosterVersion

[]

In [39]:
# Call getBoosterVersion
getBoosterVersion(data)

In [40]:
# Check that the list is populated. Do not call getBoosterVersion(data) again here:
# it would append every booster a second time and double the length of BoosterVersion.
BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

In [41]:
# Call getLaunchSite
getLaunchSite(data)

In [None]:
# Call getPayloadData
getPayloadData(data)

In [None]:
# Call getCoreData
getCoreData(data)

## Finally, let's construct our dataset using the data we have obtained. We combine the columns into a dictionary.

In [None]:
launch_dict = {
    'FlightNumber': list(data['flight_number']),
    'Date': list(data['date']),
    'BoosterVersion': BoosterVersion,
    'PayloadMass': PayloadMass,
    'Orbit': Orbit,
    'LaunchSite': LaunchSite,
    'Outcome': Outcome,
    'Flights': Flights,
    'GridFins': GridFins,
    'Reused': Reused,
    'Legs': Legs,
    'LandingPad': LandingPad,
    'Block': Block,
    'ReusedCount': ReusedCount,
    'Serial': Serial,
    'Longitude': Longitude,
    'Latitude': Latitude
}

In [None]:
# Every list must have the same length (one entry per row of data)
for key, value in launch_dict.items():
    print(f"{key}: {len(value)}")
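`pd.DataFrame(launch_dict)` raises a `ValueError` when the lists have different lengths, which is what happens if a helper is called twice or skips a row. A small guard, sketched below as the hypothetical helper `check_equal_lengths`, reports every column's length at once so the offending list is easy to spot.

```python
from typing import Dict, Optional, Sequence


def check_equal_lengths(columns: Dict[str, Sequence]) -> int:
    """Return the shared length of all columns; raise ValueError if they differ.

    Hypothetical helper, intended to be called on launch_dict before
    building the DataFrame.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    # An empty dict has no columns, so report a length of 0
    return next(iter(lengths.values()), 0)


# Example: a doubled list is caught before pandas sees it
good = {"FlightNumber": [1, 2], "BoosterVersion": ["Falcon 1", "Falcon 9"]}
bad = {"FlightNumber": [1, 2], "BoosterVersion": ["Falcon 1", "Falcon 9"] * 2}
print(check_equal_lengths(good))  # 2

error_message: Optional[str] = None
try:
    check_equal_lengths(bad)
except ValueError as err:
    error_message = str(err)
print(error_message)
```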

In [None]:
# Create a DataFrame from launch_dict
df = pd.DataFrame(launch_dict)
df.head()

In [None]:
# Show column types and non-null counts
df.info()

In [533]:
# Count missing values per column and keep only the columns that have any
nl = df.isna().sum()
nl = nl[nl > 0]
nl

PayloadMass     6
LandingPad     30
Block           4
dtype: int64
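The counts above show gaps in `PayloadMass`, `LandingPad` and `Block`. One common, simple choice for a numeric column like `PayloadMass` is to replace missing values with the column mean; a missing `LandingPad` plausibly means no landing pad was used, so this sketch leaves it as `None`. The frame below is hypothetical, not the real data, and mean imputation is only one option.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same kinds of gaps as the launch data
sample = pd.DataFrame({
    "PayloadMass": [500.0, np.nan, 1500.0],
    "LandingPad": [None, "pad-a", None],
})

# mean() skips NaN by default, so this is the mean of the known masses
payload_mean = sample["PayloadMass"].mean()
sample["PayloadMass"] = sample["PayloadMass"].fillna(payload_mean)

print(sample)
```

Applied to the notebook, the same two lines would operate on `df['PayloadMass']`.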

In [534]:
# Keep only Falcon 9 launches by removing the Falcon 1 rows.
# BoosterVersion is a column of df (the assembled dataset), not of data.
data_falcon9 = df[df['BoosterVersion'] != 'Falcon 1']
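Removing the Falcon 1 rows leaves gaps in `FlightNumber` and in the index. If later steps expect consecutive numbering for the Falcon 9 subset, one option is to renumber after filtering, as in this sketch on a hypothetical frame. Whether to renumber is a judgment call, since the original API flight number can still be useful as an identifier.

```python
import pandas as pd

# Hypothetical rows standing in for the assembled launch DataFrame
sample = pd.DataFrame({
    "FlightNumber": [1, 2, 4, 5, 6],
    "BoosterVersion": ["Falcon 1", "Falcon 1", "Falcon 9", "Falcon 9", "Falcon 9"],
})

# Drop Falcon 1 launches; .copy() gives an independent frame that is safe to modify
falcon9 = sample[sample["BoosterVersion"] != "Falcon 1"].copy()

# Renumber flights 1..n (list assignment is positional) and reset the index
falcon9["FlightNumber"] = list(range(1, len(falcon9) + 1))
falcon9 = falcon9.reset_index(drop=True)
print(falcon9)
```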