## Data Collection Using the SpaceX API

- **Request past launch data** from the SpaceX API:
  - URL: `https://api.spacexdata.com/v4/launches/past`

- **Check the API response**:
  - Confirm that the HTTP status code is `200`.

- **Decode the JSON response** and flatten it into a DataFrame with `pd.json_normalize()`.

- **Keep the following columns** from the response:
  - `rocket`
  - `payloads`
  - `launchpad`
  - `cores`
  - `flight_number`
  - `date_utc`

- **Filter the launches** using the following criteria:
  - Exactly one core (`len(cores) == 1`)
  - Exactly one payload (`len(payloads) == 1`)
  - Launch date on or before 2024-12-31

- **Define functions** for handling different parts of the data:
  - Get booster data
  - Get launch site
  - Get payload data
  - Get core data

- **Define the data to fetch** from the API:
  - Booster version
  - Payload mass
  - Orbit
  - Launch site
  - Outcome
  - Flight number
  - Gridfins
  - Reused
  - Legs
  - Landing pad
  - Block
  - Reused counts
  - Serial
  - Longitude
  - Latitude

- **Assemble the data** into a single dictionary with format `{string: list}`.

- **Convert the dictionary into a DataFrame** for easier manipulation.

- **Keep only Falcon 9 launches** (`BoosterVersion == 'Falcon 9'`).

- **Order the data by Date** and assign flight numbers.

- **Check for null values** and apply suitable methods to clean the data.

- **Save the data** in a CSV file for future use.


In [37]:
# Import libraries

import numpy as np
import pandas as pd
import requests
import datetime



In [38]:
# Request all past launches from the SpaceX API
response = requests.get('https://api.spacexdata.com/v4/launches/past', timeout=30)
# print(response.content)  # uncomment to inspect the raw response body

print(response.status_code)  # 200 means the request succeeded



200
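Printing the status code only reports a failure; the notebook would still continue with whatever the server returned. A minimal sketch, assuming you would rather stop on errors: `fetch_json` (a hypothetical helper, not part of the original notebook) accepts any `requests.get`-style callable, calls `raise_for_status()` (which raises for 4xx and 5xx codes), and only then decodes the JSON. The 30-second `timeout` is an arbitrary choice.

```python
from typing import Any, Callable

LAUNCHES_URL = "https://api.spacexdata.com/v4/launches/past"


def fetch_json(get: Callable[..., Any], url: str, timeout: float = 30.0) -> Any:
    """Request `url` with `get` (e.g. requests.get) and return the decoded JSON.

    Hypothetical helper: raises instead of continuing with an error payload.
    """
    response = get(url, timeout=timeout)
    # requests.Response.raise_for_status() raises HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    return response.json()
```

In the notebook this would be called as `launches = fetch_json(requests.get, LAUNCHES_URL)`; injecting the callable keeps the helper testable without network access.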


In [39]:
# Decode the JSON response and flatten nested fields into a DataFrame

json_data = pd.json_normalize(response.json())

json_data.columns

Index(['static_fire_date_utc', 'static_fire_date_unix', 'net', 'window',
       'rocket', 'success', 'failures', 'details', 'crew', 'ships', 'capsules',
       'payloads', 'launchpad', 'flight_number', 'name', 'date_utc',
       'date_unix', 'date_local', 'date_precision', 'upcoming', 'cores',
       'auto_update', 'tbd', 'launch_library_id', 'id', 'fairings.reused',
       'fairings.recovery_attempt', 'fairings.recovered', 'fairings.ships',
       'links.patch.small', 'links.patch.large', 'links.reddit.campaign',
       'links.reddit.launch', 'links.reddit.media', 'links.reddit.recovery',
       'links.flickr.small', 'links.flickr.original', 'links.presskit',
       'links.webcast', 'links.youtube_id', 'links.article', 'links.wikipedia',
       'fairings'],
      dtype='object')
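`pd.json_normalize` flattens nested dictionaries into dotted column names but leaves lists untouched, which is why `fairings.reused` appears as its own column while `payloads` and `cores` stay list-valued. A small sketch on two made-up records (not real API data) shows the effect; a record whose `fairings` value is `null` keeps a plain `fairings` column, which would explain the trailing `'fairings'` entry in the output above.

```python
import pandas as pd

# Two hand-written records that mimic the shape of /v4/launches/past items
records = [
    {"flight_number": 1, "payloads": ["p1"], "fairings": {"reused": False, "recovered": None}},
    {"flight_number": 2, "payloads": ["p2", "p3"], "fairings": None},
]

# Nested dicts become 'fairings.reused' / 'fairings.recovered';
# the list in 'payloads' is kept as a single list-valued cell
flat = pd.json_normalize(records)
print(list(flat.columns))
```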

In [40]:
# Keep only the columns needed for the later lookups

json_data = json_data[['rocket',
                       'payloads',
                       'launchpad',
                       'cores',
                       'flight_number',
                       'date_utc']]

print(json_data.columns)
print(json_data.head(1))


Index(['rocket', 'payloads', 'launchpad', 'cores', 'flight_number',
       'date_utc'],
      dtype='object')
                     rocket                    payloads  \
0  5e9d0d95eda69955f709d1eb  [5eb0e4b5b6c3bb0006eeb1e1]   

                  launchpad  \
0  5e9e4502f5090995de566f86   

                                               cores  flight_number  \
0  [{'core': '5e9e289df35918033d3b2623', 'flight'...              1   

                   date_utc  
0  2006-03-24T22:30:00.000Z  


In [41]:
# Keep launches with exactly one core and exactly one payload

json_data = json_data[json_data['cores'].map(len) == 1]
# .copy() gives an independent frame, so the column assignments below do not raise SettingWithCopyWarning
json_data = json_data[json_data['payloads'].map(len) == 1].copy()


# Each list now holds a single element; unwrap it into a scalar value
json_data['cores'] = json_data['cores'].map(lambda x: x[0])
json_data['payloads'] = json_data['payloads'].map(lambda x: x[0])


# Convert the UTC timestamp to a date and keep launches up to the end of 2024
json_data['date'] = pd.to_datetime(json_data['date_utc']).dt.date
json_data = json_data[json_data['date'] <= datetime.date(2024, 12, 31)]


json_data.shape

(172, 7)
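Assigning new columns to a filtered slice can trigger pandas' `SettingWithCopyWarning`, so this sketch of the same three filters takes an explicit `.copy()` and unwraps the one-element lists with `.str[0]`. It runs on a made-up toy frame; the function name `filter_single_core_payload` and its `cutoff` parameter are illustrative, not from the original notebook.

```python
import datetime

import pandas as pd


def filter_single_core_payload(df: pd.DataFrame, cutoff: datetime.date) -> pd.DataFrame:
    """Keep launches with exactly one core and one payload, dated on or before `cutoff`."""
    mask = (df["cores"].map(len) == 1) & (df["payloads"].map(len) == 1)
    out = df[mask].copy()  # explicit copy: later assignments never touch `df`
    # .str[0] indexes into each list element-wise, turning [x] into x
    out["cores"] = out["cores"].str[0]
    out["payloads"] = out["payloads"].str[0]
    out["date"] = pd.to_datetime(out["date_utc"]).dt.date
    return out[out["date"] <= cutoff]


# Toy data: the second row has two cores, the third is dated after the cutoff
toy = pd.DataFrame({
    "cores": [[{"core": "c1"}], [{"core": "c2"}, {"core": "c3"}], [{"core": "c4"}]],
    "payloads": [["p1"], ["p2"], ["p4"]],
    "date_utc": ["2006-03-24T22:30:00.000Z", "2018-02-06T20:45:00.000Z", "2025-01-05T00:00:00.000Z"],
})
result = filter_single_core_payload(toy, datetime.date(2024, 12, 31))
print(result[["cores", "payloads", "date"]])
```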

In [42]:
json_data.head(2)

| | rocket | payloads | launchpad | cores | flight_number | date_utc | date |
|---|---|---|---|---|---|---|---|
| 0 | 5e9d0d95eda69955f709d1eb | 5eb0e4b5b6c3bb0006eeb1e1 | 5e9e4502f5090995de566f86 | {'core': '5e9e289df35918033d3b2623', 'flight':... | 1 | 2006-03-24T22:30:00.000Z | 2006-03-24 |
| 1 | 5e9d0d95eda69955f709d1eb | 5eb0e4b6b6c3bb0006eeb1e2 | 5e9e4502f5090995de566f86 | {'core': '5e9e289ef35918416a3b2624', 'flight':... | 2 | 2007-03-21T01:10:00.000Z | 2007-03-21 |


In [55]:
# Inspect the fields available in one cores record
print(json_data.loc[0,'cores'])


{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}


In [63]:
# Empty lists filled by the lookup functions below (one entry per launch)

BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

In [64]:
# Look up the rocket name (booster version) for each launch
def get_booster_version(data):
    for x in data['rocket']:
        response = requests.get(f"https://api.spacexdata.com/v4/rockets/{x}", timeout=30).json()
        BoosterVersion.append(response['name'])

get_booster_version(json_data)
print(len(BoosterVersion))


172
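Each row triggers its own HTTP request even though the launches reference only a handful of distinct rocket IDs, so the same rocket is downloaded many times. A sketch that memoises lookups per ID and returns a list instead of appending to a global; `lookup_names` and the injected `fetch` callable are illustrative assumptions, not part of the original notebook.

```python
from typing import Any, Callable, Dict, Iterable, List


def lookup_names(ids: Iterable[str], fetch: Callable[[str], Dict[str, Any]]) -> List[str]:
    """Return fetch(id)["name"] for every id, calling `fetch` once per distinct id."""
    cache: Dict[str, Dict[str, Any]] = {}
    names: List[str] = []
    for item_id in ids:
        if item_id not in cache:
            cache[item_id] = fetch(item_id)  # only hits the API on a cache miss
        names.append(cache[item_id]["name"])
    return names
```

With `requests`, `fetch` could be `lambda rid: requests.get(f"https://api.spacexdata.com/v4/rockets/{rid}", timeout=30).json()`, and `BoosterVersion = lookup_names(json_data['rocket'], fetch)` would replace the loop. Returning a fresh list also means re-running the cell cannot double the list length.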


In [65]:
# Look up the launch site name and its coordinates for each launch

def get_launchsite(data):
    for x in data['launchpad']:
        response = requests.get(f"https://api.spacexdata.com/v4/launchpads/{x}", timeout=30).json()
        LaunchSite.append(response['name'])
        Longitude.append(response['longitude'])
        Latitude.append(response['latitude'])


get_launchsite(json_data)
print(f"{len(LaunchSite)} --- {len(Longitude)} --- {len(Latitude)}")

172 --- 172 --- 172
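Launch pads are also few, so another option is to download them once and join by ID. A sketch, assuming the list endpoint `https://api.spacexdata.com/v4/launchpads` returns every pad with `id`, `name`, `longitude` and `latitude` fields (the per-ID responses used above carry the last three); `add_launchpad_columns` is a hypothetical helper, demonstrated on made-up pads.

```python
import pandas as pd


def add_launchpad_columns(df: pd.DataFrame, launchpads: list) -> pd.DataFrame:
    """Add LaunchSite, Longitude and Latitude columns by joining on the launchpad id."""
    pads = pd.DataFrame(launchpads).set_index("id")[["name", "longitude", "latitude"]]
    pads.columns = ["LaunchSite", "Longitude", "Latitude"]
    # DataFrame.join with on= does a left join: every launch row is kept, in order
    return df.join(pads, on="launchpad")


# Made-up pads and launches for illustration
pads = [
    {"id": "pad_a", "name": "Site A", "longitude": -80.5, "latitude": 28.5},
    {"id": "pad_b", "name": "Site B", "longitude": -120.6, "latitude": 34.6},
]
launches = pd.DataFrame({"launchpad": ["pad_a", "pad_b", "pad_a"]})
print(add_launchpad_columns(launches, pads))
```

Against the live API, `pads` would come from a single call such as `requests.get("https://api.spacexdata.com/v4/launchpads", timeout=30).json()`.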


In [67]:
# Look up the payload mass (kg) and target orbit for each launch

def get_payload(data):
    for x in data['payloads']:
        response = requests.get(f"https://api.spacexdata.com/v4/payloads/{x}", timeout=30).json()
        PayloadMass.append(response['mass_kg'])
        Orbit.append(response['orbit'])

get_payload(json_data)

print(f"{len(PayloadMass)} --- {len(Orbit)}")

172 --- 172


In [68]:
# Fetch core details (block, reuse count, serial) and copy the landing fields from each launch's core record
def get_core(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get(f"https://api.spacexdata.com/v4/cores/{core['core']}", timeout=30).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            # No core id recorded: append placeholders so all lists stay aligned
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        # Outcome combines landing success and landing type, e.g. 'True ASDS'
        Outcome.append(str(core['landing_success']) + ' ' + str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])

get_core(json_data)
print(f" {len(Block)} --- {len(ReusedCount)} --- {len(Serial)} --- {len(Outcome)} ")
print(f" {len(Flights)} --- {len(GridFins)} --- {len(Legs)} --- {len(LandingPad)} ")

 172 --- 172 --- 172 --- 172 
 172 --- 172 --- 172 --- 172 
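`get_core` mixes two sources: the launch's own `cores` record (flight, gridfins, legs, reused, landpad, landing outcome) and the `/v4/cores/{id}` lookup (block, reuse count, serial). A pure-function sketch of the per-row mapping makes the `Outcome` encoding explicit: `landing_success` and `landing_type` are joined with a space, so a launch without a landing attempt becomes `'None None'`. `core_row` is a hypothetical name, and `details` stands for the decoded `/v4/cores` response, or `None` when the core id is missing.

```python
from typing import Any, Dict, Optional


def core_row(core: Dict[str, Any], details: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Map one launch-level core record (plus optional core details) to output columns."""
    details = details or {}  # missing details -> every .get below yields None
    return {
        "Block": details.get("block"),
        "ReusedCount": details.get("reuse_count"),
        "Serial": details.get("serial"),
        # e.g. 'True ASDS', 'False Ocean', or 'None None' when no landing was attempted
        "Outcome": f"{core['landing_success']} {core['landing_type']}",
        "Flights": core["flight"],
        "GridFins": core["gridfins"],
        "Reused": core["reused"],
        "Legs": core["legs"],
        "LandingPad": core["landpad"],
    }
```

Keeping the mapping free of network calls makes it easy to check against a record like the one printed earlier, and the HTTP lookup can then be done (and cached) separately.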


In [69]:
# Assemble the collected lists into a {column name: list} dictionary

assemble_data = {
    'FlightNumber' : list(json_data['flight_number']),
    'Date' : list(json_data['date']),
    'BoosterVersion':BoosterVersion,
    'PayloadMass':PayloadMass,
    'Orbit':Orbit,
    'LaunchSite':LaunchSite,
    'Outcome':Outcome,
    'Flights':Flights,
    'GridFins':GridFins,
    'Reused':Reused,
    'Legs':Legs,
    'LandingPad':LandingPad,
    'Block':Block,
    'ReusedCount':ReusedCount,
    'Serial':Serial,
    'Longitude': Longitude,
    'Latitude': Latitude
}

launch_data = pd.DataFrame(assemble_data)
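`pd.DataFrame` raises a `ValueError` when the lists in the dictionary differ in length, which would happen if one of the lookup cells were re-run and appended to its global list a second time. A small guard, sketched as a hypothetical helper rather than part of the original notebook, reports which columns are out of step before the frame is built.

```python
from typing import Dict, Sequence


def check_equal_lengths(columns: Dict[str, Sequence]) -> int:
    """Return the common length of all columns, or raise ValueError listing every length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)  # 0 for an empty dictionary
```

Calling `check_equal_lengths(assemble_data)` before `pd.DataFrame(assemble_data)` would return 172 here, or name the offending columns if a cell had been re-run.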

In [70]:
launch_data.head(2)

| | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2006-03-24 | Falcon 1 | 20.0 | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin1A | 167.743129 | 9.047721 |
| 1 | 2 | 2007-03-21 | Falcon 1 | | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin2A | 167.743129 | 9.047721 |


In [72]:
launch_data.shape

(172, 17)

In [74]:
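# Keep only Falcon 9 launches, ordered by date (stable sort keeps the API order for same-day launches).
# .copy() gives an independent frame, so the FlightNumber assignment below does not raise SettingWithCopyWarning

launch_data_falcon9 = (
    launch_data[launch_data['BoosterVersion'] == 'Falcon 9']
    .sort_values('Date', kind='stable')
    .copy()
)
launch_data_falcon9.shape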

(168, 17)

In [75]:
# Renumber the Falcon 9 flights consecutively, starting from 1
launch_data_falcon9.loc[:, "FlightNumber"] = list(range(1, launch_data_falcon9.shape[0] + 1))
launch_data_falcon9

| | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 2010-06-04 | Falcon 9 | | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0003 | -80.577366 | 28.561857 |
| 5 | 2 | 2012-05-22 | Falcon 9 | 525.0 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0005 | -80.577366 | 28.561857 |
| 6 | 3 | 2013-03-01 | Falcon 9 | 677.0 | ISS | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0007 | -80.577366 | 28.561857 |
| 7 | 4 | 2013-09-29 | Falcon 9 | 500.0 | PO | VAFB SLC 4E | False Ocean | 1 | False | False | False | | 1.0 | 0 | B1003 | -120.610829 | 34.632093 |
| 8 | 5 | 2013-12-03 | Falcon 9 | 3170.0 | GTO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B1004 | -80.577366 | 28.561857 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 167 | 164 | 2022-08-28 | Falcon 9 | 13260.0 | VLEO | KSC LC 39A | True ASDS | 2 | True | True | True | 5e9e3033383ecb075134e7cd | 5.0 | 1 | B1069 | -80.603956 | 28.608058 |
| 168 | 165 | 2022-08-31 | Falcon 9 | 13260.0 | VLEO | VAFB SLC 4E | True ASDS | 7 | True | True | True | 5e9e3032383ecb6bb234e7ca | 5.0 | 6 | B1063 | -120.610829 | 34.632093 |
| 169 | 166 | 2022-09-17 | Falcon 9 | 13260.0 | VLEO | CCSFS SLC 40 | True ASDS | 6 | True | True | True | 5e9e3033383ecbb9e534e7cc | 5.0 | 5 | B1067 | -80.577366 | 28.561857 |
| 170 | 167 | 2022-09-24 | Falcon 9 | 13260.0 | VLEO | CCSFS SLC 40 | True ASDS | 4 | True | True | True | 5e9e3033383ecbb9e534e7cc | 5.0 | 0 | B1072 | -80.577366 | 28.561857 |


In [79]:
launch_data_falcon9.isnull().sum()


FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass       22
Orbit              1
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64
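The counts show 22 missing `PayloadMass` values, 26 missing `LandingPad` values and one missing `Orbit`. One common treatment, sketched here as an assumption rather than the notebook's own method: replace missing payload masses with the mean of the known values and leave `LandingPad` as `None`, since a missing pad plausibly means no landing pad was used. The helper name `fill_payload_mass` is hypothetical, and the toy frame below is made up.

```python
import numpy as np
import pandas as pd


def fill_payload_mass(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with missing PayloadMass replaced by the mean of the known values."""
    out = df.copy()
    mean_mass = out["PayloadMass"].mean()  # Series.mean() skips NaN by default
    out["PayloadMass"] = out["PayloadMass"].fillna(mean_mass)
    return out  # LandingPad is deliberately left untouched


toy = pd.DataFrame({"PayloadMass": [500.0, np.nan, 1500.0], "LandingPad": [None, None, "pad1"]})
filled = fill_payload_mass(toy)
print(filled)
```

Mean imputation keeps all 168 rows but pulls the filled values toward the average; dropping those rows or imputing per orbit are alternatives worth weighing.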

In [81]:
launch_data_falcon9.to_csv("Collected_data_falcon_9.csv", index=False)
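A CSV file stores every value as text, so when the file is read back the `Date` column arrives as strings unless it is parsed explicitly. A sketch of the round trip, using an in-memory buffer in place of `Collected_data_falcon_9.csv` and a made-up two-row frame:

```python
import io

import pandas as pd

frame = pd.DataFrame({
    "FlightNumber": [1, 2],
    "Date": pd.to_datetime(["2010-06-04", "2012-05-22"]),
})

buffer = io.StringIO()
frame.to_csv(buffer, index=False)  # same call as above, written to memory instead of a file
buffer.seek(0)

# parse_dates converts the listed columns back to datetime64 while reading
reloaded = pd.read_csv(buffer, parse_dates=["Date"])
print(reloaded.dtypes)
```

For the real file, `pd.read_csv("Collected_data_falcon_9.csv", parse_dates=["Date"])` would restore the dates the same way.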