# Collecting data from an API

API stands for Application Programming Interface. It is a set of protocols, routines, and tools that lets software applications communicate with each other. An API specifies how software components should interact, so developers can integrate applications and services without having to understand the underlying code. APIs are used for tasks such as retrieving data, submitting requests, or accessing services provided by other applications.

In this notebook, we collect rocket launch data from an API and make sure it is in the correct format.

In [2]:
# First, we import the following libraries
# Requests lets us make HTTP requests, which we will use to get data from an API
import requests
# Pandas is a Python library for data manipulation and analysis
import pandas as pd
# NumPy adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on them
import numpy as np
# Datetime is a standard-library module that lets us represent dates
import datetime
# json is a standard-library module for parsing JSON text
import json

In [3]:
# Now let's request rocket launch data from the SpaceX API using the following URL:
spacex_url = "https://api.spacexdata.com/v4/launches/past"

In [4]:
response = requests.get(spacex_url)
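
`requests.get` returns a response object even when the server answers with an error status, so it can be worth checking the result before parsing it. A minimal sketch, assuming a hypothetical helper named `fetch_json` and an arbitrary 10-second timeout (neither is part of the original notebook):

```python
import requests


def fetch_json(url, timeout=10, get=requests.get):
    """Return the decoded JSON body of `url`, raising on HTTP error statuses.

    `fetch_json`, the 10-second timeout, and the injectable `get` argument
    are illustrative assumptions, not part of the original notebook.
    """
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    # Response.json() decodes the response body as JSON
    return response.json()
```

For example, `fetch_json(spacex_url)` would return the list of launch records, or raise an exception instead of handing an error page to pandas. The `get` parameter exists only so the helper can be exercised without network access.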

In [6]:
# Parse the JSON text of the response and load the records into a dataframe
results = json.loads(response.text)
data = pd.DataFrame(results)



In [7]:
# Get the head of the dataframe
data.head()

Unnamed: 0,auto_update,capsules,cores,crew,date_local,date_precision,date_unix,date_utc,details,failures,fairings,flight_number,id,launch_library_id,launchpad,links,name,net,payloads,rocket,ships,static_fire_date_unix,static_fire_date_utc,success,tbd,upcoming,window
0,True,[],"[{'core': '5e9e289df35918033d3b2623', 'flight'...",[],2006-03-25T10:30:00+12:00,hour,1143239400,2006-03-24T22:30:00.000Z,Engine failure at 33 seconds and loss of vehicle,"[{'time': 33, 'altitude': None, 'reason': 'mer...","{'reused': False, 'recovery_attempt': False, '...",1,5eb87cd9ffd86e000604b32a,,5e9e4502f5090995de566f86,{'patch': {'small': 'https://images2.imgbox.co...,FalconSat,False,[5eb0e4b5b6c3bb0006eeb1e1],5e9d0d95eda69955f709d1eb,[],1142554000.0,2006-03-17T00:00:00.000Z,False,False,False,0.0
1,True,[],"[{'core': '5e9e289ef35918416a3b2624', 'flight'...",[],2007-03-21T13:10:00+12:00,hour,1174439400,2007-03-21T01:10:00.000Z,Successful first stage burn and transition to ...,"[{'time': 301, 'altitude': 289, 'reason': 'har...","{'reused': False, 'recovery_attempt': False, '...",2,5eb87cdaffd86e000604b32b,,5e9e4502f5090995de566f86,{'patch': {'small': 'https://images2.imgbox.co...,DemoSat,False,[5eb0e4b6b6c3bb0006eeb1e2],5e9d0d95eda69955f709d1eb,[],,,False,False,False,0.0
2,True,[],"[{'core': '5e9e289ef3591814873b2625', 'flight'...",[],2008-08-03T15:34:00+12:00,hour,1217734440,2008-08-03T03:34:00.000Z,Residual stage 1 thrust led to collision betwe...,"[{'time': 140, 'altitude': 35, 'reason': 'resi...","{'reused': False, 'recovery_attempt': False, '...",3,5eb87cdbffd86e000604b32c,,5e9e4502f5090995de566f86,{'patch': {'small': 'https://images2.imgbox.co...,Trailblazer,False,"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006e...",5e9d0d95eda69955f709d1eb,[],,,False,False,False,0.0
3,True,[],"[{'core': '5e9e289ef3591855dc3b2626', 'flight'...",[],2008-09-28T11:15:00+12:00,hour,1222643700,2008-09-28T23:15:00.000Z,Ratsat was carried to orbit on the first succe...,[],"{'reused': False, 'recovery_attempt': False, '...",4,5eb87cdbffd86e000604b32d,,5e9e4502f5090995de566f86,{'patch': {'small': 'https://images2.imgbox.co...,RatSat,False,[5eb0e4b7b6c3bb0006eeb1e5],5e9d0d95eda69955f709d1eb,[],1221869000.0,2008-09-20T00:00:00.000Z,True,False,False,0.0
4,True,[],"[{'core': '5e9e289ef359184f103b2627', 'flight'...",[],2009-07-13T15:35:00+12:00,hour,1247456100,2009-07-13T03:35:00.000Z,,[],"{'reused': False, 'recovery_attempt': False, '...",5,5eb87cdcffd86e000604b32e,,5e9e4502f5090995de566f86,{'patch': {'small': 'https://images2.imgbox.co...,RazakSat,False,[5eb0e4b7b6c3bb0006eeb1e6],5e9d0d95eda69955f709d1eb,[],,,True,False,False,0.0
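
Several columns in the preview above, such as `cores`, `fairings`, and `links`, hold nested dictionaries or lists, because `pd.DataFrame` keeps each top-level JSON value in a single cell. If flat columns are preferred, `pd.json_normalize` expands nested dictionaries into dotted column names. A small sketch on two hand-written records (abbreviated stand-ins modeled on the fields above, not real API output):

```python
import pandas as pd

# Hand-written stand-in records; the real API returns many more fields
sample_records = [
    {"flight_number": 1, "name": "FalconSat",
     "fairings": {"reused": False, "recovery_attempt": False}},
    {"flight_number": 2, "name": "DemoSat",
     "fairings": {"reused": False, "recovery_attempt": False}},
]

# pd.DataFrame keeps each nested dictionary in a single cell
nested = pd.DataFrame(sample_records)
print(nested.columns.tolist())

# pd.json_normalize expands nested dictionaries into dotted column names
flat = pd.json_normalize(sample_records)
print(flat.columns.tolist())
```

Here `nested` has a single `fairings` column, while `flat` has `fairings.reused` and `fairings.recovery_attempt`. Lists, such as the values in `cores`, are left as they are by default.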


In [8]:
# Let's take a subset of our dataframe, keeping only the following features.
# .copy() makes the subset independent, so adding a column does not trigger a SettingWithCopyWarning.
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']].copy()

# Convert date_utc to a datetime datatype, then extract the date and drop the time
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# Keep only launches on or before 13 November 2020
data = data[data['date'] <= datetime.date(2020, 11, 13)]
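
To see what the conversion and the date filter do, here is a small sketch on three timestamps in the same format as `date_utc`. The first is taken from the preview above; the other two are made up for illustration:

```python
import datetime

import pandas as pd

sample = pd.DataFrame({
    "date_utc": [
        "2006-03-24T22:30:00.000Z",  # from the preview above
        "2020-11-13T08:00:00.000Z",  # made-up value on the cutoff date
        "2020-11-14T08:00:00.000Z",  # made-up value after the cutoff date
    ]
})

# Parse the ISO 8601 strings, then keep only the calendar date
sample["date"] = pd.to_datetime(sample["date_utc"]).dt.date

# Keep the rows dated on or before the cutoff
cutoff = datetime.date(2020, 11, 13)
kept = sample[sample["date"] <= cutoff]
print(kept["date"].tolist())
```

The comparison is inclusive, so a launch on the cutoff date itself is kept and only the third row is dropped.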

In [9]:
# Preview the data again
data.head()

Unnamed: 0,rocket,payloads,launchpad,cores,flight_number,date_utc,date
0,5e9d0d95eda69955f709d1eb,[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,"[{'core': '5e9e289df35918033d3b2623', 'flight'...",1,2006-03-24T22:30:00.000Z,2006-03-24
1,5e9d0d95eda69955f709d1eb,[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,"[{'core': '5e9e289ef35918416a3b2624', 'flight'...",2,2007-03-21T01:10:00.000Z,2007-03-21
2,5e9d0d95eda69955f709d1eb,"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006e...",5e9e4502f5090995de566f86,"[{'core': '5e9e289ef3591814873b2625', 'flight'...",3,2008-08-03T03:34:00.000Z,2008-08-03
3,5e9d0d95eda69955f709d1eb,[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,"[{'core': '5e9e289ef3591855dc3b2626', 'flight'...",4,2008-09-28T23:15:00.000Z,2008-09-28
4,5e9d0d95eda69955f709d1eb,[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,"[{'core': '5e9e289ef359184f103b2627', 'flight'...",5,2009-07-13T03:35:00.000Z,2009-07-13


## Data Wrangling

In [11]:
# The counts below show that there are no missing values in our dataset
data.isnull().sum()

rocket           0
payloads         0
launchpad        0
cores            0
flight_number    0
date_utc         0
date             0
dtype: int64
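
Note that `isnull()` only flags `None` and `NaN` cells. Columns such as `payloads` and `cores` hold lists, and an empty list, or a `None` inside a nested dictionary, is not counted as missing. A rough sketch of an extra check for empty lists, run on a made-up frame with the same kind of list-valued columns (all values are placeholders):

```python
import pandas as pd

# Made-up frame with list-valued columns, mimicking payloads and cores
sample = pd.DataFrame({
    "payloads": [["p1"], [], ["p2", "p3"]],
    "cores": [[{"core": "c1"}], [{"core": None}], []],
})

# isnull() treats every list as a non-missing value, even an empty one
null_counts = sample.isnull().sum()

# Count the cells whose list is empty, column by column
empty_counts = sample.apply(lambda column: column.map(len).eq(0).sum())

print(null_counts.to_dict())
print(empty_counts.to_dict())
```

In this sample, `isnull()` reports no missing values in either column, while the second check finds one empty list in each.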

In [None]:
# The dataset is now ready to be exported as a CSV file for visualization or for training a machine learning model
data.to_csv('dataset_part_1.csv', index=False)
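
One caveat when reading the file back: CSV has no list type, so list-valued columns such as `payloads` and `cores` are written as their string representation. A sketch of the round trip on a two-row stand-in frame, using an in-memory buffer instead of a file and `ast.literal_eval` to rebuild the lists (one possible approach, not something the original notebook does):

```python
import ast
import io

import pandas as pd

# Two-row stand-in for the exported frame, with one list-valued column
sample = pd.DataFrame({
    "flight_number": [1, 2],
    "payloads": [["5eb0e4b5b6c3bb0006eeb1e1"], ["5eb0e4b6b6c3bb0006eeb1e2"]],
})

# Write to an in-memory buffer rather than a file on disk
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

# After reading back, the list column contains plain strings
restored = pd.read_csv(buffer)
raw_value = restored.loc[0, "payloads"]
print(type(raw_value).__name__)

# literal_eval safely parses the string form back into a Python list
restored["payloads"] = restored["payloads"].apply(ast.literal_eval)
print(type(restored.loc[0, "payloads"]).__name__)
```

If the nested columns need to survive a round trip unchanged, a format with native support for nested values, such as JSON, may be a better fit than CSV.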