<a href="https://colab.research.google.com/github/Mick2007/3MTT-Data-science-course/blob/main/09_Intro_to_Web_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# J. Application Programming Interface (API)

An API (Application Programming Interface) lets applications communicate with one another. Most APIs come with documentation, and it is important to read it before you use the API.

APIs are also used to retrieve data from remote servers. To use an API, you make a request to a remote web server and retrieve the data you need. We use APIs when the data changes quickly (e.g. stock market data) or when we want a small piece of a much larger dataset (e.g. Twitter).


**Some communication protocols**
- FTP - File Transfer Protocol
- SFTP - SSH File Transfer Protocol (often called Secure File Transfer Protocol)
- SMTP - Simple Mail Transfer Protocol
- HTTP - Hypertext Transfer Protocol
- HTTPS - Hypertext Transfer Protocol Secure (HTTP over an encrypted connection)

**HTTP request methods** include: POST (create), GET (read/retrieve), PUT (update) and DELETE (delete).
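To make the mapping between the four methods and the create/read/update/delete actions concrete, here is a minimal sketch that builds one request object per method without sending anything. It uses the standard library's `urllib.request.Request` so that it runs offline; the `api.example.com` URLs and the book payloads are made up for illustration.

```python
from urllib.request import Request

# Hypothetical endpoints; the requests are only built, never sent
collection_url = 'https://api.example.com/books'
item_url = 'https://api.example.com/books/1'

crud_requests = {
    'create': Request(collection_url, data=b'{"title": "Dune"}', method='POST'),
    'read': Request(item_url, method='GET'),
    'update': Request(item_url, data=b'{"title": "Dune Messiah"}', method='PUT'),
    'delete': Request(item_url, method='DELETE'),
}

# Show which HTTP method each action uses
for action, request in crud_requests.items():
    print(action, '->', request.get_method(), request.full_url)
```

With the `requests` library, the equivalent calls are `requests.post`, `requests.get`, `requests.put` and `requests.delete`.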


**API endpoint**: a digital location (URL) where an API receives requests about a specific resource on its server.


**Making an API request**

- To get data, we make an HTTP request to a web server.
- The server then replies with our data. In Python, the `requests` library is used to do this.
- There are several types of request we can make. The GET request is the most common and is used to retrieve data.


The image below provides a breakdown of the components of a URL

![URL_anatomy](https://drive.google.com/uc?export=view&id=1DoM5rPVH6JTv5i2gxzjMg8MsdduLPZjU)
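The same breakdown can be reproduced in code. This is a minimal sketch using the standard library's `urllib.parse`; the URL is a made-up example chosen only because it contains every component.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL used only to illustrate the components
url = 'https://www.example.com:443/search/results?q=web+api&lang=en#top'

parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.hostname)  # domain name
print(parts.port)      # port number
print(parts.path)      # path to the resource
print(parts.query)     # raw query string
print(parts.fragment)  # anchor within the page

# The query string can be decoded into a dictionary of parameters
print(parse_qs(parts.query))
```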

In [13]:
#How to send a request to open Google home page in python

import requests
URL = 'https://google.com'
response = requests.get(URL)


#Alternatively, to send a request to open the Google home page,
#you simply open your browser, type google.com in the address bar and hit enter

## Request and Response from Google Web server

In [2]:
#import library
import requests

In [3]:
#make a get request to fetch search results on 'hannah igboke'

response = requests.get('https://google.com/search?q=hannah+igboke')

In [4]:
#printing response headers

print(response.headers)

{'Content-Type': 'text/html; charset=ISO-8859-1', 'Content-Security-Policy': "object-src 'none';base-uri 'self';script-src 'nonce-5Uc3PM8PqCviIfred9XsOA' 'strict-dynamic' 'report-sample' 'unsafe-eval' 'unsafe-inline' https: http:;report-uri https://csp.withgoogle.com/csp/gws/xsrp", 'Accept-CH': 'Sec-CH-Prefers-Color-Scheme', 'P3P': 'CP="This is not a P3P policy! See g.co/p3phelp for more info."', 'Content-Encoding': 'gzip', 'Date': 'Tue, 24 Jun 2025 13:37:06 GMT', 'Server': 'gws', 'X-XSS-Protection': '0', 'X-Frame-Options': 'SAMEORIGIN', 'Expires': 'Tue, 24 Jun 2025 13:37:06 GMT', 'Cache-Control': 'private', 'Set-Cookie': 'AEC=AVh_V2g4Kdv_jMmbFnevir3OGcaPQ6jr33EFWZpm2zINHDRrOAQSPC4KzgE; expires=Sun, 21-Dec-2025 13:37:06 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=lax, NID=524=jDrAxsbDOPKg46vJfG0TK6r8WFUUqojMH1ipuRxwjLs-PBtba5yWAJTENgZDcyrf3yvOiL-8FkD7KrqugZLQgJeZe5GIVXXwl_XvH3sCz_iRytMWMITZNo88eclbgZRtkNU_CIa4hIiwFSTBmvukoFXV1x_cBxEovtHrcgqczF9Sgf6jtme4qgQSMzY8MwPUazIjs

In [5]:
#checking the response url

print(response.url)

https://www.google.com/search?q=hannah+igboke


In [6]:
#check the status code of the response

print(response.status_code)

#200 means it was successful

200


In [7]:
#checking response encoding

print(response.encoding)

ISO-8859-1


In [8]:
#retrieving individual header properties, e.g. the response content type

print(response.headers['Content-Type'])
print(response.headers['Cache-Control'])

text/html; charset=ISO-8859-1
private


In [14]:
#text response content
#this is useful if the response body contains text data

print(response.text[0:100])

#you can also try
print(response.text)

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp, " name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="9gf6Gb4C_lBDWn4az6uW8g">(function(){var _g={kEI:'sapaaKvPC7nNkPIP-ZKf8Qs',kEXPI:'0,202792,62,2,610014,1284,2886154,131,945,538661,14112,34679,30022,255377,105524,94242,153078,23156,19568,15664,5226018,364,36812278,25228681,113800,10188,14280,14115,62507,2661,3433,3319,23879,9138,4600,328,6225,2949,34872,12755,3614,9975,15048,8211,3286,4134,30380,28333,48280,625,5307,353,14656,422

In [10]:
#for cases where the response body contains non-text data
# binary response content

print(response.content[0:100])

b'<!DOCTYPE html><html lang="en"><head><title>Google Search</title><style>body{background-color:#fff}<'


In [11]:
#JSON response content

print(response.json())

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The error tells us that the response body does not contain JSON data (it is HTML).

## Status code

- 1xx = information
- 2xx = success
- 3xx = redirection

    - 301: the server is redirecting you to a different endpoint. This happens when a company switches domain names or an endpoint name is changed
    
    
- 4xx = client error

    - 400: bad request caused by not sending the right details
    - 401: server thinks you're not authenticated. This happens when you don't send the right credentials to access an API
    - 403: resource you're trying to access is forbidden. You don't have the right permission to access it
    - 404: the resource tried was not found on the server


- 5xx = server error
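The classes above can be looked up programmatically. The following is a small sketch, assuming only the standard library's `http.HTTPStatus`; the helper name `describe_status` and the `STATUS_CLASSES` dictionary are illustrative and not part of any library.

```python
from http import HTTPStatus

# First digit of the status code -> class of response
STATUS_CLASSES = {
    1: 'information',
    2: 'success',
    3: 'redirection',
    4: 'client error',
    5: 'server error',
}

def describe_status(code):
    """Return the status code with its standard phrase and its class."""
    status_class = STATUS_CLASSES[code // 100]
    phrase = HTTPStatus(code).phrase
    return f'{code} {phrase} ({status_class})'

for code in (200, 301, 400, 401, 403, 404, 500):
    print(describe_status(code))
```

When working with `requests`, calling `response.raise_for_status()` is a common way to turn 4xx and 5xx responses into exceptions instead of checking the code by hand.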

In [15]:
#trying an endpoint that does not exist

response = requests.get('https://google.com/abcd')
print(response.status_code)

#an endpoint that exists
response = requests.get('https://google.com/search')
print(response.status_code)

404
200


## Query parameters

In [16]:
#request without query parameter

import requests
response = requests.get('https://google.com/search')
print(response.status_code)

200


In [17]:
#request with query parameter

import requests

parameters = {'q': 'Hannah Igboke'}
response = requests.get('https://google.com/search', params = parameters)
print(response.status_code)

200
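To see what the `params` dictionary turns into, the query string can be built by hand. This sketch uses the standard library's `urllib.parse.urlencode`, which produces roughly the same encoding that `requests` applies when it appends `params` to the URL.

```python
from urllib.parse import urlencode

base_url = 'https://google.com/search'
parameters = {'q': 'Hannah Igboke'}

# Encode the dictionary as key=value pairs; spaces become '+'
query_string = urlencode(parameters)
full_url = f'{base_url}?{query_string}'

print(query_string)
print(full_url)
```

After a real request, `response.url` shows the final URL that was actually requested, which is a quick way to confirm that the parameters were attached as intended.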


## Request and Response from ISS API

We will use the Open Notify API, which has several endpoints. As mentioned earlier, you can find the other endpoints in the documentation: http://open-notify.org/Open-Notify-API/.

**Endpoints:**

1. `/iss-now.json`

This endpoint returns the current location of the ISS: its latitude and longitude, together with a Unix timestamp for the time at which the location was valid. It takes no inputs. Remember that this data is not static and changes rapidly.

In [18]:
#making a request to get the latest position of ISS from the opennotify api

import requests
response = requests.get('http://api.open-notify.org/iss-now.json')
print(response.headers)
print(response.url)
print(response.status_code)
print(response.encoding)

{'Server': 'nginx/1.10.3', 'Date': 'Tue, 24 Jun 2025 14:06:02 GMT', 'Content-Type': 'application/json', 'Content-Length': '113', 'Connection': 'keep-alive', 'access-control-allow-origin': '*'}
http://api.open-notify.org/iss-now.json
200
utf-8


In [19]:
#printing text content

print(response.text)

{"message": "success", "timestamp": 1750773962, "iss_position": {"longitude": "-5.0492", "latitude": "-31.5039"}}


In [20]:
#printing binary content

print(response.content)

b'{"message": "success", "timestamp": 1750773962, "iss_position": {"longitude": "-5.0492", "latitude": "-31.5039"}}'


In [21]:
#printing json content

print(response.json())

{'message': 'success', 'timestamp': 1750773962, 'iss_position': {'longitude': '-5.0492', 'latitude': '-31.5039'}}
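The values in this response are not yet in convenient types: the coordinates are strings and the time is a Unix timestamp in seconds. A minimal sketch of converting them, using the sample payload printed above rather than a live request:

```python
import json
from datetime import datetime, timezone

# Sample payload copied from the response shown above
payload = ('{"message": "success", "timestamp": 1750773962, '
           '"iss_position": {"longitude": "-5.0492", "latitude": "-31.5039"}}')

data = json.loads(payload)

# Coordinates arrive as strings, so convert them to floats
latitude = float(data['iss_position']['latitude'])
longitude = float(data['iss_position']['longitude'])

# The Unix timestamp counts seconds since 1970-01-01 00:00:00 UTC
observed_at = datetime.fromtimestamp(data['timestamp'], tz=timezone.utc)

print(latitude, longitude)
print(observed_at.isoformat())
```

The converted time agrees with the `Date` header of the response shown earlier (24 Jun 2025 14:06:02 GMT).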


2. `/astros.json`

This endpoint returns the number of people currently in space, along with their names and the spacecraft they are on.

In [22]:
import requests

response = requests.get('http://api.open-notify.org/astros.json')
print(response.headers)

{'Server': 'nginx/1.10.3', 'Date': 'Tue, 24 Jun 2025 14:06:02 GMT', 'Content-Type': 'application/json', 'Content-Length': '587', 'Connection': 'keep-alive', 'access-control-allow-origin': '*'}


In [23]:
data = response.json()

print(type(data))

#to verify that the response data is a dictionary

<class 'dict'>


In [24]:
data

{'people': [{'craft': 'ISS', 'name': 'Oleg Kononenko'},
  {'craft': 'ISS', 'name': 'Nikolai Chub'},
  {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'},
  {'craft': 'ISS', 'name': 'Matthew Dominick'},
  {'craft': 'ISS', 'name': 'Michael Barratt'},
  {'craft': 'ISS', 'name': 'Jeanette Epps'},
  {'craft': 'ISS', 'name': 'Alexander Grebenkin'},
  {'craft': 'ISS', 'name': 'Butch Wilmore'},
  {'craft': 'ISS', 'name': 'Sunita Williams'},
  {'craft': 'Tiangong', 'name': 'Li Guangsu'},
  {'craft': 'Tiangong', 'name': 'Li Cong'},
  {'craft': 'Tiangong', 'name': 'Ye Guangfu'}],
 'number': 12,
 'message': 'success'}

### Finding the number and names of astronauts

In [25]:
data.values()

dict_values([[{'craft': 'ISS', 'name': 'Oleg Kononenko'}, {'craft': 'ISS', 'name': 'Nikolai Chub'}, {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'}, {'craft': 'ISS', 'name': 'Matthew Dominick'}, {'craft': 'ISS', 'name': 'Michael Barratt'}, {'craft': 'ISS', 'name': 'Jeanette Epps'}, {'craft': 'ISS', 'name': 'Alexander Grebenkin'}, {'craft': 'ISS', 'name': 'Butch Wilmore'}, {'craft': 'ISS', 'name': 'Sunita Williams'}, {'craft': 'Tiangong', 'name': 'Li Guangsu'}, {'craft': 'Tiangong', 'name': 'Li Cong'}, {'craft': 'Tiangong', 'name': 'Ye Guangfu'}], 12, 'success'])

In [26]:
#how many people are in space?

print(data['number'])

12


In [27]:
#who are these people

print(data['people'])

[{'craft': 'ISS', 'name': 'Oleg Kononenko'}, {'craft': 'ISS', 'name': 'Nikolai Chub'}, {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'}, {'craft': 'ISS', 'name': 'Matthew Dominick'}, {'craft': 'ISS', 'name': 'Michael Barratt'}, {'craft': 'ISS', 'name': 'Jeanette Epps'}, {'craft': 'ISS', 'name': 'Alexander Grebenkin'}, {'craft': 'ISS', 'name': 'Butch Wilmore'}, {'craft': 'ISS', 'name': 'Sunita Williams'}, {'craft': 'Tiangong', 'name': 'Li Guangsu'}, {'craft': 'Tiangong', 'name': 'Li Cong'}, {'craft': 'Tiangong', 'name': 'Ye Guangfu'}]


In [28]:
#returning the names of these astronauts

people_in_space = data['people']
for ast in people_in_space:
    print(ast['name'])

Oleg Kononenko
Nikolai Chub
Tracy Caldwell Dyson
Matthew Dominick
Michael Barratt
Jeanette Epps
Alexander Grebenkin
Butch Wilmore
Sunita Williams
Li Guangsu
Li Cong
Ye Guangfu
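A natural next step is to group the names by spacecraft. The sketch below works on the sample response shown above (hard-coded so that it runs without a network connection) and uses `collections.defaultdict`; the variable name `names_by_craft` is just illustrative.

```python
from collections import defaultdict

# Sample data copied from the response shown above
people_in_space = [
    {'craft': 'ISS', 'name': 'Oleg Kononenko'},
    {'craft': 'ISS', 'name': 'Nikolai Chub'},
    {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'},
    {'craft': 'ISS', 'name': 'Matthew Dominick'},
    {'craft': 'ISS', 'name': 'Michael Barratt'},
    {'craft': 'ISS', 'name': 'Jeanette Epps'},
    {'craft': 'ISS', 'name': 'Alexander Grebenkin'},
    {'craft': 'ISS', 'name': 'Butch Wilmore'},
    {'craft': 'ISS', 'name': 'Sunita Williams'},
    {'craft': 'Tiangong', 'name': 'Li Guangsu'},
    {'craft': 'Tiangong', 'name': 'Li Cong'},
    {'craft': 'Tiangong', 'name': 'Ye Guangfu'},
]

# Collect the names under the craft each astronaut is on
names_by_craft = defaultdict(list)
for person in people_in_space:
    names_by_craft[person['craft']].append(person['name'])

for craft, names in names_by_craft.items():
    print(f'{craft} ({len(names)}): {", ".join(names)}')
```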


## Converting the JSON data from API to dataframe

Many APIs return data in JSON format. We need to convert this JSON data into a dataframe that can be easily analyzed.

**Steps**:

- Use the `requests` library to make HTTP requests to the API endpoint and retrieve the data. The API key, if required, is typically included in the request headers.

- Once you have obtained the data, you can load it into a pandas DataFrame to analyze and process it.
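As an illustration of the first step, the sketch below attaches an API key to a request through its headers. Everything specific here is an assumption: the URL, the key, and the `Authorization: Bearer ...` scheme are placeholders, since each API documents its own header name and key format. The request is built with the standard library and is not sent.

```python
from urllib.request import Request

# Placeholders only: a real API documents its own URL, header name and key format
api_url = 'https://api.example.com/v1/data'
api_key = 'YOUR_API_KEY'

# Attach the key as a request header (assumed "Bearer" scheme)
request = Request(api_url, headers={'Authorization': f'Bearer {api_key}'})

print(request.get_method())
print(request.get_header('Authorization'))
```

With `requests`, the same headers dictionary is passed as `requests.get(api_url, headers={...})`.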

### Way 1

In [29]:
import requests
import pandas as pd

response = requests.get('http://api.open-notify.org/astros.json')

data = response.json()

df = pd.DataFrame(data)

df

Unnamed: 0,people,number,message
0,"{'craft': 'ISS', 'name': 'Oleg Kononenko'}",12,success
1,"{'craft': 'ISS', 'name': 'Nikolai Chub'}",12,success
2,"{'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'}",12,success
3,"{'craft': 'ISS', 'name': 'Matthew Dominick'}",12,success
4,"{'craft': 'ISS', 'name': 'Michael Barratt'}",12,success
5,"{'craft': 'ISS', 'name': 'Jeanette Epps'}",12,success
6,"{'craft': 'ISS', 'name': 'Alexander Grebenkin'}",12,success
7,"{'craft': 'ISS', 'name': 'Butch Wilmore'}",12,success
8,"{'craft': 'ISS', 'name': 'Sunita Williams'}",12,success
9,"{'craft': 'Tiangong', 'name': 'Li Guangsu'}",12,success


In [30]:
#To remove the message and number columns

astronauts = pd.DataFrame(data['people'])
astronauts

Unnamed: 0,craft,name
0,ISS,Oleg Kononenko
1,ISS,Nikolai Chub
2,ISS,Tracy Caldwell Dyson
3,ISS,Matthew Dominick
4,ISS,Michael Barratt
5,ISS,Jeanette Epps
6,ISS,Alexander Grebenkin
7,ISS,Butch Wilmore
8,ISS,Sunita Williams
9,Tiangong,Li Guangsu


### Way 2

Using `pd.json_normalize()`. The `json_normalize` function in pandas normalizes semi-structured JSON data into a flat table.

In [31]:
response = requests.get('http://api.open-notify.org/astros.json')

data = response.json()

df = pd.json_normalize(data)  #called without record_path, the nested 'people' list stays in a single column
df

In [None]:
#It didn't come out as expected. We want the results to be like that of way 1

response = requests.get('http://api.open-notify.org/astros.json')

data = response.json()

df = pd.json_normalize(data, record_path=['people'])
df
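To build some intuition for what "normalizing" means, here is a pure-Python sketch of the idea. It is not pandas' implementation: the helper `flatten` is hypothetical and only mimics two behaviours, namely joining nested keys with a dot (the default separator in `json_normalize`) and producing one row per item of the list named by `record_path`. The astronaut sample is shortened to two entries for brevity.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Nested dictionary shaped like the /iss-now.json response
iss = {'message': 'success', 'timestamp': 1750773962,
       'iss_position': {'longitude': '-5.0492', 'latitude': '-31.5039'}}
print(flatten(iss))

# record_path=['people'] roughly means: one row per item in that list
astros = {'people': [{'craft': 'ISS', 'name': 'Oleg Kononenko'},
                     {'craft': 'Tiangong', 'name': 'Li Guangsu'}],
          'message': 'success'}
rows = [flatten(person) for person in astros['people']]
print(rows)
```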

### Way 3

In [None]:
df = pd.read_json('http://api.open-notify.org/astros.json')
df

In [None]:
#Again, we are only interested in the people column


astronauts = pd.json_normalize(df['people'])
astronauts

## WORKING WITH JSON

- Converting Python objects to JSON and vice versa using **dumps() and loads()**
- Saving the data to json files and vice versa using **dump() and load()**


### Python objects and json

### dumps() and loads()

We can convert lists, tuples, and dictionaries (python objects) to JSON strings and convert JSON strings to lists and dictionaries.

- `json.dumps(python_object)`: converts python objects to json strings

- `json.loads(json_string)`: converts json string to python object

In [None]:
#first import the json module

import json

char = ['Black widow', 'Hawkeye', 'Katiness', 'Iron man']

print(type(char))

In [None]:
#converting python object to json string

char_string = json.dumps(char)
print(type(char_string))

In [None]:
#converting json string to python objects
char_object = json.loads(char_string)

print(type(char_object))

This same concept applies also to dictionaries and tuples.
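A short sketch of that round trip for a tuple and a dictionary follows; the sample values are made up. One detail worth noticing is that JSON has no tuple type, so a tuple is written as a JSON array and comes back as a list.

```python
import json

# Made-up sample objects
point = (6.5244, 3.3792)
profile = {'name': 'Ada', 'skills': ['python', 'sql'], 'active': True, 'manager': None}

# Python objects -> JSON strings
point_string = json.dumps(point)
profile_string = json.dumps(profile)
print(point_string)
print(profile_string)

# JSON strings -> Python objects
restored_point = json.loads(point_string)
restored_profile = json.loads(profile_string)
print(type(restored_point))
print(restored_profile == profile)
```

In the JSON string, Python's `True` and `None` appear as `true` and `null`; `json.loads` converts them back.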

### Creating, and reading JSON files

### dump() and load()

The json module has two main methods for doing this

- `json.dump(data, file_object)`: writes data into a json file format
- `json.load(file_object)`: for reading a json file


In [None]:
#making a dictionary

fast_food_franchise = {
    'Subway': 24722,
    'McDonalds': 14098,
    'Starbucks': 10821,
    'Pizza Hut': 34599
}

print(type(fast_food_franchise))

In [None]:
#writing the data in the dictionary to a file

import json

file_name = 'my_fast_food.json'

with open(file_name, 'w') as f:
    json.dump(fast_food_franchise, f)

#reading the file back as plain text to inspect what was written
with open(file_name, 'r') as f:
    data = f.read()

print(data)
print(type(data))

In [None]:
#reading the data from the json file

file = 'my_fast_food.json'

with open(file, 'r') as f:
    data = json.load(f)

print(data)
print(type(data))

## Extracting crypto data using Coingecko API

root_url = "https://api.coingecko.com/api/v3"

This is the root URL of the public API; the available endpoints are described in CoinGecko's public API documentation.

CoinGecko was founded in 2014 to democratize access to crypto data and empower users with actionable insights.

API endpoints:

- Ping: `/ping`
- Coin list: `/coins/list`
- Coin markets: `/coins/markets`
- Coin history: `/coins/{id}/history`
- Coin market chart: `/coins/{id}/market_chart`

### Endpoint Ping

In [None]:
import requests

root_url = "https://api.coingecko.com/api/v3"
endpoint = "/ping"

response = requests.get(root_url+endpoint)

response.status_code

In [None]:
response.headers

In [None]:
print(response.headers['Content-Type'])

print(response.text)

### Endpoint coin list

Use this endpoint to obtain the ids of all coins; these ids are needed to make the other API calls.

In [None]:
import requests

endpoint = '/coins/list'

response = requests.get(root_url+endpoint)

response

In [None]:
#to retrieve the coins list

response.json()[:5]

In [None]:
#converting to a dataframe

import pandas as pd

data = response.json()

df = pd.DataFrame(data) #You can also use df = pd.read_json(root_url+endpoint)
df.head()

In [None]:
df.shape

In [None]:
#to retrieve data for btc, eth and doge coins

df.loc[(df.symbol=='btc') | (df.symbol=='eth') | (df.symbol=='doge')]

### Endpoint coin markets

Use this endpoint to obtain market data (price, market cap, volume) for the coins.

In [None]:
import requests

endpoint = '/coins/markets'

parameters = {'vs_currency': 'usd', 'ids': 'bitcoin,dogecoin,ethereum'}  #comma-separated coin ids, without spaces

response = requests.get(root_url+endpoint, params=parameters)
response

In [None]:
response.json()

In [None]:
#converting to a dataframe

df = pd.json_normalize(response.json())
df.head()

### Endpoint coins history

Get historical data (name, price, market, stats) at a given date for a coin

In [None]:
#retrieve today's date
from datetime import date

today = date.today().strftime('%d-%m-%Y')

print('Today: ', today)

In [None]:
import requests

uid = 'bitcoin'

root_url = 'https://api.coingecko.com/api/v3'
endpoint = f'/coins/{uid}/history' #the f-string prefix is needed so that {uid} is replaced with its value
parameter = {'date': today}

response = requests.get(root_url+endpoint, params = parameter)

response
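The request above depends on two small details: the date must be formatted as day-month-year, and the coin id must be interpolated into the endpoint. The sketch below builds the final URL for a fixed date without calling the API, so the result can be inspected; the date is chosen only to make the example reproducible.

```python
from datetime import date
from urllib.parse import urlencode

root_url = 'https://api.coingecko.com/api/v3'
uid = 'bitcoin'

# Fixed date so that the output is reproducible; formatted as dd-mm-yyyy
snapshot_date = date(2025, 6, 24).strftime('%d-%m-%Y')

# The f-string substitutes the coin id into the endpoint
endpoint = f'/coins/{uid}/history'
query_string = urlencode({'date': snapshot_date})
history_url = f'{root_url}{endpoint}?{query_string}'

print(snapshot_date)
print(history_url)
```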

In [None]:
data = response.json()
data

In [None]:
#storing the data in a dataframe

url = f'https://api.coingecko.com/api/v3/coins/bitcoin/history?date={today}'

df = pd.read_json(url)
df.head()

**The output is not as expected when we compare this dataframe with the dictionary output in the code block above. Let's fix this.**

In [None]:
#saving the data in a json file
file_name = f'bitcoin_history_{today}.json'

with open(file_name, 'w') as f:
    json.dump(data, f)


In [None]:
#let's have a look at the keys of our data

data.keys()

In [None]:
#retrieving market data

market_data = data['market_data']
print(market_data)

In [None]:
df = pd.DataFrame(market_data)
df.head()

In [None]:
df.shape

In [None]:
#to change the indexing of the data

df = df.reset_index()
df.head()

In [None]:
#renaming the index column
df = df.rename(columns={'index': 'symbol'})
df.head()

### Endpoint `/coins/{id}/market_chart`

Gets historical market data including price, market cap, and 24-hour volume (granularity is automatic).

- Data granularity is automatic (and cannot be adjusted)
- 1 day from current time = 5-minute interval data
- 1-90 days from current time = hourly data
- Above 90 days from current time = daily data (00:00 UTC)
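The rules above can be written as a small helper, which is handy for anticipating how many rows a request will return. This is only a sketch of the rules as listed here: the function name `expected_granularity` is hypothetical, the boundary handling (1 day counts as 5-minute data, 90 days as hourly) is an assumption, and the API may change these thresholds.

```python
def expected_granularity(days):
    """Return the data interval expected for a given `days` parameter."""
    if days <= 1:
        return '5-minute'
    if days <= 90:
        return 'hourly'
    return 'daily'

for days in (1, 30, 90, 365):
    print(days, '->', expected_granularity(days))
```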

In [None]:
import requests

uid = 'bitcoin'

root_url = 'https://api.coingecko.com/api/v3'
endpoint = f'/coins/{uid}/market_chart'
parameters = {'vs_currency': 'usd', 'days': '1'}

response = requests.get(root_url+endpoint, params=parameters)
response.status_code

In [None]:
response.json()

In [None]:
data = response.json()
data.keys()

In [None]:
#retrieves the first 5 prices from the data
data['prices'][:5]

In [None]:
#retrieves the first 5 market caps
data['market_caps'][:5]

In [None]:
#creating a dataframe

df = pd.DataFrame(data)
df.head()

In [None]:
#notice that for each of the columns we have a timestamp attached in each list
#we need to extract this

#to see what it looks like
df['prices'].str[0]

In [None]:
#adding a timestamp column
#.str[0] extracts the first element (the timestamp) of each [timestamp, price] list in the prices column

df['timestamp'] = df['prices'].str[0]
df.head()

In [None]:
#let's further simplify the table by keeping only the value from each [timestamp, value] pair

df['prices'] = df['prices'].str[1]
df['market_caps'] = df['market_caps'].str[1]
df['total_volumes'] = df['total_volumes'].str[1]

df.head()
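The same reshaping can be done in plain Python before a dataframe is ever created, which makes the structure of the response easier to see. The numbers below are made up and only mimic the shape of the response: each key holds a list of `[timestamp, value]` pairs.

```python
# Made-up [timestamp, value] pairs shaped like the API response
data = {
    'prices': [[1750773600000, 105000.5], [1750773900000, 105100.0]],
    'market_caps': [[1750773600000, 2.08e12], [1750773900000, 2.09e12]],
    'total_volumes': [[1750773600000, 3.1e10], [1750773900000, 3.2e10]],
}

# Walk the three lists in parallel and build one flat row per timestamp
rows = [
    {
        'timestamp': price[0],
        'prices': price[1],
        'market_caps': cap[1],
        'total_volumes': volume[1],
    }
    for price, cap, volume in zip(data['prices'], data['market_caps'], data['total_volumes'])
]

for row in rows:
    print(row)
```

A list of flat dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)`.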

In [None]:
#Rearranging the dataframe

df = df[['timestamp', 'prices', 'market_caps', 'total_volumes']]
df

In [None]:
#to obtain more information on the dataframe

df.describe()

In [None]:
df.info()

In [None]:
#the timestamp column has an integer data type instead of datetime
pd.to_datetime(df['timestamp'])

In [None]:
#the dates above are wrong because integers are interpreted as nanoseconds by default
#the timestamps are in milliseconds, so we pass unit='ms'

pd.to_datetime(df['timestamp'], unit = 'ms')

In [None]:
#this is good to go, let's replace the timestamp column

df['timestamp'] = pd.to_datetime(df['timestamp'], unit = 'ms')
df.info()
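Why the unit matters can be checked with the standard library alone. In this sketch the timestamp value is illustrative; it is treated as milliseconds since 1970-01-01 UTC, which is what `unit='ms'` tells pandas to do.

```python
from datetime import datetime, timezone

# Illustrative value: milliseconds since 1970-01-01 00:00:00 UTC
timestamp_ms = 1750773962000

# Correct reading: divide by 1000 to get seconds, then convert
converted = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
print(converted.isoformat())

# Read as nanoseconds instead, the same integer is under half an hour
# after the epoch, which is why the dates looked wrong without a unit
seconds_if_nanoseconds = timestamp_ms / 1_000_000_000
print(seconds_if_nanoseconds / 60, 'minutes after 1970-01-01')
```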

In [None]:
df.head()

In [None]:
df.describe()