# Interacting with HTML and Web APIs

Many websites have public APIs that provide data feeds in JSON or some other format. There are a number of ways to access these APIs from Python; one easy-to-use method that I recommend is the requests package (http://docs.python-requests.org). To retrieve the catalog of public APIs served by api.publicapis.org, we can make an HTTP GET request like so:

```python
import requests
from pandas import DataFrame

url = 'https://api.publicapis.org/entries'
resp = requests.get(url)
resp
```

```
<Response [200]>
```
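Query-string parameters can be passed to `requests.get` as a dict through its `params` argument instead of being pasted into the URL by hand. The minimal sketch below needs no network access: it builds the same kind of GET request with `requests.Request`, prepares it without sending it, and prints the URL that would go over the wire. The `category` and `title` parameters are assumptions for illustration only, not documented fields of this API.

```python
import requests

# Base endpoint used in this section
url = 'https://api.publicapis.org/entries'

# Hypothetical filter parameters (assumed, for illustration only)
params = {'category': 'Animals', 'title': 'cat facts'}

# Prepare the request without sending it, to see how params are encoded
prepared = requests.Request('GET', url, params=params).prepare()
print(prepared.url)
# https://api.publicapis.org/entries?category=Animals&title=cat+facts
```

In a live call the equivalent would be `requests.get(url, params=params, timeout=10)`; setting a `timeout` keeps the call from hanging indefinitely if the server does not answer.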

The Response object’s `text` attribute contains the body of the GET response. Many web APIs return a JSON string that must be loaded into a Python object:

```python
import json

data = json.loads(resp.text)
data.keys()
```

```
dict_keys(['count', 'entries'])
```
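To see what `json.loads` produces without calling the service, the sketch below parses a small hand-written JSON string shaped like this API's response (a sample for illustration, not a live payload). JSON objects become Python dicts, arrays become lists, and `true` becomes `True`. For a `requests` response, `resp.json()` performs the same decoding in one step.

```python
import json

# Hand-written sample mirroring the structure of the API response
raw = '''
{"count": 2,
 "entries": [
   {"API": "Axolotl", "Auth": "", "HTTPS": true, "Category": "Animals"},
   {"API": "Cat Facts", "Auth": "", "HTTPS": true, "Category": "Animals"}
 ]}
'''

data = json.loads(raw)

# JSON object -> dict, JSON array -> list, JSON true -> Python True
print(type(data).__name__, type(data['entries']).__name__)  # dict list
print(data['entries'][0]['HTTPS'] is True)                  # True
print([e['API'] for e in data['entries']])                  # ['Axolotl', 'Cat Facts']
```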

The `count` field gives the number of entries, and the `entries` field contains a list of public APIs, each of which is represented as a Python dict:

```python
data
```

```
{'count': 1425,
 'entries': [{'API': 'AdoptAPet',
   'Description': 'Resource to help get pets adopted',
   'Auth': 'apiKey',
   'HTTPS': True,
   'Cors': 'yes',
   'Link': 'https://www.adoptapet.com/public/apis/pet_list.html',
   'Category': 'Animals'},
  {'API': 'Axolotl',
   'Description': 'Collection of axolotl pictures and facts',
   'Auth': '',
   'HTTPS': True,
   'Cors': 'no',
   'Link': 'https://theaxolotlapi.netlify.app/',
   'Category': 'Animals'},
  {'API': 'Cat Facts',
   'Description': 'Daily cat facts',
   'Auth': '',
   'HTTPS': True,
   'Cors': 'no',
   'Link': 'https://alexwohlbruck.github.io/cat-facts/',
   'Category': 'Animals'},
  {'API': 'Cataas',
   'Description': 'Cat as a service (cats pictures and gifs)',
   'Auth': '',
   'HTTPS': True,
   'Cors': 'no',
   'Link': 'https://cataas.com/',
   'Category': 'Animals'},
  {'API': 'Cats',
   'Description': 'Pictures of cats from Tumblr',
   'Auth': 'apiKey',
   'HTTPS': True,
   'Cors': 'no',
   'Link': 'https://docs
```
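Because `entries` is an ordinary list of dicts, it can already be queried with plain Python before pandas gets involved. The sketch below uses the first five entries shown above, trimmed to the `API` and `Auth` fields. It assumes, judging from those entries, that an empty `Auth` string means no key is required; that interpretation is not confirmed by the response itself.

```python
from collections import Counter

# The first five entries from the output above (API and Auth fields only)
entries = [
    {'API': 'AdoptAPet', 'Auth': 'apiKey'},
    {'API': 'Axolotl', 'Auth': ''},
    {'API': 'Cat Facts', 'Auth': ''},
    {'API': 'Cataas', 'Auth': ''},
    {'API': 'Cats', 'Auth': 'apiKey'},
]

# Assumption: an empty 'Auth' string means no key is required
no_auth = [e['API'] for e in entries if not e['Auth']]
print(no_auth)  # ['Axolotl', 'Cat Facts', 'Cataas']

# Tally authentication schemes, labeling the empty string for readability
auth_counts = Counter(e['Auth'] or '(none)' for e in entries)
print(auth_counts)
```

This works fine for quick checks, but once you want grouping, sorting, or several filters at once, a DataFrame is usually more convenient.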

We can then make a list of the API fields of interest and pass the `entries` list to `DataFrame`:

```python
api_fields = ['API', 'Description', 'Auth', 'HTTPS', 'Cors', 'Link', 'Category']
api = DataFrame(data['entries'], columns=api_fields)
```
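When `DataFrame` is built from a list of dicts, the `columns` argument both selects and orders the keys to keep: keys not listed are dropped, and listed columns that no record contains are filled with NaN. A minimal sketch with two hypothetical records:

```python
from pandas import DataFrame

# Two hypothetical records with slightly different keys
records = [
    {'API': 'Axolotl', 'Category': 'Animals', 'Cors': 'no'},
    {'API': 'CitySDK', 'Category': 'Geocoding'},
]

# 'Cors' is not requested, so it is dropped;
# 'Auth' is requested but absent from every record, so it becomes NaN
df = DataFrame(records, columns=['API', 'Auth', 'Category'])
print(df)
```

Listing the fields explicitly therefore guards against the API adding new keys later, since only the columns you name end up in the frame.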

```python
api.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1425 entries, 0 to 1424
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   API          1425 non-null   object
 1   Description  1425 non-null   object
 2   Auth         1425 non-null   object
 3   HTTPS        1425 non-null   bool  
 4   Cors         1425 non-null   object
 5   Link         1425 non-null   object
 6   Category     1425 non-null   object
dtypes: bool(1), object(6)
memory usage: 68.3+ KB
```
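The dtypes reported by `info()` suggest how each column can be queried. Since `HTTPS` is `bool`, it can serve directly as a row mask, while object columns such as `Auth` and `Category` can be compared element-wise or tallied with `value_counts`. A minimal sketch on a few rows transcribed from the entries shown in this section (subset of columns):

```python
from pandas import DataFrame

# A few rows transcribed from the entries shown in this section
api = DataFrame({
    'API': ['AdoptAPet', 'Axolotl', 'Cat Facts', 'CitySDK'],
    'Auth': ['apiKey', '', '', ''],
    'HTTPS': [True, True, True, True],
    'Category': ['Animals', 'Animals', 'Animals', 'Geocoding'],
})

# A bool column can be used directly as a row mask
https_only = api[api['HTTPS']]

# Object columns can be compared element-wise to build masks
no_key = api[api['Auth'] == '']

# Frequency of each category, most common first
category_counts = api['Category'].value_counts()

print(len(https_only), len(no_key))  # 4 3
print(category_counts)
```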
Each row in the DataFrame now holds the fields extracted for one API:

```python
api.iloc[660]
```

```
API                                           CitySDK
Description      Open APIs for select European cities
Auth                                                 
HTTPS                                            True
Cors                                          unknown
Link           http://www.citysdk.eu/citysdk-toolkit/
Category                                    Geocoding
Name: 660, dtype: object
```
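`iloc[660]` selects by integer position, so which API it returns depends on the order of entries in the response. To look up a particular API by name, one option is to make the `API` column the index and use `loc`. A minimal sketch on a few rows taken from this section (subset of columns):

```python
from pandas import DataFrame

# Rows transcribed from entries shown in this section (subset of columns)
api = DataFrame({
    'API': ['AdoptAPet', 'Axolotl', 'CitySDK'],
    'Category': ['Animals', 'Animals', 'Geocoding'],
    'HTTPS': [True, True, True],
})

# iloc selects by integer position: index 2 is the third row here
print(api.iloc[2]['API'])  # CitySDK

# To look an API up by name, make 'API' the index and select with loc
by_name = api.set_index('API')
print(by_name.loc['CitySDK', 'Category'])  # Geocoding
```

If API names are not unique in the full catalog, `loc` with a single label returns every matching row rather than one value, so it may be worth checking `by_name.index.is_unique` first.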