# From Web API in JSON Format to Pandas with Python

Credits and References:
- By: Justin Chae | @justinhchae | https://medium.com/@jhc154 | https://www.linkedin.com/in/justin-chae
- https://colab.research.google.com/drive/1yD3aOCI4XFrfpBqNXlxStU5veRPHvaL0?usp=sharing
- https://plainenglish.io/blog/from-api-to-pandas-getting-json-data-with-python-df127f699b6b

- https://medium.com/swlh/handle-json-data-using-json-and-pandas-in-python-9ff6bbd0d356
- https://opendata.dc.gov/ (go to the API explorer of the chosen dataset and select the Query URL)

- https://medium.com/@technige/what-does-requests-offer-over-urllib3-in-2022-e6a38d9273d9
- https://www.zenrows.com/blog/urllib3-vs-requests#feature-comparison

This notebook shows how to fetch JSON data from an open data site with Python and load it into a pandas DataFrame.

# **Step 0 - Import and Install Libraries**

In [None]:
# Install certifi if it is not already available in your environment.
!pip install certifi

In [1]:
# urllib3 is an HTTP client for Python; it handles data retrieval
import urllib3

# to handle certificate verification
import certifi

# to manage json data
import json

# for pandas dataframes
import pandas as pd

# **Step 1 - Set Up Handler for Certificates and SSL Warnings**

In [2]:
# handle certificate verification and SSL warnings:
# reference https://urllib3.readthedocs.io/en/latest/user-guide.html#ssl

http = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where())

# Alternative: create a PoolManager with default settings (no explicit certificate configuration)
#http = urllib3.PoolManager()
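
A pool created as above relies on urllib3's default timeout and retry behaviour. As a minimal sketch, assuming you want explicit limits, the same `PoolManager` can also take `timeout` and `retries` arguments. The specific values below are illustrative assumptions, not settings from the original notebook.

```python
import certifi
import urllib3

# Illustrative values (assumptions): tune them for the API you are calling.
timeout = urllib3.Timeout(connect=5.0, read=30.0)
retries = urllib3.Retry(total=3, backoff_factor=0.5)

http = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',   # verify the server certificate
    ca_certs=certifi.where(),    # CA bundle shipped with certifi
    timeout=timeout,             # applied to requests sent through this pool
    retries=retries,             # retry failed requests with a backoff delay
)
```

Requests made with `http.request('GET', url)` then use these limits unless a call overrides them.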

In [3]:
type(http)

urllib3.poolmanager.PoolManager

# **Step 2 - Get Data from Web API with an HTTP Request**

In [4]:
# get data from the API; replace url with target source
url = 'https://maps2.dcgis.dc.gov/dcgis/rest/services/FEEDS/MPD/MapServer/2/query?where=1%3D1&outFields=*&outSR=4326&f=json'
#url ='https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Education_WebMercator/MapServer/5/query?where=1%3D1&outFields=*&outSR=4326&f=json'

r = http.request('GET', url)
r.status

200
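
The query string in the URL above is easier to read and edit when it is built from a dict of parameters. The sketch below rebuilds the same URL with the standard library. `build_query_url` is a hypothetical helper name, and the comments on each parameter reflect the usual meaning of these ArcGIS REST query parameters, so check the service documentation for your dataset.

```python
from urllib.parse import urlencode

def build_query_url(base_url, params):
    """Return base_url with params appended as an encoded query string."""
    # safe='*' keeps the wildcard literal, matching the URL used above
    return base_url + '?' + urlencode(params, safe='*')

base_url = 'https://maps2.dcgis.dc.gov/dcgis/rest/services/FEEDS/MPD/MapServer/2/query'
params = {
    'where': '1=1',     # no filter: match every record
    'outFields': '*',   # return all fields
    'outSR': 4326,      # output spatial reference (WGS 84)
    'f': 'json',        # response format
}
url = build_query_url(base_url, params)
print(url)
```

Changing a filter then means editing one dict entry rather than a percent-encoded string.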

# **Step 3 - Decode JSON Data to a Dict**

In [5]:
type(r.data.decode('utf-8'))

str

In [6]:
# decode json data/string into a Python dict object
data = json.loads(r.data.decode('utf-8'))

type(data)

dict
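
The status check from Step 2 and the decoding above can be combined so that a failed request is not parsed as if it were data. This is a minimal sketch: `decode_json_response` is a hypothetical helper, and it assumes only that the response object exposes `status` and `data`, as the urllib3 response `r` does above.

```python
import json

def decode_json_response(response):
    """Decode the body of an HTTP response into a Python object.

    `response` needs a `status` integer and a `data` bytes attribute.
    """
    if response.status != 200:
        raise RuntimeError(f'Request failed with HTTP status {response.status}')
    return json.loads(response.data.decode('utf-8'))
```

With the objects from this notebook, usage would be `data = decode_json_response(r)`.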

In [7]:
data

{'displayFieldName': 'CCN',
 'fieldAliases': {'CCN': 'CCN',
  'REPORT_DAT': 'REPORT_DATE',
  'SHIFT': 'SHIFT',
  'METHOD': 'METHOD',
  'OFFENSE': 'OFFENSE',
  'BLOCK': 'BLOCK',
  'XBLOCK': 'XBLOCK',
  'YBLOCK': 'YBLOCK',
  'WARD': 'WARD',
  'ANC': 'ANC',
  'DISTRICT': 'DISTRICT',
  'PSA': 'PSA',
  'NEIGHBORHOOD_CLUSTER': 'NEIGHBORHOOD_CLUSTER',
  'BLOCK_GROUP': 'BLOCK_GROUP',
  'CENSUS_TRACT': 'CENSUS_TRACT',
  'VOTING_PRECINCT': 'VOTING_PRECINCT',
  'LATITUDE': 'LATITUDE',
  'LONGITUDE': 'LONGITUDE',
  'BID': 'BID',
  'START_DATE': 'START_DATE',
  'END_DATE': 'END_DATE',
  'OBJECTID': 'OBJECTID',
  'OCTO_RECORD_ID': 'OCTO_RECORD_ID'},
 'geometryType': 'esriGeometryPoint',
 'spatialReference': {'wkid': 4326, 'latestWkid': 4326},
 'fields': [{'name': 'CCN',
   'type': 'esriFieldTypeString',
   'alias': 'CCN',
   'length': 8},
  {'name': 'REPORT_DAT',
   'type': 'esriFieldTypeDate',
   'alias': 'REPORT_DATE',
   'length': 8},
  {'name': 'SHIFT',
   'type': 'esriFieldTypeString',
   'alia
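
Printing the whole dict produces a very long output. A summary of the top-level keys is often enough to find where the records live. In the sketch below, `summarize_keys` is a hypothetical helper and `sample` is a heavily shortened stand-in for the response, not the real payload.

```python
def summarize_keys(payload):
    """Map each top-level key to its value's type name and, if sized, its length."""
    summary = {}
    for key, value in payload.items():
        size = len(value) if isinstance(value, (dict, list, str)) else None
        summary[key] = (type(value).__name__, size)
    return summary

# Hypothetical, heavily shortened stand-in for the API response
sample = {
    'displayFieldName': 'CCN',
    'geometryType': 'esriGeometryPoint',
    'fields': [{'name': 'CCN', 'type': 'esriFieldTypeString'}],
    'features': [],
}

for key, (type_name, size) in summarize_keys(sample).items():
    print(key, type_name, size)
```

Running `summarize_keys(data)` on the real response would show which key holds the list of records.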

# **Step 4 - Normalize the data dict into a DataFrame**

In [8]:
# normalize the data dict and read it into a dataframe
# in this dataset, the data to extract is under 'features'

df = pd.json_normalize(data, 'features')

# print the first rows and header of the dataframe
df.head(10)

,attributes.CCN,attributes.REPORT_DAT,attributes.SHIFT,attributes.METHOD,attributes.OFFENSE,attributes.BLOCK,attributes.XBLOCK,attributes.YBLOCK,attributes.WARD,attributes.ANC,...,attributes.VOTING_PRECINCT,attributes.LATITUDE,attributes.LONGITUDE,attributes.BID,attributes.START_DATE,attributes.END_DATE,attributes.OBJECTID,attributes.OCTO_RECORD_ID,geometry.x,geometry.y
0,20118684,1597885268000,EVENING,OTHERS,MOTOR VEHICLE THEFT,2400 - 2499 BLOCK OF 1ST STREET NW,398946.0,139329.0,5,5E,...,Precinct 135,38.921833,-77.012154,,1597869016000,1597875000000.0,625579101,,-77.012157,38.921841
1,20106669,1595763025000,DAY,OTHERS,THEFT/OTHER,900 - 999 BLOCK OF L STREET NW,397831.0,137367.0,2,2G,...,Precinct 129,38.904156,-77.025006,,1595757880000,1595763000000.0,625579102,,-77.025008,38.904164
2,20167043,1606091986000,EVENING,OTHERS,MOTOR VEHICLE THEFT,1630 - 1699 BLOCK OF EUCLID STREET NW,396671.0,139479.0,1,1C,...,Precinct 24,38.923178,-77.03839,,1606090256000,,625579103,,-77.038392,38.923186
3,20127013,1599345840000,EVENING,OTHERS,MOTOR VEHICLE THEFT,1200 - 1399 BLOCK OF DELAWARE AVENUE SW,398747.0,134091.0,6,6D,...,Precinct 127,38.874647,-77.01444,,1598997653000,1599340000000.0,625579104,,-77.014442,38.874655
4,20163565,1605501012000,MIDNIGHT,OTHERS,MOTOR VEHICLE THEFT,1821 - 1899 BLOCK OF 3RD STREET NE,399824.0,138591.0,5,5F,...,Precinct 75,38.915185,-77.002029,,1605478555000,1605483000000.0,625579122,,-77.002032,38.915193
5,20087101,1591992620000,EVENING,OTHERS,THEFT/OTHER,1000 - 1099 BLOCK OF 18TH STREET NW,396383.0,137254.0,2,2C,...,Precinct 17,38.903134,-77.041699,GOLDEN TRIANGLE,1591983023000,1591991000000.0,625579141,,-77.041702,38.903142
6,20038338,1583265319000,DAY,OTHERS,THEFT/OTHER,1100 - 1129 BLOCK OF CONNECTICUT AVENUE NW,396504.0,137376.0,2,2C,...,Precinct 17,38.904233,-77.040305,GOLDEN TRIANGLE,1583262313000,1583263000000.0,625579157,,-77.040307,38.904241
7,20010763,1579374076000,DAY,OTHERS,THEFT F/AUTO,2200 - 2228 BLOCK OF MARTIN LUTHER KING JR AVE...,400867.0,132951.0,8,8A,...,Precinct 114,38.864378,-76.99001,ANACOSTIA,1579366870000,1579367000000.0,625579175,,-76.990012,38.864385
8,20051063,1585251851000,EVENING,KNIFE,ASSAULT W/DANGEROUS WEAPON,1500 - 1699 BLOCK OF ALABAMA AVENUE SE,401601.0,130970.0,8,8E,...,Precinct 120,38.846531,-76.981557,,1585237021000,,625579178,,-76.981559,38.846539
9,20097458,1593996814000,EVENING,OTHERS,THEFT/OTHER,1200 - 1299 BLOCK OF HALF STREET SE,399352.0,134192.0,8,8F,...,Precinct 131,38.875557,-77.007468,CAPITOL RIVERFRONT,1593992096000,1593997000000.0,625579215,,-77.00747,38.875565
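
In the output above, the normalized column names carry an `attributes.` prefix, and date columns such as `REPORT_DAT` hold large integers. The sketch below assumes those integers are milliseconds since the Unix epoch and shows one way to tidy both. The sample uses two records taken from the output above, reduced to a few fields.

```python
import pandas as pd

# Two records from the output above, reduced to a few fields
sample = {'features': [
    {'attributes': {'CCN': '20118684', 'REPORT_DAT': 1597885268000, 'SHIFT': 'EVENING'},
     'geometry': {'x': -77.012157, 'y': 38.921841}},
    {'attributes': {'CCN': '20106669', 'REPORT_DAT': 1595763025000, 'SHIFT': 'DAY'},
     'geometry': {'x': -77.025008, 'y': 38.904164}},
]}

df = pd.json_normalize(sample, 'features')

# Drop the 'attributes.' prefix that json_normalize adds to nested keys
df = df.rename(columns=lambda name: name.replace('attributes.', '', 1))

# Assumption: REPORT_DAT is milliseconds since the Unix epoch
df['REPORT_DAT'] = pd.to_datetime(df['REPORT_DAT'], unit='ms')

print(df.dtypes)
```

The same two steps apply to the full DataFrame, and to `START_DATE` and `END_DATE` if they use the same encoding.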


In [9]:
df.shape

(1000, 25)
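
A row count of exactly 1000 may reflect a per-request record limit on the server rather than the size of the whole dataset. The sketch below shows one way to collect further pages. It assumes the service supports the ArcGIS REST pagination parameters `resultOffset` and `resultRecordCount` and sets `exceededTransferLimit` in the response when more records remain, so verify this against the service documentation. `fetch_all_features` and `fetch_page` are hypothetical names.

```python
def fetch_all_features(fetch_page, page_size=1000):
    """Collect 'features' across pages.

    fetch_page(offset, count) must return the decoded JSON dict for one page,
    for example by requesting the query URL with resultOffset=offset and
    resultRecordCount=count added (assumed to be supported by the service).
    """
    features = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        batch = page.get('features', [])
        features.extend(batch)
        # Stop when the page is empty or the server reports no further records
        if not batch or not page.get('exceededTransferLimit', False):
            break
        offset += len(batch)
    return features
```

Passing the page-fetching step in as a function keeps the loop independent of the HTTP client. The combined list can then be given to `pd.json_normalize` directly.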