# What is JSON?

JSON, like XML, is a data serialization/deserialization format.

One advantage JSON has over XML, however, is that it is simpler, which suits situations where the complexity of the program is low.

The notation is based on the object literal notation of the JavaScript programming language.

In JSON, data is represented as nested lists and dictionaries (arrays and objects in JSON terms), rather than as markup like XML.
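
As a minimal sketch of the serialization/deserialization round trip described above, the snippet below turns a Python dictionary into JSON text and back. The `record` dictionary is made up for illustration and is not part of the course data.

```python
import json

# Hypothetical sample record, invented for this illustration
record = {"name": "Chuck", "languages": ["Python", "C"], "active": True, "manager": None}

# Serialize: Python objects -> JSON text (True becomes true, None becomes null)
text = json.dumps(record)
print(text)

# Deserialize: JSON text -> Python objects
restored = json.loads(text)
print(restored == record)
```

Nested dictionaries and lists survive the round trip unchanged, which is what makes JSON convenient for passing data between programs.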

In [1]:
## Here is an example of reading in JSON data
import json # We have to import the json package

# Here is the example JSON data
# It is like a dictionary; notice the {} brackets
data = '''
{
  "name" : "Chuck",
  "phone" : {
    "type" : "intl",
    "number" : "+1 734 303 4456"
   },
   "email" : {
     "hide" : "yes"
   }
}'''

info = json.loads(data) # use the loads() function in the json library to parse the string into the variable info
# The nice part: what we get back from loads() is a Python dictionary
# Note: Python dictionary syntax is used to print the data below
print('Name:', info["name"])
# info["email"] is itself a dictionary, so we index it again with the key 'hide'
print('Hide:', info["email"]["hide"])


Name: Chuck
Hide: yes


In [2]:
# This next example shows how to extract JSON data when the top level is a list
import json

# Now the JSON data is a list
# Notice the [] brackets
data = '''
[
  { "id" : "001",
    "x" : "2",
    "name" : "Chuck"
  } ,
  { "id" : "009",
    "x" : "7",
    "name" : "Brent"
  }
]'''

info = json.loads(data)
print(type(info))
print('User count:', len(info))

# Now that we have a Python list, we can loop right through it with for
for item in info:
    print('Name', item['name'])
    print('Id', item['id'])
    print('Attribute', item['x'])


<class 'list'>
User count: 2
Name Chuck
Id 001
Attribute 2
Name Brent
Id 009
Attribute 7
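
Note that every value in the list example above is a JSON string, which is why the id prints as 001 with its leading zeros. The following sketch, which reuses the same two records in compact form, shows one way to convert the "x" values to integers and to build a lookup table by id. The names `users`, `total`, and `by_id` are chosen here for illustration.

```python
import json

# Same two records as the example above, written compactly
data = '[{"id": "001", "x": "2", "name": "Chuck"}, {"id": "009", "x": "7", "name": "Brent"}]'
users = json.loads(data)

# "x" arrives as a string, so convert it before doing arithmetic
total = sum(int(user["x"]) for user in users)
print('Total of x:', total)

# Build a dictionary keyed by id for direct lookups
by_id = {user["id"]: user["name"] for user in users}
print(by_id["009"])
```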


# Idea of Service Oriented Approach

Most applications use some form of web service.

An API is essentially the set of published "rules" that applications must follow to make use of the "services".

Here is an example of code that makes a request to Google's Geocoding API. Below is the URL syntax for the requested data:

https://maps.googleapis.com/maps/api/geocode/json?address=Ann+Arbor%2C+MI&key=AIzaSyBxRzMh7hZA5MEh2VYtbC_O6D3hUw6ibUY
* The &key part is the API key provided by Google Cloud
* After json? we write the query parameters of the request
* '+' stands for a space
* '%2C' is the URL-encoded form of a comma
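
The encoding rules in the list above do not have to be applied by hand. As a small sketch, `urllib.parse.urlencode` produces the query string and `urllib.parse.parse_qs` reverses it. The value 'YOUR_API_KEY' is a placeholder, not a real key.

```python
import urllib.parse

# 'YOUR_API_KEY' is a placeholder used only for this illustration
params = {'address': 'Ann Arbor, MI', 'key': 'YOUR_API_KEY'}

# Encode: spaces become '+', the comma becomes '%2C'
query = urllib.parse.urlencode(params)
print(query)

# Decode the query string back into a dictionary of lists
decoded = urllib.parse.parse_qs(query)
print(decoded['address'][0])
```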


Below is code that makes this kind of API request and prints the data it returns

In [3]:
# This code below is used to get the request
import urllib.request, urllib.parse, urllib.error
import json
import ssl

api_key = 'AIzaSyBxRzMh7hZA5MEh2VYtbC_O6D3hUw6ibUY' # Note. here I put my Google Maps API key
# If you have a Google Places API key, enter it here
# api_key = 'AIzaSy___IDByT70'
# https://developers.google.com/maps/documentation/geocoding/intro

if api_key is False:
    api_key = 42
    serviceurl = 'http://py4e-data.dr-chuck.net/json?'
else :
    serviceurl = 'https://maps.googleapis.com/maps/api/geocode/json?'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    address = input('Enter location: ')
    if len(address) < 1: break

    parms = dict()
    parms['address'] = address
    if api_key is not False: parms['key'] = api_key
    url = serviceurl + urllib.parse.urlencode(parms) # appends the URL-encoded parameters (address and API key) to serviceurl

    print('Retrieving', url)
    uh = urllib.request.urlopen(url, context=ctx)
    data = uh.read().decode()
    print('Retrieved', len(data), 'characters')

    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        # The response was not valid JSON
        js = None

    if not js or 'status' not in js or js['status'] != 'OK':
        print('==== Failure To Retrieve ====')
        print(data)
        continue

    print(json.dumps(js, indent=4))

    lat = js['results'][0]['geometry']['location']['lat']
    lng = js['results'][0]['geometry']['location']['lng']
    print('lat', lat, 'lng', lng)
    location = js['results'][0]['formatted_address']
    print(location)


Enter location: Brandon, FL
Retrieving https://maps.googleapis.com/maps/api/geocode/json?address=Brandon%2C+FL&key=AIzaSyBxRzMh7hZA5MEh2VYtbC_O6D3hUw6ibUY
Retrieved 1734 characters
{
    "results": [
        {
            "address_components": [
                {
                    "long_name": "Brandon",
                    "short_name": "Brandon",
                    "types": [
                        "locality",
                        "political"
                    ]
                },
                {
                    "long_name": "Hillsborough County",
                    "short_name": "Hillsborough County",
                    "types": [
                        "administrative_area_level_2",
                        "political"
                    ]
                },
                {
                    "long_name": "Florida",
                    "short_name": "FL",
                    "types": [
                        "administrative_area_level_1",
                     

In [4]:
# Let's change the output that we fetched
import urllib.request, urllib.parse, urllib.error
import json
import ssl

api_key = 'AIzaSyBxRzMh7hZA5MEh2VYtbC_O6D3hUw6ibUY' # Note. here I put my Google Maps API key
# If you have a Google Places API key, enter it here
# api_key = 'AIzaSy___IDByT70'
# https://developers.google.com/maps/documentation/geocoding/intro

if api_key is False:
    api_key = 42
    serviceurl = 'http://py4e-data.dr-chuck.net/json?'
else :
    serviceurl = 'https://maps.googleapis.com/maps/api/geocode/json?'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    address = input('Enter location: ')
    if len(address) < 1: break

    parms = dict()
    parms['address'] = address
    if api_key is not False: parms['key'] = api_key
    url = serviceurl + urllib.parse.urlencode(parms) # appends the URL-encoded parameters (address and API key) to serviceurl

    print('Retrieving', url)
    uh = urllib.request.urlopen(url, context=ctx)
    data = uh.read().decode()
    print('Retrieved', len(data), 'characters')

    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        # The response was not valid JSON
        js = None

    if not js or 'status' not in js or js['status'] != 'OK':
        print('==== Failure To Retrieve ====')
        print(data)
        continue

    print(json.dumps(js, indent=4))
    
    # This is the whole list of address components; the county is one of its entries
    components = js["results"][0]["address_components"]
    print(components)
    #lat = js['results'][0]['geometry']['location']['lat']
    #lng = js['results'][0]['geometry']['location']['lng']
    #print('lat', lat, 'lng', lng)
    #location = js['results'][0]['formatted_address']
    #print(location)


Enter location: Tampa, FL
Retrieving https://maps.googleapis.com/maps/api/geocode/json?address=Tampa%2C+FL&key=AIzaSyBxRzMh7hZA5MEh2VYtbC_O6D3hUw6ibUY
Retrieved 1733 characters
{
    "results": [
        {
            "address_components": [
                {
                    "long_name": "Tampa",
                    "short_name": "Tampa",
                    "types": [
                        "locality",
                        "political"
                    ]
                },
                {
                    "long_name": "Hillsborough County",
                    "short_name": "Hillsborough County",
                    "types": [
                        "administrative_area_level_2",
                        "political"
                    ]
                },
                {
                    "long_name": "Florida",
                    "short_name": "FL",
                    "types": [
                        "administrative_area_level_1",
                        "poli

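In the response above, each entry of "address_components" carries a "types" list, and the county entry is tagged "administrative_area_level_2". A minimal sketch of picking out one component by its type follows. The `components` list is an abbreviated, hand-copied sample shaped like the output above, and `find_component` is a helper name introduced here for illustration.

```python
# Abbreviated sample shaped like the geocoding output above (not a full response)
components = [
    {"long_name": "Tampa", "short_name": "Tampa",
     "types": ["locality", "political"]},
    {"long_name": "Hillsborough County", "short_name": "Hillsborough County",
     "types": ["administrative_area_level_2", "political"]},
    {"long_name": "Florida", "short_name": "FL",
     "types": ["administrative_area_level_1"]},
]

def find_component(components, wanted_type):
    """Return the long_name of the first component tagged with wanted_type, else None."""
    for component in components:
        if wanted_type in component.get("types", []):
            return component["long_name"]
    return None

county = find_component(components, "administrative_area_level_2")
print(county)
```

In the notebook's own loop, the same helper could be called on js["results"][0]["address_components"] instead of the sample list.
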
In [5]:
# Now let's look at an example of extracting data from the Twitter API

# Twitter API request Program

In [6]:
# Imports needed to run the program
## remember the twurl.py module is required by this code
## remember the hidden.py module (which holds the four credential strings) is required by this code
import urllib.request, urllib.parse, urllib.error
from twurl import augment
import ssl
import oauth

In [7]:
# Main function call to build the signed url
print('* Calling Twitter...')
url = augment('https://api.twitter.com/1.1/statuses/user_timeline.json',
              {'screen_name': 'TheNewStat1', 'count': '2'})
print(url)

* Calling Twitter...
https://api.twitter.com/1.1/statuses/user_timeline.json?oauth_consumer_key=vNFR5bFTCJlQ5h9UmjB6MxSYu&oauth_timestamp=1587088504&oauth_nonce=79848521&oauth_version=1.0&screen_name=TheNewStat1&count=2&oauth_token=1159864470103044096-KwPPomJUxFKje19QO0Ox29qsjEGPB5&oauth_signature_method=HMAC-SHA1&oauth_signature=UJ7sUvzj%2BBo5sQySQQFmyjGRWww%3D


In [8]:
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Make the data connection and open url
connection = urllib.request.urlopen(url, context=ctx)
data = connection.read()
print(data)

b'[{"created_at":"Thu Apr 16 14:01:14 +0000 2020","id":1250786332391231491,"id_str":"1250786332391231491","text":"The need for Machine Learning your Biz! #DataScience #MachineLearning #BigData #Statistics #Rprogramming\\u2026 https:\\/\\/t.co\\/4br1LABo7t","truncated":true,"entities":{"hashtags":[{"text":"DataScience","indices":[40,52]},{"text":"MachineLearning","indices":[53,69]},{"text":"BigData","indices":[70,78]},{"text":"Statistics","indices":[79,90]},{"text":"Rprogramming","indices":[91,104]}],"symbols":[],"user_mentions":[],"urls":[{"url":"https:\\/\\/t.co\\/4br1LABo7t","expanded_url":"https:\\/\\/twitter.com\\/i\\/web\\/status\\/1250786332391231491","display_url":"twitter.com\\/i\\/web\\/status\\/1\\u2026","indices":[106,129]}]},"source":"\\u003ca href=\\"https:\\/\\/www.hootsuite.com\\" rel=\\"nofollow\\"\\u003eHootsuite Inc.\\u003c\\/a\\u003e","in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_s
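
The output above starts with b' because `connection.read()` returns a bytes object, and the doubled backslashes are just how Python displays bytes. As a sketch of the next step, the snippet below decodes a shortened, hand-made sample of that response and parses it with `json.loads`. The `raw` value is an abbreviated stand-in for the real response, not the full data.

```python
import json

# Shortened stand-in for the bytes returned by connection.read()
raw = b'[{"created_at":"Thu Apr 16 14:01:14 +0000 2020","id_str":"1250786332391231491","text":"The need for Machine Learning your Biz!\\u2026","truncated":true}]'

# Decode the bytes to a str, then parse the JSON text into Python objects
text = raw.decode('utf-8')
tweets = json.loads(text)

for tweet in tweets:
    print(tweet['created_at'], tweet['id_str'])
    # The JSON escape \u2026 is turned into a single ellipsis character
    print(tweet['text'])
```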

In [9]:
print ('======================================')
headers = dict(connection.getheaders())
print(headers)

{'cache-control': 'no-cache, no-store, must-revalidate, pre-check=0, post-check=0', 'connection': 'close', 'content-disposition': 'attachment; filename=json.json', 'content-length': '5652', 'content-type': 'application/json;charset=utf-8', 'date': 'Fri, 17 Apr 2020 01:55:09 GMT', 'expires': 'Tue, 31 Mar 1981 05:00:00 GMT', 'last-modified': 'Fri, 17 Apr 2020 01:55:09 GMT', 'pragma': 'no-cache', 'server': 'tsa_b', 'set-cookie': 'guest_id=v1%3A158708850956855931; Max-Age=63072000; Expires=Sun, 17 Apr 2022 01:55:09 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None', 'status': '200 OK', 'strict-transport-security': 'max-age=631138519', 'x-access-level': 'read-write', 'x-app-rate-limit-limit': '100000', 'x-app-rate-limit-remaining': '99999', 'x-app-rate-limit-reset': '1587174909', 'x-connection-hash': '5cd707837806f17fd70bd4b530bd6910', 'x-content-type-options': 'nosniff', 'x-frame-options': 'SAMEORIGIN', 'x-rate-limit-limit': '900', 'x-rate-limit-remaining': '899', 'x-rate-limit-reset

This next program is part of the twitter1.py example (the program also works)

In [11]:
# Import build
import urllib.request, urllib.parse, urllib.error
import twurl
import ssl

In [12]:
# Establish twitter url for API
TWITTER_URL = 'https://api.twitter.com/1.1/statuses/user_timeline.json'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

In [13]:
# Loop for extracting the information
while True:
    print('')
    acct = input('Enter Twitter Account:')
    if (len(acct) < 1): break
    url = twurl.augment(TWITTER_URL,
                        {'screen_name': acct, 'count': '2'})
    print('Retrieving', url)
    connection = urllib.request.urlopen(url, context=ctx)
    data = connection.read().decode()
    print(data[:250])
    headers = dict(connection.getheaders())
    # print headers
    print('Remaining', headers['x-rate-limit-remaining'])


Enter Twitter Account:TheNewStat1
Retrieving https://api.twitter.com/1.1/statuses/user_timeline.json?oauth_consumer_key=vNFR5bFTCJlQ5h9UmjB6MxSYu&oauth_timestamp=1587088560&oauth_nonce=06323729&oauth_version=1.0&screen_name=TheNewStat1&count=2&oauth_token=1159864470103044096-KwPPomJUxFKje19QO0Ox29qsjEGPB5&oauth_signature_method=HMAC-SHA1&oauth_signature=cRln78Sk3CS06IwGPrdjoaQuGs4%3D
[{"created_at":"Thu Apr 16 14:01:14 +0000 2020","id":1250786332391231491,"id_str":"1250786332391231491","text":"The need for Machine Learning your Biz! #DataScience #MachineLearning #BigData #Statistics #Rprogramming\u2026 https:\/\/t.co\/4br1LABo7t"
Remaining 898

Enter Twitter Account:


# Pay Rates Behind API Usage

Bottom Line:
* The computing resources to run, say, Google's API are not free
* Data provided by these APIs is valuable
* Data providers can limit the number of API requests per day
* Data providers can demand an API key
* Data providers can charge for usage
* Of course, the data providers are big businesses that can change the rules along the way

For Twitter, for example, you can use the API for free but must authenticate (get an account with them)
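
Because providers limit the number of requests, a program can check the rate-limit headers before asking again. The sketch below assumes the header names seen in the earlier output ('x-rate-limit-remaining', 'x-rate-limit-reset') and assumes the reset value is a Unix timestamp in seconds. The function name `seconds_until_reset` and the sample values are made up for illustration.

```python
import time

def seconds_until_reset(headers, now=None):
    """Return 0 if requests remain, else the number of seconds until the window resets."""
    remaining = int(headers.get('x-rate-limit-remaining', 0))
    if remaining > 0:
        return 0
    # Assumption: the reset header is a Unix timestamp in seconds
    reset_at = int(headers.get('x-rate-limit-reset', 0))
    now = time.time() if now is None else now
    return max(0, int(reset_at - now))

# Sample header values invented for this illustration
exhausted = {'x-rate-limit-remaining': '0', 'x-rate-limit-reset': '1587175000'}
available = {'x-rate-limit-remaining': '898', 'x-rate-limit-reset': '1587175000'}

print(seconds_until_reset(exhausted, now=1587174900))
print(seconds_until_reset(available, now=1587174900))
```

A loop that calls the API could pass the returned value to `time.sleep` before retrying.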

In [15]:
### Here is an example of pulling data from a twitter API

# Here are the libraries we need to import
import urllib.request, urllib.parse, urllib.error
import twurl
import json
import ssl

# https://apps.twitter.com/
# Create App and get the four strings, put them in hidden.py

# The twitter API url
TWITTER_URL = 'https://api.twitter.com/1.1/friends/list.json'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    print('')
    acct = input('Enter Twitter Account:') # Enter the account name
    if (len(acct) < 1): break
    url = twurl.augment(TWITTER_URL,
                        {'screen_name': acct, 'count': '5'})  # gives us the workable encoded url for twitter APIs
    print('Retrieving', url) # prints the url
    connection = urllib.request.urlopen(url, context=ctx) # handle connection
    data = connection.read().decode() # decode to a string of the json

    js = json.loads(data)
    print(json.dumps(js, indent=2)) # view of the cleaned up fetched JSON

    headers = dict(connection.getheaders()) # Gives us the HTTP response headers as a dictionary
    print('Remaining', headers['x-rate-limit-remaining']) # Gives the remaining number of allowed requests
    
    # Loop through the JSON to get information from the account
    for u in js['users']: 
        print(u['screen_name'])
        if 'status' not in u:
            print('   * No status found')
            continue
        s = u['status']['text']
        print('  ', s[:50])



Enter Twitter Account:Mansi11424
Retrieving https://api.twitter.com/1.1/friends/list.json?oauth_consumer_key=vNFR5bFTCJlQ5h9UmjB6MxSYu&oauth_timestamp=1587088676&oauth_nonce=51682768&oauth_version=1.0&screen_name=Mansi11424&count=5&oauth_token=1159864470103044096-KwPPomJUxFKje19QO0Ox29qsjEGPB5&oauth_signature_method=HMAC-SHA1&oauth_signature=MPXCuGU1gNNPQhvv%2BZH4cym1YKw%3D
{
  "users": [
    {
      "id": 1059470996292464640,
      "id_str": "1059470996292464640",
      "name": "Arwa",
      "screen_name": "tushanlu",
      "location": "",
      "description": "LUHAN...... suffering from success",
      "url": "https://t.co/iGnXuEtXI7",
      "entities": {
        "url": {
          "urls": [
            {
              "url": "https://t.co/iGnXuEtXI7",
              "expanded_url": "https://youtu.be/5YX1UWHOChk",
              "display_url": "youtu.be/5YX1UWHOChk",
              "indices": [
                0,
                23
              ]
            }
          ]
        },
 

Enter Twitter Account:
