## eXtensible Markup Language - XML

We can exchange data between applications using the HyperText Transfer
Protocol (HTTP), and we need a way to represent the complex data that we send
back and forth between these applications. Two common formats are eXtensible
Markup Language (XML) and JavaScript Object Notation (JSON).

XML looks very similar to HTML.

## Parsing XML

Often it is helpful to think of an XML document as a tree structure where there
is a top tag and other tags are drawn as children of their parent nodes.

In [16]:
#xml1.py

import xml.etree.ElementTree as ET

data = '''
<person>
  <name>Dr. Chuck</name>
  <phone type="intl"> +1 734 303 4456 </phone>
   <email hide="yes"> dr_chuck@umich.edu
   </email>
</person>'''

tree = ET.fromstring(data)   #convert from XML string into a tree of XML nodes
print('Name:', tree.find('name').text)  #search through tree and retrieve node that matches tag
print('Phone:', tree.find('phone').text)
print('email:', tree.find('email').text, 'hide:', tree.find('email').get('hide'))

Name: Dr. Chuck
Phone:  +1 734 303 4456 
email:  dr_chuck@umich.edu
    hide: yes
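
To make the tree view concrete, here is a minimal sketch that walks the same kind of `<person>` document recursively and builds an indented outline. The helper name `walk` is made up for this illustration; it relies only on the fact that iterating over an `Element` yields its direct children.

```python
import xml.etree.ElementTree as ET

data = '''
<person>
  <name>Dr. Chuck</name>
  <phone type="intl"> +1 734 303 4456 </phone>
  <email hide="yes"> dr_chuck@umich.edu
  </email>
</person>'''

def walk(node, depth=0):
    """Return one indented line per node, parents before children."""
    # .text can be None or padded with whitespace, so normalize it first
    text = (node.text or '').strip()
    lines = ['  ' * depth + f'{node.tag} {node.attrib} {text!r}']
    for child in node:  # iterating an Element yields its direct children
        lines.extend(walk(child, depth + 1))
    return lines

tree = ET.fromstring(data)
outline = walk(tree)
print('\n'.join(outline))
```

Calling `.strip()` on the text also removes the stray spaces and line break that appear in the phone and email output above.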


When a document contains several nodes of the same kind, we can use a loop to process all of them.

In [1]:
#xml2.py

import xml.etree.ElementTree as ET
#https://docs.python.org/3/library/xml.etree.elementtree.html

#named data rather than input so the built-in input() function is not shadowed
data = '''
<stuff>
    <users>
        <user x="2">
            <id>001</id>
            <name>Chuck</name>
        </user>
        <user x="7">
            <id>009</id>
            <name>Brent</name>
        </user>
    </users>
</stuff>'''

stuff = ET.fromstring(data)
lst = stuff.findall('users/user')  #retrieve a Python list of subtrees
print('User count:', len(lst))

# loop through all of the nodes
for item in lst:
    print('Name', item.find('name').text)
    print('Id', item.find('id').text)
    print('Attribute', item.get("x"))


User count: 2
Name Chuck
Id 001
Attribute 2
Name Brent
Id 009
Attribute 7
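
The path given to `findall` matters. The sketch below, which uses a condensed copy of the same document (the variable names are chosen for this illustration), compares three paths and also shows one way to guard against a tag that is missing, since `find` returns `None` in that case.

```python
import xml.etree.ElementTree as ET

doc = '''
<stuff>
    <users>
        <user x="2"><id>001</id><name>Chuck</name></user>
        <user x="7"><id>009</id><name>Brent</name></user>
    </users>
</stuff>'''

stuff = ET.fromstring(doc)

by_path = stuff.findall('users/user')  # full path from the top element
direct = stuff.findall('user')         # only looks at direct children of <stuff>
anywhere = stuff.findall('.//user')    # searches every level below <stuff>

print('Counts:', len(by_path), len(direct), len(anywhere))

# find() returns None for a missing tag, so check before reading .text
for user in by_path:
    phone = user.find('phone')
    phone_text = phone.text if phone is not None else 'n/a'
    print(user.find('name').text, phone_text)
```

Because `<user>` is not a direct child of `<stuff>`, the bare `'user'` path matches nothing, while the other two paths each find both users.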


## JavaScript Object Notation - JSON

JSON’s syntax comes from JavaScript’s object and array literals, which closely resemble
Python’s dictionaries and lists. So the format of
JSON is nearly identical to a combination of Python lists and dictionaries.

JSON is simpler than XML, and its structures map directly onto Python’s built-in types, so we do not need a separate tree type as we did for XML. A JSON object becomes a Python dictionary and a JSON array becomes a Python list.
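
As a small illustration of how parsed JSON lands in Python, the sketch below loads one made-up record and inspects the resulting types. The field names are invented for this example; only the standard `json` module is used.

```python
import json

# A made-up record used only to show the type mapping
text = '{"name": "Chuck", "phones": ["+1 734 303 4456"], "active": true, "fax": null}'

record = json.loads(text)

print(type(record).__name__)            # JSON object -> dict
print(type(record['phones']).__name__)  # JSON array  -> list
print(record['active'], record['fax'])  # true/null   -> True/None

# Converting back to JSON text and parsing again gives an equal structure
round_trip = json.loads(json.dumps(record))
print(round_trip == record)
```

Note that JSON spells its constants `true`, `false`, and `null`, which are not valid Python names. This is one reason to parse JSON with `json.loads` rather than treating it as Python source.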

## Parsing JSON

In [17]:
#json2.py

import json

# define a list of multiple dictionaries
data = '''
[
  { "id" : "001",
    "x" : "2",
    "name" : "Chuck"
  } ,
  { "id" : "009",
    "x" : "7",
    "name" : "Chuck"
  }
]'''

#use JSON parser to return list
info = json.loads(data) #returns a Python list where each element is a dictionary
print('User count:', len(info))

#iterate through each list element (i.e., visit each contained dictionary)
for item in info:
    print('Name', item['name'])
    print('Id', item['id'])
    print('Attribute', item['x'])


User count: 2
Name Chuck
Id 001
Attribute 2
Name Chuck
Id 009
Attribute 7


## Application Programming Interfaces -- APIs

When we use an API, generally one program makes a set of services available for use by other applications and publishes the APIs (i.e., the “rules”) that must be followed to access the services provided by the program.

It is quite common that you need some kind of “API key” to make use of a vendor’s
API. The general idea is that they want to know who is using their services and
how much each user is using. Perhaps they have free and pay tiers of their services
or have a policy that limits the number of requests that a single individual can
make during a particular time period.
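
One common way to supply a key is as a query parameter on each request. The sketch below only builds the request URL and does not contact any server. The host `api.example.com`, the environment variable `EXAMPLE_API_KEY`, and the helper `build_url` are all hypothetical; a real vendor documents its own host, path, and parameter names.

```python
import os
import urllib.parse

# Hypothetical host, for illustration only
API_HOST = "api.example.com"

# Reading the key from the environment keeps it out of the source file;
# the variable name and the fallback value are placeholders
api_key = os.environ.get("EXAMPLE_API_KEY", "YOUR-KEY-HERE")

def build_url(domain, key):
    """Return a request URL with the key and query values safely encoded."""
    params = {'key': key, 'domain': domain, 'format': 'json'}
    return f"https://{API_HOST}/v2?" + urllib.parse.urlencode(params)

url = build_url("example.com", api_key)
print(url)
```

`urllib.parse.urlencode` takes care of escaping characters such as spaces and `&` that would otherwise corrupt the query string.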

Since the JSON becomes a set of nested Python lists and dictionaries, we can use a
combination of the index operation and for loops to wander through the returned
data structures with very little Python code.
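
A minimal sketch of that idea follows, using a made-up response that is only loosely shaped like a domain lookup result. Every field name and value in it is an assumption for illustration.

```python
import json

# Hypothetical response text; not the output of any real service
response_text = '''
{
  "domain": "example.com",
  "registrant": {"organization": "Example Org", "city": "Westminster"},
  "nameservers": ["ns1.example.com", "ns2.example.com"]
}'''

info = json.loads(response_text)

# Index operations can be chained to reach a nested value directly
city = info['registrant']['city']
first_ns = info['nameservers'][0]
print(city, first_ns)

# A for loop visits every top-level entry; nested containers get their own loop
for key, value in info.items():
    if isinstance(value, dict):
        for sub_key, sub_value in value.items():
            print(f'{key}.{sub_key}: {sub_value}')
    elif isinstance(value, list):
        for i, element in enumerate(value):
            print(f'{key}[{i}]: {element}')
    else:
        print(f'{key}: {value}')
```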


In [1]:
#whois.py

import json, urllib.parse, http.client

myKey = "9D0IASBR8KNBOKME5MNHSKOQB6FZ7ZNT"
domainName = input("Domain Name: ")

p = { 'key': myKey, 'domain': domainName, 'format': 'json' }

conn = http.client.HTTPSConnection("api.ip2whois.com")
conn.request("GET", "/v2?" + urllib.parse.urlencode(p))
res = conn.getresponse() #response body is JSON text
#parse with json.loads rather than eval: eval would execute whatever the server sends
data = json.loads(res.read().decode())  #convert it to a real Python dictionary

#parse through returned json data
for s in list(data):
    if data.get(s) == '':
        data[s] = "n/a"
    if type(data[s]) is dict:
        d = data[s]
        for w in data[s]:
            if d[w] == '':
                d[w] = "n/a"
     
    #format output according to data type           
    if type(data.get(s)) != str:
        print (f"{s:<20}")
        if type(data[s]) is dict:
            for kv in data.get(s):
                val = data.get(s)[kv]
                print (f"{kv:>30} {val}")
        elif type(data[s]) is list:
            if len(data[s]) == 0:
                data[s] = ["n/a"]
            print("%30s" %data[s])
        else:
            print(f"{str(data[s]):>30}")  #str() so any value type can be padded
    else:
        print (f"{s:<20} {data.get(s):<20}")



Domain Name:  frontrange.edu


domain               frontrange.edu      
domain_id            n/a                 
status               registered          
create_date          2002-03-04T00:00:00+00:00
update_date          2022-03-25T00:00:00+00:00
expire_date          2022-07-31T00:00:00+00:00
domain_age          
                          7394
whois_server         n/a                 
registrar           
                       iana_id n/a
                          name n/a
                           url n/a
registrant          
                          name n/a
                  organization Front Range Community College
                street_address 3645 West 112th Avenue
                          city Westminster
                        region CO
                      zip_code 80030
                       country USA
                         phone n/a
                           fax n/a
                         email n/a
admin               
                          name Shaun Crowley Front Range Community 