# JSON-XML

**Objectives:** 

* Understand the structural differences between JSON, XML, and Python data structures
* Manipulate JSON and XML data in Jupyter notebooks with pandas


## JSON - JavaScript Object Notation

* JSON is a platform-independent way of representing data. Like a PDF or a .txt file, a JSON file can be read on any operating system and by most programming languages.

* Proponents say that JSON files take up less space, are faster to process (web browsers parse them more easily), and are easier to work with than XML files.

**JSON vs. Python**

|JavaScript|JS Example               |Python Equivalent|Python Example           |
|----------|-------------------------|-----------------|-------------------------|
|Objects   |`{'key0':'v12','v':1}`   |Dictionaries     |`{'key0':'v12','v':1}`   |
|Arrays    |`[1, 'one', 'two', 3, 5]`|Lists            |`[1, 'one', 'two', 3, 5]`|
|Strings   |`'One'` or `"Two"`       |Strings          |`'One'` or `"Two"`       |
|Numbers   |`1234.5` or `8675309`    |Numbers          |`1234.5` or `8675309`    |  
|Boolean   |`true` or `false`        |Boolean          |`True` or `False`        |
|Null Value|`null`                   |None Value       |`None`                   |
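The mapping in the table can be checked directly with the standard `json` module. The snippet below is a minimal sketch: the record `python_value` is an illustrative assumption that borrows values from the Gattaca example. It also shows one place where JSON is stricter than JavaScript or Python: JSON strings must use double quotes.

```python
import json

# Illustrative record (an assumption for this sketch).
python_value = {
    "title": "Gattaca",
    "release_year": 1997,
    "good_reviews": True,
    "budget": None,
    "actors": ["Ethan Hawke", "Uma Thurman"],
}

# Python -> JSON text: True becomes true, None becomes null.
json_text = json.dumps(python_value)
print(json_text)

# JSON text -> Python: the original dictionary comes back unchanged.
round_tripped = json.loads(json_text)
print(round_tripped == python_value)

# JSON strings must use double quotes, so a Python-style literal is rejected.
try:
    json.loads("{'title': 'Gattaca'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)
```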

In [1]:
#Example: print the contents of a file that holds JSON.
with open("gattaca.txt") as gattaca:
    print(gattaca.read())

{
    "title" : "Gattaca",
    "release_year" : 1997,
    "good_reviews" : true,
    "won_oscar" : false,
    "actors" : ["Ethan Hawke", "Uma Thurman", "Alan Arkin",
                "Loren Dean"],
    "budget" : null,
    "credits" : {
        "director" : "Andrew Niccol",
        "writer" : "Andrew Niccol",
        "composer" : "Michael Nyman"
    }
}


### How would this look in XML?

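* The XML version is longer, mostly because every closing tag repeats the name of its opening tag.
* XML has no built-in boolean or null types, so here `true` and `false` are written as `1` and `0`, and `null` as an empty `<budget/>` element.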

```XML
<movie>
    <title>Gattaca</title>
    <release_year>1997</release_year>
    <good_reviews>1</good_reviews>
    <won_oscar>0</won_oscar>
    <actors>
        <actor>Ethan Hawke</actor>
        <actor>Uma Thurman</actor>
        <actor>Alan Arkin</actor>
        <actor>Loren Dean</actor>
    </actors>
    <budget/>
    <credits>
        <director>Andrew Niccol</director>
        <writer>Andrew Niccol</writer>
        <composer>Michael Nyman</composer>
    </credits>
</movie>
```

* XML (eXtensible Markup Language) is a software- and hardware-independent format for storing and transporting data.
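To get a rough feel for the size difference, the same record can be serialized both ways and the lengths compared. This is only a sketch under stated assumptions: `movie` is a shortened version of the Gattaca record, and `to_element` is a hypothetical helper written for this example, not part of any library.

```python
import json
import xml.etree.ElementTree as ET

# Shortened, illustrative version of the Gattaca record.
movie = {
    "title": "Gattaca",
    "release_year": 1997,
    "credits": {"director": "Andrew Niccol", "composer": "Michael Nyman"},
}

def to_element(tag, value):
    """Hypothetical helper: turn a nested dict or scalar into an XML element."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_element(key, child))
    else:
        element.text = str(value)
    return element

# Compact forms of both formats, without indentation or extra spaces.
json_text = json.dumps(movie, separators=(",", ":"))
xml_text = ET.tostring(to_element("movie", movie), encoding="unicode")

print(len(json_text), json_text)
print(len(xml_text), xml_text)
```

For this small record the XML string comes out longer, because each field name appears twice. The gap depends on the data, so treat it as an illustration rather than a benchmark.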

## Reading JSON files

In [1]:
#Import JSON library. Also, import Pandas.
import json 
import pandas as pd 

`dir(json)` lists the names the module provides. `json.load(f)` reads JSON from a file-like object, while `json.loads(s)` reads it from a string (hence the trailing `s`).

`open()` takes the path to the file as its first argument and an optional mode string (such as `"r"` for reading) as its second. **In Jupyter, press Shift+Tab inside the parentheses to view the function's signature.**
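The difference between `json.load` and `json.loads` can be seen without touching the disk. In this sketch, `io.StringIO` stands in for an open file, and the JSON text is a made-up two-field record.

```python
import io
import json

# Made-up JSON text for illustration.
text = '{"title": "Old Town Road", "tempo": 136.041}'

# loads: parse directly from a string.
from_string = json.loads(text)

# load: parse from a file-like object (StringIO behaves like an open text file).
file_like = io.StringIO(text)
from_file = json.load(file_like)

print(from_string == from_file)

# The public names of the module, similar to what dir(json) shows.
print([name for name in dir(json) if not name.startswith("_")])
```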

In [2]:
#Load JSON from file. 
#This file contains song data from Spotify on Old Town Road by Lil Nas X.
with open("old_town_road.json", "r") as read_file:
    data = json.load(read_file)

In [6]:
#Print out data.
print(json.dumps(data, indent=2))

{
  "meta": {
    "analyzer_version": "4.0.0",
    "platform": "Linux",
    "detailed_status": "OK",
    "status_code": 0,
    "timestamp": 1569910490,
    "analysis_time": 4.67758,
    "input_process": "libvorbisfile L+R 44100->22050"
  },
  "track": {
    "num_samples": 3463320,
    "duration": 157.06667,
    "sample_md5": "",
    "offset_seconds": 0,
    "window_seconds": 0,
    "analysis_sample_rate": 22050,
    "analysis_channels": 1,
    "end_of_fade_in": 0,
    "start_of_fade_out": 138.75084,
    "loudness": -5.56,
    "tempo": 136.041,
    "tempo_confidence": 0.559,
    "time_signature": 4,
    "time_signature_confidence": 1,
    "key": 6,
    "key_confidence": 0.756,
    "mode": 1,
    "mode_confidence": 0.651,
    "codestring": "eJxVmtlh3TAMBFtRCbyP_hvLzOrZz_lJTFqicC4WoGsvZ9d2z1Oesea6c47znN6eMeots-311HX5p5cxx93taZW9Ns7q95bLT_x2j75Or_zU5m1Pb_scVvPpnWXbu47RVn36OJy-6-qb855-F8s6b52L7_fZWJ11VtnPUIjb-tlzlPnMplzr1tbrLM9SiN6RoZfdn13KUf5zV-OQfZBz3zNa7ZUzN59BgVN69dB6xlNLH6X1xct38mteQWP

In [7]:
#Check type of data, get keys.
type(data)

dict

In [8]:
#List the top-level keys.
data.keys()

dict_keys(['meta', 'track', 'bars', 'beats', 'tatums', 'sections', 'segments'])

In [9]:
#Explore 'track' by printing it out and checking its keys.
data['track']

{'num_samples': 3463320,
 'duration': 157.06667,
 'sample_md5': '',
 'offset_seconds': 0,
 'window_seconds': 0,
 'analysis_sample_rate': 22050,
 'analysis_channels': 1,
 'end_of_fade_in': 0,
 'start_of_fade_out': 138.75084,
 'loudness': -5.56,
 'tempo': 136.041,
 'tempo_confidence': 0.559,
 'time_signature': 4,
 'time_signature_confidence': 1,
 'key': 6,
 'key_confidence': 0.756,
 'mode': 1,
 'mode_confidence': 0.651,
 'codestring': 'eJxVmtlh3TAMBFtRCbyP_hvLzOrZz_lJTFqicC4WoGsvZ9d2z1Oesea6c47znN6eMeots-311HX5p5cxx93taZW9Ns7q95bLT_x2j75Or_zU5m1Pb_scVvPpnWXbu47RVn36OJy-6-qb855-F8s6b52L7_fZWJ11VtnPUIjb-tlzlPnMplzr1tbrLM9SiN6RoZfdn13KUf5zV-OQfZBz3zNa7ZUzN59BgVN69dB6xlNLH6X1xct38mteQWPEfu7qEXlyNF-qRQXbQIM-F4Yoysp7HD61Th1tIHa_G3uwvCg2l5_um8-2svgaBljn7M268Q3W5cze_f3m-YYsGHNf1sfnz7n7YDzEXqw5DS_VUp86CoYYfczWGrZFrDzfZkFDlnfweGmLLxSkm_31B269Y7JedT3nNmzlQZX_jm9tXN_4aRW073eOeUflc2uc8mgmhFc9pCloWy6euAmN8qi4Txf1wvz1cBZy8rmtbW_ll_VqBpXZOK3P7tdPQRpk4yt1VtVWuUE4Faz31Fs1zsGT_VY-d7sy1jE31uG8G-lLX3oJQVtpc7

In [10]:
data['track'].keys()

dict_keys(['num_samples', 'duration', 'sample_md5', 'offset_seconds', 'window_seconds', 'analysis_sample_rate', 'analysis_channels', 'end_of_fade_in', 'start_of_fade_out', 'loudness', 'tempo', 'tempo_confidence', 'time_signature', 'time_signature_confidence', 'key', 'key_confidence', 'mode', 'mode_confidence', 'codestring', 'code_version', 'echoprintstring', 'echoprint_version', 'synchstring', 'synch_version', 'rhythmstring', 'rhythm_version'])

In [11]:
#What key is the song in?
print('Song key:', data['track']['key'])

Song key: 6


| ID | Key |
|----|-----|
| 0  | C   |
| 1  | C♯  |
| 2  | D   |
| 3  | D♯  |
| 4  | E   |
| 5  | F   |
| 6  | F♯  |
| 7  | G   |
| 8  | G♯  |
| 9  | A   |
| 10 | A♯  |
| 11 | B   |
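The ID-to-key table translates naturally into a list lookup. The names `PITCH_CLASSES` and `key_name` below are hypothetical, chosen for this sketch. Returning `None` for an ID outside the table is an assumption about how unknown values should be handled.

```python
# Note names indexed by key ID, as in the table above.
PITCH_CLASSES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]

def key_name(key_id):
    """Return the note name for a key ID, or None if the ID is not in the table."""
    if 0 <= key_id < len(PITCH_CLASSES):
        return PITCH_CLASSES[key_id]
    return None

print(key_name(6))  # F♯
```

With the notebook's data loaded, `key_name(data['track']['key'])` would give the key of Old Town Road.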

In [14]:
#How long is the song?
print('Song length:', data['track']['duration'])

Song length: 157.06667
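The duration is stored in seconds. A small sketch of turning it into minutes and seconds, using the value printed above (the variable names are illustrative):

```python
duration = 157.06667  # seconds, the value printed above

# divmod splits the total into whole minutes and leftover seconds.
minutes, seconds = divmod(duration, 60)
formatted = f"{int(minutes)}:{seconds:05.2f}"
print(formatted)  # 2:37.07
```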


In [12]:
#Load the 'track' data into a DataFrame. orient='index' turns each key into a row label.
df = pd.DataFrame.from_dict(data['track'], orient='index', columns=['values'])

In [13]:
df.head()

|               |values   |
|---------------|---------|
|num_samples    |3463320.0|
|duration       |157.067  |
|sample_md5     |         |
|offset_seconds |0.0      |
|window_seconds |0.0      |
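A dictionary of plain scalar values can be laid out in a DataFrame in two ways: one row per key, or a single row with one column per key. The sketch below uses a three-field subset of the track data as an illustrative assumption. The names `by_row` and `one_row` are hypothetical.

```python
import pandas as pd

# Small subset of the track dictionary, for illustration only.
track = {"duration": 157.06667, "tempo": 136.041, "key": 6}

# orient="index": each key becomes a row label, values go in one column.
by_row = pd.DataFrame.from_dict(track, orient="index", columns=["values"])

# Wrapping the dict in a list: each key becomes a column, giving a single row.
one_row = pd.DataFrame([track])

print(by_row)
print(one_row)
```

Passing a dictionary of scalars without `orient="index"` or an explicit index raises a `ValueError`, because pandas cannot tell how many rows to create.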


## Writing JSON

In [15]:
#Create a JSON string with a couple different songs and some info about them (artist, title, genres).

json_string = """
{
    "songs": [
    {
        "title": "Old Town Road",
        "artist": "Lil Nas X",
        "genres": [
                "rap",
                "country"
        ]
    },
    {
        "title": "Torn",
        "artist": "Ava Max",
        "genres": [
                "pop",
                "dance"
        ]
    },
    {
        "title": "Bad Guy",
        "artist": "Billie Eilish",
        "genres": [
                "alternative",
                "pop"
        ]
    }
    ]
}
"""

In [16]:
#Load the JSON string.
songs = json.loads(json_string)
songs

{'songs': [{'title': 'Old Town Road',
   'artist': 'Lil Nas X',
   'genres': ['rap', 'country']},
  {'title': 'Torn', 'artist': 'Ava Max', 'genres': ['pop', 'dance']},
  {'title': 'Bad Guy',
   'artist': 'Billie Eilish',
   'genres': ['alternative', 'pop']}]}

In [17]:
#Save the JSON to file.
with open('songs.json', 'w') as file:
    json.dump(songs, file)
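`json.dump` writes compact output by default. Passing `indent` makes the saved file easier to read without changing the data. The sketch below writes to a temporary directory so that it does not touch the notebook's `songs.json`. The file name `songs_demo.json` and the one-song record are assumptions for this example.

```python
import json
import os
import tempfile

# One-song record for illustration.
songs = {"songs": [{"title": "Torn", "artist": "Ava Max", "genres": ["pop", "dance"]}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "songs_demo.json")

    # dump writes to a file object; indent=2 pretty-prints the output.
    with open(path, "w") as out_file:
        json.dump(songs, out_file, indent=2)

    # Reading the file back returns an equal Python object.
    with open(path) as in_file:
        reloaded = json.load(in_file)

print(reloaded == songs)
```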

In [18]:
#Load the JSON from file into a Pandas DataFrame.
with open("songs.json", "r") as read_file:
    songs_data = json.load(read_file)
songs_df=pd.DataFrame(songs_data['songs'])
songs_df.set_index('title')

|title        |artist       |genres            |
|-------------|-------------|------------------|
|Old Town Road|Lil Nas X    |[rap, country]    |
|Torn         |Ava Max      |[pop, dance]      |
|Bad Guy      |Billie Eilish|[alternative, pop]|


## Reading XML

In [19]:
#Import ElementTree
import xml.etree.ElementTree as ET


In [20]:
#Parse the 'playlist.xml' file and get the root.
tree = ET.parse('playlist.xml')
root = tree.getroot()

In [21]:
#Print out the children.
list(root)

[<Element 'track' at 0x11c4accc8>,
 <Element 'track' at 0x11c4acef8>,
 <Element 'track' at 0x11c4b11d8>,
 <Element 'track' at 0x11c4b1458>,
 <Element 'track' at 0x11c4b16d8>,
 <Element 'track' at 0x11c4b1958>,
 <Element 'track' at 0x11c4b1c28>,
 <Element 'track' at 0x11c4b1ef8>,
 <Element 'track' at 0x11c4b21d8>,
 <Element 'track' at 0x11c4b24a8>,
 <Element 'track' at 0x11c4b2728>,
 <Element 'track' at 0x11c4b29a8>,
 <Element 'track' at 0x11c4b2c78>,
 <Element 'track' at 0x11c4b2ef8>,
 <Element 'track' at 0x11c4b41d8>,
 <Element 'track' at 0x11c4b4458>,
 <Element 'track' at 0x11c4b46d8>,
 <Element 'track' at 0x11c4b4958>,
 <Element 'track' at 0x11c4b4bd8>,
 <Element 'track' at 0x11c4b4e58>,
 <Element 'track' at 0x11c4b7138>,
 <Element 'track' at 0x11c4908b8>,
 <Element 'track' at 0x11c4b72c8>,
 <Element 'track' at 0x11c4b7548>,
 <Element 'track' at 0x11c4b77c8>,
 <Element 'track' at 0x11c4b7a48>,
 <Element 'track' at 0x11c4b7cc8>,
 <Element 'track' at 0x11c4b7f98>,
 <Element 'track' at

In [22]:
#Print out the children of one of the children.
list(root[0])

[<Element 'id' at 0x11c4acc78>,
 <Element 'title' at 0x11c4acd18>,
 <Element 'artist' at 0x11c4acd68>,
 <Element 'album' at 0x11c4acdb8>,
 <Element 'isrc' at 0x11c4ace08>,
 <Element 'addedDate' at 0x11c4ace58>,
 <Element 'addedBy' at 0x11c4acea8>]

In [23]:
#Print out the tag and text of each of the nodes of the first track.
for node in root[0]:
    print(node.tag)
    print(node.text)
    print(' ')

id
3JZctpS8BU9gWlIhp4mVMF
 
title
Night
 
artist
Jasper Byrne
 
album
Night
 
isrc
GBKQU1903669
 
addedDate
2019-09-29 00:02
 
addedBy
colortheory
 


In [24]:
#Convert the XML to a DataFrame.
data = []
#Use the tags of the first track as column names.
columns = [x.tag for x in root[0]]
#Iterating over an element yields its children.
for child in root:
    entry = []
    for node in child:
        entry.append(node.text)
    data.append(entry)

In [25]:
df = pd.DataFrame(data, columns=columns)
df.head()

| |id|title|artist|album|isrc|addedDate|addedBy|
|-|--|-----|------|-----|----|---------|-------|
|0|3JZctpS8BU9gWlIhp4mVMF|Night|Jasper Byrne|Night|GBKQU1903669|2019-09-29 00:02|colortheory|
|1|6Xh00IXFAeTkHRVMfLXcAX|Final Straw|Siamese Youth|Electric Dreams|TCAEH1964680|2019-09-29 00:03|colortheory|
|2|1oZTt7uqD3XhRLgVbz3SWn|Fires in the Snow|ZETA|Zeta|UK2D51700064|2019-09-29 00:04|colortheory|
|3|3BPQBRJn4sL6pUpBbOKXOt|Planes of Mind|Robots With Rayguns, Thought Beings|C U L T P O P|QZHN41973533|2019-09-28 03:33|colortheory|
|4|0TV0grlrGvXok97rYGh37P|Motorway (Radio Edit)|From Apes to Angels|Motorway (Radio Edit)|ushm81982193|2019-09-25 23:01|colortheory|
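The conversion above relies on every track having the same children in the same order. An alternative sketch looks each field up by tag name, so a missing child becomes `None` instead of shifting the columns. The inline XML and its `playlist` root tag are assumptions for illustration; the real structure of 'playlist.xml' may differ.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document; the second track has no <artist> element.
xml_text = """
<playlist>
    <track>
        <title>Night</title>
        <artist>Jasper Byrne</artist>
    </track>
    <track>
        <title>Final Straw</title>
    </track>
</playlist>
"""

root = ET.fromstring(xml_text)
fields = ["title", "artist"]

# findtext returns the text of the first matching child, or None if absent.
rows = [
    {field: track.findtext(field) for field in fields}
    for track in root.findall("track")
]
print(rows)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)`.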


## <center>Activity</center>
    
**Using the 'old_town_road.json' and 'playlist.xml' files and what you have learned above, do the following:**


In [None]:
#Create a DataFrame of the different segments of Old Town Road

In [None]:
#Find the longest segment and the loudest segment.

In [None]:
#How many bars are in the song?

In [None]:
#What's the average length of a bar?

In [None]:
#Extract the 7th song from the playlist in 'playlist.xml',
#compose a JSON string for the song and save it as a new JSON file.