## JSON

https://docs.python.org/3.6/library/json.html

JSON stands for JavaScript Object Notation. It came after XML and was meant to streamline many of the data transport issues of the time. It is now the common standard for data transfer on the web, and parsing packages exist for most languages, including Python's built-in `json` module.
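
As a quick illustration of what the parser hands back, here is a minimal sketch using a made-up JSON string (the field names are placeholders, not from any dataset in these notes). It shows how each JSON value type arrives in Python.

```python
import json

# A made-up JSON document covering each basic JSON value type.
text = '{"name": "Ada", "age": 36, "score": 9.5, "tags": ["a", "b"], "active": true, "nickname": null}'

parsed = json.loads(text)

# Objects become dicts, arrays become lists, numbers become int or float,
# true/false become bool, and null becomes None.
for key, value in parsed.items():
    print(key, type(value).__name__)
```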

    import json
    import pandas as pd
    # Open the file with a context manager so it is closed after loading
    with open('nyc_2001_campaign_finance.json') as f:
        data = json.load(f)
    # Root data type
    type(data)
    # Navigate to the 'data' key and find its data type
    type(data['data'])
    # Preview the first entry
    data['data'][0]
    # Preview the entry under meta -> view -> columns
    data['meta']['view']['columns']
    # Create a DataFrame from the JSON data
    df = pd.DataFrame(data['data'])
    # Column names live in the metadata, one dict per column
    cols = [i['name'] for i in data['meta']['view']['columns']]
    df.columns = cols
    df.head()
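
The same layout (column names under meta -> view -> columns, rows under data) can be tried without the file. The sketch below uses a tiny made-up document; the column names and values are placeholders, and it pairs names with rows using plain dicts rather than a DataFrame.

```python
import json

# Hypothetical miniature of the file's layout: column names live under
# meta -> view -> columns, while the rows live under data.
raw = """
{
  "meta": {"view": {"columns": [{"name": "CANDID"}, {"name": "CANDNAME"}]}},
  "data": [["A1", "Doe, Jane"], ["B2", "Roe, Richard"]]
}
"""

parsed = json.loads(raw)

# Pull the column names out of the metadata.
column_names = [col["name"] for col in parsed["meta"]["view"]["columns"]]

# Pair each row's values with the column names.
records = [dict(zip(column_names, row)) for row in parsed["data"]]
print(records[0])
```

A list of dicts like `records` can also be passed straight to `pd.DataFrame`, which picks up the column names from the keys.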
    
#### Reading a JSON schema

https://developer.nytimes.com/article_search_v2.json#/Documentation/GET/articlesearch.json

##### Loading Specific Data
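    # `data` here is assumed to be a parsed Article Search response, not the campaign finance file above
    docs = data['response']['docs']
    print(type(docs), len(docs))
    # Print the main headline of each article
    for doc in docs:
        print(doc['headline']['main'])
        print('\n')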
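
Direct indexing raises a `KeyError` if a record is missing a key. A minimal sketch of a more forgiving lookup follows, using a made-up stand-in for the response; the helper name `main_headline` is hypothetical.

```python
# Hypothetical stand-in for a parsed response; the second doc lacks a headline.
response = {
    "response": {
        "docs": [
            {"headline": {"main": "First headline"}},
            {"snippet": "No headline on this record"},
        ]
    }
}

def main_headline(doc):
    """Return the main headline, or None if either key is missing."""
    return doc.get("headline", {}).get("main")

headlines = [main_headline(doc) for doc in response["response"]["docs"]]
print(headlines)
```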

##### Breaking out nested data in a loaded JSON DataFrame
    # Assumes df was built from the article records, e.g. df = pd.DataFrame(docs)
    keys = df.headline.iloc[0].keys() # Get dictionary keys
    # Keep track of the columns we make for the subsequent preview
    new_cols = []
    # Create a new feature for each of these keys
    for key in keys:
        new_col = 'headline_{}'.format(key) # Create new column name
        df[new_col] = df.headline.map(lambda x: x[key]) # Create a new column
        new_cols.append(new_col)
    df[new_cols].head()
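
The same idea can be applied to a single record before it ever reaches a DataFrame. The sketch below is one possible approach, not the only one: `flatten` is a hypothetical helper, and the record is made up.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining key names with sep."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the prefix along.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

# Made-up record with a nested 'headline' dict.
doc = {"headline": {"main": "First headline", "kicker": None}, "word_count": 120}
flat_doc = flatten(doc)
print(flat_doc)
```

pandas also ships `pd.json_normalize`, which does a similar flattening directly into a DataFrame.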
    
##### Outputting to JSON
    with open('output.json', 'w') as f:
        json.dump(data, f)
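
To check what will be written without touching the disk, `json.dumps` returns the text as a string. A minimal sketch with a made-up record:

```python
import json

# Made-up record used only for illustration.
record = {"candidate": "Doe, Jane", "total": 1250.5, "matched": True}

# indent makes the output readable; sort_keys gives a stable key order.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)

# Round trip: parsing the text gives back an equal Python object.
restored = json.loads(text)
print(restored == record)
```

The `indent` and `sort_keys` arguments are accepted by `json.dump` as well.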

## XML

##### ElementTree for parsing XML files
https://docs.python.org/3.6/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

XML stands for 'Extensible Markup Language'. You may note the acronym's similarity to HTML (HyperText Markup Language). While HTML tells us how to display a page, XML is used to store the data and content itself. Like HTML, XML uses tags to separate and organize data in a hierarchical manner.
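
To make the hierarchy concrete, here is a minimal sketch that parses a made-up XML string (the tag, attribute, and field names are placeholders) and turns each `row` element into a dict.

```python
import xml.etree.ElementTree as ET

# Made-up document: tag, attribute, and field names are illustrative only.
xml_text = """
<response>
  <row id="1"><candname>Doe, Jane</candname><total>1250.50</total></row>
  <row id="2"><candname>Roe, Richard</candname><total>980.00</total></row>
</response>
"""

root = ET.fromstring(xml_text)

rows = []
for row in root.findall("row"):
    # Attributes are read with .get(); child elements carry a tag and text.
    record = {"id": row.get("id")}
    for field in row:
        record[field.tag] = field.text
    rows.append(record)

print(rows)
```

Note that every value comes back as a string; XML has no built-in number type, so conversion is up to the reader of the data.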

    import xml.etree.ElementTree as ET
    import pandas as pd
    # Create an XML tree and retrieve the root tag
    tree = ET.parse('nyc_2001_campaign_finance.xml')
    root = tree.getroot()
    # Count the direct descendants of the root tag
    print(len(root))
    # Count how many different types of tags there are within the entire XML file
    unique_tags = {element.tag for element in root.iter()}
    print(len(unique_tags))
    # Create a DataFrame listing the number of each type of tag
    tags = {}
    for element in root.iter():
        tags[element.tag] = tags.get(element.tag, 0) + 1
    df = pd.DataFrame.from_dict(tags, orient='index')
    df.columns = ['count']
    df = df.sort_values(by='count', ascending=False)
    df.head()
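
The tag tally can also be done with the standard library alone. This is a sketch on a made-up XML string, swapping the dict-and-DataFrame approach for `collections.Counter`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up document; tag names are illustrative only.
xml_text = (
    "<response>"
    "<row><candname>A</candname><total>1</total></row>"
    "<row><candname>B</candname></row>"
    "</response>"
)
root = ET.fromstring(xml_text)

# root.iter() walks the root and every descendant element.
tag_counts = Counter(element.tag for element in root.iter())

# most_common() lists tags from most to least frequent.
print(tag_counts.most_common())
```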