____________________
# Music 255: XML Encoding and Beautiful Soup

In this homework assignment, you will get an opportunity to engage, explore, and experiment with **XML** files through Python and **Beautiful Soup**. As mentioned in class, **XML** stands for **eXtensible Markup Language** and mainly serves to transport and store data. At its core, XML was designed to be **both machine- and human-readable**. For music files, a common encoding standard is **MEI**, which stands for **Music Encoding Initiative** and is built on top of the XML framework.


Mainly, this homework focuses on:

* **access**: downloading, storing, reading, processing XML files
* **analysis**: performing basic quantitative and qualitative analysis of XML files
* **interpretation**: exploring the meaning and utility of XML files from both analytical and creative perspectives

As we deal with these files, we will make use of Beautiful Soup, a Python library that makes interacting with XML and HTML friendly and comfortable. Or at least tries to... You can access Beautiful Soup's documentation [here](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).

XML files are governed by **markup rules**, which you can read more about [here](https://www.w3schools.com/xml/xml_syntax.asp), and consist of **elements**, which you can dive into [here](https://www.w3schools.com/xml/xml_elements.asp).

MEI files are primarily XML files that have music-specific sections, elements, and structure. You can learn more about MEI [here](https://music-encoding.org/).

----

#### Setup: Importing Python Libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import os
from bs4 import BeautifulSoup as bs
import optparse
import sys
from pathlib import Path
import requests
from lxml import etree
import re
from collections import Counter

____________________
# Part 1: Import Data

As your first and perhaps most important step, you'll have to go find some data. For the purposes of this homework assignment, we will work with *these files*, which you can access *here*. Once you've picked your file and copied its URL, use the function provided below to **import it as a string of XML text**:

In [2]:
# function to extract xml document from given url
def getXML(url):
    # send an HTTP GET request for the document at the given url
    response = requests.get(url)
    # response.text is the body of the response as a string (here, the XML source)
    return response.text
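
A slightly more defensive variant of the function above might look like the sketch below. It assumes you would rather see an error than silently parse an HTTP error page, and that a request should not hang forever. The name `fetch_xml_text` and the 30-second default are illustrative choices, not part of the assignment.

```python
import requests


def fetch_xml_text(url, timeout=30):
    """Download the document at `url` and return its body as a string."""
    # timeout (in seconds) keeps the call from waiting indefinitely on a dead server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx responses instead of returning an error page
    response.raise_for_status()
    return response.text
```

It is called the same way as `getXML`, with the URL of the file you picked as its argument.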

In the cell below, **read in your file** and store it in the `xml_document` variable:

In [None]:
# xml_document = ...

Once you've imported a file, you should be able to **convert it into a Beautiful Soup Object**:

In [32]:
soup = bs(xml_document, 'xml')
print(soup.prettify())
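
If the printout of a full file is overwhelming, it can help to try the same conversion on a tiny document first. The sketch below uses a made-up, heavily simplified MEI-like string (`toy_mei` is hypothetical and not a complete, valid MEI file) and lists the tags directly under the root.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified MEI-like document (not a complete, valid MEI file)
toy_mei = """<mei>
  <meiHead>
    <fileDesc>
      <titleStmt>
        <title>Toy Piece</title>
      </titleStmt>
    </fileDesc>
  </meiHead>
  <music>
    <measure n="1"/>
  </music>
</mei>"""

# "xml" selects Beautiful Soup's XML parser, which relies on lxml being installed
toy_soup = BeautifulSoup(toy_mei, "xml")

# The root element and its direct child tags give a first overview of the file
root = toy_soup.find("mei")
top_level = [child.name for child in root.find_all(True, recursive=False)]
print(root.name, top_level)
```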

____________________
# Part 2: Data Exploration

In this part of the homework, we ask you to **investigate your file**. The **MEI** framework tends to be easy to understand (as it is meant to be human-readable, too!) – and given some simple tools you should be able to successfully navigate your data.

#### Step 2.1: General Info

Generally, every MEI file has some information about its contents. This typically covers the encoding standard, creators and editors involved with the file, and its structure.

In [33]:
#Head / title / meiHead / work / created Date etc
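
As a rough illustration of what pulling header information can look like, here is a sketch on a made-up header. The element names (`meiHead`, `fileDesc`, `titleStmt`, `title`, `composer`) and the `meiversion` attribute follow common MEI usage, but your file may be organized differently, so treat the lookups as assumptions to check against your own data.

```python
from bs4 import BeautifulSoup

# Hypothetical header, simplified for illustration
toy_mei = """<mei meiversion="4.0.0">
  <meiHead>
    <fileDesc>
      <titleStmt>
        <title>Toy Piece</title>
        <composer>Anonymous</composer>
      </titleStmt>
    </fileDesc>
  </meiHead>
</mei>"""
toy_soup = BeautifulSoup(toy_mei, "xml")

# Attributes are read like dictionary entries; .get() returns None if one is missing
version = toy_soup.find("mei").get("meiversion")

# Narrow the search to the title statement, then look inside it
title_stmt = toy_soup.find("titleStmt")
title = title_stmt.find("title").get_text(strip=True)

# find() returns None when nothing matches, so guard before reading the text
composer_tag = title_stmt.find("composer")
composer = composer_tag.get_text(strip=True) if composer_tag else None

print(version, title, composer)
```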

#### Step 2.2: Moving Down

As you have probably noticed by now, MEI files (and XML files generally) follow a **tree-like structure**. Every element may contain other elements, which are defined recursively as its children. Intuitively, **children** are elements contained within a broader element (think of a section), which is known as their **parent**.
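
To make the parent/child idea concrete, here is a small sketch on a made-up fragment (`toy_mei` is hypothetical). One thing it tries to show is that the whitespace between tags also counts as a child node, so you may want to filter for tags.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical fragment, simplified for illustration
toy_mei = """<mei>
  <meiHead>
    <fileDesc>
      <titleStmt>
        <title>Toy Piece</title>
        <composer>Anonymous</composer>
      </titleStmt>
    </fileDesc>
  </meiHead>
</mei>"""
toy_soup = BeautifulSoup(toy_mei, "xml")

title_stmt = toy_soup.find("titleStmt")

# .children yields every direct child node, including whitespace-only strings
all_children = list(title_stmt.children)

# Keep only the element (Tag) children
tag_children = [child for child in all_children if isinstance(child, Tag)]
child_names = [child.name for child in tag_children]

# .descendants walks the whole subtree: children, grandchildren, and so on
head_descendants = [
    node.name for node in toy_soup.find("meiHead").descendants if isinstance(node, Tag)
]

print(child_names)
print(head_descendants)
```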

Use the cell below to **get all children** of the document's Title statement (soup.titleStmt):

In [None]:
# get all children here

What are some of your observations about these elements? How much do they share with the parent element? How would you go about identifying any relation between these child elements?

Use the cell below to **pick one** of the element's children and **get to the lowest available element** in the structure:

In [None]:
# get to the bottom of it!

What are some of your observations?


#### Step 2.3: Moving Up

Working the opposite way around, you should be able to get to the entire (original) document given just one of its children. This can be done rather easily using Beautiful Soup and its recursive tree navigation.
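
A minimal sketch of moving up, again on a made-up fragment: `.parent` gives the element one level up, and `.parents` iterates over every ancestor in turn. The last ancestor is the Beautiful Soup object itself, whose name is reported as `[document]`.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, simplified for illustration
toy_mei = """<mei>
  <music>
    <measure n="1">
      <note pname="c" oct="4"/>
    </measure>
  </music>
</mei>"""
toy_soup = BeautifulSoup(toy_mei, "xml")

note = toy_soup.find("note")

# One step up
direct_parent = note.parent.name

# All the way up, from the closest ancestor to the document itself
ancestor_names = [ancestor.name for ancestor in note.parents]

print(direct_parent)
print(ancestor_names)
```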

Use the cell below to **pick a low-level child** element and **move up until you reach its highest-level parent**:

In [None]:
# pick a child and move up

#### Step 2.4: Moving Sideways

Just as hierarchically related elements are called children and parents, elements **at the same depth** that share a parent are called **siblings**.
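
The sketch below illustrates sibling navigation on a made-up measure with three notes. It assumes the notes sit on separate lines, in which case the raw `.next_sibling` of a note is the whitespace between tags rather than the next note; `find_next_siblings` and `find_previous_sibling` skip over that and return tags only.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical fragment, simplified for illustration
toy_mei = """<measure n="1">
  <note pname="c" oct="4"/>
  <note pname="d" oct="4"/>
  <note pname="e" oct="4"/>
</measure>"""
toy_soup = BeautifulSoup(toy_mei, "xml")

first_note = toy_soup.find("note")

# The immediate neighbor is a whitespace string here, not a tag
raw_neighbor_is_tag = isinstance(first_note.next_sibling, Tag)

# Tag siblings that come after the first note, in document order
later_notes = first_note.find_next_siblings("note")
later_pitches = [note["pname"] for note in later_notes]

# Walking backwards from the last note to the one just before it
last_note = later_notes[-1]
previous_pitch = last_note.find_previous_sibling("note")["pname"]

print(raw_neighbor_is_tag, later_pitches, previous_pitch)
```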

Pick any element within your MEI file and output all of its siblings in the cell below:

In [None]:
# get siblings here

Did you notice anything interesting about sibling-element structure?

#### Step 2.5: Search

Finally, it is possible to **search for elements** within a document parsed with Beautiful Soup. Elements can be searched by tag name, by attribute value (such as a unique ID or a class name), or by an arbitrary condition.
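
The three kinds of search can be sketched on a made-up fragment as follows. The sketch assumes that IDs are stored in the `xml:id` attribute, as is usual in MEI, and that notes carry `pname` and `oct` attributes; check your own file before relying on either.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, simplified for illustration
toy_mei = """<mei>
  <music>
    <measure n="1" xml:id="m1">
      <note xml:id="n1" pname="c" oct="4"/>
      <note xml:id="n2" pname="e" oct="4"/>
    </measure>
    <measure n="2" xml:id="m2">
      <note xml:id="n3" pname="c" oct="5"/>
    </measure>
  </music>
</mei>"""
toy_soup = BeautifulSoup(toy_mei, "xml")

# By tag name: every <measure> element in the document
measures = toy_soup.find_all("measure")

# By attribute value: the single element whose xml:id is "n2"
by_id = toy_soup.find(attrs={"xml:id": "n2"})

# By tag name and attribute together: all notes with pitch name "c"
c_notes = toy_soup.find_all("note", attrs={"pname": "c"})

# By condition: a function that returns True for the tags you want
high_notes = toy_soup.find_all(
    lambda tag: tag.name == "note" and int(tag.get("oct", 0)) >= 5
)

print(len(measures), by_id["pname"], len(c_notes), len(high_notes))
```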

Use the cell below to **find all "measure" elements** in your document by searching by tag:

In [None]:
# find measures here

Now, get a particular **element by its ID**:

In [3]:
# find element by ID here

There is typically more than one right way to do a search. What would be an alternative way to find your element?

____________________
# Part 3: Data Analysis

In this part, we'll use some of the familiar Python-based statistics and data visualization tools to analyze your file.

To do:

    1. pitch distribution
    2. conditional measure distribution
    3. histograms / charts / etc

In [4]:
# to be implemented
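
As one possible starting point for the pitch distribution item, the sketch below counts pitch names with `collections.Counter`. It is only a sketch under assumptions: it runs on a made-up fragment and assumes each `note` element stores its pitch name in a `pname` attribute, which is common in MEI but worth verifying in your file.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical fragment, simplified for illustration
toy_mei = """<mei>
  <music>
    <measure n="1">
      <note pname="c" oct="4"/>
      <note pname="e" oct="4"/>
      <note pname="g" oct="4"/>
    </measure>
    <measure n="2">
      <note pname="c" oct="5"/>
      <note pname="c" oct="4"/>
    </measure>
  </music>
</mei>"""
toy_soup = BeautifulSoup(toy_mei, "xml")

# Count how often each pitch name occurs, skipping notes without a pname attribute
pitch_counts = Counter(
    note["pname"] for note in toy_soup.find_all("note") if note.has_attr("pname")
)

# most_common() returns (pitch name, count) pairs sorted from most to least frequent
for pitch_name, count in pitch_counts.most_common():
    print(pitch_name, count)
```

The resulting counts can be handed to pandas or matplotlib to draw a bar chart or histogram.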

____________________
# Part 4: Sandbox

In this final part of the homework, you are asked to combine some of the tools you have employed in Parts 2 and 3, as well as in the lab, in order to produce a somewhat sophisticated music analysis of your document and reflect on it.

To do:

    1. combining tools
    2. 

____________________
# Reflections (optional)

Please use this space to reflect on your experience, ask any additional questions, or suggest changes to this or other assignments.