<h1 style="text-align: center; font-size: 40px">Java Review Cont'd and Intro to XML</h1>

## Loading Libraries

Let's import all the necessary packages first!

In [2]:
import java.util.*;
import java.lang.*;

// Types that will be used for working with the parsed XML document
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

//Exceptions that can occur when parsing XML
import org.xml.sax.SAXException;
import javax.xml.parsers.ParserConfigurationException;

//The classes we will use to parse the XML files
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

//File io classes
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

## Objectives

The objectives of this worksheet are as follows:
* Introduction to the structure of the Extensible Markup Language (XML)
* The XML parsing template in Java
* Using XML data to instantiate classes

More generally, the purpose of this section of the course is to introduce the idea that programs and the data they use can be separated. XML is a commonly used data storage format, and for that reason it has been selected as one of the primary data formats we will be using in the course.


#### Using Jupyter
A few things to remind you with regard to using this Jupyter environment:
1. If the platform crashes don't worry. All of this is in the browser so just refresh and all of your changes up to your last save should still be there. For this reason we encourage you to **save often**.
2. Be sure to run cells from top to bottom.
3. If you're getting strange errors, go to: Kernel->Restart & Clear Output. From there, run cells from top to bottom.

Additionally, keep an eye out for the badges in each section; they indicate which sections are in-class activities.

<h2 style="text-align: center;">Intro to XML</h2>

### XML Structure

XML (Extensible Markup Language) is a file format used to store arbitrary information in a hierarchical fashion. This is done by using *tags* that encapsulate either information (e.g., text, numbers) or other tags. The names of these tags can be arbitrary, meaning you get to name them whatever you like!

Consider the following snippet of XML. 

```xml
<animals>
    <animal>
        <type>dog</type>
        <name>Jack</name>
        <sex>Male</sex>
        <age>10</age>
    </animal>
    <animal>
        <type>cat</type>
        <name>Midnight</name>
        <sex>Female</sex>
        <age>13</age>
    </animal>
</animals>
```

In this example we have two sets of start and end `animal` tags, each of which encapsulates other information. Within the first pair of `animal` tags are other tags that encapsulate information such as the name of the dog this record describes, along with some other information about said dog. The same goes for the cat in the second pair.

<hr style="border:1px solid gray"> </hr>


Now, let's pretend we are building a database for a vet and we want to store different types of animals, where each animal has the same structure as the example in the previous section. In the following exercise we will be:
1. Parsing an XML file.
2. Allowing the user to enter a type of animal and an attribute.
3. Returning a list containing all animals of the given type in the database that have that attribute.

### Parsing XML

There are a number of steps that use several classes but, before we dive into the code, here is the structure of what we are attempting to accomplish:
1. Create an instance of the `DocumentBuilderFactory` object.
2. Use the `DocumentBuilderFactory` to create a `DocumentBuilder`. We will be using this object to parse our XML.
3. Open the file by creating a new `File` object.
4. Use our document builder to parse the XML in the `File` object.

#### DocumentBuilderFactory

This is the step where we make our `DocumentBuilderFactory`. This is a class that will allow us to create an instance of a `DocumentBuilder`, which is the Java class that allows us to parse XML.

In [3]:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

#### DocumentBuilder

Next, we will use the `dbf` instance we created in the last step to instantiate a new `DocumentBuilder`. However, this process can throw a `ParserConfigurationException`, so this step is done using a try-catch block.

In [4]:
DocumentBuilder db = null;
try{
    db = dbf.newDocumentBuilder();
} catch(ParserConfigurationException e){
    System.out.println("Failed to configure document builder object.");
}

#### File

Next, we need to point to our file so that we can give it to the `DocumentBuilder` to be parsed. So, we create a new `File` object and instantiate it with the path to our XML file.

In [5]:
File f = new File("data/animals-small.xml");

#### Document
For our final step we use the `db` we created to parse the file. This returns a `Document` object, which is our parsed XML file. However, parsing can throw exceptions if our XML is incorrectly formatted (`SAXException`) or if the file cannot be read (`IOException`), so this step is also done in a try-catch block that catches both of those exceptions.

In [6]:
Document doc = null;
try{
    doc = db.parse(f);
} catch(SAXException | IOException e){
    System.out.println("Failed to parse document");
}
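
To see when the `SAXException` case arises without needing a file on disk, here is a minimal sketch that parses XML held in a string. The class name `ParseCheck` and the method `isWellFormed` are hypothetical names chosen for illustration; the parsing calls are the same `DocumentBuilderFactory` and `DocumentBuilder` ones used above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class ParseCheck {
    // Hypothetical helper: returns true if the text parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse() also accepts an InputStream, so we wrap the string's bytes.
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Thrown when the XML is incorrectly formatted (e.g., a missing end tag).
            return false;
        } catch (IOException | ParserConfigurationException e) {
            // Not expected for an in-memory string; surfaced as an unchecked error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<animal><name>Jack</name></animal>"));
        // The <name> tag below is never closed, so parsing fails.
        System.out.println(isWellFormed("<animal><name>Jack</animal>"));
    }
}
```

The parser may also print its own error message to standard error for the second input; that is separate from the value the method returns.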

#### Nodes

Now that we have parsed our XML file and have the structure stored in our `Document` object we move onto using the information to which we just gained access. Here are some of the objects and their functions we will be using to parse the data:

* `Node` -> This is an object representing a single piece of the XML document. For instance, a pair of animal tags and everything in between them (e.g., `<animal> ... </animal>`). To get the text in between the two tags use the `.getTextContent()` function.
* `NodeList` -> This is simply a list of nodes. We use the `.item(i)` function to get an item at a specified position in a `NodeList`.
* `Element` -> This is a kind of `Node` that adds functionality such as `.getElementsByTagName()`, which returns a `NodeList` of all elements within it that have a given tag name.

Read through the following comments to see how these functions and objects are used in practice.

In [47]:
public void printAnimalNames(NodeList animals){
    for(int i = 0; i < animals.getLength(); i++){
        
        //In order to search the substructure we must cast this to an Element.
        //This gets the single <animal> ... </animal> node at
        //position i in the NodeList.
        Element node = (Element) animals.item(i); 
        
        /* Next, we get all pairs of <name>...</name> tags in the animal
        tag we're on.*/
        NodeList nameTags = node.getElementsByTagName("name");
            
        /* Recall, our structure only has one name tag per animal, so we can be
        assured that the name is the only element in that list */
        Node nameTag = nameTags.item(0);
        
        /* Finally, let's get the text in between the name tags we just got and print it */
        String name = nameTag.getTextContent();
        System.out.println(name);
    }
}

In [49]:
NodeList animals = doc.getElementsByTagName("animal");
printAnimalNames(animals);

Jack
Midnight
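
As a complement to `getElementsByTagName`, the sketch below walks the direct children of a single element with `getChildNodes()`. One thing to be aware of is that the whitespace between tags is itself stored as text nodes, so the sketch keeps only children whose type is `Node.ELEMENT_NODE`. The class `ChildWalk` and its helpers `rootOf` and `describeChildren` are hypothetical names, and the XML is parsed from a string purely so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ChildWalk {
    // Hypothetical helper: parses XML text and returns its root element.
    static Element rootOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Collects "tag=text" for each direct child element of parent,
    // skipping the whitespace-only text nodes that sit between tags.
    static List<String> describeChildren(Element parent) {
        List<String> out = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.add(child.getNodeName() + "=" + child.getTextContent());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<animal>\n  <type>dog</type>\n  <name>Jack</name>\n</animal>";
        Element animal = rootOf(xml);
        // The raw child count includes the whitespace text nodes as well.
        System.out.println(animal.getChildNodes().getLength());
        System.out.println(describeChildren(animal));
    }
}
```

This is also why the function above casts to `Element` and searches by tag name rather than indexing into the children directly: positions in the raw child list depend on how the file is indented.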


<h2 style="text-align: center;">Parsing XML and Creating Classes</h2>

<img alt="Activity - In-Class" src="https://img.shields.io/badge/Activity-In--Class-E84A27" align="left" style="margin-right: 5px"/>
<br>
<br>

Now that we have our XML parser, let's use that parser to extract information from the file then use that information to instantiate classes. 

First we will need a class to store animal information. Let's create a class `Animal` that has attributes that correspond to each of the attributes encapsulated by the `<animal>` tags. Additionally, override the `toString()` method such that, when an instance of `Animal` is printed, the following output is produced:
```bash
<Name>:
    - type: <type>
    - age: <age>
    - sex: <sex>
```

We also need a way to compare animals if we want to sort the list. This involves adding `implements Comparable<Animal>` to the class definition. The `Comparable` interface declares the `compareTo` method, which is used for comparing objects and ultimately for sorting collections of objects. As such, we will be implementing `compareTo` so that two instances of `Animal` can be compared to determine which is greater or whether they are equal. For this class, `compareTo` makes the comparison based on the name, so that if we were to sort a `List` containing instances of `Animal` it would be sorted in alphabetical order. **I have provided the code for this, so simply read through it but don't modify it.**

We will be covering this concept along with interfaces next week.
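
To illustrate how `Comparable` drives sorting without giving away the `Animal` class, here is a minimal sketch using a different, hypothetical class `Book` that is ordered by title. The names `Book`, `SortDemo`, and `sortedTitles` are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class used only to demonstrate Comparable.
class Book implements Comparable<Book> {
    final String title;
    final int pages;

    Book(String title, int pages) {
        this.title = title;
        this.pages = pages;
    }

    // Negative if this title sorts first, positive if it sorts later, 0 if equal.
    @Override
    public int compareTo(Book other) {
        return title.compareTo(other.title);
    }
}

class SortDemo {
    // Returns the titles in the order Collections.sort() places the books.
    static List<String> sortedTitles(List<Book> books) {
        List<Book> copy = new ArrayList<>(books); // leave the caller's list untouched
        Collections.sort(copy);                   // uses Book.compareTo()
        List<String> titles = new ArrayList<>();
        for (Book b : copy) {
            titles.add(b.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Book> shelf = Arrays.asList(
                new Book("Walden", 352), new Book("Dune", 612), new Book("Emma", 474));
        System.out.println(sortedTitles(shelf)); // [Dune, Emma, Walden]
    }
}
```

`Collections.sort` never needs to know what a `Book` is; it only calls `compareTo`, which is the same mechanism that will order a list of `Animal` instances by name.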

In [None]:
class Animal implements Comparable<Animal>{
    //Add the class attributes here
    
    //Define the constructor
    Animal(/*add parameters*/){
        
    }
    
    //Override the toString method
    
    //Provided override compareTo()
    @Override
    public int compareTo(Animal other){
        return name.compareTo(other.name);
    }
}

Now that we have this class created, let's use the XML template from before to perform the following operations:
1. Create a list of animals
2. Parse the animal XML file
3. Iterate over the `NodeList` of animals and at each iteration:
    * Extract the values associated with the animals
    * Instantiate an `Animal` using those values
    * Add that instance to the list
    

In [None]:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

DocumentBuilder db = null;
try{
    db = dbf.newDocumentBuilder();
} catch(ParserConfigurationException e){
    System.out.println("Failed to configure document builder object.");
}

File f = new File("data/animals-large.xml");

Document doc = null;
try{
    doc = db.parse(f);
} catch(SAXException | IOException e){
    System.out.println("Failed to parse document");
}

In [None]:
NodeList animals = doc.getElementsByTagName("animal");
List<Animal> animalList = new ArrayList<>();

for(int i = 0; i < animals.getLength(); i++){
    Element elem = (Element) animals.item(i);
    
    String type = elem.getElementsByTagName("type").item(0).getTextContent();
    String name = elem.getElementsByTagName("name").item(0).getTextContent();
    String sex = elem.getElementsByTagName("sex").item(0).getTextContent();
    Integer age = Integer.parseInt(elem.getElementsByTagName("age").item(0).getTextContent());
    
    animalList.add(new Animal(type, name, sex, age));
}

Now that we have our list of animals, create and call a function that takes a parameter of type `List<Animal>`, iterates over said list, and prints each instance of `Animal` within.

In [None]:
public void printAnimalList(/*Add the parameter*/){
    //Fill in the body of the function
}

In [None]:
//Call the function here on the list of animals you created

<h2 style="text-align: center;">Getting User Input</h2>

If we want to get input from the user we use the `Scanner` class, instantiated as such: `Scanner sc = new Scanner(System.in);`. Once you have a scanner instantiated you can prompt the user with `System.out.print("Message here: ")` and then retrieve the user's input with one of the following methods:
* `sc.nextInt()`
* `sc.nextLine()`
* `sc.nextDouble()`
* `sc.nextFloat()`

There are even more methods for retrieving user input; however, these are the most common. Each gets the user input and attempts to convert it to the given type (e.g., `sc.nextInt()` converts input to an `int`). Below are examples of some of these that you are welcome to test out. Try putting in different types of data to observe the errors that are produced.

In [None]:
Scanner sc = new Scanner(System.in);
System.out.print("Enter an integer value: ");
Integer intValue = sc.nextInt();
System.out.println(intValue);

In [None]:
Scanner sc = new Scanner(System.in);
System.out.print("Enter a floating point: ");
Float floatValue = sc.nextFloat();
System.out.println(floatValue);

In [None]:
Scanner sc = new Scanner(System.in);
System.out.print("Enter a sequence of characters: ");
String strValue = sc.nextLine();
System.out.println(strValue);
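
When the input does not match the requested type, methods such as `nextInt()` throw an `InputMismatchException` and leave the offending token unread. The following is a minimal sketch of one way to recover from that; the class `ScannerDemo` and method `readIntOrDefault` are hypothetical names, and the scanner reads from a fixed string instead of `System.in` so the example runs without a keyboard.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class ScannerDemo {
    // Hypothetical helper: reads the next token as an int, or returns
    // fallback if that token is not a valid integer.
    static int readIntOrDefault(Scanner sc, int fallback) {
        try {
            return sc.nextInt();
        } catch (InputMismatchException e) {
            sc.next(); // discard the bad token so the next read can move on
            return fallback;
        }
    }

    public static void main(String[] args) {
        // A Scanner can wrap a String; this stands in for typed input.
        Scanner sc = new Scanner("42 cat 7");
        System.out.println(readIntOrDefault(sc, -1)); // 42
        System.out.println(readIntOrDefault(sc, -1)); // -1, because "cat" is not an int
        System.out.println(readIntOrDefault(sc, -1)); // 7
    }
}
```

An alternative with the same effect is to check `sc.hasNextInt()` before calling `sc.nextInt()`, which avoids the exception altogether.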

### Filtering Based on User Input

Now that we've reviewed the `Scanner` class, let's use user input to create methods that filter the list of animals we created earlier.

<img alt="Activity - In-Class" src="https://img.shields.io/badge/Activity-In--Class-E84A27" align="left" style="margin-right: 5px"/>  Create a method `filterAge` that takes a single parameter of type `List<Animal>` and returns a list of the same type. This function should:
1. Create a new list of type `List<Animal>`
2. Read in a single integer from the user 
3. Filter all animals with an age attribute that is greater than or equal to the number.
4. Return the list of filtered `Animal`s

In [None]:
/* Your Solution Here */
public List<Animal> filterAge(List<Animal> animals){
    //Fill in the body of the function
}

In [None]:
/* Once your function is implemented, test it here */

<img alt="Activity - In-Class" src="https://img.shields.io/badge/Activity-In--Class-E84A27" align="left" style="margin-right: 5px"/>  Create a function `filterType` that takes a single parameter of type `List<Animal>` and returns a list of the same type. This function should:
1. Create a new list of type `List<Animal>`
2. Read in a single string from the user 
3. Filter all animals that have a type that is equal to the string the user inputted
4. Return the list of filtered `Animal`s

As a reminder, to compare the contents of strings you should not use the `==` operator, since it checks whether two variables refer to the same object. Instead, use one of the following functions:
* `string1.equals(string2)`
* `string1.equalsIgnoreCase(string2)`

In this case we recommend using the latter function since we don't care about case sensitivity.
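
The difference between these comparisons can be seen in a short sketch. The class `StringCompareDemo` and helper `sameType` are hypothetical names used only for this illustration; `new String("dog")` is used to force a separate object, similar to what you get when a string is read from user input.

```java
class StringCompareDemo {
    // Hypothetical helper: compares two type names, ignoring case.
    static boolean sameType(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String stored = "dog";
        String typed = new String("dog"); // same characters, different object

        System.out.println(typed == stored);      // false: compares references
        System.out.println(typed.equals(stored)); // true: compares contents
        System.out.println(sameType("Dog", stored)); // true: case is ignored
    }
}
```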

In [None]:
/* Your Solution Here */

In [None]:
/* Once your function is implemented, test it with this cell */