# <center>Big Data &ndash; Exercises &ndash; Solution</center>
## <center>Fall 2024 &ndash; Week 4 &ndash; ETH Zurich</center>

# Introduction and setup
This exercise will cover XML and JSON well-formedness.

For the next few weeks you will be using [oXygen 26.0](https://www.oxygenxml.com/xml_editor/download_oxygenxml_editor.html), an XML/JSON development IDE. Before starting, make sure oXygen is installed and working on your computer. You can download the required licence from the [ETH IT shop](https://itshop.ethz.ch/EndUser/Items/Home):

1. Login with your ETH credentials, click on **+ CREATE REQUEST** in the top right, select **Software and Business Applications** and go to **Software & Licenses** > **Order Software Product**.

2. Look for "oxygen" and select the version that fits your local setup.

3. Click **Next step** at the bottom, and accept the terms of services.

4. Wait for the confirmation email (it should arrive within a couple of minutes). Download the __license file__, then download the software from the [official website](https://www.oxygenxml.com/xml_editor/software_archive_editor.html) and proceed with the installation. At some point the installer should ask you for the __license file__.

5. Alternatively, after downloading the installer, open a shell and `cd` to the directory where you saved it.

- At the prompt type:
```
sh ./oxygen-64bit-openjdk.sh
```
- Copy the license key (License Key String) provided in the instructions from step 4 and paste it into the application's license registration dialog box.

*Another option is to follow the instructions on the IT shop page and use the server address information that applies to your operating system.*

# 1. JSON 

## 1.1 Well-formedness
Correct the following JSON documents to be well-formed. Try first to "parse" them in your mind manually, then use oXygen to check your solutions.

### 1.1.1 Document A

```
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  age: 25,
  "isRetired",
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100",
    'is verified' : "true"
  }
  'phoneNumbers': [
    {
      "type": [["home"]],
      "@number": "212 555-1234"
    },
    {
      "type": [["office"]],
      "@number": "646 555-4567"
    },
    {
      "type": [["mobile"[],
      "@number": "123 456-7890"
    }
  ],
  "children": [],
  "settings": {},
  "spouse": Null,
  "": ""
}
```

**Solution**
1. `age` key must be double quoted.
2. `isRetired` must have a value.
3. `is verified` and `phoneNumbers` must be double quoted (single quotes are not allowed in JSON).
4. `address` object must be followed by a comma.
5. The nested array in the `type` attribute of the last `phoneNumbers` is incorrectly balanced (`[["mobile"[]`).
6. `Null` is not a valid value (`null` is valid).

*Best practices:*
- Whitespace and non-ASCII characters are allowed in key names, but not recommended.
- Mixing proper Boolean values with strings used as Booleans (i.e. `"true"`) is considered bad practice.

Corrected document:

```json
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "isRetired": false,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100",
    "isVerified" : true
  },
  "phoneNumbers": [
    {
      "type": [["home"]],
      "@number": "212 555-1234"
    },
    {
      "type": [["office"]],
      "@number": "646 555-4567"
    },
    {
      "type": [["mobile"]],
      "@number": "123 456-7890"
    }
  ],
  "children": [],
  "settings": {},
  "spouse": null,
  "": ""
}
```

### 1.1.2 Document B

```
[
    1: {
      "name": 'John'
      "lastname": 'Smith',
      "account": "jsmith"
      "phonenumbers" [{
           "type": "home",
           "1phone": 212-3242,
           "2phone": "545-4568"
       }]
    },
    2: {
      "name": "Jane"
      "lastname": 'Doe',
      "account": "jdoe"
      "phonenumbers" [
      {
           "type": "home",
           "phone": "8989 7685"
      },
      "phone": "545-4568"
      ],
      "account": "janedoe"
    }
]
```

**Solution:**
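1. The content consists of key-value pairs, so it must be enclosed in an object (`{ }`), not in an array (`[ ]`). A JSON document may start with `[`, but then it can only contain a list of values.
2. All strings must be double quoted. This includes the keys `1` and `2`, since JSON keys must be strings.
3. Commas are missing after `"John"`, `"jsmith"`, `"Jane"` and `"jdoe"`.
4. `:` is missing after both occurrences of `"phonenumbers"`.
5. `212-3242` is not a valid number; to include the dash it must be a string.
6. `"phone": "545-4568"` cannot be an element of an array; it has to be part of an object (inside `{ }`).
7. The key `account` is duplicated in the second element.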

Corrected document:

```json
{
    "1": {
      "name": "John",
      "lastname": "Smith",
       "account": "jsmith",
       "phonenumbers": [{
           "type": "home",
           "1phone": "212-3242",
           "2phone": "545-4568"
       }]
    },
    "2": {
      "name": "Jane",
      "lastname": "Doe",
       "account": "jdoe",
       "phonenumbers": [
          {
              "type": "home",
              "phone": "8989 7685"
          },
          {
            "phone": "545-4568"
          }
       ]
    }   
}
```
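The checks above can also be automated outside oXygen. The following is a minimal sketch using Python's standard `json` module; the helper names `reject_duplicates` and `check` are ours, not part of any library. Note that `json.loads` by itself silently keeps the last value of a duplicated key, so the sketch adds a hook to report duplicates explicitly.

```python
import json


def reject_duplicates(pairs):
    """Hook called for every JSON object with its (key, value) pairs."""
    keys = [key for key, _ in pairs]
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates}")
    return dict(pairs)


def check(text):
    """Return "ok" if text parses as JSON without duplicate keys."""
    try:
        json.loads(text, object_pairs_hook=reject_duplicates)
        return "ok"
    except ValueError as error:  # json.JSONDecodeError is a ValueError
        return f"error: {error}"


print(check('{"age": 25}'))                   # well-formed
print(check("{'is verified': true}"))         # single-quoted key
print(check('{"spouse": Null}'))              # wrong capitalisation
print(check('{"1phone": 212-3242}'))          # not a valid number
print(check('{"account": 1, "account": 2}'))  # duplicate key
```

Only the first call should report `ok`; the others each reproduce one of the mistakes from Documents A and B.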

## 1.2 JSON Key Names
Which of the following are well-formed JSON key names? 
1. `""`
1. `"123456"`
1. `"abcd"`
1. `"\"`
1. `"\\"`
1. `"""`
1. `"'"`

**Solution**

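1, 2, 3, 5, 7 are valid key names. The only restriction the JSON syntax imposes on key names is that `"` and `\` must be escaped (control characters must be escaped too, as in any JSON string). In 4, the backslash escapes the closing quote, so the string never ends; in 6, the quote in the middle is not escaped.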
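As a quick cross-check, each candidate can be placed in a one-pair object and handed to a JSON parser. This is a small sketch with Python's standard `json` module; `is_valid_key` is a helper name chosen here for illustration.

```python
import json

# The seven candidates, written exactly as they appear in the exercise.
CANDIDATES = [
    r'""',
    r'"123456"',
    r'"abcd"',
    r'"\"',
    r'"\\"',
    r'"""',
    '"\'"',
]


def is_valid_key(candidate):
    """Try to parse an object that uses the candidate as its only key."""
    try:
        json.loads("{" + candidate + ": 0}")
        return True
    except json.JSONDecodeError:
        return False


valid = [number for number, candidate in enumerate(CANDIDATES, start=1)
         if is_valid_key(candidate)]
print(valid)
```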

# 2. XML
## 2.1 Well-formedness
Correct the following XML documents to be well-formed! Just as with the JSON documents from the last exercise, first try to solve the problems without software, and then check.

### 2.1.1 Document A

```
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE catalog>
<catalog>
    <!-- Start book list --to be defined -->
   <Book id=`bk101`>
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95€</price>
      <publish_date version='hard' version='soft'>2000-10-01</publish_date>
      <_description lang=en>An `in-depth look` at creating applications 
      with XML <for dummies>.</_description>
      <xml_parse>true</xml_parse>
   </book>
</>
```

**Solution**

Document A has the following problems:
1. Comments `<!-- -->` cannot include the characters `--`;
2. Quotes in XML must always be single quotes (`'`) or double quotes (`"`), never backticks or "Word-style" quotes (〝, 〞, \`, etc.);
3. Attribute `version` in `publish_date` is duplicated, which is forbidden;
4. The value of the `lang` attribute must be quoted;
5. `<` must be escaped in text. It is also recommended to use `&gt;` for the `>` symbol;
6. The `Book` start tag does not match the `book` end tag (names are case-sensitive);
7. The `catalog` element is not closed correctly (`</>` is not a valid end tag);
8. XML names beginning with `xml` are reserved by the W3C. Their usage should be avoided (unless it is as specified by the W3C, e.g. `xml:space`, `xml:lang`, `xmlns`...). **oXygen does not report this as an error, in order to stay future-compatible, but it is still considered an error**.

Here is the corrected document:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE catalog [
<!ENTITY cright "&#169;">
]>
<catalog>
    <!-- Start book list - -to be defined -->
   <Book id='bk101'>
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95€</price>
      <publish_date version='hard' version2='soft'>2000-10-01</publish_date>
      <_description lang='en'>An `in-depth look` at creating applications 
      with XML &lt;for dummies&gt;.</_description>
      <parse>true</parse>
   </Book>
</catalog>
```

### 2.1.2 Document B

```
<?xml version="1.0" encoding="utf-16"?>
<h:library xmlns:xdc="http://www.xml.com/books" xmlns:h="http://xml.com/library">
    <head><h:title>Book Review</title></head>
    <body/>
        <_xdc:bookreview>
            <xdc:title>XML: A Primer</xdc:title>
            <_table _style='container'>
                <h:tr align="#center">
                    <h:td>Author<h:span>St. Laurent & Tom Faron</h:td></h:span>
                </h:tr>
                <h:tr align="#left">
                    <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
                    <h:td><xdc:price>31.98</xdc:price></h:td>
                    <h:td><xdc:#pages>352</xdc:#pages></h:td>
                    <h:td><xdc:_date>1998/01</xdc:_date></h:td>
                    <h:td><xdc:-comment>Love it</xdc:-comment></h:td>
                </h:tr>
            </_table>
        </_xdc:bookreview>
    </body>
</h:library>
```

**Solution**

Document B has the following problems:
1. The `<h:title>` opening tag does not match the closing tag `</title>`;
1. In `<_xdc:bookreview>` the namespace prefix `_xdc` is not declared;
1. The `&` in the author text field must be escaped;
1. The `<h:span>` element containing the author name must be closed before its parent is closed;
1. `<xdc:#pages>` is not a valid tag name;
1. `<xdc:-comment>` is not a valid tag name;
1. `body` uses an empty-element tag (`<body/>`) where an opening tag is required.

Here is the corrected document:

```xml
<?xml version="1.0" encoding="utf-16"?>
<h:library xmlns:xdc="http://www.xml.com/books" xmlns:h="http://xml.com/library">
    <head><h:title>Book Review</h:title></head>
    <body>
    <xdc:bookreview>
        <xdc:title>XML: A Primer</xdc:title>
        <_table _style='container'>
            <h:tr align="#center">
                <h:td>Author<h:span>St. Laurent &amp; Tom Faron</h:span></h:td>
            </h:tr>
            <h:tr align="#left">
                <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
                <h:td><xdc:price>31.98</xdc:price></h:td>
                <h:td><xdc:pages>352</xdc:pages></h:td>
                <h:td><xdc:_date>1998/01</xdc:_date></h:td>
                <h:td><xdc:comment>Love it</xdc:comment></h:td>
            </h:tr>
        </_table>
    </xdc:bookreview>
    </body>
</h:library>
```

## 2.2 XML Names
Which of the following are well-formed XML tags (i.e. which tags contain a conforming XML name)? 
1. `<_bar/>`
1. `<123foo/>`
1. `<Foo/>`
1. `<foo 123>`
1. `<foo_123/>`
1. `<foo#123/>`
1. `<foo-123/>`
1. `<foo.123/>`
1. `<XmL_123/>`

**Solution**

1, 3, 5, 7, 8 are valid names. Remember:
1. Element names are case-sensitive.
1. Element names must start with a letter or underscore.
1. Element names cannot start with the letters xml (or XML, or Xml, etc).
1. Element names can contain letters, digits, hyphens, underscores, and periods.
1. Element names cannot contain spaces.
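The rules above can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` parser to test well-formedness and then applies the reserved-prefix rule by hand, because a parser does not necessarily reject names that start with `xml`. The function name `is_acceptable_tag` is ours.

```python
import xml.etree.ElementTree as ET

TAGS = [
    "<_bar/>", "<123foo/>", "<Foo/>", "<foo 123>", "<foo_123/>",
    "<foo#123/>", "<foo-123/>", "<foo.123/>", "<XmL_123/>",
]


def is_acceptable_tag(tag):
    """True if the tag parses as a document and its name is not reserved."""
    try:
        root = ET.fromstring(tag)
    except ET.ParseError:
        return False
    # Names starting with "xml" in any capitalisation are reserved.
    return not root.tag.lower().startswith("xml")


acceptable = [number for number, tag in enumerate(TAGS, start=1)
              if is_acceptable_tag(tag)]
print(acceptable)
```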

## 2.3 Predefined entities
XML has only 5 predefined entities. Connect each escape code to the corresponding value.
1. `&lt;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     >
1. `&amp;`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;           "
1. `&gt;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     '
1. `&quot;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                           &
1. `&apos;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                           <

Which characters must always be escaped?

**Solution:**
1. `&lt;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     <
1. `&amp;`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;           &
1. `&gt;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     >
1. `&quot;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                           "
1. `&apos;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                           '

& and < must always be escaped.
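In practice, escaping is rarely done by hand. As a small illustration, Python's standard `xml.sax.saxutils` module provides `escape` and `unescape`; by default `escape` handles `&`, `<` and `>`, and quotes have to be requested through an extra mapping (called `QUOTES` here, a name chosen for this sketch).

```python
from xml.sax.saxutils import escape, unescape

# Extra replacements, needed only inside attribute values.
QUOTES = {'"': "&quot;", "'": "&apos;"}

text = 'St. Laurent & Tom Faron say "1 < 2"'

escaped_text = escape(text)          # enough for element content
escaped_attr = escape(text, QUOTES)  # also safe inside quoted attributes

# unescape needs the inverse mapping to restore the quotes.
restored = unescape(escaped_attr, {v: k for k, v in QUOTES.items()})

print(escaped_text)
print(escaped_attr)
print(restored == text)
```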

# 3. Conversion from a relational database

Messages from conversations between users on some online forum are currently stored in the following table. Take the table, and translate the data into JSON and XML!

*Hint: Try not just to go row by row, but use nesting to keep the same key-value pairs from appearing more often than necessary. Mind that you might need to escape characters, and that you have to handle null values in one way or another.*

|conversation_id | people | sender | content | timestamp | is_read | attachment_id|
|----------------|--------|--------|---------|-----------|---------|--------------|
|42|charlie,ari,jesse|charlie|hey, here's the doc ><|1510410193|TRUE|NULL|
|42|charlie,ari,jesse|charlie|NULL|1510410244|TRUE|doc_6492|
|42|charlie,ari,jesse|ari|thanks! \o/|1510432987|FALSE|NULL|
|17|rudy,sage|rudy|look at this cute "bat-cat"! 😻|1500897189|TRUE|img_91847|
|17|rudy,sage|NULL|aww ♥|1506610190|TRUE|NULL| 

## 3.1 JSON 

**Solution**

There are, of course, many possible solutions. Here is one of them:

```json
[
    {
        "conversation_id": 42,
        "people": ["charlie", "ari", "jesse"],
        "messages": [
            {
                "sender": "charlie",
                "content": "hey, here's the doc ><",
                "timestamp": 1510410193,
                "is_read": true
            },
            {
                "sender": "charlie",
                "timestamp": 1510410244,
                "is_read": true,
                "attachment_id": "doc_6492"
            },
            {
                "sender": "ari",
                "content": "thanks! \\o/",
                "timestamp": 1510432987,
                "is_read": false
            }
        ]
    },
    {
        "conversation_id": 17,
        "people": ["rudy", "sage"],
        "messages": [
            {
                "sender": "rudy",
                "content": "look at this cute \"bat-cat\"! 😻",
                "timestamp": 1500897189,
                "is_read": true,
                "attachment_id": "img_91847"
            },
            {
                "content": "aww ♥",
                "timestamp": 1506610190,
                "is_read": true
            }
        ]
    }
]

```

Observe that for NULL values, we can simply skip the key corresponding to the field. However, we could have included those keys and set their values to `null` instead. It is a matter of which convention you are following (or of personal preference).
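For larger tables one would not write the nested document by hand. The following sketch shows one possible way to do the grouping in Python, assuming the rows are already available as tuples in the column order of the table (with `None` standing in for NULL); the names `ROWS`, `MESSAGE_FIELDS` and `nest` are ours. Escaping of `\` and `"` is left to `json.dumps`.

```python
import json

ROWS = [
    (42, "charlie,ari,jesse", "charlie", "hey, here's the doc ><", 1510410193, True, None),
    (42, "charlie,ari,jesse", "charlie", None, 1510410244, True, "doc_6492"),
    (42, "charlie,ari,jesse", "ari", "thanks! \\o/", 1510432987, False, None),
    (17, "rudy,sage", "rudy", 'look at this cute "bat-cat"! 😻', 1500897189, True, "img_91847"),
    (17, "rudy,sage", None, "aww ♥", 1506610190, True, None),
]
MESSAGE_FIELDS = ("sender", "content", "timestamp", "is_read", "attachment_id")


def nest(rows):
    """Group rows by conversation and drop NULL fields from each message."""
    conversations = {}
    for conversation_id, people, *message in rows:
        conversation = conversations.setdefault(conversation_id, {
            "conversation_id": conversation_id,
            "people": people.split(","),
            "messages": [],
        })
        conversation["messages"].append(
            {key: value for key, value in zip(MESSAGE_FIELDS, message)
             if value is not None}
        )
    return list(conversations.values())


document = json.dumps(nest(ROWS), indent=4, ensure_ascii=False)
print(document)
```

Replacing the `if value is not None` filter with nothing would yield the other convention, in which missing fields appear explicitly as `null`.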

## 3.2 XML 

**Solution:**

Here is a well-formed solution, but as in the previous case, there are multiple possibilities. In practice, we would usually also define a schema that specifies the type of each value, transforming timestamps into `xs:dateTime`, etc., but we omitted this here for brevity.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<conversations>
    <conversation id="42">
        <people>
            <person>charlie</person>
            <person>ari</person>
            <person>jesse</person>
        </people>
        <messages>
            <message>
                <sender>charlie</sender>
                <content>hey, here's the doc &gt;&lt;</content>
                <timestamp>1510410193</timestamp>
                <is_read>true</is_read>
            </message>
            <message>
                <sender>charlie</sender>
                <timestamp>1510410244</timestamp>
                <is_read>true</is_read>
                <attachment_id>doc_6492</attachment_id>
            </message>
            <message>
                <sender>ari</sender>
                <content>thanks! \o/</content>
                <timestamp>1510432987</timestamp>
                <is_read>false</is_read>
            </message>
        </messages>
    </conversation>
    <conversation id="17">
        <people>
            <person>rudy</person>
            <person>sage</person>
        </people>
        <messages>
            <message>
                <sender>rudy</sender>
                <content>look at this cute &quot;bat-cat&quot;! 😻</content>
                <timestamp>1500897189</timestamp>
                <is_read>true</is_read>
                <attachment_id>img_91847</attachment_id>
            </message>
            <message>
                <content>aww ♥</content>
                <timestamp>1506610190</timestamp>
                <is_read>true</is_read>
            </message>
        </messages>
    </conversation>
</conversations>
```

In the case of XML, there are also several (some better, some worse) ways to represent missing values:
1. by not including the tag at all
2. by including an empty tag
3. by including the tag and setting `xsi:nil="true"` as an attribute.
  
In most cases however, it is important to not have tags for NULL values, otherwise the value is interpreted as the empty string, so it is a better practice to avoid (2). Here we just did not include the tags (1).

# 4. More on XML

## 4.1 XML Namespaces

1. Is the following XML file well-formed?
2. What are the namespaces of each attribute and each element?
3. What's wrong with this file? Fix it so it is well-formed, follows best practices, and each element uses the correct namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<foo
xmlns="http://xmlrepo.test/foo.xml"
xmlns:foo="http://xmlrepo.test/foo.xml"
xmlns:math="http://xmlrepo.test/math.xml">
    <bar:baz xmlns:bar="http://xmlrepo.test/bar.xml" bar:attr="some attribute" lalala="some other attribute">
        <svg xmlns="http://xmlrepo.test/svg.xml">
            <textbox>
                <math:msup>42</math:msup>
                <foo:plus/>
                <math:msub>17</math:msub>
            </textbox>
            <foo_value id="748">some value</foo_value>
        </svg>
        <svg xmlns:svg="http://xmlrepo.test/svg.xml">
            <svg:textbox>
                <math:msup>42</math:msup>
                <foo:plus/>
                <msub>17</msub>
            </svg:textbox>
            <bar_value id="867">some other value</bar_value>
        </svg>
        <math:othermath/>
    </bar:baz>
</foo>

```

**Solution**

1. The document is weird and is full of "obvious" mistakes (see answer to question 3), but it is technically well-formed.
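2. 
  - The root element `foo` is in the namespace `http://xmlrepo.test/foo.xml`, which is bound both as the default namespace and to the prefix `foo`. 
  - Prefixed elements (`bar:baz`, `math:msup`, `math:msub`, `math:othermath`, `foo:plus`, `svg:textbox`) are in the namespace bound to their prefix. 
  - The first `<svg>` element redeclares the default namespace, so it and its unprefixed descendants `textbox` and `foo_value` are in the `svg` namespace (`http://xmlrepo.test/svg.xml`). 
  - The second `<svg>` element only binds the prefix `svg`, so it and its unprefixed descendants `msub` and `bar_value` stay in the default `foo` namespace. 
  - Prefixed attributes are in the namespace bound to their prefix. 
  - Unprefixed attributes (`lalala` and `id`) are in no namespace.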
3. Let's declare everything in the root, not using any default namespace at all (although leaving `foo` as the default namespace could be reasonable), and prefix everything.

  **Note:** It is considered best practice to have everything inside a namespace, including the attributes. However, `lalala` and `id` were originally in no namespace, so we need to decide where to move them (in this example we put `lalala` in `bar`, and `id` in `foo`).   

```xml
<?xml version="1.0" encoding="UTF-8"?>
<foo:foo
    xmlns:foo="http://xmlrepo.test/foo.xml"
    xmlns:bar="http://xmlrepo.test/bar.xml"
    xmlns:svg="http://xmlrepo.test/svg.xml"
    xmlns:math="http://xmlrepo.test/math.xml">
    <bar:baz bar:attr="some attribute" bar:lalala="some other attribute">
        <svg:svg>
            <svg:textbox>
                <math:msup>42</math:msup>
                <foo:plus/>
                <math:msub>17</math:msub>
            </svg:textbox>
            <foo:foo_value foo:id="748">some value</foo:foo_value>
        </svg:svg>
        <svg:svg>
            <svg:textbox>
                <math:msup>42</math:msup>
                <foo:plus/>
                <foo:msub>17</foo:msub>
            </svg:textbox>
            <foo:bar_value foo:id="867">some other value</foo:bar_value>
        </svg:svg>
        <math:othermath/>
    </bar:baz>
</foo:foo>
```
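To see how a namespace-aware parser resolves default and prefixed declarations, here is a small sketch with Python's standard `xml.etree.ElementTree`, which reports element names in the form `{namespace-uri}local-name`. The document `SAMPLE` is a shortened variant of the exercise file written for this illustration, and `split_name` is our own helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<foo xmlns="http://xmlrepo.test/foo.xml"
     xmlns:math="http://xmlrepo.test/math.xml">
    <svg xmlns="http://xmlrepo.test/svg.xml">
        <textbox><math:msub>17</math:msub></textbox>
    </svg>
    <svg xmlns:svg="http://xmlrepo.test/svg.xml">
        <svg:textbox><msub id="1">17</msub></svg:textbox>
    </svg>
</foo>"""


def split_name(tag):
    """Split ElementTree's '{uri}local' form into (local, uri)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return local, uri
    return tag, None  # no namespace


root = ET.fromstring(SAMPLE)
names = [split_name(element.tag) for element in root.iter()]
for local, uri in names:
    print(f"{local:8} {uri}")

# Unprefixed attributes never inherit the default namespace.
attribute_names = [split_name(key)
                   for element in root.iter() for key in element.attrib]
print(attribute_names)
```

The two `svg` elements end up in different namespaces: the first redeclares the default namespace, whereas the second merely binds a prefix that it does not use itself.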

## 4.2 HTML vs XHTML (Optional)
As mentioned during the lectures, HTML has a similar structure to XML but is more lenient, meaning that valid HTML documents are not necessarily well-formed XML documents. XHTML is a variant of HTML that conforms to the XML standard. 
1. Is the following correct XML? 
2. Is it correct XHTML? 
3. Is it correct HTML?

```html
<html>
  <head>
    <title>Untitled</title>
  </head>
  Dear jane <br>
  <p>You are invited at the weekly meeting
  <p>Yours sincerely, <br>
  John
</html>
```

**Solution**

(1) This will be shown correctly in most browsers. However, it is not well-formed XML: the `br` and `p` tags are not closed. The following would be well-formed XML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html>
    <head>
        <title>Untitled</title>
    </head>
    <body>
        Dear jane <br/>
        <p>You are invited at the weekly meeting</p>
        <p>Yours sincerely, <br/>
            John</p>
    </body>
</html>
```

(2) XHTML is more than just well-formed XML: the document also has to follow a certain structure (a document that does so is called "valid"). Among other things, the tags have to live in the XHTML namespace:

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Untitled</title>
    </head>
    <body>
        <div>Dear jane 
            <p>You are invited at the weekly meeting</p>
            <p>Yours sincerely, <br/>
                John</p>
        </div>
    </body>
</html>
```

(3) To decide whether this is correct HTML, we first have to pick an HTML version. With HTML5, you would need to add `<!DOCTYPE html>` at the beginning of the file, and most validators will complain if you don't add a charset declaration (like `<meta charset="utf-8">`). With these additions the file becomes valid, since closing the `<p>` tags is optional in HTML5. It is good practice to close them, though, for consistency and predictability when using CSS.

# 5. From XML to JSON - back to the REST API request result from previous exercise sessions.
In this exercise you are asked to translate the following XML document into a JSON document. A REST API call to Azure Blob Storage resulted in an XML file. Below you can find it (with some elements removed for simplicity and a second fake blob added to the response). Transform it to a JSON file!
```xml 
<EnumerationResults ContainerName="https://melaniestorage.blob.core.windows.net/exercise02">
    <Blobs>
        <Blob>
            <Name>picture</Name>
            <Url>https://melaniestorage.blob.core.windows.net/exercise02/picture</Url>
            <Properties>
                <Last-Modified>Wed, 03 Oct 2018 07:22:16 GMT</Last-Modified>
                <Content-Length>136356</Content-Length>
                <Content-Encoding />
                <BlobType>BlockBlob</BlobType>
            </Properties>
        </Blob>
        <Blob>
            <Name>music</Name>
            <Url>https://melaniestorage.blob.core.windows.net/exercise02/music</Url>
            <Properties>
                <Last-Modified>Wed, 03 Oct 2018 07:23:16 GMT</Last-Modified>
                <Content-Length>222222</Content-Length>
                <Content-Encoding />
                <BlobType>BlockBlob</BlobType>
            </Properties>
        </Blob>
    </Blobs>
</EnumerationResults>
```

**Solution**

```json
{
    "EnumerationResults": {
        "ContainerName": "https://melaniestorage.blob.core.windows.net/exercise02",
        "Blobs": [
            {
                "Blob": {
                    "Name": "picture",
                    "Url": "https://melaniestorage.blob.core.windows.net/exercise02/picture",
                    "Properties": {
                        "Last-Modified": "Wed, 03 Oct 2018 07:22:16 GMT",
                        "Content-Length": 136356,
                        "Content-Encoding": null,
                        "BlobType": "BlockBlob"
                    }
                }
            },
            {
                "Blob": {
                    "Name": "music",
                    "Url": "https://melaniestorage.blob.core.windows.net/exercise02/music",
                    "Properties": {
                        "Last-Modified": "Wed, 03 Oct 2018 07:23:16 GMT",
                        "Content-Length": 222222,
                        "Content-Encoding": null,
                        "BlobType": "BlockBlob"
                    }
                }
            }
        ]
    }
}
```
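Such a translation can also be scripted. The sketch below is one possible generic converter built on Python's standard `xml.etree.ElementTree`; it follows a slightly different convention from the hand-written solution above: repeated child elements are collected into a list under their shared tag name, all text values stay strings, and empty elements become `null`. Mixed content is ignored, and an attribute that shares its name with a child element would be merged with it, so treat it as an illustration rather than a complete tool. `SAMPLE` is a shortened stand-in for the API response.

```python
import json
import xml.etree.ElementTree as ET

SAMPLE = """<EnumerationResults ContainerName="exercise02">
    <Blobs>
        <Blob>
            <Name>picture</Name>
            <Properties><Content-Encoding /></Properties>
        </Blob>
        <Blob>
            <Name>music</Name>
        </Blob>
    </Blobs>
</EnumerationResults>"""


def element_to_json(element):
    """Convert an element to a str, None, or dict (recursively)."""
    children = list(element)
    if not children and not element.attrib:
        text = (element.text or "").strip()
        return text if text else None
    result = dict(element.attrib)  # attributes become ordinary keys
    for child in children:
        value = element_to_json(child)
        if child.tag in result:
            # Second occurrence of a tag: switch to a list of values.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


root = ET.fromstring(SAMPLE)
converted = {root.tag: element_to_json(root)}
print(json.dumps(converted, indent=4))
```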

# 6. JSON to XML - exploring an open API
In this exercise you can use any open API that answers with JSON. One such API is [the Star Wars API](https://swapi.dev/). Below you can find a (slightly modified) example of the response to the request https://swapi.dev/api/people/1/. Transform it (or, if you prefer to make your own call, the response you get) to XML!

```json
{
  "name": "Luke Skywalker",
  "height": "172",
  "mass": "77",
  "homeworld": "http://swapi.dev/api/planets/1/",
  "films": [
    "http://swapi.dev/api/films/1/",
    "http://swapi.dev/api/films/2/",
    "http://swapi.dev/api/films/3/",
    "http://swapi.dev/api/films/6/"
  ],
  "starships": [],
  "vehicles": [
    "http://swapi.dev/api/vehicles/14/",
    "http://swapi.dev/api/vehicles/30/"
  ]
}
```

**Solution:** 

```xml
<?xml version="1.0" encoding="UTF-8"?>
<document>
    <name>Luke Skywalker</name>
    <height>172</height>
    <mass>77</mass>
    <homeworld>http://swapi.dev/api/planets/1/</homeworld>
    <films>
        <film>http://swapi.dev/api/films/1/</film>
        <film>http://swapi.dev/api/films/2/</film>
        <film>http://swapi.dev/api/films/3/</film>
        <film>http://swapi.dev/api/films/6/</film>
    </films>
    <starships/>
    <vehicles>
        <vehicle>http://swapi.dev/api/vehicles/14/</vehicle>
        <vehicle>http://swapi.dev/api/vehicles/30/</vehicle>
    </vehicles>
</document>
```
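The same mapping can be sketched in code. The example below uses Python's standard `json` and `xml.etree.ElementTree` modules; the function `json_to_xml` and the naming rule for array items (drop a trailing `s` from the key, otherwise use `item`) are assumptions made for this illustration. It also assumes that every key is a valid XML name, which JSON does not guarantee. An empty array becomes an element without children.

```python
import json
import xml.etree.ElementTree as ET

SAMPLE = """{
  "name": "Luke Skywalker",
  "height": "172",
  "films": [
    "http://swapi.dev/api/films/1/",
    "http://swapi.dev/api/films/2/"
  ],
  "starships": []
}"""


def json_to_xml(name, value):
    """Build an element called `name` from a parsed JSON value."""
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, item in value.items():
            element.append(json_to_xml(key, item))
    elif isinstance(value, list):
        # Assumed naming rule: "films" -> "film", otherwise "item".
        child_name = name[:-1] if name.endswith("s") else "item"
        for item in value:
            element.append(json_to_xml(child_name, item))
    elif isinstance(value, bool):
        element.text = "true" if value else "false"
    elif value is not None:
        element.text = str(value)
    return element


root = json_to_xml("document", json.loads(SAMPLE))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```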

# 7. XML vs CSV - the limits of tables for heterogeneous data
If your document consists of a collection of heterogeneous objects with different attributes, XML/JSON turns out to be more suited than a comma-separated format to store the data. In this exercise we want to show that denormalization is a good idea in this setting. 

You are given the following XML document representing a collection of products available in an online shop selling all kinds of products. Note that in this product catalog, each product has different attributes. 
```xml
<productscatalog>
    <product>
        <id> 1 </id>
        <category> BBQ </category>
        <type> Gas </type>
        <height> 120cm </height>
    </product>
    <product>
        <id> 2 </id>
        <category> notebook </category>
        <brand> Apple </brand>
        <specs>
             <RAM> 16Gb </RAM>
            <storage> 128Gb </storage>
        </specs>
    </product>
    <product>
        <id> 3 </id>
        <category> shoes </category>
        <size> 39 </size>
        <model> Heels </model>
    </product>
</productscatalog>
```    

**1. Turn this data into a CSV file (i.e. into a table)!**

**Solution:**

```
id,category,type,height,brand,specs:RAM,specs:storage,size,model
1,BBQ,Gas,120cm,,,,,
2,notebook,,,Apple,16Gb,128Gb,,
3,shoes,,,,,,39,Heels
```

This solution is not unique, however; you could, for example, also store the data in the following way:

```
id,AttributeName,AttributeValue
1,category,BBQ
1,type,Gas
1,height,120cm
2,category,notebook
2,brand,Apple
2,specs:RAM,16Gb
2,specs:storage,128Gb
3,category,shoes
3,size,39
3,model,Heels
```

**2. What are the disadvantages of the CSV format compared to the XML format in this case?**

**Solution:**

For the first solution:
Each category of products has different attributes, so most of the cells in the table are empty. The resulting table is extremely sparse and hard for humans to read. 

For the second solution: 
It is inconvenient to read, because each product is spread over several lines. The id has to be stored multiple times, and the table must be sorted by id if you want to see all the attributes of one product as a group.

Another problem: if there are many nested attributes, it becomes cumbersome to fit them into the table. 
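The sparsity of the first layout can be made concrete with a short script. This sketch flattens the catalog with Python's standard `xml.etree.ElementTree` and `csv` modules, joins nested names with `:` as in the solution above, and counts the empty cells. The helper `flatten` is ours, and it only handles one level of nesting, which is all this catalog needs.

```python
import csv
import io
import xml.etree.ElementTree as ET

CATALOG = """<productscatalog>
    <product>
        <id> 1 </id><category> BBQ </category>
        <type> Gas </type><height> 120cm </height>
    </product>
    <product>
        <id> 2 </id><category> notebook </category><brand> Apple </brand>
        <specs><RAM> 16Gb </RAM><storage> 128Gb </storage></specs>
    </product>
    <product>
        <id> 3 </id><category> shoes </category>
        <size> 39 </size><model> Heels </model>
    </product>
</productscatalog>"""


def flatten(product):
    """Turn one product into a flat dict; nested names become 'outer:inner'."""
    row = {}
    for child in product:
        if len(child):  # element with children, e.g. <specs>
            for leaf in child:
                row[f"{child.tag}:{leaf.tag}"] = leaf.text.strip()
        else:
            row[child.tag] = child.text.strip()
    return row


rows = [flatten(product) for product in ET.fromstring(CATALOG)]

# Union of all attribute names, in order of first appearance.
columns = []
for row in rows:
    for key in row:
        if key not in columns:
            columns.append(key)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

empty_cells = sum(1 for row in rows for column in columns if column not in row)
print(csv_text)
print(f"{empty_cells} of {len(rows) * len(columns)} cells are empty")
```

With only three products, more than half of the cells are already empty, and every new product category adds further columns that stay empty for all other products.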

**3. Describe or give an example of one use case where the CSV format would be more appropriate than the XML format!**

**Solution:**

If all the rows have the same fixed set of attributes and there is no nesting, it is more natural to describe the data as a table.