# <center>Big Data &ndash; Exercises</center>
## <center>Autumn 2025 &ndash; Week 4 &ndash; ETH Zurich</center>

# Introduction and setup
This exercise will cover XML and JSON well-formedness.

For the next few weeks you will be using [oXygen XML editor](https://www.oxygenxml.com/xml_editor.html), an XML/JSON development IDE. Before starting, make sure oXygen is installed and working on your computer. You can download the required license from the [ETH IT shop](https://itshop.ethz.ch/EndUser/Items/Home):

1. Log in with your ETH credentials.

2. Click on `+ CREATE REQUEST` in the top right, select **Software and Business Applications** and go to **Software & Licenses** > **Order Software Product**.

3. Look for "oxygen" and select the version that fits your local setup.

4. Click **Next step** at the bottom, and accept the terms of service.

5. Wait for the confirmation email (it should take a couple of minutes). Download the __license file__, then download the software from the [official website](https://www.oxygenxml.com/xml_editor/download_oxygenxml_editor.html?os=Linux) and proceed with the installation. You should be asked to provide the __license file__ at some point.

6. Alternatively, after downloading the installer, open a shell and `cd` to the directory where you saved it.

- At the prompt type:
```
sh ./oxygen-64bit-openjdk.sh
```
- Copy the license key (License Key String) provided in the instructions from step 4 and paste it into the application's license registration dialog box.

*Another option is to follow the instructions on the IT shop page and use the server address information below that applies to your operating system.*

# 1. JSON 

## 1.1 Well-formedness
Correct the following JSON documents so that they are well-formed. First, try to “parse” them manually in your head, then use oXygen to check your solutions.
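
If you would like a scripted cross-check in addition to oXygen, the following is a minimal Python sketch. It assumes that the standard-library `json` module is an acceptable stand-in for a strict parser; the helper names `is_well_formed_json` and `_reject_constant` are hypothetical and not part of the course material. Python's parser is more lenient than the standard in at least one respect (it accepts `NaN` and `Infinity` unless told otherwise), so treat its verdict as a hint rather than as the reference.

```python
import json


def _reject_constant(name):
    # Assumption: we want strict behaviour. By default Python's json module
    # accepts NaN, Infinity and -Infinity, which are not JSON values.
    raise ValueError(f"{name} is not a valid JSON value")


def is_well_formed_json(text):
    """Return (True, None) if text parses as JSON, else (False, message)."""
    try:
        json.loads(text, parse_constant=_reject_constant)
        return True, None
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        return False, str(err)


# Small illustrative inputs (not taken from the exercise documents).
print(is_well_formed_json('{"a": [1, 2, {"b": null}]}'))
print(is_well_formed_json("{'a': 1}"))
```

To check one of your own solutions, paste the corrected document into a string and pass it to the helper. For syntax errors, the returned message typically names the line and column of the first problem found.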

### 1.1.1 Document A

```
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  age: 25,
  "isRetired",
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100",
    'is verified' : "true"
  }
  'phoneNumbers': [
    {
      "type": [["home"]],
      "@number": "212 555-1234"
    },
    {
      "type": [["office"]],
      "@number": "646 555-4567"
    },
    {
      "type": [["mobile"[],
      "@number": "123 456-7890"
    }
  ],
  "children": [],
  "settings": {},
  "spouse": Null,
  "": ""
}
```

### 1.1.2 Document B

```
[
    1: {
      "name": 'John'
      "lastname": 'Smith',
      "account": "jsmith"
      "phonenumbers" [{
           "type": "home",
           "1phone": 212-3242,
           "2phone": "545-4568"
       }]
    },
    2: {
      "name": "Jane"
      "lastname": 'Doe',
      "account": "jdoe"
      "phonenumbers" [
      {
           "type": "home",
           "phone": "8989 7685"
      },
      "phone": "545-4568"
      ],
      "account": "janedoe"
    }
]
```

### 1.1.3 Document C

```
{
  "Physical quantities": [
    {"elementary charge": +1.6033e-19},
    {"electron specific charge": -1758819}
  ]
}
```

### 1.1.4 Document D

```
{
  "Physical quantities": [
    "sl":299792458,
    "eg":1.60217733e+19,
    "ep":-0
  ]
}
```

## 1.2 JSON Key Names
Which of the following are well-formed JSON key names? 

<img src="https://polybox.ethz.ch/index.php/s/A4FYLjCbwbNZqkb/download" width=200/>

## 1.3 JSON True/False Questions
Mark the following statements as either True or False. If a statement is False, briefly explain why.

- According to the JSON ECMA-404 standard, key/value pairs in a JSON object have no concept of ordering.
- Among the six JSON data types, there is a URL data type.
- According to the JSON ECMA-404 standard, JSON syntax describes a sequence of Unicode code points.
- Strings in JSON may be written using either double or single quotes.
- In JSON objects, keys must always be strings, while values may be any valid JSON data type.

# 2. XML
## 2.1 Well-formedness
Correct the following XML documents so that they are well-formed. Just as with the JSON documents from the previous exercise, first try to solve the problems without using any software, and then check your results.
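
As with JSON, a scripted cross-check is possible. The sketch below assumes Python's standard-library `xml.etree.ElementTree` parser is good enough for this purpose; the helper name `is_well_formed_xml` is hypothetical. The parser stops at the first error, so you may need several rounds of fixing and re-checking.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return (True, None) if text parses as XML, else (False, message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        # The message reports the kind of error and where it occurred.
        return False, str(err)


# Small illustrative inputs (not taken from the exercise documents).
print(is_well_formed_xml("<a><b>text</b></a>"))
print(is_well_formed_xml("<a><b>text</a></b>"))
```

Only the first tuple element is needed for a yes/no answer; the message is there to help you locate the problem.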

### 2.1.1 Document A

```
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE catalog>
<catalog>
    <!-- Start book list --to be defined -->
   <Book id=`bk101`>
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95€</price>
      <publish_date version='hard' version='soft'>2000-10-01</publish_date>
      <_description lang=en>An `in-depth look` at creating applications 
      with XML <for dummies>.</_description>
      <xml_parse>true</xml_parse>
   </book>
</>
```

### 2.1.2 Document B

```
<?xml version="1.0" encoding="utf-16"?>
<h:library xmlns:xdc="http://www.xml.com/books" xmlns:h="http://xml.com/library">
    <head><h:title>Book Review</title></head>
    <body/>
        <_xdc:bookreview>
            <xdc:title>XML: A Primer</xdc:title>
            <_table _style='container'>
                <h:tr align="#center">
                    <h:td>Author<h:span>St. Laurent & Tom Faron</h:td></h:span>
                </h:tr>
                <h:tr align="#left">
                    <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
                    <h:td><xdc:price>31.98</xdc:price></h:td>
                    <h:td><xdc:#pages>352</xdc:#pages></h:td>
                    <h:td><xdc:_date>1998/01</xdc:_date></h:td>
                    <h:td><xdc:-comment>Love it</xdc:-comment></h:td>
                </h:tr>
            </_table>
        </_xdc:bookreview>
    </body>
</h:library>
```

## 2.2 XML Names
Which of the following XML tags are well-formed (i.e., which tags contain a conforming XML name)?
1. `<_bar/>`
1. `<123foo/>`
1. `<Foo/>`
1. `<foo 123>`
1. `<foo_123/>`
1. `<foo#123/>`
1. `<foo-123/>`
1. `<foo.123/>`
1. `<XmL_123/>`
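
To check your answers afterwards, the following Python sketch tests a candidate name against a simplified version of the XML 1.0 name rules. It is an assumption-laden approximation: it covers only the ASCII part of the rules (a letter, `_` or `:` first, then letters, digits, `_`, `:`, `.` or `-`) and ignores the additional Unicode characters that the specification allows. The names `ASCII_XML_NAME` and `is_ascii_xml_name` are hypothetical.

```python
import re

# ASCII-only approximation of the XML 1.0 Name production.
# First character: letter, underscore or colon.
# Following characters: the same, plus digits, dot and hyphen.
ASCII_XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_ascii_xml_name(name):
    """Return True if name matches the simplified ASCII name pattern."""
    return ASCII_XML_NAME.fullmatch(name) is not None


# Illustrative candidates (not the ones from the list above).
for candidate in ["item-1", "1item", "a b", "_x.y"]:
    print(candidate, is_ascii_xml_name(candidate))
```

Pass only the name itself, without the angle brackets or the trailing slash.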

# 3. Exercise: XML Document Structure

Below is an empty table describing where different XML constructs can appear within an XML document. Fill in each cell with **yes** or **no** to indicate whether the given construct (elements, attributes, text, comments) is allowed in that position.

|                | Top-Level | Between Element Tags | Inside Opening Element Tag |
|----------------|-----------|----------------------|----------------------------|
| **Elements**   | ?         | ?                    | ?                          |
| **Attributes** | ?         | ?                    | ?                          |
| **Text**       | ?         | ?                    | ?                          |
| **Comments**   | ?         | ?                    | ?                          |

# 4. XML Namespaces
Look at the following XML document:
```xml
<catalog xmlns="http://example.org/catalog"
         xmlns:bk="http://example.org/book"
         xmlns:pub="http://example.org/publisher">

    <book id="b1" pub:year="1949">
        <bk:title>1984</bk:title>
        <author>George Orwell</author>
    </book>

    <bk:magazine issue="42">
        <title>Science Weekly</title>
        <pub:editor>Jane Doe</pub:editor>
    </bk:magazine>

    <publisher>Generic Press</publisher>
</catalog>

```

Your task is to determine the namespace in which the following elements and attributes live. 
Options are: 
- `no namespace`
- `http://example.org/catalog`
- `http://example.org/book`
- `http://example.org/publisher`

Each option may be used multiple times.

- The element `catalog` is in _______________________
- The element `book` is in _______________________
- The element `bk:title` is in _______________________
- The element `author` is in _______________________
- The element `pub:editor` is in _______________________
- The element `publisher` is in _______________________
- The attribute `id` is in _______________________
- The attribute `pub:year` is in _______________________
- The attribute `issue` is in _______________________
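
Once you have filled in the blanks, you can compare them with what a namespace-aware parser reports. The sketch below assumes Python's `xml.etree.ElementTree`, which exposes names in the form `{namespace-uri}local-name` and leaves names without a namespace unbracketed. The sample document, its URIs, and the helpers `split_name` and `describe` are hypothetical; swap in the document above to check your own answers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, deliberately different from the exercise.
SAMPLE = """
<shop xmlns="http://example.org/shop" xmlns:p="http://example.org/product">
  <p:item code="x1" p:stock="3">Widget</p:item>
  <note>hello</note>
</shop>
"""


def split_name(expanded):
    """Split '{uri}local' into (uri, local); uri is None if there is no namespace."""
    if expanded.startswith("{"):
        uri, local = expanded[1:].split("}", 1)
        return uri, local
    return None, expanded


def describe(root):
    """List (kind, namespace, local name) for every element and attribute."""
    rows = []
    for elem in root.iter():
        rows.append(("element",) + split_name(elem.tag))
        for attr_name in elem.attrib:
            rows.append(("attribute",) + split_name(attr_name))
    return rows


for kind, uri, local in describe(ET.fromstring(SAMPLE.strip())):
    print(f"{kind:9} {local:6} {uri or 'no namespace'}")
```

The parser resolves prefixes for you, so the prefixes themselves (`p` in the sample) do not appear in the output, only the namespace URIs they are bound to.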