# W38Mon: Digital Data Structures - Demo

Welcome to the Demo Notebook for the SDS Base Camp session on digital data structures.

The Demo Notebook mirrors the structure of the lecture, so we will start with HTML and see how we can render it in Jupyter Notebooks and explore its structure and tags. Then we will explore one example of how to use Cascading Style Sheets (CSS) in a Jupyter Notebook. We will close with an example of JSON and how to access data stored in JSON files.

## 1. HTML

As we discussed in the lecture, HTML is the common language used by web pages to structure and format the content they are presenting. This structure uses tags for formatting content and nesting to represent more complex structures. One of the really amazing things about Jupyter Notebooks and HTML is that...

<html>
<body>

<h1>You can render web pages inside Jupyter Notebook markdown cells</h1>
<h2>And use most tags, like this sub-heading</h2>


<p>This is a lengthy paragraph of text. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>

<p>This is another paragraph of text with a text entry box following it.</p>

<input type="text" required minlength="4" maxlength="8" size="10">


<h2>This is another sub-heading</h2>
<p>And one paragraph with some <del>mistakes</del> <ins>corrections</ins> and <em>emphasized</em> words. And once you understand how to use this in practice, I hope this gif is you:</p>
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fw370Abr1SDvalxR6vs%2F200.gif%3Fcid%3D790b7611jnk3lv2fd4qvmfhlqraqr55mo1gmgzy1bg1cx7sm%26rid%3D200.gif&f=1&nofb=1" width="200" height="200">

<p>Now for one final paragraph introducing a set of adjectives that might be applicable to how you feel:</p>
<ul>
    <li>Elated</li>
    <li>Enthralled</li>
    <li>Flabbergasted</li>
</ul>

</body>
</html>

For many purposes, using HTML in your Jupyter Notebook might be overkill. But it is good to know that you can do this, in order to get a better sense of how web pages are built up. For this purpose, we can also display the entire HTML code producing the embedded page above, simply by indenting all of it by one tab, which makes Markdown treat it as a code block.

    <html>
    <body>

    <h1>You can render web pages inside Jupyter Notebook markdown cells</h1>
    <h2>And use most tags, like this sub-heading</h2>


    <p>This is a lengthy paragraph of text. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>

    <p>This is another paragraph of text with a text entry box following it.</p>

    <input type="text" required minlength="4" maxlength="8" size="10">


    <h2>This is another sub-heading</h2>
    <p>And one paragraph with some <del>mistakes</del> <ins>corrections</ins> and <em>emphasized</em> words. And once you understand how to use this in practice, I hope this gif is you:</p>
    <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fw370Abr1SDvalxR6vs%2F200.gif%3Fcid%3D790b7611jnk3lv2fd4qvmfhlqraqr55mo1gmgzy1bg1cx7sm%26rid%3D200.gif&f=1&nofb=1" width="200" height="200">

    <p>Now for one final paragraph introducing a set of adjectives that might be applicable to how you feel:</p>
    <ul>
        <li>Elated</li>
        <li>Enthralled</li>
        <li>Flabbergasted</li>
    </ul>

    </body>
    </html>

This then allows us to understand how the different tags are used, which comes in handy when we want to collect data from web pages. At the same time, this vanilla HTML is lacking a bit of color and control over presentation, which we gain by using Cascading Style Sheets.
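To make the link between tags and data collection concrete, here is a minimal sketch that pulls the list items out of a fragment of the page above. It uses `html.parser` from Python's standard library; the class name `ListItemCollector` is a hypothetical helper introduced only for this illustration, not something from the lecture.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text inside every <li> tag (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.in_li = False  # True while the parser is inside an <li> element
        self.items = []     # one string per <li> element found

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False

    def handle_data(self, data):
        # Only keep text that sits between <li> and </li>
        if self.in_li:
            self.items[-1] += data


# Fragment copied from the demo page above
page = """
<ul>
    <li>Elated</li>
    <li>Enthralled</li>
    <li>Flabbergasted</li>
</ul>
"""

collector = ListItemCollector()
collector.feed(page)
collector.close()
print(collector.items)  # ['Elated', 'Enthralled', 'Flabbergasted']
```

The parser only reacts to the `<li>` tag here, which is the basic idea behind collecting data from web pages: knowing which tag wraps the content you are after.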

## 2. CSS

With CSS, web pages can be made more visually appealing and structured in more meaningful ways. In fact, a number of HTML tags have been deprecated in favor of styling with CSS, one example being the `<center>` tag. Let's look at the HTML document we introduced above with some additional styling using CSS.

<html>
<body>

<h1 style="color: #FF0000"> You can render web pages inside Jupyter Notebook markdown cells</h1>
<h2 style="text-align: center">And use most tags, like this sub-heading</h2>
<p style="font: italic small-caps bold 16px/2 cursive">This is a lengthy paragraph of text. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>

<p>This is another paragraph of text with a text entry box following it.</p>

<input type="text" required minlength="4" maxlength="8" size="10">


<div class="alert alert-warning">
<h2 style="text-align: center">This is another sub-heading</h2>
<p>And one paragraph with some <del>mistakes</del> <ins>corrections</ins> and <em>emphasized</em> words. And once you understand how to use this in practice, I hope this gif is you:</p>
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fw370Abr1SDvalxR6vs%2F200.gif%3Fcid%3D790b7611jnk3lv2fd4qvmfhlqraqr55mo1gmgzy1bg1cx7sm%26rid%3D200.gif&f=1&nofb=1" width="200" height="200">

<div class="alert alert-success">
<p>Now for one final paragraph introducing a set of adjectives that might be applicable to how you feel:</p>
<ul>
    <li>Elated</li>
    <li>Enthralled</li>
    <li>Flabbergasted</li>
</ul>

</div>
</div>

</body>
</html>

The underlying HTML and CSS to create this page is displayed below.

    <html>
    <body>

    <h1 style="color: #FF0000"> You can render web pages inside Jupyter Notebook markdown cells</h1>
    <h2 style="text-align: center">And use most tags, like this sub-heading</h2>
    <p style="font: italic small-caps bold 16px/2 cursive">This is a lengthy paragraph of text. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>

    <p>This is another paragraph of text with a text entry box following it.</p>

    <input type="text" required minlength="4" maxlength="8" size="10">


    <div class="alert alert-warning">
    <h2 style="text-align: center">This is another sub-heading</h2>
    <p>And one paragraph with some <del>mistakes</del> <ins>corrections</ins> and <em>emphasized</em> words. And once you understand how to use this in practice, I hope this gif is you:</p>
    <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fw370Abr1SDvalxR6vs%2F200.gif%3Fcid%3D790b7611jnk3lv2fd4qvmfhlqraqr55mo1gmgzy1bg1cx7sm%26rid%3D200.gif&f=1&nofb=1" width="200" height="200">

    <div class="alert alert-success">
    <p>Now for one final paragraph introducing a set of adjectives that might be applicable to how you feel:</p>
    <ul>
        <li>Elated</li>
        <li>Enthralled</li>
        <li>Flabbergasted</li>
    </ul>

    </div>
    </div>

    </body>
    </html>

In this case, we mostly used inline styling, meaning we defined styling attributes within the individual tags. The two `<div>` elements are the exception: their `class` attributes (`alert alert-warning` and `alert alert-success`) refer to styles defined outside this document, typically by the notebook interface itself. In the wild, you are more likely to find styling that is set outside of the `<body>` section of the page, either explicitly in the `<head>` section or implicitly by referring to a separate file with the styling instructions. In such cases, the tags generally use `id` and `class` attributes to link the page content to styles.

## 3. JSON

The final topic we covered in the lecture today was JavaScript Object Notation (JSON), a non-tabular data format that is ubiquitous online because it is flexible in the data it can contain and easy to work with in many languages. The Python container most akin to a JSON object is the dictionary, and in fact Python's `json` module turns JSON objects into dictionaries (and JSON arrays into lists) when we load a file into our session.
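The following minimal sketch shows how the `json` module maps JSON values to Python types. The JSON text is a made-up example, not taken from any file used in this notebook.

```python
import json

# Made-up JSON text for illustration only
json_text = '{"name": "demo", "tags": ["html", "css", "json"], "published": true, "doi": null, "version": 1.0}'

# `loads` decodes JSON from a string (`load` does the same for an open file)
record = json.loads(json_text)

print(type(record).__name__)  # dict
for key, value in record.items():
    # Show which Python type each JSON value was decoded into
    print(key, type(value).__name__)
```

The loop prints `str`, `list`, `bool`, `NoneType`, and `float` for the five values: JSON arrays become lists, `true` becomes `True`, and `null` becomes `None`.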


### Loading and Working with JSON Data
To load a JSON data file, we open it like a regular text file and then use the `load` function from the `json` module to decode it (that is, translate it into a Python representation of the data), which we can then assign to a variable. Let's try this with the file "bail-etal-2020.json", which you can find on Absalon. _Note:_ Make sure the file is in the same folder as your Jupyter Notebook.

In [1]:
# Load module to work with json files and data
import json

# Open the JSON file as plain text; `with` closes it again once we are done
with open("bail-etal-2020.json", "r") as bail_json:
    # Decode the JSON file using the `load` function
    bail_dict = json.load(bail_json)

Now, you don't need to take my word for it, so let's check the type of the object `bail_dict` that we just created.

In [2]:
type(bail_dict)

dict

Given our familiarity with dictionaries, we now also know how to get an overview of the attributes in the JSON file. Let's look at those.

In [3]:
bail_dict.keys()

dict_keys(['id', 'identifier', 'persistentUrl', 'protocol', 'authority', 'publisher', 'publicationDate', 'storageIdentifier', 'datasetVersion'])

But, as you might have expected, this file has quite a bit of nested data in it. So let's dig one level deeper.

In [4]:
bail_dict["datasetVersion"].keys()

dict_keys(['id', 'datasetId', 'datasetPersistentId', 'storageIdentifier', 'versionNumber', 'versionMinorNumber', 'versionState', 'UNF', 'lastUpdateTime', 'releaseTime', 'createTime', 'license', 'termsOfUse', 'fileAccessRequest', 'metadataBlocks', 'files', 'citation'])

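As we have learned, JSON files can contain unordered name-value pairs (objects) and list-like arrays. Digging all the way down to a leaf of our dictionary, we encounter both: a key such as `"files"` selects a value in an object, while an integer index such as `[1]` selects an element of an array.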

In [5]:
bail_dict["datasetVersion"]["files"][1]["dataFile"]["filename"]

'Troll Influence Study.Rdata'
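Once the path to one leaf is known, the same pattern can be applied to every element of the array. The sketch below uses a small stand-in dictionary that mirrors only the nesting used above; the two filenames in it are made up and are not the contents of the real file.

```python
# Hypothetical stand-in mirroring only the nesting used above; not the real file contents
sample = {
    "datasetVersion": {
        "files": [
            {"dataFile": {"filename": "first_file.txt"}},   # made-up name
            {"dataFile": {"filename": "second_file.csv"}},  # made-up name
        ]
    }
}

# Loop over the array under "files" and pull the same leaf out of every element
filenames = [entry["dataFile"]["filename"] for entry in sample["datasetVersion"]["files"]]
print(filenames)  # ['first_file.txt', 'second_file.csv']
```

Replacing `sample` with `bail_dict` should give the filenames of all files listed in the dataset, assuming every entry under `files` has the same structure.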

If you've now become interested in where this JSON file came from, you can find the source [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/UATZBA).

### Creating JSON Data

At some point when you collect data, you might also want to share it with others. JSON files are a convenient way of doing this, and with our knowledge of dictionaries and the functionality of the `json` module, we have all we need.

Let's define a dictionary and encode it as JSON using the `dump` function.

In [None]:
# Define a tasty dictionary
potato_tacos = {"potatoes":{"amount":1000,"unit":"g","at_home":False},
                "tacos":{"amount":6,"unit":"pieces","at_home":True},
                "spices":{"amount":25,"unit":"g","at_home":True}
               }

# Use dump function to encode it as json and save to a new file
with open("recipe_shopping_list.json", "w") as json_file:
    json.dump(potato_tacos, json_file)
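To check that nothing gets lost in the encoding, a round trip is a quick test. The sketch below repeats the dictionary from above so that it runs on its own, and uses `dumps` and `loads`, the string-based counterparts of `dump` and `load`, so that no file is needed.

```python
import json

# Same dictionary as above, repeated so this cell runs on its own
potato_tacos = {"potatoes": {"amount": 1000, "unit": "g", "at_home": False},
                "tacos": {"amount": 6, "unit": "pieces", "at_home": True},
                "spices": {"amount": 25, "unit": "g", "at_home": True}
               }

# Encode to a JSON string; `indent=2` makes the result easier to read
encoded = json.dumps(potato_tacos, indent=2)
print(encoded)

# Decode again and compare with the original dictionary
decoded = json.loads(encoded)
print(decoded == potato_tacos)  # True
```

Note that the Python values `True` and `False` appear as `true` and `false` in the encoded text, following JSON's own spelling.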