This repository has been archived by the owner. It is now read-only.

AU Washington, DC Datathon Notebook

Jupyter notebook to assist in creating additional analysis and visualizations of Archives Unleashed Cloud derivatives at the Archives Unleashed Washington, DC Datathon.

The datathon notebook is derived from Archives Unleashed Cloud: Jupyter Notebook.

Getting Started

  • Shell into your assigned datathon VM with the provided key and IP address:
ssh -i ~/.ssh/archives-hackathon.key ubuntu@206.167.xx.xx
$ ssh -i ~/.ssh/archives-hackathon.key ubuntu@

Welcome to Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-46-generic x86_64)

 * Documentation:
 * Management:
 * Support:

  System information as of Wed Mar 13 22:07:38 UTC 2019

  System load:  0.03               Processes:           231
  Usage of /:   38.6% of 19.21GB   Users logged in:     1
  Memory usage: 5%                 IP address for ens3:
  Swap usage:   0%

  => There are 12 zombie processes.

 * Ubuntu's Kubernetes 1.14 distributions can bypass Docker and use containerd
   directly, see or try it now with

     snap install microk8s --channel=1.14/beta --classic

  Get cloud support with Ubuntu Advantage Cloud Guest:

 * Canonical Livepatch is available for installation.
   - Reduce system reboots and improve kernel security. Activate at:

0 packages can be updated.
0 updates are security updates.

Last login: Wed Mar 13 14:01:02 2019 from
  • Change directory to dc-datathon-notebook:
cd dc-datathon-notebook
ubuntu@datathon-1:~$ cd dc-datathon-notebook
  • Get the local IP address (192.168.x.x) of the VM with ifconfig:
ubuntu@datathon-1:~/dc-datathon-notebook$ ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet  netmask  broadcast
        inet6 fe80::f816:3eff:fe8b:dcc9  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:8b:dc:c9  txqueuelen 1000  (Ethernet)
        RX packets 76527718  bytes 165982902295 (165.9 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 52581415  bytes 3570965770 (3.5 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet  netmask
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 3223  bytes 14686658 (14.6 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3223  bytes 14686658 (14.6 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  • Start the notebook with jupyter-notebook and the local IP address you found above:
jupyter-notebook --ip=192.168.xx.xx
ubuntu@datathon-1:~$ jupyter-notebook --ip=
[I 21:02:35.037 NotebookApp] JupyterLab extension loaded from /home/ubuntu/anaconda3/lib/python3.7/site-packages/jupyterlab
[I 21:02:35.037 NotebookApp] JupyterLab application directory is /home/ubuntu/anaconda3/share/jupyter/lab
[I 21:02:35.039 NotebookApp] Serving notebooks from local directory: /home/ubuntu/dc-datathon-notebook
[I 21:02:35.039 NotebookApp] The Jupyter Notebook is running at:
[I 21:02:35.039 NotebookApp]
[I 21:02:35.039 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 21:02:35.044 NotebookApp] No web browser found: could not locate runnable browser.
[C 21:02:35.045 NotebookApp] 
    To access the notebook, open this file in a browser:
    Or copy and paste one of these URLs:
  • In your browser, navigate to the public IP address you shelled into (206.167.x.x), followed by :8888/?token=WHAT-EVER-THE-TOKEN-IS-ABOVE, using the token printed in the output above.
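
The last step amounts to swapping the host in the URL that Jupyter prints for the public IP address you shelled into, while keeping the port and token. A minimal sketch of that substitution follows; the function name browser_url and the sample addresses and token are placeholders for illustration, not values from the datathon VMs.

```python
from urllib.parse import urlsplit, urlunsplit


def browser_url(jupyter_url: str, public_ip: str) -> str:
    """Replace the host in the URL Jupyter printed with the VM's public IP.

    The port and the ?token=... query string are kept unchanged.
    """
    parts = urlsplit(jupyter_url)
    port = parts.port or 8888  # 8888 is the port used in the step above
    netloc = f"{public_ip}:{port}"
    return urlunsplit((parts.scheme or "http", netloc, parts.path or "/", parts.query, ""))


# Placeholder values: a local address and token as Jupyter might print them,
# and a documentation-range address standing in for the public IP.
printed = "http://192.168.0.10:8888/?token=abc123"
print(browser_url(printed, "203.0.113.7"))
```

Paste the resulting URL into your browser's address bar.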

If you need help or run into any problems getting set up, @-mention ruebot in Slack.

Types of Visualizations

There are several types of visualizations that you can produce in the Jupyter Notebook. A total of 14 outputs can be generated.

  • Domain Analysis: Provides information about what has been crawled (e.g. which domains) and how often.
  • Text Analysis: Highlights the frequency of words through various filters including domain and year.
  • Sentiment Analysis: Visualizes sentiment scores by domain and year.
  • Network Analysis: Shows the connections and relationships among websites through network graph layouts.
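
To give a sense of what the domain analysis involves, here is a minimal sketch that counts captures per domain and per crawl year. It is not the notebook's own code: the records list is made-up sample data, and the (crawl_date, url) layout is an assumption rather than the actual format of the Archives Unleashed Cloud derivatives.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical (crawl_date, url) records, for illustration only.
records = [
    ("20190301", "http://example.org/a"),
    ("20190301", "http://example.org/b"),
    ("20180615", "https://www.example.com/"),
]


def domain_counts(rows):
    """Count how many captured URLs fall under each host name."""
    return Counter(urlsplit(url).hostname for _, url in rows)


def year_counts(rows):
    """Count captures per crawl year (first four digits of the date)."""
    return Counter(date[:4] for date, _ in rows)


# Print domains from most to least frequently captured.
for domain, n in domain_counts(records).most_common():
    print(domain, n)
```

The same counters could be passed to a plotting library to draw the bar charts, though which library the notebook uses is not stated here.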


This application is available as open source under the terms of the Apache License, Version 2.0.


This work is primarily supported by the Andrew W. Mellon Foundation. Any opinions, findings, and conclusions or recommendations expressed are those of the researchers and do not necessarily reflect the views of the sponsors.

