Welcome to a demo of Torchlite in a Jupyter Notebook!

In this demo, we show how to use Torchlite to access extracted features from an HTRC workset. We'll do this in Python, using Torchlite's htrc-extracted-features library. For this demo, the library is already installed in the Jupyter kernel, so here in the Notebook all we have to do is import it.

In [1]:
from htrc.torchlite.ef.workset import WorkSet

The Torchlite dashboard includes a widget for selecting a workset from a pre-built list of worksets, but we're not showing that here. For the demo, we'll refer to one of those worksets directly by its ID: a small one containing four volumes.

In [2]:
my_worksetid = '63f7ae452500006404fc54c7'

This demo assumes you are somewhat familiar with basic Python, and that you know Python is an object-oriented language.  Let's create a WorkSet object that is associated with our demo workset.

In [3]:
ws = WorkSet(my_worksetid)

Our WorkSet object gives us access to many of the workset properties in the Extracted Features dataset.  For example, we can see what volumes the workset contains:

In [4]:
ws.volumes

[Volume(mdp.35112103187797),
 Volume(mdp.39015058744122),
 Volume(uc1.$b684263),
 Volume(uc1.32106011187561)]

The Volumes, too, are Python objects whose properties we can access:

In [5]:
[v.title for v in ws.volumes]

['The Law times reports',
 'Highlights of ...',
 'The cruise of the Marchesa to Kamschatka & New Guinea.',
 'Bilder vom Erzählen : Gedichte /']

In [6]:
[v.pub_date for v in ws.volumes]

[1947, 2001, 1889, 2001]

Torchlite's widgets use these WorkSet objects to access the workset's data and manipulate it for analysis. A Torchlite widget has two parts: a back end, which applies algorithms and filters to the raw workset data to produce a calculation, and a front end, which takes the results of that calculation and displays a visualization of them in the dashboard. Torchlite will ship with two widgets: a "Timeline Widget," which displays the publication date of each of the workset's volumes on a graphical timeline; and a "Map Widget," which plots on a global map the place of birth of the persons named in each volume's contributor field.

Let's use the Timeline Widget to collect the publication date of each volume in our workset.

In [8]:
from backend.widgets import TimelineWidget

In [19]:
my_widget = TimelineWidget()
my_widget.workset = ws
print(my_widget.data)

[{'title': 'The Law times reports', 'pub_date': 1947}, {'title': 'Highlights of ...', 'pub_date': 2001}, {'title': 'The cruise of the Marchesa to Kamschatka & New Guinea.', 'pub_date': 1889}, {'title': 'Bilder vom Erzählen : Gedichte /', 'pub_date': 2001}]


The widget's back-end component generates a JSON object (printed above in its Python form, as a list of dictionaries) containing all the data the front-end component needs to display a timeline. The front end of the widget doesn't have to know about worksets or volume metadata properties; all it knows is how to plot dates and labels.
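To make that division of labor concrete, here is a minimal sketch that runs without Torchlite installed. It is not Torchlite's actual implementation: `StandInVolume`, `timeline_data`, and `render_labels` are hypothetical names invented for illustration, and the two volumes are copied from the output above.

```python
import json
from dataclasses import dataclass


@dataclass
class StandInVolume:
    """Hypothetical stand-in for a Torchlite Volume (not the real class)."""
    title: str
    pub_date: int


def timeline_data(volumes):
    """Back-end step: reduce volumes to the title/date pairs a timeline needs."""
    return [{"title": v.title, "pub_date": v.pub_date} for v in volumes]


def render_labels(records):
    """Front-end step: knows only about dates and labels, not about worksets."""
    return [f"{r['pub_date']}: {r['title']}" for r in records]


volumes = [
    StandInVolume("The Law times reports", 1947),
    StandInVolume("The cruise of the Marchesa to Kamschatka & New Guinea.", 1889),
]

# The back end serializes its result; the front end sees only the JSON text.
payload = json.dumps(timeline_data(volumes))
labels = render_labels(json.loads(payload))
print(labels)
```

Because the only thing passed between the two functions is JSON text, either side could be replaced without the other noticing, which is the point of the two-part design.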

This two-part separation corresponds to something fundamental in data science: the visualization is not the data.  Visualizations are simply graphical presentations of data, whose purpose is to make the data easier for people to understand.  That JSON object is not terribly perspicuous; it isn't ordered chronologically, for one thing, and if there were more than a few volumes in the workset, the JSON syntax would quickly get in the way of a human being's "reading" the data.  Visualizations are so useful and powerful because they create an interpretation of the data that is easier for humans to read.
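Even a very small step toward a "reading" helps. The sketch below takes the records printed by the widget earlier (copied here by hand, so that it runs on its own) and sorts them by publication date using plain Python; it is only an illustration, not what the Timeline Widget's front end does.

```python
# Records copied from the widget output shown earlier in this demo.
records = [
    {"title": "The Law times reports", "pub_date": 1947},
    {"title": "Highlights of ...", "pub_date": 2001},
    {"title": "The cruise of the Marchesa to Kamschatka & New Guinea.", "pub_date": 1889},
    {"title": "Bilder vom Erzählen : Gedichte /", "pub_date": 2001},
]

# sorted() is stable, so volumes sharing a year keep their original order.
chronological = sorted(records, key=lambda r: r["pub_date"])

# A crude text "timeline": one line per volume, earliest first.
for r in chronological:
    print(r["pub_date"], r["title"])
```

The data has not changed at all, but the ordered listing is already easier to scan than the raw structure.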

A visualization is not the data; it is a reading of the data.