
Distant Viewing Toolkit for the Analysis of Visual Culture


The Distant Viewing Toolkit is a Python package designed to facilitate the computational analysis of visual culture. It provides low-level architecture for applying state-of-the-art computer vision algorithms to still and moving images. The higher-level functionality of the toolkit allows users to quickly extract semantic metadata from digitized collections. Extracted information can be visualized for search and discovery, or aggregated and analyzed to find patterns across a corpus.

More information about the toolkit and the project is available in the project documentation.

If you have any trouble using the toolkit, please open a GitHub issue. If you have further questions or are interested in collaborating, please contact us.

The Distant Viewing Toolkit is supported by the National Endowment for the Humanities through a Digital Humanities Advancement Grant.


Installation

The Distant Viewing Toolkit has been built and tested using Python 3.7. We suggest installing the Anaconda Distribution. The package can then be installed through PyPI:

pip install dvt

Additional Python requirements should be installed automatically through pip.

Minimal Demo

The following code assumes that you have installed the dvt toolkit and that the video file video-clip.mp4 is in your working directory. Run the following command to apply the default pipeline of annotators from the Distant Viewing Toolkit:

python3 -m dvt video-clip.mp4

This may take several minutes to complete. Some minimal logging information should display the annotators' progress in your terminal. Once finished, you should have a new directory dvt-output-data that contains extracted metadata and frames from the source material. You can view the extracted information by starting a local HTTP server:

python3 -m http.server --directory dvt-output-data

and opening the address shown by the server in your browser.

You can repeat the same process with your own video inputs, though keep in mind that it may take some time (often several times the length of the input video file) to finish. You can see an example of the toolkit's output on several video files here.

Getting started with the Python API

The command line tools provide a fast way to get started with the toolkit, but there is much more functionality available when using the full Python API provided by the dvt module.

Using the distant viewing toolkit starts by constructing a DataExtraction object that is associated with some input data (either a video file or a collection of still images). Algorithms are then applied to the DataExtraction; the results are stored as Pandas DataFrames and can be exported as CSV or JSON files. There are two distinct types of algorithms:

  • annotators: algorithms that work directly with the visual data source, but are only able to operate on a small subset of frames or still images
  • aggregators: algorithms that have access to information extracted by previously run annotators across the entire input, but cannot directly access the visual data
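The division can be illustrated schematically in plain Python. The class and method names below are hypothetical, chosen for illustration only; they are not the dvt API, but they mirror the same split between code that touches raw frames and code that only sees extracted values:

```python
class BrightnessAnnotator:
    """Annotator-style step: computes one value per frame from raw data."""

    def annotate(self, frames):
        # 'frames' stands in for decoded image data; each frame is
        # reduced here to its mean pixel intensity.
        return [sum(frame) / len(frame) for frame in frames]


class ThresholdAggregator:
    """Aggregator-style step: sees only the annotator's output."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def aggregate(self, values):
        # Flag the frames whose annotated value exceeds the cutoff;
        # note that no raw frame data is available at this stage.
        return [i for i, v in enumerate(values) if v > self.cutoff]


frames = [[10, 20, 30], [200, 210, 220], [15, 25, 35]]
values = BrightnessAnnotator().annotate(frames)
bright = ThresholdAggregator(cutoff=100).aggregate(values)
print(bright)  # frame indices with mean intensity above 100 -> [1]
```

Because the aggregator never sees raw frames, it can be re-run cheaply with different parameters without re-decoding the video, which is the practical payoff of the two-stage design.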

The separation of algorithms into these two parts makes it easier to write straightforward, error-free code. It closely mirrors the theory of distant viewing:

Distant viewing is distinguished from other approaches by making explicit the interpretive nature of extracting semantic metadata from images. In other words, one must 'view' visual materials before studying them. Viewing, which we define as an interpretive action taken by either a person or a model, is necessitated by the way in which information is transferred in visual materials. Therefore, in order to view images computationally, a representation of elements contained within the visual material—a code system in semiotics or, similarly, a metadata schema in informatics—must be constructed. Algorithms capable of automatically converting raw images into the established representation are then needed to apply the approach at scale.

The annotator algorithms conduct the process of 'viewing' the material, whereas the aggregator algorithms perform a 'distant' (i.e., separated from the raw materials) analysis of the visual inputs.

Here is an example showing how these elements are used to detect shot breaks in a video input. We start by running an annotator that detects the differences between subsequent frames, and then apply the cut aggregator to determine where the changes indicate a pattern consistent with a shot break. As in the Minimal Demo, the code assumes that the video file video-clip.mp4 is in your working directory:

from dvt.core import DataExtraction, FrameInput
from dvt.annotate.diff import DiffAnnotator
from dvt.aggregate.cut import CutAggregator

dextra = DataExtraction(FrameInput(input_path="video-clip.mp4"))
dextra.run_annotators([DiffAnnotator(quantiles=[40])])
dextra.run_aggregator(CutAggregator(cut_vals={'q40': 3}))

Looking at the output data, we see that there are four detected shots in the video file:

frame_start  frame_end
0            0         74
1           75        154
2          155        299
3          300        511
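Since results are stored as Pandas DataFrames, standard Pandas operations apply directly. As a sketch, the table above is reconstructed by hand below (in practice it would come from the extraction object as a DataFrame) to compute shot lengths and to export the results as CSV, as described earlier:

```python
import pandas as pd

# Reconstruct the cut table shown above by hand; in a real run this
# DataFrame would be retrieved from the DataExtraction object.
cuts = pd.DataFrame({
    "frame_start": [0, 75, 155, 300],
    "frame_end": [74, 154, 299, 511],
})

# Shot length in frames (the frame ranges are inclusive on both ends).
cuts["n_frames"] = cuts["frame_end"] - cuts["frame_start"] + 1

# Export the table for use outside Python.
cuts.to_csv("cuts.csv", index=False)
print(cuts["n_frames"].tolist())  # [75, 80, 145, 212]
```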

There are many annotators and aggregators currently available in the toolkit. Pipelines, pre-bundled sequences of annotators and aggregators, are also included in the package. Currently available implementations in the toolkit are:

Annotators Aggregators Pipelines
CIElabAnnotator CutAggregator VideoPipeline
DiffAnnotator DisplayAggregator
EmbedAnnotator ShotLengthAggregator
FaceAnnotator PeopleAggregator

Details of these implementations can be found in the full API documentation. Additionally, it is possible to construct your own Annotator and Aggregator objects. Details are available in this tutorial. If you develop an object that you think may be useful to others, consider contributing your code to the toolkit.


Citation

If you make use of the toolkit in your work, please cite the relevant papers describing the tool and its application to the study of visual culture:

@article{arnold2019distant,
  title   = "Distant Viewing: Analyzing Large Visual Corpora",
  author  = "Arnold, Taylor B and Tilton, Lauren",
  journal = "Digital Scholarship in the Humanities",
  year    = "2019",
  doi     = "10.1093/digitalsh/fqz013",
  url     = ""
}

@article{arnold2019visual,
  title   = "Visual Style in Two Network Era Sitcoms",
  author  = "Arnold, Taylor B and Tilton, Lauren and Berke, Annie",
  journal = "Cultural Analytics",
  year    = "2019",
  doi     = "10.22148/16.043",
  url     = ""
}


Contributing

Contributions to the toolkit, including bug fixes and new features, are welcome. When contributing to this repository, please first discuss the change you wish to make with the owners of this repository via a GitHub issue, email, or another method. Small bug fixes can be submitted directly as pull requests.

Please note that the project has a code of conduct. Contributors are expected to follow the guidelines for all interactions with the project.
