## `import` statements

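This section loads the libraries the notebook depends on: the DICES client, some third-party data and plotting tools, and a couple of modules from Python's standard library.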

In [1]:
# code related to DICES
from dicesapi import DicesAPI
from dicesapi.text import CtsAPI
from dicesapi.jupyter import NotebookPBar

# science and graphing tools
import pandas as pd
from matplotlib import pyplot as plt

# generic utilities
import re
import os
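
If one of the imports above fails, it can help to check which packages are actually installed. The following is a minimal sketch, not part of the DICES client: the helper name `missing_packages` is hypothetical, and the list `required` simply repeats the top-level third-party packages imported above.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]

# top-level third-party packages imported by the cell above
required = ['dicesapi', 'pandas', 'matplotlib']

missing = missing_packages(required)
if missing:
    print('Not installed:', ', '.join(missing))
else:
    print('All required packages are available.')
```

Any package reported as missing has to be installed before the rest of the notebook will run.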

## Create connections to external data sources

This section instantiates two important objects and saves them to variables for later use. The first, `api`, is a connection to the DICES database, which we'll use to download speech data. The second, `cts`, is a connection to the Perseus Digital Library, which we'll use to download the actual text of the speeches once we know their beginning and ending loci.

In [2]:
# connection to DICES
api = DicesAPI(
    logfile = 'dices.log',
    progress_class = NotebookPBar,
)

# connection to Perseus
cts = CtsAPI(
    dices_api = api,
)

## Download all the speeches

Here we download all the speeches from DICES with a single command. The resulting collection of data (called a `SpeechGroup`) is saved to the variable `speeches`.

In [3]:
speeches = api.getSpeeches(progress=True)


## Select only the Latin speeches

For now, let's look only at the Latin speeches. We can select a subset of `speeches` with the `advancedFilter` method. This method takes a simple function as its argument and runs it on every speech in the `SpeechGroup`: speeches for which the function returns `True` are selected, and those for which it returns `False` are left behind.

The function is created with the `lambda` keyword. Don't worry too much about the details: the function we're creating here simply returns `True` if the speech's `lang` tag is set to `'latin'` and `False` otherwise.

In [None]:
# keep only the speeches whose language tag is 'latin'
latin_speeches = speeches.advancedFilter(lambda s: s.lang == 'latin')
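
To see how this kind of predicate-based filtering behaves, here is a minimal sketch in plain Python that runs without a DICES connection. It is not the library's implementation: `ToySpeech` and `advanced_filter` are hypothetical stand-ins, and only the `lang` attribute is modelled.

```python
from dataclasses import dataclass

@dataclass
class ToySpeech:
    """Hypothetical stand-in for a speech; only `lang` is modelled."""
    id: int
    lang: str

def advanced_filter(items, predicate):
    """Keep only the items for which predicate(item) is truthy."""
    return [item for item in items if predicate(item)]

toy_speeches = [
    ToySpeech(1, 'greek'),
    ToySpeech(2, 'latin'),
    ToySpeech(3, 'latin'),
]

# lambda form: an unnamed one-line function, created inline
latin_only = advanced_filter(toy_speeches, lambda s: s.lang == 'latin')

# the same test written as an ordinary named function
def is_latin(s):
    return s.lang == 'latin'

# both forms select exactly the same speeches
assert latin_only == advanced_filter(toy_speeches, is_latin)

print([s.id for s in latin_only])  # prints [2, 3]
```

The `lambda` form is just shorthand for the named function: both take one speech and return `True` or `False`, and the filter keeps the speeches that return `True`.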