# Introduction to the Python DICES client

In this first workshop, we’re going to look at how to retrieve and work with DICES data inside a Python script. While the web interface can be helpful for browsing and exploring the data, more complicated tasks are better suited to a script. 

For example:
- when your search has several steps, and you want to make sure they're done in a specific order
- when you want to repeat an operation many times and collate the results
- when you have to connect information from different sources, like DICES, Perseus, MANTO, etc.

## The DICES API

You could, if you wanted, examine the machine-oriented version of the database manually using your web browser. A separate set of URLs provides access to the same data, but without the human-friendly tables, drop-downs and buttons. For example, compare the two pages below. Both represent the same search, for speeches by Jason.

- for humans: http://dices.ub.uni-rostock.de/app/speeches?spkr_name=Jason
- for machines: http://dices.ub.uni-rostock.de/api/speeches?spkr_name=Jason

The machine-actionable API is built with the Django web framework. If you’re interested in working with the API directly, or have questions or suggestions about its implementation, please feel free to let us know!
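To see how a search turns into one of those machine-facing URLs, here is a minimal standard-library sketch that builds the URL from an endpoint name and query parameters. The base URL and the `speeches` endpoint come from the example above; the helper names `build_api_url` and `fetch_json` are hypothetical, other endpoint names are an assumption, and `fetch_json` makes no assumption about the structure of the JSON the server returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# base address of the machine-facing API, taken from the example URL above
API_BASE = 'http://dices.ub.uni-rostock.de/api/'


def build_api_url(endpoint, **params):
    '''Return the API URL for a search (hypothetical helper, not part of the client).'''
    query = urlencode(params)  # e.g. {'spkr_name': 'Jason'} -> 'spkr_name=Jason'
    return API_BASE + endpoint + ('?' + query if query else '')


def fetch_json(url):
    '''Download a URL and decode the response body as JSON (needs network access).'''
    with urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))


print(build_api_url('speeches', spkr_name='Jason'))
# http://dices.ub.uni-rostock.de/api/speeches?spkr_name=Jason
```

The DICES client described below takes care of building these URLs and decoding the responses for you, so you should rarely need to do this by hand.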


## The DICES client

Most of the time, working with URLs like the one above and parsing the JSON responses from the server isn’t something you want to have to deal with. The Python DICES client provides a wrapper around the API that lets you make requests and manipulate the results using Python objects.

### Installation

The DICES client package lives in a [GitHub repository](https://github.com/cwf2/dices-client). **If you're running this notebook with Binder, then it's already installed.** Otherwise, you’ll have to install it. Here’s how to do that from within a Jupyter Notebook on a Mac or Linux computer:

<div style="margin:1em">
    <code>!pip install git+https://github.com/cwf2/dices-client</code>
</div>


### The database connection

The client provides a class, **DicesAPI**, which allows you to manage your connection to the database. This is how you request data; it also lets you specify a custom server, in case you’re running your own mirror of the database. The first part of this tutorial will cover searching the database and manipulating the results.

### Records as objects

The client provides some basic class definitions that represent the records in the database as objects: speeches, characters, works, etc. Each class comes with properties and methods to help make common tasks straightforward, for example, filtering a collection of speeches based on speaker or language.

We’ll look at each of the classes in turn below and cover their specific properties and methods in detail.

### Additional modules

Beyond the basic mechanisms for querying the database and working with DICES records, we’re continuing to build out a suite of ancillary tools for specialized tasks, often associated with external linked data. For example, the `manto` module provides some basic methods for linking speech data to the MANTO database, while `text` provides shortcuts for downloading the text of the speeches from Perseus and processing them with CLTK. 

For now, the selection is small, and centred on the tasks that we wanted to do ourselves. We’re very interested to hear from you about potential additions or improvements to existing modules, or suggestions for entire new modules.

## Thanks for beta testing!

All of these tools are under active development. We expect to discover bugs and inconsistencies as the user base expands, and we’re very grateful to you for helping us in this regard! Our goal is to support your research, and in the process, to make sure that future scholars are able to replicate and expand upon your work.

# Getting started


## Loading the DICES client

In every script that works with DICES, you’ll have to **import** the `DicesAPI` class so you can create a connection to the database. Here, we also import the optional `NotebookPBar`, which lets us draw progress bars in Jupyter.

In [1]:
# necessary
from dicesapi import DicesAPI

# optional
from dicesapi.jupyter import NotebookPBar

## Creating a connection to the database

Next, create an instance of `DicesAPI`. This is our connection to the database, letting us request data from the server. In this step you can also specify session settings, including a custom server URL. Here, I’m specifying a local log file for debugging messages and passing in the progress bar class.

In [2]:
api = DicesAPI(
    logfile = 'dices.log',
    progress_class = NotebookPBar,
)

## Basic searches

Now that we’ve created an instance of the DicesAPI class and assigned it to `api`, this becomes our access point to search functionality.

### Works

The `getWorks()` method returns a set of `Work` objects matching the specified criteria.

**Example**

This returns all works by authors named "Homer":

In [3]:
works = api.getWorks(author_name='Homer')

for w in works:
    print(w)

<Work 1: Iliad>
<Work 2: Odyssey>


### Authors

The `getAuthors()` method returns a set of `Author` objects.

**Example**

This returns all authors named "Virgil":

In [4]:
authors = api.getAuthors(name='Virgil')

for auth in authors:
    print(auth)

<Author 4: Virgil>


### Characters

The `getCharacters()` method returns a set of `Character` objects. According to the DICES model, a **Character** represents core attributes of a person, while **Character Instances** are used to represent that person’s manifestations in various contexts.

**Example**

This returns all characters who are labelled as divine female collectives:

In [5]:
chars = api.getCharacters(gender='female', number='collective', being='divine')

for c in chars:
    print(c)

<Character 399: Fates>
<Character 410: Furies>
<Character 489: Horae>
<Character 699: Naiads>


### Character Instances

The `getInstances()` method returns a set of `CharacterInstance` objects. Each represents the character as it appears in a particular context, with attributes specific to that context. For any given character, there should be at least one character instance per text in which the character appears.

**Example**

This returns all instances of an underlying character called "Hera". Depending on the language of the text in which she occurs, the name of the instance may change, but the name of the character is constant.

In [6]:
instances = api.getInstances(char_name='Hera')

for inst in instances:
    print(inst)

<CharacterInstance 15: Hera>
<CharacterInstance 264: Hera>
<CharacterInstance 1315: Hera>
<CharacterInstance 1581: Hera>
<CharacterInstance 1676: Hera>
<CharacterInstance 1712: Hera>
<CharacterInstance 1759: Hera>
<CharacterInstance 287: Juno>
<CharacterInstance 463: Juno>
<CharacterInstance 870: Juno>
<CharacterInstance 945: Juno>
<CharacterInstance 1162: Juno>


### Speeches

The `getSpeeches()` method returns a set of `Speech` objects. This is the interface that we use most frequently, and is therefore the most developed.

**Example**

This returns all speeches addressed to characters named "Hypsipyle":

In [7]:
speeches = api.getSpeeches(addr_name='Hypsipyle')

for s in speeches:
    print(s)

<Speech 1403: Argonautica 1.836-1.841>
<Speech 1406: Argonautica 1.900-1.909>
<Speech 3079: Thebaid 4.753-4.771>
<Speech 3084: Thebaid 5.20-5.27>
<Speech 3086: Thebaid 5.43-5.47>
<Speech 3092: Thebaid 5.271-5.284>


If the search is long, you can add a progress bar:

In [8]:
speeches = api.getSpeeches(work_title='Odyssey', progress=True)


### Speech Clusters

The `getClusters()` method returns a set of `SpeechCluster` objects.

**Example**

This returns all speech clusters in works called "Theogony":

In [9]:
clusters = api.getClusters(work_title='Theogony')

for cl in clusters:
    print(cl)

<SpeechCluster 15001: Theogony 26 ff.>
<SpeechCluster 15002: Theogony 164 ff.>
<SpeechCluster 15003: Theogony 543 ff.>
<SpeechCluster 15004: Theogony 644 ff.>


### What can I search for?

The list of arguments currently available for each search method can be found in the [Reference](Reference.ipynb) notebook in this repository. This is an area of active development, so please let us know if there's something you'd like to be able to search for and can't.

# DICES records as objects

The results of any one of these searches will be a set of zero or more records matching your criteria. In Python, the client represents these records as instances of custom classes. Each entity in the database has one class that represents individual records and a second class that represents an iterable collection of records, all of the same type. The collection classes all inherit from a generic `DataGroup` class.

| entity | single record | list of records |
| --- | --- | --- |
| author | `Author` | `AuthorGroup` |
| work | `Work` | `WorkGroup` |
| character | `Character` | `CharacterGroup` |
| character instance | `CharacterInstance` | `CharacterInstanceGroup` |
| speech | `Speech` | `SpeechGroup` |
| speech cluster | `SpeechCluster` | `SpeechClusterGroup` |

## Object properties

Each class of objects has attributes that you can access. Generally, these are of three kinds:
 - data from the underlying record, e.g., an author’s name
 - methods that can be performed on the object, e.g., generating a URN for a speech
 - access to related objects, e.g., the set of speeches associated with a given cluster
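As a rough illustration of these three kinds of attribute, here is a toy class in plain Python. It is not the client's `Speech` class: the name `ToySpeech`, the `work_id` field, the `range_string()` method and the `works` lookup table are all hypothetical stand-ins chosen for the sketch. The values come from the examples in this notebook (Work 1 is the Iliad; Speech 22 is Iliad 1.352-1.356).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToySpeech:
    '''Hypothetical stand-in for a Speech record, showing three kinds of attribute.'''
    # 1. data copied from the underlying record
    id: int
    l_fi: str
    l_la: str
    work_id: int
    # hypothetical lookup table for related records (the real client queries the API)
    works: Dict[int, str] = field(default_factory=dict, repr=False)

    # 2. a method computed from the record's own data
    def range_string(self):
        return self.l_fi + '-' + self.l_la

    # 3. access to a related object, looked up on demand
    @property
    def work(self):
        return self.works.get(self.work_id)


s = ToySpeech(id=22, l_fi='1.352', l_la='1.356', work_id=1, works={1: 'Iliad'})
print(s.work, s.range_string())   # Iliad 1.352-1.356
```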

## Objects representing records


### Author

#### Properties

- `id`: a unique identifier for the author
- `name`: the author’s name
- `wd`: a WikiData ID for the author, if we have it
- `urn`: a CITE-compliant URN for the author, if we have it

#### Examples

In [10]:
# get some author records
authors = api.getAuthors(work_title='Argonautica')

# print the name of each
for author in authors:
    print(author.name)

Apollonius
Valerius Flaccus


### Work

#### Properties

- `id`: a unique identifier for the work
- `title`: the work’s title
- `wd`: a WikiData ID for the work, if we have it
- `urn`: a CTS URN for the work, if we have it
- `author`: link to the `Author` object associated with the work
- `lang`: the work’s language—one of `'greek'` or `'latin'`

#### Examples

In [11]:
# get some work records
works = api.getWorks(lang='latin')

# print the author name and title of each
for work in works:
    print(work.author.name + ', ' + work.title)

Claudian, De bello Gildonico
Claudian, De Bello Gothico
Claudian, De consulatu Stilichonis
Claudian, De Raptu Proserpinae
Claudian, Epithalamium de Nuptiis Honorii Augusti
Claudian, In Eutriopium
Claudian, In Rufinum
Claudian, Panegyricus de consulatu Manlii Theodori
Claudian, Panegyricus de Quarto Consulatu Honorii Augusti
Claudian, Panegyricus de Sexto Consulatu Honorii Augusti
Claudian, Panegyricus de Tertio Consulatu Honorii Augusti
Claudian, Panegyricus Probino et Olybrio
Lucan, Civil War
Ovid, Metamorphoses
Prudentius, Psychomachia
Silius, Punica
Statius, Achilleid
Statius, Thebaid
Valerius Flaccus, Argonautica
Virgil, Aeneid


### Character

Character records are supposed to represent the (more or less) constant or transcendent aspects of a person, as opposed to ephemeral attributes which change with context. In practice, this is a pretty subjective determination.

#### Properties
- `id`: a unique identifier for the character
- `name`: the character’s name
- `being`: one of (`'divine'`, `'mortal'`, `'creature'`, `'other'`) [see note 1]
- `number`: one of (`'singular'`, `'collective'`)
- `gender`: one of (`'male'`, `'female'`, `'x'`, `'none'`) [see note 2]
- `wd`: a WikiData ID for the character, if we have one
- `manto`: a MANTO ID for the character, if we have one

#### Notes

1. While humans, monsters, and the Olympian gods are usually straightforward to classify, miscellaneous nymphs and offspring of minor deities can be ambiguous. If you feel that a character is misclassified, or you find an inconsistency in the scheme, please don't hesitate to let us know.

2. The gender `'x'` is used for mixed-gender collectives and characters classed as non-binary, while `'none'` is used for characters where gender is not applicable, generally inanimate objects. If gender is your specialty and you have alternative schemes that might be more useful, please let us know.

#### Examples

In [12]:
# mortal women who speak second in the Odyssey
characters = api.getCharacters(
    work_title='Odyssey', 
    being='mortal', 
    gender='female', 
    speech_part=2)

# print the name of each
for char in characters:
    print(char.name)

Euryclea
Eurynome
Helena
Melantho
Nausicaa
Penelope


### Character Instance

Character instance records represent a given character's properties in a specific context.

#### Properties

- `id`: a unique identifier for the character instance
- `context`: a description of the context in which the instance occurs, defaults to work title
- `name`: the name under which the character instance appears in this context [see note 1]
- `char`: access to the `Character` of which this is an instance [see note 2]
- `being`: one of (`'divine'`, `'mortal'`, `'creature'`, `'other'`) [see note 1]
- `number`: one of (`'singular'`, `'collective'`) [see note 1]
- `gender`: one of (`'male'`, `'female'`, `'x'`, `'none'`) [see note 1]
- `wd`: the WikiData ID of the underlying character, if there is one [see note 3]
- `manto`: the MANTO ID of the underlying character, if there is one [see note 3]

#### Notes

1. The `name`, `being`, `number`, and `gender` properties of an instance may not be the same as those of the underlying character. For example, a character instance may have the name 'Jupiter' while its character has the name 'Zeus'.

2. Some character instance records have no associated character. This is the case for a couple of classes of anonymous speakers/addressees. If there is no character, then `char` will be `None`.

3. WikiData and MANTO attributes pass through to the underlying character. These will be `None` if there is no character or if the character lacks these attributes.
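The pass-through behaviour in notes 2 and 3 can be sketched with two small dataclasses. These are hypothetical stand-ins (`ToyCharacter`, `ToyInstance`), not the client's classes; they only show the pattern of an optional link to a character plus attributes that fall back to `None`. The WikiData ID for Ares comes from the example below; the anonymous instance is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyCharacter:
    name: str
    wd: Optional[str] = None
    manto: Optional[str] = None


@dataclass
class ToyInstance:
    name: str                             # may differ from the character's name
    context: str
    char: Optional[ToyCharacter] = None   # None for some anonymous speakers/addressees

    @property
    def wd(self):
        # pass through to the underlying character, if there is one
        return self.char.wd if self.char is not None else None

    @property
    def manto(self):
        return self.char.manto if self.char is not None else None


ares = ToyCharacter(name='Ares', wd='Q40901')
mars = ToyInstance(name='Mars', context='Aeneid', char=ares)
anonymous = ToyInstance(name='anonymous', context='Iliad')   # hypothetical, no character

print(mars.name, mars.char.name, mars.wd)   # Mars Ares Q40901
print(anonymous.wd)                         # None
```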

#### Examples

In [13]:
# all instances of the god of war
instances = api.getInstances(char_name='Ares')

# print the WikiData ID for each (should be same for all),
#    plus the name and the context
for inst in instances:
    print(inst.wd, inst.name, inst.context, sep="\t")

Q40901	Ares	Dionysiaca
Q40901	Ares	Batrachomyomachia
Q40901	Ares	Odyssey
Q40901	Ares	Iliad
Q40901	Mars	Punica
Q40901	Mars	Thebaid
Q40901	Mars	In Eutriopium
Q40901	Mars	In Rufinum
Q40901	Mars	Aeneid
Q40901	Mars	Metamorphoses
Q40901	Mars	Argonautica


### Speech

#### Properties

- `id`: a unique identifier for the speech
- `cluster`: access to the `SpeechCluster` object to which this speech belongs
- `seq`: an integer that can be used for ordering all the speeches in a given work
- `l_fi`: the locus of the passage's first line, as a string
- `l_la`: the locus of the passage's last line, as a string
- `l_range`: the range of loci covered by the passage; equivalent to joining `l_fi`, `l_la` with a `'-'`
- `spkr`: a list of `CharacterInstance` objects representing the speaker(s)
- `addr`: a list of `CharacterInstance` objects representing the addressee(s)
- `part`: which turn this speech fills in the conversation, as an integer
- `type`: the speech type as a one-letter code: `'S'` (soliloquy), `'M'` (monologue), `'D'` (dialogue), or `'G'` (general)
- `work`: access to the `Work` object associated with this speech
- `author`: access to the `Author` object associated with this speech
- `lang`: one of (`'greek'`, `'latin'`)
- `urn`: the CTS URN representing the passage
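Because `l_fi` and `l_la` are strings, comparing them directly gives text order rather than line order. A minimal sketch, assuming loci look like the `book.line` (or bare `line`) strings seen in this notebook, converts a locus into a tuple of integers. The helper names are hypothetical, and loci in other formats (for example, line numbers with letter suffixes) would need extra handling.

```python
def parse_locus(locus):
    '''Turn a locus string such as '18.463' into a tuple of ints, e.g. (18, 463).'''
    return tuple(int(part) for part in locus.split('.'))


def line_count(l_fi, l_la):
    '''Number of lines in a passage that starts and ends in the same book.'''
    first, last = parse_locus(l_fi), parse_locus(l_la)
    if first[:-1] != last[:-1]:
        raise ValueError('passage spans more than one book: ' + l_fi + '-' + l_la)
    return last[-1] - first[-1] + 1


# text order vs numeric order
print('18.126' < '18.98')                             # True (character-by-character)
print(parse_locus('18.126') < parse_locus('18.98'))   # False (numeric)

# Iliad 18.463-18.467 (Speech 504 in the examples below) covers five lines
print(line_count('18.463', '18.467'))                 # 5
```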

#### Methods

- `getSpkrString()`: returns name(s) of speaker(s) as a single string (separated by commas if multiple)
- `getAddrString()`: returns name(s) of addressee(s) as a single string (separated by commas if multiple)
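Judging from the output of the example below (`Aphrodite, Ares`), these methods join the instance names with a comma and a space. A rough stand-in, written as a plain function over objects with a `name` attribute, might look like this. It is an illustration only; `names_as_string` is a hypothetical name, not the client's code.

```python
from types import SimpleNamespace


def names_as_string(instances):
    '''Join the names of a list of character instances, comma-separated.'''
    return ', '.join(inst.name for inst in instances)


# stand-ins for CharacterInstance objects
addr = [SimpleNamespace(name='Aphrodite'), SimpleNamespace(name='Ares')]
print(names_as_string(addr))   # Aphrodite, Ares
```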

#### Examples

In [14]:
# all the speeches in the Iliad where Aphrodite is addressed
speeches = api.getSpeeches(addr_name='Aphrodite', work_title='Iliad')

# print the full locus for each speech, 
#     with names of the speaker(s) and addressee(s)
for s in speeches:
    print(s.work.title, s.l_range, s.getSpkrString(), s.getAddrString(), sep='\t')

Iliad	3.399-3.412	Helena	Aphrodite
Iliad	5.348-5.351	Diomedes	Aphrodite
Iliad	5.373-5.374	Dione	Aphrodite
Iliad	5.382-5.415	Dione	Aphrodite
Iliad	5.428-5.430	Zeus	Aphrodite
Iliad	14.190-14.192	Hera	Aphrodite
Iliad	14.198-14.210	Hera	Aphrodite
Iliad	21.428-21.433	Athena	Aphrodite, Ares


### Speech Cluster

A speech cluster represents a conversation. Speech cluster objects don't *contain* speeches (for which see `SpeechGroup` below); rather, they provide data about the conversation as a higher-level object.

#### Properties

- `id`: a unique identifier for the speech cluster

#### Methods

- `getSpeeches()`: Returns all speeches in this cluster as a `SpeechGroup`
- `getFirstSpeech()`: Returns only the first speech, as a `Speech`

#### Notes

1. `getSpeeches()` and `getFirstSpeech()` each perform a new API query in the background.
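Since every call goes back to the server, code that looks at the same cluster more than once (say, a filter followed by a printing loop) repeats the download. A minimal sketch, relying only on the `id` property and the `getSpeeches()` method described above, keeps a dictionary keyed on cluster ID so that each cluster is fetched at most once per session. `speeches_for` is a hypothetical helper, and `FakeCluster` is a stand-in that counts calls instead of querying the server.

```python
class FakeCluster:
    '''Hypothetical stand-in for SpeechCluster that counts simulated queries.'''

    def __init__(self, cluster_id):
        self.id = cluster_id
        self.queries = 0

    def getSpeeches(self):
        self.queries += 1                 # a real SpeechCluster would query the API here
        return ['speech A', 'speech B']   # placeholder results


_speech_cache = {}


def speeches_for(cluster):
    '''Return cluster.getSpeeches(), fetching each cluster at most once.'''
    if cluster.id not in _speech_cache:
        _speech_cache[cluster.id] = cluster.getSpeeches()
    return _speech_cache[cluster.id]


cl = FakeCluster(4049)
for _ in range(3):
    speeches_for(cl)
print(cl.queries)   # 1
```

The cache is never refreshed, so if the database changes during a long session you would need to clear `_speech_cache` yourself.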

#### Examples


In [15]:
# all conversations in the Aeneid in which Ascanius gets to speak
clusters = api.getClusters(spkr_name='Ascanius', work_title='Aeneid')

# list all speeches in each conversation,
#     giving lines and speakers of each
for cl in clusters:
    print(cl)
    for s in cl.getSpeeches():
        print(s.l_range, s.getSpkrString(), sep="\t")
    print()

<SpeechCluster 4049: Aeneid 5.670 ff.>
5.670-5.673	Ascanius

<SpeechCluster 4072: Aeneid 7.116 ff.>
7.116-7.116	Ascanius
7.120-7.134	Aeneas

<SpeechCluster 4102: Aeneid 9.234 ff.>
9.234-9.245	Nisus (Aeneid)
9.247-9.250	Aletes (Trojan)
9.252-9.256	Aletes (Trojan)
9.257-9.280	Ascanius
9.281-9.292	Euryalus (Aeneid)
9.296-9.302	Ascanius

<SpeechCluster 4112: Aeneid 9.598 ff.>
9.598-9.620	Numanus
9.625-9.629	Ascanius
9.634-9.635	Ascanius



## Objects representing collections of records

All of these inherit from the parent class `DataGroup`. They're mostly intended to be iterated over, but each has specific `filter*` methods to extract a subset of the member objects based on their properties, and a set of `get*` methods to extract specific properties from the member objects.

### Examples

#### Iterating over a DataGroup:

In [16]:
# download two sets of speeches to Thetis
group_a = api.getSpeeches(addr_name='Thetis', work_title='Iliad')
group_b = api.getSpeeches(addr_name='Thetis', work_title='Achilleid')

print(len(group_a), 'speeches in the Iliad:')
for s in group_a:
    print(s)
print()
    
print(len(group_b), 'speeches in the Achilleid')
for s in group_b:
    print(s)
print()

12 speeches in the Iliad:
<Speech 22: Iliad 1.352-1.356>
<Speech 24: Iliad 1.365-1.412>
<Speech 30: Iliad 1.518-1.527>
<Speech 484: Iliad 18.79-18.93>
<Speech 486: Iliad 18.98-18.126>
<Speech 499: Iliad 18.385-18.387>
<Speech 502: Iliad 18.424-18.427>
<Speech 504: Iliad 18.463-18.467>
<Speech 506: Iliad 19.21-19.27>
<Speech 655: Iliad 24.88-24.88>
<Speech 657: Iliad 24.104-24.119>
<Speech 659: Iliad 24.139-24.140>

5 speeches in the Achilleid
<Speech 3279: Achilleid 1.31-1.51>
<Speech 3281: Achilleid 1.80-1.94>
<Speech 3283: Achilleid 1.143-1.158>
<Speech 3291: Achilleid 1.526-1.535>
<Speech 3308: Achilleid 2.17-2.19>



#### Adding

You can use the `+` operator to combine two groups into a new one.
- Adding two groups together may alter the order of the members, so it's often a good idea to sort the results.
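The note about order is easy to reproduce with plain Python sets. The client's internals aren't shown here; this sketch simply assumes, for illustration, that a combined group is built by merging members through a `set`, which keeps every member once but has no guaranteed iteration order. Speech IDs from the Thetis example above stand in for `Speech` objects.

```python
# speech IDs from the Thetis example above, standing in for the two SpeechGroups
group_a = [22, 24, 30, 484, 486, 499, 502, 504, 506, 655, 657, 659]   # Iliad
group_b = [3279, 3281, 3283, 3291, 3308]                              # Achilleid

# merging through a set drops duplicates, but a set has no guaranteed order
combined = set(group_a) | set(group_b)
print(len(combined))   # 17

# sorting restores a predictable order
print(sorted(combined) == group_a + group_b)   # True
```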

In [17]:
# combine the two speech groups
combined = group_a + group_b
print(len(combined), 'speeches combined:')
for s in combined:
    print(s)

17 speeches combined:
<Speech 504: Iliad 18.463-18.467>
<Speech 502: Iliad 18.424-18.427>
<Speech 657: Iliad 24.104-24.119>
<Speech 506: Iliad 19.21-19.27>
<Speech 484: Iliad 18.79-18.93>
<Speech 3281: Achilleid 1.80-1.94>
<Speech 3291: Achilleid 1.526-1.535>
<Speech 3308: Achilleid 2.17-2.19>
<Speech 30: Iliad 1.518-1.527>
<Speech 3279: Achilleid 1.31-1.51>
<Speech 3283: Achilleid 1.143-1.158>
<Speech 22: Iliad 1.352-1.356>
<Speech 486: Iliad 18.98-18.126>
<Speech 24: Iliad 1.365-1.412>
<Speech 655: Iliad 24.88-24.88>
<Speech 499: Iliad 18.385-18.387>
<Speech 659: Iliad 24.139-24.140>


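#### Subtracting

The `-` operator works the other way round: it returns a new group containing the members of the first group that are not in the second.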

In [18]:
# download all speeches addressed to Thetis in any work
group_c = api.getSpeeches(addr_name='Thetis')
print(len(group_c), 'speeches in the whole corpus', '\n')

# remove Iliad examples from the larger group
all_but_homer = group_c - group_a
print(len(all_but_homer), 'excluding the Iliad:')

# examine the results
for s in all_but_homer:
    print(s)

23 speeches in the whole corpus 

11 excluding the Iliad:
<Speech 1505: Argonautica 4.783-4.832>
<Speech 3461: Dionysiaca 23.284-23.319>
<Speech 3601: Dionysiaca 43.145-43.191>
<Speech 2243: Metamorphoses 11.221-11.223>
<Speech 4094: Posthomerica 3.633-3.654>
<Speech 4095: Posthomerica 3.770-3.780>
<Speech 3279: Achilleid 1.31-1.51>
<Speech 3281: Achilleid 1.80-1.94>
<Speech 3283: Achilleid 1.143-1.158>
<Speech 3291: Achilleid 1.526-1.535>
<Speech 3308: Achilleid 2.17-2.19>


#### Finding common members between two groups

In [19]:
# speeches with Achilles as speaker
group_a = api.getSpeeches(spkr_name='Achilles', work_title='Iliad')

# speeches with Thetis as addressee
group_b = api.getSpeeches(addr_name='Thetis', work_title='Iliad')

# speeches that belong to both groups
for s in group_a.intersect(group_b):
    print(s)

<Speech 22: Iliad 1.352-1.356>
<Speech 24: Iliad 1.365-1.412>
<Speech 484: Iliad 18.79-18.93>
<Speech 486: Iliad 18.98-18.126>
<Speech 506: Iliad 19.21-19.27>
<Speech 659: Iliad 24.139-24.140>
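If you ever need the same operations on plain lists, for example lists of IDs saved from two earlier searches, an order-preserving version is short. This is a sketch in plain Python, not the client's implementation: it keeps members of the first list in their original order and builds a set from the second list for fast membership tests. The ID values are taken from the Thetis examples above.

```python
def difference(first, second):
    '''Members of `first` not in `second`, keeping the order of `first`.'''
    exclude = set(second)
    return [item for item in first if item not in exclude]


def intersection(first, second):
    '''Members of `first` that are also in `second`, keeping the order of `first`.'''
    keep = set(second)
    return [item for item in first if item in keep]


# IDs of speeches addressed to Thetis, from the examples above
iliad = [22, 24, 30, 484, 486, 499, 502, 504, 506, 655, 657, 659]
elsewhere = [1505, 3461, 3601, 2243, 4094, 4095, 3279, 3281, 3283, 3291, 3308]
everything = elsewhere + iliad   # 23 speeches, like group_c

print(len(difference(everything, iliad)))    # 11
print(intersection(iliad, [659, 22, 3308]))  # [22, 659]
```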


#### Sorting

You can use `sorted(DataGroup)` to produce a sorted *copy* of the group without changing the group itself; `DataGroup.sort()` on the other hand reorganizes the members of the group internally.
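This is the usual Python distinction between the built-in `sorted()`, which builds and returns a new list, and an in-place `sort()` method. Here is a toy container that mimics the pattern; `ToyGroup` is hypothetical and simply wraps a list.

```python
class ToyGroup:
    '''Hypothetical container that, like a DataGroup, is iterable and sortable in place.'''

    def __init__(self, members):
        self._members = list(members)

    def __iter__(self):
        return iter(self._members)

    def __len__(self):
        return len(self._members)

    def sort(self, key=None, reverse=False):
        # reorder the group's own members
        self._members.sort(key=key, reverse=reverse)


group = ToyGroup([504, 22, 657])
sorted_copy = sorted(group)      # new list; the group is unchanged
print(sorted_copy, list(group))  # [22, 504, 657] [504, 22, 657]

group.sort()                     # the group itself is now reordered
print(list(group))               # [22, 504, 657]
```

Note that in this toy, `sorted()` returns a plain list rather than a group; if the client behaves the same way, the result would not have the group's `filter*` and `get*` methods.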

In [20]:
# same speeches we used in the addition example
group_a = api.getSpeeches(addr_name='Thetis', work_title='Iliad')
group_b = api.getSpeeches(addr_name='Thetis', work_title='Achilleid')

# adding them jumbles the membership a bit
combined = group_a + group_b

print('Unsorted:')
for s in combined:
    print(s)
    
# sort to put them back in order by work, line
combined.sort()

print('\nSorted:')
for s in combined:
    print(s)

Unsorted:
<Speech 504: Iliad 18.463-18.467>
<Speech 502: Iliad 18.424-18.427>
<Speech 657: Iliad 24.104-24.119>
<Speech 506: Iliad 19.21-19.27>
<Speech 484: Iliad 18.79-18.93>
<Speech 3281: Achilleid 1.80-1.94>
<Speech 3291: Achilleid 1.526-1.535>
<Speech 3308: Achilleid 2.17-2.19>
<Speech 30: Iliad 1.518-1.527>
<Speech 3279: Achilleid 1.31-1.51>
<Speech 3283: Achilleid 1.143-1.158>
<Speech 22: Iliad 1.352-1.356>
<Speech 486: Iliad 18.98-18.126>
<Speech 24: Iliad 1.365-1.412>
<Speech 655: Iliad 24.88-24.88>
<Speech 499: Iliad 18.385-18.387>
<Speech 659: Iliad 24.139-24.140>

Sorted:
<Speech 22: Iliad 1.352-1.356>
<Speech 24: Iliad 1.365-1.412>
<Speech 30: Iliad 1.518-1.527>
<Speech 484: Iliad 18.79-18.93>
<Speech 486: Iliad 18.98-18.126>
<Speech 499: Iliad 18.385-18.387>
<Speech 502: Iliad 18.424-18.427>
<Speech 504: Iliad 18.463-18.467>
<Speech 506: Iliad 19.21-19.27>
<Speech 655: Iliad 24.88-24.88>
<Speech 657: Iliad 24.104-24.119>
<Speech 659: Iliad 24.139-24.140>
<Speech 3279: Achi

Each kind of record has a default sort order: authors and works sort alphabetically, speeches by work and first line, etc. You can also pass a custom sort key to `sorted()`.

For example, we can sort by speaker name:

In [21]:
for s in sorted(combined, key=lambda s: s.spkr[0].name):
    print(f'{s.getSpkrString():12} {s}')

Achilles     <Speech 22: Iliad 1.352-1.356>
Achilles     <Speech 24: Iliad 1.365-1.412>
Achilles     <Speech 484: Iliad 18.79-18.93>
Achilles     <Speech 486: Iliad 18.98-18.126>
Achilles     <Speech 506: Iliad 19.21-19.27>
Achilles     <Speech 659: Iliad 24.139-24.140>
Achilles     <Speech 3308: Achilleid 2.17-2.19>
Calchas      <Speech 3291: Achilleid 1.526-1.535>
Charis       <Speech 499: Iliad 18.385-18.387>
Chiron       <Speech 3283: Achilleid 1.143-1.158>
Hephaestus   <Speech 502: Iliad 18.424-18.427>
Hephaestus   <Speech 504: Iliad 18.463-18.467>
Iris         <Speech 655: Iliad 24.88-24.88>
Neptune      <Speech 3281: Achilleid 1.80-1.94>
Thetis       <Speech 3279: Achilleid 1.31-1.51>
Zeus         <Speech 30: Iliad 1.518-1.527>
Zeus         <Speech 657: Iliad 24.104-24.119>
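In Python, a class gets a default sort order by defining comparison methods. Here is a sketch of how a record class could sort speeches by work and then numerically by first line, using `functools.total_ordering`. The `ToySpeech` class is hypothetical, and keying on a numeric work ID is an assumption: in the sorted output earlier, Iliad speeches come before Achilleid ones, which suggests the default is not alphabetical by title. The loci and speakers are from the Iliad examples above, and work ID 1 for the Iliad comes from the `getWorks()` example.

```python
from functools import total_ordering


@total_ordering
class ToySpeech:
    '''Hypothetical speech record whose default order is work, then first line.'''

    def __init__(self, work_id, l_fi, spkr_name):
        self.work_id = work_id
        self.l_fi = l_fi
        self.spkr_name = spkr_name

    def _key(self):
        # compare loci numerically, so '18.98' comes before '18.385'
        return (self.work_id, tuple(int(part) for part in self.l_fi.split('.')))

    def __eq__(self, other):
        if not isinstance(other, ToySpeech):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, ToySpeech):
            return NotImplemented
        return self._key() < other._key()

    def __repr__(self):
        return f'<ToySpeech {self.work_id}:{self.l_fi}>'


speeches = [
    ToySpeech(1, '18.424', 'Hephaestus'),
    ToySpeech(1, '18.98', 'Achilles'),
    ToySpeech(1, '1.518', 'Zeus'),
    ToySpeech(1, '18.385', 'Charis'),
]

# default order, from __lt__
print(sorted(speeches))
# [<ToySpeech 1:1.518>, <ToySpeech 1:18.98>, <ToySpeech 1:18.385>, <ToySpeech 1:18.424>]

# custom order, from a key function
print([s.spkr_name for s in sorted(speeches, key=lambda s: s.spkr_name)])
# ['Achilles', 'Charis', 'Hephaestus', 'Zeus']
```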


### Extracting member properties

One common task with DataGroups is extracting a single attribute for each member of the group.
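Conceptually, each of these `get*` methods is a list comprehension over the group's members. Here is a generic stand-in using `operator.attrgetter`, which also accepts dotted paths into related objects. The helper name `extract` is hypothetical, and the `SimpleNamespace` objects stand in for `Work` records.

```python
from operator import attrgetter
from types import SimpleNamespace


def extract(group, attr):
    '''Return the value of `attr` for every member, in group order.'''
    get = attrgetter(attr)
    return [get(member) for member in group]


homer = SimpleNamespace(name='Homer')
works = [SimpleNamespace(title='Iliad', author=homer),
         SimpleNamespace(title='Odyssey', author=homer)]

print(extract(works, 'title'))         # ['Iliad', 'Odyssey']
print(extract(works, 'author.name'))   # ['Homer', 'Homer']
```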

#### Examples

In [22]:
# a WorkGroup
works = api.getWorks()

# extract the titles as a list
works.getTitles()

['Homeric Hymns',
 'Orphic Argonautica',
 'Argonautica',
 'Hymns',
 'De bello Gildonico',
 'De Bello Gothico',
 'De consulatu Stilichonis',
 'De Raptu Proserpinae',
 'Epithalamium de Nuptiis Honorii Augusti',
 'In Eutriopium',
 'In Rufinum',
 'Panegyricus de consulatu Manlii Theodori',
 'Panegyricus de Quarto Consulatu Honorii Augusti',
 'Panegyricus de Sexto Consulatu Honorii Augusti',
 'Panegyricus de Tertio Consulatu Honorii Augusti',
 'Panegyricus Probino et Olybrio',
 'Colluthus',
 'Homerocentones',
 'St. Cyprian',
 'Theogony',
 'Works and Days',
 'Iliad',
 'Odyssey',
 'Civil War',
 'Europa',
 'Hero and Leander',
 'Dionysiaca',
 'Paraphrase',
 'Cynegetica',
 'Halieutica',
 'Metamorphoses',
 'D. Sanctae Sophiae',
 'Psychomachia',
 'Batrachomyomachia',
 'Megara',
 'Posthomerica',
 'Punica',
 'Achilleid',
 'Thebaid',
 'Idylls',
 'Sack of Troy',
 'Argonautica',
 'Aeneid']

In [23]:
# a SpeechGroup
speeches = api.getSpeeches(spkr_name='Galatea')

# extract `l_fi` tags as a list
speeches.getL_fis()

['6.319', '13.740', '13.750']

#### When attributes are other objects

Sometimes the attribute you're extracting from the members of the group refers to other objects. In this case, you have the option of turning the results into a new DataGroup.

This returns a list with as many items as there are speeches in the group:

In [24]:
speeches.getAddrs()

[[<CharacterInstance 1347: Pan>],
 [<CharacterInstance 746: Scylla (monster)>],
 [<CharacterInstance 747: Polyphemus>,
  <CharacterInstance 746: Scylla (monster)>,
  <CharacterInstance 541: Venus>]]

With `flatten=True`, this returns a single CharacterInstanceGroup, with duplicate members removed.

In [25]:
for inst in speeches.getAddrs(flatten=True):
    print(inst.name)

Pan
Venus
Polyphemus
Scylla (monster)
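The same flattening can be done in plain Python, assuming members can be told apart by their `id`. This is a sketch, not the client's code. It keeps first-seen order, whereas the client's output above comes back in a different order, so sort the result if order matters. `flatten_unique` is a hypothetical helper; the IDs and names are those returned by `getAddrs()` above.

```python
from types import SimpleNamespace


def flatten_unique(nested):
    '''Flatten a list of lists, dropping repeats (by `id`) and keeping first-seen order.'''
    seen = set()
    flat = []
    for sublist in nested:
        for item in sublist:
            if item.id not in seen:
                seen.add(item.id)
                flat.append(item)
    return flat


pan = SimpleNamespace(id=1347, name='Pan')
scylla = SimpleNamespace(id=746, name='Scylla (monster)')
polyphemus = SimpleNamespace(id=747, name='Polyphemus')
venus = SimpleNamespace(id=541, name='Venus')

# the addressee lists for Galatea's three speeches
addrs = [[pan], [scylla], [polyphemus, scylla, venus]]

print([inst.name for inst in flatten_unique(addrs)])
# ['Pan', 'Scylla (monster)', 'Polyphemus', 'Venus']
```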


### Filtering DataGroups

The other task you might do with a DataGroup is subset the member objects according to their attributes. Each kind of DataGroup has dedicated methods to filter by specific attributes. Each of these methods takes as its argument a list of values to match against. If you only want to match a single value, pass it as a list of one item.
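The "list of one item" rule is worth taking seriously, because Python's `in` operator treats a bare string as a sequence of characters: `'Hera' in 'Heracles'` is `True`. If a filter tests membership with `in`, a bare string can therefore match more than intended. Here is a sketch of a generic filter that guards against this by wrapping a bare string in a list; `filter_by` is a hypothetical helper and the member objects are stand-ins, not client classes.

```python
from types import SimpleNamespace


def filter_by(group, attr, values):
    '''Keep members whose `attr` equals one of `values` (a list, or a single string).'''
    if isinstance(values, str):
        values = [values]   # otherwise `in` would do a substring test on the string
    values = list(values)
    return [member for member in group if getattr(member, attr) in values]


chars = [SimpleNamespace(name='Hera'), SimpleNamespace(name='Heracles')]

print('Hera' in 'Heracles')                                              # True
print([c.name for c in filter_by(chars, 'name', 'Heracles')])            # ['Heracles']
print([c.name for c in filter_by(chars, 'name', ['Hera', 'Heracles'])])  # ['Hera', 'Heracles']
```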

#### Examples

In [26]:
# a SpeechGroup
speeches = api.getSpeeches(spkr_name='Achilles')
print(len(speeches), 'speeches by Achilles')

# filter by Type
subgroup = speeches.filterTypes(['S'])
print(len(subgroup), 'are soliloquies')

111 speeches by Achilles
13 are soliloquies


In [27]:
# women who speak in the Iliad
iliad_women = api.getCharacters(work_title='Iliad', gender='female')

# filter Achilles' speeches by addressee, matching against the list of women
subgroup = speeches.filterAddrs(iliad_women)

# examine results
for s in subgroup:
    print(s.work.title, s.l_range, s.getSpkrString(), s.getAddrString(), sep='\t')

Iliad	1.202-1.205	Achilles	Athena
Iliad	1.216-1.218	Achilles	Athena
Iliad	1.352-1.356	Achilles	Thetis
Iliad	1.365-1.412	Achilles	Thetis
Iliad	18.79-18.93	Achilles	Thetis
Iliad	18.98-18.126	Achilles	Thetis
Iliad	18.182-18.182	Achilles	Iris
Iliad	18.188-18.195	Achilles	Iris
Iliad	19.21-19.27	Achilles	Thetis
Iliad	24.139-24.140	Achilles	Thetis
Achilleid	2.17-2.19	Achilles	Thetis


### Custom filters

For more complex selection criteria, you can create a custom filter. The filter should be a user-defined function which takes an object of the class you're interested in as its first argument. Pass this function to the `advancedFilter()` method, and it will return a new DataGroup containing only members for which your function returns `True`.
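Conceptually, a method like `advancedFilter()` keeps the members for which the predicate returns a true value. Here is a stand-in over a plain list; the name `advanced_filter`, the predicate `spoken_by_woman` and the toy speech IDs are hypothetical. In this sketch, a predicate that reaches the end without returning gives `None`, which counts as false, so an explicit `return False` is clearer but not required.

```python
from types import SimpleNamespace


def advanced_filter(group, predicate):
    '''Return the members for which predicate(member) is truthy.'''
    return [member for member in group if predicate(member)]


def spoken_by_woman(speech):
    # True if any speaker is female; otherwise falls through and returns None
    if any(inst.gender == 'female' for inst in speech.spkr):
        return True


hera = SimpleNamespace(name='Hera', gender='female')
zeus = SimpleNamespace(name='Zeus', gender='male')
speeches = [SimpleNamespace(id=1, spkr=[hera]),   # toy speech records
            SimpleNamespace(id=2, spkr=[zeus])]

print([s.id for s in advanced_filter(speeches, spoken_by_woman)])   # [1]
```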

#### Examples

Let's test for conversations in which the first turn is made by a woman and the second by a man.

In [28]:
def isFemaleRepliedByMale(cluster):
    '''Returns True if the first turn is by a woman and the second by a man.'''
    
    # collect speeches for this cluster
    speeches = cluster.getSpeeches()
    
    # make sure there is more than one turn
    if len(speeches) > 1:
        
        gender_first_turn = [inst.gender for inst in speeches[0].spkr]
        gender_second_turn = [inst.gender for inst in speeches[1].spkr]
        
        if 'female' in gender_first_turn and 'male' in gender_second_turn:
            return True
    
    # otherwise, the cluster doesn't meet the criteria
    return False

Now we can filter on our new function.

In [29]:
# a SpeechClusterGroup
clusters = api.getClusters(work_title='Metamorphoses')

In [30]:
# filter
#   - this goes a bit slowly, because it has to download all the speeches
subset = clusters.advancedFilter(isFemaleRepliedByMale)

print(len(subset), '/', len(clusters), 'clusters meet criteria:')

11 / 411 clusters meet criteria:


In [31]:
for cluster in subset:
    for s in cluster.getSpeeches():
        print(s.work.title, s.l_range, s.getSpkrString(), '->', s.getAddrString())
    print()

Metamorphoses 2.815-2.816 Aglaurus -> Mercury
Metamorphoses 2.817-2.817 Mercury -> Aglaurus

Metamorphoses 4.73-4.77 Pyramus, Thisbe -> Pyramus, Thisbe
Metamorphoses 4.79-4.79 Pyramus, Thisbe -> Pyramus, Thisbe

Metamorphoses 4.320-4.328 Salmacis -> Hermaphroditus
Metamorphoses 4.336-4.336 Hermaphroditus -> Salmacis
Metamorphoses 4.337-4.338 Salmacis -> Hermaphroditus

Metamorphoses 5.514-5.522 Ceres -> Jupiter
Metamorphoses 5.523-5.532 Jupiter -> Ceres

Metamorphoses 6.206-6.213 Latona -> Apollo, Diana
Metamorphoses 6.215-6.215 Apollo -> Latona

Metamorphoses 7.755-7.755 Diana -> Procris
Metamorphoses 7.794-7.794 Phocus -> Cephalus
Metamorphoses 7.796-7.862 Cephalus -> Phocus

Metamorphoses 8.90-8.94 Scylla (daughter of Nisus) -> Minos
Metamorphoses 8.97-8.100 Minos -> Scylla (daughter of Nisus)

Metamorphoses 10.364-10.364 Myrrha -> Cinyras
Metamorphoses 10.365-10.366 Cinyras -> Myrrha

Metamorphoses 11.421-11.443 Alcyone -> Ceyx
Metamorphoses 11.451-11.453 Ceyx -> Alcyone
Metamorpho