# Introduction to the Python DICES client

In this first workshop, we’re going to look at how to retrieve and work with DICES data inside a Python script. While the web interface can be helpful for browsing and exploring the data, more complicated tasks are better suited to a script. 

For example:
- when your search has several steps, and you want to make sure they're done in a specific order
- when you want to repeat an operation many times and collate the results
- when you have to connect information from different sources, like DICES, Perseus, MANTO, etc.

## The DICES API

You could, if you wanted, explore the machine-oriented version of the database manually in your web browser. A separate set of URLs provides access to the same data, but without the human-friendly tables, drop-downs, and buttons. For example, compare the two pages below; both represent the same search, for speeches by Jason.

- for humans: http://csa20211203-005.uni-rostock.de/app/speeches?spkr_name=Jason
- for machines: http://csa20211203-005.uni-rostock.de/api/speeches?spkr_name=Jason

The machine-actionable API is built with the Django web framework. If you’re interested in working with the API directly, or have questions or suggestions about its implementation, please feel free to let us know!
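To show what the client saves you from, here is a minimal sketch of querying the API by hand with the Python standard library. It assumes the `/api/<endpoint>?<params>` URL pattern shown above and that the server returns JSON to a non-browser client. `build_query_url` and `fetch_json` are hypothetical helpers, not part of the DICES client, and the sketch makes no assumptions about the structure of the response (for example, pagination).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the machine-oriented API, taken from the example URLs above
API_BASE = "http://csa20211203-005.uni-rostock.de/api"

def build_query_url(endpoint, **params):
    """Hypothetical helper: build the API URL for an endpoint and search parameters."""
    query = urlencode(params)
    url = f"{API_BASE}/{endpoint}"
    return f"{url}?{query}" if query else url

def fetch_json(endpoint, **params):
    """Hypothetical helper: download one response and parse it as JSON.

    Requires network access. Returns whatever the server sends, unmodified;
    no pagination or error handling is attempted.
    """
    with urlopen(build_query_url(endpoint, **params)) as response:
        return json.load(response)

print(build_query_url("speeches", spkr_name="Jason"))
```

Only `build_query_url` runs offline; calling `fetch_json("speeches", spkr_name="Jason")` needs a connection to the server.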


## The DICES client

Most of the time, working with URLs like the one above and parsing the JSON responses from the server isn’t something you want to have to deal with. The [Python DICES client](https://github.com/cwf2/dices-client) provides a wrapper around the API that lets you make requests and manipulate the results using Python objects.

### The database connection

The client provides a class, **DicesAPI**, which allows you to manage your connection to the database. This is how you request data; it also lets you specify a custom server, in case you’re running your own mirror of the database. The first part of this tutorial will cover searching the database and manipulating the results.

### Records as objects

The client provides some basic class definitions that represent the records in the database as objects: speeches, characters, works, etc. Each class comes with properties and methods to help make common tasks straightforward, for example, filtering a collection of speeches based on speaker or language.

We’ll look at each of the classes in turn below and cover their specific properties and methods in detail.

### Additional modules

Beyond the basic mechanisms for querying the database and working with DICES records, we’re continuing to build out a suite of ancillary tools for specialized tasks, often associated with external linked data. For example, the `manto` module provides some basic methods for linking speech data to the MANTO database, while `text` provides shortcuts for downloading the text of the speeches from Perseus and processing them with CLTK. 

For now, the selection is small, and centred on the tasks that we wanted to do ourselves. We’re very interested to hear from you about potential additions or improvements to existing modules, or suggestions for entire new modules.

## Thanks for beta testing!

All of these tools are under active development. We expect to discover bugs and inconsistencies as the user base expands, and we’re very grateful to you for helping us in this regard! Our goal is to support your research, and in the process, to make sure that future scholars are able to replicate and expand upon your work.

# Getting started

<div class="alert alert-info" style="margin:2em 1em">
<h3>Installation</h3>

<p>The DICES client package lives in a <a href="https://github.com/cwf2/dices-client">GitHub repository</a>. The first time you use it on a new machine, you’ll have to install it. Here’s how to do that from within a Jupyter Notebook (on a Mac or Linux computer):</p>

<div style="margin:1em">
    <code>!pip install git+https://github.com/cwf2/dices-client</code>
</div>

</div>

## Loading the DICES client

In every script that works with DICES, you’ll have to **import** the `DicesAPI` class so you can create a connection to the database. Here, we also import the optional `NotebookPBar`, which lets us draw progress bars in Jupyter.

```python
# necessary
from dicesapi import DicesAPI

# optional
from dicesapi.jupyter import NotebookPBar
```

### Instantiating a connection to the database

Next, create an instance of `DicesAPI`. This is our connection to the database, letting us request data from the server. In this step you can also specify session settings including a custom server URL. I’m specifying a local log file for debugging messages and providing a link to the progress bar class.

```python
api = DicesAPI(
    logfile='dices.log',
    progress_class=NotebookPBar,
)
```

## Basic searches

Now that we’ve created an instance of the DicesAPI class and assigned it to `api`, this becomes our access point to search functionality.

### Works

The `getWorks()` method returns a set of `Work` objects matching the specified criteria.

**Example**

This returns all works by authors named "Homer":

```python
works = api.getWorks(author_name='Homer')

for w in works:
    print(w)
```

```
<Work 24: Homeric Hymns>
<Work 1: Iliad>
<Work 2: Odyssey>
```

### Authors

The `getAuthors()` method returns a set of `Author` objects.

**Example**

This returns all authors named "Virgil":

```python
authors = api.getAuthors(name='Virgil')

for auth in authors:
    print(auth)
```

```
<Author 4: Virgil>
```

### Characters

The `getCharacters()` method returns a set of `Character` objects. According to the DICES model, a **Character** represents core attributes of a person, while **Character Instances** are used to represent that person’s manifestations in various contexts.

**Example**

This returns all characters who are labelled as divine female collectives:

```python
chars = api.getCharacters(gender='female', number='collective', being='divine')

for c in chars:
    print(c)
```

```
<Character 387: Fates>
<Character 396: Furies>
<Character 468: Horae>
<Character 663: Naiads>
```

### Character Instances

The `getInstances()` method returns a set of `CharacterInstance` objects. Each represents one appearance of a character in a particular context, with its own attributes. For any given character, there should be at least one character instance per text in which it appears.

**Example**

This returns all instances of an underlying character called "Hera". Depending on the language of the text in which she occurs, the name of the instance may change, but the name of the character is constant.

```python
instances = api.getInstances(char_name='Hera')

for inst in instances:
    print(inst)
```

```
<CharacterInstance 15: Hera>
<CharacterInstance 264: Hera>
<CharacterInstance 1278: Hera>
<CharacterInstance 1489: Hera>
<CharacterInstance 1581: Hera>
<CharacterInstance 1609: Hera>
<CharacterInstance 1655: Hera>
<CharacterInstance 287: Juno>
<CharacterInstance 463: Juno>
<CharacterInstance 866: Juno>
<CharacterInstance 939: Juno>
<CharacterInstance 1153: Juno>
```

### Speeches

The `getSpeeches()` method returns a set of `Speech` objects. This is the interface that we use most frequently, and it is therefore the most developed.

**Example**

This returns all speeches addressed to characters named "Hypsipyle":

```python
speeches = api.getSpeeches(addr_name='Hypsipyle')

for s in speeches:
    print(s)
```

```
<Speech 1403: Argonautica 1.836-1.841>
<Speech 1406: Argonautica 1.900-1.909>
<Speech 3079: Thebaid 4.753-4.771>
<Speech 3084: Thebaid 5.20-5.27>
<Speech 3086: Thebaid 5.43-5.47>
<Speech 3092: Thebaid 5.271-5.284>
```

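For searches that return many records, you can display a progress bar by passing `progress=True`; this uses the progress bar class supplied to `DicesAPI` above: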

```python
speeches = api.getSpeeches(work_title='Odyssey', progress=True)
```

### Speech Clusters

The `getClusters()` method returns a set of `SpeechCluster` objects.

**Example**

This returns all speech clusters in works called "Theogony":

```python
clusters = api.getClusters(work_title='Theogony')

for cl in clusters:
    print(cl)
```

```
<SpeechCluster 15001: Theogony 26 ff.>
<SpeechCluster 15002: Theogony 164 ff.>
<SpeechCluster 15003: Theogony 543 ff.>
<SpeechCluster 15004: Theogony 644 ff.>
```

### What can I search for?

The search parameters currently available are listed in the [Reference](Reference.ipynb#getAuthors()) notebook.

# DICES records as objects

The result of any of these searches is a collection of zero or more records matching your criteria. The client represents these records as instances of custom Python classes. Each entity in the database has one class representing individual records and a second class representing an iterable collection of records of that type. The collection classes all inherit from a generic `DataGroup` class.

| entity | single record | list of records |
| --- | --- | --- |
| author | `Author` | `AuthorGroup` |
| work | `Work` | `WorkGroup` |
| character | `Character` | `CharacterGroup` |
| character instance | `CharacterInstance` | `CharacterInstanceGroup` |
| speech | `Speech` | `SpeechGroup` |
| speech cluster | `SpeechCluster` | `SpeechClusterGroup` |

## Object properties

Each class of objects has attributes that you can access. Generally, these are of three kinds:
 - data from the underlying record, e.g., an author’s name
 - methods that can be called on the object, e.g., generating a URN for a speech
 - access to related objects, e.g., the set of speeches associated with a given cluster

## Objects representing records


### Author

#### Properties

- `id`: a unique identifier for the author
- `name`: the author’s name
- `wd`: a WikiData ID for the author, if we have it
- `urn`: a CITE-compliant URN for the author, if we have it

#### Examples

```python
# get some author records
authors = api.getAuthors(work_title='Argonautica')

# print the name of each
for author in authors:
    print(author.name)
```

### Work

#### Properties

- `id`: a unique identifier for the work
- `title`: the work’s title
- `wd`: a WikiData ID for the work, if we have it
- `urn`: a CTS URN for the work, if we have it
- `author`: link to the `Author` object associated with the work
- `lang`: the work’s language—one of `'greek'` or `'latin'`

#### Examples

```python
# get some work records
works = api.getWorks(lang='latin')

# print the author name and title of each
for work in works:
    print(work.author.name + ', ' + work.title)
```

### Character

#### Properties
- `id`: a unique identifier for the character
- `name`: the character’s name
- `being`: one of (`'divine'`, `'mortal'`, `'creature'`, `'other'`) [see note 1]
- `number`: one of (`'singular'`, `'collective'`)
- `gender`: one of (`'male'`, `'female'`, `'x'`, `'none'`) [see note 2]
- `wd`: a WikiData ID for the character, if we have one
- `manto`: a MANTO ID for the character, if we have one

#### Notes

1. While humans, monsters, and the Olympian gods are usually straightforward to classify, miscellaneous nymphs and offspring of minor deities can be ambiguous. If you feel that a character is misclassified, or you find an inconsistency in the scheme, please don't hesitate to let us know.

2. The gender `'x'` is used for mixed-gender collectives and characters classed as non-binary, while `'none'` is used for characters where gender is not applicable, generally inanimate objects. If gender is your specialty and you have alternative schemes that might be more useful, please let us know.

#### Examples

```python
# mortal women who speak second in a conversation in the Odyssey
characters = api.getCharacters(work_title='Odyssey', being='mortal', gender='female', speech_part=2)

# print the name of each
for char in characters:
    print(char.name)
```

### Character Instance

#### Properties

- `id`: a unique identifier for the character instance
- `context`: a description of the context in which the instance occurs, defaults to work title
- `name`: the name under which the character instance appears in this context [see note 1]
- `char`: access to the `Character` of which this is an instance. [2]
- `being`: one of (`'divine'`, `'mortal'`, `'creature'`, `'other'`) [1]
- `number`: one of (`'singular'`, `'collective'`) [1]
- `gender`: one of (`'male'`, `'female'`, `'x'`, `'none'`) [1]
- `wd`: the WikiData ID of the underlying character, if there is one [3]
- `manto`: the MANTO ID of the underlying character, if there is one [3]

#### Notes

1. The `name`, `being`, `number`, and `gender` properties of an instance may not be the same as those of the underlying character. For example, a character instance may have the name 'Jupiter' while its character has the name 'Zeus'.

2. Some character instance records have no associated character. This is the case for a couple of classes of anonymous speakers/addressees. If there is no character, then `char` will be `None`.

3. WikiData and MANTO attributes pass through to the underlying character. These will be `None` if there is no character or if the character lacks these attributes.
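The behaviour described in notes 1-3 can be sketched with two stand-in classes. `StubCharacter` and `StubInstance` are hypothetical illustrations, not the client's `Character` and `CharacterInstance` classes, and the WikiData value is a placeholder rather than a real ID.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins, NOT the dicesapi classes: they only mimic notes 1-3
@dataclass
class StubCharacter:
    name: str
    wd: Optional[str] = None
    manto: Optional[str] = None

@dataclass
class StubInstance:
    name: str                             # name in this context, e.g. 'Jupiter'
    char: Optional[StubCharacter] = None  # None for some anonymous speakers/addressees

    @property
    def wd(self):
        # pass through to the underlying character; None if there is no character
        return self.char.wd if self.char is not None else None

    @property
    def manto(self):
        return self.char.manto if self.char is not None else None

zeus = StubCharacter(name="Zeus", wd="Q_example")  # placeholder, not a real WikiData ID
jupiter = StubInstance(name="Jupiter", char=zeus)
anonymous = StubInstance(name="anonymous")         # hypothetical instance with no character

print(jupiter.name, jupiter.char.name, jupiter.wd)  # Jupiter Zeus Q_example
print(anonymous.char, anonymous.wd)                # None None
```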

#### Examples

```python
# all instances of the god of war
instances = api.getInstances(char_name='Ares')

# print the WikiData ID for each (should be the same for all),
#     plus the name and the context
for inst in instances:
    print(inst.wd, inst.name, inst.context, sep="\t")
```

### Speech Cluster

#### Properties

- `id`: a unique identifier for the speech cluster

#### Methods

- `getSpeeches()`: Returns all speeches in this cluster as a `SpeechGroup`
- `getFirstSpeech()`: Returns only the first speech, as a `Speech`

#### Notes

1. `getSpeeches()` and `getFirstSpeech()` perform an API query in the background.

#### Examples


```python
# all conversations in the Aeneid in which Ascanius gets to speak
clusters = api.getClusters(spkr_name='Ascanius', work_title='Aeneid')

# list all speeches in each conversation,
#     giving lines and first speaker of each
for cl in clusters:
    print(cl)
    for s in cl.getSpeeches():
        print(s.l_range, s.spkr[0].name, sep="\t")
    print()
```

### Speech

#### Properties

- `id`: a unique identifier for the speech
- `cluster`: access to the `SpeechCluster` object to which this speech belongs
- `seq`: an integer that can be used for ordering all the speeches in a given work
- `l_fi`: the locus of the passage's first line, as a string
- `l_la`: the locus of the passage's last line, as a string
- `l_range`: the range of loci covered by the passage; equivalent to joining `l_fi`, `l_la` with a `'-'`
- `spkr`: a list of `CharacterInstance` objects representing the speaker(s)
- `addr`: a list of `CharacterInstance` objects representing the addressee(s)
- `part`: which turn this speech fills in the conversation, as an integer
- `type`: one of (`'soliloquy'`, `'monologue'`, `'dialogue'`, `'general'`)
- `work`: access to the `Work` object associated with this speech
- `author`: access to the `Author` object associated with this speech
- `lang`: one of (`'greek'`, `'latin'`)
- `urn`: the CTS URN representing the passage

#### Methods

- `getCTS()`: download the passage from Perseus. [Deprecated: this functionality is moving to `dicesapi.text.cts`]

#### Examples

```python
# all the speeches in the Iliad where Aphrodite is addressed
speeches = api.getSpeeches(addr_name='Aphrodite', work_title='Iliad')

# print the full locus for each speech,
#     with names of the speaker(s) and addressee(s)
for s in speeches:
    print(s.work.title, s.l_range, [inst.name for inst in s.spkr], [inst.name for inst in s.addr])
```

## Objects representing collections of records

All of these inherit from the parent class `DataGroup`. They're mostly intended to be iterated over, but each has specific `filter*` methods to extract a subset of the member objects based on their properties, and a set of `get*` methods to extract specific properties from the member objects.
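The `get*`/`filter*` pattern can be illustrated with a much-simplified collection class. `MiniGroup` and `WorkRec` are hypothetical stand-ins invented for this sketch (the ids are illustrative); the real `DataGroup` subclasses have many more methods and may be implemented differently.

```python
from dataclasses import dataclass

# Hypothetical record type standing in for a Work (ids are illustrative)
@dataclass(frozen=True)
class WorkRec:
    id: int
    title: str
    lang: str

class MiniGroup:
    """Hypothetical sketch of a DataGroup-like collection; not dicesapi's code."""

    def __init__(self, members):
        self._members = list(members)

    def __iter__(self):
        return iter(self._members)

    def __len__(self):
        return len(self._members)

    def getTitles(self):
        # get* pattern: one attribute value per member, in member order
        return [m.title for m in self._members]

    def filterLangs(self, values):
        # filter* pattern: keep members whose attribute is in `values`,
        # returning a new group of the same class
        return type(self)(m for m in self._members if m.lang in values)

works = MiniGroup([WorkRec(1, "Iliad", "greek"),
                   WorkRec(2, "Odyssey", "greek"),
                   WorkRec(3, "Aeneid", "latin")])

print(works.filterLangs(["latin"]).getTitles())  # ['Aeneid']
```

Because each `filter*` call returns another group, calls can be chained before a final `get*`.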

### Examples

#### Iterating over a DataGroup:

```python
# download two sets of speeches addressed to Thetis
group_a = api.getSpeeches(addr_name='Thetis', work_title='Iliad')
group_b = api.getSpeeches(addr_name='Thetis', work_title='Achilleid')

print(len(group_a), 'speeches in the Iliad:')
for s in group_a:
    print(s)
print()

print(len(group_b), 'speeches in the Achilleid:')
for s in group_b:
    print(s)
print()
```

#### Adding

```python
# combine the two speech groups
combined = group_a + group_b
print(len(combined), 'speeches combined:')
for s in combined:
    print(s)
```

#### Subtracting

```python
# all speeches addressed to Thetis, in any work
group_c = api.getSpeeches(addr_name='Thetis')
print(len(group_c), 'speeches in the whole corpus')

all_but_iliad = group_c - group_a
print(len(all_but_iliad), 'excluding the Iliad:')

for s in all_but_iliad:
    print(s)
```

#### Intersecting

```python
# download speeches with Achilles as speaker
group_a = api.getSpeeches(spkr_name='Achilles', work_title='Iliad')
group_b = api.getSpeeches(addr_name='Thetis', work_title='Iliad')

# find intersection with those having Thetis as addressee
for s in group_a.intersect(group_b):
    print(s)
```
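One way to picture adding, subtracting, and intersecting is as operations on record ids. The sketch assumes that membership is decided by `id` and that `+` drops speeches already present; both are assumptions about the client's behaviour, and `add`, `subtract`, `intersect`, and `SpeechRec` are hypothetical names. The speech ids and loci come from the Hypsipyle search earlier in this notebook.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a Speech record: just an id and a label
@dataclass(frozen=True)
class SpeechRec:
    id: int
    label: str

def add(a, b):
    """Like a + b (assumed): members of a, then members of b whose id is not already present."""
    seen = {s.id for s in a}
    return list(a) + [s for s in b if s.id not in seen]

def subtract(a, b):
    """Like a - b: members of a whose id does not occur in b."""
    drop = {s.id for s in b}
    return [s for s in a if s.id not in drop]

def intersect(a, b):
    """Like a.intersect(b): members of a whose id also occurs in b."""
    keep = {s.id for s in b}
    return [s for s in a if s.id in keep]

group_a = [SpeechRec(1403, "Argonautica 1.836-1.841"),
           SpeechRec(1406, "Argonautica 1.900-1.909"),
           SpeechRec(3079, "Thebaid 4.753-4.771")]
group_b = [SpeechRec(3079, "Thebaid 4.753-4.771"),
           SpeechRec(3084, "Thebaid 5.20-5.27")]

print([s.id for s in add(group_a, group_b)])        # [1403, 1406, 3079, 3084]
print([s.id for s in subtract(group_a, group_b)])   # [1403, 1406]
print([s.id for s in intersect(group_a, group_b)])  # [3079]
```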

#### Sorting

```python
# sort all speeches to Thetis (group_c, above) by the name of the first speaker
for s in sorted(group_c, key=lambda s: s.spkr[0].name):
    print(s.spkr[0].name, s)
```
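The choice of sort key matters. Because `l_fi` and `l_la` are strings, sorting on them compares characters rather than line numbers, whereas `seq` is an integer intended for ordering speeches within a work. The sketch uses hypothetical stand-in records built from the Thebaid loci shown earlier; the `seq` values are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-in for Speech; seq values below are illustrative only
@dataclass
class SpeechRec:
    l_fi: str
    l_la: str
    seq: int

speeches = [
    SpeechRec("5.271", "5.284", 40),
    SpeechRec("5.20", "5.27", 20),
    SpeechRec("4.753", "4.771", 10),
    SpeechRec("5.43", "5.47", 30),
]

# l_fi is a string, so this compares characters: '5.271' sorts before '5.43'
by_locus = [s.l_fi for s in sorted(speeches, key=lambda s: s.l_fi)]

# seq is an integer meant for ordering, so this follows the text
by_seq = [s.l_fi for s in sorted(speeches, key=lambda s: s.seq)]

print(by_locus)  # ['4.753', '5.20', '5.271', '5.43']
print(by_seq)    # ['4.753', '5.20', '5.43', '5.271']
```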

### AuthorGroup

#### Methods

- `getIDs()`: returns a list of author ids
- `getNames()`: returns a list of author names
- `getWDs()`: returns a list of author wikidata ids
- `getUrns()`: returns a list of author CITE URNs
- `filterNames(list)`: returns an `AuthorGroup` containing only members whose `name` attributes match values in `list`
- `filterIDs(list)`: returns an `AuthorGroup` containing only members whose `id` attributes match values in `list`
- `filterWDs(list)`: returns an `AuthorGroup` containing only members whose `wd` attributes match values in `list`
- `filterUrns(list)`: returns an `AuthorGroup` containing only members whose `urn` attributes match values in `list`

### WorkGroup

#### Methods

- `getIDs()`: returns a list of work ids
- `getTitles()`: returns a list of work titles
- `getWDs()`: returns a list of work wikidata ids
- `getUrns()`: returns a list of work CTS URNs
- `getLangs()`: returns a list of work languages
- `getAuthors(flatten=False)`: returns a list of `Author` objects, one per member work. This includes duplicates if multiple works have the same author. Passing the optional `flatten=True` will remove duplicates and return an `AuthorGroup`. 
- `filterIDs(list)`: returns a `WorkGroup` containing only members whose `id` attributes match values in `list`
- `filterTitles(list)`: returns a `WorkGroup` containing only members whose `title` attributes match values in `list`
- `filterWDs(list)`: returns a `WorkGroup` containing only members whose `wd` attributes match values in `list`
- `filterUrns(list)`: returns a `WorkGroup` containing only members whose `urn` attributes match values in `list`
- `filterAuthors(list)`: returns a `WorkGroup` containing only members whose `author` attributes are found in `list`. Accepts an `AuthorGroup` as well as list.
- `filterLangs(list)`: returns a `WorkGroup` containing only members whose `lang` attributes match values in `list`
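The `flatten` option on `getAuthors()` amounts to de-duplication. A minimal sketch with plain tuples, assuming first-seen order is kept; `get_authors` is a hypothetical helper, and the real method returns an `AuthorGroup` rather than a list when `flatten=True`.

```python
# Hypothetical (title, author) pairs standing in for a WorkGroup
works = [("Iliad", "Homer"), ("Odyssey", "Homer"), ("Aeneid", "Virgil")]

def get_authors(works, flatten=False):
    """Mimic WorkGroup.getAuthors(): one author per work, or unique authors if flatten."""
    authors = [author for _title, author in works]
    if not flatten:
        return authors  # duplicates kept: one entry per work
    # dict.fromkeys drops repeats while keeping first-seen order
    return list(dict.fromkeys(authors))

print(get_authors(works))                # ['Homer', 'Homer', 'Virgil']
print(get_authors(works, flatten=True))  # ['Homer', 'Virgil']
```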

### CharacterGroup

#### Methods

- `getIDs()`: Returns a list of character ids
- `getNames()`: Returns a list of character names
- `getBeings()`: Returns a list of character `being`s
- `getNumbers()`: Returns a list of character `number`s
- `getWDs()`: Returns a list of character WikiData IDs
- `getMantos()`: Returns a list of character MANTO IDs
- `getGenders()`: Returns a list of character genders
- `filterIDs(list)`: returns a `CharacterGroup` containing only members whose `id` attributes match values in `list`
- `filterNames(list)`: returns a `CharacterGroup` containing only members whose `name` attributes match values in `list`
- `filterBeings(list)`: returns a `CharacterGroup` containing only members whose `being` attributes match values in `list`
- `filterNumber(list)`: returns a `CharacterGroup` containing only members whose `number` attributes match values in `list`
- `filterGenders(list)`: returns a `CharacterGroup` containing only members whose `gender` attributes match values in `list`
- `filterWDs(list)`: returns a `CharacterGroup` containing only members whose `wd` attributes match values in `list`
- `filterMantos(list)`: returns a `CharacterGroup` containing only members whose `manto` attributes match values in `list`

### CharacterInstanceGroup

#### Methods

- `getIDs()`: Returns a list of character instance ids
- `getNames()`: Returns a list of character instance names
- `getContexts()`: Returns a list of character instance contexts
- `getBeings()`: Returns a list of character instance `being`s
- `getNumbers()`: Returns a list of character instance `number`s
- `getGenders()`: Returns a list of character instance genders
- `getChars(flatten=False)`: Returns a list of underlying `Character` objects, one per member instance. Passing `flatten=True` will remove duplicates and return a `CharacterGroup`.
- `filterIDs(list)`: returns a `CharacterInstanceGroup` containing only members whose `id` attributes match values in `list`
- `filterNames(list)`: returns a `CharacterInstanceGroup` containing only members whose `name` attributes match values in `list`
- `filterContexts(list)`: returns a `CharacterInstanceGroup` containing only members whose `context` attributes match values in `list`
- `filterChars(list)`: returns a `CharacterInstanceGroup` containing only members whose `char` attributes match `Character`s in `list`. Can be passed a `CharacterGroup` instead of a list.
- `filterBeings(list)`: returns a `CharacterInstanceGroup` containing only members whose `being` attributes match values in `list`
- `filterNumber(list)`: returns a `CharacterInstanceGroup` containing only members whose `number` attributes match values in `list`
- `filterGenders(list)`: returns a `CharacterInstanceGroup` containing only members whose `gender` attributes match values in `list`

### SpeechClusterGroup

#### Methods

- `getIDs()`: Returns a list of speech cluster ids
- `filterIDs(list)`: returns a `SpeechClusterGroup` containing only members whose `id` attributes match values in `list`

### SpeechGroup

#### Methods

- `getIDs()`: Returns a list of speech ids
- `getClusters(flatten=False)`: Returns a list of `SpeechCluster` objects, one per member speech. Passing `flatten=True` will return a `SpeechClusterGroup` instead, with duplicate entries removed.
- `getSeqs()`: Returns a list of member speech `seq` values
- `getL_fis()`: Returns a list of member speech `l_fi` values
- `getL_las()`: Returns a list of member speech `l_la` values
- `getSpkrs(flatten=False)`: Returns a list of member speech `spkr` values. Note that each element will itself be a list, since `Speech.spkr` is a list of character instances. Passing `flatten=True` will return a single `CharacterInstanceGroup` instead, with duplicate instances removed.
- `getAddrs(flatten=False)`: Returns a list of member speech `addr` values. As for `getSpkrs()`, passing `flatten=True` will return a single `CharacterInstanceGroup` instead of a list of lists.
- `getParts()`: Returns a list of member speech `part` values
- `getTypes()`: Returns a list of member speech `type` values
- `getWorks(flatten=False)`: Returns a list of `Work` objects, one per member speech. Passing `flatten=True` will return a `WorkGroup` instead, with duplicate works removed.
- `filterIDs(list)`: Returns a `SpeechGroup` containing only members whose `id` attributes match values in `list`
- `filterClusters(list)`: Returns a `SpeechGroup` containing only members whose `cluster` attributes match `SpeechCluster` objects in `list`. Also accepts a `SpeechClusterGroup` instead of a list.
- `filterSeqs(list)`: Returns a `SpeechGroup` containing only members whose `seq` attributes match values in `list`
- `filterL_fis(list)`: Returns a `SpeechGroup` containing only members whose `l_fi` attributes match values in `list`
- `filterL_las(list)`: Returns a `SpeechGroup` containing only members whose `l_la` attributes match values in `list`
- `filterSpkrInstances(list)`: Returns a `SpeechGroup` containing only members whose `spkr` list contains at least one `CharacterInstance` found in `list` [see note 1]
- `filterSpkrs(list)`: Returns a `SpeechGroup` containing only members whose `spkr` list contains at least one  `CharacterInstance` whose `Character` is found in `list` [1]
- `filterAddrInstances(list)`: Returns a `SpeechGroup` containing only members whose `addr` list contains at least one `CharacterInstance` found in `list` [1]
- `filterAddrs(list)`: Returns a `SpeechGroup` containing only members whose `addr` list contains at least one  `CharacterInstance` whose `Character` is found in `list` [1]
- `filterParts(list)`: Returns a `SpeechGroup` containing only members whose `part` attributes match values in `list`
- `filterTypes(list)`: Returns a `SpeechGroup` containing only members whose `type` attributes match values in `list`
- `filterWorks(list)`: Returns a `SpeechGroup` containing only members whose `work` attributes match `Work` objects in `list`. Also accepts a `WorkGroup` instead of a list.

#### Notes

1. `filterSpkrInstances()`, `filterSpkrs()`, `filterAddrInstances()`, and `filterAddrs()` are slated for refactoring, so their names may change.
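Whatever these methods end up being called, the difference between matching on instances and matching on characters can be sketched with hypothetical stand-ins. Following the Hera/Juno example above (instances 15 and 287), two instances share one underlying character. `filter_spkr_instances` and `filter_spkrs` only imitate the documented behaviour of `filterSpkrInstances()` and `filterSpkrs()`, and the speech ids are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-ins for Character, CharacterInstance and Speech
@dataclass(frozen=True)
class StubChar:
    name: str

@dataclass(frozen=True)
class StubInst:
    id: int
    name: str
    char: Optional[StubChar]

@dataclass(frozen=True)
class SpeechRec:
    id: int
    spkr: Tuple[StubInst, ...]

def filter_spkr_instances(speeches, instances):
    """Keep speeches with at least one speaker instance found in `instances`."""
    return [s for s in speeches if any(inst in instances for inst in s.spkr)]

def filter_spkrs(speeches, characters):
    """Keep speeches with at least one speaker whose character is in `characters`."""
    return [s for s in speeches if any(inst.char in characters for inst in s.spkr)]

hera = StubChar("Hera")
hera_greek = StubInst(15, "Hera", hera)   # instance named Hera
juno_latin = StubInst(287, "Juno", hera)  # instance named Juno, same character
speeches = [SpeechRec(1, (hera_greek,)), SpeechRec(2, (juno_latin,))]

print([s.id for s in filter_spkr_instances(speeches, [hera_greek])])  # [1]
print([s.id for s in filter_spkrs(speeches, [hera])])                 # [1, 2]
```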