## Load the DICES client

In order to talk to DICES, we first have to `import` the DICES client code.

In [None]:
from dicesapi import DicesAPI
from dicesapi.jupyter import NotebookPBar

## Create connections to external data sources

In order to work with the DICES database via Python, the first step is to create a connection to DICES by calling `DicesAPI()`. This creates (**instantiates**) a custom Python **object** that holds information about the connection, for example which server we're using. We assign the resulting object to a new variable, here called `api`.

In [None]:
api = DicesAPI(
    logfile='dices.log',            # write log messages to this file
    progress_class=NotebookPBar,    # show download progress bars in the notebook
)

## Make a request to the DICES database

Now that we have a connection to the database, we can use it to make requests for data about speeches, works, authors, etc. Every time we make a request, we start by calling on the API object we have stored in the variable `api`. The various requests are invoked using **methods**, i.e. verbs that the API object knows how to perform. 

In Python, methods are joined to the name of the object that performs them with a dot. For example, to get a list of authors, we use `api.getAuthors()`.
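The dot syntax is ordinary Python rather than something special to DICES. As a quick illustration that doesn't touch the database, here is a built-in string calling two of its own methods:

```python
# A plain Python string is also an object with methods
name = "poseidon"

# Call the string's upper() method with a dot, just like api.getAuthors()
shouted = name.upper()
print(shouted)  # prints POSEIDON

# Some methods take arguments inside the parentheses
capitalized = name.replace("p", "P")
print(capitalized)  # prints Poseidon
```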

In [None]:
api.getAuthors()

🤔 Oops! We didn't tell Python what to do with the results of our request, so it just printed out a brief representation and then forgot about them. Let's try again, this time assigning the results to a new variable so we can use them.

In [None]:
authors = api.getAuthors()

This time there's no output visible, because the results are saved under the new name `authors`.

The results take the form of an `AuthorGroup`, a custom container that holds author records.

#### Size

We can see how many results are in the container using Python's `len()` function (i.e. "length").

In [None]:
len(authors)

#### Iteration

We can also iterate over the items in the container using a **`for` loop**. The first line of the loop defines a **loop variable**, here called `author`, which will be set to each of the author records in `authors` in turn. The indented lines of the loop will then be executed once for each author.

In [None]:
print(f"Retrieved records for {len(authors)} authors:")

for author in authors:
    print("  *", author.name)
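`len()` and `for` loops work on any Python container, not just an `AuthorGroup`. Here is a minimal sketch with an ordinary list of made-up author names standing in for the DICES results; it also shows that the loop body runs exactly `len()` times.

```python
# A plain list stands in for the AuthorGroup here (illustrative names only)
names = ["Homer", "Virgil", "Ovid"]

print(len(names))  # prints 3

# The loop variable `name` takes each value in turn
count = 0
for name in names:
    print("  *", name)
    count += 1  # one iteration per item in the list

print(count == len(names))  # prints True
```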

## Retrieving speeches

The most common request we make to the DICES database is for speeches. The method for retrieving speech records is `api.getSpeeches()`.

### Specifying search parameters

Instead of retrieving all the speeches in the database, we can specify some search parameters in the request. Search parameters go inside the parentheses. For example, let's download all the speeches by Poseidon. The parameter specifying the speaking character's name is `spkr_name`.

In [None]:
# download speech records for Poseidon
speeches = api.getSpeeches(spkr_name="Poseidon")

# print the number of results
print(len(speeches), "speeches retrieved.")

Here are some useful search parameters:

#### Speaker details

- `spkr_name` - name of the speaker
- `spkr_gender` - gender of the speaker (`"male"`, `"female"`, `"non-binary"`, `"none"`)
- `spkr_being` - entity type (`"mortal"`, `"divine"`, `"creature"`, `"other"`)
- `spkr_number` - whether the speaker is an individual or a group (`"individual"`, `"collective"`)

#### Addressee details

- `addr_*` - the same parameters as above, but for the addressee (e.g. `addr_name`, `addr_gender`)

#### Poem details

- `work_title` - title of the work
- `author_name` - name of the author

## Properties of speeches

Once you've downloaded some speech records, you can manipulate them and interrogate their **attributes**. A Python object's attributes are like nouns it possesses. Just like methods, attributes are accessed by using the object's name and a dot. For example, if you have a speech stored in the variable `speech`, its language is accessible as the attribute `speech.lang`.

In [None]:
# select the first item in speeches
speech = speeches[0]

# print its language
print(speech.lang)
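You don't need a live DICES connection to see how attribute access works. Here is a minimal sketch using Python's built-in `types.SimpleNamespace` as a stand-in for a speech record. The attribute names mirror DICES attributes such as `lang`, `l_fi`, and `l_la`, but the values are made up, and a real DICES speech is a different class.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a DICES speech record; values are made up
fake_speech = SimpleNamespace(lang="greek", l_fi="1.1", l_la="1.10")

# Dot notation reads an attribute
print(fake_speech.lang)  # prints greek

# getattr() does the same lookup when the attribute name is held in a string
for attr in ["lang", "l_fi", "l_la"]:
    print(attr, "=", getattr(fake_speech, attr))
```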

### Some useful speech attributes

- `lang` - language
- `author` - author (Author object)
- `work` - work (Work object)
- `l_fi` - first line
- `l_la` - last line
- `l_range` - first and last lines joined by a hyphen
- `spkr` - speaker(s) (list of CharacterInstance objects)
- `addr` - addressee(s) (list of CharacterInstance objects)

### ⚠️ `author` and `work` are themselves objects

Some of the attributes listed above hold not just text but other objects. That means they can have their own attributes in turn.

- For the author's name, use `speech.author.name`
- For the work title, use `speech.work.title`
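To see why the extra dot is needed, here is a small sketch that nests made-up stand-in objects (`types.SimpleNamespace`, not the real DICES `Author` and `Work` classes) inside a fake speech:

```python
from types import SimpleNamespace

# Hypothetical stand-ins: an author object and a work object, each with its
# own attributes, nested inside a speech object (values made up)
fake_author = SimpleNamespace(name="Homer")
fake_work = SimpleNamespace(title="Iliad")
fake_speech = SimpleNamespace(author=fake_author, work=fake_work)

# Chain the dots: first get the author object, then its name
print(fake_speech.author.name)  # prints Homer
print(fake_speech.work.title)   # prints Iliad

# Stopping after .author gives the whole nested object, not just its name
print(fake_speech.author)
```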

#### Example

Print the author, work, and line range for the first 5 speeches by Poseidon.

In [None]:
for speech in speeches[:5]:
    print(speech.author.name, speech.work.title, speech.l_range)

### ⚠️ `spkr` and `addr` are lists

The `spkr` and `addr` attributes for a speech are always lists. Speeches often have more than one addressee, and a few speeches have more than one speaker. For the sake of consistency, we always represent these attributes as lists: if there's only one speaker, then `speech.spkr` will be a list of length 1.

The items in these lists are **CharacterInstance** objects. We'll talk more about Character Instances below; for now, just note three of their attributes:

- `name`
- `gender` - options: `'male'`, `'female'`, `'non-binary'`, or `'none'`
- `being` - options: `'mortal'`, `'divine'`, `'creature'`, or `'other'`
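Because `spkr` and `addr` are lists, you reach a character's attributes either by indexing into the list or by looping over it. Here is a minimal sketch with made-up `types.SimpleNamespace` stand-ins for CharacterInstance objects (the real objects come from DICES and are a different class):

```python
from types import SimpleNamespace

# Hypothetical character instances (values made up for illustration)
poseidon = SimpleNamespace(name="Poseidon", gender="male", being="divine")
ajax_a = SimpleNamespace(name="Ajax", gender="male", being="mortal")
ajax_b = SimpleNamespace(name="Ajax", gender="male", being="mortal")

# One speaker, two addressees: spkr is still a list, just of length 1
fake_speech = SimpleNamespace(spkr=[poseidon], addr=[ajax_a, ajax_b])

print(len(fake_speech.spkr), len(fake_speech.addr))  # prints 1 2

# Index into the list to reach a single character...
print(fake_speech.spkr[0].name)  # prints Poseidon

# ...or loop over it, which works however many characters there are
for inst in fake_speech.addr:
    print(inst.name, inst.being)
```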

#### Example

Print the speaker(s) and addressee(s) for the first 5 speeches.

💁🏻‍♂️ By now I have several things that I want to print out for each speech. As long as they're all inside the parens of the `print()` function, I can put them on separate lines to make the code easier to read.

In [None]:
for speech in speeches[:5]:
    print(
        [inst.name for inst in speech.spkr],  # speaker list
        [inst.name for inst in speech.addr],  # addressee list
        speech.author.name,                   # author name
        speech.work.title,                    # work title
        speech.l_range,                       # line range
    )

In the example above, we used a Python idiom called **list comprehension** to extract just the `name` attribute of each character instance in the list. It's like a miniature `for` loop, where the variable `inst` takes the place of each element in turn.

Because we left the speaker and addressee names in list format, they appear in square brackets in the output. Note that the third speech in the list actually has two addressees: the two Ajaxes.
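A list comprehension is shorthand for a `for` loop that appends to a new list. This sketch builds the same list of names both ways, using made-up stand-ins for the addressee objects:

```python
from types import SimpleNamespace

# Hypothetical addressee list (stand-ins for CharacterInstance objects)
addr = [SimpleNamespace(name="Ajax"), SimpleNamespace(name="Ajax")]

# Long form: a for loop that appends each name to a new list
names_loop = []
for inst in addr:
    names_loop.append(inst.name)

# Short form: the same thing as a list comprehension
names_comp = [inst.name for inst in addr]

print(names_loop == names_comp)  # prints True
print(names_comp)                # prints ['Ajax', 'Ajax']
```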

To make the output easier to read, we can squash the lists into text representations using the `str.join()` method. This uses a separator of our choosing to join a list of strings into one long string. Let's use a semicolon followed by a space.

In [None]:
for speech in speeches[:5]:
    print(
        '; '.join(inst.name for inst in speech.spkr),   # speakers as string
        '; '.join(inst.name for inst in speech.addr),   # addressees as string
        speech.author.name,                             # author name
        speech.work.title,                              # work title
        speech.l_range                                  # line range
    )
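Here is how `str.join()` behaves on its own, including the edge cases of a one-item list (no separator appears) and an empty list. The names are made up for illustration.

```python
# str.join() glues a list of strings together, putting the separator between items
names = ["Ajax", "Ajax"]
joined = "; ".join(names)
print(joined)  # prints Ajax; Ajax

# With a single item there is nothing to separate, so no semicolon appears
single = "; ".join(["Poseidon"])
print(single)  # prints Poseidon

# An empty list gives an empty string
empty = "; ".join([])
print(repr(empty))  # prints ''

# join() only accepts strings, so convert other values first, e.g. with str()
numbers = ", ".join(str(n) for n in [1, 2, 3])
print(numbers)  # prints 1, 2, 3
```

The single-speaker case is why joining works well here: most speeches have one speaker, and their names come out without any stray separator.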

## Working with tabular data

It's already getting a little hard to read the output now that we have multiple attributes for each speech. With a little extra code, we can format our results as tables using **Pandas**, an add-on package commonly used for data science in Python. 

### Import pandas

In [None]:
import pandas as pd

### Re-write our loop to create rows in a table

Pandas stores tabular data in objects called **DataFrames**. One easy way to create a DataFrame is from a list of rows, where each row is a **dictionary**: the **keys** of the dictionary provide the column headings, and the **values** of the dictionary give the contents of the cells.

Here, we re-write the `for` loop above, but instead of printing out each row, we append it to a list called `rows`. At the end, we instantiate a new DataFrame from the list of rows with `pd.DataFrame()`.

In [None]:
# initialize an empty list
rows = []

# iterate over the speeches
for speech in speeches[:5]:
    
    # create a new row as a dictionary
    this_row = dict(
        speaker = '; '.join(inst.name for inst in speech.spkr), 
        addressee = '; '.join(inst.name for inst in speech.addr),
        author = speech.author.name,
        work = speech.work.title,
        loci = speech.l_range,
    )
    
    # add to the list of rows
    rows.append(this_row)
    
# create a new DataFrame
pd.DataFrame(rows)
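The loop above builds each row with `dict(...)` and keyword arguments. This short sketch (with made-up values) shows that this is the same as a dictionary literal, and how keys and values map onto headings and cells:

```python
# dict() with keyword arguments...
row_a = dict(speaker="Poseidon", author="Homer")

# ...builds the same dictionary as a literal with quoted string keys
row_b = {"speaker": "Poseidon", "author": "Homer"}

print(row_a == row_b)  # prints True

# The keys become column headings; the values become the cells
print(list(row_a.keys()))    # prints ['speaker', 'author']
print(list(row_a.values()))  # prints ['Poseidon', 'Homer']
```

The keyword form only works when every heading is a valid Python name; a heading containing a space, such as `"line range"`, needs the literal form.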

### Again, but more concisely

Often we can build a DataFrame in a single expression, a kind of extended one-liner, by passing `pd.DataFrame()` a comprehension that produces one dictionary per speech. The code below is equivalent to the previous example.

In [None]:
pd.DataFrame(dict(
    speaker = '; '.join(inst.name for inst in speech.spkr), 
    addressee = '; '.join(inst.name for inst in speech.addr),
    author = speech.author.name,
    work = speech.work.title,
    loci = speech.l_range,
) for speech in speeches[:5])
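If you want to confirm that the two styles really produce the same table, pandas' `DataFrame.equals()` compares them directly. This is a small sketch on made-up (speaker, author) pairs rather than real DICES records:

```python
import pandas as pd

# Hypothetical (speaker, author) pairs standing in for speech records
data = [("Poseidon", "Homer"), ("Zeus", "Homer")]

# Loop version: append one dictionary per row, then build the DataFrame
demo_rows = []
for spkr, auth in data:
    demo_rows.append(dict(speaker=spkr, author=auth))
df_loop = pd.DataFrame(demo_rows)

# One-expression version: pd.DataFrame() consumes the dictionaries directly
df_comp = pd.DataFrame(dict(speaker=spkr, author=auth) for spkr, auth in data)

# equals() checks that both tables have the same shape, labels, and values
print(df_loop.equals(df_comp))  # prints True
```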

### The full table

Here are all 38 rows.

In [None]:
pd.DataFrame(dict(
    speaker = '; '.join(inst.name for inst in speech.spkr), 
    addressee = '; '.join(inst.name for inst in speech.addr),
    author = speech.author.name,
    work = speech.work.title,
    loci = speech.l_range,
) for speech in speeches)

### Export the table to Excel

One common task is to move data from DICES into Excel. Pandas has a handy method for exporting DataFrames to CSV (comma-separated values), a plain-text format that Excel can open: `DataFrame.to_csv()`. It takes as an argument the name of the file you want to create.

I've also added the argument `index=False`, which suppresses the extra first column that would otherwise hold the row numbers (the DataFrame's index).

In [None]:
# create a table and assign it to `df`
df = pd.DataFrame(dict(
    speaker = '; '.join(inst.name for inst in speech.spkr), 
    addressee = '; '.join(inst.name for inst in speech.addr),
    author = speech.author.name,
    work = speech.work.title,
    loci = speech.l_range,
) for speech in speeches)

# export to csv file
df.to_csv('poseidon.csv', index=False)

The code above should have written a new file to the directory containing this notebook. Take a look and see if it's there. Try importing it into Excel.
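To see exactly what `index=False` changes, here is a small sketch on a made-up one-row table. When `to_csv()` is called without a file name, it returns the CSV text as a string instead of writing a file, which makes the two outputs easy to compare.

```python
import pandas as pd

# A tiny made-up table for illustration
df_demo = pd.DataFrame([dict(speaker="Poseidon", addressee="Zeus")])

# With no file name, to_csv() returns the CSV text instead of writing a file
with_index = df_demo.to_csv()
without_index = df_demo.to_csv(index=False)

print(with_index.splitlines())     # prints [',speaker,addressee', '0,Poseidon,Zeus']
print(without_index.splitlines())  # prints ['speaker,addressee', 'Poseidon,Zeus']
```

If you'd rather produce a native Excel workbook, pandas also provides `DataFrame.to_excel()`, which writes an `.xlsx` file but requires an extra package such as `openpyxl` to be installed.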