<a href="https://colab.research.google.com/github/cwf2/style_2025/blob/main/Example%201.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install DICES client software

This step is only necessary once on most machines, but because Google Colab runs this notebook on a fresh virtual machine every time, we always need to install DICES as the first step.

In [None]:
!pip install -q git+https://github.com/cwf2/dices-client

### Import statements

These statements tell Python which additional packages we want to use in this notebook.

In [None]:
# for talking to DICES
from dicesapi import DicesAPI

# for tabular data manipulation
import pandas as pd

### Initialize connection to external sources

This creates a connection to the DICES database.

💁🏻‍♂️ *The DICES client spits out a lot of debugging information. It's convenient to divert those messages to a separate file.*

In [None]:
# DICES database
api = DicesAPI(logfile="dices.log", logdetail=0)

### Get some speeches

`api.getSpeeches()` is the basic search function: it retrieves the speeches in DICES that match the parameters we give it. Here we ask for every speech by Virgil.

In [None]:
speeches = api.getSpeeches(author_name="Virgil")
n = len(speeches)
print("Retrieved", n, "speeches.")

### Print out some basic information about the speeches

This loops over each speech in turn and prints out its attributes, separated by a tab.
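
As a standalone illustration of what `sep="\t"` does, here is a minimal sketch that needs no DICES connection. The records are made-up stand-ins for speech attributes, not data retrieved from the database.

```python
# Made-up records standing in for speech attributes (not real DICES output)
records = [
    (1, "Virgil", "Aeneid", "1.37", "1.49"),
    (2, "Virgil", "Aeneid", "1.65", "1.75"),
]

for rec in records:
    # *rec unpacks the tuple into separate arguments for print();
    # sep="\t" puts a tab between them instead of the default space
    print(*rec, sep="\t")

# The same first row built by hand, to make the tabs visible
line = "\t".join(str(value) for value in records[0])
print(repr(line))
```

Tab-separated output pastes cleanly into a spreadsheet, which is why it is a convenient choice here.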


In [None]:
for speech in speeches:
    print(speech.id, speech.author.name, speech.work.title, speech.l_fi, speech.l_la, speech.getSpkrString(), speech.getAddrString(), sep="\t")

### Extract book and line numbers from the loci

The loci are recorded as *strings*, that is, sequences of characters rather than as numeric data. Here we split each locus into two parts based on the "." character.
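
The splitting step can be tried on its own with a pair of illustrative locus strings. The values below are assumptions chosen for the demonstration; real loci come from `speech.l_fi` and `speech.l_la`.

```python
# Illustrative locus strings in "book.line" form (values chosen for the demo)
loc_first = "1.37"
loc_last = "1.49"

# str.split(".") returns the pieces on either side of the "." character;
# with exactly one ".", we can unpack the result into two variables
book_first, line_first = loc_first.split(".")
book_last, line_last = loc_last.split(".")

print(book_first, line_first, line_last, sep="\t")

# The pieces are still strings, not numbers
print(type(line_first))
```

Note that the two-variable unpacking only works when a locus contains exactly one "."; a locus in some other shape would raise a `ValueError`.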

To keep the example short, we'll just look at the first 10 speeches.

In [None]:
for speech in speeches[:10]:
    loc_first = speech.l_fi
    loc_last = speech.l_la

    book_first, line_first = loc_first.split(".")
    book_last, line_last = loc_last.split(".")

    print(speech.id, speech.author.name, speech.work.title, book_first, line_first, line_last, sep="\t")

### Speaker gender

Each speaker and addressee has the attributes `gender` (e.g., male, female), `being` (e.g., divine, mortal), and `number` (individual, collective). Here we extract the gender of the first speaker in each speech. Because speeches occasionally have multiple speakers, DICES always treats `speech.spkr` as a list, so we use `speech.spkr[0]` to get the first one.
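
To see why the index `[0]` is needed, here is a small sketch using hypothetical stand-in objects rather than real DICES results. The attribute names follow the description above; the values and the use of `SimpleNamespace` are assumptions made purely for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for DICES character objects (invented values)
juno = SimpleNamespace(name="Juno", gender="female", being="divine", number="individual")
trojans = SimpleNamespace(name="Trojans", gender="male", being="mortal", number="collective")

# spkr is a list in both cases, even when there is only one speaker
solo_speech = SimpleNamespace(spkr=[juno])
group_speech = SimpleNamespace(spkr=[trojans, juno])

for speech in (solo_speech, group_speech):
    first = speech.spkr[0]  # index 0 picks the first speaker in the list
    print(first.name, first.gender, len(speech.spkr), sep="\t")
```

Taking only the first speaker is a simplification: for a speech with several speakers, the others are ignored.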

Once again, we'll just look at the first 10 speeches.

In [None]:
# take each speech of the first 10 in turn
for speech in speeches[:10]:

    # separate the loci into book and line
    book_first, line_first = speech.l_fi.split(".")
    book_last, line_last = speech.l_la.split(".")

    # get the first speaker
    spkr = speech.spkr[0]

    # print row
    print(speech.id, speech.author.name, speech.work.title, book_first, line_first, line_last, spkr.name, spkr.gender, sep="\t")

### Putting it all together

Finally, let's use the line numbers to calculate a rough approximation of the speech length. In order to do arithmetic with the line numbers, we must convert them from strings to *integers* (`int`).
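
The conversion and the arithmetic can be sketched in isolation. The helper name `speech_length` is hypothetical, introduced here only for illustration, and the loci are example values.

```python
def speech_length(l_fi, l_la):
    """Approximate a speech's length in lines from two "book.line" loci."""
    book_first, line_first = l_fi.split(".")
    book_last, line_last = l_la.split(".")
    # int() turns the digit strings into numbers we can subtract;
    # + 1 because both the first and the last line are part of the speech
    return int(line_last) - int(line_first) + 1

# Subtracting the raw strings does not work
try:
    "49" - "37"
except TypeError as err:
    print("TypeError:", err)

print(speech_length("1.37", "1.49"))  # 13
```

This is only an approximation: the calculation ignores the book numbers, so it assumes that a speech begins and ends in the same book.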

In [None]:
for speech in speeches:
    # separate book and line numbers
    book_first, line_first = speech.l_fi.split(".")
    book_last, line_last = speech.l_la.split(".")

    # calculate length of speech
    nlines = int(line_last) - int(line_first) + 1

    # get first speaker gender
    spkr_gender = speech.spkr[0].gender

    # print row
    print(
        speech.id,
        speech.author.name,
        speech.work.title,
        book_first,
        line_first,
        line_last,
        nlines,
        speech.getSpkrString(),
        spkr_gender,
        speech.getAddrString(),
        sep="\t")

### Make a table

Now that we have demonstrated how to work with speeches and access some of their attributes, let's make the output easier to read and manipulate.

Python can work with tabular data like a spreadsheet with the help of the ancillary package Pandas. Here we make the same data into a Pandas `DataFrame`.
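
The pattern used here is to collect one dictionary per row in a list and then hand the whole list to Pandas. A minimal sketch with invented rows (not real DICES data):

```python
import pandas as pd

# Invented rows for illustration; each dict is one row of the table,
# and the dict keys become the column names
rows = [
    {"id": 1, "speaker": "Juno", "num_lines": 13},
    {"id": 2, "speaker": "Aeolus", "num_lines": 5},
    {"id": 3, "speaker": "Aeneas", "num_lines": 8},
]

table = pd.DataFrame(rows)

print(table.columns.tolist())    # column names, in the order of the keys
print(table.shape)               # (number of rows, number of columns)
print(table["num_lines"].sum())  # a column can be summed like a spreadsheet column
```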

In [None]:
# an empty list to hold the rows
rows = list()

# iterate over the speeches
for speech in speeches:
    # separate book and line numbers
    book_first, line_first = speech.l_fi.split(".")
    book_last, line_last = speech.l_la.split(".")

    # calculate length of speech
    nlines = int(line_last) - int(line_first) + 1

    # get first speaker gender
    spkr_gender = speech.spkr[0].gender

    # create a new row, labelling all the data values
    row = {
        "id": speech.id,
        "author": speech.author.name,
        "work": speech.work.title,
        "book": int(book_first),
        "first_line": int(line_first),
        "last_line": int(line_last),
        "num_lines": nlines,
        "speaker": speech.getSpkrString(),
        "gender": spkr_gender,
        "addressee": speech.getAddrString(),
    }

    # add the row to the list
    rows.append(row)

# make the table
table = pd.DataFrame(rows)

# write the table to a file for import to Excel
table.to_csv("speeches.tsv", sep="\t", index=False)

# display the table
display(table)

### Summarize data

Just like in Excel, we can summarize tabular data with a pivot table (Dutch: *draaitabel*). In this example, we'll count how many speeches are attributed to male and female speakers in each book of the *Aeneid*.

We need to specify which columns in the original table we want to use:
- The rows (or "index") of our summary table will come from `book`. Each book gets one row in the new table.
- The columns will come from `gender`, i.e., "male" and "female".
- We'll derive the values for each cell from the `id` column: that is, we're going to count how many speeches each gender gets.

We also need to specify how we want to summarize the speech ids. In this case, we just want to count them. We tell Python this using the `aggfunc` ("aggregation function") parameter.
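
The roles of `index`, `columns`, `values` and `aggfunc` are easier to see on a tiny table. The sketch below uses invented data; the `fill_value=0` variant is an optional extra, not something the main example relies on.

```python
import pandas as pd

# A tiny invented table: five speeches spread over two books
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "book": [1, 1, 1, 2, 2],
    "gender": ["female", "male", "male", "male", "male"],
})

# One row per book, one column per gender, each cell counts speech ids
counts = toy.pivot_table(index="book", columns="gender", values="id", aggfunc="count")
print(counts)

# Combinations that never occur (here: female speakers in book 2) are left
# empty (NaN); fill_value=0 shows them as zero instead
counts_filled = toy.pivot_table(
    index="book", columns="gender", values="id", aggfunc="count", fill_value=0
)
print(counts_filled)
```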

In [None]:
count_by_book = table.pivot_table(index="book", columns="gender", values="id", aggfunc="count")
display(count_by_book)

### Make a graph

Pandas has some basic visualization functions built in. Let's turn the summary table above into a bar graph.

In [None]:
# generate a bar graph
plot_by_book = count_by_book.plot.bar(title="speech count by gender", ylabel="number of speeches")

# save to an image file
plot_by_book.figure.savefig("speech_count_by_book.png")

### More aggregation options

Let's do a second summary, this time looking at the number of lines spoken by each gender in each book of the poem. The rows and columns of our summary table will be the same as last time. But now the values will come from `num_lines` and the aggregation function will be `"sum"` instead of `"count"`.
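
The difference between `"count"` and `"sum"` shows up clearly when both are run on the same small table. The data below are invented for illustration.

```python
import pandas as pd

# Invented data: four speeches with their lengths in lines
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "book": [1, 1, 1, 2],
    "gender": ["female", "male", "male", "male"],
    "num_lines": [13, 5, 8, 20],
})

# "count" answers: how many speeches are there in each cell?
n_speeches = toy.pivot_table(index="book", columns="gender", values="id", aggfunc="count")

# "sum" answers: how many lines do those speeches add up to?
n_lines = toy.pivot_table(index="book", columns="gender", values="num_lines", aggfunc="sum")

print(n_speeches)
print(n_lines)
```

In book 1, the two male speeches (5 and 8 lines) count as 2 in the first table and add up to 13 in the second.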

In [None]:
lines_by_gender = table.pivot_table(index="book", columns="gender", values="num_lines", aggfunc="sum")
# keep the index when saving, so the book numbers end up in the file
lines_by_gender.to_csv("lines_by_gender.csv")
display(lines_by_gender)

### Assignments

Time to get started on your own assignment!
Below, you'll find four different assignments that are similar to this example. It's easiest to keep working in this file, so you don't have to run the preliminaries again.

How to get started: after choosing your assignment, scroll back to the top of this file. Start under 'Get some speeches'.

Tip: open the notebook which corresponds to your assignment. Use it for hints.


1a: Achilles
- Collect all of Achilles' speeches
- Print the speeches' data (work, addressee, number of lines etc.) like in the example above
- Convert the data into a table
- Make a pivot table that tells you how many speeches and how many lines Achilles speaks in each work
- Convert this data into a bar graph

1b: Female speakers in the *Iliad* and the *Odyssey*
- Collect all speeches by women in Homer
- Print the speeches' data (work, addressee, number of lines etc.) like in the example above
- Convert the data into a table
- Make a pivot table that tells you how many speeches by women there are in the *Iliad* and the *Odyssey*
- Convert this data into a bar graph
- Next step: repeat the last two steps, but for number of lines instead of number of speeches

1c: Mortal women and goddesses in the *Odyssey*
- Collect all speeches by female speakers in the *Odyssey*
- Print the speeches' data (work, addressee, number of lines etc.) like in the example above
- Convert the data into a table
- Make a pivot table that tells you how many speeches in the *Odyssey* are spoken by mortal women and how many by goddesses
- Convert this data into a bar graph
- Next step: repeat the last two steps, but for number of lines instead of number of speeches

1d: Speech length in the *Iliad* and the *Dionysiaca*
- Collect all speeches from the *Iliad* and the *Dionysiaca*
- Print the speeches' data (work, addressee, number of lines etc.) like in the example above
- Convert the data into a table
- Make a pivot table that tells you the average speech length in the *Iliad* and in the *Dionysiaca*
- Convert this data into a bar graph
- Next step: repeat the last two steps, but for average speech length per book instead of per work
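
Hint for averages: a pivot table can compute a mean in the same way it computes a count or a sum. The sketch below uses invented numbers, not real speech lengths, and leaves out the `columns` parameter because there is only one grouping here.

```python
import pandas as pd

# Invented speech lengths, purely for illustration
toy = pd.DataFrame({
    "work": ["Iliad", "Iliad", "Dionysiaca", "Dionysiaca", "Dionysiaca"],
    "num_lines": [4, 10, 20, 30, 40],
})

# One row per work; each cell is the mean of num_lines for that work
means = toy.pivot_table(index="work", values="num_lines", aggfunc="mean")
print(means)
```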