<a href="https://colab.research.google.com/github/cwf2/style_2025/blob/main/Assignment%201c%20-%20mortal%20vs%20divine%20female%20speakers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install DICES client software

This step is only necessary once on most machines, but because Google Colab runs this notebook on a fresh virtual machine every time, we always need to install DICES as the first step.

In [None]:
!pip install -q git+https://github.com/cwf2/dices-client

### Import statements

These statements tell Python which additional packages we want to use in this notebook: the DICES client and Pandas.

In [None]:
# for connecting to DICES
from dicesapi import DicesAPI

# for working with tabular data
import pandas as pd

### Initialize connection to external sources

This creates a connection to the DICES database. The `logfile` argument names the file where the client records its log messages.

In [None]:
# DICES database
api = DicesAPI(logfile="dices.log", logdetail=0)

### Get some speeches

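`getSpeeches()` is the basic search function for retrieving speeches from DICES. It returns all speeches that match the parameters you pass; here we ask for every speech in the *Odyssey* with a female speaker.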

In [None]:
# get speeches by female speakers in the Odyssey
speeches = api.getSpeeches(work_title="Odyssey", spkr_gender="female")

# how many did we get?
n = len(speeches)

# print a message
print(f"Retrieved {n} speeches")

### Mortal and divine speakers

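Each speaker and addressee has the attributes `gender` (e.g., male, female), `being` (e.g., divine, mortal), and `number` (e.g., individual, collective). Here we extract the `being` of the first speaker in each speech. (Because a speech occasionally has more than one speaker, DICES always stores `speech.spkr` as a list, even when it contains only one speaker.) We also split each speech's first and last line references (`speech.l_fi` and `speech.l_la`, written as `book.line`) into separate book and line numbers.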

In [None]:
# take each speech in turn
for speech in speeches:

    # split first and last line loci into two parts each
    book_first, line_first = speech.l_fi.split(".")
    book_last, line_last = speech.l_la.split(".")

    # get the first speaker
    spkr = speech.spkr[0]

    # print a list of speech attributes
    print(speech.id, speech.author.name, speech.work.title, book_first, line_first, line_last, spkr.being, sep="\t")
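The two steps inside this loop can also be written as small reusable functions. In the sketch below, `parse_locus` and `speaker_beings` are hypothetical helpers (they are not part of the DICES client), and the sketch assumes every locus has the form `book.line` with purely numeric parts. `speaker_beings` keeps the `being` of every speaker rather than only the first, which matters for the occasional speech with several speakers; the `SimpleNamespace` objects are stand-ins for DICES speaker records that have a `being` attribute.

```python
from types import SimpleNamespace


def parse_locus(locus):
    """Split a locus such as "4.123" into integers (book, line).

    Hypothetical helper; assumes the form "book.line" with numeric parts.
    """
    book, line = locus.split(".")
    return int(book), int(line)


def speaker_beings(speakers):
    """Return the distinct `being` values of all speakers, in order of first appearance.

    Unlike speech.spkr[0].being, this does not ignore additional speakers.
    """
    beings = []
    for spkr in speakers:
        if spkr.being not in beings:
            beings.append(spkr.being)
    return beings


# stand-ins for DICES speaker objects (hypothetical, for illustration only)
speakers = [SimpleNamespace(being="divine"), SimpleNamespace(being="mortal")]

print(parse_locus("4.123"))       # (4, 123)
print(speaker_beings(speakers))   # ['divine', 'mortal']
```

Returning integers right away avoids having to remember to call `int()` later, for example when computing the length of a speech.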

### Putting it all together

### Make a table

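Python can handle tabular data, much like a spreadsheet, with the help of the ancillary package [Pandas](https://pandas.pydata.org/docs/user_guide/index.html#user-guide). Here we collect the same data into a Pandas data frame, add a column giving each speech's length in lines, and save the table as a tab-separated file that Excel can open.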

In [None]:
# an empty list to hold the rows
rows = list()

# iterate over the speeches
for speech in speeches:

    # separate book and line numbers
    book_first, line_first = speech.l_fi.split(".")
    book_last, line_last = speech.l_la.split(".")

    # calculate length of speech in lines (assumes it starts and ends in the same book)
    nlines = int(line_last) - int(line_first) + 1

    # get first speaker being
    spkr_being = speech.spkr[0].being

    # create a new row, labelling all the data values
    row = {
        "id": speech.id,
        "author": speech.author.name,
        "work": speech.work.title,
        "book": int(book_first),
        "first_line": int(line_first),
        "last_line": int(line_last),
        "num_lines": nlines,
        "being": spkr_being
        }

    # add the row to the list
    rows.append(row)

# make the table
table = pd.DataFrame(rows)

# write the table to a file for import to Excel
table.to_csv("speeches.tsv", sep="\t", index=False)

# display the table
display(table)
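The recipe used here, a list of dictionaries with identical keys handed to `pd.DataFrame`, turns each dictionary into one row and each key into a column. The sketch below applies it to a few invented loci (not real DICES data) and shows why the length calculation adds 1: a speech that starts and ends on the same line still occupies one line.

```python
import pandas as pd

# invented (first, last) loci in book.line form, NOT real DICES data
loci = [("1.10", "1.14"), ("2.3", "2.3"), ("2.20", "2.31")]

rows = []
for l_fi, l_la in loci:
    book_first, line_first = l_fi.split(".")
    book_last, line_last = l_la.split(".")
    rows.append({
        "book": int(book_first),
        "first_line": int(line_first),
        "last_line": int(line_last),
        # inclusive count: a one-line speech has first_line == last_line
        "num_lines": int(line_last) - int(line_first) + 1,
    })

# each dictionary becomes a row; the keys become the column names
table = pd.DataFrame(rows)
print(table)
```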

### Summarize data

Just like in Excel, we can summarize tabular data with a pivot table. In this example, we'll count how many speeches in each book of the *Odyssey* are spoken by mortal versus divine female speakers.

We need to specify which columns in the original table we want to use:
- The rows (or "index") of our summary table will come from `book`. Each book gets one row in the new table.
- The columns will come from `being`, i.e., "mortal", "divine", etc.
- We'll derive the values for each cell from the `id` column: that is, we're going to count how many speeches mortal women and goddesses get in each book.

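We also need to specify how we want to summarize the speech IDs. In this case, we just want to count them, which we tell Python with the `aggfunc` ("aggregation function") parameter. Any book in which one group has no speeches ends up with an empty (`NaN`) cell, so `.fillna(0)` replaces those cells with zeros and `.astype(int)` turns the counts back into whole numbers.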

In [None]:
count_Odspeeches = (
    table
    .pivot_table(
        index="book",
        columns="being",
        values="id",
        aggfunc="count"
    )
    .fillna(0)
    .astype(int)
)

display(count_Odspeeches)
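To see what the pivot table does without querying DICES, here is a sketch on a tiny invented table (not real data). Book 2 has no divine speaker, so that cell starts out as `NaN` and becomes 0. For plain counting, `pd.crosstab` is an alternative that tallies rows directly and already fills missing combinations with 0; it is a different function from the one used above, shown only for comparison.

```python
import pandas as pd

# invented toy table, NOT real DICES data
table = pd.DataFrame({
    "id":        [1, 2, 3, 4],
    "book":      [1, 1, 1, 2],
    "being":     ["divine", "mortal", "divine", "mortal"],
    "num_lines": [5, 3, 2, 10],
})

# count speeches per book and being; missing combinations are NaN, then 0
counts = (
    table
    .pivot_table(index="book", columns="being", values="id", aggfunc="count")
    .fillna(0)
    .astype(int)
)
print(counts)

# alternative: crosstab counts rows per (book, being) pair and fills gaps with 0
same_counts = pd.crosstab(table["book"], table["being"])
print(same_counts)
```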

### Make a graph

Pandas has some basic visualization functions built in. Let's turn the summary table above into a bar graph.

In [None]:
# generate a bar graph
plot_by_book = count_Odspeeches.plot.bar(title="Female speeches in the Odyssey", ylabel="number of speeches")

# save to an image file
plot_by_book.figure.savefig("Odspeeches_count.png")
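Outside Colab, for example when running this code as a script on a server, there may be no display to draw on. A minimal sketch, assuming matplotlib is installed (pandas uses it behind `.plot`), selects the non-interactive `Agg` backend, draws invented counts as a stacked bar chart (`stacked=True` puts mortal and divine speeches into a single bar per book), and closes the figure once it is saved. The file name `toy_counts.png` is only an example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw to files, not a window
import matplotlib.pyplot as plt
import pandas as pd

# invented toy counts, NOT real DICES data
counts = pd.DataFrame(
    {"divine": [2, 0], "mortal": [1, 1]},
    index=pd.Index([1, 2], name="book"),
)

# stacked=True draws one bar per book, divided by being
ax = counts.plot.bar(stacked=True, title="Female speeches (toy data)", ylabel="number of speeches")
ax.figure.savefig("toy_counts.png", bbox_inches="tight")

# free the figure's memory once it has been saved
plt.close(ax.figure)
```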

### More aggregation options

Let's do a second summary, this time looking at the number of lines spoken by mortal women and goddesses in each book of the poem. The rows and columns of our summary table will be the same as last time. But now the values will come from `num_lines` and the aggregation function will be `"sum"` instead of `"count"`.

In [None]:
lines_by_being = (
    table
    .pivot_table(
        index="book",
        columns="being",
        values="num_lines",
        aggfunc="sum"
    )
    .fillna(0)
    .astype(int)
)
# keep the index so that the book numbers are written to the file
lines_by_being.to_csv("lines_by_being.csv")
display(lines_by_being)
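A small invented table (not real DICES data) can illustrate both the `"sum"` aggregation and why a pivot table should be saved with its index: the book numbers live in the index rather than in an ordinary column, so `index=False` drops them from the CSV output.

```python
import pandas as pd

# invented toy table, NOT real DICES data
table = pd.DataFrame({
    "book":      [1, 1, 1, 2],
    "being":     ["divine", "mortal", "divine", "mortal"],
    "num_lines": [5, 3, 2, 10],
})

# add up the line counts per book and being
lines = (
    table
    .pivot_table(index="book", columns="being", values="num_lines", aggfunc="sum")
    .fillna(0)
    .astype(int)
)

# to_csv() with no file name returns the CSV text as a string
with_book = lines.to_csv()
without_book = lines.to_csv(index=False)
print(with_book.splitlines()[0])     # book,divine,mortal
print(without_book.splitlines()[0])  # divine,mortal
```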

In [None]:
# generate a bar graph
plot_by_line = lines_by_being.plot.bar(title="Lines by female speakers in the Odyssey", ylabel="number of lines")

# save to an image file
plot_by_line.figure.savefig("Odspeeches_count_by_line.png")