<a href="https://colab.research.google.com/github/cwf2/style_2025/blob/main/Example%201a_%20Achilles_%20speeches.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install DICES client software

This step is only necessary once on most machines, but because Google Colab runs this notebook on a fresh virtual machine every time, we always need to install DICES as the first step.

In [None]:
!pip install -q git+https://github.com/cwf2/dices-client

### Import statements

This tells Python which ancillary functions we want to use in this notebook.

In [None]:
from dicesapi import DicesAPI
from dicesapi.text import CtsAPI
import pandas as pd

### Initialize connection to external sources

This creates connections to the speech database and to the digital library.

In [None]:
# DICES database
api = DicesAPI(logdetail = 0)

# Perseus Digital Library
cts = CtsAPI(dices_api = api)

### Get some speeches

`getSpeeches()` is the basic search function for retrieving speeches from DICES that match specific parameters. Here we ask for every speech whose speaker is named Achilles.

In [None]:
speeches = api.getSpeeches(spkr_name="Achilles")
n = len(speeches)
print(f"Retrieved {n} speeches")

### Print out some basic information about the speeches

This loops over each speech in turn and prints out its attributes, separated by a tab.


In [None]:
for speech in speeches:
    print(speech.id, speech.author.name, speech.work.title, speech.l_fi, speech.l_la, speech.getSpkrString(), speech.getAddrString(), sep="\t")

### Extract book and line numbers from the loci

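The loci are recorded as *strings*, that is, as sequences of characters rather than as numeric data. Here we split each locus into two parts at the "." character: a book number and a line number. The parts are still strings at this point; when we need to do arithmetic with them, as in the table-building step further down, we convert each part to a number (integer or `int`) with `int()`.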

For our purposes, we don't need to print the names of the speakers and addressees, so we'll leave that out.

In [None]:
for speech in speeches:
    loc_first = speech.l_fi
    loc_last = speech.l_la

    book_first, line_first = loc_first.split(".")
    book_last, line_last = loc_last.split(".")

    print(speech.id, speech.author.name, speech.work.title, book_first, line_first, line_last, sep="\t")
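
To see the splitting and conversion on their own, here is a minimal sketch that runs without the DICES connection. The helper `parse_locus` and the sample loci are made up for illustration and are not part of the DICES client. The sketch assumes that every locus contains exactly one "." and that both parts are plain numbers.

```python
def parse_locus(locus):
    """Split a locus such as "1.59" into (book, line) as integers."""
    book, line = locus.split(".")
    return int(book), int(line)


# sample loci chosen for illustration only
book_first, line_first = parse_locus("1.59")
book_last, line_last = parse_locus("1.67")

# once the parts are integers we can do arithmetic with them
print(book_first, line_first, line_last, sep="\t")
print("length in lines:", line_last - line_first + 1)

# strings compare alphabetically, integers numerically
print("9" < "10", 9 < 10)
```

The last line shows why the conversion matters: as strings, "9" sorts after "10", so sorting or subtracting line numbers only behaves as expected once they are integers.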

### Putting it all together

### Make a table

Python can work with tabular data like a spreadsheet with the help of the ancillary package Pandas. Here we turn the same data into a Pandas `DataFrame`.

In [None]:
# an empty list to hold the rows
rows = list()

# iterate over the speeches
for speech in speeches:
    # separate book and line numbers
    book_first, line_first = speech.l_fi.split(".")
    book_last, line_last = speech.l_la.split(".")

    # calculate length of speech in lines
    # (assumes the speech begins and ends in the same book)
    nlines = int(line_last) - int(line_first) + 1

    # create a new row, labelling all the data values
    row = {
        "id": speech.id,
        "author": speech.author.name,
        "work": speech.work.title,
        "book": int(book_first),
        "first_line": int(line_first),
        "last_line": int(line_last),
        "num_lines": nlines,
        }

    # add the row to the list
    rows.append(row)

# make the table
table = pd.DataFrame(rows)

# write the table to a file for import to Excel
table.to_csv("speeches.tsv", sep="\t", index=False)

# display the table
display(table)

### Summarize data

Just as with a pivot table (Dutch: draaitabel) in Excel, we can summarize tabular data by group. In this example, we'll count how many speeches, and how many lines in total, are attributed to Achilles in each work in our corpus.

We need to specify which columns in the original table we want to use:
- The rows (or "index") of our summary table will come from `work`: `groupby("work")` gives each work one row in the new table.
- The `speeches` column is derived from `id`: we count how many speech IDs fall within each work.
- The `lines` column is derived from `num_lines`: we add up the lengths of all of Achilles' speeches in each work.

We also need to specify how we want to summarize each column. We tell Pandas this inside `.agg()`, where each new column is defined by a pair: the source column and the name of an aggregation function (`"count"` or `"sum"`). Finally, we sort the result so that the work with the most speeches comes first.

In [None]:
Aspeeches = (
    table
    .groupby("work")
    .agg(
        speeches=("id", "count"),
        lines = ("num_lines", "sum"),
    )
    .sort_values("speeches", ascending=False)
)
display(Aspeeches)
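
For readers who know pivot tables from Excel, the same kind of summary can also be written with the Pandas `pivot_table` method and its `aggfunc` ("aggregation function") parameter. The sketch below uses a small made-up table: the works, ids and line counts are invented for illustration and are not DICES results. It checks that both approaches give the same numbers.

```python
import pandas as pd

# toy data invented for illustration; not real DICES output
toy = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "work": ["Work A", "Work A", "Work A", "Work B"],
    "num_lines": [9, 5, 12, 20],
})

# approach used in this notebook: group, then aggregate into named columns
by_group = toy.groupby("work").agg(
    speeches=("id", "count"),
    lines=("num_lines", "sum"),
)

# pivot-table approach: one row per work, one aggregation function per column
by_pivot = toy.pivot_table(
    index="work",
    values=["id", "num_lines"],
    aggfunc={"id": "count", "num_lines": "sum"},
)

print(by_group)
print(by_pivot)

# the two summaries hold the same numbers under different column names
same_counts = (by_group["speeches"] == by_pivot["id"]).all()
same_lines = (by_group["lines"] == by_pivot["num_lines"]).all()
print(same_counts, same_lines)
```

The `groupby` version lets us name the result columns (`speeches`, `lines`) directly, which is convenient for the graphs that follow; the `pivot_table` version keeps the original column names.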

### Make a graph

Pandas has some basic visualization functions built in. Let's turn the summary table above into a bar graph.

In [None]:
# generate a bar graph with one bar per work
plot_by_work = Aspeeches["speeches"].plot.bar(title="Speeches by Achilles", ylabel="number of speeches")

# save to an image file
plot_by_work.figure.savefig("speech_count_by_Aspeeches.png")

In [None]:
# generate a bar graph with one bar per work
plot_by_work = Aspeeches["lines"].plot.bar(title="Lines spoken by Achilles", ylabel="total lines")

# save to an image file
plot_by_work.figure.savefig("line_count_by_Aspeeches.png")