# Retrieving CTS Passages using MyCapytain

In this example, we'll use [MyCapytain](https://github.com/Capitains/MyCapytain) to retrieve the text of some speeches from a remote CTS server.

## Scenario

Let's say we want to know how many words Achilles speaks to each of his interlocutors. We can search the DICES database for the relevant speeches using the API. Then, to count the number of words, we'll have to retrieve the text of the speeches themselves. Since the DICES *Speech* objects include CTS URNs, we can request the passages from a remote server.

## Preliminaries

### Install the client library

If you don't have the DICES client library, you can install it with **pip**:
```
pip install git+https://github.com/cwf2/dices-client.git
```

### The DICES API

We have to provide an endpoint for the DICES API. Here, we're using the Heroku test instance, so it can be a little slow.

We can optionally provide a CTS server for text services. We'll use the [Perseids CTS server](https://cts.perseids.org/).

```python
from dicesapi import DicesAPI
dices = 'https://fierce-ravine-99183.herokuapp.com/api'
perseus = 'http://cts.perseids.org/api/cts/'

api = DicesAPI(dices_api=dices, cts_api=perseus)
```

For work in Jupyter, we can add a little razzle-dazzle...

```python
# this is just to provide progress bars in Jupyter
from dicesapi.jupyter import NotebookPBar
api._ProgressClass = NotebookPBar
```

### Matplotlib for figures

Finally, we'll import pyplot for drawing a simple bar graph of the results. Note the Jupyter magic `%matplotlib inline`, which displays the figure right in the notebook. Some people prefer `%matplotlib notebook`, which makes the figures interactive (you can zoom and pan, for example).

```python
from matplotlib import pyplot
%matplotlib inline
```

## Running the experiment

Here's the code for calculating Achilles' speech lengths by addressee.

### First, download the speeches

Using the hand-rolled DICES API code, we can search speeches using keyword arguments like `spkr_name`. For now, JSON results from the API are paged, so if your search has a lot of results, you may have to wait for several pages to download. I've added a progress bar widget because I get impatient.
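
Behind a call like `getSpeeches`, the client has to walk through those pages one at a time. Here's a minimal sketch of that kind of loop, assuming the API returns pages shaped like Django REST Framework's default pagination (a `results` list plus a `next` URL that is `None` on the last page). The `iter_results` helper and its `fetch` parameter are hypothetical, not part of `dicesapi`; passing `fetch` in keeps the sketch testable without a network connection.

```python
from typing import Callable, Iterator


def iter_results(fetch: Callable[[str], dict], url: str) -> Iterator[dict]:
    """Yield every record from a paged JSON API.

    Hypothetical helper: assumes each page is a dict with a 'results'
    list and a 'next' URL (None on the last page).
    """
    while url is not None:
        page = fetch(url)           # download one page of JSON
        yield from page["results"]  # hand back its records one by one
        url = page.get("next")      # follow the link to the next page, if any
```

In real use, `fetch` might be something like `lambda u: requests.get(u).json()`; the waiting you see on the progress bar is one such request per page.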

<div class="alert alert-warning" style="margin:1em 2em">
    <p><strong>NB:</strong> Because the server is on Heroku's free tier, it may take a minute to wake up when you first run a search. Subsequent searches are usually faster.</p>
</div>

```python
speeches = api.getSpeeches(spkr_name='Achilles', progress=True)
```

### Count the words for each speech

This involves retrieving each passage from the CTS server and extracting the plaintext of its contents. When I wrote the `.getURN` method for Speech objects, I appended the loci to the work URN, but the resolver actually wants them separate, so the URN has to be split back into work and loci strings before MyCapytain's `getTextualNode` can retrieve the passage. In the loop below, each Speech object's `.getCTS()` method hands back the retrieved passage. MyCapytain gives us CTS Passage objects, which have a handy `.text` attribute.
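
The split itself is just string surgery, because a passage-level CTS URN puts the passage reference after the last colon (`urn:cts:namespace:work:passage`). A minimal sketch, with a hypothetical `split_urn` helper and an Iliad URN chosen purely for illustration:

```python
def split_urn(urn: str) -> tuple[str, str]:
    """Split a passage-level CTS URN into (work URN, passage reference).

    Hypothetical helper: assumes the form urn:cts:namespace:work:passage,
    so the passage reference is everything after the final colon.
    """
    parts = urn.split(":")
    if len(parts) != 5 or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a passage-level CTS URN: {urn!r}")
    return ":".join(parts[:4]), parts[4]


work, loci = split_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1-1.7")
print(work)  # urn:cts:greekLit:tlg0012.tlg001.perseus-grc2
print(loci)  # 1.1-1.7
```

These are the two separate strings the resolver expects: the work URN identifies the text, and the loci pick out the passage within it.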

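**NB:** This is not how you'd really want to count words if you were serious: splitting on whitespace counts any punctuation mark surrounded by spaces as a word, for one thing. A proper tokenizer, such as those in [CLTK](http://cltk.org/), would be much more sophisticated.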
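
For a quick improvement short of a real tokenizer, you could count runs of word characters instead of whitespace-separated chunks, so that stray punctuation isn't counted. This is a rough heuristic, not what CLTK does, and `count_words` is a hypothetical name. Unicode NFC normalization is applied first so that most Greek letters written with combining accents end up as single word characters.

```python
import re
import unicodedata

# Hypothetical helper: a slightly less naive word count than str.split().
# Python's \w is Unicode-aware, so it matches Greek letters; NFC
# normalization first merges base letters with combining accents, so most
# accented words aren't broken in two.
WORD = re.compile(r"\w+")


def count_words(text: str) -> int:
    """Count runs of word characters, ignoring punctuation-only tokens."""
    return len(WORD.findall(unicodedata.normalize("NFC", text)))


line = "ἀλλ’ ἄγε , δεῦρο"
print(len(line.split()))  # 4: the lone comma counts as a "word"
print(count_words(line))  # 3
```

To try it in the loop, swap `len(plaintext.split())` for `count_words(plaintext)`. Elided forms like `ἀλλ’` still count as one word each, which is what we want here.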

```python
# initialize our counter
count = {}

# iterate over all speeches
for speech in speeches:

    # retrieve the passage from the remote library
    cts_passage = speech.getCTS()

    # extract the text and split into words
    plaintext = cts_passage.text
    n_words = len(plaintext.split())

    # tally the word counts for each addressee
    for addressee in speech.addr:
        name = addressee.name
        if name == 'Achilles':
            name = 'himself'
        count[name] = count.get(name, 0) + n_words
```
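
Note that a speech with several addressees adds its full word count to each of them, so the totals can add up to more than the number of words Achilles actually speaks. If you'd like to check the tallying logic without touching either server, here's a sketch that runs the same loop over stand-in objects. `FakeSpeech`, `tally`, and the "Addressee A/B" names are hypothetical, made up for the test; they mimic only the attributes the loop uses (`.addr`, `.name`, `.getCTS()`, and the passage's `.text`), and `collections.Counter` replaces the plain dict.

```python
from collections import Counter
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class FakeSpeech:
    """Hypothetical stand-in for a DICES Speech: just the parts the loop uses."""
    text: str
    addr: list  # stand-ins for addressee objects, each with a .name

    def getCTS(self):
        # mimic a CTS passage object, which exposes its plaintext as .text
        return SimpleNamespace(text=self.text)


def tally(speeches, speaker="Achilles"):
    """Sum word counts per addressee, labelling self-address as 'himself'."""
    count = Counter()
    for speech in speeches:
        n_words = len(speech.getCTS().text.split())
        for addressee in speech.addr:
            name = "himself" if addressee.name == speaker else addressee.name
            count[name] += n_words
    return count


a = SimpleNamespace(name="Addressee A")
b = SimpleNamespace(name="Addressee B")
me = SimpleNamespace(name="Achilles")

demo = [
    FakeSpeech("one two three", [a]),
    FakeSpeech("four five", [a, b]),
    FakeSpeech("six", [me]),
]
print(dict(tally(demo)))  # {'Addressee A': 5, 'Addressee B': 2, 'himself': 1}
```

The demo speeches contain six words in total, but the tallies sum to eight because the second speech is credited to both addressees.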

### Examine the results

🤔 Let's see whether it worked!

```python
for name in sorted(count):
    print(name, count[name])
```
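
Alphabetical order makes it easy to find a name, but to see who gets most of Achilles' words you can rank the addressees by total and show each one's share. A small sketch; `ranked` is a hypothetical helper, and the toy dict stands in for the real `count`. Because speeches with several addressees are counted once per addressee, the percentages are shares of the tallied total, not of the words Achilles actually speaks.

```python
def ranked(count: dict) -> list:
    """Return (name, words, percent of tallied total) tuples, largest first."""
    total = sum(count.values())
    rows = sorted(count.items(), key=lambda item: item[1], reverse=True)
    return [(name, n, 100 * n / total) for name, n in rows]


toy = {"Addressee A": 10, "Addressee B": 30}  # placeholder; use `count` in the notebook
for name, n, pct in ranked(toy):
    print(f"{name:<15} {n:>6} {pct:5.1f}%")
```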

### Make a simple graph with pyplot

Seems good. Let's visualize it with a simple bar chart.

```python
# data for the graph
names = sorted(count)
y_pos = range(len(names))
bars = [count[name] for name in names]

# create a new figure
fig, ax = pyplot.subplots(figsize=(8, 8))

# draw the bars
ax.barh(y_pos, bars, align='center')

# annotate the graph
ax.set_yticks(y_pos)
ax.set_yticklabels(names)
ax.invert_yaxis()  # labels read top-to-bottom
ax.set_xlabel('Number of Words')
ax.set_ylabel('Addressee')
ax.set_title("Length of Achilles' speeches")

# render the figure (also keeps the cell from echoing the title's Text object)
pyplot.show()
```