# Sequencing topics in introductory Latin courses: 1/2

## Frequency of verb moods in Hyginus

This notebook is one of two showing how observations quoted in Daniel Libatique and Dominic Machado, "*Lector Intende, Laetaberis*: A Research-Based Approach to Introductory Latin" (currently in preparation) were computed. The two notebooks are:

1. Frequency of moods of finite verbs in Hyginus (this notebook)
2. Frequency of active and passive voice in Livy

The code used here is also available in the `scripts` directory of [this GitHub repository](https://github.com/lingualatina/analysis/).

## Summary of conclusions

- approximately 3/4 of finite verbs in Hyginus are in the indicative
- approximately 1/4 are in the subjunctive
- less than 1/2 of 1% of finite verb forms are in the imperative

## 0. Configure Jupyter notebook to find code libraries

Beginning with section **1**, all code is generic Scala that you can run in any environment that supports Scala. In particular, you can run the same code with `sbt console` from the [Lingua Latina analysis GitHub repository](https://github.com/lingualatina/analysis/).

The following two cells configure this notebook to find code libraries we will use in analyzing a parsed text of Hyginus.


In [None]:
// set up notebook to find repository
val personalRepo = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(personalRepo)

In [None]:
// ivy imports
import $ivy.`edu.holycross.shot::latincorpus:5.0.0`

## 1. Load a parsed corpus

We will load a text of Hyginus that has been largely parsed morphologically. Out of 27,759 lexical tokens ("words"), more than 19,000 have been parsed; most of the unparsed tokens are proper names that are not in our current morphological lexicon. Of those 19,000+ parsed tokens, 3,628 are finite verb forms.

For measuring the relative frequency of finite verb moods, this sample is more than adequate.


In [None]:
import edu.holycross.shot.latincorpus._
val hyginusUrl = "https://raw.githubusercontent.com/LinguaLatina/analysis/master/data/hyginus/hyginus-latc.cex"
val hyginus = LatinCorpus.fromUrl(hyginusUrl)

In [None]:
println("Total tokens / lexical tokens / analyzed:")
println(s"${hyginus.tokens.size} total tokens / ${hyginus.lexicalTokens.size} lexical tokens / ${hyginus.analyzed.size} morphologically analyzed")

println("\nPossible forms / tokens analyzed")
println(s"${hyginus.allAnalyses.size} / ${hyginus.analyzed.size}")

println("\nTokens analyzed / finite verb tokens")
println(s"${hyginus.analyzed.size} tokens / ${hyginus.verbs.size} finite verb tokens")

## 2. Isolate forms analyzed to a single mood

Some finite verb forms can be analyzed as belonging to more than one mood: in Hyginus, these make up fewer than 10% of finite verb tokens. We'll use the remaining 3,327 finite verb forms to compute the frequency of each mood.
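
The "fewer than 10%" figure can be checked with a little arithmetic. The sketch below uses only the two counts quoted in the text (3628 finite verb forms, of which 3327 are analyzed to a single mood). The value names are ours, chosen for illustration.

```scala
// Counts quoted in the text above
val finiteVerbCount = 3628
val singleMoodCount = 3327

// Tokens whose analyses span more than one mood
val multiMoodCount = finiteVerbCount - singleMoodCount

// Share of finite verb tokens that are ambiguous as to mood
val multiMoodPercent = multiMoodCount.toDouble / finiteVerbCount * 100

println(f"$multiMoodCount ambiguous tokens, $multiMoodPercent%.1f%% of finite verbs")
```

That leaves 301 ambiguous tokens, a little over 8% of the finite verbs.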

In [None]:
import edu.holycross.shot.tabulae._
// True if all analyses are in the same mood
def uniformMood(analyses: Vector[LemmatizedForm]): Boolean = {
  val distinctMoods = analyses.map(a => a.verbMood).distinct
  distinctMoods.size == 1
}

val pureMood = hyginus.verbs.filter(tkn => uniformMood(tkn.analyses))
val mixedMood = hyginus.verbs.filterNot(tkn => uniformMood(tkn.analyses))

println("Single mood / Multiple moods")
println(s"${pureMood.size} / ${mixedMood.size}")
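
To see what this filter does without loading the corpus, here is a minimal self-contained sketch. `ToyForm` and `ToyToken` are hypothetical stand-ins for the library's classes, and moods are represented as plain strings. Only the test for a single distinct mood mirrors the cell above.

```scala
// Hypothetical stand-ins for the library's classes (assumptions, for illustration only)
case class ToyForm(lemma: String, verbMood: Option[String])
case class ToyToken(text: String, analyses: Vector[ToyForm])

// Same test as uniformMood: true if all analyses share a single mood
def toyUniformMood(analyses: Vector[ToyForm]): Boolean =
  analyses.map(_.verbMood).distinct.size == 1

val toyTokens = Vector(
  // amat: indicative only
  ToyToken("amat", Vector(ToyForm("amo", Some("indicative")))),
  // regam: future indicative or present subjunctive
  ToyToken("regam", Vector(
    ToyForm("rego", Some("indicative")),
    ToyForm("rego", Some("subjunctive")))),
  // amet: subjunctive only
  ToyToken("amet", Vector(ToyForm("amo", Some("subjunctive"))))
)

// partition splits the tokens into (single mood, multiple moods) in one pass
val (toyPure, toyMixed) = toyTokens.partition(tkn => toyUniformMood(tkn.analyses))

println("Single mood: " + toyPure.map(_.text).mkString(", "))
println("Multiple moods: " + toyMixed.map(_.text).mkString(", "))
```

Using `partition` here is a design choice of the sketch: it yields the same two collections as the paired `filter` and `filterNot` calls above.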

## 3. Compute frequency of each mood

The following cell extracts the mood value from each token analyzed to a single mood, groups the results by mood, and counts the size of each group.

The final cell of the notebook then converts these counts to percentages of the total. Note that `toInt` truncates rather than rounds, so each percentage is displayed as a whole number, rounded down.


In [None]:
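// Every analysis of a token in pureMood has the same mood, so the first one suffices
val moods = pureMood.map(tkn => tkn.analyses.head.verbMood.get)
// Group identical mood values together
val groupedByMood = moods.groupBy(mood => mood)
// Pair each mood with the number of tokens in its group
val frequencies = groupedByMood.toVector.map{ case (k,v) => (k, v.size) }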
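
The group-and-count idiom can be tried on plain strings. This is a sketch only: the mood labels below are invented for illustration and are not drawn from Hyginus.

```scala
// Invented sample of mood labels (not corpus data)
val sampleMoods = Vector("indicative", "subjunctive", "indicative", "indicative", "imperative")

// Group identical labels together, count each group, then sort by descending count
val sampleFrequencies = sampleMoods
  .groupBy(identity)
  .toVector
  .map { case (mood, occurrences) => (mood, occurrences.size) }
  .sortBy { case (_, count) => -count }

println(sampleFrequencies.map { case (mood, count) => s"$mood: $count" }.mkString("\n"))
```

The most frequent label comes first. The relative order of labels with equal counts is not guaranteed, since `groupBy` returns an unordered `Map`.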

### QED

In [None]:
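// Total number of single-mood tokens, as a Double to avoid integer division
val total = frequencies.map(_._2).sum.toDouble
// Percentage for each mood, highest first; toInt truncates, so a share below 1% displays as 0%
val percents = frequencies.map(f => (f._1, ((f._2 / total) * 100).toInt)).sortBy(pct => pct._2).reverse
println(percents.map{ case (mood, pct) => s"${mood}: ${pct}%" }.mkString("\n"))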
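
Because `toInt` truncates, a mood that accounts for less than 1% of forms is printed as 0%. One possible alternative, sketched below, keeps one decimal place. The counts here are invented for illustration and are not the notebook's results. The names `toyCounts`, `truncated` and `oneDecimal` are ours.

```scala
// Invented counts for illustration only (not results from Hyginus)
val toyCounts = Vector(("indicative", 750), ("subjunctive", 246), ("imperative", 4))
val toyTotal = toyCounts.map(_._2).sum.toDouble

// Truncating with toInt, as in the cell above
val truncated = toyCounts.map { case (mood, n) => (mood, (n / toyTotal * 100).toInt) }

// Keeping one decimal place: scale to tenths of a percent, round, scale back
val oneDecimal = toyCounts.map { case (mood, n) => (mood, math.round(n / toyTotal * 1000) / 10.0) }

println(truncated.map { case (mood, pct) => s"$mood: $pct%" }.mkString("\n"))
println(oneDecimal.map { case (mood, pct) => s"$mood: $pct%" }.mkString("\n"))
```

With these toy counts, the imperative is shown as 0% after truncation but as 0.4% with one decimal place.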