# Thucydides: Text & Translation

## Configuring CITE libraries for almond kernel

First, we make a Maven repository that hosts the CITE libraries available to your almond kernel.

In [13]:
val myBT = coursierapi.MavenRepository.of("https://terracotta.hpcc.uh.edu/nexus/repository/maven-releases/")
interp.repositories() ++= Seq(myBT)

myBT: coursierapi.MavenRepository = MavenRepository(https://terracotta.hpcc.uh.edu/nexus/repository/maven-releases/)

Next, we bring in specific libraries from the new repository using almond's `$ivy` magic:

In [14]:

import $ivy.`edu.holycross.shot.cite::xcite:4.3.1`

import $ivy.`edu.holycross.shot::ohco2:10.20.5`
import $ivy.`edu.holycross.shot::scm:7.4.1`
import $ivy.`edu.holycross.shot::dse:7.1.4`
import $ivy.`edu.holycross.shot::citebinaryimage:3.2.1`
import $ivy.`edu.holycross.shot::citeobj:7.5.2`
import $ivy.`edu.holycross.shot::citerelations:2.7.1`
import $ivy.`edu.holycross.shot::cex:6.5.1`
import $ivy.`edu.holycross.shot::greek:9.1.0`

## Imports

From this point on, your notebook consists of completely generic Scala, with the CITE Libraries available to use.

In [15]:
// Import some CITE libraries
import edu.holycross.shot.cite._
import edu.holycross.shot.ohco2._
import edu.holycross.shot.scm._
import edu.holycross.shot.citeobj._
import edu.holycross.shot.citerelation._
import edu.holycross.shot.dse._
import edu.holycross.shot.citebinaryimage._
import edu.holycross.shot.greek._

import almond.display.UpdatableDisplay
import almond.interpreter.api.DisplayData.ContentType
import almond.interpreter.api.{DisplayData, OutputHandler}

import java.io.File
import java.io.PrintWriter

import scala.io.Source

### Set Up Plotting

In [16]:
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`
import plotly._, plotly.element._, plotly.layout._, plotly.Almond._

// if you want to have the plots available without an internet connection:
// init(offline=true)

// restrict the output height to avoid scrolling in output cells
repl.pprinter() = repl.pprinter().copy(defaultHeight = 3)

## Some Utilities

In [17]:
// Print a value in a readable form, depending on what kind of value it is
def showMe(v: Any): Unit = {
  v match {
    // One "count<TAB>string" line per histogram entry
    case sh: StringHistogram =>
      for (h <- sh.histogram) println(s"${h.count}\t${h.s}")
    // One "passage<TAB><TAB>text" line per citable node
    case c: Corpus =>
      for (n <- c.nodes) println(s"${n.urn.passageComponent}\t\t${n.text}")
    // Any other collection (Vectors included): one element per line
    case it: Iterable[_] => println(s"""\n----\n${it.mkString("\n")}\n----\n""")
    case _ => println(s"\n-----\n${v}\n----\n")
  }
}

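// Write a string to a file; filePath, if given, should end with a path separator
def saveString(s: String, filePath: String = "", fileName: String = "temp.txt"): Unit = {
  val writer = new PrintWriter(new File(s"${filePath}${fileName}"))
  // Close the writer even if the write fails
  try writer.write(s) finally writer.close()
}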

val splitters:String = """[\[\])(·⸁.,:·;;   "?·!–—⸂⸃]"""

def splitWithSplitter(text:String, splitters:String = """[\[\]··⸁.; "?!–—⸂⸃]"""):Vector[String] = {
	val regexWithSplitter = s"(?<=${splitters})"
	text.split(regexWithSplitter).toVector.filter(_.size > 0)
}

// A function to make TSV (tab-separated) data out of a Corpus or Vector of Nodes

def tsvFromCorpus(nv: Vector[CitableNode]): String = {
    val headerLine: String = "citation\ttext"
    val data: Vector[String] = nv.map( n => {
        s"${n.urn}\t${n.text}"
    })
    val allData: Vector[String] = headerLine +: data
    allData.mkString("\n")
}

def tsvFromCorpus(c: Corpus): String = {
    val n: Vector[CitableNode] = c.nodes
    tsvFromCorpus(n)
}



defined function showMe
defined function saveString
splitters: String = "[\\[\\])(\u00b7\u2e01.,:\u00b7;;   \"?\u00b7!\u2013\u2014\u2e02\u2e03]"
defined function splitWithSplitter
defined function tsvFromCorpus
defined function tsvFromCorpus
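
`splitWithSplitter` relies on a zero-width lookbehind, `(?<=...)`, so the text is split *after* each delimiter and the delimiter stays attached to the piece before it. Below is a minimal standard-library sketch of that idea. The object name `SplitSketch`, the function `splitKeeping`, and its short default delimiter set are assumptions made for illustration; they are not part of the notebook or the CITE libraries.

```scala
object SplitSketch {
  // Zero-width lookbehind: split *after* any delimiter character, so the
  // delimiter stays attached to the end of the preceding piece.
  // The default delimiter class is a hypothetical, shortened stand-in for
  // the notebook's `splitters` value.
  def splitKeeping(text: String, delimiters: String = """[.,; ]"""): Vector[String] =
    text.split(s"(?<=${delimiters})").toVector.filter(_.nonEmpty)
}
```

For example, `SplitSketch.splitKeeping("war, then peace.")` yields `Vector("war,", " ", "then ", "peace.")`. Because the space is itself a delimiter, a space that directly follows another delimiter becomes a piece of its own.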

## Load Tokenized Data into a CITE Library

In [18]:
//val filePath = s"https://raw.githubusercontent.com/homermultitext/hmt-archive/master/releases-cex/hmt-2020f.cex"
//val lib: CiteLibrary = CiteLibrarySource.fromUrl(filePath)

val filePath = s"all_thucydides_tokenized.cex"
val lib: CiteLibrary = CiteLibrarySource.fromFile(filePath)

Dec 01, 2021 9:15:50 AM wvlet.log.Logger log
INFO: Building text repo from cex ...
Dec 01, 2021 9:15:58 AM wvlet.log.Logger log
INFO: Building collection repo from cex ...
Dec 01, 2021 9:15:59 AM wvlet.log.Logger log
INFO: Building relations from cex ...
Dec 01, 2021 9:15:59 AM wvlet.log.Logger log
INFO: All library components built.


filePath: String = "all_thucydides_tokenized.cex"
lib: CiteLibrary = CiteLibrary(
  "CEX Library created by CEXWriter",
...

### Get parts of the CITE Library in convenient form:

In [19]:
lazy val tr: TextRepository = lib.textRepository.get
lazy val corp: Corpus = tr.corpus
lazy val cat: Catalog = tr.catalog
lazy val colls: CiteCollectionRepository = lib.collectionRepository.get
lazy val rels: CiteRelationSet = lib.relationSet.get
lazy val myDseVec: DseVector = DseVector.fromCiteLibrary(lib)

### Get Greek and English into separate Corpora

In [20]:
val thucGrkURN: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:")
val thucEngURN: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:")
val thucEngFURN: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:")

val thucGrk: Corpus = corp ~~ thucGrkURN
val thucEng: Corpus = corp ~~ thucEngURN 
val thucFU: Corpus = corp ~~ thucEngFURN

thucGrkURN: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:")
thucEngURN: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:")
thucEngFURN: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:")
thucGrk: Corpus = Corpus(
  Vector(
...
thucEng: Corpus = Corpus(
  Vector(
...
thucFU: Corpus = Corpus(
  Vector(
...

# Counting Games

### Count Word-Tokens

In [21]:
val engTokens: Int = thucEng.size
val grkTokens: Int = thucGrk.size
val fuTokens: Int = thucFU.size

println(s"There are ${grkTokens} tokens in Jones' Greek edition of Thucydides.")
println(s"There are ${engTokens} tokens in the English translation of Thucydides.")
println(s"There are ${fuTokens} tokens in our class’s English translation of Thucydides.")

There are 150173 tokens in Jones' Greek edition of Thucydides.
There are 204771 tokens in the English translation of Thucydides.
There are 1047 tokens in our class’s English translation of Thucydides.


engTokens: Int = 204771
grkTokens: Int = 150173
fuTokens: Int = 1047

### Customized Corpora

Each translator produced a specific translation of a specific passage:

- Jackson:
   - Greek: `urn:cts:greekLit:tlg0003.tlg001.grc.token:2.47.0-2.47.93`
   - myEnglish: `urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.47.0-2.47.112`
   - Jowett: `urn:cts:greekLit:tlg0003.tlg001.eng.token:2.47.0-2.47.146`
- CWB:
   - Greek: `urn:cts:greekLit:tlg0003.tlg001.grc.token:2.47.95-2.48.61`
   - myEnglish: `urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.47.148-2.48.88`
   - Jowett: `urn:cts:greekLit:tlg0003.tlg001.eng.token:2.47.114-2.48.61`
- Sabria:
   - Greek: `urn:cts:greekLit:tlg0003.tlg001.grc.token:2.48.63-2.48.142`
   - myEnglish: `urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.48.90-2.48.218`
   - Jowett: `urn:cts:greekLit:tlg0003.tlg001.eng.token:2.48.64-2.48.161`
- Mallory:
   - Greek: `urn:cts:greekLit:tlg0003.tlg001.grc.token:2.49.0-2.49.114`
   - myEnglish: `urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.49.0-2.49.162`
   - Jowett: `urn:cts:greekLit:tlg0003.tlg001.eng.token:2.49.0-2.49.125`
- Ting:
   - Greek: `urn:cts:greekLit:tlg0003.tlg001.grc.token:2.49.116-2.49.185`
   - myEnglish: `urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.49.164-2.49.269`
   - Jowett: `urn:cts:greekLit:tlg0003.tlg001.eng.token:2.49.127-2.49.225`
- Sarah:
   - Greek: `urn:cts:greekLit:tlg0003.tlg001.grc.token:2.49.187-2.49.285`
   - myEnglish: `urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.49.271-2.49.399`
   - Jowett: `urn:cts:greekLit:tlg0003.tlg001.eng.token:2.49.227-2.49.364`
- Emma:
   - Greek: `urn:cts:greekLit:tlg0003.tlg001.grc.token:2.49.287-2.49.355`
   - myEnglish: `urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.49.401-2.49.500`
   - Jowett: `urn:cts:greekLit:tlg0003.tlg001.eng.token:2.49.366-2.49.452`
- Callie:
   - Greek: `urn:cts:greekLit:tlg0003.tlg001.grc.token:2.50.0-2.50.77`
   - myEnglish: `urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.50.0-2.50.124`
   - Jowett: `urn:cts:greekLit:tlg0003.tlg001.eng.token:2.50.0-2.50.119`
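
The URNs listed above share one shape: the `urn:cts:` prefix, a namespace (`greekLit`), a work component (text group, work, version, exemplar, separated by periods), and a passage component, which here is a range joined by a hyphen. The sketch below takes such a string apart using only the standard library. It is a plain-string illustration, not the `CtsUrn` class from the xcite library; the names `CtsUrnSketch`, `UrnParts`, `parts`, and `rangeEnds` are invented for this example.

```scala
object CtsUrnSketch {
  // A plain-string stand-in for illustration only; the notebook itself uses
  // the CtsUrn class from the xcite library, which is not reproduced here.
  final case class UrnParts(namespace: String, work: String, passage: String)

  // Split "urn:cts:<namespace>:<work component>:<passage>" on colons.
  // The limit -1 keeps a trailing empty passage (a URN ending in ":").
  def parts(urn: String): Option[UrnParts] =
    urn.split(":", -1) match {
      case Array("urn", "cts", ns, work, passage) => Some(UrnParts(ns, work, passage))
      case _ => None
    }

  // A passage such as "2.47.0-2.47.93" is a range with two ends;
  // a passage without "-" is treated here as a single reference.
  def rangeEnds(passage: String): Option[(String, String)] =
    passage.split("-") match {
      case Array(from, to) => Some((from, to))
      case _ => None
    }
}
```

Applied to Jackson's Greek passage, `CtsUrnSketch.parts` returns the namespace `greekLit`, the work component `tlg0003.tlg001.grc.token`, and the passage `2.47.0-2.47.93`; `rangeEnds` then splits that passage into `2.47.0` and `2.47.93`. In these tokenized editions each end appears to read as book, chapter, and token position.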

### Some Values You Can Use

Each member of the class translated a specific passage. Below are values that capture URN citations for each member's passage, the Greek text, their own translation, and Jowett's translation.

In [26]:
var jacksonGreek:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:2.47.0-2.47.93")
var jacksonEnglish:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.47.0-2.47.112")
var jacksonJowett:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:2.47.0-2.47.146")

var cwbGreek:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:2.47.95-2.48.61")
var cwbEnglish:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.47.148-2.48.88")
var cwbJowett:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:2.47.114-2.48.61")

var sabriaGreek:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:2.48.63-2.48.142")
var sabriaEnglish:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.48.90-2.48.218")
var sabriaJowett:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:2.48.64-2.48.161")

var malloryGreek:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:2.49.0-2.49.114")
var malloryEnglish:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.49.0-2.49.162")
var malloryJowett:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:2.49.0-2.49.125")

var tingGreek:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:2.49.116-2.49.185")
var tingEnglish:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.49.164-2.49.269")
var tingJowett:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:2.49.127-2.49.225")

var sarahGreek:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:2.49.187-2.49.285")
var sarahEnglish:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.49.271-2.49.399")
var sarahJowett:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:2.49.227-2.49.364")

var emmaGreek:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:2.49.287-2.49.355")
var emmaEnglish:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.49.401-2.49.500")
var emmaJowett:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:2.49.366-2.49.452")

var callieGreek:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:2.50.0-2.50.77")
var callieEnglish:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.50.0-2.50.124")
var callieJowett:CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:2.50.0-2.50.119")

val allGreek: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.grc.token:2.47-2.50")
val allEnglish: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.engFu2021.token:2.47-2.50")
val allJowett: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0003.tlg001.eng.token:2.47-2.50")

## Count Your Passage!

First we have a little function that:

1. Selects out a passage from the whole digital library.
2. Returns how many citable passages (tokens) are in it.



In [27]:
// Lets us send in a URN and get a count
def countBySection(u: CtsUrn, c: Corpus = corp): Double = {
    (c ~~ u).size.toDouble
}

defined function countBySection

### Get the Numbers

Now we can get counts for each version of the text. Then we can find the ratio of English words to Greek words, which is a simple fraction. If the answer is 1.0, then there is one English word for each Greek word. If the answer is less than 1.0, then it takes fewer English words to translate the Greek; if it is greater than 1.0, it takes more.

The lower the answer, the more "efficient" the translation is.

**But of course "efficiency" is probably not a primary value.**

In [29]:
// Count each section
var howManyGreekTokens: Double = countBySection(cwbGreek)
var howManyEnglishTokens: Double = countBySection(cwbEnglish)
var howManyJowettTokens: Double = countBySection(cwbJowett)

// Get the efficiency

val englishEff: Double = howManyEnglishTokens / howManyGreekTokens
val jowettEff: Double = howManyJowettTokens / howManyGreekTokens

val differenceEff: Double = englishEff - jowettEff


// Report

println(s"My translation has ${englishEff} English words for each Greek word.")
println(s"Jowett's translation has ${jowettEff} English words for each Greek word.")




My translation has 1.7211538461538463 English words for each Greek word.
Jowett's translation has 1.2403846153846154 English words for each Greek word.
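
The same ratio can be computed from any pair of token counts, without going through the corpus. The sketch below is a small standard-library helper under that assumption. The names `TokenRatio` and `tokensPerSourceToken` are hypothetical, and returning an `Option` to guard against an empty source passage is a design choice of this sketch, not something the notebook does.

```scala
object TokenRatio {
  // Ratio of translation tokens to source tokens.
  // Returns None when there is no source text to compare against,
  // which avoids a division by zero.
  def tokensPerSourceToken(translationTokens: Int, sourceTokens: Int): Option[Double] =
    if (sourceTokens <= 0) None
    else Some(translationTokens.toDouble / sourceTokens.toDouble)
}
```

With the whole-corpus counts printed earlier, `TokenRatio.tokensPerSourceToken(204771, 150173)` gives a ratio of about 1.36 English tokens per Greek token for the complete English translation.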


### Richness of Vocabulary and Forms

In [None]:
val normGrouped: Vector[ (Char, Vector[Char])] = bigNormString.toVector.groupBy( c => c).toVector
val lemGrouped: Vector[ (Char, Vector[Char])] = bigLemString.toVector.groupBy( c => c).toVector



In [None]:
val normHisto = normGrouped.map( g => {
    val c = g._1
    val n = g._2.size
    (c, n)
}).sortBy(_._2)

val lemHisto = lemGrouped.map( g => {
    val c = g._1
    val n = g._2.size
    (c, n)
}).sortBy(_._2)

In [None]:
for ( h <- lemHisto ) println( h )
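
The cells above use `bigNormString` and `bigLemString`, which are not defined anywhere in this notebook, so they will not run as they stand. The group-count-sort pattern they apply can still be tried on any string. The following is a minimal sketch; `CharHistogramSketch` and `charCounts` are names invented for this example, and breaking ties by character is an addition that makes the order predictable.

```scala
object CharHistogramSketch {
  // Count how often each character occurs, least frequent first.
  // Ties are broken by character so the order is deterministic.
  def charCounts(text: String): Vector[(Char, Int)] =
    text.toVector
      .groupBy(identity)                                    // Map[Char, Vector[Char]]
      .toVector
      .map { case (c, occurrences) => (c, occurrences.size) }
      .sortBy { case (c, n) => (n, c) }
}
```

For example, `CharHistogramSketch.charCounts("abracadabra")` returns `Vector(('c',1), ('d',1), ('b',2), ('r',2), ('a',5))`.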

### Plot the Histogram

In [None]:
val normCharSeq = normHisto.map(_._1.toString).toSeq
val normCountSeq = normHisto.map(_._2).toSeq

val normalGreek = Scatter(
  normCharSeq,
  normCountSeq
)

val lemCharSeq = lemHisto.map(_._1.toString).toSeq
val lemCountSeq = lemHisto.map(_._2).toSeq

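// Pair the lemmatized counts with their own characters; the two histograms
// are sorted independently, so their character orders differ
val lemmatizedGreek = Scatter(
  lemCharSeq,
  lemCountSeq
)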

//val data = Seq(lemmatizedGreek)
//val data = Seq(normalGreek)
//val data = Seq(lemmatizedGreek, normalGreek)
val data = Seq(normalGreek, lemmatizedGreek)

plot(data)