# Finding Text-Bearing Surfaces

## Configuring CITE libraries for almond kernel

First, we make the Bintray repository that hosts the CITE libraries available to the almond kernel.

In [1]:
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

myBT: coursierapi.MavenRepository = MavenRepository(https://dl.bintray.com/neelsmith/maven)

Next, we bring in specific libraries from the new repository using almond's `$ivy` magic:

In [2]:
import $ivy.`edu.holycross.shot::ohco2:10.18.2`
import $ivy.`edu.holycross.shot.cite::xcite:4.1.1`
import $ivy.`edu.holycross.shot::scm:7.2.0`
import $ivy.`edu.holycross.shot::dse:6.0.4`
import $ivy.`edu.holycross.shot::citebinaryimage:3.1.1`
import $ivy.`edu.holycross.shot::citeobj:7.3.4`
import $ivy.`edu.holycross.shot::citerelations:2.5.2`
import $ivy.`edu.holycross.shot::cex:6.3.3`
import $ivy.`edu.holycross.shot::greek:2.3.3`



## Imports

From this point on, the notebook is ordinary Scala with the CITE libraries available to import.

In [3]:
// Import some CITE libraries
import edu.holycross.shot.cite._
import edu.holycross.shot.ohco2._
import edu.holycross.shot.scm._
import edu.holycross.shot.citeobj._
import edu.holycross.shot.citerelation._
import edu.holycross.shot.dse._
import edu.holycross.shot.citebinaryimage._
import edu.holycross.shot.greek._

import almond.display.UpdatableDisplay
import almond.interpreter.api.DisplayData.ContentType
import almond.interpreter.api.{DisplayData, OutputHandler}



### Set Up Plotting

In [4]:
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`
import plotly._, plotly.element._, plotly.layout._, plotly.Almond._

// if you want to have the plots available without an internet connection:
// init(offline=true)

// restrict the output height to avoid scrolling in output cells
repl.pprinter() = repl.pprinter().copy(defaultHeight = 3)


## Load a CITE Library

In [5]:
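// Alternative: load a CEX file from a URL instead of from disk.
//val filePath = "https://raw.githubusercontent.com/homermultitext/hmt-archive/master/releases-cex/hmt-2020f.cex"
//val lib: CiteLibrary = CiteLibrarySource.fromUrl(filePath)

// Load a local CEX file (a relative path resolves against the kernel's working directory).
val filePath = "allen_iliad_all.cex"
val lib: CiteLibrary = CiteLibrarySource.fromFile(filePath)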

Apr 12, 2020 1:18:34 PM wvlet.log.Logger log
INFO: Building text repo from cex ...
Apr 12, 2020 1:18:36 PM wvlet.log.Logger log
INFO: Building collection repo from cex ...
Apr 12, 2020 1:18:36 PM wvlet.log.Logger log
INFO: Building relations from cex ...
Apr 12, 2020 1:18:36 PM wvlet.log.Logger log
INFO: All library components built.


filePath: String = "allen_iliad_all.cex"
lib: CiteLibrary = CiteLibrary(
  "CEX Library created by CEXWriter",
...

Pull the components of the CITE library into convenient values:

In [6]:
lazy val tr: TextRepository = lib.textRepository.get
lazy val corp: Corpus = tr.corpus
lazy val cat: Catalog = tr.catalog
lazy val colls: CiteCollectionRepository = lib.collectionRepository.get
lazy val rels: CiteRelationSet = lib.relationSet.get
lazy val myDseVec: DseVector = DseVector.fromCiteLibrary(lib)
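
The `.get` calls above suggest that the library's accessors return `Option` values, in which case `.get` throws a bare `NoSuchElementException` when a component is missing. The following is a minimal sketch of a more descriptive alternative, using only the Scala standard library. `DemoLibrary` and `textsOrError` are hypothetical stand-ins for illustration, not part of the CITE libraries.

```scala
// Hypothetical stand-in for a library whose components may be absent.
final case class DemoLibrary(name: String, texts: Option[Vector[String]])

// Return the texts, or a descriptive message instead of throwing.
def textsOrError(lib: DemoLibrary): Either[String, Vector[String]] =
  lib.texts.toRight(s"Library '${lib.name}' has no text repository")

val full  = DemoLibrary("iliad", Some(Vector("mhnin aeide qea")))
val empty = DemoLibrary("collections-only", None)

println(textsOrError(full))
println(textsOrError(empty))
```

In a short exploratory notebook the plain `.get` is often acceptable; the `Either` form is mainly useful when the same notebook is run against libraries that may lack texts, collections, or relations.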

## Count Characters!

In [7]:

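// Version-level URN of the Allen edition of the Iliad
val iliadUrn = CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:")
// Keep only the nodes whose URN, minus its passage component, matches that version
val regularCorpNodes = corp.nodes.filter( n => {
    n.urn.dropPassage == iliadUrn
})

// Same filter for the lemmatized exemplar of the edition
val iliadLemmUrn = CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen.lemmatizedToken_merged:")
val lemmCorpNodes = corp.nodes.filter( n => {
    n.urn.dropPassage == iliadLemmUrn
})

// Transliterate each line to ASCII without accents, then drop the
// breathing marks, which the ASCII encoding writes as ")" and "("
val asciiRegularIliad: Vector[String] = {
    regularCorpNodes.map( n => {
        val grkText = n.text
        val grkObj = LiteraryGreekString(grkText)
        grkObj.stripAccent.ascii.replaceAll("[)(]","")
    })
}

println(asciiRegularIliad.head)

// Same transliteration for the lemmatized text
val asciiLemmatizedIliad: Vector[String] = {
    lemmCorpNodes.map( n => {
        val grkText = n.text
        val grkObj = LiteraryGreekString(grkText)
        grkObj.stripAccent.ascii.replaceAll("[)(]","")
    })
}

println(asciiLemmatizedIliad.head)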




mhnin aeide qea *phlhi+adew *axilhos
mhnis aeidw qea *phlhi+adew# *axilhos


iliadUrn: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:")
regularCorpNodes: Vector[CitableNode] = Vector(
  CitableNode(
...
iliadLemmUrn: CtsUrn = CtsUrn(
  "urn:cts:greekLit:tlg0012.tlg001.allen.lemmatizedToken_merged:"
)
lemmCorpNodes: Vector[CitableNode] = Vector(
  CitableNode(
...
asciiRegularIliad: Vector[String] = Vector(
  "mhnin aeide qea *phlhi+adew *axilhos",
...
asciiLemmatizedIliad: Vector[String] = Vector(
  "mhnis aeidw qea *phlhi+adew# *axilhos",
...

In [8]:
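// Join every line into one string, drop spaces, punctuation, and the markers * # +,
// and write the iota subscript ("|" in this ASCII encoding) as a plain "i".
val bigNormString = asciiRegularIliad.mkString.replaceAll("[*#+.,; ]","").replaceAll("\\|","i")

val bigLemString = asciiLemmatizedIliad.mkString.replaceAll("[*#+.,; ]","").replaceAll("\\|","i")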

bigNormString: String = "mhninaeideqeaphlhiadewaxilhosoulomenhnhmuriaxaioisalgeeqhkepollasdifqimousyuxasaidiproiayenhrwwnautousdeelwriateuxekunessinoiwnoisitepasidiosdeteleietoboulheco...
bigLemString: String = "mhnisaeidwqeaphlhiadewaxilhosoulomenososomuriosaxaioisalgewalgostiqhmipolusdeifqimosyuxhaidiproiayenhrwsautosdeelwrionteuxwkuwnoiwnostepasispasdiosdetelewboulh...
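
To see what the two `replaceAll` calls do in isolation, here is a small sketch that applies them to the first transliterated line printed earlier. The helper name `lettersOnly` is introduced only for this illustration; the regular expressions are copied from the cell above.

```scala
// First line of the transliterated Iliad, as printed by the notebook.
val sample = "mhnin aeide qea *phlhi+adew *axilhos"

// Remove spaces, punctuation, and the markers * # +, then
// rewrite "|" as "i", exactly as in the cell above.
def lettersOnly(s: String): String =
  s.replaceAll("[*#+.,; ]", "").replaceAll("\\|", "i")

println(lettersOnly(sample))
```

The result matches the opening characters of `bigNormString`, which is a quick check that the cleanup leaves only letters behind.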

In [9]:
val normGrouped: Vector[ (Char, Vector[Char])] = bigNormString.toVector.groupBy( c => c).toVector
val lemGrouped: Vector[ (Char, Vector[Char])] = bigLemString.toVector.groupBy( c => c).toVector



normGrouped: Vector[(Char, Vector[Char])] = Vector(
  (
...
lemGrouped: Vector[(Char, Vector[Char])] = Vector(
  (
...

In [10]:
// Count the occurrences of each character; sortBy orders them from rarest to most frequent.
val normHisto = normGrouped.map( g => {
    val c = g._1
    val n = g._2.size
    (c, n)
}).sortBy(_._2)

val lemHisto = lemGrouped.map( g => {
    val c = g._1
    val n = g._2.size
    (c, n)
}).sortBy(_._2)

normHisto: Vector[(Char, Int)] = Vector(
  ('y', 668),
...
lemHisto: Vector[(Char, Int)] = Vector(
  ('y', 559),
...

In [11]:
for ( h <- lemHisto ) println( h )

(y,559)
(c,1706)
(z,2583)
(b,3111)
(f,6356)
(q,8383)
(x,8843)
(g,9843)
(k,17382)
(h,18379)
(d,19364)
(l,20293)
(p,21895)
(m,25410)
(u,25575)
(t,27079)
(r,30076)
(w,31874)
(n,31992)
(s,55958)
(i,57899)
(e,64259)
(o,65226)
(a,68502)
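
The group-then-count steps in cells 9 and 10 can be tried on a short string without loading the corpus. This is a minimal sketch using only standard Scala collections; the input is the cleaned first line of the poem, and breaking ties by character is an addition made here so that the order is deterministic.

```scala
// Cleaned first line of the Iliad (letters only).
val letters = "mhninaeideqeaphlhiadewaxilhos"

// Group identical characters, count each group, and sort
// ascending by count, breaking ties by character.
val histo: Vector[(Char, Int)] =
  letters.toVector
    .groupBy(identity)
    .toVector
    .map { case (c, occurrences) => (c, occurrences.size) }
    .sortBy { case (c, n) => (n, c) }

histo.foreach(println)
```

Since every character lands in exactly one group, the counts sum to the length of the input, which is a cheap sanity check to run on the full-corpus histograms as well.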


### Plot the Histogram

In [12]:
val normCharSeq = normHisto.map(_._1.toString).toSeq
val normCountSeq = normHisto.map(_._2).toSeq

val normalGreek = Scatter(
  normCharSeq,
  normCountSeq
)

val lemCharSeq = lemHisto.map(_._1.toString).toSeq
val lemCountSeq = lemHisto.map(_._2).toSeq

val lemmatizedGreek = Scatter(
  lemCharSeq,   // pair the lemmatized counts with their own character labels
  lemCountSeq
)

//val data = Seq(lemmatizedGreek)
//val data = Seq(normalGreek)
//val data = Seq(lemmatizedGreek, normalGreek)
val data = Seq(normalGreek, lemmatizedGreek)

plot(data)

normCharSeq: collection.immutable.Seq[String] = Vector(
  "y",
...
normCountSeq: collection.immutable.Seq[Int] = Vector(
  668,
...
normalGreek: Scatter = Scatter(
  Some(
...
lemCharSeq: collection.immutable.Seq[String] = Vector(
  "y",
...
lemCountSeq: collection.immutable.Seq[Int] = Vector(
  559,
...
lemmatizedGreek: Scatter = Scatter(
  Some(
...
data: Seq[Scatter] = List(
  Scatter(
...
res11_7: String = "plot-996b3b04-d63d-422c-9648-839b1031d780"
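
Each histogram is sorted by its own counts, so the two character orders need not agree. If both series should follow a single x-axis order, one option is to look up the second histogram's counts in the first histogram's order. The sketch below uses made-up miniature histograms (`normHistoDemo`, `lemHistoDemo`) purely for illustration; they are not values from the corpus.

```scala
// Hypothetical miniature histograms, each sorted by its own counts.
val normHistoDemo = Vector(('y', 2), ('a', 5), ('e', 9))
val lemHistoDemo  = Vector(('y', 1), ('e', 4), ('a', 7))

// Use the first histogram's character order as the shared x-axis.
val chars: Vector[Char] = normHistoDemo.map(_._1)
val lemLookup: Map[Char, Int] = lemHistoDemo.toMap

// Fetch each character's count from the second histogram; 0 if it is absent.
val lemAligned: Vector[Int] = chars.map(c => lemLookup.getOrElse(c, 0))

println(chars.zip(lemAligned))
```

With the real data, `chars.map(_.toString)` and `lemAligned` could then be passed to `Scatter` in place of the separately sorted sequences.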