# Interact with the Homer Multitext Dataset

## Configuring CITE libraries for almond kernel

First, make the Bintray repository that hosts the CITE libraries available to your almond kernel.

In [1]:
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

myBT: coursierapi.MavenRepository = MavenRepository(https://dl.bintray.com/neelsmith/maven)

Next, we bring in specific libraries from the new repository using almond's `$ivy` magic:

In [2]:
import $ivy.`edu.holycross.shot::ohco2:10.18.2`
import $ivy.`edu.holycross.shot.cite::xcite:4.1.1`
import $ivy.`edu.holycross.shot::scm:7.2.0`
import $ivy.`edu.holycross.shot::dse:6.0.4`
import $ivy.`edu.holycross.shot::citebinaryimage:3.1.1`
import $ivy.`edu.holycross.shot::citeobj:7.3.4`
import $ivy.`edu.holycross.shot::citerelations:2.5.2`
import $ivy.`edu.holycross.shot::cex:6.3.3`


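
Each coordinate above follows the `organization::name:version` pattern. In Ammonite-style `$ivy` imports, a double colon asks the resolver to append the kernel's Scala binary version to the artifact name, while a single colon uses the name as written. As a rough illustration of that expansion (the `expandCoordinate` helper and the `2.12` default are assumptions made for this sketch, not part of almond):

```scala
// Hypothetical helper: shows how a `::` coordinate maps to a concrete Maven artifact.
// The Scala binary version ("2.12") is an assumption; substitute your kernel's version.
def expandCoordinate(coord: String, scalaBinaryVersion: String = "2.12"): String =
  coord.split("::", 2) match {
    case Array(org, rest) =>
      rest.split(":", 2) match {
        case Array(name, version) => s"$org:${name}_$scalaBinaryVersion:$version"
        case _                    => coord // malformed, return unchanged
      }
    case _ => coord // no `::`, nothing to expand
  }

println(expandCoordinate("edu.holycross.shot::ohco2:10.18.2"))
// edu.holycross.shot:ohco2_2.12:10.18.2
```

If a dependency fails to resolve, checking that an artifact with the expanded name exists in the repository is a reasonable first step.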

## Imports

From this point on, the notebook is ordinary Scala, with the CITE libraries available to import.

In [3]:
// Import some CITE libraries
import edu.holycross.shot.cite._
import edu.holycross.shot.ohco2._
import edu.holycross.shot.scm._
import edu.holycross.shot.citeobj._
import edu.holycross.shot.citerelation._
import edu.holycross.shot.dse._
import edu.holycross.shot.citebinaryimage._

import almond.display.UpdatableDisplay
import almond.interpreter.api.DisplayData.ContentType
import almond.interpreter.api.{DisplayData, OutputHandler}


## Some Utilities

In [11]:
// Print a value in a readable form, depending on its type.
def showMe(v: Any): Unit = v match {
  // Histogram: one "count <tab> string" line per entry
  case sh: StringHistogram =>
    for (h <- sh.histogram) println(s"${h.count}\t${h.s}")
  // Corpus: one "passage reference <tab><tab> text" line per citable node
  case c: Corpus =>
    for (n <- c.nodes) println(s"${n.urn.passageComponent}\t\t${n.text}")
  // Any collection (Vector included): one element per line
  case it: Iterable[_] =>
    println(s"""\n----\n${it.mkString("\n")}\n----\n""")
  // Anything else: fall back to toString
  case _ => println(s"\n-----\n${v}\n----\n")
}

defined function showMe

## Load a CITE Library

In [15]:
// Load the HMT 2020f release directly from GitHub
val filePath = "https://raw.githubusercontent.com/homermultitext/hmt-archive/master/releases-cex/hmt-2020f.cex"
val lib: CiteLibrary = CiteLibrarySource.fromUrl(filePath)

// Alternatively, load from a local CEX file:
// val filePath = "hmt-test.cex"
// val lib: CiteLibrary = CiteLibrarySource.fromFile(filePath)

2020-03-31 22:47:33.616-0400  info [CiteLibrary] Building text repo from cex ...  - (CiteLibrary.scala:160)
2020-03-31 22:47:34.551-0400  info [CiteLibrary] Building collection repo from cex ...  - (CiteLibrary.scala:163)
2020-03-31 22:47:47.251-0400  info [CiteLibrary] Building relations from cex ...  - (CiteLibrary.scala:166)
2020-03-31 22:47:48.530-0400  info [CiteLibrary] All library components built.  - (CiteLibrary.scala:168)


filePath: String = "https://raw.githubusercontent.com/homermultitext/hmt-archive/master/releases-cex/hmt-2020f.cex"
lib: CiteLibrary = CiteLibrary(
  "Homer Multitext project, release 2020f",
  Cite2Urn("urn:cite2:hmt:publications.cex.2020f:all"),
  "Creative Commons Attribution, Non-Commercial 4.0 License <https://creativecommons.org/licenses/by-nc/4.0/>.",
  Vector(
    CiteNamespace("hmt", http://www.homermultitext.org/citens/hmt),
    CiteNamespace("greekLit", http://chs.harvard.edu/ctsns/greekLit)
  ),
  Some(
    TextRepository(
      Corpus(
        Vector(
          CitableNode(
            CtsUrn("urn:cts:greekLit:tlg5026.msA.hmt:1.1.lemma"),
            "\u03bc\u1fc6\u03bd\u03b9\u03bd \u1f04\u03b5\u03b9\u03b4\u03b5"
          ),
          Citabl

Get parts of the CITE Library in convenient form:

In [16]:
lazy val tr: TextRepository = lib.textRepository.get
lazy val venetusACorpus: Corpus = tr.corpus
lazy val venetusACatalog: Catalog = tr.catalog
lazy val venetusACollections: CiteCollectionRepository = lib.collectionRepository.get
lazy val venetusARelations: CiteRelationSet = lib.relationSet.get
lazy val venetusAImageTextAssociations: DseVector = DseVector.fromCiteLibrary(lib)
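
Each `.get` above assumes the library actually contains that component. The components are wrapped in `Option` (the `lib` output shows `Some(TextRepository(...))`), and `.get` on an empty `Option` throws `NoSuchElementException`; with `lazy val`, that error only surfaces on first use. For a CEX file that might lack a component, a more defensive pattern can help. The sketch below uses a stand-in `MiniLibrary` case class (hypothetical, not a CITE type) so that it runs on its own:

```scala
// Stand-in for a library whose components may be absent (hypothetical type).
case class MiniLibrary(textRepository: Option[Vector[String]])

// Handle both cases explicitly instead of calling .get
def passageCount(lib: MiniLibrary): Int =
  lib.textRepository match {
    case Some(passages) => passages.size
    case None           => 0 // no text repository in this library
  }

val full  = MiniLibrary(Some(Vector("1.1", "1.2")))
val empty = MiniLibrary(None)

println(passageCount(full))  // 2
println(passageCount(empty)) // 0
```

For the HMT release loaded here, all components are present, so the direct `.get` calls work as written.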

## Some Numbers

Some basic counts for the Homer Multitext 2020f data release:

In [17]:
val numberOfTexts = venetusACatalog.texts.size
val numberOfPassages = venetusACorpus.size
val numberOfObjects = venetusACollections.citableObjects.size
val numberOfRelations = venetusARelations.size
val numberOfImageTextAssociations = venetusAImageTextAssociations.size
val venAImages = venetusACollections.citableObjects.filter(_.urn ~~ Cite2Urn("urn:cite2:hmt:vaimg.2017a:")).size


numberOfTexts: Int = 6
numberOfPassages: Int = 29976
numberOfObjects: Int = 30331
numberOfRelations: Int = 71861
numberOfImageTextAssociations: Int = 22233
venAImages: Int = 966

## Debugging

In [18]:
val u1 = CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23")

u1: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23")
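
`u1` identifies all of Iliad 23 in the Venetus A text, and the cells that follow use `~~` to select URNs that fall within it. As a rough mental model only, and not the xcite implementation, passage containment can be pictured as a match on whole dot-separated reference components; `passageContains` below is a hypothetical helper:

```scala
// Hypothetical helper: does `container` (e.g. "23") include `passage` (e.g. "23.15")?
// Compares whole dot-separated components, so "23" does not match "230.1".
def passageContains(container: String, passage: String): Boolean = {
  val outer = container.split('.').toVector
  val inner = passage.split('.').toVector
  inner.startsWith(outer)
}

println(passageContains("23", "23.15")) // true
println(passageContains("23", "230.1")) // false
println(passageContains("23", "4.217")) // false
```

Comparing components rather than raw string prefixes is what keeps a reference to book 23 from also matching a reference that merely begins with the characters "23".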

In [28]:
val dseVec: DseVector = DseVector.fromCiteLibrary(lib)
// Every distinct surface (folio side) referenced by the DSE records
val surfaceUrns: Vector[Cite2Urn] = dseVec.passages.map(_.surface).distinct
// Passages of Iliad 23 that have DSE records, sorted by their order in the corpus
val sortedPassages: Vector[CtsUrn] = {
    val unsortedPassages: Vector[CtsUrn] = dseVec.passages.map(_.passage).filter( _ ~~ u1 )
    val reducedCorpus = venetusACorpus ~~ u1
    reducedCorpus.sortPassages(unsortedPassages)
}


dseVec: DseVector = DseVector(
  Vector(
    DsePassage(
      Cite2Urn("urn:cite2:hmt:va_dse.v1:il2168"),
      "DSE record for Iliad 4.217",
      CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:4.217"),
      Cite2Urn(
        "urn:cite2:hmt:vaimg.2017a:VA055VN_0557@0.4865,0.3644,0.3954,0.0391"
      ),
      Cite2Urn("urn:cite2:hmt:msA.v1:55v")
    ),
    DsePassage(
      Cite2Urn("urn:cite2:hmt:va_dse.v1:il11826"),
      "DSE record for Iliad 18.529",
      CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:18.529"),
      Cite2Urn(
        "urn:cite2:hmt:vaimg.2017a:VA249RN_0420@0.19,0.6589,0.427,0.0331"
      ),
      Cite2Urn("urn:cite2:hmt:msA.v1:249r")
    ),
    DsePassage(
      Cite2Urn("urn:cite2:hmt:va_dse.v1:il6005"

In [29]:
val corp23 = (venetusACorpus ~~ CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23")).urns

corp23: Vector[CtsUrn] = Vector(
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.1"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.2"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.3"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.4"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.5"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.6"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.7"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.8"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.9"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.10"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.11"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:23.12"),
  CtsUrn("urn:cts:gre

In [32]:
corp23.diff(sortedPassages).size

res31: Int = 809
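
`diff` keeps the elements of the left-hand collection that have no counterpart on the right, so a non-zero size here means some URNs in `corp23` do not occur in `sortedPassages`. A small standalone illustration with plain strings (the values are made up for the example):

```scala
val inCorpus = Vector("23.1", "23.2", "23.3", "23.4")
val inDse    = Vector("23.2", "23.4")

// Elements of inCorpus that are missing from inDse
val missingFromDse = inCorpus.diff(inDse)

println(missingFromDse)      // Vector(23.1, 23.3)
println(missingFromDse.size) // 2
```

Note that `diff` is not symmetric: swapping the operands would instead report elements present only in the second collection.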

In [33]:
809/23

res32: Int = 35
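
Both operands in `809/23` are `Int`, so Scala performs integer division and discards the remainder. A short sketch of the difference, using the same two numbers (the variable names are illustrative):

```scala
val total   = 809
val divisor = 23

println(total / divisor)          // 35 (integer division truncates)
println(total % divisor)          // 4  (remainder: 809 = 23 * 35 + 4)
println(total.toDouble / divisor) // floating-point quotient, a little over 35.17
```

Converting one operand with `.toDouble` is enough to get the fractional result.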