# Working with Alignments in CITE 2

## Configuring CITE libraries for the almond kernel

First, we add a Bintray Maven repository that hosts the CITE libraries, so that the almond kernel can resolve them.

In [1]:
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

myBT: coursierapi.MavenRepository = MavenRepository(https://dl.bintray.com/neelsmith/maven)

Next, we bring in specific libraries from the new repository using almond's `$ivy` magic:

In [2]:
import $ivy.`edu.holycross.shot::ohco2:10.16.0`
import $ivy.`edu.holycross.shot.cite::xcite:4.1.1`
import $ivy.`edu.holycross.shot::scm:7.2.0`
import $ivy.`edu.holycross.shot::dse:5.2.2`
import $ivy.`edu.holycross.shot::citebinaryimage:3.1.1`
import $ivy.`edu.holycross.shot::citeobj:7.3.4`
import $ivy.`edu.holycross.shot::citerelations:2.5.2`
import $ivy.`edu.holycross.shot::cex:6.3.3`
import $ivy.`edu.holycross.shot::greek:2.3.3`



## Imports

From this point on, your notebook consists of completely generic Scala, with the CITE Libraries available to use.

In [3]:
// Import some CITE libraries
import edu.holycross.shot.cite._
import edu.holycross.shot.ohco2._
import edu.holycross.shot.scm._
import edu.holycross.shot.citeobj._
import edu.holycross.shot.citerelation._
import edu.holycross.shot.dse._
import edu.holycross.shot.citebinaryimage._
import edu.holycross.shot.ohco2._
import edu.holycross.shot.greek._

import almond.display.UpdatableDisplay
import almond.interpreter.api.DisplayData.ContentType
import almond.interpreter.api.{DisplayData, OutputHandler}

import java.io.File
import java.io.PrintWriter

import scala.io.Source



## Useful Functions

Save a string:

In [4]:
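def saveString(s: String, filePath: String = "", fileName: String = "temp.txt"): Unit = {
  // filePath and fileName are concatenated as-is, so a non-empty filePath needs a trailing separator
  val writer = new PrintWriter(new File(s"${filePath}${fileName}"))
  try writer.write(s) finally writer.close() // close the file even if the write fails
}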

defined function saveString
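A quick way to check `saveString` is to write into a scratch directory and read the file back. The sketch below carries its own copy of the function so that it runs without the rest of the notebook. The temporary directory and the file name `catullus-note.txt` are illustrative choices, not part of the original notebook.

```scala
import java.io.{File, PrintWriter}
import java.nio.file.Files
import scala.io.Source

// Copy of the notebook's saveString, repeated so this cell is self-contained
def saveString(s: String, filePath: String = "", fileName: String = "temp.txt"): Unit = {
  val writer = new PrintWriter(new File(s"${filePath}${fileName}"))
  try writer.write(s) finally writer.close()
}

// Hypothetical scratch location; note the trailing separator on the path
val tmpDir: String = Files.createTempDirectory("cite-demo").toString + File.separator
saveString("Cui dono lepidum novum libellum", tmpDir, "catullus-note.txt")

// Read the file back to confirm the round trip
val src = Source.fromFile(tmpDir + "catullus-note.txt")
val roundTrip: String = try src.mkString finally src.close()
```

Because the path and the file name are simply concatenated, leaving off the trailing separator would write to a sibling file whose name begins with the directory name.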

Pretty-print many kinds of values:

In [5]:
def showMe(v: Any): Unit = {
  v match {
    // One "count<TAB>string" line per histogram entry
    case sh: StringHistogram =>
      for (h <- sh.histogram) println(s"${h.count}\t${h.s}")
    // One "passage<TAB><TAB>text" line per citable node
    case c: Corpus =>
      for (n <- c.nodes) println(s"${n.urn.passageComponent}\t\t${n.text}")
    // Vectors are shown element by element; this case must precede Iterable
    case vec: Vector[_] =>
      for (a <- vec) {
        a match {
          case c: Corpus =>
            println("---------")
            println(s"${c.nodes.head.urn.dropPassage}")
            println("")
            showMe(c)
            println("")
          case u: CtsUrn => println(s"${u}")
          case u: Cite2Urn => println(s"${u}")
          case other =>
            println("-----------")
            showMe(other)
        }
      }
    case it: Iterable[_] => println(s"""\n----\n${it.mkString("\n")}\n----\n""")
    case other => println(s"\n-----\n${other}\n----\n")
  }
}

defined function showMe
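`showMe` depends on the order of its cases: a `Vector` is also an `Iterable`, so the `Vector` case has to come first or it would never be reached. A minimal sketch of that dispatch order, using a hypothetical helper `describe` that only reports which branch handled the value:

```scala
// Returns a label naming the branch that handled the value
def describe(v: Any): String = v match {
  case _: Vector[_]   => "vector"   // most specific collection case first
  case _: Iterable[_] => "iterable" // any other collection
  case _              => "other"    // everything else, including plain strings
}

val fromVector = describe(Vector(1, 2))
val fromList   = describe(List(1, 2))
val fromSet    = describe(Set("a"))
val fromString = describe("plain string")
```

A `String` falls through to the last case because a type pattern does not apply implicit conversions to collections.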

## Load Library

To start, we will load a version-level, bilingual file:

In [6]:
val cexPath = "cex/Catullus1-aligned.cex"
val lib = CiteLibrary(scala.io.Source.fromFile(cexPath).mkString)

Feb 27, 2020 8:00:02 AM wvlet.log.Logger log
INFO: Building text repo from cex ...
Feb 27, 2020 8:00:02 AM wvlet.log.Logger log
INFO: Building collection repo from cex ...
Feb 27, 2020 8:00:02 AM wvlet.log.Logger log
INFO: Building relations from cex ...
Feb 27, 2020 8:00:02 AM wvlet.log.Logger log
INFO: All library components built.


cexPath: String = "cex/Catullus1-aligned.cex"
lib: CiteLibrary = CiteLibrary(
  "CITE Library generated by the Ducat application, Thu Nov 07 2019 12:30:10 GMT-0500 (EST)",
  Cite2Urn("urn:cite2:cex:ducatauto.201910:12_30_10_534"),
  "CC Share Alike.",
  Vector(),
  Some(
    TextRepository(
      Corpus(
        Vector(
          CitableNode(
            CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.1.0"),
            "Cui "
          ),
          CitableNode(
            CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.1.1"),
            "dono "
          ),
          CitableNode(
            CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.1.2"),
            "lepidum "
          ),

Pull out the parts of the library that we will use below:

In [7]:
lazy val tr: TextRepository = lib.textRepository.get
lazy val corp: Corpus = tr.corpus
lazy val cat: Catalog = tr.catalog
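`lib.textRepository` is an `Option`, so calling `.get` throws if the library has no text repository. A minimal sketch of a more defensive pattern follows. The classes `Repo` and `Library` are hypothetical stand-ins for the CITE types, used only so the example runs on its own.

```scala
// Hypothetical stand-ins for TextRepository and CiteLibrary
case class Repo(label: String)
case class Library(textRepository: Option[Repo])

// Turn a missing repository into a readable error instead of an exception
def repoOrError(lib: Library): Either[String, Repo] =
  lib.textRepository.toRight("library has no text repository")

val withTexts    = repoOrError(Library(Some(Repo("Catullus 1"))))
val withoutTexts = repoOrError(Library(None))
```

For a notebook that always loads a known CEX file, the plain `.get` used above is a reasonable shortcut.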

## Alignment-Specific Setup

Alignment collections are recorded as belonging to this data model:

In [8]:
val alignModel = Cite2Urn("urn:cite2:cite:datamodels.v1:alignment")

alignModel: Cite2Urn = Cite2Urn("urn:cite2:cite:datamodels.v1:alignment")

This is the CITE Relations verb that “glues” passages to a given alignment:

In [9]:
val alignVerb = Cite2Urn("urn:cite2:cite:verbs.v1:aligns")

alignVerb: Cite2Urn = Cite2Urn("urn:cite2:cite:verbs.v1:aligns")

We can get a Vector of alignment collections:

In [10]:
val alignmentCollections: Vector[Cite2Urn] = lib.collectionsForModel(alignModel)

alignmentCollections: Vector[Cite2Urn] = Vector(
  Cite2Urn("urn:cite2:ducat:alignments.temp:")
)
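It may help to picture how the verb ties things together: each relation is a triple of an alignment URN, the verb, and a passage URN, and an alignment is simply the set of passages that share the same alignment URN. The sketch below models this with plain strings. The case class `Triple` is a stand-in, not the library's own type, and the third triple (with a different verb) is invented for illustration.

```scala
// Stand-in for a CITE relation: subject (alignment), verb, object (passage)
case class Triple(urn1: String, relation: String, urn2: String)

val alignVerbStr = "urn:cite2:cite:verbs.v1:aligns"
val a13 = "urn:cite2:ducat:alignments.temp:20191010_16_18_248_13"

val triples = Vector(
  Triple(a13, alignVerbStr, "urn:cts:latinLit:phi0472.phi001.merrill.token:1.4.1"),
  Triple(a13, alignVerbStr, "urn:cts:latinLit:phi0472.phi001.ozlam.token:1.4.3"),
  // Invented relation with another verb, to show that it is filtered out
  Triple("urn:cite2:example:objects.v1:1", "urn:cite2:example:verbs.v1:other", "urn:cts:example:a.b.c:1")
)

// Keep only "aligns" relations, then collect the passages of each alignment
val byAlignment: Map[String, Vector[String]] =
  triples
    .filter(_.relation == alignVerbStr)
    .groupBy(_.urn1)
    .map { case (alignment, ts) => alignment -> ts.map(_.urn2) }
```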

## Alignment-Specific Functions

(These functions assume values defined above. They do not do elaborate checking for necessary components of a CITE Library, for example. They are not ready to be abstracted out of this notebook!)

### `alignmentsForPassage( psg: CtsUrn ): Vector[Cite2Urn]`

In [11]:
def alignmentsForPassage( psg: CtsUrn ): Vector[Cite2Urn] = {
    
    val crs: CiteRelationSet = lib.relationSet.get.verb(alignVerb) // assumes the Option is not None!
    // The "psg" might be a range or a container, so expand it
    val allPsgs: Vector[CtsUrn] = (corp.validReff(psg) :+ psg).distinct
    
    val allRelations: Set[CiteTriple] = crs.relations.filter( t => {
        allPsgs.contains( t.urn2.asInstanceOf[CtsUrn])
    })
    
    allRelations.map(_.urn1.asInstanceOf[Cite2Urn]).toVector.distinct
}

defined function alignmentsForPassage
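The key step in `alignmentsForPassage` is expanding the requested passage before filtering, so that a container such as `1.4` also finds alignments recorded on its tokens. The sketch below imitates that step with plain strings. The function `expand` is only a rough stand-in for `corp.validReff` (real CTS URN matching is richer), and the alignment names and passages are invented.

```scala
// Invented relation records: which alignment covers which leaf passage
case class Rel(alignment: String, passage: String)

// Leaf-node citations known to the stand-in corpus
val leafPassages = Vector("1.4.0", "1.4.1", "1.4.2", "1.5.0")

// Rough stand-in for corp.validReff: a leaf matches itself, a container matches its children
def expand(psg: String): Vector[String] =
  leafPassages.filter(l => l == psg || l.startsWith(psg + "."))

val rels = Vector(
  Rel("alignment-A", "1.4.1"),
  Rel("alignment-B", "1.4.1"),
  Rel("alignment-B", "1.4.2"),
  Rel("alignment-C", "1.5.0")
)

// Expand first, then keep every alignment that touches one of the expanded passages
def alignmentsFor(psg: String): Vector[String] = {
  val wanted = (expand(psg) :+ psg).distinct
  rels.filter(r => wanted.contains(r.passage)).map(_.alignment).distinct
}

val forLeaf      = alignmentsFor("1.4.1")
val forContainer = alignmentsFor("1.4")
val forPoem      = alignmentsFor("1")
```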

Test:

In [12]:
val testPsgU = CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.4.1")
val testAlignments: Vector[Cite2Urn] = alignmentsForPassage( testPsgU )
assert ( testAlignments.size == 2 )
assert ( testAlignments.contains( Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_0") ))

testPsgU: CtsUrn = CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.4.1")
testAlignments: Vector[Cite2Urn] = Vector(
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191010_16_18_248_13"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_0")
)

### `passagesForAlignment(  alignment: Cite2Urn  ): Vector[CtsUrn]`

In [13]:
def passagesForAlignment(  alignment: Cite2Urn  ): Vector[CtsUrn] = {
    val crs: CiteRelationSet = lib.relationSet.get.verb(alignVerb) // assumption the Option is not None!
    val rels: CiteRelationSet = crs.urn1Match(alignment)
    val unsortedUrns: Vector[CtsUrn] = rels.relations.map( r => {
        r.urn2.asInstanceOf[CtsUrn]
    }).toVector
    corp.sortPassages(unsortedUrns)
}

def passagesForAlignment( alignments: Vector[Cite2Urn] ): Vector[CtsUrn] = {
    val psgs: Vector[CtsUrn] = alignments.map( a => {
        passagesForAlignment( a )
    }).flatten
    corp.sortPassages(psgs)
}

defined function passagesForAlignment
defined function passagesForAlignment
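The two definitions form an overloaded pair: one takes a single alignment, the other takes a Vector and combines the per-alignment results. A minimal sketch of the same pattern with plain strings, where `flatMap` does the work of `map` followed by `flatten`. The object `AlignmentLookup` and its table are hypothetical; the real functions read from the CITE relation set and sort with `corp.sortPassages`.

```scala
object AlignmentLookup {
  // Hypothetical alignment-to-passage table standing in for the relation set
  private val table: Map[String, Vector[String]] = Map(
    "alignment-A" -> Vector("merrill:1.4.1", "ozlam:1.4.3"),
    "alignment-B" -> Vector("merrill:1.3.4", "ozlam:1.3.4")
  )

  // Single-alignment version; an unknown alignment yields an empty Vector
  def passagesFor(alignment: String): Vector[String] =
    table.getOrElse(alignment, Vector.empty)

  // Vector version: look up each alignment and concatenate the results
  def passagesFor(alignments: Vector[String]): Vector[String] =
    alignments.flatMap(a => passagesFor(a))
}

val one  = AlignmentLookup.passagesFor("alignment-A")
val many = AlignmentLookup.passagesFor(Vector("alignment-A", "alignment-B"))
```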

Test:

In [14]:
val testPassages: Vector[CtsUrn] = passagesForAlignment( testAlignments.head )
assert( testPassages.size == 2 )


testPassages: Vector[CtsUrn] = Vector(
  CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.4.1"),
  CtsUrn("urn:cts:latinLit:phi0472.phi001.ozlam.token:1.4.3")
)

### `textsForAlignment( alignment: Cite2Urn): Vector[Corpus]`

In [15]:
def textsForAlignment( alignment: Cite2Urn): Vector[Corpus] = {
    val psgs: Vector[CtsUrn] = passagesForAlignment( alignment )
    (corp ~~ psgs).chunkByText
}

def textsForAlignment( alignments: Vector[Cite2Urn] ): Vector[Corpus] = {
    val psgs: Vector[CtsUrn] = alignments.map( a => {
        passagesForAlignment(a)
    }).flatten
    (corp ~~ psgs).chunkByText
}

defined function textsForAlignment
defined function textsForAlignment
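Both versions end by splitting the selected nodes into one corpus per text. Roughly, that means grouping nodes by their URN minus the passage component. The sketch below shows the idea with a stand-in `Node` class and plain string URNs. It is an approximation for illustration, and the library's `chunkByText` may differ in details such as ordering.

```scala
// Stand-in for a citable node: a URN string plus its text
case class Node(urn: String, text: String)

// Text identifier = the URN up to and including its last ':'
def textOf(urn: String): String = urn.substring(0, urn.lastIndexOf(':') + 1)

val nodes = Vector(
  Node("urn:cts:latinLit:phi0472.phi001.merrill.token:1.4.1", "esse "),
  Node("urn:cts:latinLit:phi0472.phi001.ozlam.token:1.4.3", "were ")
)

// Group by text while preserving the order in which texts first appear
val textOrder: Vector[String] = nodes.map(n => textOf(n.urn)).distinct
val chunks: Vector[Vector[Node]] =
  textOrder.map(t => nodes.filter(n => textOf(n.urn) == t))
```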

Test:

In [16]:
val testTexts: Vector[Corpus] = textsForAlignment( testAlignments.head )
assert( testTexts.size == 2 )
showMe(testTexts)

---------
urn:cts:latinLit:phi0472.phi001.merrill.token:

1.4.1		esse 

---------
urn:cts:latinLit:phi0472.phi001.ozlam.token:

1.4.3		were 



testTexts: Vector[Corpus] = Vector(
  Corpus(
    Vector(
      CitableNode(
        CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.4.1"),
        "esse "
      )
    )
  ),
  Corpus(
    Vector(
      CitableNode(
        CtsUrn("urn:cts:latinLit:phi0472.phi001.ozlam.token:1.4.3"),
        "were "
      )
    )
  )
)

### `alignedTexts( psg: CtsUrn ): Vector[Corpus]`

In [17]:
def alignedTexts( psg: CtsUrn ): Vector[Corpus] = {
    val alignments: Vector[Cite2Urn] = alignmentsForPassage(psg)
    val texts: Vector[Corpus] = textsForAlignment( alignments )
    texts
}

def alignedTexts( psgs: Vector[CtsUrn] ): Vector[Corpus] = {
    
    val alignments: Vector[Cite2Urn] = psgs.map ( p => {
        alignmentsForPassage(p)
    }).flatten
    val texts: Vector[Corpus] = textsForAlignment( alignments )
    texts
}

defined function alignedTexts
defined function alignedTexts

Test:

In [18]:
val testAlignedTexts1: Vector[Corpus] = alignedTexts(testPsgU)
assert( testAlignedTexts1.size == 2 )
showMe(testAlignedTexts1)

---------
urn:cts:latinLit:phi0472.phi001.merrill.token:

1.4.1		esse 

---------
urn:cts:latinLit:phi0472.phi001.ozlam.token:

1.4.3		were 



testAlignedTexts1: Vector[Corpus] = Vector(
  Corpus(
    Vector(
      CitableNode(
        CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.4.1"),
        "esse "
      )
    )
  ),
  Corpus(
    Vector(
      CitableNode(
        CtsUrn("urn:cts:latinLit:phi0472.phi001.ozlam.token:1.4.3"),
        "were "
      )
    )
  )
)

### `alignmentsForString( s: String, tokenMatch: Boolean = true ): Vector[Cite2Urn]`

In [37]:
def alignmentsForString( s: String, tokenMatch: Boolean = true ): Vector[Cite2Urn] = {
    val strCorpus: Corpus = {
        if (tokenMatch) {
            val nodes: Vector[CitableNode] = corp.nodes.filter( _.text.trim == s.trim)
            Corpus(nodes)
        } else {
            corp.find(s)
        }
    }
    
    val strUrns: Vector[CtsUrn] = strCorpus.urns
    
    val crs: CiteRelationSet = {
      val allCrs =  lib.relationSet.get.verb(alignVerb) // assumes the Option is not None!
    
      // Keep only relations whose object (urn2) is one of the matching passages
      val foundSet = allCrs.relations.filter( r => {
          strUrns.contains(r.urn2.asInstanceOf[CtsUrn])
      }).toSet
      CiteRelationSet(foundSet)
    }
    val foundAlignments: Vector[Cite2Urn] = crs.relations.map( a => {
        a.urn1.asInstanceOf[Cite2Urn]
    }).toVector.distinct  
    foundAlignments
}

defined function alignmentsForString
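The `tokenMatch` flag switches between whole-token equality (after trimming) and a looser search. The sketch below illustrates the difference on a few tokens taken from this corpus. It assumes that `corp.find` matches substrings, which the notebook does not state, and the helper `matching` is hypothetical.

```scala
// A few tokens as they appear in the corpus, trailing spaces included
val tokens = Vector("For ", "you ", "for ", "yourself ", "tibi ")

// tokenMatch = true: trimmed equality; false: substring search (assumed behavior of corp.find)
def matching(s: String, tokenMatch: Boolean = true): Vector[String] =
  if (tokenMatch) tokens.filter(_.trim == s.trim)
  else tokens.filter(_.contains(s))

val exact = matching("you")
val loose = matching("you", tokenMatch = false)
```

With whole-token matching only `you` is found; the looser search also picks up `yourself`.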

Test:

In [38]:
val testString = "you"
val testStringAlignments: Vector[Cite2Urn] = alignmentsForString(testString)
showMe(testStringAlignments)

urn:cite2:ducat:alignments.temp:20191012_31_11_675_16
urn:cite2:ducat:alignments.temp:20191012_31_11_675_23
urn:cite2:ducat:alignments.temp:20191012_31_11_675_17
urn:cite2:ducat:alignments.temp:20191010_16_18_248_6
urn:cite2:ducat:alignments.temp:20191010_16_18_248_9
urn:cite2:ducat:alignments.temp:20191010_16_18_248_37
urn:cite2:ducat:alignments.temp:20191012_31_11_675_36
urn:cite2:ducat:alignments.temp:20191010_16_18_248_18


testString: String = "you"
testStringAlignments: Vector[Cite2Urn] = Vector(
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_16"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_23"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_17"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191010_16_18_248_6"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191010_16_18_248_9"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191010_16_18_248_37"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_36"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191010_16_18_248_18")
)

# Playground

## Catullus-Specific Things

Define a few base URNs to make URN work easier:

In [39]:
val catullusUrn = CtsUrn("urn:cts:latinLit:phi0472.phi001:")
val latUrn = CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill:")
val engUrn = CtsUrn("urn:cts:latinLit:phi0472.phi001.ozlam:")


catullusUrn: CtsUrn = CtsUrn("urn:cts:latinLit:phi0472.phi001:")
latUrn: CtsUrn = CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill:")
engUrn: CtsUrn = CtsUrn("urn:cts:latinLit:phi0472.phi001.ozlam:")

## See Alignments for a Passage

**Edit This!** A passage of Catullus 1:

In [40]:
val myPassage = "1.1"

myPassage: String = "1.1"

**See Alignments!**

In [41]:
val myAlignedTexts: Vector[Corpus] = alignedTexts( catullusUrn.addPassage(myPassage))

showMe(myAlignedTexts)

---------
urn:cts:latinLit:phi0472.phi001.merrill.token:

1.1.3		novum 
1.1.4		libellum
1.1.1		dono 
1.1.0		Cui 
1.1.2		lepidum 

---------
urn:cts:latinLit:phi0472.phi001.ozlam.token:

1.1.5		this 
1.1.6		charming 
1.1.7		slim 
1.1.8		volume,
1.1.2		do 
1.1.3		I 
1.1.4		dedicate 
1.1.0		To 
1.1.1		whom 



myAlignedTexts: Vector[Corpus] = Vector(
  Corpus(
    Vector(
      CitableNode(
        CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.1.3"),
        "novum "
      ),
      CitableNode(
        CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.1.4"),
        "libellum"
      ),
      CitableNode(
        CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.1.1"),
        "dono "
      ),
      CitableNode(
        CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.1.0"),
        "Cui "
      ),
      CitableNode(
        CtsUrn("urn:cts:latinLit:phi0472.phi001.merrill.token:1.1.2"),
        "lepidum "
      )
    )
  ),
  Corpus(
    Vector(
      CitableNode(
        Ct
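The tokens in this result come back grouped by alignment rather than in document order (`1.1.3`, `1.1.4`, `1.1.1`, ...). One way to restore reading order is to sort by each token's position in the full text. The sketch below does this with plain strings and a hand-written `documentOrder` list, both stand-ins; in the notebook itself, `corp.sortPassages` (used in `passagesForAlignment`) plays this role for URNs.

```scala
// Document order of the Latin tokens in line 1.1 (stand-in for the full corpus)
val documentOrder = Vector("1.1.0", "1.1.1", "1.1.2", "1.1.3", "1.1.4")

// Tokens in the order they were printed above
val asPrinted = Vector(
  "1.1.3" -> "novum ",
  "1.1.4" -> "libellum",
  "1.1.1" -> "dono ",
  "1.1.0" -> "Cui ",
  "1.1.2" -> "lepidum "
)

// Look up each citation's index once, then sort on it; unknown citations go last
val position: Map[String, Int] = documentOrder.zipWithIndex.toMap
val sorted = asPrinted.sortBy { case (ref, _) => position.getOrElse(ref, Int.MaxValue) }
val line: String = sorted.map(_._2).mkString
```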

**Dynamic Lexicon!** Edit your preferred word:

In [46]:
val myWord = "you"
val tokenMatch = false
val dynLex: Vector[Cite2Urn] = alignmentsForString( myWord, tokenMatch )

myWord: String = "you"
tokenMatch: Boolean = false
dynLex: Vector[Cite2Urn] = Vector(
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_16"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_23"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_38"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_17"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191010_16_18_248_6"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191010_16_18_248_9"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191010_16_18_248_37"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191012_31_11_675_36"),
  Cite2Urn("urn:cite2:ducat:alignments.temp:20191010_16_18_248_18"),
  Cite2Urn(

Show the results:

In [47]:
for ( dl <- dynLex) {
    println(s"==========\nAlignment\n")
    println(s"${dl}")
    val texts: Vector[Corpus] = textsForAlignment(dl)
    showMe(texts)
}

==========
Alignment

urn:cite2:ducat:alignments.temp:20191012_31_11_675_16
---------
urn:cts:latinLit:phi0472.phi001.merrill.token:

1.3.1		tibi;

---------
urn:cts:latinLit:phi0472.phi001.ozlam.token:

1.3.0		For 
1.3.1		you 

==========
Alignment

urn:cite2:ducat:alignments.temp:20191012_31_11_675_23
---------
urn:cts:latinLit:phi0472.phi001.merrill.token:

1.9.0		qualecumque, 

---------
urn:cts:latinLit:phi0472.phi001.ozlam.token:

1.9.0		and 
1.9.1		whatever 
1.9.2		you 
1.9.3		like, 

==========
Alignment

urn:cite2:ducat:alignments.temp:20191012_31_11_675_38
---------
urn:cts:latinLit:phi0472.phi001.merrill.token:

1.8.2		tibi 

---------
urn:cts:latinLit:phi0472.phi001.ozlam.token:

1.8.4		for 
1.8.5		yourself 

==========
Alignment

urn:cite2:ducat:alignments.temp:20191012_31_11_675_17
---------
urn:cts:latinLit:phi0472.phi001.merrill.token:

1.3.4		tu 

---------
urn:cts:latinLit:phi0472.phi001.ozlam.token:

1.3.4		you 

==========
Alignment

urn:cite2:ducat:alignments.temp:20191010_16_18_248_6
---------
urn:cts:latinLit:phi0472