# Making a CEX File from a Text File




## Configuring CITE libraries for almond kernel

First, we make the Bintray Maven repository that hosts the CITE libraries available to the almond kernel.

In [118]:
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

myBT: coursierapi.MavenRepository = MavenRepository(https://dl.bintray.com/neelsmith/maven)

Next, we bring in specific libraries from the new repository using almond's `$ivy` magic:

In [119]:
import $ivy.`edu.holycross.shot::ohco2:10.16.0`
import $ivy.`edu.holycross.shot.cite::xcite:4.1.1`
import $ivy.`edu.holycross.shot::scm:7.2.0`
import $ivy.`edu.holycross.shot::dse:5.2.2`
import $ivy.`edu.holycross.shot::citebinaryimage:3.1.1`
import $ivy.`edu.holycross.shot::citeobj:7.3.4`
import $ivy.`edu.holycross.shot::citerelations:2.5.2`
import $ivy.`edu.holycross.shot::cex:6.3.3`
import $ivy.`edu.holycross.shot::greek:2.3.3`


## Imports

From this point on, your notebook consists of completely generic Scala, with the CITE Libraries available to use.

In [120]:
// Import some CITE libraries
import edu.holycross.shot.cite._
import edu.holycross.shot.ohco2._
import edu.holycross.shot.scm._
import edu.holycross.shot.citeobj._
import edu.holycross.shot.citerelation._
import edu.holycross.shot.dse._
import edu.holycross.shot.citebinaryimage._
import edu.holycross.shot.greek._

import almond.display.UpdatableDisplay
import almond.interpreter.api.DisplayData.ContentType
import almond.interpreter.api.{DisplayData, OutputHandler}

import java.io.File
import java.io.PrintWriter

import scala.io.Source


## Useful Functions

Save a string:

In [121]:
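// Write `s` to filePath + fileName; filePath should be empty or end with "/"
def saveString(s: String, filePath: String = "", fileName: String = "temp.txt"): Unit = {
    val writer = new PrintWriter(new File(s"${filePath}${fileName}"))
    // Close the writer even if the write fails
    try writer.write(s) finally writer.close()
}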

defined function saveString

Pretty-print several kinds of values:

In [122]:
def showMe(v: Any): Unit = {
  v match {
    // Histogram: one "count <tab> string" row per entry
    case sh: StringHistogram =>
      for (h <- sh.histogram) println(s"${h.count}\t${h.s}")
    // Corpus: one "passage <tab><tab> text" row per citable node
    case c: Corpus =>
      for (n <- c.nodes) println(s"${n.urn.passageComponent}\t\t${n.text}")
    // A Vector is an Iterable, so a single case covers both
    case it: Iterable[_] => println(s"""\n----\n${it.mkString("\n")}\n----\n""")
    case _ => println(s"\n-----\n${v}\n----\n")
  }
}

defined function showMe

## Load a Template File

Make the CEX header:

In [123]:
val cexTop: String = """
#!cexversion
3.0

#!citelibrary
name#CEX library
urn#urn:cite2:cex:TEMPCOLL.TEMPVERSION:TEMP_ID
license#CC 3.0 NC-BY

#!ctscatalog
urn#citationScheme#groupName#workTitle#versionLabel#exemplarLabel#online#lang"""

val urnStr = "urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:"
val citationSch = "section/subsection"
val groupName = "Aristotle"
val workTitle = "Poetics"
val versionLabel = "W.H. Fyfe, trans. 1932"
val exemplarLabel = ""
val online = "true"
val lang = "eng"

val headerLine = Vector(
    urnStr,
    citationSch,
    groupName,
    workTitle,
    versionLabel,
    exemplarLabel,
    online,
    lang
).mkString("#")

val cexHeader = cexTop + "\n" + headerLine + "\n\n#!ctsdata\n"

cexTop: String = """
#!cexversion
3.0

#!citelibrary
name#CEX library
urn#urn:cite2:cex:TEMPCOLL.TEMPVERSION:TEMP_ID
license#CC 3.0 NC-BY

#!ctscatalog
urn#citationScheme#groupName#workTitle#versionLabel#exemplarLabel#online#lang"""
urnStr: String = "urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:"
citationSch: String = "section/subsection"
groupName: String = "Aristotle"
workTitle: String = "Poetics"
versionLabel: String = "W.H. Fyfe, trans. 1932"
exemplarLabel: String = ""
online: String = "true"
lang: String = "eng"
headerLine: String = "urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:#section/subsection#Aristotle#Poetics#W.H. Fyfe, trans. 1932##true#eng"
cexHeader: String = """

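Because `#` is the column delimiter, the empty `exemplarLabel` shows up as `##` in the catalog line. Here is a minimal sketch, in plain Scala, of reading such a line back into named fields. `CatalogEntry` and `parseCatalogLine` are hypothetical names used only for illustration; they are not part of the CITE libraries.

```scala
// Hypothetical helper types for illustration; not part of the CITE libraries.
final case class CatalogEntry(
    urn: String, citationScheme: String, groupName: String, workTitle: String,
    versionLabel: String, exemplarLabel: String, online: Boolean, lang: String
)

def parseCatalogLine(line: String): Option[CatalogEntry] = {
    // Interior empty fields (like exemplarLabel) survive a plain split;
    // the limit of -1 additionally keeps trailing empty fields.
    line.split("#", -1).toVector match {
        case Vector(u, cs, g, w, v, e, o, l) =>
            Some(CatalogEntry(u, cs, g, w, v, e, o == "true", l))
        case _ => None // wrong number of columns
    }
}

val sampleLine: String =
    "urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:#section/subsection#Aristotle#Poetics#W.H. Fyfe, trans. 1932##true#eng"
val entry: Option[CatalogEntry] = parseCatalogLine(sampleLine)
```

A line with the wrong number of columns yields `None`, which makes a missing or extra `#` easy to spot before the file is written.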
Load the (pre-processed) text file:

In [124]:
val filePath = "pre_cex_eng.txt"
val lines: Vector[String] = {
    // Read every line, drop the empty ones, and close the file when done
    val src = scala.io.Source.fromFile(filePath)
    try src.getLines.toVector.filter( _.size > 0 ) finally src.close()
}

filePath: String = "pre_cex_eng.txt"
lines: Vector[String] = Vector(
  "head#Poetics ",
  "1.1# Let us here deal with Poetry, its essence and its several species, with the characteristic function of each species and the way in which plots must be constructed if the poem is to be a success; and also with the number and character of the constituent parts of a poem, and similarly with all other matters proper to this same inquiry; and let us, as nature directs, begin first with first principles.  ",
  "1.2# Epic poetry, then, and the poetry of tragic drama, and, moreover, comedy and dithyrambic poetry, and most flute-playing and harp-playing, these, speaking generally, may all be said to be \"representations of life.\" ",
  "1.3# But they differ one from another in three ways: either in using means generically different or in representing different objects or in representing objects no

We are looking for embedded Greek words. They will look like this: `<f>A)/GW</f>`. That is, an XML `<f>` element containing Greek in Beta Code.

We'll start by defining a Regular Expression that will match these:

In [125]:
val fMatcher = "<f>([^<]+)</f>".r

fMatcher: Regex = <f>([^<]+)</f>

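To see what this pattern captures, here is a small sketch using only the Scala standard library. The names `demoMatcher`, `demoText`, and so on are illustrative and do not appear elsewhere in this notebook.

```scala
import scala.util.matching.Regex

val demoMatcher: Regex = "<f>([^<]+)</f>".r
val demoText: String =
    "Their word for action is <f>DRA=N</f>, whereas the Athenian word is <f>PRA/TTEIN</f>."

// Whole matches include the surrounding tags ...
val wholeMatches: Vector[String] = demoMatcher.findAllIn(demoText).toVector

// ... while capture group 1 holds only the Beta Code between them
val innerBeta: Vector[String] = demoMatcher.findAllMatchIn(demoText).map(_.group(1)).toVector

// `[^<]+` needs at least one character, so an empty element does not match
val emptyElement: Option[String] = demoMatcher.findFirstIn("<f></f>")
```

Keeping both forms is useful: the tagged form is what we search for in the text, and the inner form is what we hand to the Greek library.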
We make a function to return the contents of those `<f>` elements in a string:

In [126]:
def getGroupsInMatch( s: String, rx: scala.util.matching.Regex, grp: Int ): Vector[String] = {
    // Collect capture group `grp` from every match of `rx` in `s`
    rx.findAllMatchIn(s).map( _.group(grp) ).toVector
}



defined function getGroupsInMatch

We create an overloaded function to report `true` or `false`, based on whether a string matches our regex. (There is no point doing all this to a line that has no `<f>` element!)

In [127]:
def containsMatch( s: String, rx: scala.util.matching.Regex): Boolean = {
    // Stop at the first match rather than counting them all
    rx.findFirstIn(s).isDefined
}

def containsMatch( s: String, rs: String): Boolean = {
    s.contains(rs)
}

defined function containsMatch
defined function containsMatch

We make a little function that uses the Greek library to convert Beta Code to Unicode:

In [128]:
def betaToUnicode( bc: String ): String = {
    val gs = LiteraryGreekString(bc)
    gs.ucode
}

defined function betaToUnicode

We want a Map that has the whole `<f>` element as the key, and the Unicode text (minus the tags) as the value:

In [129]:
def matchesToUcodeReplacementMap( v: Vector[String] ): Map[String, String] = {
    // Key: the whole tagged element; value: its Unicode equivalent, without the tags
    v.distinct.map( b => {
        val uni: String = betaToUnicode(b.toLowerCase)
        (s"<f>${b}</f>", uni)
    }).toMap
}

defined function matchesToUcodeReplacementMap

This is the recursive way we work through a map of replacement pairs and apply each one in turn, without ever using mutable data.

> **Note** We use `str.replace(target, replacement)` here instead of `str.replaceAll(s, r)`. `.replaceAll` treats its first parameter as a regular expression, so the parentheses and other marks in Beta Code would throw errors. `.replace` substitutes the literal text, so it is safe to pass the strings directly.

In [130]:

def recursiveReplace( str: String, replMap: Map[String, String]): String = {

    // Apply the first replacement, then recurse on the rest;
    // an empty vector ends the recursion (and makes an empty map safe)
    @scala.annotation.tailrec
    def doIt( str: String, replVec: Vector[(String, String)]): String = {
        if (replVec.isEmpty) str
        else {
            val (target, replacement) = replVec.head
            doIt( str.replace(target, replacement), replVec.tail )
        }
    }

    doIt( str, replMap.toVector )
}



defined function recursiveReplace
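
The difference between literal and regex replacement is easy to demonstrate with the standard library alone. The sketch below is illustrative: `tagged`, `sentence`, and `foldReplace` are names introduced here, and `foldReplace` is an alternative formulation with `foldLeft`, not the function defined above.

```scala
import java.util.regex.{Matcher, Pattern}
import scala.util.Try

val tagged: String = "<f>A)/GW</f>"
val unicode: String = "ἄγω"
val sentence: String = s"The verb $tagged means to lead."

// String.replace with two strings substitutes literal text
val literal: String = sentence.replace(tagged, unicode)

// replaceAll compiles its first argument as a regex; the unmatched ")"
// makes the pattern invalid, so this throws a PatternSyntaxException
val asRegex: Try[String] = Try(sentence.replaceAll(tagged, unicode))

// Quoting the pattern (and the replacement) makes replaceAll literal as well
val quoted: String =
    sentence.replaceAll(Pattern.quote(tagged), Matcher.quoteReplacement(unicode))

// The same chain of replacements expressed as a fold over the map
def foldReplace(str: String, replMap: Map[String, String]): String =
    replMap.foldLeft(str) { case (acc, (target, replacement)) => acc.replace(target, replacement) }
```

The fold and the recursion do the same work; which one to use is a matter of taste.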

Let's test this complex setup!

In [131]:
val testStr: String = """3.6# Their name, they say, for suburb villages is <f>KW=MAI</f>—the Athenians call them "Demes"—and comedians are so called not from <f>KWMA/ZEIN</f>, "to revel," but because they were turned out of the towns and went strolling round the villages( <f>KW=MAI</f>). Their word for action, they add, is <f>DRA=N</f>, whereas the Athenian word is <f>PRA/TTEIN</f>. So much then for the differences, their number, and their nature. """

val matchVec: Vector[String] = getGroupsInMatch(testStr, fMatcher, 1)
val replaceMap: Map[String, String] = matchesToUcodeReplacementMap(matchVec)
val testRepl: String = recursiveReplace( testStr, replaceMap )
println(testRepl)

3.6# Their name, they say, for suburb villages is κῶμαι—the Athenians call them "Demes"—and comedians are so called not from κωμάζειν, "to revel," but because they were turned out of the towns and went strolling round the villages( κῶμαι). Their word for action, they add, is δρᾶν, whereas the Athenian word is πράττειν. So much then for the differences, their number, and their nature. 

testStr: String = "3.6# Their name, they say, for suburb villages is <f>KW=MAI</f>\u2014the Athenians call them \"Demes\"\u2014and comedians are so called not from <f>KWMA/ZEIN</f>, \"to revel,\" but because they were turned out of the towns and went strolling round the villages( <f>KW=MAI</f>). Their word for action, they add, is <f>DRA=N</f>, whereas the Athenian word is <f>PRA/TTEIN</f>. So much then for the differences, their number, and their nature. "
matchVec: Vector[String] = Vector(
  "KW=MAI",
  "KWMA/ZEIN",
  "KW=MAI",
  "DRA=N",
  "PRA/TTEIN"
)
replaceMap: Map[String, String] = Map(
  "<f>KW=MAI</f>" -> "\u03ba\u1ff6\u03bc\u03b1\u03b9",
  "<f>KWMA/ZEIN</f>" -> "\u03ba\u03c9\u03bc\u03ac\u03b6\u03b5\u03b9\u03bd",
  "<f>DRA=N</f>" -> "\u03b4\u03c1\u1fb6\u0

We want to map over each line of text (the Vector `lines`), looking for Greek words. When we find some, we will turn Beta Code into Unicode using the library [`edu.holycross.shot.greek`](https://neelsmith.github.io/greek/).

In [132]:
val greekedLines: Vector[String] = {
    lines.map( l => {
        if ( containsMatch( l, fMatcher) ) {
           val matchVec: Vector[String] = getGroupsInMatch(l, fMatcher, 1)
           val replaceMap: Map[String, String] = matchesToUcodeReplacementMap(matchVec)
           recursiveReplace( l, replaceMap )
        } else l
    })
}

greekedLines: Vector[String] = Vector(
  "head#Poetics ",
  "1.1# Let us here deal with Poetry, its essence and its several species, with the characteristic function of each species and the way in which plots must be constructed if the poem is to be a success; and also with the number and character of the constituent parts of a poem, and similarly with all other matters proper to this same inquiry; and let us, as nature directs, begin first with first principles.  ",
  "1.2# Epic poetry, then, and the poetry of tragic drama, and, moreover, comedy and dithyrambic poetry, and most flute-playing and harp-playing, these, speaking generally, may all be said to be \"representations of life.\" ",
  "1.3# But they differ one from another in three ways: either in using means generically different or in representing different objects or in representing objects not in the same way but in a different manner.  ",

In [133]:
greekedLines.filter( _.contains("##"))

res132: Vector[String] = Vector()

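The cell above looks for lines with a doubled `#` delimiter. A somewhat broader check can be sketched in plain Scala, under the assumption that every pre-processed line has the form `reference#text` with exactly one `#` and a unique reference. `checkLines` is a hypothetical helper, not part of the CITE libraries.

```scala
// Hypothetical helper for illustration; returns a list of problems, empty if none.
// Assumes each line is "reference#text" with exactly one '#'.
def checkLines(ls: Vector[String]): Vector[String] = {
    // Lines with no delimiter, more than one delimiter, or an empty reference
    val malformed: Vector[String] = ls
        .filter( l => l.count(_ == '#') != 1 || l.startsWith("#") )
        .map( l => s"malformed line: ${l.take(40)}" )

    // References (the part before the first '#') that occur more than once
    val refs: Vector[String] = ls.map( _.takeWhile(_ != '#').trim )
    val duplicates: Vector[String] = refs.groupBy(identity)
        .collect { case (r, rs) if rs.size > 1 => s"duplicate reference: $r" }
        .toVector

    malformed ++ duplicates
}

val goodLines: Vector[String] =
    Vector("head#Poetics ", "1.1# Let us here deal with Poetry.", "1.2# Epic poetry, then.")
val badLines: Vector[String] =
    Vector("1.1# First.", "1.1# Again.", "1.2## Doubled delimiter.", "no delimiter here")

val goodReport: Vector[String] = checkLines(goodLines)
val badReport: Vector[String] = checkLines(badLines)
```

An empty report is what we hope to see before moving on to final assembly.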
## Final Assembly

We can now map over our new string vector to prepend the URN to each line. We'll normalize some whitespace while we're at it:

In [139]:
val cexTextVec: Vector[String] = greekedLines.map( l => {
    s"${urnStr}${l.trim}".replaceAll(" +"," ")
                         .replaceAll("# ","#")
                         .replaceAll(""" " """, """ """")
})

cexTextVec: Vector[String] = Vector(
  "urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:head#Poetics",
  "urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:1.1#Let us here deal with Poetry, its essence and its several species, with the characteristic function of each species and the way in which plots must be constructed if the poem is to be a success; and also with the number and character of the constituent parts of a poem, and similarly with all other matters proper to this same inquiry; and let us, as nature directs, begin first with first principles.",
  "urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:1.2#Epic poetry, then, and the poetry of tragic drama, and, moreover, comedy and dithyrambic poetry, and most flute-playing and harp-playing, these, speaking generally, may all be said to be \"representations of life.\"",
  "urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:1.3#But they differ one from another in three ways: either in using mea

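To make the normalization easier to follow, here is the same chain of replacements applied one step at a time to a made-up line. The URN `urn:cts:demo:group.work.version:` and the sample text are invented for illustration only.

```scala
// Hypothetical URN and sample line, for illustration only
val demoUrn: String = "urn:cts:demo:group.work.version:"
val rawLine: String = "1.2#  Epic   poetry,  \" then\"  "

// 1. Prepend the URN and drop leading/trailing whitespace from the line
val step1: String = s"${demoUrn}${rawLine.trim}"
// 2. Collapse runs of spaces into a single space
val step2: String = step1.replaceAll(" +", " ")
// 3. Remove the space that follows the '#' delimiter
val step3: String = step2.replaceAll("# ", "#")
// 4. Where a quotation mark has a space on both sides, drop the space after it
val step4: String = step3.replaceAll(" \" ", " \"")
```

Here `step4` is `urn:cts:demo:group.work.version:1.2#Epic poetry, "then"`. Note that the order matters: collapsing the spaces first is what lets the later patterns match reliably.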
And we make it One Big String:

In [140]:
val cexText: String = cexTextVec.mkString("\n")
val finalCex: String = cexHeader + cexText

cexText: String = """urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:head#Poetics
urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:1.1#Let us here deal with Poetry, its essence and its several species, with the characteristic function of each species and the way in which plots must be constructed if the poem is to be a success; and also with the number and character of the constituent parts of a poem, and similarly with all other matters proper to this same inquiry; and let us, as nature directs, begin first with first principles.
urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:1.2#Epic poetry, then, and the poetry of tragic drama, and, moreover, comedy and dithyrambic poetry, and most flute-playing and harp-playing, these, speaking generally, may all be said to be "representations of life."
urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:1.3#But they differ one from another in three ways: either in using means generically different or in representing different objects or in representing objec

And we save it!

In [141]:
val fileName = "aristot_poetics_eng.cex"

fileName: String = "aristot_poetics_eng.cex"

In [142]:
saveString(finalCex, "cex/", fileName)
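
Note that `saveString` expects the `cex/` directory to exist already, and that `new PrintWriter(new File(...))` writes in the platform's default character encoding, which matters for Greek text. A minimal sketch of a more defensive variant using `java.nio.file` follows; `saveStringSafely` is a hypothetical name, not something defined earlier in this notebook.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

// Hypothetical variant of saveString: creates missing directories and always writes UTF-8
def saveStringSafely(s: String, dir: String, fileName: String): Path = {
    val target: Path = Paths.get(dir).resolve(fileName)
    // getParent is null when dir is empty, so guard it; createDirectories
    // does nothing if the directory already exists
    Option(target.getParent).foreach( p => Files.createDirectories(p) )
    Files.write(target, s.getBytes(StandardCharsets.UTF_8))
}
```

With this in place, the call would be `saveStringSafely(finalCex, "cex/", fileName)`.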

## Final Validation

We want to confirm that we can load this library!

In [143]:
val cexPath = "cex/aristot_poetics_eng.cex"
val lib = CiteLibrary(scala.io.Source.fromFile(cexPath).mkString)

Feb 12, 2020 12:23:18 AM wvlet.log.Logger log
INFO: Building text repo from cex ...
Feb 12, 2020 12:23:18 AM wvlet.log.Logger log
INFO: Building collection repo from cex ...
Feb 12, 2020 12:23:18 AM wvlet.log.Logger log
INFO: Building relations from cex ...
Feb 12, 2020 12:23:18 AM wvlet.log.Logger log
INFO: All library components built.


cexPath: String = "cex/aristot_poetics_eng.cex"
lib: CiteLibrary = CiteLibrary(
  "CEX library",
  Cite2Urn("urn:cite2:cex:TEMPCOLL.TEMPVERSION:TEMP_ID"),
  "CC 3.0 NC-BY",
  Vector(),
  Some(
    TextRepository(
      Corpus(
        Vector(
          CitableNode(
            CtsUrn("urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:head"),
            "Poetics"
          ),
          CitableNode(
            CtsUrn("urn:cts:greekLit:tlg0086.tlg034.fyfe_fu:1.1"),
            "Let us here deal with Poetry, its essence and its several species, with the characteristic function of each species and the way in which plots must be constructed if the poem is to be a success; and also with the number and character of the constituent parts of a poem, and similarly with all other matt