# Indexing Collections to Text

## Configuring CITE libraries for almond kernel

First, we'll make a Bintray repository that hosts the CITE libraries available to your almond kernel.

In [1]:
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

myBT: coursierapi.MavenRepository = MavenRepository(https://dl.bintray.com/neelsmith/maven)

Next, we bring in specific libraries from the new repository using almond's `$ivy` magic:

In [2]:
import $ivy.`edu.holycross.shot::ohco2:10.16.0`
import $ivy.`edu.holycross.shot.cite::xcite:4.1.1`
import $ivy.`edu.holycross.shot::scm:7.2.0`
import $ivy.`edu.holycross.shot::dse:5.2.2`
import $ivy.`edu.holycross.shot::citebinaryimage:3.1.1`
import $ivy.`edu.holycross.shot::citeobj:7.3.4`
import $ivy.`edu.holycross.shot::citerelations:2.5.2`
import $ivy.`edu.holycross.shot::cex:6.3.3`



## Imports

From this point on, your notebook consists of completely generic Scala, with the CITE Libraries available to use.

In [3]:
// Import some CITE libraries
import edu.holycross.shot.cite._
import edu.holycross.shot.ohco2._
import edu.holycross.shot.scm._
import edu.holycross.shot.citeobj._
import edu.holycross.shot.citerelation._
import edu.holycross.shot.dse._
import edu.holycross.shot.citebinaryimage._

// Import some other stuff
import scala.xml.XML

import almond.display.UpdatableDisplay
import almond.interpreter.api.DisplayData.ContentType
import almond.interpreter.api.{DisplayData, OutputHandler}

import java.io.File
import java.io.PrintWriter

import scala.io.Source


## Useful Functions

Save a String

In [4]:
def saveString(s:String, filePath:String = "", fileName:String = "temp.txt"):Unit = {
    val writer = new PrintWriter(new File(s"${filePath}${fileName}"))
    writer.write(s)
    writer.close()
}

defined [32mfunction[39m [36msaveString[39m

Pretty Print many things:

In [5]:
def showMe(v: Any): Unit = {
  v match {
    case sh: StringHistogram =>
      for (h <- sh.histogram) {
        println(s"${h.count}\t${h.s}")
      }
    case c: Corpus =>
      for (n <- c.nodes) {
        println(s"${n.urn.passageComponent}\t\t${n.text}")
      }
    case vec: Vector[_] => println(s"""\n----\n${vec.mkString("\n")}\n----\n""")
    case it: Iterable[_] => println(s"""\n----\n${it.mkString("\n")}\n----\n""")
    case _ => println(s"\n-----\n${v}\n----\n")
  }
}

defined function showMe

## Set Up for Working With Base XML Texts in CEX

We will work with a CITE Library of texts whose CitableNodes consist of well-formed XML. We'll use a custom class, `TextVersion`, to make it a bit easier to generate catalog information for our new plain-text editions. We need two catalog entries for each version because each contains two texts, an Introduction and the Itinerary.

Class `CatalogEntry` is part of the CITE OHCO2 library: <https://cite-architecture.github.io/cite-api-docs/ohco2/api/edu/holycross/shot/ohco2/CatalogEntry.html>.

In [6]:
val cexCatalogTemplatePath: String = "../data/cex_template.cex"

// Get it as a String

val rawCexTemplateString: String = scala.io.Source.fromFile(cexCatalogTemplatePath).mkString

// Give it a valid URN

val basicCatalogDesc: String = "Demonstration CEX of Benjamin of Tudela’s Itineraries. Plain text editions and Indices."

val basicCatalogUrn: Cite2Urn = Cite2Urn("urn:cite2:fu_elijah:cexCatalogs.2021a:bot_indexed_editions")

val cexTemplateString = rawCexTemplateString
            .replaceAll("CEX_URN_GOES_HERE",basicCatalogUrn.toString)
            .replaceAll("CEX_DESC_GOES_HERE", basicCatalogDesc)

val xmlCexPath: String = "../BoT_Cex/BoT_XML.cex"

val textCexPath: String = "../BoT_Cex/"
val textCexFN: String = "BoT_rich.cex"

case class TextVersion(
    baseMainUrn: CtsUrn,
    mainCatalogEntry: CatalogEntry, 
    baseIntroUrn: CtsUrn,
    introCatalogEntry: CatalogEntry,
    path: String = xmlCexPath
)


val engPT =  TextVersion(
    baseMainUrn = CtsUrn("urn:cts:elijahlab:benTud.itin.englishXml:"),
    baseIntroUrn = CtsUrn("urn:cts:elijahlab:benTud.itinIntro.englishXml:"),
    mainCatalogEntry = CatalogEntry(
        urn = CtsUrn("urn:cts:elijahlab:benTud.itin.english:"),
        citationScheme = "geographic narrative / section",
        lang = "eng",
        groupName = "Benjamin of Tudela",
        workTitle = "Itineraries",
        versionLabel = Some("English translation, plain-text.  Marcus Nathan Adler, The Itinerary of Benjamin of Tudela, Critical Text, Translation and Commentary. London 1907, as made available in Project Gutenberg, https://www.gutenberg.org/files/14981/14981-h/14981-h.htm"),
        exemplarLabel = None,
        online = true
    ),
    introCatalogEntry = CatalogEntry(
        urn = CtsUrn("urn:cts:elijahlab:benTud.itinIntro.english:"),
        citationScheme = "head, body",
        lang = "eng",
        groupName = "Benjamin of Tudela",
        workTitle = "Introduction to the Itineraries",
        versionLabel = Some("English translation, plain-text.  Marcus Nathan Adler, The Itinerary of Benjamin of Tudela, Critical Text, Translation and Commentary. London 1907, as made available in Project Gutenberg, https://www.gutenberg.org/files/14981/14981-h/14981-h.htm"),
        exemplarLabel = None,
        online = true
    )
)

val hebPT = TextVersion(
    baseMainUrn = CtsUrn("urn:cts:elijahlab:benTud.itin.hebrewXml:"),
    baseIntroUrn = CtsUrn("urn:cts:elijahlab:benTud.itinIntro.hebrewXml:"),
    mainCatalogEntry = CatalogEntry(
        urn = CtsUrn("urn:cts:elijahlab:benTud.itin.hebrew:"),
        citationScheme = "geographic narrative / section",
        lang = "heb",
        groupName = "Benjamin of Tudela",
        workTitle = "Itineraries",
        versionLabel = Some("Hebrew edition, plain-text.  Abraham Asher, The Itinerary of Rabbi Benjamin of Tudela. London-Berlin 1840-1841. Vol. 1"),
        exemplarLabel = None,
        online = true
    ),
    introCatalogEntry = CatalogEntry(
        urn = CtsUrn("urn:cts:elijahlab:benTud.itinIntro.hebrew:"),
        citationScheme = "head, body",
        lang = "heb",
        groupName = "Benjamin of Tudela",
        workTitle = "Introduction to the Itineraries",
        versionLabel = Some("Hebrew edition, plain-text.  Abraham Asher, The Itinerary of Rabbi Benjamin of Tudela. London-Berlin 1840-1841. Vol. 1"),
        exemplarLabel = None,
        online = true
    )
)

val araPT =  TextVersion(
    baseMainUrn = CtsUrn("urn:cts:elijahlab:benTud.itin.arabicXml:"),
    baseIntroUrn = CtsUrn("urn:cts:elijahlab:benTud.itinIntro.arabicXml:"),
    mainCatalogEntry = CatalogEntry(
        urn = CtsUrn("urn:cts:elijahlab:benTud.itin.arabic:"),
        citationScheme = "geographic narrative / section",
        lang = "ara",
        groupName = "Benjamin of Tudela",
        workTitle = "Itineraries",
        versionLabel = Some("Arabic translation, plain-text. Translated from the Hebrew original, with Introduction, Notes and Appendixes By Ezra H. Haddad, Baghdad 1945"),
        exemplarLabel = None,
        online = true
    ),
    introCatalogEntry = CatalogEntry(
        urn = CtsUrn("urn:cts:elijahlab:benTud.itinIntro.arabic:"),
        citationScheme = "head, body",
        lang = "ara",
        groupName = "Benjamin of Tudela",
        workTitle = "Introduction to the Itineraries",
        versionLabel = Some("Arabic translation, plain-text. Translated from the Hebrew original, with Introduction, Notes and Appendixes By Ezra H. Haddad, Baghdad 1945"),
        exemplarLabel = None,
        online = true
    )

)

// We'll throw those into a Vector so we can iterate across them.

val textVec: Vector[TextVersion] = Vector(engPT, hebPT, araPT)






cexCatalogTemplatePath: String = "../data/cex_template.cex"
rawCexTemplateString: String = """// 

#!cexversion
3.0

#!citelibrary
name#CEX_DESC_GOES_HERE
urn#CEX_URN_GOES_HERE
license#CC Share Alike.  For details, see more info.

"""
basicCatalogDesc: String = "Demonstration CEX of Benjamin of Tudela\u2019s Itineraries. Plain text editions and Indices."
basicCatalogUrn: Cite2Urn = Cite2Urn(
  "urn:cite2:fu_elijah:cexCatalogs.2021a:bot_indexed_editions"
)
cexTemplateString: String = """// 

#!cexversion
3.0

#!citelibrary
name#Demonstration CEX of Benjamin of Tudela’s Itineraries. Plain text editions and Indices.
urn#urn:cite2:fu_elijah:cexCatalogs.2021a:bot_indexed_editions
license#CC Share Alike.  For details, see more info.

"""
xmlCexPath: String = "../BoT_Cex/BoT_XML.cex"
textCexPath:
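
`replaceAll` treats its first argument as a regular expression and gives `$` and `\` special meaning in the replacement text, which could matter if a description ever contains those characters. Below is a minimal sketch that uses the literal `String.replace` instead. `fillTemplate` is a hypothetical helper, the template text is abbreviated, and the description is invented to show the `$` case.

```scala
// Hypothetical helper: fill both placeholders using literal (non-regex) replacement.
def fillTemplate(template: String, urn: String, desc: String): String =
  template
    .replace("CEX_URN_GOES_HERE", urn)
    .replace("CEX_DESC_GOES_HERE", desc)

// Abbreviated stand-in for the contents of the template file.
val demoTemplate: String =
  "#!cexversion\n3.0\n\n#!citelibrary\nname#CEX_DESC_GOES_HERE\nurn#CEX_URN_GOES_HERE\n"

val filled: String = fillTemplate(
  demoTemplate,
  "urn:cite2:fu_elijah:cexCatalogs.2021a:bot_indexed_editions",
  "Editions costing $5 (demo)" // invented description containing a '$'
)
```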

## Load the XML Versions into a Cite Library

In [7]:
val lib: CiteLibrary = CiteLibrarySource.fromFile(xmlCexPath)

val tr: TextRepository = lib.textRepository.get

May 16, 2021 6:58:15 PM wvlet.log.Logger log
INFO: Building text repo from cex ...
May 16, 2021 6:58:15 PM wvlet.log.Logger log
INFO: Building collection repo from cex ...
May 16, 2021 6:58:15 PM wvlet.log.Logger log
INFO: Building relations from cex ...
May 16, 2021 6:58:15 PM wvlet.log.Logger log
INFO: All library components built.


lib: CiteLibrary = CiteLibrary(
  "Demonstration CEX of Benjamin of Tudela\u2019s Itineraries. XML Editions",
  Cite2Urn("urn:cite2:fu_elijah:cexCatalogs.2021a:bot_plainText_editions"),
  "CC Share Alike.  For details, see more info.",
  Vector(),
  Some(
    TextRepository(
      Corpus(
        Vector(
          CitableNode(
            CtsUrn("urn:cts:elijahlab:benTud.itinIntro.englishXml:0"),
            "<head xmlns=\"http://www.tei-c.org/ns/1.0\"> THE ITINERARY OF <persName xml:id=\"recogito-9ea93359-2c2c-4427-a28b-55a60927450d\">BENJAMIN</persName> OF TUDELA. HEBREW INTRODUCTION.</head>"
          ),
          CitableNode(
            CtsUrn("urn:cts:elijahlab:benTud.itinIntro.englishXml:1"),
            "<ab xmlns=\"http://www.tei-c.org/ns/1.0\"> This is the book of travels, which was c

## Load XML data for Places

We've got two XML files of places mentioned in the Itinerary. Let's load both, reduce each to a Vector of simple records, and then compare them.


In [8]:
val placesHebPath: String = "../data/BTAsher20210429.xml"

val placesEngPath: String = "../data/BTAdler20210419.xml"

val hebXml: xml.Elem = XML.loadFile(placesHebPath)

val engXml: xml.Elem = XML.loadFile(placesEngPath)


placesHebPath: String = "../data/BTAsher20210429.xml"
placesEngPath: String = "../data/BTAdler20210419.xml"
hebXml: xml.Elem = <TEI xml:id="TEI_r3h_nxn_wmb" xmlns="http://www.tei-c.org/ns/1.0">
	<?xml-stylesheet type="text/css" href="../../../../../Applications/Oxygen%20XML%20Editor/frameworks/tei/xml/tei/stylesheet/tei.css"?>
	<?xml-stylesheet type="text/css" href="../travelab.css"?>
	<teiHeader>
		<fileDesc>
			<titleStmt>
				<title type="main">Asher's Benjamin of Tudela</title>
			</titleStmt>
			<publicationStmt>
				<publisher>tranScriptorium</publisher>
			</publicationStmt>
			<sourceDesc corresp="BT" xml:id="Asher">
				<biblStruct>
					<monogr>
						<title>The Itinerary of Rabbi Benjamin of Tudela</title>
						<author>Benjamin of Tudela</author>
						<editor>A. Asher</editor>
						<imprint>
							<date>1840</date>
							<pubPlace>London</pubPlace>
							<publisher>A. Asher and co

In [9]:
val hXmlPlaces: xml.NodeSeq = hebXml \\ "place"

val eXmlPlaces: xml.NodeSeq = engXml \\ "place"

hXmlPlaces: xml.NodeSeq = NodeSeq(
  <place type="point" xml:id="K6347" xmlns="http://www.tei-c.org/ns/1.0">
						<placeName>Zaragoza</placeName>
						<location>
							<geo>41.65606 -0.87734</geo>
						</location>
						<idno type="URI">http://geo-kima.org/Place/6347</idno>
					</place>,
  <place type="line" xml:id="GN3123754" xmlns="http://www.tei-c.org/ns/1.0">
						<placeName>Spain</placeName>
						<location><geo>40.73024, 0.86985</geo></location>
						<idno type="URI">https://www.geonames.org/3123754/</idno>
					</place>,
  <place type="point" xml:id="K9805" xmlns="http://www.tei-c.org/ns/1.0">
						<placeName>Tortosa</placeName>
						<location>
							<geo>40.815111 0.523778</geo>
						</location>
						<idno type="URI">http://geo-kima.org/place/9805</idno>
					</place>,
  <place type="point" xml:id="K7559" xmlns="http://www.tei-c.org/ns/1.0">
						<placeName>Tarragona</placeName>
						<location>
							<geo>41.119196 1.258058

In [10]:
hXmlPlaces.head \\ "placeName"

// Reduce one <place> element to (id, placeName, uri, location).
def placeRecord(x: xml.Node): (String, String, String, String) = {
    // The ID is what remains of the attribute string once the `type`
    // attribute, the `xml:id=` label, and the quotation marks are removed.
    val idno: String = {
        val s = x.attributes.toString
        val rx1 = """type="[^"]+""""
        val rx2 = "xml:id="
        s.replaceAll(rx1, "").replaceAll(rx2, "").replaceAll("\"", "").trim
    }
    val placeName: String = (x \\ "placeName").text
    val uri: String = (x \\ "idno").text
    val location: String = (x \\ "geo").text

    (idno, placeName, uri, location)
}

val hPlaces: Vector[(String, String, String, String)] = hXmlPlaces.map(placeRecord).toVector

val ePlaces: Vector[(String, String, String, String)] = eXmlPlaces.map(placeRecord).toVector


res9_0: xml.NodeSeq = NodeSeq(
  <placeName xmlns="http://www.tei-c.org/ns/1.0">Zaragoza</placeName>
)
hPlaces: Vector[(String, String, String, String)] = Vector(
  ("K6347", "Zaragoza", "http://geo-kima.org/Place/6347", "41.65606 -0.87734"),
  (
    "GN3123754",
    "Spain",
    "https://www.geonames.org/3123754/",
    "40.73024, 0.86985"
  ),
  ("K9805", "Tortosa", "http://geo-kima.org/place/9805", "40.815111 0.523778"),
  ("K7559", "Tarragona", "http://geo-kima.org/place/7559", "41.119196 1.258058"),
  ("K6471", "Spain", "http://geo-kima.org/place/6471", "40, -3"),
  ("K582", "Barcelona", "http://geo-kima.org/place/582", "41.384106 2.175422"
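
The `idno` value is obtained by deleting everything else from the attribute string, which would leave stray text behind if a `<place>` element ever carried a third attribute. An alternative sketch captures the `xml:id` value directly with a regular expression. `xmlIdOf` is a hypothetical helper, and the sample input is only assumed to resemble what `x.attributes.toString` returns.

```scala
import scala.util.matching.Regex

// Capture whatever sits between the quotes of an xml:id attribute.
val XmlId: Regex = """xml:id="([^"]+)"""".r

// Hypothetical helper: Some(id) if the attribute string has an xml:id, else None.
def xmlIdOf(attributeString: String): Option[String] =
  XmlId.findFirstMatchIn(attributeString).map(_.group(1))

// Assumed shape of an attribute string for the first record.
val sampleAttributes: String = " type=\"point\" xml:id=\"K6347\""
val sampleId: Option[String] = xmlIdOf(sampleAttributes)
```

Returning an `Option` makes a missing ID explicit instead of encoding it as an empty string.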

## Verification

Let's check that the Hebrew and English files yield the same place records, in the same order.

In [11]:
val misMatches: Vector[((String, String, String, String), Int)] = {
    hPlaces.zipWithIndex.filter( h => {
        h._1 != ePlaces(h._2)
    })
}

assert( misMatches.size == 0 )

misMatches: Vector[((String, String, String, String), Int)] = Vector()
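
The comparison indexes into `ePlaces` by position, so it would throw if `ePlaces` were shorter than `hPlaces`, and it would not notice extra records at the end of `ePlaces`. Below is a sketch of a length-safe comparison using `zipAll`. `mismatchedIndices` is a hypothetical helper, and the extra record `X1` is invented for illustration.

```scala
type PlaceRecord = (String, String, String, String)

// Hypothetical helper: positions at which the two vectors differ, including
// positions that exist in only one of them.
def mismatchedIndices(a: Vector[PlaceRecord], b: Vector[PlaceRecord]): Vector[Int] =
  a.map(Option(_))
    .zipAll(b.map(Option(_)), None, None) // pad the shorter side with None
    .zipWithIndex
    .collect { case ((x, y), i) if x != y => i }

val left: Vector[PlaceRecord] = Vector(
  ("K6347", "Zaragoza", "http://geo-kima.org/Place/6347", "41.65606 -0.87734"),
  ("K9805", "Tortosa", "http://geo-kima.org/place/9805", "40.815111 0.523778")
)

// Invented extra record, present on one side only.
val right: Vector[PlaceRecord] = left :+ (("X1", "Extra", "", ""))

val differences: Vector[Int] = mismatchedIndices(left, right)
```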

If the above passes muster, we don't need to work with both sets, so we'll work just with `hPlaces`.

In [12]:
val missingId: Vector[(String, String, String, String)] = {
    hPlaces.filter(_._1 == "")
}

println( s"${missingId.size} records missing an ID." )



val missingLocation: Vector[(String, String, String, String)] = {
    hPlaces.filter(_._4 == "")
}

println( s"${missingLocation.size} records missing latitude and longitude." )

showMe(missingLocation.map(_._1))

val missingPlaceName: Vector[(String, String, String, String)] = {
    hPlaces.filter(_._2 == "")
}

println( s"${missingPlaceName.size} records missing a place-name." )


val missingUri: Vector[(String, String, String, String)] = {
    hPlaces.filter(_._3 == "")
}

println( s"${missingUri.size} records missing a URI." )

val justIds: Vector[String] = hPlaces.map(_._1)

// check for uniqueness
assert( justIds == justIds.distinct)



0 records missing an ID.
24 records missing latitude and longitude.

----
K2501
K7951
K10968
K7637
K6399
K12708
K2047
K8962
U1
K9259
U4
Q1404297
K7639
U6
U2
U3
U7
U10
U11
U9
U8
U12
K21499
U5
----

128 records missing a place-name.
12 records missing a URI.


missingId: Vector[(String, String, String, String)] = Vector()
missingLocation: Vector[(String, String, String, String)] = Vector(
  ("K2501", "", "http://geo-kima.org/place/2501", ""),
  ("K7951", "", "http://geo-kima.org/place/7951", ""),
  ("K10968", "", "http://geo-kima.org/place/10968", ""),
  ("K7637", "", "http://geo-kima.org/place/7637", ""),
  ("K6399", "", "http://geo-kima.org/place/6399", ""),
  ("K12708", "", "http://geo-kima.org/place/12708", ""),
  ("K2047", "", "http://geo-kima.org/place/2047", ""),
  ("K8962", "", "http://geo-kima.or

### Temporary Expedient

For now, we'll just use "No place name." and an arbitrary point near the South Pole (`-80, 0`) where we are missing data. As the XML files get updated, those placeholders will go away.
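
A sketch of how such placeholders can be applied, assuming that an empty (or blank) string marks a missing value, as in `hPlaces`. `orElseDefault` is a hypothetical helper.

```scala
// Hypothetical helper: return the trimmed value, or the default when the
// value is null, empty, or only whitespace.
def orElseDefault(value: String, default: String): String =
  Option(value).map(_.trim).filter(_.nonEmpty).getOrElse(default)

val demoLocation: String = orElseDefault("", "-80, 0")
val demoPlaceName: String = orElseDefault("Zaragoza", "No place name.")
```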

## A Data Structure for Places

In [13]:
case class BotPlace(urn: Cite2Urn, label: String, location: String, placeName: String, kimaUri: String)

defined [32mclass[39m [36mBotPlace[39m

### Parameters for Collections

In [14]:
val collectionTemplatePath: String = "../data/collections_template.cex"

val collUrn: Cite2Urn = Cite2Urn("urn:cite2:elijahfurman:botPlaces.2021a:")

val justCollectionsFileName: String = "collections_only.cex"

collectionTemplatePath: String = "../data/collections_template.cex"
collUrn: Cite2Urn = Cite2Urn("urn:cite2:elijahfurman:botPlaces.2021a:")
justCollectionsFileName: String = "collections_only.cex"

In [15]:
def placeObjects: Vector[BotPlace] = hPlaces.map( h => {
    val urn: Cite2Urn = collUrn.addSelector(h._1)
    val label: String = s"Place ID ${h._1}"
    // Some <geo> values are space-separated ("41.65606 -0.87734") and some
    // already contain a comma ("40, -3"), so split on commas and/or whitespace.
    val location: String = {
        if (h._4 == "") { "-80, 0" }
        else { h._4.trim.split("[,\\s]+").mkString(", ") }
    }
    val placeName: String = {
        if (h._2 == "") { "No place name." }
        else h._2
    }
    val uri: String = {
        if (h._3 == "") { "No URI." }
        else s"[Database record](${h._3})"
    }
    
    BotPlace( urn, label, location, placeName, uri)
})

defined function placeObjects

## Generate Collection Records

In [16]:
val citeDataHeader: String = s"""\n\n#!citedata\nurn#label#location#placeName#kimaUri\n"""

val collVec: Vector[String] = placeObjects.map( po => {
    s"${po.urn}#${po.label}#${po.location}#${po.placeName}#${po.kimaUri}"
})

val collectionData: String = {
    citeDataHeader + collVec.mkString("\n") + "\n\n"
}

val rawCollCexTemplateString: String = scala.io.Source.fromFile(collectionTemplatePath).mkString

// Give it a valid URN

val collCatalogDesc: String = "Demonstration CEX of Benjamin of Tudela’s Itineraries. Plain text editions and Indices."

val collCatalogUrn: Cite2Urn = Cite2Urn("urn:cite2:fu_elijah:cexCatalogs.2021a:bot_indexed_editions")

val cexTemplateString = rawCollCexTemplateString
            .replaceAll("CEX_URN_GOES_HERE",collCatalogUrn.toString)
            .replaceAll("CEX_DESC_GOES_HERE", collCatalogDesc)

val collCex: String = cexTemplateString + "\n\n" + collectionData

saveString(collCex, textCexPath, justCollectionsFileName)

citeDataHeader: String = """

#!citedata
urn#label#location#placeName#kimaUri
"""
collVec: Vector[String] = Vector(
  "urn:cite2:elijahfurman:botPlaces.2021a:K6347#Place ID K6347#41.65606, -0.87734#Zaragoza#[Database record](http://geo-kima.org/Place/6347)",
  "urn:cite2:elijahfurman:botPlaces.2021a:GN3123754#Place ID GN3123754#40.73024, 0.86985#Spain#[Database record](https://www.geonames.org/3123754/)",
  "urn:cite2:elijahfurman:botPlaces.2021a:K9805#Place ID K9805#40.815111, 0.523778#Tortosa#[Database record](http://geo-kima.org/place/9805)",
  "urn:cite2:elijahfurman:botPlaces.2021a:K7559#Place ID K7559#41.119196, 1.258058#Tarragona#[Database record](http://geo-kima.org/place/7559)",
  "urn:cite2:elijahfurman:botPlaces.2021a:K6471#Place ID K6471#40, -3#Spain#[Database record](http://geo-kima.org/place/6471)",
  "urn:cite2:elijahfurman:botPlaces.2021a:K582#P
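
Because `#` is the column delimiter in the `#!citedata` block, a field value that itself contains `#` (a URI with a fragment, for instance) would silently shift the columns of its record. Below is a sketch of a guard that could run before serializing. `DemoPlace` and `toCexLine` are hypothetical stand-ins, with `urn` as a plain `String`, so the example runs without the CITE libraries.

```scala
// Hypothetical stand-in for BotPlace, using String for the URN.
final case class DemoPlace(
  urn: String, label: String, location: String, placeName: String, kimaUri: String
)

// Serialize one record, refusing any field that contains the delimiter.
def toCexLine(p: DemoPlace, delimiter: String = "#"): Either[String, String] = {
  val fields: Vector[String] = Vector(p.urn, p.label, p.location, p.placeName, p.kimaUri)
  fields.find(_.contains(delimiter)) match {
    case Some(bad) => Left(s"Field contains the delimiter '${delimiter}': ${bad}")
    case None      => Right(fields.mkString(delimiter))
  }
}

val tortosa: DemoPlace = DemoPlace(
  "urn:cite2:elijahfurman:botPlaces.2021a:K9805",
  "Place ID K9805",
  "40.815111, 0.523778",
  "Tortosa",
  "[Database record](http://geo-kima.org/place/9805)"
)

val tortosaLine: Either[String, String] = toCexLine(tortosa)
```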

## Make Plain-Text Editions

The steps are: 

- For each text in `textVec`,
- filter ("twiddle", `~~`) our Corpus by both the `_.baseMainUrn` and the `_.baseIntroUrn`,
- for each `CitableNode`, load the `_.text` into a Scala `xml.NodeSeq`,
- get the `.text` content,
- create a new `CitableNode` with the new URN and this plain-text content,
- wrap them all into a `Corpus`,
- combine with the `CatalogEntry`,
- serialize to CEX and save.
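
The core of these steps, reduced to plain strings, can be sketched as follows. This is an illustration under assumptions: it uses the JDK's DOM parser instead of `scala.xml` so that it runs without extra dependencies, `plainText` and `retarget` are hypothetical helpers that treat URNs as strings rather than `CtsUrn` objects, and the sample XML is abbreviated.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

// Parse a well-formed XML fragment and return its text content, markup removed.
def plainText(xmlFragment: String): String = {
  val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
  val bytes = xmlFragment.getBytes(StandardCharsets.UTF_8)
  val doc = builder.parse(new ByteArrayInputStream(bytes))
  doc.getDocumentElement.getTextContent.trim
}

// Carry a passage reference over from the XML version's URN to a new base URN.
def retarget(xmlUrn: String, newBaseUrn: String): String = {
  val passage: String = xmlUrn.substring(xmlUrn.lastIndexOf(':') + 1)
  newBaseUrn + passage
}

// One (URN, text) pair standing in for a CitableNode.
val xmlNode: (String, String) = (
  "urn:cts:elijahlab:benTud.itinIntro.englishXml:0",
  "<head xmlns=\"http://www.tei-c.org/ns/1.0\"> THE ITINERARY OF <persName>BENJAMIN</persName> OF TUDELA. HEBREW INTRODUCTION.</head>"
)

val plainNode: (String, String) = (
  retarget(xmlNode._1, "urn:cts:elijahlab:benTud.itinIntro.english:"),
  plainText(xmlNode._2)
)
```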

In [17]:
// Get a vector of Corpus objects

val newCorpora: Vector[Corpus] = {
    textVec.map( tv => {
        val mainCorp: Corpus = tr.corpus ~~ tv.baseMainUrn
        val introCorp: Corpus = tr.corpus ~~ tv.baseIntroUrn
             
        val newMainNodes: Vector[CitableNode] = mainCorp.nodes.map( c => {
            val newUrn: CtsUrn = tv.mainCatalogEntry.urn.addPassage(c.urn.passageComponent)
            val xmlText: xml.NodeSeq = xml.XML.loadString(c.text)
            val newText: String = xmlText.head.text.trim
            CitableNode(newUrn, newText)
        })
        
        val newMainCorp = Corpus(newMainNodes)
        
        val newIntroNodes: Vector[CitableNode] = introCorp.nodes.map( c => {
            val newUrn: CtsUrn = tv.introCatalogEntry.urn.addPassage(c.urn.passageComponent)
            val xmlText: xml.NodeSeq = xml.XML.loadString(c.text)
            val newText: String = xmlText.head.text.trim
            CitableNode(newUrn, newText)
        })
        
        val newIntroCorp = Corpus(newIntroNodes)
        
        Vector(newIntroCorp, newMainCorp)
        
    }).flatten
}



newCorpora: Vector[Corpus] = Vector(
  Corpus(
    Vector(
      CitableNode(
        CtsUrn("urn:cts:elijahlab:benTud.itinIntro.english:0"),
        "THE ITINERARY OF BENJAMIN OF TUDELA. HEBREW INTRODUCTION."
      ),
      CitableNode(
        CtsUrn("urn:cts:elijahlab:benTud.itinIntro.english:1"),
        "This is the book of travels, which was compiled by Rabbi Benjamin, the son of Jonah, of the land of Navarre--his repose be in Paradise. The said Rabbi Benjamin set forth from Tudela, his native city, and passed through many remote countries, as is related in his book. In every place which he entered, he made a record of all that he saw, or was told of by trustworthy persons--matters not previously heard of in the land of Sepharad. Also he mentions some of the sages and illustrious men residing in each place. He brought this book with him on his ret

### Generate the CEX String for the Editions

In [18]:
// textCexPath

def processCatalogEntry( tv: TextVersion ): String = {
    val txt: String = {
        tv.mainCatalogEntry.cex("#")
    }
    val intro: String = {
        tv.introCatalogEntry.cex("#")
    }
    txt + "\n" + intro
}

def makePTEdition( tv: Vector[TextVersion], wrapped: Boolean, path: String, fn: String ): String = {
    
    // make the catalog
    
    val ptCexCatalog: Vector[String] = {
        // One pair of catalog lines (main text and introduction) per version
        val entries: Vector[String] = tv.map(processCatalogEntry)

        s"""#!ctscatalog
urn#citationScheme#groupName#workTitle#versionLabel#exemplarLabel#online#lang""" +: entries
    }    
    
    // make the Corpora
    
    val ptCexTexts: Vector[String] = newCorpora.map( nc => {
        val nodeVec: String = nc.cex("#") 
        Vector("#!ctsdata", nodeVec).mkString("\n")
    })

    val outputCexString: String = {
        val vec: Vector[String] = {
            Vector("\n", ptCexCatalog.mkString("\n"), ptCexTexts.mkString("\n"))
        }

        vec.mkString("\n\n")
    }

    outputCexString  
    
}


val textCexString: String = makePTEdition( textVec, false, textCexPath, textCexFN)



defined [32mfunction[39m [36mprocessCatalogEntry[39m
defined [32mfunction[39m [36mmakePTEdition[39m
[36mtextCexString[39m: [32mString[39m = [32m"""


#!ctscatalog
urn#citationScheme#groupName#workTitle#versionLabel#exemplarLabel#online#lang
urn:cts:elijahlab:benTud.itin.english:#geographic narrative / section#Benjamin of Tudela#Itineraries#English translation, plain-text.  Marcus Nathan Adler, The Itinerary of Benjamin of Tudela, Critical Text, Translation and Commentary. London 1907, as made available in Project Gutenberg, https://www.gutenberg.org/files/14981/14981-h/14981-h.htm##true#eng
urn:cts:elijahlab:benTud.itinIntro.english:#head, body#Benjamin of Tudela#Introduction to the Itineraries#English translation, plain-text.  Marcus Nathan Adler, The Itinerary of Benjamin of Tudela, Critical Text, Translation and Commentary. London 1907, as made available in Project Gutenberg, https://www.gutenberg.org/files/14981/14981-h/14981-h.htm##true#eng
urn:cts:elijahlab:benTud.it

# Generate the Index

We need to examine each CitableNode in our XML text-repository, to find `<name ref="#U9" type="place">`, *vel sim.*

In [23]:
val corp: Corpus = tr.corpus

//val xmlText: xml.NodeSeq = xml.XML.loadString(c.text)

val passageIndex: Vector[(Cite2Urn, CtsUrn)] = corp.nodes.flatMap( node => {

    val x: scala.xml.Elem = xml.XML.loadString(node.text)

    // Keep only <name type="place"> elements, and read each place ID
    // from the `ref` attribute, dropping its leading "#".
    val placeIds: Vector[String] = (x \\ "name").toVector
        .filter( el => (el \ "@type").text == "place" )
        .map( el => (el \ "@ref").text.replaceAll("#", "").trim )

    // Pair each place URN with the URN of the passage that mentions it.
    placeIds.map( id => ( collUrn.addSelector(id), node.urn ) )
})

val indexHeader: String = """#!relations"""

val indexCex0: String = passageIndex.map( pi => {
    s"""${pi._1}#urn:cite2:cite:verbs.v1:commentsOn#${pi._2.toString.replaceAll("Xml:",":")}"""
}).mkString("\n")

val indexCex: String = "\n" + indexHeader + "\n" + indexCex0


corp: Corpus = Corpus(
  Vector(
    CitableNode(
      CtsUrn("urn:cts:elijahlab:benTud.itinIntro.englishXml:0"),
      "<head xmlns=\"http://www.tei-c.org/ns/1.0\"> THE ITINERARY OF <persName xml:id=\"recogito-9ea93359-2c2c-4427-a28b-55a60927450d\">BENJAMIN</persName> OF TUDELA. HEBREW INTRODUCTION.</head>"
    ),
    CitableNode(
      CtsUrn("urn:cts:elijahlab:benTud.itinIntro.englishXml:1"),
      "<ab xmlns=\"http://www.tei-c.org/ns/1.0\"> This is the book of travels, which was compiled by <persName xml:id=\"recogito-086c8115-7e43-4f3f-ae4d-549dfc0f5410\">Rabbi Benjamin</persName>, the son of <persName xml:id=\"recogito-b100b332-712d-42ff-95b3-33d9de17339c\">Jonah</persName>, of the land of <placeName xml:id=\"recogito-0bce778c-e196-4694-8130-988d0d725d35\">Navarre</placeName>--his repose be in Paradise. The said <persName xml:id=\"recogito-59cd30b2-103c-42de-89dd-8d
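
A self-contained sketch of the same lookup, under assumptions: it uses the JDK's DOM parser rather than `scala.xml`, represents URNs as strings, and runs on an invented XML fragment with an invented passage URN. `placeRefs` and `relationLine` are hypothetical helpers.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Collect the `ref` values (without the leading "#") of every <name type="place">.
def placeRefs(xmlFragment: String): Vector[String] = {
  val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
  val bytes = xmlFragment.getBytes(StandardCharsets.UTF_8)
  val doc = builder.parse(new ByteArrayInputStream(bytes))
  val names = doc.getElementsByTagName("name")
  (0 until names.getLength).toVector
    .map(i => names.item(i).asInstanceOf[Element])
    .filter(_.getAttribute("type") == "place")
    .map(_.getAttribute("ref").stripPrefix("#"))
}

// One line of a #!relations block: place URN, verb, passage URN.
def relationLine(placeId: String, passageUrn: String): String =
  s"urn:cite2:elijahfurman:botPlaces.2021a:${placeId}#urn:cite2:cite:verbs.v1:commentsOn#${passageUrn}"

// Invented fragment: one place name and one non-place name.
val fragment: String =
  "<ab>From <name ref=\"#K6347\" type=\"place\">Saragossa</name> I went with <name type=\"person\">a guide</name>.</ab>"

val relationLines: Vector[String] =
  placeRefs(fragment).map(id => relationLine(id, "urn:cts:elijahlab:benTud.itin.english:1"))
```

Reading the `type` and `ref` attributes by name keeps the result independent of attribute order and of any other attributes on the element.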

# Write it All Out

In [24]:
// collCex

val deluxeFileName: String = "benjamine_of_tudela.cex"

val deluxeContent: String = collCex + "\n" + textCexString + "\n" + indexCex

saveString(deluxeContent, textCexPath, deluxeFileName)

deluxeFileName: String = "benjamine_of_tudela.cex"
deluxeContent: String = """// 

#!cexversion
3.0

#!citelibrary
name#Demonstration CEX of Benjamin of Tudela’s Itineraries. Plain text editions and Indices.
urn#urn:cite2:fu_elijah:cexCatalogs.2021a:bot_indexed_editions
license#CC Share Alike.  For details, see more info.


#!datamodels
Collection#Model#Label#Description
urn:cite2:cite:verbs.v1:#urn:cite2:cite:datamodels.v1:commentarymodel#Commentary Model#URN comments on URN. See documentation at <https://github.com/cite-architecture/commentary>.
//
// We add an extensions_text datamodel. This points to a collection of extended_text_properties.
//
urn:cite2:fufolio:extended_text_properties.v1:#urn:cite2:cite:datamodels.v1:extensions_text#Extended Text Properies#Extended Text Property. See documentation at <https://github.com/cite-architecture/>.

//
// Collection Inventory:
// Commentaries can be in the form of Collections as well