# Transforming a Text File to CEX



## Configuring CITE libraries for almond kernel

First, we'll make a Bintray repository that hosts the CITE libraries available to your almond kernel.

In [None]:
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

Next, we bring in specific libraries from the new repository using almond's `$ivy` magic:

In [None]:
import $ivy.`edu.holycross.shot::ohco2:10.16.0`
import $ivy.`edu.holycross.shot.cite::xcite:4.1.1`
import $ivy.`edu.holycross.shot::scm:7.2.0`
import $ivy.`edu.holycross.shot::dse:5.2.2`
import $ivy.`edu.holycross.shot::citebinaryimage:3.1.1`
import $ivy.`edu.holycross.shot::citeobj:7.3.4`
import $ivy.`edu.holycross.shot::citerelations:2.5.2`
import $ivy.`edu.holycross.shot::cex:6.3.3`


## Imports

From this point on, your notebook consists of completely generic Scala, with the CITE Libraries available to use.

In [None]:
// Import some CITE libraries
import edu.holycross.shot.cite._
import edu.holycross.shot.ohco2._
import edu.holycross.shot.scm._
import edu.holycross.shot.citeobj._
import edu.holycross.shot.citerelation._
import edu.holycross.shot.dse._
import edu.holycross.shot.citebinaryimage._

import almond.display.UpdatableDisplay
import almond.interpreter.api.DisplayData.ContentType
import almond.interpreter.api.{DisplayData, OutputHandler}

import java.io.File
import java.io.PrintWriter

import scala.io.Source


## Useful Functions

Save a string to a named file:

In [None]:
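def saveString(s: String, filePath: String = "", fileName: String = "temp.txt"): Unit = {
    // filePath is prepended verbatim, so a non-empty path needs its trailing separator
    val writer = new PrintWriter(new File(s"${filePath}${fileName}"))
    // Close the writer even if the write fails
    try writer.write(s) finally writer.close()
}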
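A quick round trip can confirm that a helper like this writes what you expect. The following is a minimal, standalone sketch: `demoSave`, the temporary file, and the sample text are hypothetical and not part of the notebook, and the helper takes a single full path rather than a separate directory and file name.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

// Hypothetical stand-in for saveString that takes one full path
def demoSave(s: String, path: String): Unit = {
  val writer = new PrintWriter(new File(path))
  try writer.write(s) finally writer.close()
}

// Write to a temporary file, then read the contents back
val demoTmp: File = File.createTempFile("cex-demo", ".txt")
demoSave("line one\nline two", demoTmp.getPath)

val demoSource = Source.fromFile(demoTmp)
val demoReadBack: String = try demoSource.mkString finally demoSource.close()
demoTmp.delete()
```

Closing both the writer and the source in `finally` blocks keeps file handles from leaking if a write or read fails partway through.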

Convert a Roman Numeral to an Integer:

In [None]:
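def fromRoman(s: String): Int = {
    try {
        val numerals = Map('I' -> 1, 'V' -> 5, 'X' -> 10, 'L' -> 50, 'C' -> 100, 'D' -> 500, 'M' -> 1000)

        // Add each value in turn; when a smaller numeral precedes a larger one (e.g. IV),
        // subtract the smaller value twice to undo its earlier addition.
        s.toUpperCase.map(numerals).foldLeft((0, 0)) {
            case ((sum, last), curr) => (sum + curr + (if (last < curr) -2 * last else 0), curr)
        }._1
    } catch {
        // An unknown character makes the Map lookup throw; report it with the offending input
        case e: Exception => throw new Exception(s""" "${s}" is not a valid Roman Numeral.""")
    }
}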
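To see how the fold handles subtractive pairs such as `IV` or `CM`, here is a standalone sketch of the same technique. The name `romanToIntOpt` is hypothetical; unlike `fromRoman`, it returns an `Option` instead of throwing, which is a design variation and not the notebook's own behavior.

```scala
// Hypothetical variant of the same fold, returning None for invalid input
def romanToIntOpt(s: String): Option[Int] = {
  val numerals = Map('I' -> 1, 'V' -> 5, 'X' -> 10, 'L' -> 50, 'C' -> 100, 'D' -> 500, 'M' -> 1000)
  val values: Vector[Option[Int]] = s.toUpperCase.toVector.map(c => numerals.get(c))
  if (values.isEmpty || values.exists(_.isEmpty)) None
  else Some(
    values.flatten.foldLeft((0, 0)) {
      // If the previous value was smaller, it was already added once,
      // so subtract it twice to get the effect of "curr - last".
      case ((sum, last), curr) => (sum + curr + (if (last < curr) -2 * last else 0), curr)
    }._1
  )
}

val demoFourteen = romanToIntOpt("XIV")     // X + I, then V corrects for the I
val demoYear     = romanToIntOpt("MCMXCIV") // several subtractive pairs
val demoInvalid  = romanToIntOpt("ABC")     // 'A' and 'B' are not numerals
```

Note that neither version checks that the numeral is well formed: a string such as `IIII` is accepted and summed like any other.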

## Load a Template File

Load it into a Vector[String], filtering out any empty lines:

In [None]:
val filePath = "torq.txt"
val lines: Vector[String] = {
    // Read the whole file, split it on newlines, and drop empty lines
    scala.io.Source.fromFile(filePath).mkString.split("\n").toVector.filter( _.size > 0 )
}

### We need to capture citation values for Chapters, Paragraphs, and Sentences.

Let's attach an index-number to every line. This will be broadly useful. Calling `zipWithIndex` gives a Vector of tuples, `(String, Int)`. Since tuple fields like `_1` and `_2` are hard to read, we can create a case class called `IndexedLine`, and map to a `Vector[IndexedLine]`:

In [None]:
case class IndexedLine( index: Int, text: String)
val indexedLines: Vector[IndexedLine] = lines.zipWithIndex.map( l => {
    IndexedLine( l._2, l._1 )
})
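The same mapping can also be written by destructuring the tuple with a `case` pattern, which avoids `_1` and `_2`. This is a small standalone sketch; `DemoIndexedLine` and the sample lines are hypothetical and only illustrate the shape of the result.

```scala
// Hypothetical stand-in for IndexedLine, so this cell runs on its own
case class DemoIndexedLine(index: Int, text: String)

// Invented sample lines, not taken from the source text
val demoLines: Vector[String] = Vector("CAPÍTULO I. Uno", "Primera línea.", "Segunda línea.")

// zipWithIndex pairs each line with its zero-based position
val demoIndexed: Vector[DemoIndexedLine] = demoLines.zipWithIndex.map {
  case (text, index) => DemoIndexedLine(index, text)
}
```

Because the indices are zero-based, the first line of the file has index 0, and the last has index `size - 1`.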

Let's get a separate vector of *just* chapter-headings, but keeping their index-numbers from their context in the whole text:

In [None]:
val chapterHeadingStrings: Vector[IndexedLine] = indexedLines.filter( l => {
    l.text.startsWith("CAPÍTULO")
})

Let's define a ChapterHead class, containing the important data we'll need:

In [None]:
case class ChapterHead(index: Int, label: String, head: String)

And we map `chapterHeadingStrings` to this new class, by defining a Regular Expression, then applying it to each line in `chapterHeadingStrings`.

In [None]:
// Group 1: the Roman numeral (any numeral fromRoman understands); group 2: the heading text
val pattern = """CAPÍTULO ([IVXLCDM]+)\. (.+)""".r

In [None]:
val chapterHeads: Vector[ChapterHead] = chapterHeadingStrings.map( chs => {
    // Destructure with the regex; this throws a MatchError if a heading does not fit the pattern
    val pattern(tempLabel, tempText) = chs.text

    val label:String = fromRoman(tempLabel).toString
    val text = tempText
    val index = chs.index
    ChapterHead( index, label, text)
})
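Destructuring with `val pattern(...) = ...` fails with a `MatchError` when a line does not fit the expression. If your text might contain irregular headings, one alternative is to match inside a `match` block and return an `Option`. The sketch below is standalone and hedged: `demoPattern`, `parseHeading`, and the sample headings are hypothetical, and the character class is an assumption about which numerals may occur.

```scala
// Hypothetical pattern; adjust the character class to the numerals your text uses
val demoPattern = """CAPÍTULO ([IVXLCDM]+)\. (.+)""".r

// Returns the (numeral, title) pair, or None when the line is not a well-formed heading
def parseHeading(line: String): Option[(String, String)] = line match {
  case demoPattern(numeral, title) => Some((numeral, title))
  case _ => None
}

val demoGood = parseHeading("CAPÍTULO IV. Título de ejemplo")
val demoBad  = parseHeading("CAPÍTULO sin número")
```

With this shape, `flatMap` over the heading lines would silently skip malformed headings, so it may be worth logging the `None` cases to catch typos in the source text.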


Now we map each ChapterHead to a Vector of IndexedLines that go with that chapter. A chapter's content runs from the line after its heading up to, but not including, the next chapter heading; for the last chapter, it runs to the end of the text.

In [None]:
val chapterMap: Vector[(ChapterHead, Vector[IndexedLine])] = chapterHeads.map( ch => {
    val startIndex: Int = ch.index
    val endIndex: Int = {
       val nextChapter: Vector[ChapterHead] = chapterHeads.filter( _.index > ch.index)
       if ( nextChapter.size == 0 ){ 
           indexedLines.last.index + 1 // end of the whole list
       } else {
           nextChapter.head.index
       }
    }
    val contentLines = indexedLines.filter( il => {
        ( il.index > startIndex ) && ( il.index < endIndex )
    })
    (ch, contentLines )
})
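The boundary logic is easier to verify on a tiny example. The sketch below is standalone and uses invented data; `DemoLine` and the other `demo` names are hypothetical. Instead of filtering for the next heading inside the loop, it pairs each heading index with the following one, which is an alternative formulation of the same idea.

```scala
// Hypothetical stand-in for IndexedLine
case class DemoLine(index: Int, text: String)

// Invented five-line text with two chapter headings
val demoText: Vector[DemoLine] =
  Vector("CAPÍTULO I. Uno", "a", "b", "CAPÍTULO II. Dos", "c")
    .zipWithIndex.map { case (t, i) => DemoLine(i, t) }

// Indices where chapters start
val demoStarts: Vector[Int] = demoText.filter(_.text.startsWith("CAPÍTULO")).map(_.index)

// Each chapter ends where the next one starts; the last ends one past the final line
val demoEnds: Vector[Int] = demoStarts.drop(1) :+ demoText.size

// Keep the lines strictly between a heading and the following boundary
val demoChapters: Vector[Vector[String]] = demoStarts.zip(demoEnds).map {
  case (start, end) => demoText.filter(l => l.index > start && l.index < end).map(_.text)
}
```

Here the first chapter owns lines 1 and 2, and the second owns line 4; the heading lines themselves are excluded from the content.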

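We can now build up the lines of a CEX file: one line for each chapter heading and one for each section, each pairing a URN with its text, separated by `#`.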

In [None]:
val urnBase = CtsUrn("urn:cts:greekLit:torquemada.001.offner:")

In [None]:
val sectionCexVec: Vector[String] = chapterMap.map( cm => {
    val chapt: String = cm._1.label
    val headStr: String = cm._1.head
    val firstLine: String = s"${urnBase}${chapt}.head#${headStr}"
    val sections: Vector[String] = cm._2.zipWithIndex.map( ll => {
        val secNum: String = (ll._2 + 1).toString
        val text: String = ll._1.text
        s"${urnBase}${chapt}.${secNum}#${text}"
    })
    firstLine +: sections
}).flatten
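To make the shape of the output concrete, here is a standalone sketch that builds the lines for one invented chapter. The URN `urn:cts:demo:group.work.version:` and the helper `demoCexLines` are hypothetical, and the sketch uses a plain `String` in place of a `CtsUrn` so that it runs without the CITE libraries.

```scala
// Hypothetical URN prefix, held as a plain String for this demonstration
val demoUrnBase: String = "urn:cts:demo:group.work.version:"

// One heading line, then one numbered line per section (numbering starts at 1)
def demoCexLines(chapter: String, head: String, body: Vector[String]): Vector[String] = {
  val headLine = s"${demoUrnBase}${chapter}.head#${head}"
  val bodyLines = body.zipWithIndex.map {
    case (text, i) => s"${demoUrnBase}${chapter}.${i + 1}#${text}"
  }
  headLine +: bodyLines
}

val demoOut: Vector[String] = demoCexLines("1", "Uno", Vector("a", "b"))
demoOut.foreach(println)
```

One caveat worth checking in your own data: because `#` separates the URN from the text, a `#` occurring inside a line's text would likely be read as an extra column when the file is parsed.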

## Make Final CEX File

In [None]:
val cexHeaderPath = "cex_header.txt"
val headerLines: Vector[String] = {
    scala.io.Source.fromFile(cexHeaderPath).mkString.split("\n").toVector.filter( _.size > 0 )
}
val cexHeader: String = "\n" + headerLines.mkString("\n") + "\n#!ctsdata\n"

Save it!

In [None]:
val finalCex: String = cexHeader + sectionCexVec.mkString("\n")
val fileName: String = "torq.cex"
saveString( finalCex, "", fileName)