# Validate & Build Camerarius

## Validation Reports



In [None]:
// Keep a handle on this display so later cells can update its contents in place
val textValidation = almond.display.Markdown("""
## Text Validation

> No report

## Collection Validation

> No report

""")

## Methods for Updating Report

In [None]:
def updateValidation( s1: String = "No report", s2: String = "No report", handle: almond.display.Markdown = textValidation ): Unit = {
    handle.withContent(s"""
## Text Validation

${s1}

## Collection Validation

${s2}

""").update()
}


## Editable Variables

We stick these up top where they are easy to find and change!

In [None]:
// Paths to template files
val dmTemplateCexPath: String = "pre-cex/data-models.cex"
val imageTemplateCexPath: String = "pre-cex/img_coll_template.cex"

// URLs for data in-progress
val figDataUrl: String = "https://docs.google.com/spreadsheets/d/1jslymhAMJLaWMka5gswxHE1MTH8JZRp3B831w6UPfnA/export?format=csv"
val imgTextUrl: String = "https://docs.google.com/spreadsheets/d/11vJuQE7_oPDrIlFYzqoBMRlDuichan3eDXxsXenBP24/export?format=csv"
val textEditionUrl: String = "https://docs.google.com/spreadsheets/d/1xPo3x3bcssHrFTWXTcw08dGe3xXxa9OCOQytn5ICS5Q/export?format=csv"
val imageUrnsUrl: String = "https://docs.google.com/spreadsheets/d/1WX_SZtPyz0f1dX3gAalmoh1dVFffiqyaKJeE4sLQ_pQ/export?format=csv"
val pageUrnsUrl: String = "https://docs.google.com/spreadsheets/d/1q4hoO3565ZhWyXUA1Z3gejTgB7Pn-T0DqP9wTkdjLuM/export?format=csv"

// Stuff for images
val ictUrl: String = "http://www.homermultitext.org/ict2/index.html?urn="
val thumbUrl: String = "http://www.homermultitext.org/iipsrv?OBJ=IIP,1.0&FIF=/project/homer/pyramidal/deepzoom/fufolio/camerarius1668/2020a/IMAGE_ID_HERE.tif&RGN=IMG_ROI_HERE&wID=5000&CVT=JPEG"

// Metadata for the Library
val citeLibName: String = "Camerarius Project, Furman University, 2020"
val citeLibUrnString: String = "urn:cite2:fufolio:camerarius.2020temp:"
val citeLibUrnLicense: String = "https://creativecommons.org/licenses/by-sa/4.0/"

// Metadata for our text
val textUrnString: String = "urn:cts:fufolio:camerarius.se.fu1668:"
val textGroupLabel: String = "Joachim Camerarius the Younger"
val workLabel: String = "Symbola et Emblemata"
val versionLabel: String = "Furman Editions, digital text of the 1668 edition"
val langString: String = "lat"
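The `thumbUrl` template above has two placeholders, `IMAGE_ID_HERE` and `IMG_ROI_HERE`. Here is a minimal sketch of filling them from an image URN string. It assumes three things the notebook does not state: the image ID is the URN's object component (the template's directory layout suggests this), an optional `@x,y,w,h` extension gives the region of interest in IIP's relative coordinates, and `0,0,1,1` means the whole image. The helper `thumbnailFor` and the example URN are hypothetical.

```scala
// Template copied from thumbUrl above
val thumbTemplate: String = "http://www.homermultitext.org/iipsrv?OBJ=IIP,1.0&FIF=/project/homer/pyramidal/deepzoom/fufolio/camerarius1668/2020a/IMAGE_ID_HERE.tif&RGN=IMG_ROI_HERE&wID=5000&CVT=JPEG"

// Hypothetical helper: fill the template for one CITE2 image URN string.
// Assumes the object component is the image ID and "@x,y,w,h" (if present) is the ROI.
def thumbnailFor(template: String, imageUrn: String): String = {
  // The object component is everything after the last ':'
  val objectPart: String = imageUrn.split(":").last
  val pieces: Array[String] = objectPart.split("@", 2)
  val imageId: String = pieces(0)
  // Assumption: with no ROI extension, request the whole image
  val roi: String = if (pieces.length > 1) pieces(1) else "0,0,1,1"
  template.replace("IMAGE_ID_HERE", imageId).replace("IMG_ROI_HERE", roi)
}

// Hypothetical URN, for illustration only
val exampleUrn = "urn:cite2:fufolio:camerarius1668.2020a:page001@0.1,0.2,0.5,0.4"
println(thumbnailFor(thumbTemplate, exampleUrn))
```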



## Configuring Libraries for the Almond Kernel

First, we add a Bintray Maven repository to the repositories the Almond kernel resolves libraries from.

In [None]:
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")

interp.repositories() ++= Seq(myBT)

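Next, we load specific libraries with Almond's `$ivy` import syntax. The CITE libraries (`edu.holycross.shot...`) are published in the repository added above; plotly-scala and scala-csv come from Maven Central: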

In [None]:
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`
import plotly._, plotly.element._, plotly.layout._, plotly.Almond._

import $ivy.`com.github.tototoshi::scala-csv:1.3.6`
import com.github.tototoshi.csv._

// if you want to have the plots available without an internet connection:
init(offline=true)

// restrict the output height to avoid scrolling in output cells
repl.pprinter() = repl.pprinter().copy(defaultHeight = 3)

import $ivy.`edu.holycross.shot::ohco2:10.20.3`
import $ivy.`edu.holycross.shot.cite::xcite:4.3.0`
import $ivy.`edu.holycross.shot::scm:7.4.0`
import $ivy.`edu.holycross.shot::dse:7.1.3`
import $ivy.`edu.holycross.shot::citebinaryimage:3.2.0`
import $ivy.`edu.holycross.shot::citeobj:7.5.1`
import $ivy.`edu.holycross.shot::citerelations:2.7.0`
import $ivy.`edu.holycross.shot::cex:6.5.0`
import $ivy.`edu.holycross.shot::greek:9.0.0`

## Imports

From this point on, your notebook consists of completely generic Scala, with the CITE Libraries available to use.

In [None]:
import almond.display.UpdatableDisplay
import almond.interpreter.api.DisplayData.ContentType
import almond.interpreter.api.{DisplayData, OutputHandler}

import java.io.File
import java.io.PrintWriter

import scala.io.Source

import java.text.SimpleDateFormat
import java.util.Date

// Import some CITE libraries
import edu.holycross.shot.cite._
import edu.holycross.shot.ohco2._
import edu.holycross.shot.scm._
import edu.holycross.shot.citeobj._
import edu.holycross.shot.citerelation._
import edu.holycross.shot.dse._
import edu.holycross.shot.citebinaryimage._
import edu.holycross.shot.greek._




## Useful Functions

Save a string to a named file:

In [None]:
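def saveString(s: String, filePath: String = "temp", fileName: String = "temp.txt"): Unit = {
    // Join directory and file name with a path separator, creating the directory if needed
    val path = java.nio.file.Paths.get(filePath, fileName)
    Option(path.getParent).foreach(dir => java.nio.file.Files.createDirectories(dir))
    val writer = new PrintWriter(path.toFile)
    try writer.write(s) finally writer.close()
}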

Like `.split`, but preserving the character we split on:

In [None]:
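def splitWithSplitter(text: String, puncs: String): Vector[String] = {
    // `puncs` is a regex, typically a character class such as "[.;:]".
    // The zero-width lookbehind splits *after* each match, so the punctuation
    // stays at the end of the preceding piece. To get each splitter as its own
    // element instead, use s"((?<=${puncs})|(?=${puncs}))".
    val regexWithSplitter = s"((?<=${puncs}))"
    text.split(regexWithSplitter).toVector.filter(_.size > 0)
}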
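To make the difference between a lookbehind-only split and the lookbehind-or-lookahead alternative concrete, here is a small standalone comparison. The function names and the sample string are made up for illustration. `puncs` is passed as a regex character class, because a bare `.` would match any character.

```scala
// Split after each match of `puncs`; punctuation stays on the preceding piece
def splitAfter(text: String, puncs: String): Vector[String] =
  text.split(s"(?<=${puncs})").toVector.filter(_.nonEmpty)

// Alternative: split both before and after, so each mark is its own element
def splitAround(text: String, puncs: String): Vector[String] =
  text.split(s"((?<=${puncs})|(?=${puncs}))").toVector.filter(_.nonEmpty)

val sample = "one. two; three"
println(splitAfter(sample, "[.;]"))
println(splitAround(sample, "[.;]"))
```

Here `splitAfter` yields `"one."`, `" two;"`, `" three"`, while `splitAround` yields `"one"`, `"."`, `" two"`, `";"`, `" three"`. Note that the whitespace after each mark is kept on the following piece.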

Pretty Print Things:

In [None]:
def showMe(v: Any): Unit = {
  v match {
    // Vectors, Lists, Sets, Maps, etc.: print one element per line
    case it: Iterable[_] => println(s"\n----\n${it.mkString("\n")}\n----\n")
    // Anything else: print its toString
    case _ => println(s"\n-----\n${v}\n----\n")
  }
}

Load current data from our shared Google Spreadsheets.

The data is in `.csv`, and the fields included change over time, so we have to do this the hard way, with a real CSV library.

In [None]:
def getNParse(url: String): List[Map[String, String]] = {
    // Each row becomes a Map from column header to cell value
    val reader = CSVReader.open(scala.io.Source.fromURL(url))
    try reader.allWithHeaders() finally reader.close()
}
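Because the spreadsheet columns change over time, a lookup such as `dm("Citation")` throws a `NoSuchElementException` if a column is renamed or dropped. Here is a minimal sketch of checking for required headers up front. It assumes rows shaped like the `List[Map[String, String]]` that `getNParse` returns; `missingColumns` and the sample rows are hypothetical.

```scala
// Hypothetical helper: which required columns are absent from any row?
// An empty sheet counts as missing every column.
def missingColumns(rows: List[Map[String, String]], required: Seq[String]): Seq[String] =
  required.filter(col => rows.isEmpty || rows.exists(row => !row.contains(col)))

// Made-up rows in the shape getNParse returns: one Map per row, keyed by header
val sampleRows: List[Map[String, String]] = List(
  Map("Citation" -> "1.1", "Passage" -> "first sample passage"),
  Map("Citation" -> "1.2", "Passage" -> "second sample passage")
)

println(missingColumns(sampleRows, Seq("Citation", "Passage")))  // List()
println(missingColumns(sampleRows, Seq("Citation", "ImageUrn"))) // List(ImageUrn)
```

Checking every row, not just the first, also catches short rows. The CSV library may omit keys for cells missing at the end of a row.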

## Classes for Data

We make some Scala classes for our specific data files. These will eventually be turned into CTS texts or CITE Collections, with **validation** along the way.

In [None]:
case class figureRow( urn: Cite2Urn, figRoi: Cite2Urn, captionRoi: Cite2Urn, text: String, passage: CtsUrn, description: String )

case class textImageRow( text: CtsUrn, image: Cite2Urn)
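One way to turn a CSV row into one of these classes, while keeping track of what went wrong, is to return an `Either`. The sketch below uses a hypothetical stand-in, `TextImagePair`, with plain `String` fields so it runs without the CITE libraries. It also assumes column names `Text` and `Image`, which may not match the real sheet's headers. In the notebook itself, the fields would be `CtsUrn` and `Cite2Urn`.

```scala
// Hypothetical stand-in for textImageRow, with String fields instead of URN types
case class TextImagePair(text: String, image: String)

// Build a typed row from one CSV row, or return a message naming the missing column.
// The column names "Text" and "Image" are assumptions.
def toTextImagePair(row: Map[String, String]): Either[String, TextImagePair] =
  for {
    t <- row.get("Text").toRight(s"missing Text column in ${row}")
    i <- row.get("Image").toRight(s"missing Image column in ${row}")
  } yield TextImagePair(t, i)

// Made-up rows; the second lacks an Image value
val rows: List[Map[String, String]] = List(
  Map("Text" -> "urn:cts:fufolio:camerarius.se.fu1668:1.1", "Image" -> "urn:cite2:fufolio:img.v1:page001"),
  Map("Text" -> "urn:cts:fufolio:camerarius.se.fu1668:1.2")
)

// Separate the rows we could build from the error messages
val results = rows.map(toTextImagePair)
val pairs: List[TextImagePair] = results.collect { case Right(p) => p }
val errors: List[String] = results.collect { case Left(msg) => msg }
errors.foreach(println)
```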

## Load & Validate a Text

We will take this a step at a time, validating wherever possible.

The end result will be a CITE `Corpus` object, which we can incorporate into a full CITE Library of Camerarius Data.

No news is good news. If the following cells run without obvious errors, the data has passed these checks.

In [None]:
/* 

    Here we use com.github.tototoshi.csv to parse the CSV into rows, then keep each
    row's Citation and Passage columns as a Vector[(String, String)]
  
    We could go all the way to a CITE Corpus, but we want to do validation, so we'll take
    it a step at a time.
    
*/


def csvMapToTextTuples( csv: List[Map[String, String]]): Vector[(String, String)] = {
     csv.map( dm => {
        val citationString: String = dm("Citation")
        val passage: String = dm("Passage")
        (citationString, passage)
    }).toVector
}

val csv: List[Map[String, String]] = getNParse(textEditionUrl)

val textTuples: Vector[(String, String)] = csvMapToTextTuples(csv)


In [None]:
/* 

    Here we build a Corpus, and validate along the way.
    Errors will be printed below!
    
*/

var textMessage = ""

// Validate individual URNs
for (tt <- textTuples) {
    val psgText: String = tt._1
    val urnStr: String = textUrnString + psgText
    try {
        val u: CtsUrn = CtsUrn(urnStr)
    } catch {
        case e:Exception => {
            val errorMsg: String = s"""<p style="color: red">Failed to make URN with passage ${psgText}: ${e}</p>"""
            textMessage += "\n\n" ++ errorMsg
            println(s"\nERROR\n\tFailed to make URN with passage ${psgText}\n")
        }
    }
}


// Make a Corpus from the passages whose URNs are valid (failures were reported above)
val nodeVec: Vector[CitableNode] = textTuples.flatMap( tt => {
    val psgText: String = tt._1
    val urnStr: String = textUrnString + psgText
    val passage: String = tt._2
    scala.util.Try(CtsUrn(urnStr)).toOption.map( u => CitableNode( u, passage ) )
})

val camerariusCorpusOption: Option[Corpus] = {
    try {
        val c = Corpus(nodeVec)
        textMessage += """<p style="color: green">Successfully made a Corpus</p>"""
        println(textMessage)
        Some(c)
    } catch {
        case e:Exception => {
            val errorMsg: String = s"""<p style="color: red">Failed to make a Corpus: ${e}</p>"""
            textMessage += errorMsg
            println( s"Failed to make a Corpus: ${e}")
            None
        }
    }
}

updateValidation(s1 = textMessage)

camerariusCorpusOption.map(_.size)

## Generate & Validate the Collections of Images and Pages

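**N.b.** Even though, for the moment, we have a 1:1 correspondence between images of Camerarius pages and the pages themselves, these need to be two collections: a page is a physical part of the book, while an image is a digital surrogate of it, and one page could have several images.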

In [None]:
var collErrorMsg = ""

def imageCsv( csv: List[Map[String, String]], col: String): Vector[String] = {
     csv.map( dm => {
        val citationString: String = dm(col)
         citationString
    }).toVector
}

val csv: List[Map[String, String]] = getNParse(imageUrnsUrl)

val imageUrnStringVec: Vector[String] = imageCsv(csv, "ImageUrn")

val imageUrnVec: Vector[Cite2Urn] = {
    try {
        val v = imageUrnStringVec.map( s => Cite2Urn(s))
        collErrorMsg += s"""<p style="color: green">Made list of ${v.size} image URNs</p>"""
        v
    } catch {
        case e: Exception => {
            collErrorMsg += s"""<p style="color: red">Error making list of Image URNs: ${e}</p>"""
            println(e)
            Vector()
        }
    }
}

// Load and validate the page URNs from pageUrnsUrl

def pageCsv( csv: List[Map[String, String]]): Vector[String] = {
     csv.map( dm => {
        val citationString: String = dm("PageUrns")
         citationString
    }).toVector
}

val pagecsv: List[Map[String, String]] = getNParse(pageUrnsUrl)

val pageUrnStringVec: Vector[String] = imageCsv(pagecsv, "PageUrns")

val pageUrnVec: Vector[Cite2Urn] = {
    try {
        val v = pageUrnStringVec.map( s => Cite2Urn(s))
        collErrorMsg += s"""<p style="color: green">Made list of ${v.size} page URNs</p>"""
        v
    } catch {
        case e: Exception => {
            collErrorMsg += s"""<p style="color: red">Error making list of Page URNs: ${e}</p>"""
            println(e)
            Vector()
        }
    }
}
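Both blocks above call `map(s => Cite2Urn(s))` inside a single `try`. The report therefore shows only the first failure, and the whole list comes back empty. Here is a sketch of a per-string alternative with `scala.util.Try` that keeps every valid URN and reports every failure. `cite2UrnSketch` is a hypothetical stand-in for the `Cite2Urn` constructor that only checks the rough `urn:cite2:namespace:collection:object` shape, and `parseAll` is likewise hypothetical.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for the Cite2Urn constructor: checks only the rough shape
// urn:cite2:NAMESPACE:COLLECTION:OBJECT. The real constructor validates much more.
def cite2UrnSketch(s: String): String = {
  require(s.startsWith("urn:cite2:") && s.split(":", -1).length == 5, s"Invalid CITE2 URN: ${s}")
  s
}

// Hypothetical helper: parse every string, keeping successes and one message per failure
def parseAll[A](strs: Vector[String], parse: String => A): (Vector[A], Vector[String]) = {
  val results: Vector[(String, Try[A])] = strs.map(s => (s, Try(parse(s))))
  val good: Vector[A] = results.collect { case (_, Success(a)) => a }
  val bad: Vector[String] = results.collect { case (s, Failure(e)) => s"${s}: ${e.getMessage}" }
  (good, bad)
}

// Made-up input with one bad string in the middle
val (urns, problems) = parseAll(
  Vector("urn:cite2:fufolio:img.v1:page001", "not a urn", "urn:cite2:fufolio:img.v1:page002"),
  s => cite2UrnSketch(s)
)
problems.foreach(p => println(s"ERROR: ${p}"))
```

Whether to keep the partial list or discard it when any string fails is a judgment call. The cells above discard it, which makes a failure hard to miss downstream.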



val imageAndPageLib: CiteLibrary = {
    CiteLibrary(Source.fromFile(imageTemplateCexPath).getLines.mkString("\n"))
}

updateValidation(s1 = textMessage, s2 = collErrorMsg )

/* Make Collections out of imageUrnVec and pageUrnVec */

// Load the CEX template for the image and page collections; data rows are appended below
val imgCollTemplate: String = Source.fromFile(imageTemplateCexPath).getLines.mkString("\n")

val pageObjectVec: Vector[String] = pageUrnVec.zipWithIndex.map( si => {
    val s = si._1
    val i = si._2
    val urnStr: String = s.toString
    val pageNum: String = s.objectComponent
    val label: String = s"The 1668 edition of Camerarius, page ${pageNum}"
    val seq: String = s"${i + 1}"
    s"${urnStr}#${label}#${seq}"
})

val imgObjectVec: Vector[String] = imageUrnVec.map( s => {
    val urnStr: String = s.toString
    val imgNum: String = s.objectComponent
    val label: String = s"The 1668 edition of Camerarius, image ${imgNum}"
    val rights: String = "Public Domain"
    s"${urnStr}#${label}#${rights}"
})

val imgDataVec: Vector[String] = "#!citedata\nurn#caption#rights" +: imgObjectVec

val imgDataStr = imgDataVec.mkString("\n")

val pageDataVec: Vector[String] = "#!citedata\nurn#label#sequence" +: pageObjectVec

val pageDataStr = pageDataVec.mkString("\n")

val imageCollectionCexString = imgCollTemplate + "\n\n" + imgDataStr + "\n\n" + pageDataStr

val imageCollectionLibrary: Option[CiteLibrary] = {
    try {
        val v = CiteLibrary(imageCollectionCexString)
        collErrorMsg += s"""<p style="color: green">Successfully generated collections of pages and images.</p>"""
        Some(v)
    } catch {
        case e: Exception => {
            collErrorMsg += s"""<p style="color: red">Error making collections of pages and images: ${e}</p>"""
            println(e)
            None
        }
    }
}

updateValidation(s1 = textMessage, s2 = collErrorMsg )
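Each data line above is its fields joined with `#`. A label or caption that itself contains `#` would therefore produce a row with too many columns. Here is a minimal sketch of a helper that builds a `#!citedata` block and refuses rows with an embedded delimiter or the wrong number of fields. `citeDataBlock` is hypothetical, and the example rows are made up; sequence numbers start at 1, as in `pageObjectVec`.

```scala
import scala.util.Try

// Hypothetical helper: render a #!citedata block, checking each row first
def citeDataBlock(header: Seq[String], rows: Seq[Seq[String]], delim: String = "#"): String = {
  val bad = rows.filter(r => r.size != header.size || r.exists(_.contains(delim)))
  require(bad.isEmpty, s"Rows with the wrong field count or an embedded '${delim}': ${bad.mkString("; ")}")
  ("#!citedata" +: (header +: rows).map(_.mkString(delim))).mkString("\n")
}

// Made-up page rows, numbered from 1 with zipWithIndex
val pages: Vector[Seq[String]] = Vector("p001", "p002").zipWithIndex.map { case (id, i) =>
  Seq(s"urn:cite2:fufolio:pages.v1:${id}", s"Page ${id}", s"${i + 1}")
}
val block = citeDataBlock(Seq("urn", "label", "sequence"), pages)
println(block)

// A label containing '#' is rejected instead of producing a four-column row
val rejected = Try(citeDataBlock(Seq("urn", "label", "sequence"), Seq(Seq("urn:cite2:x:y.v1:z", "Page #3", "3"))))
println(rejected.isFailure)
```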


## Generate a Library of DataModels

In [None]:

val datamodelLib: Option[CiteLibrary] = {
    try {
        val cl = CiteLibrary(Source.fromFile(dmTemplateCexPath).getLines.mkString("\n"))
        collErrorMsg += s"""<p style="color: green">Made Data Model Collection</p>"""
        Some(cl)
    } catch {
        case e: Exception => {
          collErrorMsg += s"""<p style="color: red">Error making Data Model Collection: ${e}</p>"""
          None
        }
    }
}

updateValidation(s1 = textMessage, s2 = collErrorMsg )
