Merge master and reconcile differences #756

Open
wants to merge 106 commits into
base: balaur
106 commits
6b912a9
Remove Tuple class use
kwalcock Feb 15, 2023
cdf5055
Make sure webapp testing works, especially in IntelliJ
kwalcock Feb 16, 2023
f609a9a
Merge pull request #712 from clulab/kwalcock/webappTest
kwalcock Feb 16, 2023
143e826
Add mostSimilarWords, documentation, and example
kwalcock Feb 16, 2023
f508413
Merge pull request #713 from clulab/kwalcock/tuples
kwalcock Feb 16, 2023
5e9d079
Make other compiler versions happy
kwalcock Feb 16, 2023
7955951
Make them happier
kwalcock Feb 16, 2023
77ba907
Fix variable name
kwalcock Feb 17, 2023
efabf41
Merge pull request #714 from clulab/kwalcock/wordEmbeddings
kwalcock Feb 17, 2023
3412251
Change from AutoCloser to Using
kwalcock Feb 17, 2023
436444d
Address failing test
kwalcock Feb 18, 2023
f4d76c1
Fix that TestSerializer, not the other one
kwalcock Feb 18, 2023
c1b9303
Incorporate Using throughout
kwalcock Feb 18, 2023
dfc4dc7
Streamline String via PrintWriter
kwalcock Feb 20, 2023
dec770f
Let PrintWriter manage its own files
kwalcock Feb 20, 2023
1badca9
Copy vs. assimilate
kwalcock Feb 21, 2023
a22116b
Add Veil
kwalcock Feb 21, 2023
f77a2ff
Try out annotation
kwalcock Feb 22, 2023
16e4cc5
Make unknowns blank again
kwalcock Feb 22, 2023
aa7e89b
Make webap display nicer
kwalcock Feb 22, 2023
02a2846
Clean up veil code
kwalcock Feb 22, 2023
38c6a48
Add comments
kwalcock Feb 22, 2023
6a52621
Add style to webapp
kwalcock Feb 22, 2023
c59a027
Rename variable
kwalcock Feb 22, 2023
ecaf7d5
Remove test code from webapp
kwalcock Feb 22, 2023
6eeca89
Remove more test code
kwalcock Feb 22, 2023
5c83dbd
Change some methods from private to protected
kwalcock Feb 22, 2023
1f48397
Add CustomRuleReader
kwalcock Feb 22, 2023
1d51057
Replace one protected
kwalcock Feb 22, 2023
1d22a77
Fix cross-compilation
kwalcock Feb 23, 2023
0160148
Merge pull request #715 from clulab/kwalcock/using
kwalcock Feb 28, 2023
f14ada8
Merge pull request #717 from clulab/kwalcock/dynamicRules
kwalcock Feb 28, 2023
48dc397
Document serialization
kwalcock Mar 1, 2023
b92bce6
Sort the roots
kwalcock Mar 2, 2023
d36c1ab
Test hash code
kwalcock Mar 9, 2023
eaa270d
Modify hash code
kwalcock Mar 9, 2023
5b8e19b
Fix test directory for SBT
kwalcock Mar 10, 2023
b6d62f0
Simplify HashTest
kwalcock Mar 10, 2023
5a9395f
Test simple string hashCode
kwalcock Mar 10, 2023
5d65563
Separate TestHash for different Scala versions
kwalcock Mar 10, 2023
e70f964
added plurals for pounds and lbs
maxaalexeeva Mar 20, 2023
f5f92f2
First version
kwalcock Mar 20, 2023
030dc9c
Second version, decising to go with Opt
kwalcock Mar 20, 2023
3213991
Add more tests
kwalcock Mar 20, 2023
945285d
Merge branch 'kwalcock/distanceToRoot' into kwalcock/distToRoot
kwalcock Mar 20, 2023
b6e8712
Fix a test
kwalcock Mar 21, 2023
9a1fee2
Merge pull request #726 from clulab/masha-pounds
kwalcock Mar 24, 2023
c118028
Merge pull request #719 from clulab/kwalcock/documentSerialization
kwalcock Mar 24, 2023
ee29a97
Bump nokogiri from 1.13.10 to 1.14.3 in /docs
dependabot[bot] Apr 12, 2023
26e6b47
Make it run without the Closer
kwalcock Jul 11, 2023
03fb091
Remove Using._
kwalcock Jul 11, 2023
002b52d
Update Scala versions
kwalcock Jul 11, 2023
4f01195
Comment on version number
kwalcock Jul 11, 2023
a661d5d
Merge branch 'master' into kwalcock/veil
kwalcock Jul 13, 2023
fb8b091
Remove autoclose, add comments
kwalcock Jul 13, 2023
88ec35e
Make webap display nicer
kwalcock Feb 22, 2023
1c4b627
Add style to webapp
kwalcock Feb 22, 2023
dda0a8f
Merge pull request #733 from clulab/kwalcock/webapp
kwalcock Jul 13, 2023
3f2ca9a
Show document type
kwalcock Jul 13, 2023
77159ff
Merge pull request #716 from clulab/kwalcock/veil
kwalcock Jul 13, 2023
5324c05
Merge pull request #732 from clulab/kwalcock/closeSource
kwalcock Jul 13, 2023
c52dd58
Merge pull request #723 from clulab/kwalcock/rehash
kwalcock Jul 13, 2023
86173d5
Merge branch 'dependabot/bundler/docs/nokogiri-1.14.3' into kwalcock/…
kwalcock Jul 14, 2023
8fd3879
Merge pull request #734 from clulab/kwalcock/dependabot
kwalcock Jul 14, 2023
feb8557
Merge pull request #727 from clulab/kwalcock/distToRoot
kwalcock Jul 14, 2023
71d6edf
1) added normalizer for imprecise dates (e.g., 'first week of April')…
alicekwak Jul 31, 2023
627eda9
fall/spring filter added
alicekwak Aug 7, 2023
18cf67e
normalizing 'last week/last two weeks of month' patterns
alicekwak Aug 8, 2023
b8f2694
1) revised and moved new tests into TestNumericEntityRecognition 2) R…
alicekwak Aug 20, 2023
eb5539f
Fix test in dependency utils
kwalcock Aug 21, 2023
6695055
Fix mention test
kwalcock Aug 21, 2023
2582a4a
Smooth out NumericActions
kwalcock Aug 21, 2023
beb9e14
Change variable name
kwalcock Aug 21, 2023
a6f3fd0
added ft. as measurement
MihaiSurdeanu Aug 22, 2023
be19dde
Fix test in dependency utils
kwalcock Aug 21, 2023
cfd82b4
Fix mention test
kwalcock Aug 21, 2023
2a87153
Merge pull request #740 from clulab/kwalcock/ft
kwalcock Aug 23, 2023
0149e78
Merge pull request #738 from clulab/kwalcock/date-revision
kwalcock Aug 23, 2023
7073b6f
Merge pull request #739 from clulab/ft
kwalcock Aug 23, 2023
9152996
cleaned up unwanted lines
alicekwak Aug 29, 2023
cc932cb
Merge branch 'master' into alice-date-revision
alicekwak Aug 29, 2023
f723a33
Forcing an empty commit.
alicekwak Aug 29, 2023
007d233
Merge remote-tracking branch 'origin/alice-date-revision' into alice-…
alicekwak Aug 29, 2023
d2e5e4a
Merge pull request #736 from clulab/alice-date-revision
kwalcock Sep 1, 2023
7172f71
added "season in year" unit test
MihaiSurdeanu Sep 12, 2023
d28eed3
started debugging
MihaiSurdeanu Sep 13, 2023
7abaed9
added WeakPossibleYear
MihaiSurdeanu Sep 14, 2023
548e632
"fall" filter bug fix
MihaiSurdeanu Sep 14, 2023
8bf0cb8
Clean up after my own complaints
Sep 15, 2023
667b4d0
Merge pull request #753 from clulab/kwalcock/fall
kwalcock Sep 17, 2023
1ca4472
Merge pull request #752 from clulab/fall
kwalcock Sep 19, 2023
b57541b
Merge branch 'master' into kwalcock/balaur
kwalcock Sep 19, 2023
c2588ca
Fix immediate merge problems
kwalcock Sep 20, 2023
3fcf8b8
Get old CluProcessor
kwalcock Sep 20, 2023
1909e3f
Fix Scala 3 problems with toArray
kwalcock Sep 20, 2023
5fa534a
Use type instead of tuple
kwalcock Sep 20, 2023
a19104e
Fix Scala 2.11 syntax
kwalcock Sep 20, 2023
abd5c8a
added hybrid deps
MihaiSurdeanu Sep 20, 2023
89ba0bb
Get old CluProcessor for other Scala versions
kwalcock Sep 20, 2023
a4898d1
Increase a timeout
kwalcock Sep 21, 2023
da2bfd1
Put CluProcessor into a package object
kwalcock Sep 21, 2023
d8fac3d
Remove CluProcessor from webapp
kwalcock Sep 21, 2023
ff07326
Otherwise remove CluProcessor from code
kwalcock Sep 21, 2023
291a080
Streamline setLabelsAndNorms
kwalcock Sep 21, 2023
145613a
Temporarily add timer to test
kwalcock Sep 21, 2023
8cf219d
Skip time check
kwalcock Sep 21, 2023
15 changes: 9 additions & 6 deletions build.sbt
@@ -1,16 +1,19 @@
 val scala211 = "2.11.12" // up to 2.11.12
-val scala212 = "2.12.17" // up to 2.12.18
-val scala213 = "2.13.10" // up to 2.13.11
+val scala212 = "2.12.18" // up to 2.12.18
+val scala213 = "2.13.12" // up to 2.13.12
 val scala30 = "3.0.2" // up to 3.0.2
 val scala31 = "3.1.3" // up to 3.1.3
-val scala32 = "3.2.1" // up to 3.2.2
-val scala33 = "3.3.0" // up to 3.3.0
+val scala32 = "3.2.2" // up to 3.2.2
+val scala33 = "3.3.1" // up to 3.3.1

+val scala3 = scala31
+
 // See https://www.scala-lang.org/blog/2022/08/17/long-term-compatibility-plans.html.
 // Scala30: "If you are maintaining a library, you should drop Scala 3.0." Dropped.
-// Scala31: This is the current LTS (long term support) version and default Scala 3 release.
+// Scala31: This is a LTS (long term support) version before it was called that.
 // Scala32: This is for experimentation, as in Scala Next, and not for release.
-ThisBuild / crossScalaVersions := Seq(scala212, scala211, scala213, scala31)
+// Scala33: This is the first official LTS, but hold off until necessary.
+ThisBuild / crossScalaVersions := Seq(scala212, scala211, scala213, scala3)
 ThisBuild / scalaVersion := crossScalaVersions.value.head

 lazy val root = (project in file("."))
@@ -1,13 +1,15 @@
 package org.clulab.processors

-import java.io.{File, FileFilter, PrintWriter}
 import org.clulab.processors.clu.BalaurProcessor
 import org.clulab.processors.fastnlp.FastNLPProcessor
+import org.clulab.struct.GraphMap
 import org.clulab.utils.{FileUtils, Sourcer, StringUtils}
 import org.slf4j.{Logger, LoggerFactory}
+
+import java.io.{File, FileFilter, PrintWriter}
+import scala.util.Using
+
 import TextLabelToCoNLLU._
-import org.clulab.struct.GraphMap
-import org.clulab.utils.Closer.AutoCloser

 /**
 * Processes raw text and saves the output in the CoNLL-U format
@@ -24,9 +26,9 @@ class TextLabelToCoNLLU(val proc:Processor, val isCoreNLP:Boolean) {
 try {
 val doc = parseFile(f)
 val ofn = s"$outDir/${f.getName.substring(0, f.getName.length - 4)}.conllu"
-val pw = new PrintWriter(ofn)
-toCoNLLU(doc, pw)
-pw.close()
+Using.resource(new PrintWriter(ofn)) { pw =>
+toCoNLLU(doc, pw)
+}
 } catch {
 case e:Exception => {
 logger.error(s"Parsing of file $f failed with error:")
@@ -77,7 +79,7 @@ class TextLabelToCoNLLU(val proc:Processor, val isCoreNLP:Boolean) {

 def parseFile(f:File):Document = {
 def option1(): Document = {
-val tokens = Sourcer.sourceFromFile(f).autoClose { source =>
+val tokens = Using.resource(Sourcer.sourceFromFile(f)) { source =>
 for (line <- source.getLines())
 yield line.split(' ').toSeq
 }.toSeq
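
The hunks above replace manual `close()` calls and the project's `autoClose` helper with `scala.util.Using.resource`. A minimal sketch of that pattern follows, assuming Scala 2.13 or later (or an equivalent compatibility shim); `UsingResourceSketch`, `writeLines`, and `readTokens` are hypothetical names and are not part of this PR. Note that a `for`/`yield` over `getLines()` produces a lazy iterator, so this sketch materializes the result inside the block, before the source is closed.

```scala
import java.io.{File, PrintWriter}
import java.nio.file.Files
import scala.io.Source
import scala.util.Using

// Hypothetical example object; not taken from the PR.
object UsingResourceSketch {

  // Writes one line per element. The PrintWriter is closed when the block
  // exits, whether it completes normally or throws.
  def writeLines(file: File, lines: Seq[String]): Unit =
    Using.resource(new PrintWriter(file)) { printWriter =>
      lines.foreach(line => printWriter.println(line))
    }

  // Reads space-separated tokens per line. toList forces the lazy iterator
  // while the Source is still open.
  def readTokens(file: File): Seq[Seq[String]] =
    Using.resource(Source.fromFile(file)) { source =>
      source.getLines().map(_.split(' ').toSeq).toList
    }

  def main(args: Array[String]): Unit = {
    val file = Files.createTempFile("using-sketch", ".txt").toFile
    try {
      writeLines(file, Seq("John Smith", "went to China"))
      println(readTokens(file))
    }
    finally file.delete()
  }
}
```

`Using.resource` rethrows any exception from the body after closing the resource, which is why the surrounding `try`/`catch` in the diff can stay as it is.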
23 changes: 13 additions & 10 deletions corenlp/src/main/scala/org/clulab/processors/TextToCoNLLU.scala
@@ -1,13 +1,16 @@
package org.clulab.processors

import java.io.{File, FileFilter, PrintWriter}

import org.clulab.processors.clu.BalaurProcessor
import org.clulab.processors.fastnlp.FastNLPProcessor
import org.clulab.struct.GraphMap
import org.clulab.utils.StringUtils
import org.slf4j.{Logger, LoggerFactory}

import java.io.{File, FileFilter, PrintWriter}
import scala.util.Using

import TextToCoNLLU._
import org.clulab.struct.GraphMap

/**
* Processes raw text and saves the output in the CoNLL-U format
@@ -24,9 +27,9 @@ class TextToCoNLLU(val proc:Processor, val isCoreNLP:Boolean) {
 try {
 val doc = parseFile(f)
 val ofn = s"$outDir/${f.getName.substring(0, f.getName.length - 4)}.conllu"
-val pw = new PrintWriter(ofn)
-toCoNLLU(doc, pw)
-pw.close()
+Using.resource(new PrintWriter(ofn)) { pw =>
+toCoNLLU(doc, pw)
+}
 } catch {
 case e:Exception => {
 logger.error(s"Parsing of file $f failed with error:")
@@ -65,13 +68,13 @@ class TextToCoNLLU(val proc:Processor, val isCoreNLP:Boolean) {
 }

 def parseFile(f:File):Document = {
-val s = scala.io.Source.fromFile(f)
 val buffer = new StringBuilder
-for(line <- s.getLines()) {
-buffer.append(line)
-buffer.append("\n")
+Using.resource(scala.io.Source.fromFile(f)) { s =>
+for (line <- s.getLines()) {
+buffer.append(line)
+buffer.append("\n")
+}
 }
-s.close()

 val doc = proc.mkDocument(buffer.toString())
 annotate(doc)
@@ -14,14 +14,14 @@ class CoreNLPDocument(sentences: Array[Sentence]) extends Document(sentences) {

 var annotation:Option[Annotation] = None

-def copy(document: CoreNLPDocument): CoreNLPDocument = {
-super.copy(document)
+def assimilate(document: CoreNLPDocument, textOpt: Option[String]): CoreNLPDocument = {
+super.assimilate(document, textOpt)
 annotation = document.annotation
 this
 }

-override def copy(sentences: Array[Sentence] = sentences): CoreNLPDocument =
-new CoreNLPDocument(sentences).copy(this)
+override def copy(sentences: Array[Sentence] = sentences, textOpt: Option[String] = text): CoreNLPDocument =
+new CoreNLPDocument(sentences).assimilate(this, textOpt)

 override def clear(): Unit = {
 //println("Clearing state from document.")
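
The hunk above separates two roles that previously shared the name `copy`: `assimilate` pulls state from another document into `this`, while `copy` builds a new instance and then assimilates. A rough sketch of that split is shown below with hypothetical stand-in classes (`Doc`, `AnnotatedDoc`, and string-valued fields); the real `Document` and `CoreNLPDocument` carry more state than is modeled here.

```scala
// Hypothetical stand-ins for illustration; not the real org.clulab classes.
class Doc(val sentences: Array[String]) {
  var text: Option[String] = None

  // Pulls mutable state into this instance and returns it for chaining.
  def assimilate(other: Doc, textOpt: Option[String]): Doc = {
    text = textOpt
    this
  }

  // Builds a fresh instance, optionally overriding sentences or text.
  def copy(sentences: Array[String] = sentences, textOpt: Option[String] = text): Doc =
    new Doc(sentences).assimilate(this, textOpt)
}

class AnnotatedDoc(sentences: Array[String]) extends Doc(sentences) {
  var annotation: Option[String] = None

  // Overload for the subclass: reuse the parent logic, then add local state.
  def assimilate(other: AnnotatedDoc, textOpt: Option[String]): AnnotatedDoc = {
    super.assimilate(other, textOpt)
    annotation = other.annotation
    this
  }

  override def copy(sentences: Array[String] = sentences, textOpt: Option[String] = text): AnnotatedDoc =
    new AnnotatedDoc(sentences).assimilate(this, textOpt)
}
```

With distinct names, a call such as `doc.copy(textOpt = Some("new text"))` no longer competes with a one-argument `copy(document)` overload during resolution.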
@@ -1,11 +1,13 @@
package org.clulab.processors.corenlp.chunker

import edu.stanford.nlp.ling.{ CoreLabel, CoreAnnotations }
import org.clulab.scala.WrappedArray._

import java.io.FileInputStream
import java.util.zip.GZIPInputStream
import scala.collection.mutable
import scala.io.Source
import edu.stanford.nlp.ling.{ CoreLabel, CoreAnnotations }
import scala.util.Using

object TrainChunker extends App {

@@ -63,9 +65,9 @@ object TrainChunker extends App {

 def readData(path: String): Array[Array[CoreLabel]] = {
 val is = new GZIPInputStream(new FileInputStream(path))
-val source = Source.fromInputStream(is)
-val text = source.mkString
-source.close()
+val text = Using.resource(Source.fromInputStream(is)) { source =>
+source.mkString
+}
 // sentences are separated by an empty line
 val sentences = text.split("\n\n")
 sentences.map { sent =>
@@ -1,31 +1,32 @@
 package org.clulab.processors.examples

-import java.io.{BufferedReader, FileReader}
-
 import org.clulab.serialization.DocumentSerializer

+import java.io.{BufferedReader, FileReader}
+import scala.util.Using
+
 /**
 *
 * User: mihais
 * Date: 10/1/14
 */
 object DocumentSerializerExample {
 def main(args:Array[String]): Unit = {
-val ds = new DocumentSerializer
-val r = new BufferedReader(new FileReader(args(0)))
-var done = false
 var count = 0
-while(! done) {
-val d = ds.load(r)
-if(d == null) {
-done = true
-} else {
-count += 1
-if(count % 10 == 0)
-println(s"Loaded $count documents...")
+Using.resource(new BufferedReader(new FileReader(args(0)))) { r =>
+val ds = new DocumentSerializer
+var done = false
+while (!done) {
+val d = ds.load(r)
+if (d == null) {
+done = true
+} else {
+count += 1
+if (count % 10 == 0)
+println(s"Loaded $count documents...")
+}
 }
 }
-r.close()
 println(s"Done! Loaded $count documents.")
 }
 }
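
The loop in the hunk above keeps reading until `ds.load` returns `null`. As a hedged aside, the same read-until-null shape can be written with `Iterator.continually`. The sketch below uses `BufferedReader.readLine` as a stand-in for `ds.load`, and `CountRecordsSketch` and `countLines` are hypothetical names; it assumes Scala 2.13 or later for `scala.util.Using`.

```scala
import java.io.{BufferedReader, StringReader}
import scala.util.Using

// Hypothetical example; readLine stands in for a loader that returns null at end of input.
object CountRecordsSketch {

  def countLines(text: String): Int =
    Using.resource(new BufferedReader(new StringReader(text))) { reader =>
      // Keep calling readLine until it yields null, counting the results.
      // size consumes the iterator inside the block, while the reader is open.
      Iterator.continually(reader.readLine()).takeWhile(_ != null).size
    }
}
```

This drops the mutable `done` flag, though the explicit `while` loop in the PR may still be preferable when progress messages are printed along the way.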
@@ -8,7 +8,7 @@ import org.clulab.struct.DirectedGraphEdgeIterator

 object DocumentationExample extends App {
 // Create the processor. Any processor works here!
-// Try FastNLPProcessor or our own CluProcessor.
+// Try FastNLPProcessor or our own BalaurProcessor.
 val proc: Processor = new CoreNLPProcessor()

 // val proc: Processor = new FastNLPProcessor()
@@ -4,17 +4,14 @@ import org.clulab.processors.Document
import org.clulab.processors.Processor
import org.clulab.processors.fastnlp.FastNLPProcessor
import org.clulab.serialization.DocumentSerializer
import org.clulab.utils.Closer.AutoCloser
import org.clulab.utils.FileUtils
import org.clulab.utils.ThreadUtils
import org.clulab.utils.Timer
import org.clulab.utils.{FileUtils, StringUtils, ThreadUtils, Timer}

import java.io.BufferedOutputStream
import java.io.File
import java.io.FileOutputStream
import java.io.PrintWriter
import java.io.StringWriter
import scala.collection.parallel.ParSeq
import scala.util.Using

object InfiniteParallelProcessorExample {

@@ -49,15 +46,8 @@ object InfiniteParallelProcessorExample {
 val text = FileUtils.getTextFromFile(file)
 val outputFile = new File(outputDir + "/" + file.getName)
 val document = processor.annotate(text)
-val printedDocument = {
-val stringWriter = new StringWriter
-
-new PrintWriter(stringWriter).autoClose { printWriter =>
-printDocument(document, printWriter)
-}
-
-val result = stringWriter.toString
-result
+val printedDocument = StringUtils.viaPrintWriter { printWriter =>
+printDocument(document, printWriter)
 }
 val savedDocument = documentSerializer.save(document)
 val outputDocument = printedDocument + savedDocument
@@ -86,7 +76,7 @@ object InfiniteParallelProcessorExample {
 def run(args: Array[String]): Unit = {

 mainWithCallback(args) { case (file: File, contents: String) =>
-new PrintWriter(new BufferedOutputStream(new FileOutputStream(file))).autoClose { printWriter =>
+Using.resource(new PrintWriter(file)) { printWriter =>
 printWriter.println(contents)
 }
 }
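
The diff above calls `StringUtils.viaPrintWriter` but does not show its implementation. Judging only from the code it replaces (a `StringWriter` wrapped in an auto-closed `PrintWriter`), a helper of that shape might look roughly like the sketch below. `StringUtilsSketch` is a hypothetical name and the body is an assumption, not the library's actual code.

```scala
import java.io.{PrintWriter, StringWriter}
import scala.util.Using

// Hypothetical reconstruction; the real org.clulab.utils.StringUtils may differ.
object StringUtilsSketch {

  // Runs f against a PrintWriter backed by an in-memory StringWriter
  // and returns everything that was printed.
  def viaPrintWriter(f: PrintWriter => Unit): String = {
    val stringWriter = new StringWriter
    Using.resource(new PrintWriter(stringWriter)) { printWriter =>
      f(printWriter)
    }
    // Closing the PrintWriter flushes it, so the StringWriter is complete here.
    stringWriter.toString
  }

  def main(args: Array[String]): Unit = {
    val text = viaPrintWriter { printWriter =>
      printWriter.print("Hello, ")
      printWriter.print("world")
    }
    println(text)
  }
}
```

Collecting the boilerplate in one place is what lets the nine removed lines in each example collapse to a two-line call.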
@@ -1,19 +1,14 @@
package org.clulab.processors.examples

import java.io.BufferedOutputStream
import java.io.File
import java.io.FileOutputStream
import java.io.PrintWriter
import java.io.StringWriter
import org.clulab.processors.Document
import org.clulab.processors.Processor
import org.clulab.processors.clu.BalaurProcessor
import org.clulab.processors.fastnlp.FastNLPProcessor
import org.clulab.serialization.DocumentSerializer
import org.clulab.utils.Closer.AutoCloser
import org.clulab.utils.FileUtils
import org.clulab.utils.ThreadUtils
import org.clulab.utils.Timer
import org.clulab.utils.{FileUtils, StringUtils, ThreadUtils, Timer}

import java.io.File
import java.io.PrintWriter
import scala.util.Using

object ParallelProcessorExample {

@@ -25,7 +20,7 @@ object ParallelProcessorExample {
 val outputDir = args(1)
 val extension = args(2)
 val threads = args(3).toInt
-val parallel = args.lift(4).exists(_ == "true")
+val parallel = args.lift(4).contains("true")

 val files = FileUtils.findFiles(inputDir, extension)
 val serFiles = files.sortBy(-_.length)
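
The one-line change above swaps `exists(_ == "true")` for `contains("true")` on the `Option` returned by `args.lift(4)`. A small sketch of how that expression behaves for a missing, matching, and non-matching fifth argument is given below; `FlagSketch` and `isParallel` are hypothetical names used only for illustration.

```scala
// Hypothetical example; mirrors the optional fifth command-line argument in the hunk above.
object FlagSketch {

  // lift(4) yields None when there is no fifth argument instead of throwing,
  // and contains compares the wrapped value, if any, with "true".
  def isParallel(args: Array[String]): Boolean =
    args.lift(4).contains("true")
}
```

Both spellings give the same result here; `contains` just states the equality test directly.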
@@ -60,15 +55,8 @@ object ParallelProcessorExample {
 println(s"Threw exception for ${file.getName}")
 throw throwable
 }
-val printedDocument = {
-val stringWriter = new StringWriter
-
-new PrintWriter(stringWriter).autoClose { printWriter =>
-printDocument(document, printWriter)
-}
-
-val result = stringWriter.toString
-result
+val printedDocument = StringUtils.viaPrintWriter { printWriter =>
+printDocument(document, printWriter)
 }
 val savedDocument = documentSerializer.save(document)
 val outputDocument = printedDocument + savedDocument
@@ -83,7 +71,7 @@ object ParallelProcessorExample {
 def run(args: Array[String]): Unit = {

 mainWithCallback(args) { case (file: File, contents: String) =>
-new PrintWriter(new BufferedOutputStream(new FileOutputStream(file))).autoClose { printWriter =>
+Using.resource(new PrintWriter(file)) { printWriter =>
 printWriter.println(contents)
 }
 }
@@ -26,7 +26,7 @@ object ProcessorExample {

 // other processors supported:
 // BioNLPProcessor, and FastBioNLPProcessor - for the biomedical domain
-// CluProcessor - similar to FastNLPProcessor, but using tools licensed under the Apache license
+// BalaurProcessor - similar to FastNLPProcessor, but using tools licensed under the Apache license

 // the actual work is done here
 val doc = proc.annotate("John Smith went to China. He visited Beijing, on January 10th, 2013.")
11 changes: 4 additions & 7 deletions corenlp/src/test/scala/org/clulab/processors/TestOpenIE.scala
@@ -4,9 +4,7 @@ import org.clulab.processors.corenlp.CoreNLPProcessor
 import org.clulab.processors.fastnlp.FastNLPProcessor
 import org.clulab.processors.shallownlp.ShallowNLPProcessor
 import org.clulab.serialization.DocumentSerializer
-import org.clulab.utils.Test
-
-import java.io.{PrintWriter, StringWriter}
+import org.clulab.utils.{StringUtils, Test}

 import scala.collection.mutable

@@ -20,10 +18,9 @@ class TestOpenIE extends Test {
 private lazy val fastNLPDoc = fastNLP.annotate(text)
 private lazy val coreNLPDoc = coreNLP.annotate(text)

-private val buffer = new StringWriter()
-serializer.save(fastNLPDoc, new PrintWriter(buffer))
-private val serialized = buffer.toString
-
+private val serialized = StringUtils.viaPrintWriter { printWriter =>
+serializer.save(fastNLPDoc, printWriter)
+}
 private val deserializedDoc = serializer.load(serialized)

 def openIEBehavior(doc:Document): Unit = {
@@ -13,7 +13,7 @@ import org.clulab.utils.Test
 class TestParenthesesInCore extends Test {
 val fast = new FastNLPProcessor()

-"CluProcessor" should "tokenize, lemmatize, and POS tag parentheses correctly" in {
+"Processor" should "tokenize, lemmatize, and POS tag parentheses correctly" in {
 val doc = fast.mkDocument("Moreover, in von Willebrand factor-stimulated platelets, the tyrosine phosphorylation of pp60(c-src) is closely associated with the activation of phosphatidylinositol 3-kinase (PIK), and two adhesion receptors, glycoprotein (Gp)Ib and GpIIb/IIIa(alpha-IIb-beta(3)), are involved. ")
 fast.tagPartsOfSpeech(doc)
 fast.lemmatize(doc)