Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added "fall" as a season & normalized "week of month" patterns #736

Merged
merged 13 commits into from
Sep 1, 2023

Conversation

alicekwak
Copy link
Contributor

Added "fall" as a season & normalized "week of month" patterns

  1. added fall as a season and added post-processing method for season homonyms (fall/spring)
  2. normalized "week of month" patterns into date ranges (e.g., first week of May, last two weeks of June, etc.)

Copy link
Member

@kwalcock kwalcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'm going to have to double check my own work, though. I'll run through it again before it gets merged.

val prevWords = m.sentenceObj.words.slice(wordIndex - 2, wordIndex)
val contextWords = m.sentenceObj.words.slice(wordIndex - 5, wordIndex + 5)

(prevWords.contains("in") && prevWords.contains("the")) || prevWords.contains("this") || prevWords.contains("last") || prevWords.contains("every") ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the in" would pass the test as well as "in the" does. There is a containsSlice that might work to preserve the order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your comment is correct. But I would argue it is overkill here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the in" would pass the test as well as "in the" does. There is a containsSlice that might work to preserve the order.

Yes, that's a good point. The filter is not perfectly precise at this moment. My intention was to create a filter that is not too complicated but works reasonably well. But if any issues are anticipated due to this filter being not exact, I'd be more than happy to do the revision!

val contextWords = m.sentenceObj.words.slice(wordIndex - 5, wordIndex + 5)

(prevWords.contains("in") && prevWords.contains("the")) || prevWords.contains("this") || prevWords.contains("last") || prevWords.contains("every") ||
contextWords.exists(_.matches("[0-9]{0,4}"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match the exists construction, one could take the three contains from above and write

contextWords.exists(Array("this", "last", "every").contains)

Comparison is case-sensitive here even though above I see equalsIgnoreCase. "This spring..." might not match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match the exists construction, one could take the three contains from above and write

contextWords.exists(Array("this", "last", "every").contains)

Comparison is case-sensitive here even though above I see equalsIgnoreCase. "This spring..." might not match.

Thanks! I didn't know how to make that simpler. I'll revise the code as you suggested. And I'll also think about how to make the comparison case-insensitive, as I want patterns like "This spring..." to match.

behavior of "Default SeasonalCluProcessor"

it should "find true seasons in trueSeason1" in {
val processor = new CluProcessor()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should probably be a single CluProcessor for the entire file of tests. It is very expensive to create. If the processor above is moved like so, it might work:

  behavior of "Default SeasonalCluProcessor"

  val processor = new CluProcessor()

  it should "find autumn but not rainy season" in {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should probably be a single CluProcessor for the entire file of tests. It is very expensive to create. If the processor above is moved like so, it might work:

  behavior of "Default SeasonalCluProcessor"

  val processor = new CluProcessor()

  it should "find autumn but not rainy season" in {

Thanks for the great suggestion! I didn't know if that's possible or not (and also didn't know that creating CluProcessor is very expensive!) so just followed the pattern I found from the other test. I'll change the code as you suggested and see how it works.

behavior of "WeekCluProcessor"

it should "find first two weeks of April" in {
val processor = new CluProcessor()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about the single CluProcessor.

@kwalcock
Copy link
Member

kwalcock commented Aug 8, 2023

FYI, @alicekwak.

[info] *** 5 TESTS FAILED ***
[error] Failed tests:
[error] 	org.clulab.numeric.TestNumericEntityRecognition
[error] 	org.clulab.numeric.TestEvalTimeNorm
[error] 	org.clulab.utils.TestDependencyUtils
[error] 	org.clulab.odin.TestMention

[info] - should recognize date ranges from seasons *** FAILED ***
[info]   "[O]" was not equal to "[B-DATE-RANGE]" (TestNumericEntityRecognition.scala:621)
[info]   Analysis:
[info]   "[O]" -> "[B-DATE-RANGE]"

[info] - should recognize date ranges with seasons *** FAILED ***
[info]   "[O]" was not equal to "[B-DATE-RANGE]" (TestNumericEntityRecognition.scala:621)
[info]   Analysis:
[info]   "[O]" -> "[B-DATE-RANGE]"

[info] - should not degrade in performance *** FAILED ***
[info]   0.8487485647201538 was not greater than or equal to 0.85 (TestEvalTimeNorm.scala:15)

[info] - should produce one head using findHeads *** FAILED ***
[info]   Some(0) was not equal to None (TestDependencyUtils.scala:75)
[info]   Analysis:
[info]   Some(x: 0 -> )

[info] - should get None when there are no roots *** FAILED ***
[info]   Some(0) was not equal to None (TestMention.scala:59)

@alicekwak
Copy link
Contributor Author

FYI, @alicekwak.

[info] *** 5 TESTS FAILED ***
[error] Failed tests:
[error] 	org.clulab.numeric.TestNumericEntityRecognition
[error] 	org.clulab.numeric.TestEvalTimeNorm
[error] 	org.clulab.utils.TestDependencyUtils
[error] 	org.clulab.odin.TestMention

[info] - should recognize date ranges from seasons *** FAILED ***
[info]   "[O]" was not equal to "[B-DATE-RANGE]" (TestNumericEntityRecognition.scala:621)
[info]   Analysis:
[info]   "[O]" -> "[B-DATE-RANGE]"

[info] - should recognize date ranges with seasons *** FAILED ***
[info]   "[O]" was not equal to "[B-DATE-RANGE]" (TestNumericEntityRecognition.scala:621)
[info]   Analysis:
[info]   "[O]" -> "[B-DATE-RANGE]"

[info] - should not degrade in performance *** FAILED ***
[info]   0.8487485647201538 was not greater than or equal to 0.85 (TestEvalTimeNorm.scala:15)

[info] - should produce one head using findHeads *** FAILED ***
[info]   Some(0) was not equal to None (TestDependencyUtils.scala:75)
[info]   Analysis:
[info]   Some(x: 0 -> )

[info] - should get None when there are no roots *** FAILED ***
[info]   Some(0) was not equal to None (TestMention.scala:59)

Thanks! I'll fix the broken tests before I make this into a real PR.

@alicekwak
Copy link
Contributor Author

Hi @kwalcock and @MihaiSurdeanu, I did some debugging and made some more tests pass. However, I'm still failing on three tests (TestParallel, TestDependencyUtils, TestMention) and I'm not sure why they are failing. Below is the console output that I'm seeing. If you have any idea why the tests are failing, could you let me know? Thank you!

[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error] org.clulab.processors.TestParallel
[info] Run completed in 16 minutes, 36 seconds.
[info] Total number of tests run: 19
[info] Suites: completed 4, aborted 0
[info] Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Run completed in 16 minutes, 36 seconds.
[info] Total number of tests run: 548
[info] Suites: completed 79, aborted 0
[info] Tests: succeeded 546, failed 2, canceled 0, ignored 1, pending 0
[info] *** 2 TESTS FAILED ***
[error] Failed tests:
[error] org.clulab.utils.TestDependencyUtils
[error] org.clulab.odin.TestMention
[error] (main / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] (corenlp / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 1045 s (17:25), completed Aug 20, 2023, 6:28:33 AM

@kwalcock
Copy link
Member

FWIW I'm looking at it.

@alicekwak alicekwak marked this pull request as ready for review August 27, 2023 06:10
@alicekwak
Copy link
Contributor Author

Hi @kwalcock and @MihaiSurdeanu, I believe this PR is now ready to be reviewed and (hopefully) merged now. Would you be able to have a look at this and let me know if there's any issue left? Thank you!

Copy link
Member

@kwalcock kwalcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more concerns still...

@@ -892,6 +921,34 @@ package object mentions {
else seasonNormalizer.norm(wordsOpt.get)
}

private def getWeekRange(weekNormalizer: WeekNormalizer)(argName: String, m:Mention): Option[WeekRange] = {
val wordsOpt = getArgWords(argName, m)
print("this is wordsOpt: " ++ wordsOpt.get.mkString(" "))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be removed. Commenting out is OK. println is better for Scala.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this up! I removed the line.

@@ -51,7 +51,7 @@ object TempEvalFormatter {
}
}

private def convertLiteralMonth(s: String): Int = {
def convertLiteralMonth(s: String): Int = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point these 12 ifs need to change.

print("this is wordsOpt: " ++ wordsOpt.get.mkString(" "))

if (wordsOpt.isEmpty) None
else if (wordsOpt.get.mkString(" ").toLowerCase().equals("last week")) {getLastWeekRange(m)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will eventually do something about the duplicate calculation here.

// A common introduction to a season
val inThe: Array[String] = Array("in", "the")
// Match a 1 to 4 digit year
val yearPattern = Pattern.compile("[0-9]{1,4}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this really is for a year, I can only imagine 2 or 4 digits, like summer of '69 or 2023. Could this be something like "[0-9]{2}|[0-9]{4}"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great suggestion! I just replaced the regex pattern to "[0-9]{2}|[0-9]{4}" as you suggested.

Copy link
Contributor

@MihaiSurdeanu MihaiSurdeanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM!

@kwalcock kwalcock merged commit d2e5e4a into master Sep 1, 2023
1 check passed
@kwalcock kwalcock deleted the alice-date-revision branch September 1, 2023 15:45
@kwalcock
Copy link
Member

kwalcock commented Sep 1, 2023

Thanks, @alicekwak!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants