Spanish grammar for TimeNorm #68

Closed

NGEscribano wants to merge 12 commits into clulab:master from NGEscribano:master

NGEscribano commented Apr 27, 2023

No description provided.

NGEscribano added 7 commits

March 10, 2022 12:39


          Spanish grammar added

252803a


          Spanish grammar processing added

ac24258


          Update README.md

fd4bb5f


          Evaluator added and minor problem solved

e090b92


          Merge branch 'master' of https://github.com/NGEscribano/timenorm

a7aadbe


          Update README.md


          Italian usage added

509580e

kwalcock reviewed

Member

kwalcock left a comment

Thank you! I'm sure @bethard will have comments. It looks like this repo is configured for Travis, but I don't think that is still working, so this might need to be tested locally.

src/main/scala/org/clulab/timenorm/scfg/Evaluator.scala

+                standard normalizations
+                */
+                def main(lang:String, in_file:String, out_file:String) {

Member

kwalcock Apr 27, 2023

It's all nice and neat. Someone might be concerned with the Python conventions, though. This line needs an = for Scala 2.13.

Author

NGEscribano Apr 28, 2023

OK, thanks!
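
For readers unfamiliar with the point above: the `def name(args) { ... }` form is Scala's procedure syntax, which is deprecated in Scala 2.13 and no longer accepted in Scala 3. A minimal sketch of the change follows; the object and method names are hypothetical and are not taken from Evaluator.scala.

```scala
object ProcedureSyntaxSketch {
  // Procedure syntax, as in the reviewed line (deprecated in 2.13):
  //   def report(lang: String) { println(lang) }

  // Same method with an explicit result type and `=`.
  def report(lang: String): Unit = {
    println(describe(lang))
  }

  // Hypothetical helper that returns a value, so the sketch is easy to check.
  def describe(lang: String): String = s"Evaluating grammar for: $lang"
}
```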

src/main/scala/org/clulab/timenorm/scfg/Evaluator.scala

+                  the list with all the normalizations*/
+                  // Select the parser for the desired grammar depending on the language
+                  val parser = lang match {

Member

kwalcock Apr 27, 2023

Maybe we should change the compiler settings so that there is a complaint about the unexhaustive match.

Author

NGEscribano Apr 28, 2023

As before, all improvements are welcome!
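
One way to address the match regardless of compiler settings is an explicit fallback case. As far as I know, the compiler's exhaustivity checking applies to sealed hierarchies rather than to arbitrary `String` values, so a default case may be the more direct fix here. The sketch below is an assumption: the language codes and the returned labels are hypothetical stand-ins for the real parser selection.

```scala
object ParserSelectionSketch {
  // Hypothetical stand-in: returns a grammar label instead of a real parser.
  def grammarFor(lang: String): String = lang match {
    case "en" => "english"
    case "es" => "spanish"
    case "it" => "italian"
    // Fail with a clear message instead of an opaque MatchError.
    case other =>
      throw new IllegalArgumentException(s"Unsupported language: $other")
  }
}
```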

src/main/scala/org/clulab/timenorm/scfg/Evaluator.scala

+                  // Process the DCT depending on the presence of a time reference
+                  val pattern = "T".r
+                  val anchor = pattern.findFirstIn(dct_timex) match {

Member

kwalcock Apr 27, 2023

Maybe dct_timex.contains('T')

Author

NGEscribano Apr 28, 2023

Thanks!
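
A small sketch of the suggested check, assuming the DCT is a plain string in which a `T` separates the date from the time. The object and method names are hypothetical.

```scala
object DctTimeCheckSketch {
  // True when the DCT string carries a time part, e.g. "2013-03-22T11:30:00".
  def hasTime(dctTimex: String): Boolean = dctTimex.contains('T')
}
```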

src/main/scala/org/clulab/timenorm/scfg/Evaluator.scala

+                  val anchor = pattern.findFirstIn(dct_timex) match {
+                    // Get the anchor timespan if time is specified
+                    case Some(_) =>
+                      val dct = dct_timex.split("T")

Member

kwalcock Apr 27, 2023

Would a DateTimeFormatter make more sense?

Author

NGEscribano Apr 28, 2023

Possibly. Thanks!
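
A sketch of what the `DateTimeFormatter` route might look like. It assumes the DCT values are ISO-8601 strings such as `2013-03-22` or `2013-03-22T11:30:00`, which this thread does not confirm; the object and method names are hypothetical.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter

object DctParsingSketch {
  // Parse a DCT with or without a time part; date-only values map to midnight.
  def parseDct(dctTimex: String): LocalDateTime =
    if (dctTimex.contains('T'))
      LocalDateTime.parse(dctTimex, DateTimeFormatter.ISO_LOCAL_DATE_TIME)
    else
      LocalDate.parse(dctTimex, DateTimeFormatter.ISO_LOCAL_DATE).atStartOfDay()
}
```

Compared with splitting on `"T"` by hand, malformed input fails with a `DateTimeParseException` rather than producing a partially filled array.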

src/main/scala/org/clulab/timenorm/scfg/Evaluator.scala

+                  var norm_counter = 0
+                  // Iterate over timex list and get each timex, gold and norm set
+                  for (i <- 0 to timex_list.length - 1) {

Member

kwalcock Apr 27, 2023

for (i <- timex_list.indices) { can make life easier here.

Author

NGEscribano Apr 28, 2023

Thanks!
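
A sketch of the `indices` form, using a simplified counting loop. The names are hypothetical, and both lists are assumed to have the same length.

```scala
object IndicesSketch {
  // Count normalizations that equal a real gold value ("-" marks no gold value).
  def countMatches(gold: List[String], norm: List[String]): Int = {
    var normCounter = 0
    // `indices` is the range 0 until gold.length, so no manual "- 1" is needed.
    for (i <- gold.indices) {
      if (gold(i) != "-" && norm(i) == gold(i)) normCounter += 1
    }
    normCounter
  }
}
```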

src/main/scala/org/clulab/timenorm/scfg/Evaluator.scala

+                    // If this is a timex, write the data and sum a gold value
+                    if (timex != "") {
+                      writer.write(s"${timex}\t${gold}\t${norm}\n")

Member

kwalcock Apr 27, 2023

Since it is a PrintWriter, println() would work well.

Author

NGEscribano Apr 28, 2023

OK, thanks!
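
A sketch of the `println` variant, written against a `StringWriter` so it can be run without touching the file system. The names and the row type are hypothetical. One difference worth knowing: `println` emits the platform line separator, which is not `"\n"` on every system.

```scala
import java.io.{PrintWriter, StringWriter}

object PrintlnSketch {
  // Render (timex, gold, norm) rows; an empty timex marks a document separator.
  def render(rows: List[(String, String, String)]): String = {
    val buffer = new StringWriter()
    val writer = new PrintWriter(buffer)
    for ((timex, gold, norm) <- rows) {
      if (timex != "") writer.println(s"$timex\t$gold\t$norm")
      else writer.println() // blank line between documents
    }
    writer.flush()
    buffer.toString
  }
}
```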

src/main/scala/org/clulab/timenorm/scfg/Evaluator.scala

+                      if (gold != "-" && norm == gold) {norm_counter += 1}
+                    }
+                    // If this is a doc separator, write a newline
+                    else {writer.write(s"\n")}

Member

kwalcock Apr 27, 2023

Ditto. writer.println().

Author

NGEscribano Apr 28, 2023

OK!

src/main/scala/org/clulab/timenorm/scfg/Evaluator.scala

+                      gold_list += ""
+                    }
+                  }
+                  return (timex_list.toList, gold_list.toList)

Member

kwalcock Apr 27, 2023

It looks like we're not writing return. Can they be removed throughout?

Author

NGEscribano Apr 28, 2023

I appreciate any improvement for a better performance :)
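
Dropping `return` is a matter of style rather than speed: the last expression of a Scala method body is already its result. A minimal sketch with hypothetical names:

```scala
import scala.collection.mutable.ListBuffer

object NoReturnSketch {
  // Split tab-separated lines into first-column and last-column lists.
  def splitColumns(lines: List[String]): (List[String], List[String]) = {
    val timexList = ListBuffer[String]()
    val goldList = ListBuffer[String]()
    for (line <- lines) {
      val fields = line.split("\t")
      timexList += fields.head
      goldList += fields.last
    }
    // The final expression is the method's value; no `return` keyword needed.
    (timexList.toList, goldList.toList)
  }
}
```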

kwalcock reviewed

src/main/scala/org/clulab/timenorm/scfg/Evaluator.scala

Comment on lines +43 to +75

+                def get_content(in_file:String) : (List[String], List[String]) = {
+                  /**Obtains the content from the input file as timex and value lists*/
+                  // Turn input file to a list of lines
+                  val content = Source.fromFile(in_file).getLines.toList
+                  // Obtain the standard line length from the first DCT
+                  val std_line_length = content(0).split("\t").length
+                  val timex_list = ListBuffer[String]()
+                  val gold_list = ListBuffer[String]()
+                  for (line <- content) {
+                    // If line is not a doc separator (indicated by empty string):
+                    if (line != "") {
+                      // If line length equals the standard, get the timex and its gold value
+                      if (line.split("\t").length == std_line_length) {
+                        timex_list += line.split("\t").head
+                        gold_list += line.split("\t").last
+                      }
+                      // If this is a detected timex absent in the evaluation corpus, get the
+                      // timex but append "-" as gold normalization
+                      else {
+                        timex_list += line.split("\t").head
+                        gold_list += "-"
+                      }
+                    }
+                    // Otherwise, add empty strings to mark end of document timexes
+                    else {
+                      timex_list += ""
+                      gold_list += ""
+                    }
+                  }
+                  return (timex_list.toList, gold_list.toList)

Member

kwalcock Apr 27, 2023

FWIW, it depends on the file, but this method may amount to

  def get_content(in_file:String) : (List[String], List[String]) = {
    /**Obtains the content from the input file as timex and value lists*/

    val source = Source.fromFile(in_file)
    val content = source.getLines.toList.map { line =>
      val split = line.split("\t")
      (split.lift(0).getOrElse(""), split.lift(1).getOrElse(""))
    }.unzip
    source.close()
    content
  }

Author

NGEscribano Apr 28, 2023

Thanks!
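
The shortened version above takes the second column as the gold value and uses `""` for missing fields, which is presumably why it "depends on the file". If the original behavior matters (last column as gold value, `"-"` for lines shorter than the first line), a `map`/`unzip` sketch along the following lines might keep it. It works on already-read lines so it can be tried without a file; the names are hypothetical.

```scala
object GetContentSketch {
  // Turn raw lines into (timex list, gold list), mirroring the original branches.
  def parseLines(lines: List[String]): (List[String], List[String]) = {
    // The column count of the first line is treated as the standard length.
    val stdLineLength = lines.headOption.map(_.split("\t").length).getOrElse(0)
    lines.map { line =>
      val fields = line.split("\t")
      if (line.isEmpty) ("", "")                 // document separator
      else if (fields.length == stdLineLength)
        (fields.head, fields.last)               // timex with a gold value
      else (fields.head, "-")                    // timex without a gold value
    }.unzip
  }
}
```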

NGEscribano commented

Author

NGEscribano left a comment

Thanks so much for the comments! Feel free to make any improvement you consider appropriate

Nayla Escribano added 2 commits

April 28, 2023 12:37


          Update README.md

a239bcb


          Update README.md

dd118da

Collaborator

bethard commented May 16, 2023

I won't have a chance to look at this before the end of June, so @kwalcock has my permission to review, clean up, and merge as deemed appropriate.

Member

kwalcock commented May 18, 2023

@NGEscribano, do you happen to have a file meeting the description of "timex [type] gold_value" tab-separated format that can be used for testing?

NGEscribano added 2 commits

May 19, 2023 10:03


          TimeBank datasets added for testing

b2492d9


          Merge branch 'master' of https://github.com/NGEscribano/timenorm-es

b1b048d

Author

NGEscribano commented May 19, 2023

Sure! I just added a "datasets" directory with the train and test sets from TempEval-3 formatted for this normalization task. This test file should work.


          Update README.md

4bcd030

Member

kwalcock commented May 20, 2023

@NGEscribano, your code has been incorporated into #70. Please take a look at it if you can and let me know if you have concerns. I don't know whether you would want it to be merged back into your repo.

Member

kwalcock commented May 30, 2023

This PR is being closed in favor of #70, which includes all the commits from this one except for changes to README.md. Since this PR originates from a fork, it was easier to do that than to modify the fork so that the modifications would show up here under #68. So, the changes have been incorporated, just not with this PR.

kwalcock closed this
