
I added a convenience method... #8

Merged
merged 1 commit into from


@stevenbedrick

... that simply joins up the tokenizer's output into a whitespace-delimited string, which should make it easier to include this in a larger Tweet-processing pipeline involving Java classes.

I found myself needing to use Twokenize from a Java class that just needed a string representation of a tokenized Tweet, and was reminded of the fact that Java makes this sort of thing way harder than it should be. Luckily, it's almost a one-liner in Scala (+1 for your choice of language, guys!), so I thought it would be a worthwhile addition to the program.
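To show what the new method does in isolation, here is a minimal, self-contained Scala sketch. The real `Twokenize` is not on the classpath here, so `TwokenizeSketch` and its `tokenizeForTagger` are hypothetical stand-ins: the stand-in simply treats runs of word characters and runs of punctuation as tokens, which is far cruder than the actual tokenizer. Only the final `mkString(" ")` join mirrors the method added in this PR.

```scala
// Minimal sketch. TwokenizeSketch is a hypothetical stand-in, NOT the real ARK Twokenize.
object TwokenizeSketch {
  // Hypothetical stand-in tokenizer: each run of word characters, or each run of
  // non-word, non-space characters, becomes one token. Whitespace is skipped.
  private val TokenPattern = """\w+|[^\w\s]+""".r

  def tokenizeForTagger(text: String): List[String] =
    TokenPattern.findAllIn(text).toList

  // Same idea as the method added in this PR: join the tokens with single spaces.
  def tokenizeToString(text: String): String =
    tokenizeForTagger(text).mkString(" ")

  def main(args: Array[String]): Unit = {
    println(tokenizeToString("omg  this is gr8!! :)")) // prints: omg this is gr8 !! :)
  }
}
```

From Java, a method defined on a top-level Scala `object` with no companion class is typically exposed through a static forwarder, so a call along the lines of `Twokenize.tokenizeToString(tweet)` should return a plain `String` without the caller touching Scala collection types; that is an assumption worth verifying against the Scala version in use.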

@stevenbedrick Added a convenience method that simply joins up the tokens into a whitespace-delimited string, which should make it easier to include this in a larger Tweet-processing pipeline.
5311910
@brendano brendano merged commit 35bcfa2 into from
@brendano
Owner

cool thanks!

Commits on Nov 16, 2011
  1. @stevenbedrick authored: Added a convenience method that simply joins up the tokens into a whitespace-delimited string, which should make it easier to include this in a larger Tweet-processing pipeline.
Showing with 6 additions and 0 deletions.
  1. +6 −0 src/edu/cmu/cs/lti/ark/tweetnlp/twokenize.scala
```diff
@@ -288,6 +288,12 @@ object Twokenize {
     tokenizeForTagger(text).toSeq
   }
 
+  // Convenience method to produce a string representation of the
+  // tokenized tweet in a standard-ish format.
+  def tokenizeToString (text: String): String = {
+    tokenizeForTagger(text).mkString(" ");
+  }
+
   // Main method
   def main (args: Array[String]) = {
     // force stdin/stdout interpretation as UTF-8
```
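Because the output is whitespace-delimited, a downstream stage (Java or otherwise) can recover the tokens by splitting on single spaces, but only if no token itself contains a space. The sketch below makes that assumption explicit; `RoundTripSketch` and `roundTrips` are hypothetical names, and the token lists are made-up inputs rather than output of the real tokenizer.

```scala
// Hypothetical helper for checking that the space-joined format is lossless.
object RoundTripSketch {
  // Join tokens the same way the PR's tokenizeToString does.
  def join(tokens: Seq[String]): String = tokens.mkString(" ")

  // Split a whitespace-delimited line back into tokens.
  // The -1 limit keeps trailing empty strings instead of silently dropping them.
  def split(line: String): Seq[String] = line.split(" ", -1).toSeq

  // True when join followed by split gives back exactly the original tokens.
  // A token that itself contains a space breaks this.
  def roundTrips(tokens: Seq[String]): Boolean = split(join(tokens)) == tokens

  def main(args: Array[String]): Unit = {
    println(roundTrips(Seq("omg", "gr8", "!!", ":)"))) // true
    println(roundTrips(Seq("New York", "rocks")))      // false: "New York" splits in two
  }
}
```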