added tokenize function

commit 39b9c1aa9e2527ffcd6f421af52a1ac700c4ae5c (1 parent: 89eb07c)
@andrefs committed Jun 3, 2012
Showing with 14 additions and 1 deletion.
  1. +14 −1 lib/Lingua/EN/Tokenizer/Offsets.pm
@@ -11,13 +11,26 @@ our @EXPORT_OK = qw/
token_offsets
adjust_offsets
get_tokens
+ tokenize
/;
# ABSTRACT: Finds word (token) boundaries, and returns their offsets.
+=method tokenize($text)
+
+Takes text as input and returns a tokenized version (space-separated tokens).
+
+=cut
+
+sub tokenize {
+ my ($text) = @_;
+ my $tokens = get_tokens($text);
+ return join ' ',@$tokens;
+}
+
=method get_offsets($text)
@@ -81,7 +94,7 @@ sub adjust_offsets {
return $new_offsets;
}
-=head2 initial_offsets($text)
+=method initial_offsets($text)
First naive delimitation of tokens.
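
The new tokenize() wrapper in this commit can be tried in isolation with the sketch below. It assumes only what the diff shows: get_tokens($text) returns an array reference, and tokenize() joins those tokens with single spaces. The get_tokens() here is a hypothetical stand-in (a naive regex split), not the module's real boundary-detection algorithm, so its output will differ from the real module on contractions, abbreviations, and similar cases.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's get_tokens(): NOT the real
# algorithm. It treats each run of word characters as one token and each
# non-space punctuation character as its own token, returning an array ref.
sub get_tokens {
    my ($text) = @_;
    my @tokens = $text =~ /\w+|[^\w\s]/g;
    return \@tokens;
}

# Same shape as the tokenize() added in this commit: fetch the tokens,
# then join them with single spaces.
sub tokenize {
    my ($text) = @_;
    my $tokens = get_tokens($text);
    return join ' ', @$tokens;
}

print tokenize("Hello, world!"), "\n";   # prints "Hello , world !"
```

Since the commit adds tokenize to @EXPORT_OK, the real function would presumably be imported on request, e.g. `use Lingua::EN::Tokenizer::Offsets qw/tokenize/;` (module name inferred from the file path in the diff).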
