Throwaway code to benchmark Tika language detection on tweets.

Tika on Twitter

Throwaway code to evaluate Tika language detection on tweets.

Usage

Once you have downloaded a corpus of tweets using twitter-sample, you can run the language identifier using the following command:

lein run < tweets.json > tweets.csv

The resulting tweets.csv will look like data/tweets.csv.

License

Copyright © 2014 Alexandre Patry

Distributed under the Eclipse Public License, the same as Clojure.