I wanted to play around with a corpus of tweets, but none is openly distributed because of Twitter's terms of service, so I built this utility to create my own.
This program uses the Twitter streaming API, which can only be accessed by authenticated users. To get credentials, you must create a Twitter application at https://dev.twitter.com/apps/new.
java -jar twitter-sampler-1.0.0-standalone.jar -c credentials.clj -n 1000 -t '#clippers' tweets.json
where credentials.clj (passed with -c) contains your Twitter credentials, -n specifies the number of tweets to download, -t specifies an optional comma-separated list of keywords or hashtags to track, and tweets.json is the file where tweets are saved. You can also specify a proxy configuration file; see proxy-sample.clj for a sample.
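As a sketch of how the options fit together, the hypothetical Python helper below assembles the same invocation as a list of arguments. It assumes the options go before the output file, as in the usage line, and that java is on the PATH; build_command and run_sampler are illustrative names, not part of twitter-sampler. Note the quotes around '#clippers' in the shell command: without them, a shell such as bash treats a word starting with # as the beginning of a comment.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence

# Jar name taken from the usage line; adjust it to the version you built.
JAR = "twitter-sampler-1.0.0-standalone.jar"


def build_command(credentials: str, count: int, output: str,
                  keywords: Optional[Sequence[str]] = None) -> List[str]:
    """Assemble the argument list for one sampling run (hypothetical helper)."""
    if count <= 0:
        raise ValueError("count must be a positive number of tweets")
    cmd = ["java", "-jar", JAR, "-c", credentials, "-n", str(count)]
    if keywords:
        # -t takes a single comma-separated string, e.g. "#clippers,#lakers".
        cmd += ["-t", ",".join(keywords)]
    # ASSUMPTION: the output file is the last, positional argument.
    cmd.append(output)
    return cmd


def run_sampler(*args, **kwargs) -> None:
    """Run the jar; a list (no shell) means '#' needs no quoting here."""
    subprocess.run(build_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    cmd = build_command("credentials.clj", 1000, "tweets.json", ["#clippers"])
    print(shlex.join(cmd))  # shell-quoted form of the command
```

Calling shlex.join on the result reproduces the usage line above, quotes included. Because run_sampler hands a list to subprocess.run rather than a shell string, no extra quoting is needed when launching the jar from Python.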
For details about the structure of a tweet, see https://dev.twitter.com/docs/platform-objects/tweets.
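Once tweets.json exists, a short script can pull fields out of each tweet. The sketch below assumes the file holds one JSON tweet object per line (the layout the streaming API delivers; check your own output before relying on it) and reads the text and entities.hashtags fields described in the tweet documentation. The function names read_tweets and hashtag_counts are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterator


def read_tweets(path: str) -> Iterator[Dict]:
    """Yield tweet objects, ASSUMING one JSON object per line in the file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines (e.g. stream keep-alives)
            obj = json.loads(line)
            if "text" in obj:  # drop non-tweet messages such as deletion notices
                yield obj


def hashtag_counts(path: str) -> Counter:
    """Count hashtags from each tweet's entities.hashtags list, case-insensitively."""
    counts: Counter = Counter()
    for tweet in read_tweets(path):
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts
```

For example, hashtag_counts("tweets.json").most_common(10) lists the ten most frequent hashtags; lowercasing means #Clippers and #clippers are counted together.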
How does it work?
Copyright (C) 2012-2014 Alexandre Patry
Distributed under the Eclipse Public License, the same as Clojure.