Skip to content

cbuntain/TwitterStreamFilter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Compress and Store the Twitter Public Sample

Intro

This project allows you to run a set-it-and-forget-it system for streaming the Twitter public sample stream to a local set of gzipped files. Each file will contain the JSON for all the tweets from the 1% stream.

Building

This project was written with Maven, so you should be able to do mvn package and use the resulting jar file TwitterStreamFilter-1.0-SNAPSHOT-jar-with-dependencies.jar in the target directory.

Running

You have a few options with this code. If you run it without arguments, it should download the unfiltered 1% stream. You can run code for that as follows:

  • java -Xmx1536m -jar TwitterStreamFilter-1.0-SNAPSHOT-jar-with-dependencies.jar

You can also provide a file with keywords to track (separated by a newline) using the --keywords/-k flag, a GeoJSON file to get tweets within a geographic area using the --bounds/-b flag, or the path to a file with a list of user IDs to track with the --users/-u flag.

While running, this code will produce several files: warnings.log.YYYY-MM-DD-HH, statuses.log.YYYY-MM-DD-HH, and many .gz files. At the end of every hour, the current warnings.log. and statuses.log.* will be gzipped automatically.

About

Simple Java code for storing the Twitter stream in local gzipped files.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published