This code was written by: Motaz_sassd
tweets-collector
Collect tweets (tweets corpus) using Twitter API.
Collection can be based on hashtags or keywords.
Install requirements
pip install -r requirements
Getting your API keys from Twitter
- Go to https://apps.twitter.com and create a new app.
- Provide a name and description for the app, then specify its permissions.
- Put the app's API keys in the credentials.txt and api_keys.py files.
query_tweets.py Usage
usage: query_tweets.py [-h] -k KEYWORDS_FILE -o OUTFILE -n NUMBER
collect tweets based on keywords
optional arguments:
-h, --help show this help message and exit
-k KEYWORDS_FILE, --keywords-file KEYWORDS_FILE
keywords or hashtags file. The file should contain one
keyword/hashtag per line
-o OUTFILE, --outfile OUTFILE
the output json file path and prefix.
-n NUMBER, --number NUMBER
the number of tweets that you want to collect
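The keywords file format described above (one keyword or hashtag per line) can be read with a few lines of Python. This is a minimal sketch, not the code used by query_tweets.py; the function name `read_keywords` is hypothetical, and skipping blank lines is an assumption.

```python
from pathlib import Path


def read_keywords(path):
    """Return the non-empty lines of a keywords file, stripped of whitespace.

    Hypothetical helper: assumes a UTF-8 file with one keyword or
    hashtag per line, and ignores blank lines.
    """
    keywords = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        if term:  # skip blank lines
            keywords.append(term)
    return keywords
```

A file containing `#python` on one line and `data science` on the next would yield a list of those two strings.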
json2text.py Usage
usage: json2text.py [-h] -i JSON_DIR -o OUT_DIR
extract tweet texts from json
optional arguments:
-h, --help show this help message and exit
-i JSON_DIR, --json-dir JSON_DIR
tweets json directory
-o OUT_DIR, --out-dir OUT_DIR
the output directory.
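The extraction step can be sketched as follows. This is only an illustration and may differ from what json2text.py actually does: it assumes each input line holds one tweet as a JSON object and that the tweet body is stored under the `text` key. The function name `extract_texts` is hypothetical.

```python
import json


def extract_texts(lines):
    """Yield the tweet text found on each JSON line.

    Assumptions (not confirmed by the scripts): one JSON object per
    line, with the tweet body under the "text" key.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed records
        if not isinstance(tweet, dict):
            continue
        text = tweet.get("text")
        if text:
            # Collapse internal newlines so each tweet stays on one output line
            yield " ".join(text.split())
```

Passing an open file object as `lines` works, since iterating over a file yields its lines.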
stream_geolocation.py Usage
Get Geo locations from http://boundingbox.klokantech.com/
usage: stream_geolocation.py [-h] -l GEO_LOCATIONS -j JSON -n NUMBER
collect tweets based on geographic location
optional arguments:
-h, --help show this help message and exit
-l GEO_LOCATIONS, --geo-locations GEO_LOCATIONS
geo location coordinates from
http://boundingbox.klokantech.com copy and paste using
csv option
-j JSON, --json JSON the json output file.
-n NUMBER, --number NUMBER
the number of tweets that you want to collect
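Before starting a stream, it can help to check the coordinates copied from the bounding box tool. The sketch below assumes the csv option produces four comma-separated numbers in the order west, south, east, north (longitude first, south-west corner first); verify this against the tool's output. The function name `parse_bounding_box` is hypothetical and is not part of stream_geolocation.py.

```python
def parse_bounding_box(csv_text):
    """Parse 'west,south,east,north' into a list of four floats.

    Assumed ordering: longitude/latitude of the south-west corner,
    then longitude/latitude of the north-east corner.
    """
    values = [float(part) for part in csv_text.strip().split(",")]
    if len(values) != 4:
        raise ValueError("expected 4 comma-separated numbers, got %d" % len(values))
    west, south, east, north = values
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitude out of range")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitude out of range, or south is above north")
    return values
```

For example, `parse_bounding_box("-74.26,40.48,-73.70,40.92")` returns the four values as floats, while a string with only three numbers raises `ValueError`.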
stream_users.py Usage
Get user IDs from https://tweeterid.com
usage: stream_users.py [-h] -u USERS -j JSON -n NUMBER
collect tweets based on following twitter users
optional arguments:
-h, --help show this help message and exit
-u USERS, --users USERS
twitter user ids file. Get ids from tweeterid.com
-j JSON, --json JSON the json output file.
-n NUMBER, --number NUMBER
the number of tweets that you want to collect
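The layout of the user IDs file is not specified above. The sketch below assumes one numeric ID per line and rejects anything else, which catches the common mistake of pasting screen names instead of IDs. The function name `parse_user_ids` is hypothetical.

```python
def parse_user_ids(lines):
    """Return the user IDs found in `lines` as strings.

    Assumption: one numeric Twitter user ID per line. Blank lines are
    skipped; non-numeric entries (e.g. screen names) raise ValueError.
    """
    ids = []
    for line in lines:
        value = line.strip()
        if not value:
            continue  # skip blank lines
        if not (value.isascii() and value.isdigit()):
            raise ValueError("not a numeric user id: %r" % value)
        ids.append(value)  # keep as strings to avoid losing leading digits or precision
    return ids
```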
Some preprocessing in Java code
Use the Java file inside Rad Arabic File corpus to read the .txt file, collapse repeated characters, and normalize the text.
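For readers who prefer to stay in Python, the two preprocessing steps can be approximated as below. This is not a port of the Java file: the exact rules it applies are not documented here, so the choices in this sketch (removing diacritics and tatweel, unifying alef variants, collapsing runs of three or more identical characters to one) are assumptions based on common Arabic text normalization practice. The function name `normalize` is hypothetical.

```python
import re

# Runs of the same character repeated three or more times (elongation)
_REPEATS = re.compile(r"(.)\1{2,}")
# Arabic diacritics U+064B..U+0652 plus tatweel U+0640
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Alef with hamza above, hamza below, and madda
_ALEF_VARIANTS = re.compile("[\u0623\u0625\u0622]")


def normalize(text):
    """Apply assumed normalization rules to a line of tweet text."""
    text = _DIACRITICS.sub("", text)            # drop diacritics and tatweel first
    text = _ALEF_VARIANTS.sub("\u0627", text)   # map alef variants to bare alef
    text = _REPEATS.sub(r"\1", text)            # collapse elongated characters
    return text
```

Diacritics are removed before collapsing so that elongated letters separated by tatweel or vowel marks are still detected. Legitimate double letters are left alone because only runs of three or more are collapsed.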