Find authorities for Twitter topics. [Licenta][Undergraduate thesis]

Summary

Find authorities for Twitter topics. The tool first downloads tweets (the fetching stage), then computes authorities from the fetched tweets. Requirements:

  • MongoDB, which the implementation uses to store downloaded tweets
  • the main Python packages are pymongo, tweepy, scikit-learn, numpy, and scipy; see requirements.txt for the full list

Results for the datasets included in the dump/licenta folder are available as .html files in the dump/licenta/results/ folder. They were obtained with the current code, using only the Retweet Impact and Mention Impact features. See the paper/ folder for far more detail, including how the metrics were combined and how the results were analyzed.

Usage

  1. Populate the MongoDB database with tweets, in one of two ways:
  • Use the existing tweets in the dump/ folder. Import them with the dump/import.sh script:
    • make sure a mongod instance is running before you run the script
    • run the script from the repository root: $ ./dump/import.sh
    • verify in the mongo shell that the data and the features+metrics were imported successfully:
    > db.getCollection("halep").count()  // 2736
    > db.getCollection("ukraine gas russia").count()  // 16203
    
  • Or fetch the tweets yourself, for example for the topic "ukraine gas russia":
# fetch at most 100 pages x 100 tweets per page => 10,000 tweets
$ ./main.py fetch "ukraine gas russia" 100
  2. Once the tweets are in MongoDB, compute the authorities for the collected tweets:
# the 2nd parameter must be an existing MongoDB collection that contains tweets;
# this processes all tweets in the collection "ukraine gas russia" and reports its main authorities
$ ./main.py compute "ukraine gas russia"
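
The verification step for the imported dump can also be scripted from Python with pymongo instead of being typed into the mongo shell. The following is a minimal sketch, not part of the repository: the function names, the connection URI, and the database name "licenta" are assumptions made for illustration, so adjust them to whatever dump/import.sh and config.rc actually use. The expected counts are the ones quoted above.

```python
"""Check that imported tweet collections have the expected sizes (sketch)."""
from typing import Dict, Iterable, Mapping, Optional, Tuple

# Document counts quoted in this README for the bundled dump.
EXPECTED_COUNTS: Dict[str, int] = {
    "halep": 2736,
    "ukraine gas russia": 16203,
}


def collection_counts(db, names: Iterable[str]) -> Dict[str, int]:
    """Return {collection name: number of documents} for each name.

    `db` is a pymongo Database (or anything indexable by collection name).
    """
    # count_documents({}) counts every document in the collection.
    return {name: db[name].count_documents({}) for name in names}


def find_mismatches(
    actual: Mapping[str, int], expected: Mapping[str, int]
) -> Dict[str, Tuple[Optional[int], int]]:
    """Return {name: (actual, expected)} for collections whose size differs."""
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got != want:
            mismatches[name] = (got, want)
    return mismatches


def check_live(uri: str = "mongodb://localhost:27017", db_name: str = "licenta"):
    """Run the check against a live mongod.

    Both defaults are assumptions: the URI is MongoDB's usual local address,
    and the database name is hypothetical.
    """
    from pymongo import MongoClient  # imported lazily; only needed here

    db = MongoClient(uri)[db_name]
    return find_mismatches(collection_counts(db, EXPECTED_COUNTS), EXPECTED_COUNTS)
```

Calling check_live() with a running mongod returns an empty dict when both collections have the expected sizes, and otherwise maps each offending collection to its (actual, expected) pair.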