Lefteris Paraskevas edited this page Apr 30, 2016 · 6 revisions

To execute the available code, you need the following items available and/or configured on your PC:

config.properties: A file that must be placed in the /src/ folder and contains all external variables that must be configured for the program to run properly. A demo config file can be found here. In short, the file is divided into six sections:

  • Twitter Keys: The Twitter Developer Keys, required only if the user wishes to use the Twitter Streaming API.
  • Names and variables: Database names, ports, hosts etc.
  • Filepaths: Paths to output files, datasets etc.
  • MongoDB files: Paths specifically for MongoDB files.
  • MongoDB fields: Names of the used MongoDB collection fields and documents.
  • Console fields: Names of the arguments used when a user works through the input console.
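
As a rough illustration of how such a file can be read, the sketch below loads key/value pairs with the standard `java.util.Properties` class and fails early when an entry is missing. The class name `ConfigSketch` and the keys `db.host` and `db.port` are hypothetical; the actual key names are the ones defined in the demo config file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class ConfigSketch {

    /** Parses key=value pairs from any reader using java.util.Properties. */
    static Properties parse(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Loads a properties file from disk, e.g. a path ending in src/config.properties. */
    static Properties loadFile(Path path) {
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return parse(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the trimmed value for a key, failing loudly if it is missing or blank. */
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing config entry: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Hypothetical keys, used only for illustration.
        String demo = "# Names and variables\n"
                + "db.host=localhost\n"
                + "db.port=27017\n";
        Properties props = parse(new StringReader(demo));
        int port = Integer.parseInt(require(props, "db.port"));
        System.out.println(require(props, "db.host") + ":" + port);
    }
}
```

Checking required entries at startup keeps a misconfigured file from surfacing later as an obscure runtime error.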

MongoDB: A running instance of MongoDB (version 3.0 or above) must be available on the host PC.

MongoDB Database Structure: The dataset stored in MongoDB must conform to the following schema:

  • All tweets are stored as documents in a single collection, which can have any name.
  • Every document must contain the following fields:
    • A long entity named "id" that contains the tweet's ID.
    • A document named "user" that contains user information.
      • A String named "name" that contains the display name of the user.
      • A long named "id" that contains the user ID.
    • A String named "text" that contains the actual un-edited text of the tweet.
    • A Date entity named "created_at" that contains the actual date of creation of the tweet.
    • A String named "lang" that represents the language code of the text of the tweet.
    • A document named "coordinates" that provides additional information for the location of the tweet.
      • The document must contain a 1x2 array holding the latitude and longitude, in that order.
    • An integer entity named "retweet_count" that contains the number of retweets on this tweet.
    • An integer entity named "favorite_count" that contains the number of favorites on this tweet.
    • A boolean entity named "retweeted" that is marked as true when the tweet is retweeted and false otherwise.
    • A boolean entity named "favorited" that is marked as true when the tweet is favorited and false otherwise.
    • A document named "retweeted_status" that contains information about the original tweet from which this tweet originated.
      • A long entity named "id" that contains the ID of the original tweet.
    • An integer entity named "stanfordSentiment" that contains the sentiment of the tweet, calculated with Stanford Sentiment Treebank.
    • An integer entity named "bayesianNetSentiment" that contains the sentiment of the tweet, calculated with a Bayesian Net Classifier.
    • An integer entity named "naiveSentiment" that contains the sentiment of the tweet, calculated with a Naive Bayes Classifier.
    • An integer entity named "posEmot" that contains 1 if the tweet has at least one positive emoticon in its text and 0 otherwise.
    • An integer entity named "negEmot" that contains 1 if the tweet has at least one negative emoticon in its text and 0 otherwise.

NOTE: The field names are not fixed and the user is free to choose different ones, but every field listed above must exist for the tool to work properly. Whichever names are used, they must be entered in the appropriate section of the config.properties file.
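
To make the shape of such a document concrete, here is a minimal sketch that builds one tweet as nested `java.util.Map` objects with the default field names and checks that no top-level field is missing. All values are placeholders, the class name `TweetDocumentSketch` is hypothetical, and the key of the inner latitude/longitude array is an assumption, since the schema does not name it. The MongoDB Java driver's `org.bson.Document` also implements `Map<String, Object>`, so the same shape should carry over, although that is not exercised here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TweetDocumentSketch {

    // Top-level field names from the schema (defaults; they may be renamed in config.properties).
    static final List<String> REQUIRED_FIELDS = Arrays.asList(
            "id", "user", "text", "created_at", "lang", "coordinates",
            "retweet_count", "favorite_count", "retweeted", "favorited",
            "retweeted_status", "stanfordSentiment", "bayesianNetSentiment",
            "naiveSentiment", "posEmot", "negEmot");

    /** Builds one tweet document filled with placeholder values. */
    static Map<String, Object> sampleTweet() {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "Example User");
        user.put("id", 42L);

        Map<String, Object> coordinates = new LinkedHashMap<>();
        // Assumption: the inner array is stored under "coordinates"; the schema only
        // states that the sub-document holds a [latitude, longitude] array.
        coordinates.put("coordinates", Arrays.asList(37.98, 23.73));

        Map<String, Object> retweetedStatus = new LinkedHashMap<>();
        retweetedStatus.put("id", 1000L);

        Map<String, Object> tweet = new LinkedHashMap<>();
        tweet.put("id", 1001L);
        tweet.put("user", user);
        tweet.put("text", "Just a placeholder tweet :)");
        tweet.put("created_at", new Date());
        tweet.put("lang", "en");
        tweet.put("coordinates", coordinates);
        tweet.put("retweet_count", 0);
        tweet.put("favorite_count", 0);
        tweet.put("retweeted", false);
        tweet.put("favorited", false);
        tweet.put("retweeted_status", retweetedStatus);
        // Sentiment values are placeholders; their integer encoding is not specified here.
        tweet.put("stanfordSentiment", 0);
        tweet.put("bayesianNetSentiment", 0);
        tweet.put("naiveSentiment", 0);
        tweet.put("posEmot", 1); // the placeholder text contains a positive emoticon
        tweet.put("negEmot", 0);
        return tweet;
    }

    /** Returns the required top-level fields that are absent from the given document. */
    static List<String> missingFields(Map<String, Object> tweet) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED_FIELDS) {
            if (!tweet.containsKey(field)) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("Missing fields: " + missingFields(sampleTweet()));
    }
}
```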

Resources folder structure: The structure of the resources folder must be the following:

  • /resources/
    • emoticons/
    • output/
      • edcow/
      • peakfinding/
      • ...
    • stop-words/
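
The layout above can be created by hand, or with a few lines of Java as in the sketch below. It creates only the folders named in the list (the elided entries under output/ are left out), and the class name `ResourcesLayoutSketch` and the default root `resources` are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class ResourcesLayoutSketch {

    // Only the sub-folders explicitly named in the list above.
    static final List<String> SUBFOLDERS = Arrays.asList(
            "emoticons", "output/edcow", "output/peakfinding", "stop-words");

    /** Creates every listed sub-folder under the given root; existing folders are left untouched. */
    static void createLayout(Path resourcesRoot) {
        for (String sub : SUBFOLDERS) {
            try {
                Files.createDirectories(resourcesRoot.resolve(sub));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: the resources folder sits in the current working directory
        // unless another location is passed as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "resources");
        createLayout(root);
        System.out.println("Layout created under " + root.toAbsolutePath());
    }
}
```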

Libraries: The following libraries must be downloaded and added to the project. If you have trouble finding them, please e-mail me and I'll send you a .zip with all of them.

  • bson-3.0.4.jar
  • commons-cli-1.3.1.jar
  • commons-collections4-4.0.jar
  • commons-io-2.4.jar
  • commons-lang3-3.1.jar
  • commons-math3-3.4.jar
  • controlsfx-8.40.10-20150826.135843-344.jar
  • ejml-0.23.jar
  • gs-algo-1.2.jar
  • gs-core-1.2.jar
  • gs-ui-1.2.jar
  • guava-18.0.jar
  • HAC.jar
  • javassist.jar
  • jmod-1.2b.jar
  • jtransforms-2.4.jar
  • jWave_java_groovy.jar
  • LibSVM-1.0.6.jar
  • lucene-analyzers-common-4.10.2.jar
  • lucene-core-4.10.2.jar
  • mongodb-driver-3.0.4.jar
  • mongodb-driver-async-3.0.4.jar
  • mongodb-driver-core-3.0.4.jar
  • reflections-0.9.9-RC1.jar
  • stanford-corenlp-3.5.2.jar
  • stanford-corenlp-3.5.2-models.jar
  • twitter4j-async-4.0.4.jar
  • twitter4j-core-4.0.4.jar
  • twitter4j-examples-4.0.4.jar
  • twitter4j-media-support-4.0.4.jar
  • twitter4j-stream-4.0.4.jar
  • weka-dev-3.7.10.jar
  • xom.jar

NOTE: The libraries were left out of the GitHub repository due to their large size (374MB). Because a high percentage of them are homegrown libraries that do not exist in the Maven repository, there was no point in converting this project to Maven.
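
Since the jars have to be collected by hand, a small check like the sketch below can report which ones are still missing from a folder. The class name `LibraryCheckSketch` and the folder name `lib` are hypothetical, and only three names from the list above are included to keep the example short.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LibraryCheckSketch {

    /** Returns the names in 'required' that do not exist as regular files inside libDir. */
    static List<String> missingJars(File libDir, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!new File(libDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the jars are kept in a folder called "lib" unless a path is given.
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        // A short excerpt of the library list; extend it with the remaining names.
        List<String> required = Arrays.asList(
                "bson-3.0.4.jar",
                "mongodb-driver-3.0.4.jar",
                "twitter4j-core-4.0.4.jar");
        List<String> missing = missingJars(libDir, required);
        if (missing.isEmpty()) {
            System.out.println("All checked libraries are present.");
        } else {
            System.out.println("Missing libraries: " + missing);
        }
    }
}
```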

Java: Version 1.8 or above.

(Optional) Twitter API Developer Keys: A set of developer keys must be obtained from the Twitter Developer site in order to access the Streaming API and retrieve tweets.