Twitter Sentiment Analysis

This app makes use of Twitter's API and AWS Comprehend to get insights about your company, city, event, or convention by way of analyzing tweet sentiment. It further allows drilldown by user, topic, and search string.

Requirements

  • Twitter credentials (these can be created on the fly when running the first script)
  • AWS Credentials
  • Working knowledge of Splunk. There is some documentation here, but existing dashboards should be enough to get you started.
  • Docker (optional)

Screenshots

Installation

Setting up credentials

  • Get your Twitter credentials:
    • docker build -f ./Dockerfile-0-get-twitter-credentials -t 0-get-twitter-credentials . && docker run -v $(pwd):/mnt -it 0-get-twitter-credentials
    • Running this script will walk you through the steps, so it should be straightforward.
  • Make sure you have the AWS CLI installed and have run aws configure to enter your AWS credentials.
    • Access to a single S3 bucket and AWS Comprehend will be required
    • I recommend creating an IAM user with the "ComprehendFullAccess" policy as well as an S3 policy that looks like this:
    {
     "Version": "2012-10-17",
     "Statement": [
         {
             "Sid": "VisualEditor0",
             "Effect": "Allow",
             "Action": [
                 "s3:*"
             ],
             "Resource": [
                 "arn:aws:s3:::REPLACE_WITH_YOUR_BUCKET_NAME/*",
                 "arn:aws:s3:::REPLACE_WITH_YOUR_BUCKET_NAME"
             ]
         },
         {
             "Sid": "VisualEditor1",
             "Effect": "Allow",
             "Action": [
                 "s3:ListAllMyBuckets",
                 "s3:HeadBucket"
             ],
             "Resource": "*"
         }
     ]
    }
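
If you prefer the CLI to the web console, the setup can be scripted along these lines. This is only a sketch: the user name "sentiment-analysis", the policy name "tweet-bucket-access", and the file name "s3-policy.json" are placeholders, not anything this project requires.

    # Confirm the AWS CLI is configured and which identity it will use
    aws sts get-caller-identity

    # Create a dedicated IAM user (the name is arbitrary)
    aws iam create-user --user-name sentiment-analysis

    # Attach the managed Comprehend policy
    aws iam attach-user-policy \
        --user-name sentiment-analysis \
        --policy-arn arn:aws:iam::aws:policy/ComprehendFullAccess

    # Attach the S3 policy above, saved locally as s3-policy.json
    aws iam put-user-policy \
        --user-name sentiment-analysis \
        --policy-name tweet-bucket-access \
        --policy-document file://s3-policy.json

    # Create access keys for the new user and feed them to "aws configure"
    aws iam create-access-key --user-name sentiment-analysis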
    

Running the app

Now you can run the app via Docker:

  • cp ~/.aws/credentials aws-credentials.txt
  • Copy docker-compose.yml.example to docker-compose.yml and edit the latter to set your search string and other values
  • docker-compose up -d

This will start up several Docker containers in the background running various Python scripts and a copy of Splunk. To access Splunk, go to http://localhost:8000/ and log in with the default credentials of admin/password. Do not expose this port directly to the Internet; if you need remote access, put it behind an nginx proxy with HTTPS.
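
To confirm everything came up, a few quick checks with standard docker-compose commands should do it (the service names are the ones listed in the Architecture Overview below):

    # List the services and their state
    docker-compose ps

    # Follow the tweet fetcher's logs to confirm it is pulling tweets
    docker-compose logs -f 1-fetch-tweets

    # Confirm Splunk's web UI is answering locally
    curl -sI http://localhost:8000/ | head -1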

Exporting the Password file

If you create users and want them to persist between runs, you can export the passwd file with this script:

./bin/export-password-file-from-splunk

Running that will export the passwd file from Splunk and store it in splunk-config/passwd. If the container is deleted at any point and then re-run, the passwd file will be copied into Splunk so the users will be able to log back in.
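
For example, a minimal round trip looks like this:

    # Export the current passwd file from the running Splunk container
    ./bin/export-password-file-from-splunk

    # The exported copy should now be on disk
    ls -l splunk-config/passwd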

Restoring From Backups

Let's say you were running this app on your desktop/laptop (as one does in Docker...) and you're ready to move the app to a server. How do you take all of your data with you?

This is pretty easy: deploy the code to your server but don't start anything, and make sure AWS is configured. Then run the script ./bin/aws-download-latest-backup, which will connect to AWS and download the latest backup. Rename the file to tweets.db, start up the containers, and you'll be up and running!
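
In commands, the move looks roughly like this. The name of the downloaded backup file will vary, so treat "tweets-backup.db" below as a placeholder for whatever the download script produces:

    # On the new server, with the code deployed and AWS configured:
    ./bin/aws-download-latest-backup

    # Rename the downloaded backup to the filename the containers expect
    mv tweets-backup.db tweets.db

    # Bring everything up
    docker-compose up -d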

Architecture Overview

The following Docker containers are used:

  • 1-fetch-tweets
    • Downloads tweets from Twitter with a search string in them.
  • 2-analyze-tweets
    • Sends tweets off to AWS Comprehend to be analyzed (see the example after this list)
  • 3-export-tweets
    • Exports analyzed tweets from the SQLite database to disk, where Splunk can ingest them
  • 4-splunk
    • Runs Splunk.
  • 4-backup
    • Does regular backups of the SQLite database to AWS S3.
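
The analysis step is built on AWS Comprehend's sentiment API. As a rough illustration of the kind of call involved (this is the plain AWS CLI, not the app's actual code), a single piece of text can be scored like this:

    # Returns a sentiment label (POSITIVE/NEGATIVE/NEUTRAL/MIXED) plus confidence scores
    aws comprehend detect-sentiment \
        --language-code en \
        --text "The keynote at this conference was fantastic"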

Adding reports to Splunk

Feel free to edit or save new reports in Splunk; they will all show up in the splunk-app/ directory.

Data persistence

Splunk will save its index to splunk-data/ between runs.

Development

To do development on a container, first go into docker-compose.yml and uncomment the DEBUG line in environment:. Then run one of these commands:

  • export C="1-fetch-tweets"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose up $C
  • export C="2-analyze-tweets"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose up $C
  • export C="3-export-tweets"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose up $C
  • export C="4-splunk"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose up $C
  • export C="4-backup"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose up $C

With DEBUG enabled in docker-compose.yml, standard output and standard error will be written to the screen instead of to logs/. Note that for 3-export-tweets specifically, this means that tweets will not be written to logfiles and therefore not make it into Splunk.
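
When DEBUG is left off, output goes to logs/ instead, so the usual way to watch a script without rebuilding anything is simply to follow the logfiles (adjust the glob if the files are nested in subdirectories):

    # Follow all logfiles written by the containers
    tail -f logs/*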

To attach to the running Splunk instance for troubleshooting:

  • docker-compose exec 4-splunk bash
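
Once inside, Splunk's own CLI is available. For example, assuming the stock layout with Splunk installed under /opt/splunk:

    # Check that splunkd is up
    /opt/splunk/bin/splunk status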

Interactive Bash Shells

If you want an interactive bash shell so that you can stay in the container and run a script repeatedly, try one of these commands:

  • export C="1-fetch-tweets"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose run $C bash; docker-compose kill $C
  • export C="2-analyze-tweets"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose run $C bash; docker-compose kill $C
  • export C="3-export-tweets"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose run $C bash; docker-compose kill $C
  • export C="4-splunk"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose run $C bash; docker-compose kill $C
  • export C="4-backup"; docker-compose kill $C && docker-compose rm -f $C && docker-compose build $C && docker-compose run $C bash; docker-compose kill $C

Resetting The App

Run ./bin/stop-and-reset to kill and remove all services and remove all log files. The data in the database will remain untouched.

Known Issues

Since the SQLite file tweets.db is shared between several containers/services, there may occasionally be some contention, resulting in an error from SQLite that it can't lock tweets.db. When this happens, it triggers an exception in the Python code, causing it to exit, which causes the Docker container to exit as well. It will be restarted in accordance with Docker's restart policy, and the service will resume running. This means that all tweets will eventually make it into Splunk, but there may be a slight delay if a lot of tweets are being processed.
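
If you want to inspect the database yourself while the containers are running, setting a busy timeout in the sqlite3 shell avoids tripping the same lock errors:

    # Wait up to 5 seconds for locks instead of failing immediately,
    # then list the tables in the database
    sqlite3 -cmd ".timeout 5000" tweets.db ".tables"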

Credits

TODO

  • Write up some "priming instructions" for large tweet volumes
  • Consider adding in MySQL because of file access conflicts
  • Redo the skip code to be once at the top level of the script.
  • Better timezone handling. EST5EDT does not play nice during standard time.
  • Performance of 3-export-tweets is very bad on OS X; com.docker.osxfs eats up a lot of CPU. See if it's all the writes to SQLite or to the log file that are triggering it.

Contact

If there are any issues, feel free to file an issue against this project, hit me up on Twitter or Facebook, or drop me a line: dmuth AT dmuth DOT org.