TwitterFeelings

TwitterFeelings is a Twitter sentiment analysis engine. It streams live statuses from Twitter and categorizes them according to their mood (positive or negative). It is written in Elixir, runs on the Erlang VM, and uses Redis for storage.

It is partly based on the approach described in the Stanford paper "Twitter Sentiment Classification using Distant Supervision".

TwitterFeelings is composed of three parts:

corpus builder: uses the Twitter Search API to build a very large set of tweets that is used to train the sentiment analyzer.
learning: from a corpus of tweets, computes the probability of each word appearing in a positive tweet and in a negative tweet.
streaming machine: uses the Twitter Streaming API to get live tweets on a specific topic, and updates Redis mood counters accordingly.

Setup

install Elixir / Erlang / Redis
install dependencies with mix deps.get
build the script with mix escript.build
declare the following environment variables:
- TWITTER_CONSUMER_KEY
- TWITTER_CONSUMER_SECRET
- TWITTER_ACCESS_TOKEN
- TWITTER_ACCESS_TOKEN_SECRET
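
Before running the script, it can help to confirm that all four variables are actually set. The following is a minimal sketch in Python, independent of the project's Elixir code; the helper name `missing_twitter_vars` is hypothetical and only the variable names come from the list above.

```python
import os

# Variable names taken from the setup list above.
REQUIRED_VARS = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
]


def missing_twitter_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try out in isolation.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_twitter_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Twitter credentials are declared.")
```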

Corpus Builder

This application runs thousands of queries against the Twitter Search API to build a large corpus of tweets that we will be able to analyze later. Since we need to know whether each retrieved status is positive or negative, we use the Stanford approach (see the paper mentioned in the introduction) and categorize Twitter statuses according to the smileys they contain.

Each tweet retrieved is:

filtered (we don't keep ambiguous tweets containing both positive and negative smileys)
normalized (downcased, stripped of URLs, usernames, accents, smileys, short words, etc.)
stored in a dedicated Redis set
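
The three steps above can be sketched as follows. This is an illustration in Python under stated assumptions, not the project's Elixir implementation: the smiley lists, the minimum word length of 3, the ASCII-only word pattern, and the in-memory `set` standing in for the Redis set are all hypothetical choices.

```python
import re
import unicodedata

# Hypothetical smiley lists; the lists used by the project may differ.
POSITIVE_SMILEYS = (":)", ":-)", ":D", "=)")
NEGATIVE_SMILEYS = (":(", ":-(")


def label(text):
    """Return "positive" or "negative", or None for tweets that contain
    no smiley at all or smileys of both kinds (these are filtered out)."""
    has_pos = any(s in text for s in POSITIVE_SMILEYS)
    has_neg = any(s in text for s in NEGATIVE_SMILEYS)
    if has_pos == has_neg:
        return None
    return "positive" if has_pos else "negative"


def normalize(text, min_length=3):
    """Strip URLs, usernames, smileys, accents and short words; downcase."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # usernames
    # Smileys are removed before downcasing so that ":D" still matches.
    for smiley in POSITIVE_SMILEYS + NEGATIVE_SMILEYS:
        text = text.replace(smiley, " ")
    text = text.lower()
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    words = re.findall(r"[a-z]+", text)
    return " ".join(w for w in words if len(w) >= min_length)


def add_to_corpus(corpus, text):
    """Label, normalize and store one tweet; return True if it was kept.

    `corpus` maps each mood to a set, standing in for the Redis sets.
    """
    mood = label(text)
    normalized = normalize(text)
    if mood is None or not normalized:
        return False
    corpus[mood].add(normalized)
    return True
```

For example, with `corpus = {"positive": set(), "negative": set()}`, calling `add_to_corpus(corpus, "@bob J'adore l'été :) http://t.co/abc")` stores `"adore ete"` in the positive set.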

To build a large corpus of French positive and negative tweets, run the following commands:

./twitter_feelings build-corpus --lang fr --mood positive
./twitter_feelings build-corpus --lang fr --mood negative

This takes hours because the application has to respect Twitter's rate limits (450 queries per 15 minutes). Each query fetches 100 tweets at once.
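
To make the "hours" estimate concrete, here is a small back-of-the-envelope calculation in Python. It gives a lower bound only, assuming every query returns 100 new tweets that all survive filtering; real runs will take longer. The function name is hypothetical, and the 500,000 figure is the corpus size quoted in the Learning section.

```python
QUERIES_PER_WINDOW = 450   # rate limit quoted above
WINDOW_MINUTES = 15
TWEETS_PER_QUERY = 100


def min_hours_to_fetch(target_tweets):
    """Lower bound on fetch time, in hours, for `target_tweets` tweets."""
    windows_per_hour = 60 / WINDOW_MINUTES
    tweets_per_hour = QUERIES_PER_WINDOW * TWEETS_PER_QUERY * windows_per_hour
    return target_tweets / tweets_per_hour


if __name__ == "__main__":
    # At best 180,000 tweets per hour, so one 500,000-tweet corpus
    # needs a little under three hours per mood.
    print(round(min_hours_to_fetch(500_000), 2))
```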

Learning

Live tweets will be categorized using the Naive Bayes approach. The great benefit of this method is that much of the work can be precomputed, which makes the live computation very fast. The purpose of this module is to precompute, from an existing corpus of tweets, the probability of each unigram (i.e. word) appearing in positive or in negative messages.
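
The idea of precomputing per-word probabilities can be sketched as below. This is a minimal Python illustration, not the project's Elixir code: add-one (Laplace) smoothing, equal priors for both moods, and the function names `learn` and `classify` are assumptions made for the example.

```python
import math
from collections import Counter


def learn(positive_tweets, negative_tweets):
    """Precompute log P(word | mood) for every word in the vocabulary.

    Uses add-one smoothing (an assumption of this sketch) so that a word
    seen in only one mood never gets a zero probability in the other.
    """
    counts = {
        "positive": Counter(w for t in positive_tweets for w in t.split()),
        "negative": Counter(w for t in negative_tweets for w in t.split()),
    }
    vocabulary = set(counts["positive"]) | set(counts["negative"])
    model = {}
    for mood, counter in counts.items():
        total = sum(counter.values()) + len(vocabulary)
        model[mood] = {
            w: math.log((counter[w] + 1) / total) for w in vocabulary
        }
    return model


def classify(model, tweet):
    """Pick the mood with the highest summed log-probability.

    Words unseen during learning are ignored; priors are assumed equal,
    which matches a corpus with as many positive as negative tweets.
    """
    scores = {
        mood: sum(probs[w] for w in tweet.split() if w in probs)
        for mood, probs in model.items()
    }
    return max(scores, key=scores.get)
```

At classification time only dictionary lookups and additions remain, which is why the live step is cheap once the table has been built.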

Once both corpora have been built (positive and negative moods for a single language), run the learning module with this command:

./twitter_feelings learn --lang fr

On my computer (4 GHz i7), it takes about 10 minutes to learn from 2 x 500,000 tweets.

Streaming Machine

TODO
