Naive sentiment analysis in R: sensitive to valence shifters but not relying on punctuation of sentence boundaries
Latest commit 69f943f Mar 3, 2019
lexicon_data
sample_data
.gitignore
LICENSE
README.md
ncs.R
ncs_debugging.R change terminology May 4, 2018
ncs_evaluation.R

Naive context sentiment analysis

Aim

This R script addresses the problem that many sentiment analysis scripts ignore valence shifters (e.g., "hardly difficult", "not great at all"). For a good outline of that issue, see trinker's argument and the sentimentr package.

The sentimentr package does a remarkable job of handling valence shifters, but it requires 'good' text data that is properly punctuated, because the valence shifter weighting is done on "polarized context clusters" within sentences (i.e., you get one sentiment value per sentence).

Many text data are not suitable for that pipeline because they are

  • not punctuated at all (e.g., auto-generated YouTube transcripts)
  • badly punctuated (e.g., data from blogs where punctuation is not necessarily a given)
  • very brief (e.g., Twitter data, which, even if properly annotated for sentence boundary disambiguation, would return only one or two sentiment values)

Why "naive context sentiment analysis"

Our approach is based on the sentimentr idea of creating a "cluster" around sentiments. Within that cluster, we look for valence shifters (taken from the brilliant lexicon package), weight the original sentiment accordingly, and return a vector of sentiments of length v (where v = number of tokens that are not punctuation marks).

Our approach does not rely on sentences and punctuation and is therefore "naive" towards the broader structure of texts.
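
The clustering idea can be illustrated with a minimal sketch in base R. It is not the code in ncs.R: the function name naive_context_sentiment, the toy lexicon, the negator and amplifier lists, the window sizes, and the amplifier weight of 1.5 are all assumptions made for illustration (the actual script takes its valence shifters from the lexicon package).

```r
# Hypothetical toy lexicons, for illustration only.
toy_sentiments <- c(great = 1, difficult = -1, good = 0.75, bad = -0.75)
toy_negators   <- c("not", "hardly", "never")
toy_amplifiers <- c("very", "really")

naive_context_sentiment <- function(text,
                                    sentiments = toy_sentiments,
                                    negators = toy_negators,
                                    amplifiers = toy_amplifiers,
                                    cluster_before = 2,
                                    cluster_after = 2,
                                    amplifier_weight = 1.5) {
  # Lowercase and replace everything except letters, apostrophes and spaces,
  # so punctuation never becomes a token and is never needed as a boundary.
  cleaned <- gsub("[^a-z' ]", " ", tolower(text))
  tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]

  scores <- numeric(length(tokens))
  for (i in seq_along(tokens)) {
    base <- sentiments[tokens[i]]
    if (is.na(base)) next  # not a sentiment word, score stays 0

    # Cluster: a fixed window of tokens around the sentiment word,
    # excluding the sentiment word itself.
    window <- max(1, i - cluster_before):min(length(tokens), i + cluster_after)
    context <- tokens[setdiff(window, i)]

    # Each amplifier in the cluster scales the sentiment.
    weight <- amplifier_weight ^ sum(context %in% amplifiers)
    # An odd number of negators in the cluster flips the sign.
    if (sum(context %in% negators) %% 2 == 1) weight <- -weight

    scores[i] <- unname(base) * weight
  }
  names(scores) <- tokens
  scores
}
```

Calling naive_context_sentiment("Not great, at all!") returns one value per non-punctuation token, with "great" flipped to a negative score by the preceding "not". Because the window is defined by token counts rather than sentence boundaries, the same call works on unpunctuated input. The trade-off is that a shifter belonging to a neighbouring phrase can fall into the window.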

Note: We are still developing this tool.

Development wish list

  • speed improvements (in particular in the length standardisation, e.g. by switching to a different discrete cosine transform or to a Fourier transform)
  • multi-dimensionality implementation for other lexicon-based approaches (needed: "lexicon" as function parameter)
  • multi-language support (needs lexicon-databases in different languages)
  • Python implementation
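
The length standardisation mentioned in the first item is not described in this README. As a rough sketch of what a discrete-cosine-based version could look like, the base R function below computes a DCT-II of a sentiment vector, keeps only the lowest-frequency components, and evaluates the smoothed curve at a fixed number of points so that texts of different lengths become comparable. The function name standardise_length and the parameters n_out and n_components are assumptions, not names from ncs.R.

```r
# Hypothetical helper: low-pass DCT smoothing plus resampling to a fixed length.
standardise_length <- function(x, n_out = 100, n_components = 5) {
  stopifnot(length(x) > 0, n_out > 0, n_components > 0)
  n <- length(x)
  k <- seq_len(min(n_components, n)) - 1   # frequencies 0 .. K-1

  # DCT-II coefficients, computed directly from the cosine basis.
  # This is O(n * K); a faster transform is the speed improvement in question.
  pos_in <- (seq_len(n) - 0.5) / n
  coefs <- as.vector(crossprod(cos(pi * outer(pos_in, k)), x)) * 2 / n
  coefs[1] <- coefs[1] / 2                 # the constant term is weighted by 1/n

  # Evaluate the truncated cosine series at n_out evenly spaced positions.
  pos_out <- (seq_len(n_out) - 0.5) / n_out
  as.vector(cos(pi * outer(pos_out, k)) %*% coefs)
}
```

With n_out and n_components both set to length(x), the function reproduces the input up to floating-point error. Smaller values of n_components give a smoother curve.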