Add words of different language to vaderSentiment #59

Closed
Rishav09 opened this issue Sep 15, 2018 · 2 comments

Rishav09 commented Sep 15, 2018

I am trying to add several Hindi words to the vaderSentiment lexicon using this:

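   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

   analyzer = SentimentIntensityAnalyzer()
   # Map each new token to its valence score.
   new_words = {
       'ग़लतापना': -2.0,
       'एकता_का_अभाव': 3.4,
   }
   # Add the new entries to the analyzer's in-memory lexicon.
   analyzer.lexicon.update(new_words)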

But the analyzer is not scoring the new words correctly.

@Rishav09 Rishav09 changed the title Add own words to vaderSentiment Add words of different language to vaderSentiment Sep 26, 2018
@darthvader2

Is anyone working on this issue?

cjhutto (Owner) commented Mar 20, 2020

I note @Hiestaa's excellent response in another issue, which gives guidance on adding new words to the lexicon, and I emphasize that the lexicon file is TAB separated (so adding new words and valence scores separated only by spaces won't work).

The README provides a description of the values in the lexicon.

The vader_lexicon.txt holds the following TAB SEPARATED format:

Token	Valence	Standard Deviation	Human Ratings
(:<	-0.2	2.03961	[-2, -3, 1, 1, 2, -1, 2, 1, -4, 1]
amorphous	-0.2	0.4	[0, 0, 0, 0, 0, 0, -1, 0, 0, -1]
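To make the TAB requirement concrete, here is a minimal sketch of reading one lexicon line. The helper `parse_lexicon_line` is a hypothetical name used for illustration, not a function from vaderSentiment; it only shows why a line with space-only separators cannot be split into a token and a valence.

```python
def parse_lexicon_line(line):
    """Split one TAB-separated lexicon line into (token, valence).

    Hypothetical helper for illustration; not part of vaderSentiment.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        # A line that uses spaces instead of tabs ends up as a single field.
        raise ValueError(f"expected TAB-separated fields, got: {line!r}")
    # Only the first two columns are needed: the token and its valence.
    return fields[0], float(fields[1])


good = "amorphous\t-0.2\t0.4\t[0, 0, 0, 0, 0, 0, -1, 0, 0, -1]"
bad = "amorphous -0.2 0.4"  # spaces only, no tabs

token, valence = parse_lexicon_line(good)
print(token, valence)

try:
    parse_lexicon_line(bad)
except ValueError as err:
    print("rejected:", err)
```

Splitting on `"\t"` rather than on arbitrary whitespace also keeps the rating list, which contains spaces, in a single field.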

If you want to follow the same rigorous process as the author of the study, you should find 10 independent humans to evaluate each word you want to add to the lexicon, make sure the standard deviation doesn't exceed 2.5, and take the average rating for the valence. This will keep the file consistent.
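As a rough sketch of that process, the snippet below derives the valence and standard deviation from ten ratings and checks the 2.5 threshold. The function name `summarize_ratings` is made up for this example, and the use of the population standard deviation is an assumption, chosen because it reproduces the values in the two sample rows above.

```python
from statistics import mean, pstdev

MAX_STD = 2.5  # threshold mentioned in the comment above


def summarize_ratings(ratings, max_std=MAX_STD):
    """Return (valence, standard deviation, keep?) for one token.

    Hypothetical helper; assumes the population standard deviation.
    """
    valence = mean(ratings)
    std = pstdev(ratings)
    return valence, std, std <= max_std


# Ratings copied from the "(:<" sample row.
ratings = [-2, -3, 1, 1, 2, -1, 2, 1, -4, 1]
valence, std, keep = summarize_ratings(ratings)
print(valence, round(std, 5), keep)
```

For the `(:<` row this gives a valence of -0.2 and a standard deviation that rounds to 2.03961, matching the lexicon entry.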

Now if you just want to make the algorithm work on these new cases quickly, the standard deviation and human ratings are indeed not necessary. Only the token and valence columns are used.
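For the quick route, a sketch like the following writes entries with only the two required columns, separated by a TAB. The file name `custom_lexicon.txt` and the helper `append_entries` are hypothetical; the example writes to a temporary directory so it does not touch a real lexicon file.

```python
import os
import tempfile


def append_entries(path, new_words):
    """Append token/valence pairs to a lexicon file, one per line.

    Hypothetical helper; each line is written as token<TAB>valence.
    """
    with open(path, "a", encoding="utf-8") as fh:
        for token, valence in new_words.items():
            fh.write(f"{token}\t{valence}\n")


# Words and scores taken from the original question.
new_words = {"ग़लतापना": -2.0, "एकता_का_अभाव": 3.4}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "custom_lexicon.txt")  # hypothetical file name
    append_entries(path, new_words)
    with open(path, encoding="utf-8") as fh:
        written = fh.read().splitlines()

for line in written:
    print(repr(line))
```

Opening the file with `encoding="utf-8"` matters here because the tokens are not ASCII.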

Originally posted by @Hiestaa in #28 (comment)

@cjhutto cjhutto closed this as completed Mar 20, 2020