enabling existing international nrc lexicon in get_sentiments() #169

Open
LeWaHe opened this issue Apr 30, 2020 · 3 comments
Labels
feature a feature request or enhancement

Comments

@LeWaHe

LeWaHe commented Apr 30, 2020

Hi,
I love learning tidytext, but I was a bit surprised to see that the get_sentiments() function does not allow using the non-English translations included in the November 2017 NRC lexicon v0.92 .xlsx file used by tidytext. In that file, the English words are in column A and are translated into dozens of languages in columns B to DA, while columns DB to DK list the polarity and sentiment scores for each word. It would be amazing to add an argument to choose which language (column) to use from the NRC lexicon, e.g. lang="French".
Thanks,
Leonard
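To make the requested behaviour concrete, here is a minimal sketch in Python (standard library only) of what selecting a language column could look like. It assumes the workbook has already been read into one dict per row, keyed by column header. The function name `nrc_for_language`, the header names ("English", "French") and the lowercase sentiment keys are all hypothetical; this is not the tidytext or textdata API, and the toy rows are invented for illustration, not taken from the lexicon.

```python
# Hypothetical sketch, not part of tidytext or textdata: turn wide NRC-style
# rows (one word column per language plus 0/1 flag columns) into tidy pairs.

# The ten flag columns (eight emotions plus two sentiments); the exact
# header names used in the real .xlsx file are assumed here.
SENTIMENTS = ["anger", "anticipation", "disgust", "fear", "joy",
              "negative", "positive", "sadness", "surprise", "trust"]


def nrc_for_language(rows, lang):
    """Return (word, sentiment) pairs, taking the word from column `lang`."""
    tidy = []
    seen = set()
    for row in rows:
        word = row.get(lang)
        if not word:  # no translation for this entry in the chosen language
            continue
        for sentiment in SENTIMENTS:
            # Missing flags are treated as 0; repeated pairs are kept once.
            if row.get(sentiment, 0) == 1 and (word, sentiment) not in seen:
                seen.add((word, sentiment))
                tidy.append((word, sentiment))
    return tidy


# Toy rows for illustration only; these are not values from the NRC lexicon.
TOY_ROWS = [
    {"English": "happy", "French": "heureux", "joy": 1, "positive": 1},
    {"English": "glad", "French": "heureux", "joy": 1, "positive": 1},
    {"English": "sad", "French": "triste", "sadness": 1, "negative": 1},
    {"English": "table", "French": "", "joy": 0},
]

if __name__ == "__main__":
    for word, sentiment in nrc_for_language(TOY_ROWS, "French"):
        print(word, sentiment)
```

The de-duplication step matters for translated lexicons: several English entries (here "happy" and "glad") can map to the same translated word, so without it the French lexicon would contain repeated (word, sentiment) pairs.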

@juliasilge
Owner

The NRC-Emotion-Lexicon.zip file that is currently downloaded via the function in the textdata package does include that .xlsx file you are mentioning. Using these translations is within the permission we have from the lexicon creators, although of course translated sentiment lexicons can be less reliable.

@EmilHvitfeldt do you want to consider this in textdata?

@EmilHvitfeldt
Contributor

I'm on it!

@LeWaHe
Author

LeWaHe commented Apr 30, 2020

Thank you for your answers; it's great to know that using the translations is within the permission from the lexicon creators. I agree that translated lexicons are less reliable than natively created ones. However, (i) for analyses comparing corpora across different languages, a single lexicon would be more reliable than a patchwork of different lexicons, and (ii) many languages spoken by millions of people still lack reliable native lexicons. Thanks

@juliasilge juliasilge added the feature a feature request or enhancement label Apr 30, 2020

3 participants