
UnicodeDecodeError when calling SentimentIntensityAnalyzer #12

Closed
aWildRiceHasAppeared opened this issue May 18, 2016 · 5 comments

@aWildRiceHasAppeared

aWildRiceHasAppeared commented May 18, 2016

Hi all

I've just been trying to learn how to use SentimentIntensityAnalyzer(), and I've run into the following problem:

analyzer = SentimentIntensityAnalyzer()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-31-6c626c4ef428> in <module>()
----> 1 analyzer = SentimentIntensityAnalyzer()
      2 analyzer.polarity_score(line_first)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/sentiment/vader.pyc in __init__(self, lexicon_file)
    200     def __init__(self, lexicon_file="sentiment/vader_lexicon.zip/vader_lexicon/vader_lexicon.txt"):
    201         self.lexicon_file = nltk.data.load(lexicon_file)
--> 202         self.lexicon = self.make_lex_dict()
    203 
    204     def make_lex_dict(self):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/sentiment/vader.pyc in make_lex_dict(self)
    208         lex_dict = {}
    209         for line in self.lexicon_file.split('\n'):
--> 210             (word, measure) = line.strip().split('\t')[0:2]
    211             lex_dict[word] = float(measure)
    212         return lex_dict

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in next(self)
    697 
    698         """ Return the next decoded line from the input stream."""
--> 699         return self.reader.next()
    700 
    701     def __iter__(self):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in next(self)
    628 
    629         """ Return the next decoded line from the input stream."""
--> 630         line = self.readline()
    631         if line:
    632             return line

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in readline(self, size, keepends)
    543         # If size is given, we call read() only once
    544         while True:
--> 545             data = self.read(readsize, firstline=True)
    546             if data:
    547                 # If we're at a "\r" read one extra character (which might

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in read(self, size, chars, firstline)
    490             data = self.bytebuffer + newdata
    491             try:
--> 492                 newchars, decodedbytes = self.decode(data, self.errors)
    493             except UnicodeDecodeError, exc:
    494                 if firstline:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xde in position 0: invalid continuation byte

I've read the thread about a similar issue; however, I don't quite understand where to add the 'u' to make the string unicode. All I've done is: analyzer = SentimentIntensityAnalyzer()

Can someone help me?
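A note on the error message: the traceback shows that the failure happens inside `SentimentIntensityAnalyzer()` itself, while the lexicon file is being decoded. The second line (which would analyze `line_first`) is never reached, so adding a `u` prefix to your own strings would not change anything. The minimal Python 3 sketch below reproduces the same kind of message with two illustrative bytes (they are not taken from the real lexicon file): `0xde` announces a two-byte UTF-8 sequence, and the decoder raises because the next byte is not a continuation byte. Python 2.7 spells the codec `'utf8'`, as in the traceback.

```python
# Python 3 sketch: reproduce the kind of error shown in the traceback.
# The bytes are illustrative only, not taken from the VADER lexicon.
raw = b"\xde\x41"  # 0xde starts a 2-byte UTF-8 sequence; 0x41 ('A') is not a continuation byte

message = ""
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'utf-8' codec can't decode byte 0xde in position 0: invalid continuation byte

# Latin-1 maps every byte 0x00-0xff to a character, so this never raises,
# but whether the result is *correct* depends on how the file was really encoded.
recovered = raw.decode("latin-1")
print(recovered)  # ÞA
```

In other words, the fix has to happen where the lexicon file is read (the right codec, or a library version that reads it correctly), not in the text you pass to the analyzer.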

@mrbeann

mrbeann commented Sep 8, 2016

Using NLTK should help with this problem. Alternatively, you may want to use something like text = str(text.encode('utf-8')).
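A caveat on the `str(text.encode('utf-8'))` suggestion: it reflects Python 2 habits. On Python 3, calling `str()` on a bytes object returns its printable representation, including the `b'...'` wrapper and escaped bytes, so an analyzer would receive that literal text rather than the original words. It also only touches your input text, whereas the original traceback fails while the lexicon is being loaded. A short Python 3 sketch of the difference (the sample word is arbitrary):

```python
# Python 3 sketch of what str(text.encode('utf-8')) actually produces.
text = "café"

encoded = text.encode("utf-8")  # a bytes object: b'caf\xc3\xa9'
as_str = str(encoded)           # str() of bytes returns its repr, not decoded text

print(as_str)      # b'caf\xc3\xa9'

# To turn bytes back into text, decode with the codec used to encode them.
round_trip = encoded.decode("utf-8")
print(round_trip)  # café
```

On Python 3, text that is already a `str` can usually be passed as-is; `.decode()` is only needed when you start from bytes.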

@cjhutto
Owner

cjhutto commented Dec 13, 2016

Thanks! The new update (via a fresh pip install) has better Python 3 compatibility, which solves many of the encoding/decoding issues.

cjhutto closed this as completed Dec 13, 2016
@durakkerem

As I described in the question below:
https://stackoverflow.com/questions/46395206/vader-is-throwing-unicodedecodeerror-on-api-init

I get a UnicodeDecodeError when calling the API initializer.

I am using the latest pip version of the library on Python 3.
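Since the error persists on Python 3, it may help to see where an encoding actually enters the picture. The sketch below mirrors the `make_lex_dict` loop visible in the first traceback (the first two tab-separated fields of each line are the word and its score), but opens the file with an explicit `encoding=` instead of relying on the platform default. This is not the library's own code: the function signature, the `encoding` parameter, and the two-entry sample lexicon are hypothetical, written only to illustrate the technique. `io.open` is used so the same function also runs on Python 2.7.

```python
import io
import os
import tempfile


def make_lex_dict(path, encoding="utf-8"):
    """Build {word: score} from a tab-separated lexicon file (hypothetical helper).

    Mirrors the parsing loop from the traceback, but decodes the file with an
    explicit encoding rather than the platform/locale default.
    """
    lex_dict = {}
    with io.open(path, encoding=encoding) as handle:
        for line in handle:
            line = line.strip()
            if not line:  # skip blank lines, e.g. a trailing newline
                continue
            word, measure = line.split("\t")[0:2]
            lex_dict[word] = float(measure)
    return lex_dict


# Hypothetical two-entry lexicon, written to disk as UTF-8 bytes for the demo.
sample = u"good\t1.9\t0.9\t[2, 2]\ncaf\u00e9\t0.5\t0.5\t[1, 0]\n"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    tmp_path = tmp.name

try:
    lexicon = make_lex_dict(tmp_path)
    print(lexicon)
finally:
    os.remove(tmp_path)  # clean up the temporary file
```

If the same file were opened without `encoding=` under an ASCII default, decoding the `é` would raise a UnicodeDecodeError, which is the class of failure reported in this thread.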

@HalaKuwatly

HalaKuwatly commented Jan 25, 2018

I'm also getting the same error with Python 3.

@panyi121

I am using Docker to run vaderSentiment:

Python 3.6, pip 9.0.1 (latest), Ubuntu 16.04, but I still get the same error. Does anyone have a good suggestion?

Dockerfile:
FROM ubuntu:16.04

RUN apt-get update -y
RUN apt-get install -y software-properties-common vim
RUN add-apt-repository ppa:jonathonf/python-3.6
RUN apt-get update
RUN apt-get install -y build-essential python3.6 python3.6-dev python3-pip python3.6-venv
RUN apt-get install -y nginx supervisor gcc g++

# update pip
RUN python3.6 -m pip install pip --upgrade
#RUN python3.6 -m pip install wheel

# Set up flask application
RUN mkdir -p /deploy/app
COPY app /deploy/app
RUN python3.6 -m pip install vaderSentiment==2.5
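The thread doesn't establish why the error appears in this container. One thing worth checking, offered as an assumption rather than a confirmed diagnosis: on Python 3.6, `open()` without an `encoding=` argument uses `locale.getpreferredencoding(False)`, and a plain `ubuntu:16.04` image sets no `LANG`, so that encoding is often ASCII (reported on Linux as `ANSI_X3.4-1968`). The sketch below, run inside the container, prints the default encoding; if it isn't UTF-8, a commonly used workaround is to set a UTF-8 locale in the Dockerfile, for example `ENV LANG C.UTF-8`.

```python
import codecs
import locale
import sys

# The encoding open() falls back to when no encoding= argument is passed.
default_enc = locale.getpreferredencoding(False)
# Normalize aliases such as 'UTF-8', 'utf8' or 'ANSI_X3.4-1968' to a canonical codec name.
canonical = codecs.lookup(default_enc).name
is_utf8 = canonical == "utf-8"

print("open() default encoding:", default_enc, "->", canonical)
print("filesystem encoding:", sys.getfilesystemencoding())
if not is_utf8:
    print("Default encoding is not UTF-8: pass encoding='utf-8' when opening "
          "text files, or set a UTF-8 locale in the container.")
```

Python 3.7 and later coerce the plain C locale to UTF-8 (PEP 538), so this check matters most on 3.6 and earlier.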
