'str' object has no attribute 'decode' #17

asingh9530 · 2018-09-18T17:15:03Z

When I tried running
python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt --out_file preprocessed_movie_lines.txt

it gives me this error:
python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt --out_file preprocessed_movie_lines.txt
/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "preprocessors/preprocess_movie_dialogs.py", line 24, in
tf.app.run()
File "/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "preprocessors/preprocess_movie_dialogs.py", line 18, in main
s = dialog_line.strip().lower().decode("utf-8", "ignore")
AttributeError: 'str' object has no attribute 'decode'

This is expected, since each line is already a str, but if I remove the decode call it still doesn't work.

aayushee · 2019-03-06T08:40:38Z

I was also getting this error. I ran the script with python2 and it worked.

micooke · 2019-04-04T00:37:20Z

In Python 3, decoding happens in the str constructor, which takes a bytes object. So you first need to encode the raw string as UTF-8, then pass it through str, like so:
s = str(bytes(dialog_line.strip().lower(), "utf-8"), "utf-8", "ignore")
Or just skip the whole thing, since a Python 3 str is already decoded text:
s = dialog_line.strip().lower()
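
For anyone landing here later, a minimal sketch of why the original line fails on Python 3: only bytes has a decode method, str does not. The sample string is made up for illustration and is not taken from movie_lines.txt.

```python
# Hypothetical sample line, not taken from the corpus.
dialog_line = "  They do NOT!  \n"

# Python 3: str is already Unicode text, so it has nothing to decode.
has_decode_on_str = hasattr(dialog_line, "decode")
has_decode_on_bytes = hasattr(b"raw bytes", "decode")

# Round-tripping through bytes changes nothing for a string
# that is already valid text.
round_tripped = str(bytes(dialog_line.strip().lower(), "utf-8"), "utf-8", "ignore")
plain = dialog_line.strip().lower()

print(has_decode_on_str, has_decode_on_bytes, round_tripped == plain)
```

So on Python 3 both variants should produce the same value for s; neither one helps if the failure happens earlier, while the file is being read.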

lipsajohny · 2020-02-11T07:40:50Z

When I try the above solution, another error pops up:

Traceback (most recent call last):
File "preprocessors/preprocess_movie_dialogs.py", line 23, in
tf.app.run()
File "/home/lipsa/anaconda3/envs/iia/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/lipsa/anaconda3/envs/iia/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/lipsa/anaconda3/envs/iia/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "preprocessors/preprocess_movie_dialogs.py", line 15, in main
for line in raw_data:
File "/home/lipsa/anaconda3/envs/iia/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 3767: invalid start byte

d4buss · 2021-06-17T14:41:16Z

This seemed to get it working for me:

def main(_):
    # Open the raw data in binary mode ("rb") so reading never raises UnicodeDecodeError.
    with open(FLAGS.raw_data, "rb") as raw_data, \
            open(FLAGS.out_file, "w") as out:
        for line in raw_data:
            # Decode explicitly; str(line) on bytes would give the "b'...'" repr instead.
            line = line.decode("utf-8", "ignore")
            parts = line.split(" +++$+++ ")
            dialog_line = parts[-1]
            # Keep ASCII characters only.
            s = "".join(c for c in dialog_line.strip().lower() if ord(c) < 128)
            preprocessed_line = " ".join(nltk.word_tokenize(s))
            out.write(preprocessed_line + "\n")
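
A possible alternative, as a rough sketch only: keep the file in text mode and let open() handle the bad bytes through its encoding and errors arguments. The helper name read_dialog_lines and the sample line are made up for illustration; they are not part of the repository's script.

```python
import os
import tempfile


def read_dialog_lines(path, encoding="utf-8", errors="ignore"):
    """Yield the lowercased last field of each ' +++$+++ '-separated line."""
    # errors="ignore" drops undecodable bytes such as 0xad instead of raising.
    with open(path, "r", encoding=encoding, errors=errors) as raw_data:
        for line in raw_data:
            parts = line.split(" +++$+++ ")
            yield parts[-1].strip().lower()


# Small self-check with a made-up line containing the byte 0xad.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"L1 +++$+++ u0 +++$+++ m0 +++$+++ NAME +++$+++ Hello\xad There\n")
    sample_path = tmp.name

lines = list(read_dialog_lines(sample_path))
os.remove(sample_path)
print(lines)
```

If the file turns out to be Latin-1 encoded (I have not verified this for movie_lines.txt), passing encoding="iso-8859-1" would decode every byte and keep accented characters rather than dropping them.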
