About the step 2 encoding problem #9

Closed
betterwater opened this issue Mar 30, 2022 · 4 comments

@betterwater

betterwater commented Mar 30, 2022

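I recently read the paper and tried to run the code, but step 2 keeps failing with a UTF-8 encoding error. I have tried most of the fixes suggested online and none of them worked. Is there a way around this? (sad)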

    def _read_tsv(cls, input_file, quotechar=None):
        """Reads a tab separated value file."""
        with open(input_file, "r", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter="\t", quotechar=quotechar)
            lines = []
            for line in reader:
                if sys.version_info[0] == 2:
                    line = list(str(cell) for cell in line)
                lines.append(line)
            return lines
    
@blhoy
Collaborator

blhoy commented Mar 30, 2022

Could you share the error message? Perhaps the encoding of the data file has changed?

@betterwater
Author

> Could you share the error message? Perhaps the encoding of the data file has changed?

This is the error raised at runtime:
Traceback (most recent call last):
  File "F:/acos/ACOS/Extract-Classify-ACOS/run_step2.py", line 351, in <module>
    main()
  File "F:/acos/ACOS/Extract-Classify-ACOS/run_step2.py", line 174, in main
    eval_examples = processor.get_dev_examples(args.data_dir, args.domain_type)
  File "F:\acos\ACOS\Extract-Classify-ACOS\run_classifier_dataset_utils.py", line 208, in get_dev_examples
    self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_test_pair_1st.tsv")), "test")
  File "F:\acos\ACOS\Extract-Classify-ACOS\run_classifier_dataset_utils.py", line 127, in _read_tsv
    for line in reader:
  File "F:\anaconda\envs\ACOS\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 2954: invalid start byte
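
A minimal sketch for locating the bytes behind a `UnicodeDecodeError` like the one above, assuming the file is small enough to read into memory. The helper name `find_undecodable_bytes` and its parameters are hypothetical and not part of the repository. The position in the error message is relative to the chunk being decoded, so it may differ from the absolute file offsets reported here.

```python
def find_undecodable_bytes(raw, encoding="utf-8", context=20):
    """Return a list of (offset, nearby_bytes) for each undecodable byte run.

    Hypothetical helper for illustration only.
    """
    problems = []
    start = 0
    while start < len(raw):
        try:
            # Try to decode everything from the current position onward.
            raw[start:].decode(encoding)
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            offset = start + err.start  # absolute offset of the bad byte
            lo = max(0, offset - context)
            problems.append((offset, raw[lo:offset + context]))
            start += err.end  # resume just after the invalid bytes
    return problems


if __name__ == "__main__":
    # In-memory demo: a stray 0xa8 byte inside otherwise valid UTF-8 text.
    sample = "a\tb\n".encode("utf-8") + b"\xa8" + b"c\n"
    for offset, nearby in find_undecodable_bytes(sample):
        print(offset, nearby)
```

To check the real input, pass the raw bytes of the failing file (opened in `"rb"` mode), for example the `_test_pair_1st.tsv` file named in the traceback.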

@blhoy
Collaborator

blhoy commented Mar 31, 2022

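I tested this part and could not reproduce the problem, so the reader has most likely hit a character it cannot decode. Perhaps you could try re-saving the input data file with a different encoding?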
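
The re-saving suggestion could be sketched as follows, assuming the file was written in one of a few guessed encodings. The helper name `resave_as_utf8` and the candidate list are assumptions, not something confirmed in this thread. Byte 0xa8 is a valid lead byte in GBK and a printable character in cp1252, so both are plausible guesses. A successful decode does not prove the guess is right, so inspect the output before using it.

```python
from pathlib import Path


def resave_as_utf8(src, dst, candidates=("utf-8", "gbk", "cp1252")):
    """Decode src with the first candidate encoding that works, write dst as UTF-8.

    Hypothetical helper; returns the name of the encoding that succeeded.
    """
    raw = Path(src).read_bytes()
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this guess cannot decode the file, try the next one
        # Write bytes directly so line endings are not translated.
        Path(dst).write_bytes(text.encode("utf-8"))
        return enc
    raise ValueError("none of the candidate encodings could decode " + str(src))
```

A call such as `resave_as_utf8("input.tsv", "input_utf8.tsv")` (both paths are placeholders) writes a copy that `_read_tsv` can open with `encoding="utf-8"`, leaving the original file untouched.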

@betterwater
Author

OK, I'll give it a try. Thanks for your help.

@blhoy closed this as completed Apr 7, 2022