Incorrect file reading mode and offset in Conll.py #21

userofgithub1 · 2018-09-07T16:09:43Z

Hi,

I noticed that the mode in the file-opening line in Conll.py is incorrect; it should be 'rb':

with open(path, 'rd') as f:
    doc_id = None
    doc_tokens = None
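
For illustration, here is a minimal sketch of reading the file with a valid mode. The helper name `read_conll_lines` and the UTF-8 encoding are my assumptions, not something taken from Conll.py. Python 3 rejects 'rd' with a ValueError because 'd' is not a recognised mode character.

```python
def read_conll_lines(path, encoding="utf-8"):
    """Yield decoded lines (without line endings) from a file opened in binary mode.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    # 'rb' is a valid mode; Python 3 raises ValueError for 'rd'.
    with open(path, "rb") as f:
        for raw in f:
            # Each line is bytes in binary mode, so decode it explicitly.
            yield raw.decode(encoding).rstrip("\r\n")
```

Opening in binary mode and decoding explicitly keeps the behaviour independent of the platform's default encoding.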

Also, the mention positions are calculated incorrectly, both when only reading the dataset and after linking.

The incorrect mention offsets are probably caused by these lines in Conll.py:

begin = sum(len(t)+1 for t in doc_tokens)
dodgy_tokenisation_bs_offset = 1 if re.search('[A-Za-z],',parts[2]) else 0
position = (begin, begin + len(parts[2]) + dodgy_tokenisation_bs_offset)
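
One way to check offsets like these is to rebuild the text from the tokens and confirm that slicing it with the computed span returns the mention. The sketch below assumes tokens are joined by exactly one space, which is what the `len(t)+1` term implies. The function names and the sample sentence are hypothetical and are not the code in Conll.py.

```python
def token_spans(tokens):
    """Return (begin, end) character offsets of each token in ' '.join(tokens)."""
    spans = []
    begin = 0
    for tok in tokens:
        end = begin + len(tok)
        spans.append((begin, end))
        # Assumption: exactly one separating space follows each token.
        begin = end + 1
    return spans


def mention_span(tokens, first, last):
    """Return the character span covering tokens[first] through tokens[last]."""
    spans = token_spans(tokens)
    return spans[first][0], spans[last][1]


# Round-trip check on an illustrative sentence.
tokens = ["German", "call", "to", "boycott", "British", "lamb", "."]
text = " ".join(tokens)
begin, end = mention_span(tokens, 4, 4)
assert text[begin:end] == "British"
```

If a check like this fails for the positions produced by the reader, the offsets and the reconstructed document text disagree about spacing or tokenisation.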

I hope this is helpful and that the files can be updated :)
Thanks :)

This was referenced Sep 8, 2018

idf and tfidf models andychisholm/sift#14

Closed

prepare-conll-coref does not convert AIDA-YAGO2-dataset wikilinks/neleval#45

Open
