Span with surrounding quotation marks appears as if without them #1033

cjer · 2018-08-08T21:23:19Z

A span annotation that begins or ends with a quotation mark (and, I believe, other punctuation marks as well) is displayed as if the punctuation were not included in the span. Because of this, annotators missed many of these small boundary errors.

Examples:

"הראל" is displayed exactly like הראל, although the two spans are annotated differently, as the curation interface recognizes.

The same goes for חטיבת "הראל" ("Harel Brigade") compared with חטיבת "הראל, which lacks the closing quotation mark.


reckart · 2018-08-08T21:34:06Z

Sounds like an edge case. I assume the quotes are not Unicode RTL characters, right? Can you check whether the quotes are detected as separate tokens? In that case, "הראל" should consist of three tokens. You could check this, for example, by exporting the data as TSV and seeing whether ", הראל and " each appear on a separate line. If they do, then we have an unknown bug. If they do not and this is a mixed RTL-LTR token, then it sounds like a known bug (#283).
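The TSV check suggested above can be scripted. This is a minimal sketch, assuming a WebAnno-TSV-3-style layout in which lines starting with `#` are headers or sentence text and each token line is tab-separated with the token text in the third column. The sample export and the helper names `tsv_tokens` and `quotes_are_separate` are hypothetical, made up for illustration.

```python
# Minimal sketch: check whether quotation marks are separate tokens in a TSV export.
# Assumption: WebAnno-TSV-3-style layout, where lines starting with "#" are headers
# or sentence text, and token lines are tab-separated with the token text in column 3.


def tsv_tokens(tsv_text: str) -> list[str]:
    """Return the token texts from a TSV export, in document order."""
    tokens = []
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/sentence lines
        columns = line.split("\t")
        if len(columns) >= 3:
            tokens.append(columns[2])  # third column holds the token text
    return tokens


def quotes_are_separate(tokens: list[str], quote: str = '"') -> bool:
    """True if every token that contains the quote character is the quote alone."""
    return all(quote not in token or token == quote for token in tokens)


# Hypothetical export of the sentence: חטיבת "הראל"
SAMPLE = "\n".join([
    "#FORMAT=WebAnno TSV 3.2",
    "",
    "#Text=חטיבת \"הראל\"",
    "1-1\t0-5\tחטיבת\t_",
    "1-2\t6-7\t\"\t_",
    "1-3\t7-11\tהראל\t_",
    "1-4\t11-12\t\"\t_",
])

if __name__ == "__main__":
    tokens = tsv_tokens(SAMPLE)
    print(tokens)
    print("quotes are separate tokens:", quotes_are_separate(tokens))
```

If a quotation mark instead shows up glued to the word (for example a single token `"הראל"`), that points to the mixed RTL-LTR token case from #283 rather than a new bug.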

reckart · 2018-08-08T21:35:10Z

Actually, re-reading #283, it sounds like the same thing.

cjer · 2018-08-08T22:25:57Z

Yes, these are regular ASCII quotation marks, and yes, they are recognized as separate tokens.
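Which characters are involved can be double-checked with Python's standard `unicodedata` module. The ASCII quotation mark has the bidirectional class `ON` (Other Neutral), so under the Unicode Bidirectional Algorithm its display direction is resolved from the strong characters around it, while Hebrew letters are strong right-to-left (`R`). This may be relevant to why span boundaries at the quotes are hard to see, but the thread does not confirm that as the cause. The helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> str:
    """Code point, Unicode name, and bidirectional class of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} bidi={unicodedata.bidirectional(ch)}"


if __name__ == "__main__":
    # The ASCII quote is a neutral character; the Hebrew letter is strong RTL.
    for ch in '"ה':
        print(describe(ch))
    # U+0022 QUOTATION MARK bidi=ON
    # U+05D4 HEBREW LETTER HE bidi=R
```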

reckart added the RTL and 🐛Bug labels Aug 8, 2018

reckart added this to the Bug backlog milestone Sep 24, 2019
