Handling quotes #58

Closed
jofatmofn opened this issue Jul 12, 2017 · 4 comments

Comments

@jofatmofn

Given the text
John said, "Welcome to the heaven".
rrp.simple_parse gives
(S1 (S (NP (NNP John)) (VP (VBD said) (, ,) (`` ``) (INTJ (UH Welcome) (PP (TO to) (NP (DT the) (NN heaven)))) ('' '')) (. .)))
If I use rrp.parse_tagged with the following tokens and postags

tokens=[u'John', u'said', u',', u'"', u'Welcome', u'to', u'the', u'heaven', u'"', u'.']
postags={0: u'NNP', 1: u'VBD', 2: u',', 3: u'``', 4: u'UH', 5: u'TO', 6: u'DT', 7: u'NN', 8: u"''", 9: u'.'}

it returns an empty list.

Workaround: if I change the opening double quote in tokens to two backticks and the closing double quote to two apostrophes, as in
tokens=[u'John', u'said', u',', u'``', u'Welcome', u'to', u'the', u'heaven', u"''", u'.']
then it works.
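
For anyone feeding tokens from another tokenizer, the workaround above can be sketched as a small pre-processing step. This is only a sketch under assumptions: normalize_quote_tokens is a hypothetical helper, not part of bllipparser, and it handles only the straight double quote case from this issue. It takes the replacement from the POS tag when one is supplied and otherwise alternates between opening and closing quotes.

```python
# Hypothetical helper (not part of bllipparser): rewrites straight double-quote
# tokens as Penn Treebank-style quote tokens before the tokens are handed to
# the parser.
PTB_OPEN = "``"
PTB_CLOSE = "''"


def normalize_quote_tokens(tokens, postags=None):
    """Return a copy of tokens with '"' replaced by `` or ''."""
    postags = postags or {}
    normalized = []
    inside_quote = False  # tracks whether an opening quote is still unclosed
    for index, token in enumerate(tokens):
        if token == '"':
            tag = postags.get(index)
            if tag in (PTB_OPEN, PTB_CLOSE):
                # The supplied tag already says which kind of quote this is.
                token = tag
                inside_quote = (tag == PTB_OPEN)
            else:
                # No usable tag: assume quotes alternate open/close.
                token = PTB_CLOSE if inside_quote else PTB_OPEN
                inside_quote = not inside_quote
        normalized.append(token)
    return normalized


tokens = ['John', 'said', ',', '"', 'Welcome', 'to', 'the', 'heaven', '"', '.']
postags = {0: 'NNP', 1: 'VBD', 2: ',', 3: '``', 4: 'UH',
           5: 'TO', 6: 'DT', 7: 'NN', 8: "''", 9: '.'}
fixed_tokens = normalize_quote_tokens(tokens, postags)
```

The resulting fixed_tokens list can then be passed to rrp.parse_tagged together with the unchanged postags, as in the workaround.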

@dmcc
Member

dmcc commented Jul 12, 2017 via email

@jofatmofn
Author

Sure. Thanks. Could you please direct me to any reference (document or code) that lists such replacements? I need to use tokens and postags from another parser, so I can apply these replacements before calling BLLIP.

@dmcc
Member

dmcc commented Jul 17, 2017 via email

@jofatmofn
Author

Thanks.
