Improve logic to redact question texts #13

Merged
merged 3 commits into from May 11, 2017

@c-w c-w commented Mar 26, 2017

To give enough context for the answer, the redaction algorithm does not redact context words. However, this sometimes leads to nothing being redacted at all. For example, for the question "an Earth year is about 365.26 days long", the correct answer "365.26" is shortened to "36526", and we then fail to redact that token in the question text.

This problem was reported, for example, in #9. After applying this patch, the situation described in that issue is fixed:

[image]

This patch also fixes another failure mode related to context words: we may have a question like "the atmosphere is made up of ?% CO2" where one of the answers is "52%". That's too easy! After this change, the correct answer in this example is redacted to "52", making the question harder to answer.
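The idea of applying the same numeric-only redaction to the answer can be sketched as follows. This is only an illustration, not the code from this patch: the `NUMBER` pattern and the `redact_question` and `strip_context` helpers are hypothetical names chosen for the example.

```python
import re

# Hypothetical pattern: digits, optionally grouped by dots or commas.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def redact_question(question: str, answer: str, mask: str = "?") -> str:
    """Mask only the numeric part of the answer, keeping context like '%'."""
    match = NUMBER.search(answer)
    target = match.group(0) if match else answer
    return question.replace(target, mask)


def strip_context(answer: str) -> str:
    """Apply the same rule to the answer itself, e.g. '52%' becomes '52'."""
    match = NUMBER.search(answer)
    return match.group(0) if match else answer


question = "the atmosphere is made up of 52% CO2"
print(redact_question(question, "52%"))  # the percent sign stays in the question
print(strip_context("52%"))              # the answer no longer repeats the hint
```

With both helpers in place, the percent sign that remains in the question no longer singles out the one answer that also carries a percent sign.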

c-w added 3 commits Mar 26, 2017
For number answers, the redaction algorithm in the question only redacts
the numeric values, leaving contextual values like the percent sign to
make it easier to answer the question. However, the same logic was not
applied to the correct answer which means that we can guess the correct
answer too easily. For example, we may have a question like "the
atmosphere is made up of ?% CO2" and one of the answers is "52%". That's
too easy! After this change, the correct answer in this example would be
redacted to "52" making the question less easy to answer.

To give enough context for the answer, the redaction algorithm does not
redact context words. However, this sometimes leads to nothing at all
being redacted, e.g. for the question "an Earth year is about 365.26
days long", the correct answer "365.26" will be shortened to "36526"
and then we'll fail to redact that token in the question text.

This patch white-lists comma and dot so that we cover a lot of the
common failure cases like "1,000,000" or "3.14".

Fixes #9
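
A minimal sketch of the white-listing described in the commit message above, assuming the answer is normalized by dropping punctuation before it is compared with the question tokens. The `normalize` and `redact` functions and the `KEEP` set are hypothetical names for illustration and may not match the project's actual code.

```python
import string

# Characters the commit message says are white-listed.
KEEP = frozenset({",", "."})


def normalize(token: str, keep: frozenset = frozenset()) -> str:
    """Drop punctuation from a token, except characters listed in `keep`."""
    return "".join(
        ch for ch in token if ch not in string.punctuation or ch in keep
    )


def redact(question: str, answer: str, keep: frozenset = frozenset(),
           mask: str = "?") -> str:
    """Mask every question token that equals the normalized answer."""
    target = normalize(answer, keep)
    return " ".join(
        mask if token == target else token for token in question.split()
    )


question = "an Earth year is about 365.26 days long"
print(redact(question, "365.26"))             # nothing matches "36526"
print(redact(question, "365.26", keep=KEEP))  # "365.26" is masked
```

Without the white-list the answer collapses to "36526" and no token in the question matches it; keeping the comma and dot lets values such as "1,000,000" or "3.14" match as written.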
@alexgreene alexgreene merged commit bf625c9 into alexgreene:master May 11, 2017