Improve logic to redact question texts #13

Merged (3 commits), May 11, 2017

@c-w (Contributor) commented Mar 26, 2017

To leave enough context to answer the question, the redaction algorithm does not redact context words. However, this sometimes leads to nothing at all being redacted. For example, for the question "an Earth year is about 365.26 days long", the correct answer "365.26" is stripped of its separator to "36526", and since that string never occurs in the question text, the number is not redacted.
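The mismatch can be reproduced with a few lines of Python. This is a hypothetical sketch, not the project's code: `normalize_answer`, `redact_question`, the `keep` parameter and the `____` placeholder are all assumptions. It only models the two behaviours stated in this PR, namely that non-digit characters used to be stripped from the answer and that the patch whitelists comma and dot.

```python
import string

# Hypothetical sketch; names are illustrative, not taken from the project.
REDACTED = "____"  # assumed placeholder for the hidden number


def normalize_answer(answer: str, keep: str = "") -> str:
    """Keep only ASCII digits plus any whitelisted separator characters."""
    return "".join(ch for ch in answer if ch in string.digits or ch in keep)


def redact_question(question: str, answer: str, keep: str = "") -> str:
    """Redact the normalized answer wherever it appears in the question.

    Context words around the number ("days", "%", ...) are left untouched.
    """
    target = normalize_answer(answer, keep)
    if not target:
        return question
    return question.replace(target, REDACTED)


question = "an Earth year is about 365.26 days long"

# Old behaviour: the dot is stripped, "365.26" becomes "36526",
# which never occurs in the question, so nothing is redacted.
print(redact_question(question, "365.26"))

# Patched behaviour: comma and dot are whitelisted, so the number is found.
print(redact_question(question, "365.26", keep=",."))
```

A plain substring replace is the simplest matching rule that reproduces the bug; it can over-redact (an answer of "5" would also hit "365.26"), so a fuller version would compare whole tokens instead.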

This problem was reported in #9, for example. With this patch applied, the situation described in that issue is fixed.

This patch also fixes another failure mode related to context words: we may have a question like "the atmosphere is made up of ?% CO2" where one of the answers is "52%". That's too easy! After this change, the correct answer in this example would be redacted to "52", making the question harder to answer.
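A minimal sketch of the answer-side redaction, assuming the answer choices are plain strings. `redact_answer` and its `keep` default are hypothetical; the PR only states that "52%" should end up as "52". Whitelisting comma and dot here is an assumption that mirrors the separator fix in the other commit, so that answers like "3.14" keep their decimal point.

```python
import string


def redact_answer(answer: str, keep: str = ",.") -> str:
    """Strip contextual clues (units, '%', words) from a number answer.

    Hypothetical helper: only ASCII digits and the whitelisted
    separators in `keep` survive.
    """
    return "".join(ch for ch in answer if ch in string.digits or ch in keep)


# The question already hides the number but keeps the "%" as context,
# so showing the answer as "52" instead of "52%" removes the giveaway.
print(redact_answer("52%"))
```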

c-w added some commits Mar 26, 2017

Redact contextual clues in number answers
For number answers, the redaction algorithm only redacts the numeric
values in the question, leaving contextual clues like the percent sign
so that the question stays answerable. However, the same logic was not
applied to the correct answer, which means that it can be guessed too
easily. For example, we may have a question like "the atmosphere is
made up of ?% CO2" where one of the answers is "52%". That's too easy!
After this change, the correct answer in this example would be redacted
to "52", making the question harder to answer.
Consider number separators in redaction algorithm
To leave enough context to answer the question, the redaction algorithm
does not redact context words. However, this sometimes leads to nothing
at all being redacted. For example, for the question "an Earth year is
about 365.26 days long", the correct answer "365.26" is stripped to
"36526", which never occurs in the question text, so the number is not
redacted.

This patch whitelists comma and dot so that we cover many of the
common failure cases like "1,000,000" or "3.14".

Fixes #9

@alexgreene merged commit bf625c9 into alexgreene:master on May 11, 2017
