Trick Me If You Can: Adversarial Writing of Trivia Challenge Questions
Official code for the interface in
Trick Me If You Can: Adversarial Writing of Trivia Challenge Questions, a preprint with an early version presented at the 2018 ACL Student Research Workshop.
This is an interactive user interface for creating question-answer pairs that are difficult (adversarial) for computers to answer. The goal is for users to either write unique questions, or to reformulate existing questions, such that they adversarially break a question answering system. The underlying computer system is based on QANTA, a deep-learning Question Answering system. The interface can be naturally extended to other NLP tasks.
There are three main pieces of code that run the service. Each one is described below. There is additional parsing and postprocessing code described at the end.
This is where most of the magic happens.
- The main server is in
- The main HTML code is in
static/js/scripts2.js is the file that implements most of the interface's functionality
static/answers.json contains all of the possible answers that the system can guess (extracted from the training portion of the Quiz Bowl data). This is not fully up to date
evidenceStore (and related folders) store logs of the user's actions and submitted questions. This will be phased out in favor of a database.
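As a rough illustration of how the interface might consume static/answers.json, here is a minimal loader. The assumption that the file is a flat JSON array of answer strings is a guess; check the actual file before relying on this.

```python
import json

def load_answers(path="static/answers.json"):
    """Load the set of possible answers the system can guess.

    Assumes the file is a JSON array of answer strings; the real
    answers.json may use a different layout.
    """
    with open(path) as f:
        return set(json.load(f))
```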
To run the server with multiple workers, launch it using gunicorn (this command launches 4 parallel workers):
gunicorn --bind 0.0.0.0:7000 adversarial:app --workers 4
The Non-QANTA server
/non_qanta handles certain backend functionality that doesn't involve answering questions or calculating evidence. Originally it did more computation, and it was separated out to put computationally expensive things on a different machine. It can probably be merged into the Main Interface at some point.
Non-QANTA hosts two REST endpoints that the main interface uses.
get_question gets a question from the development set that is to be rewritten by the user.
search_answers searches all past questions (from the Quiz Bowl dataset) and returns the top ones based on n-gram overlap with your question.
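The n-gram overlap scoring behind search_answers could be sketched as follows. This is a hypothetical re-implementation of the ranking, not the actual endpoint code; the real service may tokenize, weight, or rank differently.

```python
def ngrams(text, n=2):
    """Return the set of word n-grams (bigrams by default) in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def search_answers(query, past_questions, n=2, top_k=5):
    """Rank past Quiz Bowl questions by n-gram overlap with the query.

    A sketch of the scoring described above; the function name mirrors
    the endpoint, but the signature is an assumption.
    """
    query_grams = ngrams(query, n)
    scored = sorted(past_questions,
                    key=lambda p: len(query_grams & ngrams(p, n)),
                    reverse=True)
    return scored[:top_k]
```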
QANTA is used for answering questions and generating the evidence that is highlighted in the interface. For simplicity, we used the ElasticSearch system for answering questions in QANTA.
QANTA hosts two REST endpoints that the main interface uses.
answer_question. To get QANTA up and running, see that repository. The code to run the correct QANTA services is part of the web app API code.
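To make the answer_question flow concrete, here is a toy stand-in for the guesser: the real system queries ElasticSearch over the Quiz Bowl data, while this sketch scores a tiny in-memory index by token overlap. The request/response field names ("text", "guess", "score") and the index contents are assumptions for illustration only.

```python
import json
from collections import Counter

# Toy "index" mapping answers to associated text; the real system
# retrieves from ElasticSearch rather than a dict like this.
TOY_INDEX = {
    "Mark_Twain": "author huckleberry finn tom sawyer mississippi",
    "Paris": "capital france seine eiffel tower",
}

def answer_question(payload):
    """Hypothetical handler for the answer_question endpoint.

    Expects a JSON string like {"text": "..."} and returns a JSON
    string with the top guess and a crude token-overlap score.
    """
    question_tokens = Counter(json.loads(payload)["text"].lower().split())
    def score(doc):
        # Sum how often each indexed token appears in the question.
        return sum(question_tokens[tok] for tok in doc.split())
    guess = max(TOY_INDEX, key=lambda a: score(TOY_INDEX[a]))
    return json.dumps({"guess": guess, "score": score(TOY_INDEX[guess])})
```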
Parse Logs and Postprocessing
All of the postprocessing of the submitted questions happens in /parse_logs. This is also where any visualization happens, and how the adversarial questions shown in the paper were found.
This code is maintained by Eric Wallace at the University of Maryland. Feel free to open bug reports or pull requests. For contact, find my email on my website.
There are some TODOs sprinkled throughout the codebase. Here are some more general ones; contact Eric for more information.
- How can we make rewritten questions more enticing, if we ever want to do rewrites in the future?
- Add something (probably just a message to email someone) about submitting questions in bulk
- Update the QANTA repo I am using to the latest one, and then merge my changes into it
- After updating, also update answers.json and also possible_answers inside scripts2.js
- The questions/answers are getting lowercased at some point in the pipeline; this needs to stop
- The logging is the world's worst code. It needs to be replaced with a database that supports concurrent reads/writes (e.g., PostgreSQL), with the user data written to and retrieved from it
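As a minimal sketch of what database-backed logging could look like, here is a sqlite3 version (the TODO above asks for something like PostgreSQL for proper concurrency; sqlite3 is used here only because it is in the standard library). The table and column names are hypothetical.

```python
import sqlite3
import time

def open_log_db(path="logs.db"):
    """Open the log database, creating the table if needed.

    Table and column names are illustrative, not the repo's actual schema.
    """
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS user_log (
        ts REAL, user_id TEXT, action TEXT, question TEXT)""")
    conn.commit()
    return conn

def log_action(conn, user_id, action, question=""):
    """Record one user action with a timestamp."""
    conn.execute("INSERT INTO user_log VALUES (?, ?, ?, ?)",
                 (time.time(), user_id, action, question))
    conn.commit()
```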
Post Processing of Data
- I added filters for vulgar words, which accidentally caught some words like "Rape". Make sure this isn't happening for real questions (not spam)
- Some of the questions are very close to duplicates. For example, they changed "10 points" to "ten points". See if you can filter these out (probably check n-gram similarity and then filter out by hand)
- Make all the questions say "For 10 points" (or whatever is said in the Quiz Bowl data). I think Pedro has tried to be consistent about this (he might already have scripts for it), i.e., "for the points" -> "for 10 points" or "ftp" -> "for 10 points"
- Make sure not to show any questions that are labeled as "private"
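The normalization and near-duplicate checks above could be sketched as follows. The list of "for 10 points" variants is a guess (Pedro's scripts, if they exist, should be preferred), and difflib's similarity ratio stands in for the n-gram similarity suggested above; flagged pairs should still be filtered by hand.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of phrasings to normalize; extend as needed.
FTP_VARIANTS = [r"\bfor ten points\b", r"\bfor the points\b", r"\bftp\b"]

def normalize_ftp(question):
    """Rewrite point-value phrasings to the canonical "for 10 points"."""
    out = question
    for pat in FTP_VARIANTS:
        out = re.sub(pat, "for 10 points", out, flags=re.IGNORECASE)
    return out

def near_duplicates(questions, threshold=0.9):
    """Return index pairs of questions whose normalized text nearly matches.

    Uses difflib's ratio for brevity instead of n-gram similarity.
    """
    normed = [normalize_ftp(q).lower() for q in questions]
    pairs = []
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            if SequenceMatcher(None, normed[i], normed[j]).ratio() >= threshold:
                pairs.append((i, j))
    return pairs
```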
Weird Things To Note
- references to
- Everything uses ** for delimiting between different fields. JSON should be used instead =(
- The highlighting functionality is an absolute mess. You should probably contact me if you want to change it
- editID is -1 when you're not editing, and is set when you are.
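Until the logs move to JSON, parsing the **-delimited records might look like this. The field names and their order are hypothetical; check the logging code for the actual layout.

```python
def parse_log_line(line, fields=("user_id", "question", "answer")):
    """Split a **-delimited log line into named fields.

    The field names here are placeholders, not the repo's real schema.
    """
    parts = line.rstrip("\n").split("**")
    return dict(zip(fields, parts))
```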