This repository includes code and data for the CHI 2020 submission Answering Questions about Charts and Generating Visual Explanations.
We include the code for the 3-stage pipeline that automatically answers questions about a chart specified in Vega-Lite and explains how each answer was obtained. Each of the three stages can be run separately.
- Sempre
  - Recompiled with `Server.java` from our code
  - With parameters from Macro Grammars and Holistic Triggering for Efficient Semantic Parsing (Zhang et al.) (available here)
  - Run on port `8400`
- Stanford CoreNLP
  - Run on port `9000`
- Word2Vec
  - With trained vectors on the Google News dataset from Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al.) (available here)
  - Run `word2vec.py` on port `5005`
- Python packages (requires Python 3)
  - flask
  - nltk
  - bidict
  - gensim
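Before running the pipeline, it can help to confirm that the three services listed above are reachable. The following is a minimal sketch, not part of the repository: the ports come from the list above, while the host (`localhost`) and the helper names `is_port_open` and `check_services` are assumptions made for illustration.

```python
import socket

# Ports are taken from the requirements list; running everything on one
# machine (localhost) is an assumption.
SERVICES = {
    "Sempre": 8400,
    "Stanford CoreNLP": 9000,
    "Word2Vec (word2vec.py)": 5005,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}
```

Calling `check_services()` returns a dictionary such as `{"Sempre": True, ...}`. A `False` entry only means that nothing accepted a connection on that port; it does not say why.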
Note: the code has only been tested on Google Chrome.
- Set the `curr_visualization` parameter in `js/index.js` to point to the desired visualization.
- Serve the root code directory from a local web server and open `index.html`.
- The extracted table will show underneath the chart. Save the table as CSV for the next steps. (You may want to change line 153 of `js/index.js` to save to a file instead of printing to the console.)
- Run `QAServer.py`. You may have to modify the directories in the file depending on where you have the relevant data. This will run on port `5000`.
- You can provide `sessionId`, `question_id`, `dataset_name`, `spec_file_name`, and `runtime_file_name` (the name of the CSV table extracted in Stage 1) to `localhost:5000/query-vis-sempre` with the GET method to obtain the converted question, the answer given by the system, and the lambda expression.
- Run `GenerateExplanation.py` with the CSV including the lambda expression and the chart metadata.
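The GET request to `QAServer.py` described above can be sketched in Python as follows. This is an illustration under assumptions, not code from the repository: it assumes the five parameters are passed as URL query-string arguments, and because the response format is not documented here, it returns the raw response text. The helper names `build_query_url` and `query_pipeline` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and port as stated in the steps above.
BASE_URL = "http://localhost:5000/query-vis-sempre"


def build_query_url(session_id, question_id, dataset_name,
                    spec_file_name, runtime_file_name, base_url=BASE_URL):
    """Build the GET URL; assumes parameters go in the query string."""
    params = {
        "sessionId": session_id,
        "question_id": question_id,
        "dataset_name": dataset_name,
        "spec_file_name": spec_file_name,
        "runtime_file_name": runtime_file_name,  # CSV table from Stage 1
    }
    return base_url + "?" + urlencode(params)


def query_pipeline(**kwargs):
    """Send the request and return the response body as text."""
    with urlopen(build_query_url(**kwargs), timeout=60) as response:
        return response.read().decode("utf-8")
```

For example, `query_pipeline(session_id="s1", question_id="q1", dataset_name="cars", spec_file_name="bar.json", runtime_file_name="bar.csv")` would query the server for one question; all five values here are placeholders and should be replaced with names from your own data.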
We include the 52 Vega-Lite charts and the 629 human-generated questions, along with 748 explanations that workers gave for their answers. We collected these questions and explanations through Amazon Mechanical Turk and manually annotated each question with its correct answer. We used these questions to evaluate our automatic pipeline.
chart-list.json includes the metadata about the 52 charts present in the dataset. This includes the title of the chart, as well as the dataset name and the name of the Vega-Lite specification file.
Inside the dataset directory, the Vega-Lite specification files can be found in [dataset-name]/specs/[file-name].
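The path convention above can be sketched as a small helper. This is only an illustration: `spec_path` and `load_spec` are hypothetical names, the location of the dataset directory is passed in by the caller, and the sketch assumes the specification files are plain JSON, as Vega-Lite specifications normally are.

```python
import json
from pathlib import Path


def spec_path(dataset_dir, dataset_name, file_name):
    """Return [dataset_dir]/[dataset-name]/specs/[file-name] as a Path."""
    return Path(dataset_dir) / dataset_name / "specs" / file_name


def load_spec(dataset_dir, dataset_name, file_name):
    """Load one Vega-Lite specification as a Python dictionary."""
    with open(spec_path(dataset_dir, dataset_name, file_name),
              encoding="utf-8") as f:
        return json.load(f)
```

The dataset name and file name for each chart would come from the metadata in chart-list.json.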
qadata.json includes the human-generated questions with correct answers, and answer-explanation pairs generated by humans.
For each question id, the question, the correct answer, and the turkers' answers and explanations (if the provided explanation was meaningful) are given, along with the metadata about the chart that the question and the explanations refer to.