A visualization interface for analyzing a (very large) corpus of natural-language queries.


SyntaViz is a visualization interface for analyzing large numbers of natural-language queries. It lets you browse the ontology of user queries from a syntax-driven perspective, giving quick access to high-impact failure points of an existing intent-understanding system and supplying evidence for data-driven decisions in the development cycle.

For more details, see our demo paper "SyntaViz: Visualizing Voice Queries through a Syntax-Driven Hierarchical Ontology" at EMNLP 2018: http://emnlp2018.org/program/accepted/demos


Outline of the code

  • filter_query.py: Implements the functions needed to process the raw data into smaller, more manageable files. It includes functions for filtering and sorting the queries by language-model-based scores.
  • parse_query.py: Parses a list of queries and outputs a list of dependency parse trees. It assumes a tensorflow/syntaxnet environment.
  • cluster_query.py: Builds hierarchical clusters from the dependency-parsed queries. It includes functions for navigating into the clusters and showing their contents.
  • syntaviz.py: Reads the hierarchical clusters from file and displays them dynamically in a web interface.
  • templates/: Contains the HTML skeleton for the SyntaViz server.
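
To make the idea of hierarchical clusters built from dependency parses more concrete, here is a toy sketch. It is not the algorithm in cluster_query.py: the parse representation (tuples of word, head index, relation), the function name cluster_by_root, and the two-level grouping are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical parse representation: each query is a list of
# (word, head_index, relation) tuples; head_index -1 marks the root.
parsed = [
    [("cant", 1, "aux"), ("send", -1, "ROOT"), ("email", 1, "dobj")],
    [("send", -1, "ROOT"), ("email", 0, "dobj")],
    [("send", -1, "ROOT"), ("text", 0, "dobj")],
    [("change", -1, "ROOT"), ("plan", 0, "dobj")],
]


def cluster_by_root(parses):
    """Group queries by root word, then by each (relation, child) of the root."""
    clusters = defaultdict(
        lambda: {"queries": [], "children": defaultdict(list)}
    )
    for query_id, tokens in enumerate(parses):
        # Locate the root token of this query.
        root_index = next(i for i, t in enumerate(tokens) if t[1] == -1)
        node = clusters[tokens[root_index][0]]
        node["queries"].append(query_id)
        # Second level: one sub-cluster per direct dependent of the root.
        for word, head, relation in tokens:
            if head == root_index:
                node["children"][(relation, word)].append(query_id)
    return clusters
```

In this sketch, all queries rooted at "send" land in one top-level cluster, and the sub-cluster keyed by ("dobj", "email") narrows it to the queries about sending email. A real hierarchy would presumably recurse further down the tree.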

Logical sequence of the code

filter_query.py [for preparing data]
parse_query.py [for parsing queries]
cluster_query.py [for creating clusters]
syntaviz.py [for creating server]

Running SyntaViz

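Define the variables used by the commands below: $CODEDIR and $DATADIR (the code and data directories mounted into the container) and $PORT (the port passed to the SyntaViz server).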


Running SyntaViz on a corpus of queries

0. Set up environment

Start a container with SyntaxNet:

docker run --rm --name syntaviz-parser -it -e CODEDIR=$CODEDIR -e DATADIR=$DATADIR -v $CODEDIR:$CODEDIR -v $DATADIR:$DATADIR -p 9030:8888 tensorflow/syntaxnet /bin/bash

Install SyntaViz:

pip install --upgrade setuptools
python setup.py install

1. Prepare data in the following format

  • queries: A text file in which each line represents one query in the following tab-separated format: ID\tquery\tlogProb\tlogFreq\tCount


0       i wanna change my plans its to high     1.0     1.0     1
1       please email me an alarm certificate showing that our services are current and active. 1.0     1.0     1
2       cant send outgoing email        1.0     1.0     1
  • actions.pkl: A pickle file containing a single dict that maps each query (key) to its action (value)
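
As a minimal sketch of how these two inputs could be produced, the snippet below writes a small queries file and a matching actions.pkl. The action labels, the helper names format_row and write_inputs, and the choice of pickle protocol 2 (picked so the file stays readable from a Python 2 interpreter, if that is what the container runs) are assumptions, not part of SyntaViz.

```python
import pickle

# Hypothetical example rows: (ID, query, logProb, logFreq, count).
rows = [
    (0, "i wanna change my plans its to high", 1.0, 1.0, 1),
    (1, "cant send outgoing email", 1.0, 1.0, 1),
]

# Hypothetical action labels; the real label set depends on your system.
actions = {
    "i wanna change my plans its to high": "CHANGE_PLAN",
    "cant send outgoing email": "EMAIL_SUPPORT",
}


def format_row(row):
    """Join the five fields of one query with tabs."""
    return "\t".join(str(field) for field in row)


def write_inputs(queries_path, actions_path):
    """Write the queries file and the actions pickle."""
    with open(queries_path, "w") as f:
        for row in rows:
            f.write(format_row(row) + "\n")
    # Protocol 2 is an assumption: it can also be loaded from Python 2.
    with open(actions_path, "wb") as f:
        pickle.dump(actions, f, protocol=2)
```

Calling write_inputs with paths under $DATADIR (for example the files named queries and actions.pkl) would produce inputs in the layout described above.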

2. Parse queries

cd /opt/tensorflow/syntaxnet
mkdir $DATADIR/parsed
python -m syntaviz.parse_query $DATADIR/queries $DATADIR/parsed/part > parse-queries.log 2>&1 &
wait  # let the background parse finish before concatenating
cat $DATADIR/parsed/part* > $DATADIR/parsed.txt

At this point, $DATADIR/parsed.txt should have the same number of lines as $DATADIR/queries.
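
The line-count check can be automated with a few lines of Python. This is a sketch only; count_lines and check_parsed are hypothetical helper names, not functions shipped with SyntaViz.

```python
def count_lines(path):
    """Return the number of lines in a file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def check_parsed(queries_path, parsed_path):
    """Raise ValueError if the two files differ in line count."""
    n_queries = count_lines(queries_path)
    n_parsed = count_lines(parsed_path)
    if n_queries != n_parsed:
        raise ValueError(
            "expected %d parsed lines, found %d" % (n_queries, n_parsed)
        )
    return n_queries
```

Run check_parsed on $DATADIR/queries and $DATADIR/parsed.txt before starting the server; a mismatch usually means parsing is incomplete or a part file is missing.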

3. Start SyntaViz server

python -m syntaviz.syntaviz $DATADIR/queries $DATADIR/parsed.txt $DATADIR/actions.pkl $PORT