Update paper from overleaf.

Rhoana · Aug 15, 2018 · 1e8d1b3 · 1e8d1b3
1 parent f4bb253
commit 1e8d1b3
Show file tree

Hide file tree

Showing 348 changed files with 6,004 additions and 289 deletions.
diff --git a/PAPER/00_introduction.tex b/PAPER/00_introduction.tex
@@ -1,16 +1,29 @@
 \firstsection{Introduction}
-
 \maketitle
+\vspace{-0.1cm}
+
+%One fundamental application of human vision is to understand data visualizations. This is a task unlike natural image processing but which includes the abstraction of real-world objects and their effects into data, represented with visual marks. As a field, visualization catalogues and evaluates human perception of these marks, such as in the seminal \emph{graphical perception} experiments of Cleveland and McGill~\cite{cleveland_mcgill}. This work describes nine elementary perceptual reasoning tasks, such as position relative to a scale, length, angle, area, and shading density, plus orders their reasoning difficulty.
+
+Convolutional neural networks (CNNs) have been successfully applied to a wide range of visual tasks, most famously to natural image object recognition~\cite{simonyan_very_deep2014, szegedy2015}, for which some claim equivalent or better than human performance. This performance comparison is often motivated by the idea that CNNs model or reproduce the early layers of the human visual cortex, even though they do not incorporate many details of biological neural networks or model higher-level abstract or symbolic reasoning~\cite{yamins2016using, hassabis2017neuroscience, human_vs_machine_vision}. While CNN techniques were originally inspired by neuroscientific discoveries, recent advances in processing larger datasets with deeper networks have been the direct results of engineering efforts. Throughout this significant advancement, researchers have aimed to understand why and how CNNs produce such high performance~\cite{deeplearning_blackbox2017}, \change{with recent works targeting the systematic evaluation of the limits of feed-forward convolutional neural networks for both image recognition problems \cite{Azulay2018} and for visual relation problems~\cite{clevr, not_so_clevr}.}
+
+In visualization, there is increasing research interest in the computational analysis of graphs, charts, and visual encodings~\cite{maneesh_deconstructing_d3,Pineo2012_computational_perception,kafle2018dvqa}, for applications like information extraction and classification, visual question answering (``computer, which category is greater?''), or even design analysis and generation~\cite{Viegas2016}. One might turn to a CNN for these tasks. However, computational analysis of visualizations is a more complex task than natural image classification~\cite{Kahou2018}, requiring the identification, estimation, and relation of visual marks to extract information. For instance, we take for granted the human ability to generalize an understanding of length to a previously unseen chart design, or to estimate the ratios between lengths, yet for a CNN these abilities are in question, with no clear mechanism for concept abstraction.
+
+% this is a nicely readable motivation!
+Our goal is to better understand the abilities of CNNs for visualization analysis, and so we investigate the performance of current off-the-shelf CNNs on visualization tasks and show what they can and cannot accomplish. As computational visualization analysis is predicated upon an understanding of elementary perceptual tasks, we consider the seminal \emph{graphical perception} settings of Cleveland and McGill~\cite{cleveland_mcgill}. This work describes nine reasoning tasks, such as position relative to a scale, length, angle, area, and shading density, and measures human graphical perception performance on bar and pie chart quantity estimation. We reproduce Cleveland and McGill's settings with four different neural network designs of increasing sophistication (MLP, LeNet, VGG, and Xception), and compare their performance to human graphical perception. For this task, we collect new human measures for each elementary task, for the bars and frames rectangles setting, and for a Weber's law point cloud experiment. Further, as CNNs trained on natural images are said to mimic early human vision, we investigate whether using pre-trained natural image weights (via ImageNet~\cite{imagenet}) or weights trained from scratch on elementary graphical perception tasks produces more accurate predictions.
+
+%  and greater generalization
+
+%To perform this evaluation, we parametrize the elementary perceptual tasks and experiments suggested by Cleveland and McGill~\cite{cleveland_mcgill}, and define a set of regression tasks to estimate continuous variables. 
+%We pit four neural networks against human perception: a three-layer multilayer perceptron (MLP), the LeNet 2-layer CNN~\cite{lenet}, the VGG 16-layer CNN~\cite{simonyan_very_deep2014}, and the Xception 36-layer CNN~\cite{xception}. 
 
-Convolutional neural networks (CNNs) have been successfully applied to a wide range of visual tasks, most famously to natural image object recognition~\cite{krizhevsky_imagenet2012, simonyan_very_deep2014, szegedy2015}, for which some claim equivalent or better than human performance. This performance comparison is often motivated by the idea that CNNs model or reproduce the early layers of the visual cortex, even though they do not incorporate many details of biological neural networks or model higher-level abstract or symbolic reasoning~\cite{yamins2016using, hassabis2017neuroscience, human_vs_machine_vision}. While CNN techniques were originally inspired by neuroscientific discoveries, recent advances in processing larger datasets with deeper networks have been the direct results of engineering efforts. Throughout this significant advancement, researchers have aimed to understand why and how CNNs produce such high performance~\cite{goodfellow_book, deeplearning_blackbox2017}, with recent works targeting the systematic evaluation of the visual perception limits of CNNs~\cite{clevr, not_so_clevr}.
+%With these experiments, we describe a ranking defining the ease with which our tested CNN architectures can estimate elementary perceptual tasks, equivalent to Cleveland and McGill's ranking for human perception. 
 
-One fundamental application of human vision is to understand data visualizations. This is a task unlike natural image processing but includes the abstraction of real-world objects and their effects into data, represented with visual marks. As a field, visualization catalogues and evaluates human perception of these marks, such as in the seminal \emph{graphical perception} experiments of Cleveland and McGill~\cite{cleveland_mcgill}. This work describes nine elementary perceptual reasoning tasks, such as position relative to a scale, length, angle, area, and shading density, plus orders their reasoning difficulty. But, with increasing research interest in the machine analysis of graphs, charts, and visual encodings, it seems pertinent to question whether CNNs are able to process these basic graphical elements and derive useful measurements from the building blocks of information visualization.
 
-As such, we reproduce Cleveland and McGill's human perceptual experiments with CNNs, and discuss to what extent they have `graphical perception'. To perform this evaluation, we parametrize the elementary perceptual tasks and experiments suggested by Cleveland and McGill~\cite{cleveland_mcgill}, and define a set of regression tasks to estimate continuous variables. Against human perception, we pit four neural networks: a three-layer multilayer perceptron (MLP), the LeNet 2-layer CNN~\cite{lenet}, the VGG 16-layer CNN~\cite{simonyan_very_deep2014}, and the Xception 36-layer CNN~\cite{xception}. As CNNs trained on natural images are said to mimic layers of the human visual cortex, we investigate whether using weights trained on natural images (via ImageNet~\cite{imagenet}) or weights trained from scratch on elementary graphical perception tasks produces more accurate measurements and greater generalization.
+%We test these four networks across four scenarios presented by Cleveland and McGill~\cite{cleveland_mcgill}: 1) Nine elementary perceptual tasks with increasing parameteric complexity, e.g., length estimation with fixed x, then with varying x and y, then with varying width, including cross-network evaluations testing the generalizability of networks to unseen parameters; 2) The position-angle experiment, which compares judgments of bar charts to pie charts; 3) The position-length experiment, which compares grouped and divided bar charts; and 4) The bars and framed rectangles experiments, where visual cues aid ratio perception. We also investigate whether our CNNs can detect a proportional change in a measurement across scales in relation to Weber's law.
 
-We test these four networks across four scenarios presented by Cleveland and McGill~\cite{cleveland_mcgill}: 1) Nine elementary perceptual tasks with increasing parameteric complexity, e.g., length estimation with fixed x, then with varying x, then with varying width, including cross-network evaluations testing the generalizability of networks to unseen parameters; 2) The position-angle experiment, which compares judgements of bar charts to pie charts, 3) The position-length experiment, which compares grouped and divided bar charts, and 4) The bars and framed rectangles experiments, where visual cues aid ratio perception. We also investigate whether our CNNs can detect a proportional change in a measurement across scales, in relation to Weber's law.
+First, we find that CNNs can more accurately predict quantities than humans for nine elementary perceptual tasks, but only if their training data includes similar stimuli. Second, that our networks can estimate bar chart lengths and pie segment angles with accuracy similar to humans, and that our networks trained more easily on bar charts. Third, that our networks were largely unable to estimate length ratios across five bar chart designs, unlike humans, and that bar chart design type had largely no effect on CNN performance but a significant effect on human performance. Fourth, that framing bars make it no easier for our CNNs to estimate just noticeable differences in length, unlike for humans. Fifth, that some CNNs can solve a difficult Weber's law problem that is beyond human ability. Practically, we find that networks trained from scratch on visualizations perform better than using pre-trained natural images weights, and that current Xception networks perform worse than older VGG networks.
 
-With these experiments, we describe a ranking defining the ease with which our tested CNN architectures can estimate elementary perceptual tasks, as an equivalent to Cleveland and McGill's ranking for human perception. Further, we discuss the implications of our results and derive recommendations for the use of CNNs in perceiving visualizations. We accompany this paper with open source code and our input and results data, both to enable reproduction studies and to spur new machine perception systems more adept at graphical perception: \url{http://rhoana.org/perception}
+A second goal of our work is to help frame the visualization community's discussion of CNNs as it builds towards future computational visualization analysis applications. For this, our findings suggest that CNNs are not currently a good model for human graphical perception, and as such their application to specific visualization problems may be possible but needs care. We discuss this in more detail with respect to designing CNNs for analyzing visualizations. Further, toward this goal, we accompany this paper with open source code and data, both to enable reproduction studies and to spur new machine perception systems more adept at graphical perception: \url{http://vcglab.org/perception}
 
 
 %Cleveland and McGill's work in the 1980s has led to many insights for modern information visualization research such as the identification of elementary perceptional tasks or that bar charts are easier to understand than pie charts.