Commit: Camera ready fixes

ChrisCummins committed Jul 18, 2017
1 parent c53791c commit a91558f
Showing 17 changed files with 4,879 additions and 648 deletions.
Binary file modified: paper.pdf (not shown).
4,722 changes: 4,722 additions & 0 deletions tex/IEEEtran.cls

Large diffs are not rendered by default.

24 changes: 3 additions & 21 deletions tex/abstract.tex
@@ -1,25 +1,7 @@
\begin{abstract}
-Accurate automatic optimization heuristics are necessary for dealing with the
-complexity and diversity of modern hardware and software. Machine learning is a
-proven technique for learning such heuristics, but its success is bound by the
-quality of the features used. These features must be hand crafted by developers
-through a combination of expert domain knowledge and trial and error. This makes
-the quality of the final model directly dependent on the skill and available
-time of the system architect.
+Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quality of the features used. These features must be hand-crafted by developers through a combination of expert domain knowledge and trial and error. This makes the quality of the final model directly dependent on the skill and available time of the system architect.

-Our work introduces a better way for building heuristics. We develop a deep
-neural network that learns heuristics over raw code, entirely without using code
-features. The neural network simultaneously constructs appropriate
-representations of the code and learns how best to optimize, removing the need
-for manual feature creation. Further, we show that our neural nets can transfer
-learning from one optimization problem to another, improving the accuracy of new
-models, without the help of human experts.
+Our work introduces a better way of building heuristics. We develop a deep neural network that learns heuristics over raw code, entirely without using code features. The neural network simultaneously constructs appropriate representations of the code and learns how best to optimize, removing the need for manual feature creation. Further, we show that our neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models without the help of human experts.

-We compare the effectiveness of our automatically generated heuristics against
-ones with features hand-picked by experts. We examine two challenging tasks:
-predicting optimal mapping for heterogeneous parallelism and GPU thread
-coarsening factors. In 89\% of the cases, the quality of our fully automatic
-heuristics matches or surpasses that of state-of-the-art predictive models using
-hand-crafted features, providing on average 14\% and 12\% more performance with
-no human effort expended on designing features.
+We compare the effectiveness of our automatically generated heuristics against ones with features hand-picked by experts. We examine two challenging tasks: predicting optimal mapping for heterogeneous parallelism and GPU thread coarsening factors. In 89\% of the cases, the quality of our fully automatic heuristics matches or surpasses that of state-of-the-art predictive models using hand-crafted features, providing on average 14\% and 12\% more performance with no human effort expended on designing features.
\end{abstract}
6 changes: 6 additions & 0 deletions tex/acks.tex
@@ -0,0 +1,6 @@
+\section*{Acknowledgments}
+
+This work was supported by the UK Engineering and Physical Sciences Research
+Council under grants EP/L01503X/1 (CDT in Pervasive Parallelism), EP/M01567X/1
+(SANDeRs), EP/M015793/1 (DIVIDEND), and EP/P003915/1 (SUMMER). The code and data
+for this paper are available at: \url{https://chriscummins.cc/pact17}.
40 changes: 12 additions & 28 deletions tex/ae.tex
@@ -5,9 +5,7 @@

\subsection{Abstract}

-Our research artifact consists of interactive Jupyter notebooks. The notebooks
-enable users to replicate all experiments in the paper, evaluate results, and
-plot figures.
+Our research artifact consists of interactive Jupyter notebooks. The notebooks enable users to replicate all experiments in the paper, evaluate results, and plot figures.

\subsection{Description}

@@ -16,30 +14,23 @@ \subsubsection{Check-list (Artifact Meta Information)}
{\small
\begin{itemize}
\item {\bf Run-time environment: }Ubuntu Linux and a web browser.
-\item {\bf Hardware: }Users with an NVIDIA GPU may enable CUDA support to
-speed up computation of experiments.
-\item {\bf Output: }Trained neural networks, predictive model evaluations,
-figures and tables from the paper.
-\item {\bf Experiment workflow: }Install and run Jupyter notebook server;
-interact with and observe results in web browser.
-\item {\bf Experiment customization: }Edit code and parameters in Jupyter
-notebooks.
+\item {\bf Hardware: }Users with an NVIDIA GPU may enable CUDA support to speed up computation of experiments.
+\item {\bf Output: }Trained neural networks, predictive model evaluations, figures and tables from the paper.
+\item {\bf Experiment workflow: }Install and run Jupyter notebook server; interact with and observe results in web browser.
+\item {\bf Experiment customization: }Edit code and parameters in Jupyter notebooks.
\item {\bf Publicly available?: }Yes, code and data. See:\\*
\url{https://chriscummins.cc/pact17/}
\end{itemize}
}

\subsubsection{How Delivered}

-A publicly available git repository containing Jupyter notebooks and
-experimental data.
+A publicly available git repository containing Jupyter notebooks and experimental data.


\subsection{Installation}\label{subsec:installation}

-See \url{https://chriscummins.cc/pact17/} for instructions. The \texttt{code}
-directory contains the Jupyter notebooks. Following the build instructions
-described in \texttt{code/README.md}, the full installation process is:
+See \url{https://chriscummins.cc/pact17/} for instructions. The \texttt{code} directory contains the Jupyter notebooks. Following the build instructions described in \texttt{code/README.md}, the full installation process is:

\begin{verbatim}
$ ./bootstrap.sh | bash
@@ -55,11 +46,9 @@ \subsection{Experiment Workflow}\label{subsec:workflow}
\texttt{make run}.
\item In a web browser, navigate to:\\* \texttt{http://localhost:8000}.
\item Select a Jupyter notebook to open it.
-\item Repeatedly press the \emph{play} button (tooltip is ``run cell, select
-below'') to step through each cell of the notebook.
+\item Repeatedly press the \emph{play} button (tooltip is ``run cell, select below'') to step through each cell of the notebook.

-OR select ``Kernel'' $>$ ``Restart \& Run All'' from the menu to run all of
-the cells in order.
+OR select ``Kernel'' $>$ ``Restart \& Run All'' from the menu to run all of the cells in order.
\begin{figure}[H]
\includegraphics[width=\columnwidth]{img/jupyter}
\end{figure}
@@ -68,19 +57,14 @@ \subsection{Experiment Workflow}\label{subsec:workflow}

\subsection{Evaluation and Expected Result}

-Code cells within Jupyter notebooks display their output inline, and may be
-compared against the values in the paper. Expected results are described in text
-cells.
+Code cells within Jupyter notebooks display their output inline, and may be compared against the values in the paper. Expected results are described in text cells.


\subsection{Experiment Customization}

-The experiments are fully customizable. The Jupyter notebook can be edited ``on
-the fly''. Simply type your changes into the cells and re-run them.
+The experiments are fully customizable. The Jupyter notebook can be edited ``on the fly''. Simply type your changes into the cells and re-run them.

-\noindent Note that some of the code cells depend on the values of prior cells,
-so must be executed in sequence. Select ``Kernel'' $>$ ``Restart \& Run All''
-from the menu to run all of the cells in order.
+\noindent Note that some of the code cells depend on the values of prior cells, and so must be executed in sequence. Select ``Kernel'' $>$ ``Restart \& Run All'' from the menu to run all of the cells in order.


\subsection{Notes}
3 changes: 1 addition & 2 deletions tex/fig/nn.tex
@@ -5,8 +5,7 @@
\caption{%
DeepTune neural networks, configured for (a) heterogeneous mapping, and (b)
thread coarsening factor. The design stays almost the same regardless of the
-optimization problem. The only changes are the extra input for (a) and the
-number of nodes in the output layer.%
+optimization problem. The only changes are the extra input for (a) and the size of the output layer.%
}%
\label{fig:nn}
\end{figure}
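
To make the caption's two configurations concrete, here is a minimal sketch of the shared DeepTune design, assuming a Keras-style model. The constructor name, layer sizes, and input names are illustrative assumptions, not the paper's exact hyperparameters; only the structural difference between (a) and (b) is taken from the caption.

    from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Concatenate
    from tensorflow.keras.models import Model

    def deeptune(vocab_size, seq_len, num_outputs, aux_inputs=0):
        tokens = Input(shape=(seq_len,), name="code_tokens")
        x = Embedding(vocab_size, 64, name="embedding")(tokens)  # learned token embedding
        x = LSTM(64, return_sequences=True, name="lstm_1")(x)
        x = LSTM(64, name="lstm_2")(x)                           # summary vector for the kernel
        inputs = [tokens]
        if aux_inputs:  # (a) only: auxiliary inputs join the learned code representation
            aux = Input(shape=(aux_inputs,), name="aux_in")
            x = Concatenate()([x, aux])
            inputs.append(aux)
        x = Dense(32, activation="relu")(x)
        out = Dense(num_outputs, activation="softmax")(x)        # output size varies per task
        return Model(inputs, out)

    # (a) heterogeneous mapping: binary CPU/GPU choice, plus auxiliary inputs.
    mapping_model = deeptune(vocab_size=128, seq_len=1024, num_outputs=2, aux_inputs=2)
    # (b) thread coarsening: one class per candidate coarsening factor.
    coarsening_model = deeptune(vocab_size=128, seq_len=1024, num_outputs=6)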
2 changes: 1 addition & 1 deletion tex/header.tex
@@ -1,4 +1,4 @@
-\title{End-to-end Deep Learning of\\*Optimization Heuristics}
+\title{End-to-end Deep Learning of Optimization Heuristics}
\author{\IEEEauthorblockN{Chris Cummins, Pavlos Petoumenos}
\IEEEauthorblockA{%
School of Informatics\\
7 changes: 1 addition & 6 deletions tex/paper.tex
@@ -11,12 +11,7 @@
\input{sec/related_work}
\input{sec/conclusion}

-\section*{Acknowledgments}
-
-This work was supported by the UK Engineering and Physical Sciences Research
-Council under grants EP/L01503X/1 (CDT in Pervasive Parallelism), EP/M01567X/1
-(SANDeRs), EP/M015793/1 (DIVIDEND), and EP/P003915/1 (SUMMER). The code and data
-for this paper are available at: \url{https://chriscummins.cc/pact17}.
+\input{acks}

\begingroup
\label{bibliography}
9 changes: 8 additions & 1 deletion tex/preamble.tex
@@ -99,4 +99,11 @@
% Artifact Evaluation stamp
\usepackage[firstpage]{draftwatermark}
\SetWatermarkText{\hspace*{5.85in}\raisebox{8.8in}{\includegraphics[]{img/ae-pact.pdf}}}
-\SetWatermarkAngle{0}
+\SetWatermarkAngle{0}
+
+% Squeeze caption sizes
+\usepackage{caption}
+\DeclareCaptionFont{mysize}{\fontsize{9}{9}\selectfont}
+\captionsetup{font=mysize}
+\captionsetup[table]{font={mysize}}
+\captionsetup[figure]{font={mysize}}
35 changes: 4 additions & 31 deletions tex/sec/conclusion.tex
@@ -1,36 +1,9 @@
\section{Conclusions} \label{sec:conclusion}

-Applying machine learning to compile-time and runtime optimizations requires
-generating features first. This is a time consuming process, it needs
-supervision by an expert, and even then we cannot be sure that the selected
-features are optimal. In this paper we present a novel tool for building
-optimization heuristics, DeepTune, which forgoes feature extraction entirely,
-relying on powerful language modeling techniques to automatically build complex
-and effective representations of programs directly from raw source code. The
-result translates into a huge reduction in development effort, improved
-heuristic performance, and more simple model designs.
+Applying machine learning to compiler and runtime optimizations requires generating features first. This is a time-consuming process that needs supervision by an expert, and even then we cannot be sure that the selected features are optimal. In this paper we present a novel tool for building optimization heuristics, DeepTune, which forgoes feature extraction entirely, relying on powerful language modeling techniques to automatically build effective representations of programs directly from raw source code. The result translates into a huge reduction in development effort, improved heuristic performance, and simpler model designs.

-Our approach is fully automated. Using DeepTune, compiler developers no longer
-need to spend months using statistical methods and profile counters to select
-program features via trial and error. It is worth mentioning that we do not
-tailor our model design or parameters for the optimization task at hand, yet we
-achieve performance on par with and in most cases \emph{exceeding} state-of-the-
-art predictive models.
+Our approach is fully automated. Using DeepTune, developers no longer need to spend months using statistical methods and profile counters to select program features via trial and error. It is worth mentioning that we do not tailor our model design or parameters for the optimization task at hand, yet we achieve performance on par with and in most cases \emph{exceeding} state-of-the-art predictive models.

-We used DeepTune to automatically construct heuristics for two challenging
-optimization problems: selecting the optimal execution device for OpenCL
-kernels, and selecting OpenCL thread coarsening factors. In both cases, we
-outperform state-of-the-art predictive models, achieving performance
-improvements of 14\% and 12\%, respectively. We have also shown that the
-DeepTune architecture allows us to exploit information learned from another
-optimization problem to give the learning a boost. Doing so provides up to a
-16\% performance improvement when training using a handful of training programs.
-We suspect that this approach will be useful for other optimization tasks for
-which training programs are a scarce resource.
+We used DeepTune to automatically construct heuristics for two challenging compiler and runtime optimization problems, finding that, in both cases, we outperform state-of-the-art predictive models by 14\% and 12\%. We have also shown that the DeepTune architecture allows us to exploit information learned from another optimization problem to give the learning a boost. Doing so provides up to a 16\% performance improvement when training using a handful of programs. We suspect this approach will be useful in other domains for which training data are scarce.

-In future work, we will extend our heuristic construction approach by
-automatically learning dynamic features over raw data; apply unsupervised
-learning techniques~\cite{Le2012} over unlabeled source code to further improve
-learned representations of programs; and deploy trained DeepTune heuristic
-models to low power embedded systems using optimization and compression of
-neural networks~\cite{Han2015}.
+In future work, we will extend our heuristic construction approach by automatically learning dynamic features over raw data; apply unsupervised learning techniques~\cite{Le2012} over unlabeled source code to further improve learned representations of programs; and deploy trained DeepTune heuristic models to low-power embedded systems using quantization and compression of neural networks~\cite{Han2015}.
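
The transfer learning described in the conclusion can be sketched with the hypothetical deeptune() constructor shown earlier: train on the data-rich task first, then copy the learned language-model layers into the new task's model before training it. The layer names below are the illustrative ones assumed in that sketch, not the paper's.

    # Train on heterogeneous mapping first, then seed the coarsening model
    # with the learned code representation (embedding and LSTM weights).
    source = deeptune(vocab_size=128, seq_len=1024, num_outputs=2, aux_inputs=2)
    # ... fit `source` on the heterogeneous mapping dataset ...
    target = deeptune(vocab_size=128, seq_len=1024, num_outputs=6)
    for name in ("embedding", "lstm_1", "lstm_2"):
        target.get_layer(name).set_weights(source.get_layer(name).get_weights())
    # `target` now starts from transferred weights, so only a handful of
    # coarsening examples are needed to train the remaining layers.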
