Commit: Camera ready fixes

ChrisCummins committed Jul 18, 2017
1 parent c53791c commit a91558f
Showing 17 changed files with 4,879 additions and 648 deletions.
Binary file modified: paper.pdf (not shown).
4,722 changes: 4,722 additions & 0 deletions tex/IEEEtran.cls

Large diffs are not rendered by default.

24 changes: 3 additions & 21 deletions tex/abstract.tex
@@ -1,25 +1,7 @@
\begin{abstract}
-Accurate automatic optimization heuristics are necessary for dealing with the
-complexity and diversity of modern hardware and software. Machine learning is a
-proven technique for learning such heuristics, but its success is bound by the
-quality of the features used. These features must be hand crafted by developers
-through a combination of expert domain knowledge and trial and error. This makes
-the quality of the final model directly dependent on the skill and available
-time of the system architect.
+Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quality of the features used. These features must be hand-crafted by developers through a combination of expert domain knowledge and trial and error. This makes the quality of the final model directly dependent on the skill and available time of the system architect.

-Our work introduces a better way for building heuristics. We develop a deep
-neural network that learns heuristics over raw code, entirely without using code
-features. The neural network simultaneously constructs appropriate
-representations of the code and learns how best to optimize, removing the need
-for manual feature creation. Further, we show that our neural nets can transfer
-learning from one optimization problem to another, improving the accuracy of new
-models, without the help of human experts.
+Our work introduces a better way of building heuristics. We develop a deep neural network that learns heuristics over raw code, entirely without using code features. The neural network simultaneously constructs appropriate representations of the code and learns how best to optimize, removing the need for manual feature creation. Further, we show that our neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models without the help of human experts.

-We compare the effectiveness of our automatically generated heuristics against
-ones with features hand-picked by experts. We examine two challenging tasks:
-predicting optimal mapping for heterogeneous parallelism and GPU thread
-coarsening factors. In 89\% of the cases, the quality of our fully automatic
-heuristics matches or surpasses that of state-of-the-art predictive models using
-hand-crafted features, providing on average 14\% and 12\% more performance with
-no human effort expended on designing features.
+We compare the effectiveness of our automatically generated heuristics against ones with features hand-picked by experts. We examine two challenging tasks: predicting optimal mapping for heterogeneous parallelism and GPU thread coarsening factors. In 89\% of the cases, the quality of our fully automatic heuristics matches or surpasses that of state-of-the-art predictive models using hand-crafted features, providing on average 14\% and 12\% more performance with no human effort expended on designing features.
\end{abstract}
6 changes: 6 additions & 0 deletions tex/acks.tex
@@ -0,0 +1,6 @@
+\section*{Acknowledgments}
+
+This work was supported by the UK Engineering and Physical Sciences Research
+Council under grants EP/L01503X/1 (CDT in Pervasive Parallelism), EP/M01567X/1
+(SANDeRs), EP/M015793/1 (DIVIDEND), and EP/P003915/1 (SUMMER). The code and data
+for this paper are available at: \url{https://chriscummins.cc/pact17}.
40 changes: 12 additions & 28 deletions tex/ae.tex
@@ -5,9 +5,7 @@

\subsection{Abstract}

-Our research artifact consists of interactive Jupyter notebooks. The notebooks
-enable users to replicate all experiments in the paper, evaluate results, and
-plot figures.
+Our research artifact consists of interactive Jupyter notebooks. The notebooks enable users to replicate all experiments in the paper, evaluate results, and plot figures.

\subsection{Description}

@@ -16,30 +14,23 @@ \subsubsection{Check-list (Artifact Meta Information)}
{\small
\begin{itemize}
\item {\bf Run-time environment: }Ubuntu Linux and a web browser.
-\item {\bf Hardware: }Users with an NVIDIA GPU may enable CUDA support to
-speed up computation of experiments.
-\item {\bf Output: }Trained neural networks, predictive model evaluations,
-figures and tables from the paper.
-\item {\bf Experiment workflow: }Install and run Jupyter notebook server;
-interact with and observe results in web browser.
-\item {\bf Experiment customization: }Edit code and parameters in Jupyter
-notebooks.
+\item {\bf Hardware: }Users with an NVIDIA GPU may enable CUDA support to speed up computation of experiments.
+\item {\bf Output: }Trained neural networks, predictive model evaluations, figures and tables from the paper.
+\item {\bf Experiment workflow: }Install and run Jupyter notebook server; interact with and observe results in web browser.
+\item {\bf Experiment customization: }Edit code and parameters in Jupyter notebooks.
\item {\bf Publicly available?: }Yes, code and data. See:\\*
\url{https://chriscummins.cc/pact17/}
\end{itemize}
}

\subsubsection{How Delivered}

-A publicly available git repository containing Jupyter notebooks and
-experimental data.
+A publicly available git repository containing Jupyter notebooks and experimental data.


\subsection{Installation}\label{subsec:installation}

-See \url{https://chriscummins.cc/pact17/} for instructions. The \texttt{code}
-directory contains the Jupyter notebooks. Following the build instructions
-described in \texttt{code/README.md}, the full installation process is:
+See \url{https://chriscummins.cc/pact17/} for instructions. The \texttt{code} directory contains the Jupyter notebooks. Following the build instructions described in \texttt{code/README.md}, the full installation process is:

\begin{verbatim}
$ ./bootstrap.sh | bash
@@ -55,11 +46,9 @@ \subsection{Experiment Workflow}\label{subsec:workflow}
\texttt{make run}.
\item In a web browser, navigate to:\\* \texttt{http://localhost:8000}.
\item Select a Jupyter notebook to open it.
-\item Repeatedly press the \emph{play} button (tooltip is ``run cell, select
-below'') to step through each cell of the notebook.
+\item Repeatedly press the \emph{play} button (tooltip is ``run cell, select below'') to step through each cell of the notebook.

-OR select ``Kernel'' $>$ ``Restart \& Run All'' from the menu to run all of
-the cells in order.
+OR select ``Kernel'' $>$ ``Restart \& Run All'' from the menu to run all of the cells in order.
\begin{figure}[H]
\includegraphics[width=\columnwidth]{img/jupyter}
\end{figure}
@@ -68,19 +57,14 @@ \subsection{Experiment Workflow}\label{subsec:workflow}

\subsection{Evaluation and Expected Result}

-Code cells within Jupyter notebooks display their output inline, and may be
-compared against the values in the paper. Expected results are described in text
-cells.
+Code cells within Jupyter notebooks display their output inline, and may be compared against the values in the paper. Expected results are described in text cells.


\subsection{Experiment Customization}

-The experiments are fully customizable. The Jupyter notebook can be edited ``on
-the fly''. Simply type your changes into the cells and re-run them.
+The experiments are fully customizable. The Jupyter notebook can be edited ``on the fly''. Simply type your changes into the cells and re-run them.

-\noindent Note that some of the code cells depend on the values of prior cells,
-so must be executed in sequence. Select ``Kernel'' $>$ ``Restart \& Run All''
-from the menu to run all of the cells in order.
+\noindent Note that some of the code cells depend on the values of prior cells, and so must be executed in sequence. Select ``Kernel'' $>$ ``Restart \& Run All'' from the menu to run all of the cells in order.


\subsection{Notes}
3 changes: 1 addition & 2 deletions tex/fig/nn.tex
@@ -5,8 +5,7 @@
\caption{%
DeepTune neural networks, configured for (a) heterogeneous mapping, and (b)
thread coarsening factor. The design stays almost the same regardless of the
-optimization problem. The only changes are the extra input for (a) and the
-number of nodes in the output layer.%
+optimization problem. The only changes are the extra input for (a) and the size of the output layer.%
}%
\label{fig:nn}
\end{figure}
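
To make the caption's two configurations concrete, here is a minimal sketch of the shared DeepTune design, assuming a Keras-style model. The constructor name, layer sizes, and input names are illustrative assumptions, not the paper's exact hyperparameters; only the structural difference between (a) and (b) is taken from the caption.

    from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Concatenate
    from tensorflow.keras.models import Model

    def deeptune(vocab_size, seq_len, num_outputs, aux_inputs=0):
        tokens = Input(shape=(seq_len,), name="code_tokens")
        x = Embedding(vocab_size, 64, name="embedding")(tokens)  # learned token embedding
        x = LSTM(64, return_sequences=True, name="lstm_1")(x)
        x = LSTM(64, name="lstm_2")(x)                           # summary vector for the kernel
        inputs = [tokens]
        if aux_inputs:  # (a) only: auxiliary inputs join the learned code representation
            aux = Input(shape=(aux_inputs,), name="aux_in")
            x = Concatenate()([x, aux])
            inputs.append(aux)
        x = Dense(32, activation="relu")(x)
        out = Dense(num_outputs, activation="softmax")(x)        # output size varies per task
        return Model(inputs, out)

    # (a) heterogeneous mapping: binary CPU/GPU choice, plus auxiliary inputs.
    mapping_model = deeptune(vocab_size=128, seq_len=1024, num_outputs=2, aux_inputs=2)
    # (b) thread coarsening: one class per candidate coarsening factor.
    coarsening_model = deeptune(vocab_size=128, seq_len=1024, num_outputs=6)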
2 changes: 1 addition & 1 deletion tex/header.tex
@@ -1,4 +1,4 @@
-\title{End-to-end Deep Learning of\\*Optimization Heuristics}
+\title{End-to-end Deep Learning of Optimization Heuristics}
\author{\IEEEauthorblockN{Chris Cummins, Pavlos Petoumenos}
\IEEEauthorblockA{%
School of Informatics\\
7 changes: 1 addition & 6 deletions tex/paper.tex
@@ -11,12 +11,7 @@
\input{sec/related_work}
\input{sec/conclusion}

-\section*{Acknowledgments}
-
-This work was supported by the UK Engineering and Physical Sciences Research
-Council under grants EP/L01503X/1 (CDT in Pervasive Parallelism), EP/M01567X/1
-(SANDeRs), EP/M015793/1 (DIVIDEND), and EP/P003915/1 (SUMMER). The code and data
-for this paper are available at: \url{https://chriscummins.cc/pact17}.
+\input{acks}

\begingroup
\label{bibliography}
9 changes: 8 additions & 1 deletion tex/preamble.tex
@@ -99,4 +99,11 @@
% Artifact Evaluation stamp
\usepackage[firstpage]{draftwatermark}
\SetWatermarkText{\hspace*{5.85in}\raisebox{8.8in}{\includegraphics[]{img/ae-pact.pdf}}}
-\SetWatermarkAngle{0}
+\SetWatermarkAngle{0}
+
+% Squeeze caption sizes
+\usepackage{caption}
+\DeclareCaptionFont{mysize}{\fontsize{9}{9}\selectfont}
+\captionsetup{font=mysize}
+\captionsetup[table]{font={mysize}}
+\captionsetup[figure]{font={mysize}}
35 changes: 4 additions & 31 deletions tex/sec/conclusion.tex
@@ -1,36 +1,9 @@
\section{Conclusions} \label{sec:conclusion}

-Applying machine learning to compile-time and runtime optimizations requires
-generating features first. This is a time consuming process, it needs
-supervision by an expert, and even then we cannot be sure that the selected
-features are optimal. In this paper we present a novel tool for building
-optimization heuristics, DeepTune, which forgoes feature extraction entirely,
-relying on powerful language modeling techniques to automatically build complex
-and effective representations of programs directly from raw source code. The
-result translates into a huge reduction in development effort, improved
-heuristic performance, and more simple model designs.
+Applying machine learning to compiler and runtime optimizations requires generating features first. This is a time-consuming process that needs supervision by an expert, and even then we cannot be sure that the selected features are optimal. In this paper we present a novel tool for building optimization heuristics, DeepTune, which forgoes feature extraction entirely, relying on powerful language modeling techniques to automatically build effective representations of programs directly from raw source code. The result translates into a huge reduction in development effort, improved heuristic performance, and simpler model designs.

-Our approach is fully automated. Using DeepTune, compiler developers no longer
-need to spend months using statistical methods and profile counters to select
-program features via trial and error. It is worth mentioning that we do not
-tailor our model design or parameters for the optimization task at hand, yet we
-achieve performance on par with and in most cases \emph{exceeding} state-of-the-
-art predictive models.
+Our approach is fully automated. Using DeepTune, developers no longer need to spend months using statistical methods and profile counters to select program features via trial and error. It is worth mentioning that we do not tailor our model design or parameters for the optimization task at hand, yet we achieve performance on par with and in most cases \emph{exceeding} state-of-the-art predictive models.

-We used DeepTune to automatically construct heuristics for two challenging
-optimization problems: selecting the optimal execution device for OpenCL
-kernels, and selecting OpenCL thread coarsening factors. In both cases, we
-outperform state-of-the-art predictive models, achieving performance
-improvements of 14\% and 12\%, respectively. We have also shown that the
-DeepTune architecture allows us to exploit information learned from another
-optimization problem to give the learning a boost. Doing so provides up to a
-16\% performance improvement when training using a handful of training programs.
-We suspect that this approach will be useful for other optimization tasks for
-which training programs are a scarce resource.
+We used DeepTune to automatically construct heuristics for two challenging compiler and runtime optimization problems, finding that, in both cases, we outperform state-of-the-art predictive models by 14\% and 12\%. We have also shown that the DeepTune architecture allows us to exploit information learned from another optimization problem to give the learning a boost. Doing so provides up to a 16\% performance improvement when training using a handful of programs. We suspect this approach will be useful in other domains for which training data are scarce.

-In future work, we will extend our heuristic construction approach by
-automatically learning dynamic features over raw data; apply unsupervised
-learning techniques~\cite{Le2012} over unlabeled source code to further improve
-learned representations of programs; and deploy trained DeepTune heuristic
-models to low power embedded systems using optimization and compression of
-neural networks~\cite{Han2015}.
+In future work, we will extend our heuristic construction approach by automatically learning dynamic features over raw data; apply unsupervised learning techniques~\cite{Le2012} over unlabeled source code to further improve learned representations of programs; and deploy trained DeepTune heuristic models to low-power embedded systems using quantization and compression of neural networks~\cite{Han2015}.
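
The transfer learning described in the conclusion can be sketched with the hypothetical deeptune() constructor shown earlier: train on the data-rich task first, then copy the learned language-model layers into the new task's model before training it. The layer names below are the illustrative ones assumed in that sketch, not the paper's.

    # Train on heterogeneous mapping first, then seed the coarsening model
    # with the learned code representation (embedding and LSTM weights).
    source = deeptune(vocab_size=128, seq_len=1024, num_outputs=2, aux_inputs=2)
    # ... fit `source` on the heterogeneous mapping dataset ...
    target = deeptune(vocab_size=128, seq_len=1024, num_outputs=6)
    for name in ("embedding", "lstm_1", "lstm_2"):
        target.get_layer(name).set_weights(source.get_layer(name).get_weights())
    # `target` now starts from transferred weights, so only a handful of
    # coarsening examples are needed to train the remaining layers.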
