Add frameworks paper abstract
Correct some small mistakes in the text as well
graeme-a-stewart authored and hegner committed Oct 15, 2018
1 parent 31e0801 commit e3e9b9c
Showing 1 changed file with 44 additions and 28 deletions.
72 changes: 44 additions & 28 deletions CWP/papers/HSF-CWP-2017-08_framework/latex/df_fwk_summary.tex
@@ -4,7 +4,7 @@
% original source of this text was:
% https://docs.google.com/document/d/1DYEHGgB3fanhpYRDJblE9NicX0OK7BNsjTCrq7gY-sU/edit#
%

% JHEP preprint template
\documentclass[12pt,a4paper]{article}
\usepackage{jheppub}
@@ -28,7 +28,22 @@
\newcommand{\ifixme}[1]{{\slshape\color{cyan}\textbf{FIXME: } #1}}
\newcommand{\etc}{\textit{etc}}

\abstract{Data processing frameworks are an essential part of HEP
experiments' software stacks. Frameworks provide a means by which code
developers can undertake the essential tasks of physics data processing,
accessing relevant inputs and storing their outputs, in a coherent way without
needing to know the details of other domains. Frameworks provide essential
core services for developers and help deliver a configurable working
application to the experiments' production systems.
Modern HEP processing frameworks are in the process of adapting to a new
computing landscape dominated by parallel processing and heterogeneity,
which pose many questions regarding enhanced functionality and scaling
that must be faced without compromising the maintainability of the code.
In this paper we identify a program of work that can help further clarify
the key concepts of frameworks for HEP and then spawn R\&D activities that
can focus the community's efforts in the most efficient manner to address
the challenges of the upcoming experimental program.
}

\begin{document}

@@ -40,21 +55,22 @@
\end{tabular*}
\vspace{2.0cm}

\title{HEP Software Foundation Community White Paper Working Group -- Data Processing Frameworks}

\author{HEP Software Foundation:}
\author[d]{Paolo Calafiura}
\author[a,1]{Benedikt Hegner}
\author[c]{Chris Jones}
\author[b]{Michel Jouvin}
\author[c,1]{Jim Kowalkowski}
\author[c,1]{Elizabeth Sexton-Kennedy}
\author[a]{Graeme A Stewart}
\author[d]{Several Others - Charles, Marco, ?}

\affiliation[a]{CERN, Geneva, Switzerland}
\affiliation[b]{LAL, Université Paris-Sud and CNRS/IN2P3, Orsay, France}
\affiliation[c]{Fermi National Accelerator Laboratory, Batavia, Illinois, USA}
\affiliation[d]{Lawrence Berkeley National Laboratory, Berkeley, CA, USA}
\affiliation[e]{Other Places}
\affiliation[1]{Paper Editor}

@@ -71,7 +87,7 @@ \section{Introduction}
formulate common data processing framework solutions for the future.

The time periods of interest for this document are those of DUNE and
the HL-LHC, which will deliver on the order of 50~PB of event data per
year per
experiment. The results of the proposed R\&D ought to be used for
building the final software systems that will be utilized in
commissioning and operations of these experiments and the processing
@@ -83,7 +99,7 @@ \section{Scope and Challenges}
Frameworks in HEP are used for the collaboration-wide data processing
tasks of triggering, reconstruction, and simulation, as well as other tasks that
subgroups of an experiment collaboration are responsible for, such as
detector alignment and calibration.
Providing common framework services and libraries that will meet the
compute and data needs of the HL-LHC experiments and the Intensity Frontier
experiments is a large challenge given the multi-decade legacy in this
@@ -96,10 +112,10 @@ \section{Scope and Challenges}
\item
Changes needed in the programming model to handle the massive
parallelism that will be present throughout all
layers in the available computing facilities. This is necessary
because of the ever-increasing availability of specialized compute
resources, including GPGPUs, Tensor Processing Units (TPUs),
tiered memory systems integrated with storage, and ultra
high-speed network interconnects.
\item
Challenges related to advanced detector technology, like finer
@@ -191,12 +207,12 @@ \section{Current Practice}
and ALICE are now developing a new framework, which is called
O2~\cite{O2}. At the time of writing, most major frameworks support
basic parallelisation, both within and across events, based on a
task-based model~\cite{Jones:2015soc,Clemencic:2015paa}. O2 already
includes additional multi-node setups and communication.
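
As a concrete illustration of the intra-event side of this model, the
sketch below expresses two independent reconstruction steps as tasks
that a scheduler may run concurrently, followed by a step that depends
on both. It assumes Intel TBB as the task library, on which several of
the frameworks mentioned here build; all other names are purely
illustrative and are taken from no particular experiment's code.

\begin{verbatim}
// Minimal sketch of intra-event task parallelism, assuming Intel TBB.
// A framework would call executeEvent() for many events concurrently;
// within one event, independent steps become tasks.
#include <tbb/task_group.h>

struct Event { int id; /* data products would live here */ };

void runTrackerReco(Event&) { /* pattern finding in the tracker */ }
void runCaloReco(Event&)    { /* calorimeter clustering */ }
void runMuonReco(Event&)    { /* step consuming both outputs */ }

void executeEvent(Event& evt) {
    tbb::task_group g;
    // Tracker and calorimeter reconstruction share no data
    // dependency, so they can run as parallel tasks.
    g.run([&] { runTrackerReco(evt); });
    g.run([&] { runCaloReco(evt); });
    g.wait();          // both inputs are ready ...
    runMuonReco(evt);  // ... before the dependent step runs
}
\end{verbatim}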

The frameworks provide the necessary functionality like I/O,
scheduling, configuration, logging, etc.\ to support the execution of
these processing components. The aforementioned components provide
functionalities like pattern finding in a certain sub-detector or the
high-level identification of a given particle type. This layout allows
independent development and high flexibility in the usage of physics
@@ -262,17 +278,17 @@ \section{Current Practice}
tradition starting in the beginning of Run 2, by utilizing all cores
on one virtual node in one process space using threading. ATLAS is
currently using a multi-process fork-and-copy-on-write solution to
remove the constraint of one core/process, and is now moving to the
multithreading approach too. Both experiments were
driven to solve this problem by the ever-growing need for more
memory per process brought on by the increasing complexity of LHC
events. Current practice manages system-wide (or facility-wide)
scaling by dividing up datasets, generating a framework application
configuration, and scheduling jobs on nodes/cores to utilize all
available resources. Given anticipated changes in hardware
(heterogeneity, connectivity, memory, storage) available at large
computing facilities, the interplay between workflow/workload
management systems and framework applications needs to be carefully
examined. It may be advantageous to permit framework applications (or
systems) to span resources, permitting them to be first-class
participants in the business of scaling within a facility. O2 provides
@@ -281,7 +297,7 @@

\section{Roadmap}
\label{sec:roadmap}
Forward-looking work is underway as part of projects funded through government agencies, laboratories, and collaborations. We want to be sure that relevant ideas and accomplishments are known, and that the groups doing this work have a place to report to and receive feedback for everyone’s benefit.
To organize the community, regular working group meetings should be established on a bi-monthly basis, as was done with the Concurrency Forum. Face-to-face workshops after at least the first and the third year can be co-hosted with events like CHEP and/or the WLCG workshops. A future planning workshop for transforming the results of the R\&D activities into a full development and deployment project plan should happen on the five-year timescale.

\subsection{One-year goals}
@@ -301,15 +317,15 @@ \subsection{One-year goals}
\paragraph{Concept refinement} Jointly identify the key abstractions that
make frameworks effective for HEP, in more detail than can be described in
this paper. Identify and describe where individual frameworks have
similarly or uniquely implemented these concepts. It is important to
describe how these choices are connected to the concrete use-cases. A
publishable paper should come of this that will serve as an agreed-upon
guide for where we can hope to go.

\paragraph{Technology investigations} There are four key areas that
ought to be explored to help determine future direction with regard
to software technology. The areas are: (1) task-based programming tools,
(2) inter-process and inter-node communication tools, (3) parallel number
crunching libraries, and (4) framework workflow management.
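
As a toy illustration of area (2), the sketch below hands event numbers
from one process to workers using MPI. MPI is only one candidate among
the communication tools to be surveyed; message-queue libraries such as
ZeroMQ, used by O2 through FairMQ, are another. The example is a sketch
under those assumptions, not a recommendation.

\begin{verbatim}
// Toy inter-process/inter-node event distribution with MPI.
// Build with an MPI compiler wrapper (e.g. mpicxx), run with mpirun.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        // Trivial "event builder": one event number per worker.
        for (int worker = 1; worker < size; ++worker) {
            int eventId = 1000 + worker;
            MPI_Send(&eventId, 1, MPI_INT, worker, 0, MPI_COMM_WORLD);
        }
    } else {
        int eventId = -1;
        MPI_Recv(&eventId, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        std::printf("rank %d processing event %d\n", rank, eventId);
    }
    MPI_Finalize();
    return 0;
}
\end{verbatim}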

\paragraph{Functional programming} Conduct a study describing where we
@@ -335,13 +351,13 @@ \subsection{One-year goals}
terms used to communicate and express how tasks are described and
carried out within a framework. This includes not only expressing data
dependencies, but also resource preferences and constraints, such as
GPUs. The goal here is to provide enough information for a group to
take on development of domain-specific library components and tools
that will increase the efficiency of carrying out physics. A good example
is how ML toolkits have evolved over the past few years. The
abstractions that have been developed have greatly increased productivity
and growth in the ML space; for example, the abstractions in TensorFlow
allow a coding of the matrix algebra that then gets remapped internally to
match the shape of the data being operated on. The user only has to take
care of getting the domain science functions correct.
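
The sketch below shows one hypothetical shape such a vocabulary could
take: each task declares the data products it consumes and produces,
together with a resource preference, leaving graph construction and
placement to the scheduler. The API is invented purely for illustration
and corresponds to no existing framework.

\begin{verbatim}
// Hypothetical declarative task description; every name is invented.
#include <functional>
#include <string>
#include <vector>

enum class Resource { CPU, GPU };

struct TaskDecl {
    std::string              name;
    std::vector<std::string> consumes;  // input data products
    std::vector<std::string> produces;  // output data products
    Resource                 prefers;   // scheduling hint, not a demand
    std::function<void()>    body;
};

int main() {
    // A scheduler could build the dependency graph from
    // consumes/produces and place GPU-preferring tasks on
    // accelerator-equipped resources.
    std::vector<TaskDecl> tasks = {
        {"clusterize", {"RawHits"},  {"Clusters"}, Resource::GPU,
         [] { /* launch clustering kernel */ }},
        {"fitTracks",  {"Clusters"}, {"Tracks"},   Resource::CPU,
         [] { /* run the track fit */ }},
    };
    (void)tasks;  // a real framework would hand these to its scheduler
    return 0;
}
\end{verbatim}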
