Merge pull request #7 from DEADBEEF/avoid_clash

Avoid clash
michielbaird committed Dec 7, 2012
2 parents 3bb26cd + db9b64d commit ec8e5d06f36cd730ab16d9b41d69e42a25335f42
Showing with 4,120 additions and 0 deletions.
  1. +7 −0 writeup/thesis_michiel/Makefile
  2. +245 −0 writeup/thesis_michiel/background.tex
  3. +751 −0 writeup/thesis_michiel/bibliography.bib
  4. +101 −0 writeup/thesis_michiel/conclusion.tex
  5. +929 −0 writeup/thesis_michiel/design.tex
  6. +482 −0 writeup/thesis_michiel/evaluation.tex
  7. BIN writeup/thesis_michiel/figures/basic_system.dia
  8. BIN writeup/thesis_michiel/figures/basic_system.pdf
  9. +43 −0 writeup/thesis_michiel/figures/basic_system.svg
  10. BIN writeup/thesis_michiel/figures/data_flow.dia
  11. BIN writeup/thesis_michiel/figures/data_flow.pdf
  12. +277 −0 writeup/thesis_michiel/figures/data_flow.svg
  13. BIN writeup/thesis_michiel/figures/design_cycle.dia
  14. BIN writeup/thesis_michiel/figures/design_cycle.pdf
  15. +198 −0 writeup/thesis_michiel/figures/design_cycle.svg
  16. BIN writeup/thesis_michiel/figures/document_test.png
  17. BIN writeup/thesis_michiel/figures/final-interface.png
  18. BIN writeup/thesis_michiel/figures/final-overview.png
  19. BIN writeup/thesis_michiel/figures/final-visual.png
  20. BIN writeup/thesis_michiel/figures/iter1_impl.pdf
  21. +83 −0 writeup/thesis_michiel/figures/iter1_impl.svg
  22. BIN writeup/thesis_michiel/figures/kepler_file.png
  23. BIN writeup/thesis_michiel/figures/network layout.dia
  24. +127 −0 writeup/thesis_michiel/figures/network layout.svg
  25. BIN writeup/thesis_michiel/figures/networklayout.pdf
  26. BIN writeup/thesis_michiel/figures/offload.dia
  27. BIN writeup/thesis_michiel/figures/offload.pdf
  28. +44 −0 writeup/thesis_michiel/figures/offload.svg
  29. BIN writeup/thesis_michiel/figures/provenance_edges.png
  30. BIN writeup/thesis_michiel/figures/site.dia
  31. BIN writeup/thesis_michiel/figures/site.pdf
  32. +110 −0 writeup/thesis_michiel/figures/site.svg
  33. BIN writeup/thesis_michiel/figures/site_view_impl2.png
  34. BIN writeup/thesis_michiel/figures/task_control_impl2.png
  35. BIN writeup/thesis_michiel/figures/task_edit_impl2.png
  36. BIN writeup/thesis_michiel/figures/task_list.dia
  37. BIN writeup/thesis_michiel/figures/task_list.pdf
  38. +138 −0 writeup/thesis_michiel/figures/task_list.svg
  39. BIN writeup/thesis_michiel/figures/task_model_kepler.png
  40. BIN writeup/thesis_michiel/figures/task_overview.dia
  41. BIN writeup/thesis_michiel/figures/task_overview.pdf
  42. +67 −0 writeup/thesis_michiel/figures/task_overview.svg
  43. BIN writeup/thesis_michiel/figures/task_overview_impl2.png
  44. BIN writeup/thesis_michiel/figures/user_impl2.dia
  45. BIN writeup/thesis_michiel/figures/user_impl2.dia.autosave
  46. BIN writeup/thesis_michiel/figures/user_impl2.pdf
  47. +107 −0 writeup/thesis_michiel/figures/user_impl2.svg
  48. BIN writeup/thesis_michiel/figures/workflow.png
  49. BIN writeup/thesis_michiel/figures/zamani_impl.png
  50. BIN writeup/thesis_michiel/figures/zamani_workflow.pdf
  51. BIN writeup/thesis_michiel/images/cslogo.png
  52. +210 −0 writeup/thesis_michiel/intro.tex
  53. 0 writeup/thesis_michiel/question.tex
  54. BIN writeup/thesis_michiel/survey.pdf
  55. +120 −0 writeup/thesis_michiel/thesis.tex
  56. +81 −0 writeup/thesis_michiel/title.tex
@@ -0,0 +1,7 @@
+ pdflatex thesis
+ bibtex thesis
+ pdflatex thesis
+ pdflatex thesis
+ pdftotext thesis.pdf - | wc -w
@@ -0,0 +1,245 @@
+ Workflow management systems define a complex process in terms of well-defined
+ tasks and coordinate process completion \cite{1245778}. Automated
+ workflow management has been in wide use across various disciplines since
+ the concept was formalised in 1996 \cite{springerlink:10.1007/BF00136712}.
+ Successful systems have been implemented in various fields, including
+ banking and pharmaceuticals
+ \cite{Brahe:2007:SWW:1316624.1316661,5407993}.
+ Workflow management has been shown to be very successful in the sciences, as
+ the same scientific process can easily be repeated on a different set of
+ data\cite{4721191}. This aids reproducibility and saves time. It is achieved
+ by abstracting the operations in the flow so that they can be handled
+ automatically.
+ Geomatics is the field concerned with the organisation,
+ representation and processing of geographic data, for the purpose of
+ querying it and making decisions based on it
+ \cite{DiMartino:2007:TAG:1341012.1341081}. Workflows in Geomatics are
+ highly distributed, and the data they operate on is large and
+ diverse. Workflow management within Geomatics has been considered and
+ solutions have been proposed, but these have not been implemented or
+ evaluated\cite{Migliorini:2011:WTG:1999320.1999356}.
+ This chapter presents a discussion of Workflow Systems. It first gives
+ an overview of what these systems are and briefly looks into their history.
+ This is followed by a review of the factors that have influenced
+ the success and failure of these systems.
+ Section~\ref{geo:data} reviews the data and processing involved within
+ the field of Geomatics.
+ This is followed in Section~\ref{example:sys} by a review of existing implementations
+ of Workflow Management Systems, namely Kepler, Trident and Taverna. A variety of case studies
+ are presented in Section~\ref{casestudy}.
+A workflow management system consists of definitions of how a set of tasks
+should be executed \cite{springerlink:10.1007/BF00136712,vanderAalst2002125}.
+The overall procedure is defined by the following components:
+\begin{inparaenum}[(i)] \item actors, \item roles, \item responsibilities and
+obligations, \item tasks, \item activities,\item conceptual structures and
+\item resources.\end{inparaenum}
+A real-life problem or task can then be broken up into these components in
+such a way that the tasks form a flow network. These tasks then connect to
+the actors and resources via the other
+components\cite[p.~4]{Taylor:2006:WES:1196459}. This allows tasks to be
+executed efficiently in a distributed manner.
+The initial implementations of workflow systems, however, failed almost
+immediately. The systems were too rigid and were unable to accommodate the
+high levels of change that their users required.
+These changes came from a number of sources, including ill-specified
+initial problems, changes in actors or resources, exceptions
+and new requirements. Adaptive workflow systems were proposed to solve this
+problem by providing a mechanism for allowing change in the
+system\cite{vanderAalst2002125}. This allows processes to be extended, replaced
+or re-ordered. It also adds the ability to change already running tasks by
+providing restart, transfer and proceed options.
+Scientific workflow management has also been very successful in the way
+experiments are defined and, more importantly, reused. Another benefit that was
+quickly discovered was that it allowed researchers to trade workflows,
+making the replication of results much easier than it had been
+previously\cite{4721191}. Keys to this success were: fitting the workflow systems
+to the researchers; responding quickly when new features were required;
+listening to user input; and making sharing of workflows as easy as possible.
+Workflow systems have also been applied in fields that operate on large data
+sets, as would be the case for problems in Geomatics, and were found to
+work well for managing the processing of this data. When the concept was applied
+to Observational Astrophysics, it was found that it could be used to
+identify bottlenecks that could be optimised \cite{Aragon:2009:WMH:1529282.1529491}. Further, it was used to
+automatically ensure local access to large files that needed to be processed.
+\section{Geographic Data\label{geo:data}}
+Geomatics concerns itself with the collection, organisation and querying of
+geographic data \cite{DiMartino:2007:TAG:1341012.1341081}. This data includes,
+but is not limited to, landscapes, coordinate data, building models,
+statistics, pictures, textures and routes. This is a very broad set of data,
+varying from very large to very small. Because of this variation, however,
+there is no uniform method to deal with the data efficiently.
+The processing of this data ranges from manual processing by humans to automated
+processing in software \cite{DiMartino:2007:TAG:1341012.1341081}. Various Web applications have been
+written to facilitate the tasks that need to be accomplished. This software is
+known as WebGIS and is becoming more popular with scientists, which indicates
+a strong shift toward Web-based services even within the field.
+A key realisation with the usage of this data is that the same data is used
+across various applications to create various levels of
+abstraction\cite{ElAdnani:2001:MLF:512161.512177}. The core data is seldom
+changed; instead, a new abstraction layer is added on top of it. The data can
+therefore be thought of as a graph in which nodes represent either data or
+abstraction elements, and edges represent the functions or tasks required to
+create each abstraction, forming a set of topological relationships.
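The graph view of geographic data described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the thesis: the element and task names (`raw_scans`, `register_scans`, `floor_plan` and so on) are hypothetical, loosely modelled on the raw scans and photographs processed in the Zamani Project, and the sketch assumes Python 3.9+ for the standard-library `graphlib` module. Each derived element records the task that produces it and the elements it is derived from; a topological sort then gives an order in which the tasks can run, and a simple traversal finds which abstractions depend on a given element.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical derivation graph. Each derived element maps to
# (task that creates it, elements it is derived from).
# Core data such as "raw_scans" and "photographs" has no producing task.
DERIVATIONS = {
    "point_cloud": ("register_scans", ["raw_scans"]),
    "mesh": ("build_mesh", ["point_cloud"]),
    "textured_model": ("apply_textures", ["mesh", "photographs"]),
    "floor_plan": ("extract_plan", ["point_cloud"]),
}


def task_order(derivations):
    """Return the tasks in an order that respects every derivation edge."""
    # TopologicalSorter expects {node: predecessors}; nodes that only
    # appear as predecessors (the core data) are added automatically.
    graph = {node: set(sources) for node, (_, sources) in derivations.items()}
    ordered_nodes = TopologicalSorter(graph).static_order()
    # Skip core data nodes, which are not produced by any task.
    return [derivations[node][0] for node in ordered_nodes if node in derivations]


def downstream(derivations, changed):
    """Return every derived element built (directly or not) from `changed`."""
    affected, frontier = set(), [changed]
    while frontier:
        current = frontier.pop()
        for node, (_, sources) in derivations.items():
            if current in sources and node not in affected:
                affected.add(node)
                frontier.append(node)
    return affected


print(task_order(DERIVATIONS))
print(sorted(downstream(DERIVATIONS, "point_cloud")))
```

Adding a new abstraction, such as the hypothetical `floor_plan` here, only adds a node and an edge; the core data (`raw_scans`, `photographs`) is never modified, which mirrors the layering described in the text.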
+There are various products available that can compose scientific workflows.
+\emph{The Trident workbench} \cite{Simmhan:2009:BTS:1673063.1673121} is an open
+source workflow management system developed by Microsoft Research that also
+provides middleware services and a graphical composition interface. Trident composes
+workflows of control and data flows from built-in and user-defined activities
+and nested subflows.
+The flows are represented using XOML, an XML-based specification, while the
+activities are stored as a set of sub-routines\cite{Simmhan2011790}. Trident
+can be used on a local system, remote systems and even clusters. Queries on
+the system can be performed using LINQ.
+\emph{Kepler} is another scientific workflow management system that
+provides workflow design and execution. Actors are designed to perform
+independent tasks and can be either atomic or composite
+\cite{Wang:2009:KHG:1645164.1645176}. Composite actors (subflows) consist of
+multiple atomic actors bundled together. Actors consume data and produce
+output in the form of tokens, which they communicate to each other via links. The
+order of execution and the links are defined by an independent entity called
+the director. As a consequence, the same workflow can be executed either
+sequentially or in parallel. Kepler effectively separates the workflow from
+its execution, allowing for easy batch execution. Actors can easily be exported
+and shared. Kepler is very popular due to its adaptability and easy
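The separation between actors and the director can be illustrated with a small Python sketch. It is not Kepler's API (Kepler itself is a Java application); the classes `Actor`, `Director`, `SequentialDirector` and `ParallelDirector` are hypothetical and model only atomic actors that emit one token per firing. What it shows is the point made above: the workflow (actors and their links) is defined once, and the director alone decides whether it runs sequentially or in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


class Actor:
    """Hypothetical atomic actor: consumes input tokens, emits one output token."""

    def __init__(self, name: str, fn: Callable[..., Any], inputs: List[str]):
        self.name = name
        self.fn = fn
        self.inputs = inputs  # links: names of actors whose tokens are consumed


class Director:
    """Decides when actors fire; subclasses decide how a batch is fired."""

    def run(self, actors: List[Actor]) -> Dict[str, Any]:
        tokens: Dict[str, Any] = {}
        pending = list(actors)
        while pending:
            # An actor is ready once every linked input token exists.
            ready = [a for a in pending if all(i in tokens for i in a.inputs)]
            if not ready:
                raise ValueError("workflow has a cycle or an unconnected link")
            tokens.update(self.fire(ready, tokens))
            pending = [a for a in pending if a not in ready]
        return tokens

    def fire(self, ready: List[Actor], tokens: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class SequentialDirector(Director):
    def fire(self, ready, tokens):
        # Fire the ready actors one after another.
        return {a.name: a.fn(*(tokens[i] for i in a.inputs)) for a in ready}


class ParallelDirector(Director):
    def fire(self, ready, tokens):
        # Ready actors never depend on each other, so fire them concurrently.
        with ThreadPoolExecutor() as pool:
            futures = {a.name: pool.submit(a.fn, *(tokens[i] for i in a.inputs))
                       for a in ready}
            return {name: f.result() for name, f in futures.items()}


# One workflow definition, executed under two different directors.
workflow = [
    Actor("source", lambda: [3, 1, 2], []),
    Actor("sorted", sorted, ["source"]),
    Actor("total", sum, ["source"]),
    Actor("report", lambda s, t: f"{s} sums to {t}", ["sorted", "total"]),
]

for director in (SequentialDirector(), ParallelDirector()):
    print(type(director).__name__, director.run(workflow)["report"])
```

Because the two directors differ only in `fire`, switching the execution strategy never touches the `workflow` list; this is the separation of a workflow from its execution that the paragraph credits for easy batch runs.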
+ \emph{Taverna} is a scientific workbench that supports application-level
+workflow and does not focus on scheduling as much as other systems do\cite{4721191}. Taverna
+has a strong focus on workflow sharing, and is popular partly because of
+\emph{myExperiment}, a social network designed to facilitate workflow sharing among
+scientists. Services are linked to the model to execute
+the various tasks. Because services are easily added, Taverna can utilise all
+the services available to a client to facilitate the flow. The
+Taverna language is a simple data-flow language called the Simple Conceptual
+Unified Flow Language (SCUFL), which can be encoded in XML.
+In order for these workbenches to be successful, there needs to be a
+high level of interoperability between the workflow management and the services
+that are required \cite{Shegalov:2001:XWM:767132.767139}. However, building this
+interoperability into the services as a core component has a relatively high
+chance of failure; because the risk is so high, it is not typically done. A cheaper
+alternative is
+providing middleware that can wrap around the service to provide the required
+This need for interoperability has led to the popularisation of SOA (Service-Oriented
+Architecture) \cite{Sanders:2008:SSA:1400549.1400595}. It should be
+noted that SOA is \emph{not} an implementation, but rather an
+\emph{Architectural Model}; SOA refers to a collection of loosely coupled
+services that each carry out a particular process. Each service should
+have a well-defined interface with self-contained functionality. It should
+allow other applications or services to use this functionality without knowing
+the underlying technical details. These services should be hidden from the
+end-user and their usage should preferably be platform-independent.
+Although the concept has been around since the 1970s, it has only recently
+gained favour with the rise of Web services. Web services are software components that run on the
+Internet through XML standards-based
+interfaces\cite{Tai:2004:CCW:1045658.1045680}. Each service provides a
+functional description using the \emph{Web Services Description Language} (WSDL).
+This description provides the supported operations, as well as the definition
+of the input and output messages.
+By using these concepts, a workflow system can be built that automatically uses
+these Web Services to facilitate both the data and control flow through well-defined
+interfaces in standards such as XML/JSON \cite{Shegalov:2001:XWM:767132.767139}.
+With the advancement of WebGIS, many
+Web Services that facilitate Geomatics processing already exist.
+\section{Case Studies\label{casestudy}}
+This section looks at three instances where workflow management systems
+were implemented and used, covering both business and
+scientific applications.
+ \subsection*{Danske Bank}
+ The workflow management system at \emph{Danske Bank} was incrementally
+ implemented as the bank moved away from a manual
+ system\cite{Brahe:2007:SWW:1316624.1316661}.
+ This system was developed as an in-house solution when the manual system
+ could no longer cope. Several lessons were learnt that are applicable
+ to other workflow systems. When work was divided purely from an
+ efficiency point of view, the workers became complacent, as they felt that
+ they did not understand the overall mechanism and were not
+ involved. The bank also discovered that the system did not handle change very
+ well. Change proved both inevitable and expensive, so the system had to be
+ adapted to handle it. The success of the system is mainly
+ attributed to interoperability and the close relationship between the
+ users and the developers.
+ \subsection*{OrthoSearch}
+ \emph{OrthoSearch} is a workflow, built on \emph{Kepler}, that is
+ designed to work on data in the field of Bioinformatics
+ \cite{daCruz:2008:OSW:1363686.1363983}.
+ It was implemented in \emph{Kepler} because Kepler addressed its
+ developers' requirements, including: \begin{inparaenum}[(i)] \item workflow
+ definition and design; \item workflow execution control; \item fault
+ tolerance; \item intermediate data management; and \item data provenance
+ support. \end{inparaenum}
+ Although the system was not without its hiccups and changes, the
+ integration with Kepler increased the overall
+ productivity of the workflow.
+ \subsection*{Sunfall}
+ \emph{Sunfall} is a workflow system that was created to assist in locating
+ supernovae in large amounts of telescope data\cite{Aragon:2009:WMH:1529282.1529491}.
+ Sunfall consists of four components: \begin{inparaenum}[(i)]\item Search, \item
+ Workflow Status Monitor, \item Data Forklift and \item Supernova Warehouse.\end{inparaenum}
+ The Search component is responsible for coordinating the tasks involved in
+ finding supernovae within the data. The system
+ also has to deal with an enormous amount of data, up to 100\,TB. The
+ data movement is carried out by the \emph{Data Forklift} component.
+ The project used a parallel file system to aid data replication, and
+ middleware to interface with legacy software.
+ Sunfall was deemed a great success, as it improved the
+ efficiency of the process and identified bottlenecks within it.
+This chapter reviewed the relevant literature on Workflow Management Systems
+and on the variety of data and processing within Geomatics. This has provided the necessary
+insight to determine which components would be required to build a Workflow
+System for the Zamani Project.
+The process of creating the heritage artifacts from the raw scans and photographs
+generates a large amount of varied data. A Workflow Management System would need
+to cater specifically for this constraint, similar to the large-scale
+data involved in the implementation of both \emph{OrthoSearch} and \emph{Sunfall}.
+Since the tasks are, however, a mixture of automated and manual tasks, such a
+system would map to a more grid-based approach, as was done
+at \emph{Danske Bank}. Middleware would need to be provided in order for the system
+to integrate uniformly with the applications required throughout the process\cite{Montella:2007:UGC:1272980.1272995}.
+It has been shown that this kind of model can be effectively automated using a Workflow Management System\cite{Withana:2010:VWE:1851476.1851586}.