diff --git a/writeup/proposal/oct.png b/writeup/proposal/oct.png new file mode 100644 index 0000000..3196857 Binary files /dev/null and b/writeup/proposal/oct.png differ diff --git a/writeup/proposal/proposal.tex b/writeup/proposal/proposal.tex index 6bf18a2..b6344bb 100644 --- a/writeup/proposal/proposal.tex +++ b/writeup/proposal/proposal.tex @@ -5,6 +5,7 @@ \usepackage{amsmath} \usepackage{amsfonts} \usepackage{amssymb} +\usepackage[pdftex]{graphicx} \begin{document} \author{Timothy Trewartha \\ \and @@ -21,18 +22,24 @@ \section{Project Description} The Zamani project, started by the UCT Department of Geomatics, aims to preserve African cultural heritage by documenting heritage sites and producing laser scanned models. Currently they have documented about 40 sites in 12 African -countries with close to 100 models. Some of the models are very detailed containg -billions of points. With a fast growing volume of data the Department of Geomatics -is facing several challenges ranging from basic storage of the data to viewing and -interacting with the large models in real-time. This project aims to investigate -and develop solutions to these problems. Additionally the project will investigate +countries with close to 100 models. Some of the models are very detailed, containing +billions of points. With a fast-growing volume of data, the Department of Geomatics +is facing several challenges, ranging from basic storage of the data to viewing and +interacting with the large models in real-time. + +There are two main components to this project. The first is to develop a system to +enable real-time interaction with the models as well as real-time streaming of the +models from a central server. The second is to investigate various ways of automating GIS workflow with a view to developing a software tool -to enable more efficient manipulation of GIS data. +to enable more efficient manipulation of GIS data. 
These two distinct components +are related in that the real-time streaming infrastructure will be integrated into +the software developed for workflow automation. \section{Problem Statement and Research Questions} +This project aims to tackle two key issues faced by the Geomatics Department: the inability to interact with large models in real-time as well as the lack of tools enabling workflow to be automated. As such the following key research questions have been proposed. \subsection{Is it feasible to support real time viewing of models containing billions of points?} The UCT Department of Geomatics has indicated that they have difficulties -handling the size of some of their models. These laser scanned models of +handling the sizes of some of their models. These laser scanned models of cultural heritage sites are often very large, some of them containing over 8 billion points. Given this vast scale of data, traditional viewing methods and the current hardware and software systems are not able to cope. @@ -42,89 +49,93 @@ \subsection{Is it feasible to support real time viewing of models containing bil original detail. This level of detail is often necessary for cultural heritage sites in order to view details such as cracks and flaws, with a view to preserving the site and preventing damage. -\subsection{Can we provide a central storage repository for the data without the server becoming a bottleneck?} -The Geomatics Department also mentioned that they have no central -storage location for their many models. Some models are stored on -a server, but the server does not have sufficient storage capacity -for all their models. Some models are stored on client machines so -that the data is immediately available to be viewed and manipulated -as required. Lastly, there are some models that are simply stored on -external hard drives if there is no space on the server, and if they -are not currently being used. 
-This leads to many issues concerning -data consistency, data availability and data safety. Ideally one should -have a central storage server, but this could introduce a significant -bottleneck to data access. -\subsection{Can GIS research be implemented with an automated workflow system?} -Geographic Information Science involves the capture, storage, manipulation and analysis -and management of geographic data. This data is very diverse and as such has to be handled + +This project will investigate the feasibility of real-time interaction with the Zamani models in their full detail. Answering this research question in the affirmative would enable exploration of these models interactively without decreasing the resolution beforehand. +\subsection{How effective is an automated workflow system in the GIS context?} +A LITTLE ON THE VAGUE SIDE +Developing geographic information systems involves the capture, storage, manipulation, analysis +and management of geographic data. This data is very diverse and, as such, has to be handled in quite diverse ways. This data gets abstracted into various forms. This presents a rather unique challenge in managing the data as it could be used by any one of the research -staff at any point in the process. This data movement is laborious and could benefit from -from automisation. +staff at any point in the process. This data movement is laborious and could benefit +from automation. Workflow Management Systems aim to decompose complicated projects and processes into small atomic chunks. This decomposition can then be optimised to improve the efficiency. -GIS research projects are generally done with multiperson teams where the work is +GIS research projects generally have multi-person teams where the work is done in a parallel fashion. Under these conditions workflow management systems -are optimal. +are optimal [[REF]]. The aim is to provide a workflow management system that is applicable to GIS projects. 
-This system should be able to: interface with the current systems; track and -manage the workflow; provide local data availability and content delivery; and -increases overall efficiency within the discipline. +This system should be able to: interface with the current systems, track and +manage the workflow, provide local data availability and content delivery, and +increase overall efficiency within the discipline. \section{Procedures and Methods} -\subsection{Hierarchical Data Structure} +Given the above problems the following procedures and methods are being proposed: +\subsection{Implement a Hierarchical Data Structure} From researching the literature it seems that the most common way of dealing with large point based models containing billions of points is to build a -multiresolution datastructure to divide our model into manageable chunks. +multiresolution data structure to divide our model into manageable chunks. Initially, we need only a small subset of the number of available points. As we zoom into the model we will need to request more points from the -datastructure until the full original detail is available. Using such a +data structure until the full original detail is available. Using such a level-of-detail structure should enable the Department of Geomatics to view even very large models at interactive frame rates, without having to decimate the original data. -\subsection{Server with 20TB storage capacity} -In order to have a central repository for all the models, we aim to obtain a server -with around 20TB of storage capacity. This will enable us to keep all the models in -a central location. + +The data structure being proposed is an octree \cite{interactivepointclouds}. All +data is stored in the leaf nodes and inner nodes provide simplified multi-resolution +representations. Additionally the data structure imposes the constraint that no leaf +node should contain more than a specified number of points. 
+This number of points is +a parameter in the system which will need to be determined experimentally, but for +this structure a value of around 30,000 was found to give good performance +\cite{interactivepointclouds}. +\begin{figure}[h!] +\centering + \includegraphics[width=0.5\textwidth]{oct.png} + \caption{Representation of the first three levels in a multi-resolution data structure.} +\end{figure} + \subsection{Use an existing workflow management system as a base} -There already exists various platforms that are designed to manage workflow. These +Various platforms already exist that are designed to manage workflow. These systems, however, need to be adapted for GIS research. As these systems have a lot of -features writing such a system from the ground up would be a pointless task. +features, writing such a system from the ground up would be a pointless task. -The decision on which system to use is entirely dependent on the requirements +The decision on which system to use is partly dependent on the requirements of the Geomatics department. \subsection{Modularise the interfacing components} The intent is to make each interfacing component within the workflow management system -have as little dependencies as possible. By limiting these dependencies +have as few dependencies as possible. By limiting these dependencies, the system is less likely to fail due to one component. This will also make -the system extendible if more features are required in the future. +the system extensible if more features are required in the future. \subsection{Testing and Evaluation} The most important evaluation criterion will be to demonstrate that the new -system is able to render the large point based models in realtime. Since this -functionality was not able previously, it will be a significant success. -Additionally it will be important to test whether realtime streaming from the +system is able to render the large point based models in real-time. 
Since this +functionality was not available previously, it will be a significant success. +Additionally it will be important to test whether real-time streaming from the server to client machines is feasible. Further testing and refinement is required for the workbench to determine the following: the speed of the content -delivery system; the effectiveness of imposed workflow; the -effectiveness of the user interface; and overall system stability. - +delivery system, the effectiveness of imposed workflow, the +effectiveness of the user interface and overall system stability. \section{Ethical, Professional and Legal Issues} \subsection{User Testing} -One of the componenets of the project invloves the design and evaluation +One of the components of the project involves the design and evaluation of a user interface. Ideally the design process for this would require -input from the user as well as testing. This testing requires ethical clearance -that will have to be obtained. +input from the user as well as testing. This testing requires that ethical clearance +be obtained. \subsection{Data Privacy} The Department of Geomatics has indicated that some of the data collected by the Zamani Project is sensitive and is not to be made freely available. It is important to ensure that during the course of this project this wish is respected -and that nothing is done to compromise the privacy of sensitive data. +and that nothing is done to compromise the privacy of sensitive data. As such, the +data, once received, will only be stored on the server used as part of the project. +Special permission will be required if the data is requested for testing at an +external location. @@ -134,25 +145,32 @@ \subsection{Hierarchical Data Structure} R-trees \cite{rtree}, bounding sphere hierarchies \cite{qsplat}, and Hilbert Space Filling Curves \cite{hilbert}, each with their own advantages and disadvantages. 
 Based on the experimental results of each method, a dynamic octree structure seems to have -the best performance \cite{interactivepointclouds}. Using this datastructure the +the best performance \cite{interactivepointclouds}. Using this data structure the authors were able to achieve interactive walkthroughs of a data set with 2.2 -billion points totalling 63.5GB. +billion points totaling 63.5GB. \subsection{Automated Workflow Management} Various fields of science have benefited from automated workflow. It has brought good increases in productivity \cite{Brahe:2007:SWW:1316624.1316661} and the ability to share workflow between colleagues has aided quite significantly in the reproducibility of the science\cite{4721191}. GIS has been evaluated to be highly applicable to -workflow systems\cite{migliorini2011workflow}, some limitations were however noted however that -various components, such as the modeling and processing of spacial data, +workflow systems\cite{migliorini2011workflow}. However, some limitations were noted. Various components, such as the modeling and processing of spatial data, would need to be added to make it feasible. +\section{Anticipated Outcomes} +There are two key anticipated outcomes from this project. The first is the implementation of a multi-resolution data structure to enable real-time interaction with large point based models. The second is the development of a software system to enable automated GIS workflow. It is expected that if both of these outcomes are achieved, the results could have a significant impact on the Department of Geomatics. It will enable the viewing of models in full detail without decimation and could greatly increase the efficiency of Geomatics research. Key evaluation criteria are: +\begin{itemize} +\item Can the system render the largest of the Zamani models at interactive frame rates? +\item Does the system enable streaming of large models from a central server? 
+\item (ADD POINTS FOR WORKFLOW) +\end{itemize} + \section{Project Plan} \subsection{Risks} \subsubsection*{Request for Hardware Denied} \noindent \textit{Severity: } High \\ \noindent \textit{Likelihood: } Medium \\ -It is possible that the request for a server will be denied. If this happens +It is possible that the request for a server will be denied. If this happens, a considerable amount of restructuring will be required and it would have a significant impact on the course of the project. \subsubsection*{Network Constraints} @@ -160,42 +178,42 @@ \subsubsection*{Network Constraints} \noindent \textit{Likelihood: } Low \\ One of the core functionalities of the system would be to provide content delivery of data that is required for a specific task. Providing -this local data allows the task to get completed without unnessesry +this local data allows the task to get completed without unnecessary fetching delays. There is a risk that this content delivery system could saturate the network. This would cause the system to be slow and unusable. \subsubsection*{Middleware} \noindent \textit{Severity: } High \\ \noindent \textit{Likelihood: } Low \\ -For this project to be successful, the WFMS it would have to interface +For this project to be successful, the workflow management system would have to interface heavily with existing software used to perform GIS operations. This will require large amounts of middleware to be developed that understands the input and output formats of this software. Since many of these -formats are propriatary a significant amount of effort will have to -be made for the sytem to function. If these formats can not be intergrated +formats are proprietary, a significant amount of effort will have to +be made for the system to function. If these formats cannot be integrated, it presents a huge risk to the project. 
\subsubsection*{Hardware Limitations} \noindent \textit{Severity: } Medium \\ \noindent \textit{Likelihood: } Medium \\ There is a risk that the hardware available will not be able to cope with the load that will be required. Since a distributed system is -not being proposed there is a risk that the system will become a bottleneck. +not being proposed, there is a risk that the system will become a bottleneck. \subsubsection*{Large Indices} \noindent \textit{Severity: } Medium \\ \noindent \textit{Likelihood: } Low \\ -When indexing the models a significant amount of data will be generated. +When indexing the models, a significant amount of data will be generated. Given that many of the models are already very large, these indices might become infeasibly large. Dealing with such large indices will be an important part of the project and this risk will have to be handled carefully. \subsubsection*{Integration with Existing GIS Software} \noindent \textit{Severity: } Low \\ \noindent \textit{Likelihood: } Medium \\ -The hierarchical datastructure required as part of this project aims to facilitate level -of detail streaming and realtime interaction. However, ideally one should not have to -re-implement tools which are already available such as ArcGIS. The aim is to integrate -the datastructure into a pre-existing software package to prevent unnecessary work. +The hierarchical data structure required as part of this project aims to facilitate level +of detail streaming and real-time interaction. However, ideally, one should not have to +re-implement tools which are already available, such as ArcGIS. The aim is to integrate +the data structure into a pre-existing software package to prevent unnecessary work. 
However, this may be difficult and there are several associated risks such as, -unavailability of source code, lack of documenation for the software, and potential +unavailability of source code, lack of documentation for the software, and potential copyright license infringement. \subsubsection*{Indexing takes too long} \noindent \textit{Severity: } High \\ @@ -204,29 +222,30 @@ \subsubsection*{Indexing takes too long} amount of time \cite{interactivepointclouds}. While it is hoped that this will not be a problem, steps will have to be taken to ensure that the duration of the indexing process does not pose a risk to the completion of our project. It will also be important -to allow time for the possible event of a systems failure, in which case the indexing +to allow time for the possible event of a system failure, in which case the indexing process would have to be restarted. \subsection{Timeline, including Gantt chart} We need a nice program for creating Gantt charts \subsection{Resources required} \subsubsection*{Hardware} A server will be required for this project to enable central storage of the 3D models. -A large amount of storage will be required, as the Zamani data is over 20TB. Multiple -harddrives will be required and this will allow for a certain amount of parallelism in -data access. Additionally the server will ideally have the processing capacity to index -the models in a reasonable amount of time. +A large amount of storage will be required, as the Zamani data is over 20TB. The aim +is to initially store a subset of the data, and expand as the project progresses. +Multiple hard drives will be required and this will allow for a certain amount of +parallelism in data access. The exact specifications of the server are still +being finalised. \subsubsection*{Geographic Data} The Department of Geomatics has indicated that they are willing to make their data available for the purposes of this project. 
It will be important to obtain the models at an early stage as any delays in obtaining the models will delay the entire project. \subsection{Deliverables} \subsubsection*{GIS Workbench} -The core of the project is to produce a GIS workbench. This will +A key component of the project is to produce a GIS workbench. This will be the framework that ties all the components together. This will involve using an existing Workflow System and setting it up to represent the flow of a GIS project. \subsubsection*{Middleware for Core Functionalities} -Once the GIS workflow is properly understood and modelled, it is +Once the GIS workflow is properly understood and modeled, it is important to create middleware that interfaces with the systems that are currently being used. \subsubsection*{Data Flow Facilitator} @@ -234,11 +253,11 @@ \subsubsection*{Data Flow Facilitator} facilitator will need to be developed that is integrated with the workflow management system. \subsubsection*{Hierarchical Data Structure} -In order to facilitate level of detail streaming it will be essential to +In order to facilitate level of detail streaming, it will be essential to implement a hierarchical data structure which can support interactive viewing of models containing billions of points. \subsubsection*{Streaming Infrastructure} -A realtime streaming infrastrucutre from the server to client machines will be +A real-time streaming infrastructure from the server to client machines will be an important deliverable. This will also need to be implemented as early as possible. 
@@ -303,37 +322,11 @@ \subsubsection{Project Milestones} \hline Project Demonstrations & 5 days & 3 November 2012 & 8 November 2012 \\ \hline - Write Reflection paper &3 days & 8 November 2012 & 11 Noveber 2012 \\ + Write Reflection paper &3 days & 8 November 2012 & 11 November 2012 \\ \hline - Final project presentation & 7 days &11 November 2012 & 18 Noveber 2012 \\ + Final project presentation & 7 days &11 November 2012 & 18 November 2012 \\ \end{tabular} -\subsubsection{Implementation Milestones} - -\subsubsection*{Datastructure Implemented} -It will be important to implement the datastructure at an early stage as it -forms the basis for several other tasks. -\subsubsection*{System is able to render large models} -Once the datastructure has been implemented it will be used to test the -rendering of one of the large models supplied by the Geomatics Department. -\subsubsection*{System is able to stream large models from the server} -Once the system is capable of rendering large models the next important -milestone will be to allow streaming of the model from the central server to -the client workstation. -\subsubsection*{Developed a model for basic GIS workflow} -Understanding how GIS research is vitally important for this part of -the project. This will allow a model to be developed. This model will -be the center of the design of the system. It will provide the -information required to pick an appropriate sytem. -\subsubsection*{Basic Middleware Implemented} -This will act as the link between the current software and the WFMS. -\subsubsection*{Content Delivery System} -This system will facilitate automated data transfer between the workstations -where it is required. -\subsubsection*{Implemented a User Interface} -The user interface will be the main control mechanism of the sytem. This -will allow the workflow to be monitored and adjusted according to the users -requirements. 
\subsection{Work Allocation} Timothy Trewartha will be implementing the hierarchical data structure to support