A basic pipeline
The purpose of this page is to provide a broad overview of the whole analysis pipeline, from raw physical activity data files to a summary dataset, and to demonstrate where pampro fits into that pipeline. Note that pampro is not yet a standalone solution; some interchange of data with other data management programs is still necessary to achieve a neat dataset.
The following assumes that you have already followed the installation instructions. Prior to initiating a new analysis, it is best practice to ensure you are running the latest version of the software.
An analysis script is written with the expectation that certain pieces of information are available - at a minimum, a participant ID and a filename. For this purpose, we create a job file which catalogues this information. The script works by feeding the relevant data to the analysis function; the script itself should make clear exactly which variables are required, what they should be called, and what format they should take.
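As an illustration, a job file could be a plain CSV with one row per data file. The sketch below uses only the Python standard library. The column names `id` and `filename`, the example paths, and the helper `load_jobs` are assumptions made for this example, not names required by pampro.

```python
import csv
import io

# Hypothetical job file contents: one row per participant data file.
# In practice this text would live in a file on disk.
JOB_FILE_TEXT = """id,filename
P001,/data/raw/P001.bin
P002,/data/raw/P002.bin
"""

# Columns this (hypothetical) analysis script expects to find.
REQUIRED_COLUMNS = {"id", "filename"}


def load_jobs(handle):
    """Read job rows from an open file handle and check the required columns."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"job file is missing columns: {sorted(missing)}")
    return list(reader)


jobs = load_jobs(io.StringIO(JOB_FILE_TEXT))
for job in jobs:
    print(job["id"], job["filename"])
```

Checking the columns up front means a malformed job file fails immediately rather than part-way through a long analysis.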
The general layout of a pampro analysis script will be:
- Create one or more output files
- Define a function that analyses a line in a job file
- Feed sections of the job file to the function
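The three steps above might be sketched as follows. This is a skeleton only: `analyse_job` and `run_section` are hypothetical names, the analysis step is a placeholder that calls no pampro functions, and the output columns are invented for illustration.

```python
import csv
import os
import tempfile


def analyse_job(job, writer):
    """Placeholder analysis of one job-file row (hypothetical)."""
    # A real script would load job["filename"] here and compute summary
    # statistics; this sketch only records that the row was visited.
    writer.writerow({"id": job["id"], "filename": job["filename"], "status": "done"})


def run_section(jobs, start, stop, output_path):
    """Create one output file, then feed jobs[start:stop] to analyse_job."""
    with open(output_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["id", "filename", "status"])
        writer.writeheader()
        for job in jobs[start:stop]:
            analyse_job(job, writer)


# Stand-in for rows read from a job file.
jobs = [{"id": f"P{n:03d}", "filename": f"P{n:03d}.bin"} for n in range(1, 5)]

output_dir = tempfile.mkdtemp()
output_path = os.path.join(output_dir, "summary_1.csv")
run_section(jobs, 0, 2, output_path)  # analyse only the first two jobs

with open(output_path, newline="") as f:
    rows = list(csv.DictReader(f))
```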
The script will probably need to be told where to put the output files and charts, and where the job file is. It is up to the analyst to determine the resource usage of an analysis, and judge whether batch processing is appropriate and necessary.
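If batch processing is chosen, each process needs to know which section of the job file it is responsible for. One possible way to divide the rows is shown below. The function `section_bounds` and the 1-based batch numbering are assumptions for this sketch, not part of pampro.

```python
def section_bounds(n_jobs, n_batches, batch_number):
    """Return (start, stop) row indices for a 1-based batch_number.

    Rows are split into n_batches contiguous sections whose sizes differ
    by at most one; the earlier batches take any leftover rows.
    """
    if not 1 <= batch_number <= n_batches:
        raise ValueError("batch_number must be between 1 and n_batches")
    base, extra = divmod(n_jobs, n_batches)
    start = (batch_number - 1) * base + min(batch_number - 1, extra)
    stop = start + base + (1 if batch_number <= extra else 0)
    return start, stop


# Ten jobs shared between three processes.
bounds = [section_bounds(10, 3, b) for b in (1, 2, 3)]
print(bounds)  # [(0, 4), (4, 7), (7, 10)]
```

Each process can then analyse only the job rows between its own `start` and `stop`, so no row is processed twice and none is skipped.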
If the analysis script was run in parallel, each process will have created its own set of output files, probably numbered sequentially to make them unique. These can all be appended together into one long file; the result is then the same as if a single process had run for a long time.
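A minimal sketch of the appending step, assuming the outputs are CSV files that share a header row and follow a naming pattern such as `summary_1.csv`, `summary_2.csv` (both the pattern and the helper `append_outputs` are hypothetical):

```python
import csv
import glob
import os
import tempfile


def append_outputs(pattern, combined_path):
    """Append all CSV files matching pattern into one file, keeping one header."""
    # Sorting is lexical, so summary_10 sorts before summary_2; this affects
    # only the row order of the combined file, not its contents.
    paths = sorted(glob.glob(pattern))
    header = None
    n_rows = 0
    with open(combined_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows


# Demonstration with two small, made-up output files.
work_dir = tempfile.mkdtemp()
for number, ids in enumerate([["P001", "P002"], ["P003"]], start=1):
    with open(os.path.join(work_dir, f"summary_{number}.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "status"])
        for pid in ids:
            writer.writerow([pid, "done"])

combined_path = os.path.join(work_dir, "combined.csv")
n_rows = append_outputs(os.path.join(work_dir, "summary_*.csv"), combined_path)
```

The combined file is given a name that does not match the input pattern, so it is never read back in as one of its own inputs.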
This is the stage where the involvement of pampro ends.