A basic pipeline

This page gives a broad overview of the whole analysis pipeline, from raw physical activity data files to a summary dataset, and shows where pampro fits into that pipeline. Note that pampro is not yet a standalone solution; some interchange of data with other data management programs is still necessary to achieve a neat dataset.

The following assumes that you have already followed the installation instructions. Before starting a new analysis, it is best practice to check that you are running the latest version of the software.

1. Build a job file

An analysis script is written with the expectation that certain pieces of information are available for each file to be analysed; at a minimum, a participant ID and a filename. To supply them, we create a job file that catalogues this information. The script works by feeding the relevant data to the analysis function; the script itself should make clear exactly which variables are required, what they should be called, and what format they should take.
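As an illustration of what building a job file could look like, here is a minimal sketch that lists raw data files in a folder and writes one row per file. The column names `id` and `filename`, the CSV layout, the `*.csv` pattern, and the rule that the participant ID is the file name without its extension are all assumptions made for this example; your analysis script defines what it actually expects.

```python
import csv
from pathlib import Path


def build_job_file(data_dir, job_path, pattern="*.csv"):
    """Write one job-file row per raw data file found in data_dir.

    Assumption: the participant ID is the file name without its extension,
    and the analysis script expects columns called "id" and "filename".
    """
    files = sorted(Path(data_dir).glob(pattern))
    with open(job_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "filename"])
        writer.writeheader()
        for path in files:
            writer.writerow({"id": path.stem, "filename": str(path)})
    # Return the number of files catalogued so the caller can sanity-check it
    return len(files)
```

Keep the job file outside the raw data folder (or give it a name the pattern does not match) so that it is not picked up as a data file on a later run.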

2. Run an analysis script (or more likely, run many in parallel)

The general layout of a pampro analysis script will be:

- Create one or more output files
- Define a function that analyses a line in a job file
- Feed sections of the job file to the function

The script will usually need to be told where to put the output files and charts, and where to find the job file. It is up to the analyst to estimate the resource usage of an analysis and to judge whether batch processing is appropriate and necessary.
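The layout described above might be sketched as follows. This is a skeleton under stated assumptions, not pampro's own API: `analyse_row` is a hypothetical placeholder for the real analysis, the job file is assumed to be a CSV with `id` and `filename` columns, and the `summary_<n>.csv` output naming is invented for the example.

```python
import csv
from pathlib import Path


def job_slice(rows, job_num, num_jobs):
    """Return the rows handled by process job_num (1-based) out of num_jobs."""
    return rows[job_num - 1::num_jobs]


def analyse_row(row):
    """Hypothetical placeholder for the analysis of one job-file line.

    A real script would load row["filename"] and derive summary variables
    here; this stub only records that the row was visited.
    """
    return {"id": row["id"], "status": "processed"}


def run_batch(job_path, output_dir, job_num=1, num_jobs=1):
    """Analyse one section of the job file and write a numbered output file."""
    with open(job_path, newline="") as handle:
        rows = list(csv.DictReader(handle))

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Numbering the file by job_num keeps parallel outputs unique
    output_path = output_dir / f"summary_{job_num}.csv"

    with open(output_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "status"])
        writer.writeheader()
        for row in job_slice(rows, job_num, num_jobs):
            writer.writerow(analyse_row(row))
    return output_path
```

Running `run_batch(job_path, output_dir, job_num=k, num_jobs=n)` once for each `k` from 1 to `n`, each in its own process, covers every line of the job file exactly once.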

3. Collate the many outputs

If the analysis script was run in parallel, each process will have created its own set of output files, typically numbered sequentially to make them unique. These can all be appended together into one long file; the result is then the same as if a single process had run for a long time.
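One way to append the numbered outputs together is sketched below. It assumes the outputs are CSV files sharing the same header and matching a hypothetical `summary_*.csv` pattern; adjust both to whatever your analysis script actually writes.

```python
import csv
from pathlib import Path


def collate_outputs(output_dir, combined_path, pattern="summary_*.csv"):
    """Append every matching output file into one CSV with a single header."""
    # List the parts before opening the combined file for writing
    parts = sorted(Path(output_dir).glob(pattern))
    header = None
    total = 0
    with open(combined_path, "w", newline="") as out_handle:
        writer = csv.writer(out_handle)
        for part in parts:
            with open(part, newline="") as in_handle:
                reader = csv.reader(in_handle)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip completely empty files
                if header is None:
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{part} has a different header")
                for row in reader:
                    writer.writerow(row)
                    total += 1
    # Number of data rows written, excluding the header
    return total
```

Give the combined file a name that does not match the pattern (for example `collated.csv`), otherwise a second run would append the previous combined file into the new one. Files are read in lexicographic order, so `summary_10.csv` comes before `summary_2.csv`; this changes only the row order, not the content.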

4. Post-process the results

pampro's involvement ends at this stage. Any remaining work needed to reach a neat dataset is carried out in other data management programs.
