
Building a job file for batch processing

Tom White edited this page May 9, 2016 · 2 revisions

"Job files"

In pampro, a job file lists every physical activity file in a dataset, together with any additional information to be supplied to an analysis function at run time. A simple analysis needs no external information, so its job file only requires two columns: an ID and a filename. More sophisticated analyses can make use of extra columns. For example, if a study asked participants to complete a sleep diary, the reported sleep times could be added to the job file, and the analysis could use a few additional lines of code to ignore activity during those times.
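
As a rough illustration of the sleep diary idea, the sketch below reads a job file that carries two extra columns and tests whether a timestamp falls inside the reported sleep window. The column names sleep_start and sleep_end, the timestamp format, and the helpers read_jobs and is_during_sleep are all hypothetical choices made for this example; they are not part of pampro.

```python
import csv
import io
from datetime import datetime

# Hypothetical job file contents with two extra sleep diary columns
job_file_text = """id,filename,sleep_start,sleep_end
person_A,C:/path/to/files/person_A.bin,2016-05-01 23:00,2016-05-02 07:00
person_B,C:/path/to/files/person_B.bin,2016-05-01 22:30,2016-05-02 06:15
"""

# Assumed timestamp format for the diary columns
TIME_FORMAT = "%Y-%m-%d %H:%M"


def read_jobs(handle):
    """Return one dictionary per job file row, with sleep times parsed."""
    jobs = []
    for row in csv.DictReader(handle):
        row["sleep_start"] = datetime.strptime(row["sleep_start"], TIME_FORMAT)
        row["sleep_end"] = datetime.strptime(row["sleep_end"], TIME_FORMAT)
        jobs.append(row)
    return jobs


def is_during_sleep(job, timestamp):
    """True if timestamp falls inside the diary-reported sleep window."""
    return job["sleep_start"] <= timestamp < job["sleep_end"]


# Read from the in-memory text here; a real script would open the CSV file
jobs = read_jobs(io.StringIO(job_file_text))
```

An analysis function could then call something like is_during_sleep for each block of activity data and skip the blocks that return True.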


What follows is a heavily commented Python snippet that creates a basic job file. The objective is to create a plain-text comma-separated values (CSV) file with two columns, containing an ID and the corresponding filename, like this:

id,filename
person_A,C:/path/to/files/person_A.bin
person_B,C:/path/to/files/person_B.bin
...
person_Z,C:/path/to/files/person_Z.bin

This code lists all the files in a given folder that have a given suffix and writes an output file called job_file.csv to that folder. The ID for each file is simply its filename without the folder path or file suffix.

import glob
import os

# The folder containing your dataset - change this!
folder = "/path/to/your/data"

# Get a list of all files in the folder that end in .bin
# Change the .bin part if you have a different file format, such as .cwa
files = glob.glob(folder + "/*.bin")

# Create the actual output file in write mode
# The with block closes the file automatically, even if an error occurs
with open(folder + "/job_file.csv", "w") as output:

    # Write the variable names as the first line
    output.write("id,filename\n")

    # For each filename in the folder
    for f in files:

        # Use the filename (not the full filepath) as the ID
        # basename drops the folder part whichever path separator is used
        # (glob returns backslashes on Windows), splitext trims the suffix
        file_id = os.path.splitext(os.path.basename(f))[0]

        # Write the ID and filename as 1 line to the open file
        output.write(file_id + "," + f + "\n")

Copy this code into your favourite text editor, save it as build_job_file.py, and run it from the command line with: ipython build_job_file.py (the script only uses the standard library, so python build_job_file.py works too).

If no errors appear in the terminal, there should be a file called job_file.csv in the chosen folder. If the file contains only the header line, the script did not find any matching files in the folder; check that the folder path and suffix are correct.
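
Beyond opening the file by eye, a short check along the following lines can confirm that the job file has rows and that every listed file exists. This is only a sketch: check_job_file is a hypothetical helper written for this page, not a pampro function, and it assumes the two-column id,filename layout shown above.

```python
import csv
import os


def check_job_file(job_file_path):
    """Return (number of rows, list of IDs whose file does not exist)."""
    missing = []
    n_rows = 0
    with open(job_file_path, newline="") as handle:
        # DictReader uses the header line (id,filename) as the keys
        for row in csv.DictReader(handle):
            n_rows += 1
            if not os.path.isfile(row["filename"]):
                missing.append(row["id"])
    return n_rows, missing


# Example usage - change the path to your own job file:
# n_rows, missing = check_job_file("/path/to/your/data/job_file.csv")
# print(n_rows, "rows;", len(missing), "missing files:", missing)
```

A row count of zero corresponds to the header-only case described above, while a non-empty list of missing IDs suggests that files were moved or renamed after the job file was built.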
