Automated upload of sequence data to the IRIDA platform.
This tool assumes that azcopy is available on the PATH. Follow the azcopy installation directions to download and install it.
It's recommended to install the tool into an isolated environment using a tool such as conda, mamba or virtualenv:
conda create -n auto-irida-azure-upload python=3 pip
conda activate auto-irida-azure-upload
The tool is pip-installable. Installing with the -e flag (editable mode) means that updates and hotfixes applied to the cloned codebase take effect without reinstalling.
git clone https://github.com/BCCDC-PHL/auto-irida-azure-upload.git
cd auto-irida-azure-upload
pip install -e .
Start the tool as follows:
auto-irida-azure-upload --config config.json
See the Configuration section of this document for details on preparing a configuration file.
More detailed logs can be produced by setting the log level with the --log-level flag:
auto-irida-azure-upload --config config.json --log-level debug
This tool takes a single config file, in JSON format, with the following structure:
{
"container_url": "",
"sas_token": "",
"projects_definition_file": "/path/to/projects.csv",
"excluded_runs_list": "/path/to/excluded_runs.csv",
"scan_interval_seconds": 3600,
"upload_staging_dir": "/path/to/auto-irida-upload-tmp",
"run_parent_dirs": [
"/path/to/instrument_1",
"/path/to/instrument_2"
]
}
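A configuration file with this structure can be sanity-checked before the tool is started. The following is a minimal sketch, not the tool's own loader: the `load_config` helper and the `REQUIRED_KEYS` set are hypothetical names, and treating every key shown above as required is an assumption.

```python
import json

# Keys taken from the example structure above. Treating all of them as
# required is an assumption; the tool itself may apply defaults.
REQUIRED_KEYS = {
    "container_url",
    "sas_token",
    "projects_definition_file",
    "excluded_runs_list",
    "scan_interval_seconds",
    "upload_staging_dir",
    "run_parent_dirs",
}


def load_config(path):
    """Load a JSON config file and raise if any top-level key is missing."""
    with open(path) as f:
        config = json.load(f)
    missing = sorted(REQUIRED_KEYS - set(config))
    if missing:
        raise ValueError(f"config is missing keys: {', '.join(missing)}")
    return config
```

Calling `load_config("config.json")` returns the parsed dictionary, or raises a `ValueError` that names the absent keys.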
The projects_definition_file should be in CSV format and should include the following fields:
local_project_id
local_project_name
remote_project_id
remote_project_name
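The field list above can be checked against a projects file with Python's standard csv module. This is a sketch under stated assumptions: it assumes the file has a header row that uses exactly these field names, and `read_projects` is a hypothetical helper, not a function provided by the tool.

```python
import csv

# Field names copied from the list above.
REQUIRED_PROJECT_FIELDS = [
    "local_project_id",
    "local_project_name",
    "remote_project_id",
    "remote_project_name",
]


def read_projects(path):
    """Read the projects file into a list of dicts, one per data row."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        # Fail early if the header row lacks any expected field.
        missing = [name for name in REQUIRED_PROJECT_FIELDS if name not in header]
        if missing:
            raise ValueError(f"projects file is missing fields: {', '.join(missing)}")
        return list(reader)
```

Extra columns are kept in each row dictionary, so a file with additional fields passes the same check.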
If downsampling is needed prior to upload, add the following to the config.json file:
{
"downsampling": {
"enabled": true,
"output_dir": "/path/to/downsampled-reads",
"work_dir": "/path/to/downsampling-work",
"pipeline_name": "BCCDC-PHL/downsample-reads",
"pipeline_version": "v0.2.0"
}
}
...and add the following fields to the projects.csv file:
downsample_reads
genome_size_mb
max_depth
Call flow: the entry point __main__.main calls core.scan.
This tool outputs structured logs in JSON Lines format. Every log line should include the following fields:
timestamp
level
module
function_name
line_num
message
...and the contents of the message key will be a JSON object that includes at least an event_type key. The remaining keys inside the message will vary by event type.
{"timestamp": "2022-09-22T11:32:52.287", "level": "INFO", "module": "core", "function_name": "scan", "line_num": 56, "message": {"event_type": "scan_start"}}
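Because each log line is a standalone JSON object, the logs can be filtered by event type with a few lines of Python. The following is a minimal sketch: `iter_events` is a hypothetical helper, not part of the tool, and it assumes that every non-empty line is valid JSON with a message object as described above.

```python
import json


def iter_events(lines, event_type=None):
    """Yield parsed log records, optionally filtered by message.event_type."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if event_type is None or record["message"].get("event_type") == event_type:
            yield record


# Sample record using the fields listed above.
sample = (
    '{"timestamp": "2022-09-22T11:32:52.287", "level": "INFO", '
    '"module": "core", "function_name": "scan", "line_num": 56, '
    '"message": {"event_type": "scan_start"}}'
)

for record in iter_events([sample], event_type="scan_start"):
    print(record["timestamp"], record["message"]["event_type"])
```

The same function accepts an open file handle in place of the list, since iterating over a text file yields one line at a time.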