samplesheet_generator

generates samplesheet compatible with TSO500 LocalApp analysis.

Introduction

Running LocalApp analysis requires a samplesheet in a specific format consistent with the performed sequencing to guide the analysis. Such a samplesheet can be created manually using a template samplesheet file provided by Illumina and filling in a run specific information for every sequencing run using Excel or similar. This script is an attempt to automatize the task of samplesheet generation as much as possible. The script gets a sequencing run information in a simple tab separated table and generates a samplesheet fulfilling LocalApp requirements.

Dependencies

python3>=3.9,
pandas==2.2.0

Description of Input Parameters

Names of the input parameters defined for the script and their possible values are listed in the table below.

parameter	description	type	default
`--run-id`	ID of the sequencing run for which the samplesheet is generated.	str
`--index-type`	Type of the used index. Supported values are `dual` and `simple`.	str	`dual`
`--index-length`	Number of nucleotides in the used index. Supported values are `8` and `10`.	int
`--investigator-name`	Value of the `Investigator Name` field in the samplesheet. It is preferred for the string to be in the form `name (inpred_node)`. The string cannot contain a comma.	str	`''`
`--experiment-name`	Value of the `Experiment Name` field in the samplesheet. It is preferred for the string to be a space separated list of all the studies from which the run samples are. The string cannot contain a comma.	str	`''`
`--input-info-file`	Absolute path to an input info file.	str
`--read-length-1`	Length of the sequenced forward reads. This value will be filled in the `Reads` section of the samplesheet.	int	`101`
`--read-length-2`	Length of the sequenced reverse reads. This value will be filled in the `Reads` section of the samplesheet.	int	`101`
`--adapter-read-1`	Nucleotide adapter sequence of read1. This value will be filled in the `Settings` section of the samplesheet.	str
`--adapter-read-2`	Nucleotide adapter sequence of read2. This value will be filled in the `Settings` section of the samplesheet.	str
`--adapter-behavior`	This value will be filled in the `Settings` section of the samplesheet and passed as an input parameter to BCL convert. Supported values are `trim` and `mask`. For more info about BCL convert, see the BCL convert user guide.	str	`trim`
`--minimum-trimmed-read-length`	This value will be filled in the `Settings` section of the samplesheet and passed as an input parameter to BCL convert.	int	`35`
`--mask-short-reads`	This value will be filled in the `Settings` section of the samplesheet and passed as an input parameter to BCL convert.	int	`22`
`--override-cycles`	This value will be filled in the `Settings` section of the samplesheet and passed as an input parameter to BCL convert.	str
`--samplesheet-version`	Version in which the samplesheet is generated. Only `v1` is implemented now.	str	`v1`
`--help`	Print the help message and exit.

File format of the input_info_file follows.

Input Info File Format

The file is expected to contain tab separated values (.tsv file). The rows starting with character # are ignored (considered to be comments). The first non-commented row is considered to be a header containing column names.

Required columns of the file are: sample_id, molecule, run_id, barcode, index (at least one of columns barcode and index has to be non-empty).

column	description
`sample_id`	- id of a sample.
`molecule`	- `DNA` or `RNA`, choose the appropriate one.
`run_id`	- id of the sequencing run in which the sample was sequenced.
`barcode`	- one of the `Index_ID` values from the `indexes/TSO500_dual_.tsv` files or one of the `I7_Index_ID` values from the `indexes/TSO500simple.tsv` file. The value should correspond to the indexes used for sequencing of the sample. Alternatively, the field can be left empty or containing `NA` string.
`index`	- DNA sequence of the forward index from the column `index` from one of the `indexes/*.tsv` files. Alternatively, the field can be left empty or containing `NA` string.

Used Parameter Value Combinations

# nextseq instrument, dual indexes, index length 8
--index-type dual \
--index-length 8 \
--read-length-1 101 \
--read-length-2 101 \
--adapter-read-1 "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" \
--adapter-read-2 "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" \
--adapter-behavior "trim" \
--minimum-trimmed-read-length 35 \
--mask-short-reads 35 \
--override-cycles "U7N1Y93;I8;I8;U7N1Y93"

# novaseq instrument, dual indexes, index length 10 
--index-type dual \
--index-length 10 \
--read-length-1 101 \
--read-length-2 101 \
--adapter-read-1 "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC" \
--adapter-read-2 "CTGTCTCTTATACACATCTGACGCTGCCGACGA" \
--adapter-behavior "trim" \
--minimum-trimmed-read-length 35 \
--mask-short-reads 35 \
--override-cycles "U7N1Y93;I10;I10;U7N1Y93"

# nextseq instrument, simple indexes, index length 8, legacy
--index-type simple \
--index-length 8 \
--read-length-1 101 \
--read-length-2 101 \
--adapter-read-1 "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" \
--adapter-read-2 "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" \
--adapter-behavior "trim" \
--minimum-trimmed-read-length 35 \
--mask-short-reads 22 \
--override-cycles "U7N1Y93;I8;I8;U7N1Y93"

Usage

Running locally

python3 generate_samplesheet.py [parameters]...

Docker

docker run --rm -it \
	-v /path_input_file:/container_path_input_file \
	inpred/samplesheet_generator:latest bash

Singularity/Apptainer

singularity run \
	-B /path_input_file:/container_path_input_file \
	docker://inpred/samplesheet_generator:latest bash	
	
apptainer run \
	-B /path_input_file:/container_path_input_file \
	docker://inpred/samplesheet_generator:latest bash

Run Test Data Example

Test data are located in the test subfolder of the repository. Input info file is named infoFile.tsv and expected output is stored in samplesheet.tsv.

The script is tested with data of a specific sequencing run. The run consists of artificial samples, including AcroMetrix samples. The sequencing was performed on a NextSeq instrument, with the legacy parameter setting and file formats.

Locally

# ${GITHUB_REPOSITORY_LOCAL_PATH} is an absolute path 
# to the samplesheet_generator repository on on the local compute.

# define the testRunID value
testRunID="191206_NB501498_0174_AHWCNMBGXC"

# execute samplesheet_generator
python3 samplesheet_generator.py \
	--run-id ${testRunID} \
	--index-type simple \
	--index-length 8 \
	--investigator-name "Name (InPreD node)" \
	--experiment-name "OUS pathology test run" \
	--input-info-file ${GITHUB_REPOSITORY_LOCAL_PATH}/test/infoFile.tsv \
	--read-length-1 101 \
	--read-length-2 101 \
	--adapter-read-1 "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" \
	--adapter-read-2 "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" \
	--adapter-behavior "trim" \
	--minimum-trimmed-read-length 35 \
	--mask-short-reads 22 \
	--override-cycles "U7N1Y93;I8;I8;U7N1Y93" \
	--samplesheet-version "v1"

Docker

# ${GITHUB_REPOSITORY_LOCAL_PATH} is an absolute path 
# to the samplesheet_generator repository on on the local compute.
# ${INFO_INPUT_FILE_CONTAINER} is an absolute path to the input info file 
# in the container. 

docker run --rm -it \
	-v ${GITHUB_REPOSITORY_LOCAL_PATH}/test/infoFile.tsv:${INFO_INPUT_FILE_CONTAINER} \
	docker://inpred/samplesheet_generator:latest bash

Docker> testRunID="191206_NB501498_0174_AHWCNMBGXC"
Docker> python3 samplesheet_generator.py \
		--run-id ${testRunID} \
		--index-type simple \
		--index-length 8 \
		--investigator-name "Name (InPreD node)" \
		--experiment-name "OUS pathology test run" \
		--input-info-file ${INFO_INPUT_FILE_CONTAINER} \
		--read-length-1 101 \
		--read-length-2 101 \
		--adapter-read-1 "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" \
		--adapter-read-2 "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" \
		--adapter-behavior "trim" \
		--minimum-trimmed-read-length 35 \
		--mask-short-reads 22 \
		--override-cycles "U7N1Y93;I8;I8;U7N1Y93" \
		--samplesheet-version "v1"

Singularity/Apptainer

singularity run \
	-B ${GITHUB_REPOSITORY_LOCAL_PATH}/test/infoFile.tsv:${INFO_INPUT_FILE_CONTAINER} \
	docker://inpred/samplesheet_generator:latest bash

Singularity> testRunID="191206_NB501498_0174_AHWCNMBGXC"
Singularity> python3 samplesheet_generator.py \
		--run-id ${testRunID} \
		--index-type simple \
		--index-length 8 \
		--investigator-name "Name (InPreD node)" \
		--experiment-name "OUS pathology test run" \
		--input-info-file ${INFO_INPUT_FILE_CONTAINER} \
		--read-length-1 101 \
		--read-length-2 101 \
		--adapter-read-1 "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" \
		--adapter-read-2 "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" \
		--adapter-behavior "trim" \
		--minimum-trimmed-read-length 35 \
		--mask-short-reads 22 \
		--override-cycles "U7N1Y93;I8;I8;U7N1Y93" \
		--samplesheet-version "v1"


apptainer run \
	-B ${GITHUB_REPOSITORY_LOCAL_PATH}/test/infoFile.tsv:${INFO_INPUT_FILE_CONTAINER} \
	docker://inpred/samplesheet_generator:latest bash

Apptainer> testRunID="191206_NB501498_0174_AHWCNMBGXC"
Apptainer> python3 samplesheet_generator.py \
		--run-id ${testRunID} \
		--index-type simple \
		--index-length 8 \
		--investigator-name "Name (InPreD node)" \
		--experiment-name "OUS pathology test run" \
		--input-info-file ${INFO_INPUT_FILE_CONTAINER} \
		--read-length-1 101 \
		--read-length-2 101 \
		--adapter-read-1 "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" \
		--adapter-read-2 "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" \
		--adapter-behavior "trim" \
		--minimum-trimmed-read-length 35 \
		--mask-short-reads 22 \
		--override-cycles "U7N1Y93;I8;I8;U7N1Y93" \
		--samplesheet-version "v1"

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.github/workflows		.github/workflows
generator		generator
indexes		indexes
test		test
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
samplesheet_generator.py		samplesheet_generator.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

samplesheet_generator

Contents

Introduction

Dependencies

Description of Input Parameters

Input Info File Format

Used Parameter Value Combinations

Usage

Running locally

Docker

Singularity/Apptainer

Run Test Data Example

Locally

Docker

Singularity/Apptainer

About

Releases

Packages

Contributors 2

Languages

License

InPreD/samplesheet_generator

Folders and files

Latest commit

History

Repository files navigation

samplesheet_generator

Contents

Introduction

Dependencies

Description of Input Parameters

Input Info File Format

Used Parameter Value Combinations

Usage

Running locally

Docker

Singularity/Apptainer

Run Test Data Example

Locally

Docker

Singularity/Apptainer

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages