Pytools related cleanup (#802)
* Document pytools docker scripts, remove unused create-npz-output.py script and ConvertStarOutput task

* Update changelogs
timaeusx authored Aug 24, 2022
1 parent 48febaf commit 55876e9
Showing 17 changed files with 95 additions and 139 deletions.
8 changes: 4 additions & 4 deletions beta-pipelines/skylab/ATAC/ATAC.wdl
@@ -241,7 +241,7 @@ task BWAPairedEndAlignment {
read_group_sample_name: "the read group sample to be added upon alignment"
cpu: "the number of cpu cores to use during alignment"
output_base_name: "basename to be used for the output of the task"
docker_image: "the docker image using BWA to be used (default: us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110)"
docker_image: "the docker image using BWA to be used (default: us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730)"
}

# runtime requirements based upon input file size
@@ -673,13 +673,13 @@ task MakeCompliantBAM {
input {
File bam_input
String output_base_name
String docker_image = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker_image = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"
}

parameter_meta {
bam_input: "the bam with barcodes in the read ids that need to be converted to barcodes in bam tags"
output_base_name: "base name to be used for the output of the task"
docker_image: "the docker image using the python script to convert the bam barcodes/read ids (default: us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110)"
docker_image: "the docker image using the python script to convert the bam barcodes/read ids (default: us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730)"
}

Int disk_size = ceil(2.5 * (if size(bam_input, "GiB") < 1 then 1 else size(bam_input, "GiB")))
@@ -707,7 +707,7 @@ task MakeCompliantBAM {
task BreakoutSnap {
input {
File snap_input
String docker_image = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker_image = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"
String bin_size_list
}
Int num_threads = 1
51 changes: 51 additions & 0 deletions dockers/skylab/pytools/README.md
@@ -0,0 +1,51 @@
# WARP Python Tools

## Quick reference

Copy and paste to pull this image

#### `docker pull us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730`

- __What is this image:__ This image is a Debian-based custom image that contains Python scripts used in various WARP pipelines.
- __How to see tool version used in image:__ Please see below.

## Versioning

PyTools uses the following convention for versioning:

#### `us.gcr.io/broad-gotc-prod/pytools:<image-version>-<unix-timestamp>`


We keep track of all past versions in [docker_versions](docker_versions.tsv); the last image listed is the version currently used in WARP.

You can see more information about the image, including the tool versions, by running the following commands:

```bash
$ docker pull us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730
$ docker inspect us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730
```
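
The tag convention and the `docker_versions.tsv` layout described above can be sketched in a few lines of Python. This is only an illustration and is not shipped in the image: the names `parse_pytools_tag` and `current_image` are hypothetical, and the sketch assumes the TSV holds a single `DOCKER_VERSION` header line followed by one image reference per line.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class PytoolsTag(NamedTuple):
    """Parts of '<repository>:<image-version>-<unix-timestamp>'."""
    repository: str
    image_version: str
    timestamp: datetime  # the unix timestamp from the tag, as UTC


def parse_pytools_tag(image: str) -> PytoolsTag:
    """Split an image reference into repository, version and timestamp.

    Hypothetical helper: assumes the tag follows the convention in this
    README and that the timestamp is a plain integer of seconds.
    """
    repository, sep, tag = image.rpartition(":")
    if not sep:
        raise ValueError(f"no tag in image reference: {image!r}")
    image_version, sep, raw_timestamp = tag.rpartition("-")
    if not sep or not raw_timestamp.isdigit():
        raise ValueError(f"tag is not <image-version>-<unix-timestamp>: {tag!r}")
    timestamp = datetime.fromtimestamp(int(raw_timestamp), tz=timezone.utc)
    return PytoolsTag(repository, image_version, timestamp)


def current_image(tsv_text: str) -> str:
    """Return the last image listed in docker_versions.tsv content.

    Assumes a 'DOCKER_VERSION' header followed by one image per line.
    """
    lines = [line.strip() for line in tsv_text.splitlines() if line.strip()]
    if len(lines) < 2 or lines[0] != "DOCKER_VERSION":
        raise ValueError("expected a DOCKER_VERSION header and at least one image")
    return lines[-1]


if __name__ == "__main__":
    tsv = "DOCKER_VERSION\nus.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730\n"
    parsed = parse_pytools_tag(current_image(tsv))
    print(parsed.image_version, parsed.timestamp.isoformat())
```

Keeping the timestamp timezone-aware (UTC) avoids the result depending on the local timezone of whoever runs the sketch.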

## Usage

```bash
$ docker run --rm -it \
us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730 <script.py>
```

## Scripts

This image contains the following scripts:

* `breakoutSnap.py` extracts the data in a snap file as csv files
* `create-merged-npz-output.py` takes a barcode.tsv, feature.tsv and matrix.mtx from STAR alignment outputs and creates 2 npy files and an npz file for row_index, col_index and the matrix. These files are required in the empty_drop step.
* `create_snss2_counts_csv.py` creates a csv file containing intron and exon counts from the Single Nucleus Smart-Seq2 pipeline
* `loomCompare.py` compares differences between loom files
* `ss2_loom_merge.py` creates a single loom file from multiple single sample loom files
* `makeCompliantBAM.py` makes a BAM file with cellular barcodes in the read names compliant by moving them to the CB tag

The following scripts create a loom file from counts, metadata, and metrics from each pipeline:
* `create_loom_optimus.py` for Optimus pipeline
* `create_loom_snss2.py` for Single Nucleus Smart-Seq2 pipeline
* `create_snrna_optimus.py` for Optimus in `sn_rna` mode with `count_exons=false`
* `create_snrna_optimus_counts.py` for Optimus in `sn_rna` mode with `count_exons=true`

2 changes: 1 addition & 1 deletion dockers/skylab/pytools/docker_versions.tsv
@@ -1,2 +1,2 @@
DOCKER_VERSION
us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110
us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730
66 changes: 0 additions & 66 deletions dockers/skylab/pytools/tools/create-npz-output.py

This file was deleted.

2 changes: 1 addition & 1 deletion dockers/skylab/star/Dockerfile
@@ -30,4 +30,4 @@ RUN set -eux; \
rm -r /usr/gitc/temp;

# Set tini as default entrypoint
ENTRYPOINT [ "/sbin/tini", "--"]
ENTRYPOINT ["/sbin/tini", "--"]
5 changes: 5 additions & 0 deletions pipelines/skylab/optimus/Optimus.changelog.md
@@ -1,3 +1,8 @@
# 5.5.3
2022-08-23 (Date of Last Commit)

* Removed an unused script from the pytools docker image and removed the unused ConvertStarOutput task.

# 5.5.2
2022-08-16 (Date of Last Commit)

2 changes: 1 addition & 1 deletion pipelines/skylab/optimus/Optimus.wdl
@@ -56,7 +56,7 @@ workflow Optimus {

# version of this pipeline
String pipeline_version = "5.5.2"
String pipeline_version = "5.5.3"

# this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays
Array[Int] indices = range(length(r1_fastq))
5 changes: 5 additions & 0 deletions pipelines/skylab/scATAC/scATAC.changelog.md
@@ -1,3 +1,8 @@
# 1.2.4
2022-08-23 (Date of Last Commit)

* Remove an unused script in pytools docker image.

# 1.2.3
2022-08-18 (Date of Last Commit)

6 changes: 3 additions & 3 deletions pipelines/skylab/scATAC/scATAC.wdl
@@ -15,7 +15,7 @@ workflow scATAC {
String bin_size_list = "10000"
}

String pipeline_version = "1.2.3"
String pipeline_version = "1.2.4"

parameter_meta {
input_fastq1: "read 1 input fastq, the read names must be tagged with the cellular barcodes"
@@ -238,7 +238,7 @@ task MakeCompliantBAM {
input {
File input_bam
String output_bam_filename
String docker_image = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker_image = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"
}

parameter_meta {
@@ -271,7 +271,7 @@ task MakeCompliantBAM {
task BreakoutSnap {
input {
File snap_input
String docker_image = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker_image = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"
String bin_size_list
String input_id
}
@@ -1,3 +1,8 @@
# 2.2.14
2022-08-23 (Date of Last Commit)

* Remove an unused script in pytools docker image.

# 2.2.13
2022-08-16 (Date of Last Commit)

@@ -40,7 +40,7 @@ workflow MultiSampleSmartSeq2 {
Boolean paired_end
}
# Version of this pipeline
String pipeline_version = "2.2.13"
String pipeline_version = "2.2.14"

if (false) {
String? none = "None"
@@ -1,3 +1,8 @@
# 1.2.11
2022-08-23 (Date of Last Commit)

* Remove an unused script in pytools docker image.

# 1.2.10
2022-08-16 (Date of Last Commit)

@@ -40,7 +40,7 @@ workflow MultiSampleSmartSeq2SingleNucleus {
String? input_id_metadata_field
}
# Version of this pipeline
String pipeline_version = "1.2.10"
String pipeline_version = "1.2.11"

if (false) {
String? none = "None"
@@ -1,3 +1,8 @@
# 5.1.13
2022-08-23 (Date of Last Commit)

* Remove an unused script in pytools docker image.

# 5.1.12
2022-08-16 (Date of Last Commit)

@@ -36,7 +36,7 @@ workflow SmartSeq2SingleSample {
}

# version of this pipeline
String pipeline_version = "5.1.12"
String pipeline_version = "5.1.13"

parameter_meta {
genome_ref_fasta: "Genome reference in fasta format"
10 changes: 5 additions & 5 deletions tasks/skylab/LoomUtils.wdl
@@ -3,7 +3,7 @@ version 1.0
task SmartSeq2LoomOutput {
input {
#runtime values
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"
# the gene count file "<input_id>_rsem.genes.results" in the task results folder call-RSEMExpression
File rsem_gene_results
# file named "<input_id>_QCs.csv" in the folder "call-GroupQCOutputs/glob-*" of the SS2 output
@@ -61,7 +61,7 @@ task OptimusLoomGeneration {

input {
#runtime values
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"
# name of the sample
String input_id
# user provided id
@@ -163,7 +163,7 @@ task AggregateSmartSeq2Loom {
String? species
String? organ
String pipeline_version
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"
Int disk = 200
Int machine_mem_mb = 4
Int cpu = 1
@@ -211,7 +211,7 @@ task SingleNucleusOptimusLoomOutput {

input {
#runtime values
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"
# name of the sample
String input_id
# user provided id
@@ -292,7 +292,7 @@ task SingleNucleusOptimusLoomOutput {
task SingleNucleusSmartSeq2LoomOutput {
input {
#runtime values
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"

Array[File] alignment_summary_metrics
Array[File] dedup_metrics
56 changes: 1 addition & 55 deletions tasks/skylab/StarAlign.wdl
@@ -378,60 +378,6 @@ task STARsoloFastq {
}
}

task ConvertStarOutput {

input {
File barcodes
File features
File matrix

#runtime values
String docker = "quay.io/humancellatlas/secondary-analysis-python3-scientific:0.1.12"
Int machine_mem_mb = 8250
Int cpu = 1
Int disk = ceil(size(matrix, "Gi") * 2) + 10
Int preemptible = 3
}

meta {
description: "Create three numpy formats for the barcodes, gene names and the count matrix from the STARSolo count matrix in mtx format."
}

parameter_meta {
docker: "(optional) the docker image containing the runtime environment for this task"
machine_mem_mb: "(optional) the amount of memory (MiB) to provision for this task"
cpu: "(optional) the number of cpus to provision for this task"
disk: "(optional) the amount of disk space (GiB) to provision for this task"
preemptible: "(optional) if non-zero, request a pre-emptible instance and allow for this number of preemptions before running the task on a non preemptible machine"
}

command {
set -e

# create the compressed raw count matrix with the counts, gene names and the barcodes
python3 /tools/create-npz-output.py \
--barcodes ~{barcodes} \
--features ~{features} \
--matrix ~{matrix}

}

runtime {
docker: docker
memory: "${machine_mem_mb} MiB"
disks: "local-disk ${disk} HDD"
cpu: cpu
preemptible: preemptible
}

output {
File row_index = "sparse_counts_row_index.npy"
File col_index = "sparse_counts_col_index.npy"
File sparse_counts = "sparse_counts.npz"
}
}


task MergeStarOutput {

input {
@@ -441,7 +387,7 @@ task MergeStarOutput {
String input_id

#runtime values
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1660758110"
String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730"
Int machine_mem_mb = 8250
Int cpu = 1
Int disk = ceil(size(matrix, "Gi") * 2) + 10