Releases: openpipelines-bio/openpipeline
openpipelines.bio v2.1.2
openpipelines.bio v2.1.1
openpipelines.bio v2.0.1
OpenPipelines.bio v1.0.5
OpenPipelines.bio v2.1.0
BREAKING CHANGES
- Deprecation of metadata/duplicate_obs and metadata/duplicate_var components (PR #952).
- Deprecation of workflows/annotation/scgpt_integration_knn component (PR #952).
- annotate/scanvi: Remove scarches functionality from this component, as it is already covered in integrate/scarches (PR #986).
NEW FUNCTIONALITY
- dataflow/concatenate_h5mu: add modality parameter (PR #977).
- filter_with_scrublet: add expected_doublet_rate, stdev_doublet_rate, n_neighbors and sim_doublet_ratio arguments (PR #974).
- feature_annotation/aling_query_reference: Added a component to align a query and reference dataset (PR #948, #958, #972).
- workflows/qc/qc workflow: Added ribosomal gene detection (PR #961).
- workflows/rna/rna_singlesample, workflows/multiomics/process_samples workflows: Added ribosomal gene detection (PR #968).
- scanvi: enable CUDA acceleration (PR #969).
- workflows/annotation/scvi_knn workflow: Cell-type annotation based on scVI integration followed by KNN label transfer (PR #954).
- convert/from_h5ad_to_seurat: Add component to convert from h5ad to Seurat (PR #980).
- workflows/annotation/scanvi_scarches workflow: Cell-type annotation based on scANVI integration and annotation with scArches for reference mapping (PR #898).
- integrate/scarches: Implemented functionality to align the query dataset with the model registry and extend functionality to predict labels for scANVI models (PR #898).
- workflows/annotation/harmony_knn workflow: Cell-type annotation based on harmony integration with KNN label transfer (PR #836).
- from_cellranger_multi_to_h5mu: add support for custom modality (PR #982).
- integrate/scvi: Enable passing any .var field for gene name information instead of .var index, using the --var_gene_names parameter (PR #986).
MAJOR CHANGES
- Several components: when a component processes a single modality, only that modality is read into memory (PR #944).
- The transfer/publish component is deprecated and will be removed in a future major release (PR #941).
MINOR CHANGES
- Bump viash to 0.9.3 (PR #995).
- Several workflows: refactor neighbors, leiden and UMAP in a separate subworkflow (PR #942 and PR #949).
- grep_annotation_column and subset_obsp: Fix compatibility for SciPy (PR #945).
- popv: Pin numpy<2 after new release of scvi-tools (PR #946).
- Various components (scgpt and annotate): Add resource labels (PR #947, PR #950).
- feature_annotation/highly_variable_features_scanpy: Enable calculation of HVG on a subset of genes (PR #957, PR #959).
- integrate/scvi, integrate/totalvi and integrate/scarches: update base image to nvcr.io/nvidia/pytorch:24.12-py3, pin scvi-tools version to 1.1.5, unpin jax and jaxlib version (PR #970).
- annotate/celltypist: Enable passing any layer with log normalized counts, enforce checking whether counts are log normalized (PR #971).
- process_10xh5/filter_10xh5: update container base to ubuntu 24.04 (PR #983).
BUG FIXES
- Fix -stub runs (PR #1000).
- cluster/leiden: Fix an issue where insufficient shared memory (size of /dev/shm) causes the processing to hang.
- utils/subset_vars: Convert .var column used for subsetting of dtype "boolean" to dtype "bool" when it doesn't contain NaN values (PR #959).
- resources_test_scripts/annotation_test_data.sh: Add a layer to the annotation reference dataset with log normalized counts (PR #960).
- annotate/celltypist: Fix missing values in annotation column caused by index misalignment (PR #976).
- workflows/annotation/scgpt_annotation and workflows/integrate/scgpt_leiden: Parameterization of HVG flavor with default method cell_ranger instead of seurat_v3 (PR #979).
- dataflow/merge: Resolved an issue where merging two MuData objects with overlapping var or obs columns sometimes resulted in an unsupported nullable dtype (PR #990), for instance when merging pd.IntegerDtype and pd.FloatDtype. These columns are now correctly cast to their native numpy dtypes before writing.
- workflows/annotation/harmony_knn: Only process RNA modality in the workflow (PR #988).
- Documentation CI: Fix building the documentation using CI (PR #1003).
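
As an illustration of the dataflow/merge fix above, the following is a minimal sketch of casting pandas nullable integer and float columns to plain numpy dtypes before writing. It is not the code from PR #990: the helper name to_numpy_dtypes, the example column names and the fallback to float64 when missing values are present are all assumptions made for this example.

```python
import pandas as pd


def to_numpy_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which nullable integer/float columns use numpy dtypes.

    Hypothetical helper for illustration; not the dataflow/merge implementation.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        dtype = col.dtype
        if not isinstance(dtype, pd.api.extensions.ExtensionDtype):
            continue  # already backed by a numpy dtype
        is_int = pd.api.types.is_integer_dtype(dtype)
        is_float = pd.api.types.is_float_dtype(dtype)
        if not (is_int or is_float):
            continue  # leave categoricals, strings, etc. untouched
        if is_int and not col.isna().any():
            out[name] = col.astype("int64")
        else:
            # int64 cannot hold missing values, so use float64 with NaN instead
            # (an assumption of this sketch).
            out[name] = col.astype("float64")
    return out


# Example .obs-like table with one nullable integer and one nullable float column.
obs = pd.DataFrame(
    {
        "n_counts": pd.array([10, 20, 30], dtype="Int64"),
        "score": pd.array([0.5, None, 1.5], dtype="Float64"),
    }
)
converted = to_numpy_dtypes(obs)
print(converted.dtypes)
```

Columns that already use numpy dtypes, and extension dtypes other than nullable integers and floats, pass through unchanged in this sketch.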
OpenPipelines.bio v2.1.0-rc.2
BUG FIXES
- Fix -stub runs (PR #1000).
OpenPipelines.bio v2.1.0-rc.1
BREAKING CHANGES
- Deprecation of metadata/duplicate_obs and metadata/duplicate_var components (PR #952).
- Deprecation of workflows/annotation/scgpt_integration_knn component (PR #952).
- annotate/scanvi: Remove scarches functionality from this component, as it is already covered in integrate/scarches (PR #986).
NEW FUNCTIONALITY
- dataflow/concatenate_h5mu: add modality parameter (PR #977).
- filter_with_scrublet: add expected_doublet_rate, stdev_doublet_rate, n_neighbors and sim_doublet_ratio arguments (PR #974).
- feature_annotation/aling_query_reference: Added a component to align a query and reference dataset (PR #948, #958, #972).
- workflows/qc/qc workflow: Added ribosomal gene detection (PR #961).
- workflows/rna/rna_singlesample, workflows/multiomics/process_samples workflows: Added ribosomal gene detection (PR #968).
- scanvi: enable CUDA acceleration (PR #969).
- workflows/annotation/scvi_knn workflow: Cell-type annotation based on scVI integration followed by KNN label transfer (PR #954).
- convert/from_h5ad_to_seurat: Add component to convert from h5ad to Seurat (PR #980).
- workflows/annotation/scanvi_scarches workflow: Cell-type annotation based on scANVI integration and annotation with scArches for reference mapping (PR #898).
- integrate/scarches: Implemented functionality to align the query dataset with the model registry and extend functionality to predict labels for scANVI models (PR #898).
- workflows/annotation/harmony_knn workflow: Cell-type annotation based on harmony integration with KNN label transfer (PR #836).
- from_cellranger_multi_to_h5mu: add support for custom modality (PR #982).
- integrate/scvi: Enable passing any .var field for gene name information instead of .var index, using the --var_gene_names parameter (PR #986).
MAJOR CHANGES
- Several components: when a component processes a single modality, only that modality is read into memory (PR #944).
- The transfer/publish component is deprecated and will be removed in a future major release (PR #941).
MINOR CHANGES
- Bump viash to 0.9.3 (PR #995).
- Several workflows: refactor neighbors, leiden and UMAP in a separate subworkflow (PR #942 and PR #949).
- grep_annotation_column and subset_obsp: Fix compatibility for SciPy (PR #945).
- popv: Pin numpy<2 after new release of scvi-tools (PR #946).
- Various components (scgpt and annotate): Add resource labels (PR #947, PR #950).
- feature_annotation/highly_variable_features_scanpy: Enable calculation of HVG on a subset of genes (PR #957, PR #959).
- integrate/scvi, integrate/totalvi and integrate/scarches: update base image to nvcr.io/nvidia/pytorch:24.12-py3, pin scvi-tools version to 1.1.5, unpin jax and jaxlib version (PR #970).
- annotate/celltypist: Enable passing any layer with log normalized counts, enforce checking whether counts are log normalized (PR #971).
- process_10xh5/filter_10xh5: update container base to ubuntu 24.04 (PR #983).
BUG FIXES
- cluster/leiden: Fix an issue where insufficient shared memory (size of /dev/shm) causes the processing to hang.
- utils/subset_vars: Convert .var column used for subsetting of dtype "boolean" to dtype "bool" when it doesn't contain NaN values (PR #959).
- resources_test_scripts/annotation_test_data.sh: Add a layer to the annotation reference dataset with log normalized counts (PR #960).
- annotate/celltypist: Fix missing values in annotation column caused by index misalignment (PR #976).
- workflows/annotation/scgpt_annotation and workflows/integrate/scgpt_leiden: Parameterization of HVG flavor with default method cell_ranger instead of seurat_v3 (PR #979).
- dataflow/merge: Resolved an issue where merging two MuData objects with overlapping var or obs columns sometimes resulted in an unsupported nullable dtype (e.g. merging pd.IntegerDtype and pd.FloatDtype). These columns are now correctly cast to their native numpy dtypes before writing (PR #990).
- workflows/annotation/harmony_knn: Only process RNA modality in the workflow (PR #988).
OpenPipelines.bio v1.0.4
OpenPipelines.bio v2.0.0
BREAKING CHANGES
- velocity/scvelo: update scvelo to 0.3.3, which also removes support for using loom input files. The component now uses a MuData object as input. Several arguments were added to support selecting different inputs from the MuData file: counts_layer, modality, layer_spliced, layer_unspliced, layer_ambiguous. An output_h5mu argument has been added (PR #932).
- src/annotate/onclass and src/annotate/celltypist: Input parameter for gene name layers of input datasets has been updated to --input_var_gene_names and reference_var_gene_names (PR #919).
- Several components under src/scgpt (cross_check_genes, tokenize_pad, binning) now process the input (query) datasets differently. Instead of subsetting datasets based on genes in the model vocabulary and/or highly variable genes, these components require an input .var column with a boolean mask specifying this information. The results are written back to the original input data, preserving the dataset structure (PR #832).
- query/cellxgene_census: The default output layer has been changed from .layers["counts"] to .X to be more aligned with the standard OpenPipelines format (PR #933). Use argument --output_layer_counts counts to revert the behaviour to the previous default.
- Added cell multiplexing support to the from_cellranger_multi_to_h5mu component and the cellranger_multi workflow. For the from_cellranger_multi_to_h5mu component, the output argument now requires a value containing a wildcard character *, which will be replaced by the sample ID to form the final output file names. Additionally, a sample_csv argument is added to the from_cellranger_multi_to_h5mu component which describes the sample name per output file. No change is required for the output_h5mu argument from the cellranger_multi workflow; the workflow will just emit multiple events in case of a multiplexed run, one for each sample. The IDs of the events (and default output file names) are set by --sample_ids (in case of cell multiplexing), or (as before) by the user-provided id for the input (PR #803 and PR #902).
- demux/bcl_convert: update BCL convert from 3.10 to 4.2 (PR #774).
- demux/cellranger_mkfastq, mapping/cellranger_count, mapping/cellranger_multi and reference/build_cellranger_reference: update cellranger to 8.0.1 (PR #774 and PR #811).
- Removed --disable_library_compatibility_check in favour of --check_library_compatibility in the mapping/cellranger_multi component and the ingestion/cellranger_multi workflow (PR #818).
- lianapy: bumped version to 1.3.0 (PR #827 and PR #862). Additionally, groupby is now a required argument.
- concat: this component was deprecated and has now been removed, use concatenate_h5mu instead (PR #796).
- The workflows folder in the root of the project no longer contains symbolic links to the built workflows in target. Using any workflow that was previously linked in this directory will now result in an error which will indicate the location of the workflow to be used instead (PR #796).
- XGBoost: bump version to 2.0.3 (PR #646).
- Several components: update anndata to 0.11.1 and mudata to 0.3.1 (PR #645 and PR #901), and scanpy to 1.10.4 (PR #901).
- filter/filter_with_hvg: this component was deprecated and has now been removed. Use feature_annotation/highly_variable_features_scanpy instead (PR #843).
- dataflow/concat: this component was deprecated and has now been removed. Use dataflow/concatenate_h5mu instead (PR #857).
- convert/from_h5mu_to_seurat: bump seurat to latest version (PR #850).
- workflows/ingestion/bd_rhapsody: Upgrade BD Rhapsody 1.x to 2.x, thereby changing the interface of the workflow (PR #846).
- mapping/bd_rhapsody: Upgrade BD Rhapsody 1.x to 2.x, thereby changing the interface of the workflow (PR #846).
- reference/make_bdrhap_reference: Upgrade BD Rhapsody 1.x to 2.x, thereby changing the interface of the workflow (PR #846).
- reference/build_star_reference: Rename mapping/star_build_reference to reference/build_star_reference (PR #846).
- reference/cellranger_mkgtf: Rename reference/mkgtf to reference/cellranger_mkgtf (PR #846).
- labels_transfer/xgboost: Align interface with new annotation workflow:
  - Store label probabilities instead of uncertainties.
  - Take .h5mu format as an input instead of .h5ad.
- reference/build_cellranger_arc_reference: a default value of "output" is now specified for the argument --genome, in line with the reference/build_cellranger_reference component. Additionally, providing a value for --organism is no longer required and its default value of Homo Sapiens has been removed (PR #864).
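
To make the wildcard behaviour of the from_cellranger_multi_to_h5mu output argument more concrete, here is a minimal sketch of how a template containing * could be expanded into one file name per sample ID. The function name expand_output_template, the example paths and the requirement of exactly one wildcard are assumptions made for illustration; the component's actual implementation may differ.

```python
from typing import Dict, List


def expand_output_template(template: str, sample_ids: List[str]) -> Dict[str, str]:
    """Map each sample ID to an output file name by substituting the '*' wildcard.

    Hypothetical helper for illustration; not part of OpenPipelines.
    """
    # Assumption of this sketch: the template holds exactly one wildcard.
    if template.count("*") != 1:
        raise ValueError("output template must contain exactly one '*' wildcard")
    return {sample_id: template.replace("*", sample_id) for sample_id in sample_ids}


# Example: a multiplexed run with two samples yields two output files.
outputs = expand_output_template("results/*.h5mu", ["sample_a", "sample_b"])
for sample_id, file_name in outputs.items():
    print(sample_id, file_name)
# prints:
# sample_a results/sample_a.h5mu
# sample_b results/sample_b.h5mu
```

A mapping like this pairs each sample name with its output file, which is the kind of information the changelog says the sample_csv argument describes.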
NEW FUNCTIONALITY
Important: Workflows from the workflows/annotation and workflows/integration/scgpt_leiden namespaces, plus their newly implemented dependencies, are not yet considered to be part of the stable public API. Their functionality and interface may be subject to change.
- velocyto_to_h5mu: now writes counts to .X (PR #932).
- qc/calculate_atac_qc_metrics: new component for calculating ATAC QC metrics (PR #868).
- workflows/annotation/scgpt_integration_knn workflow: Cell-type annotation based on scGPT integration with KNN label transfer (PR #875).
- CI: Use params.resources_test in test workflows in order to point to an alternative location (e.g. a cache) (PR #889).
- Added demux/cellranger_atac_mkfastq component: demultiplex raw sequencing data for ATAC experiments (PR #726).
- process_samples, process_batches and rna_multisample workflows: added functionality to scale the log-normalized gene expression data to unit variance and zero mean. The scaled data will be output to a different layer and the representation with reduced dimensions will be created and stored in addition to the non-scaled data (PR #733).
- transform/scaling: add --input_layer and --output_layer arguments (PR #733).
- CI: added checking of mudata contents for multiple workflows (PR #783).
- Added multiple arguments to the cellranger_multi workflow in order to maintain feature parity with the mapping/cellranger_multi component (PR #803).
- convert/from_cellranger_to_h5mu: add support for antigen analysis.
- Added reference/build_cellranger_reference component: build reference file compatible with ATAC and ATAC+GEX experiments (PR #726).
- demux/bcl_convert: add support for no lane splitting (PR #804).
- reference/cellranger_mkgtf component: Added cellranger mkgtf as a standalone component (PR #771).
- scgpt/cross_check_genes component: Added a gene-model cross check component for scGPT (PR #758).
- scgpt/embedding component: Added scGPT embedding component (PR #761).
- scgpt/tokenize_pad component: Added scGPT padding and tokenization component (PR #754).
- scgpt/binning component: Added a scGPT pre-processing binning component (PR #765).
- workflows/integration/scgpt_leiden workflow: scGPT integration followed by Leiden clustering (PR #794).
- scgpt/cell_type_annotation component: Added scGPT cell type annotation component (PR #798).
- resources_test_scripts/scGPT.sh: Added script to include scGPT test resources (PR #800).
- transform/clr component: Added the option to set the axis along which to apply CLR. Possible to override on workflow level as well (PR #767).
- annotate/celltypist component: Added a CellTypist annotation component (PR #825).
- dataflow/split_h5mu component: Added a component to split a single h5mu file into multiple h5mu files based on the values of an .obs column (PR #824).
- workflows/test_workflows/ingestion components & workflows/ingestion: Added standalone components for integration testing of ingestion workflows (PR #801).
- workflows/ingestion/make_reference: Add additional arguments passed through to the STAR and BD Rhapsody reference components (PR #846).
- annotate/random_forest_annotation component: Added a random forest cell type annotation component (PR #848).
- dataflow/concatenate_h5mu: data from .uns, both originating from the global and per-modality slots, is now retained in the final concatenated output object. Additionally, added the uns_merge_mode argument in order to tune the behavior when conflicting keys are detected across samples (PR #859).
- dimred/densmap component: Added a densMAP dimensionality reduction component (PR #748).
- annotate/scanvi component: Added a component to annotate cells using scANVI (PR #833).
- transform/bpcells_regress_out component: Added a component to regress out effects of confounding variables in the count matrix using BPCells (PR #863).
- transform/regress_out: Allow providing 'input' and 'output' layers for scanpy regress_out functionality (PR #863).
- workflows/ingestion/make_reference: add possibility to build CellRanger ARC references. Added --motifs_file, --non_nuclear_contigs and --output_cellranger_arc arguments (PR #864).
- Test resources (reference_gencodev41_chr1): switch reference genome for CellRanger to ARC variant (PR #864).
- Added transform/tfidf component: normalize ATAC data with TF-IDF (PR #870).
- Added dimred/lsi component (PR #552).
- metadata/duplicate_obs component: Added a component to make a copy from one .obs field or index to another .obs field within...