Pipestat addition (#188)
* Work towards pypiper utilizing pipestat changes. Related: pepkit/pipestat#21

* Refactor to align with pipestat refactoring. Create default output schema to satisfy pipestat requirements.

* Call pipestat backend for setting status. Set status for pipeline.manager.

* Pytest fixes.

* Pytest fixes, revert old test.

* Move make_sure_path_exists before creating PipestatManager Object

* Ensure pypiper is calling pipestat interface and NOT the backend.

* Removed redundant default output schema function and refactored.

* Add clear_status to end of unit test.

* Implement pipestat.report for reporting results and objects with and without annotations.

* Fix unit test, reporting results, and refresh_stats. #187

* lint

* remove report_object and simplify report_result

* Add report_object back for backwards compatibility.

* Add ability to pass result_format to pipestat and return formatted string. Added annotation for backwards compatibility.

* Change to passing formatting function to pipestat instead of format flag.

* Added default mark down formatter to pass during pipestatmanager creation. Fixed sample_name and pipeline_name mix up for pipeline manager.

* Add passing multi flag to pipestatmanager during creation.

* Add printing pipestat arguments to log file.

* Polish output of pipestat object after initialization

* Update docs

* Update requirements

* fix _failed property return

* update changelog
donaldcampbelljr committed Jun 29, 2023
1 parent 336cc5e commit 948a003
Showing 19 changed files with 237 additions and 209 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -3,6 +3,6 @@
# Pypiper

[![Documentation Status](https://readthedocs.org/projects/pypiper/badge/?version=latest)](http://pypiper.readthedocs.org/en/latest/?badge=latest)
[![Build Status](https://travis-ci.org/databio/pypiper.svg?branch=master)](https://travis-ci.org/databio/pypiper)
[![Build Status](https://github.com/databio/pypiper/actions/workflows/run-pytest.yml/badge.svg?branch=dev)](https://github.com/databio/pypiper/actions/workflows/run-pytest.yml?branch=dev)

A lightweight python toolkit for gluing together restartable, robust shell pipelines. Learn more in the [documentation](http://pypiper.databio.org).
2 changes: 1 addition & 1 deletion docs/changelog.md
@@ -1,6 +1,6 @@
# Changelog

## [0.13.0] -- unreleased
## [0.13.0] -- 2023-06-29
### Added

- [pipestat](http://pipestat.databio.org/en/latest/) support
24 changes: 12 additions & 12 deletions docs/conf.py
@@ -51,8 +51,8 @@
master_doc = "index"

# General information about the project.
project = u"pypiper"
copyright = u"2015, Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"
project = "pypiper"
copyright = "2015, Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
@@ -215,8 +215,8 @@
(
"index",
"pypiper.tex",
u"pypiper Documentation",
u"Nathan Sheffield, Johanna Klughammer, Andre Rendeiro",
"pypiper Documentation",
"Nathan Sheffield, Johanna Klughammer, Andre Rendeiro",
"manual",
),
]
@@ -250,8 +250,8 @@
(
"index",
"pypiper",
u"pypiper Documentation",
[u"Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"],
"pypiper Documentation",
["Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"],
1,
)
]
@@ -269,8 +269,8 @@
(
"index",
"pypiper",
u"pypiper Documentation",
u"Nathan Sheffield, Johanna Klughammer, Andre Rendeiro",
"pypiper Documentation",
"Nathan Sheffield, Johanna Klughammer, Andre Rendeiro",
"pypiper",
"One line description of project.",
"Miscellaneous",
@@ -293,10 +293,10 @@
# -- Options for Epub output ----------------------------------------------

# Bibliographic Dublin Core info.
epub_title = u"pypiper"
epub_author = u"Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"
epub_publisher = u"Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"
epub_copyright = u"2015, Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"
epub_title = "pypiper"
epub_author = "Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"
epub_publisher = "Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"
epub_copyright = "2015, Nathan Sheffield, Johanna Klughammer, Andre Rendeiro"

# The basename for the epub file. It defaults to the project name.
# epub_basename = u'pypiper'
2 changes: 1 addition & 1 deletion docs/outputs.md
@@ -9,7 +9,7 @@ Assume you are using a pypiper pipeline named `PIPE` ( it passes `name="PIPE"` t
* **PIPE_status.flag**
As the pipeline runs, it produces a flag in the output directory, which can be either `PIPE_running.flag`, `PIPE_failed.flag`, or `PIPE_completed.flag`. These flags make it easy to assess the current state of running pipelines for individual samples, and for many samples in a project simultaneously.

* **stats.tsv**
* **stats.yaml**
Any results reported by the pipeline are saved as key-value pairs in this file, for easy parsing.

* **PIPE_profile.md**
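
The status flags described in this hunk of `docs/outputs.md` can be checked programmatically. The following is a minimal sketch, not part of pypiper: `pipeline_status` is a hypothetical helper, and it assumes only the flag naming shown above (`PIPE_running.flag`, `PIPE_failed.flag`, `PIPE_completed.flag`) and that the flags sit directly in the pipeline's output directory.

```python
from pathlib import Path

# Status names taken from the flag files named in docs/outputs.md.
STATUSES = ("running", "failed", "completed")


def pipeline_status(outfolder, pipeline_name="PIPE"):
    """Return the statuses whose flag file exists in the output folder.

    Hypothetical helper for illustration; pypiper does not ship this function.
    """
    folder = Path(outfolder)
    return [
        status
        for status in STATUSES
        if (folder / f"{pipeline_name}_{status}.flag").is_file()
    ]
```

Calling `pipeline_status("path/to/sample_output")` returns a list rather than a single value, so a folder with no flag (or, unexpectedly, more than one) can be told apart from a clean single-status run.
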
8 changes: 4 additions & 4 deletions docs/pipestat.md
@@ -5,10 +5,10 @@ You can browse the pipestat documentation to learn more about it, but briefly pi

## Advancements

There are a multiple advantages of using piestat instead of the current pieline results reporiting system:
There are a multiple advantages of using pipestat instead of the current pipeline results reporting system:

1. **Database results storage:** the results can be stored either in a database or a YAML-formatted results file. This way a pypiper pipeline running in an emphemeral compute environment can report the results to the database and exit. No need to sync the results with a central results storage.
2. **Strict and clear results definition:** all the results that can be reported by a pipeline run *must* be pre-defined in a [pipestat results schema](http://pipestat.databio.org/en/latest/pipestat_specification/#pipestat-schema-format) that in a simplest case just indicates the result's type. This presents piepstat clients with the possibility to *reliably* gather all the possible results and related metadata.
2. **Strict and clear results definition:** all the results that can be reported by a pipeline run *must* be pre-defined in a [pipestat results schema](http://pipestat.databio.org/en/latest/pipestat_specification/#pipestat-schema-format) that in a simplest case just indicates the result's type. This presents pipestat clients with the possibility to *reliably* gather all the possible results and related metadata.
3. **On-the-fly results validation:** the schema is used to validate and/or convert the reported result to a strictly determined type, which makes the connection of pypiper with downstream pipeline results processing software seamless.
4. **Unified, pipeline-agnostic results interface:** other pipelines, possibly created with different pipeline frameworks, can read and write results via Python API or command line interface. This feature significantly incerases your pipeline interoperability.
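
Points 2 and 3 of the list above (results declared up front in a schema, then validated or converted when reported) can be illustrated with a small stand-in. This sketch does not use pipestat and does not reproduce its schema format or API: `SCHEMA`, its result names, and `validate_result` are all hypothetical, and the schema is reduced to a plain mapping from result name to Python type.

```python
# Hypothetical, simplified schema: result name -> expected Python type.
# Real pipestat schemas are YAML files; this dict only mimics the idea.
SCHEMA = {"aligned_reads": int, "alignment_rate": float, "genome": str}


def validate_result(schema, key, value):
    """Reject undeclared results and coerce declared ones to the schema type."""
    if key not in schema:
        raise KeyError(f"Result '{key}' is not defined in the schema")
    expected = schema[key]
    try:
        # Conversion step: a value such as the string "1500" becomes an int.
        return expected(value)
    except (TypeError, ValueError) as err:
        raise TypeError(
            f"Result '{key}' cannot be converted to {expected.__name__}"
        ) from err


converted = validate_result(SCHEMA, "aligned_reads", "1500")
print(type(converted).__name__, converted)
```

Because every reportable key must appear in the schema, a client can list all possible results without running the pipeline, which is the property the list describes as gathering results *reliably*.
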

@@ -41,8 +41,8 @@ pm = pypiper.PipelineManager(
...,
pipestat_schema="custom_results_schema.yaml",
pipestat_results_file="custom_results_file.yaml",
pipestat_record_id="my_record",
pipestat_namespace="my_namespace",
pipestat_sample_name="my_record",
pipestat_project_name="my_namespace",
pipestat_config="custom_pipestat_config.yaml",
)
```
2 changes: 2 additions & 0 deletions docs/report.md
@@ -6,6 +6,8 @@ When you call `pm.report_result(key, value)`, pypiper simply writes the key-valu

## Reporting objects

**Note**: Reporting objects will be deprecated in a future release. It is recommended to use `report_result`.

Starting in version 0.8, pypiper now implements a second reporting function, `report_object`. This is analogous to the `report_result` function, but instead of reporting simple key-value pairs, it lets you record any produced file as an output. Most commonly, this is used to record figures (PDFs, PNGs, etc.) produced by the pipeline. It can also be used to report other files, like HTML files.

Pypiper writes results to `objects.tsv`, which can then be aggregated for project-level summaries of plots and other pipeline result files.
1 change: 0 additions & 1 deletion example_pipelines/logmuse_example.py
@@ -21,7 +21,6 @@


def build_argparser():

parser = ArgumentParser(
description="A pipeline to count the number of reads and file size. Accepts"
" BAM, fastq, or fastq.gz files."
1 change: 1 addition & 0 deletions pypiper/const.py
@@ -2,6 +2,7 @@


CHECKPOINT_EXTENSION = ".checkpoint"
DEFAULT_SAMPLE_NAME = "DEFAULT_SAMPLE_NAME"
PIPELINE_CHECKPOINT_DELIMITER = "_"
STAGE_NAME_SPACE_REPLACEMENT = "-"
PROFILE_COLNAMES = ["pid", "hash", "cid", "runtime", "mem", "cmd", "lock"]
