pyrkit

A tool to archive and co-locate NGS data with project-level, sample-level, and analysis-level metadata.

1. Overview

pyrkit, pronounced park-it, automates the process of moving data from the cluster into object storage in HPC DME. It instantiates a collection hierarchy to archive raw data and results. pyrkit parses a project request template, a pipeline's output directory, and a MultiQC directory to capture project, analysis, and quality-control metadata. pyrkit was created to enable FAIR scientific data management and stewardship.

Please Note: Some of the metadata listed in the example above is pipeline-specific (i.e. only for the RNA-seq pipeline).

2. Getting Started

2.1 Dependencies

pyrkit has a few required dependencies. It requires the following programs to be installed: the HPC DME toolkit, jq, and Python 3 (python/3.5 is sufficient).

Please note that if you are running pyrkit on Biowulf, the only dependency you will need to install is the HPC DME toolkit. pyrkit will attempt to module load jq and python/3.5 (which meets any Python requirements) if they are not in your $PATH.
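To see ahead of time which of these tools are already on your `$PATH`, a check along the following lines can help. This is a minimal sketch, not pyrkit's own logic: the helper name `missing_tools` is hypothetical, and it only reports what is absent, leaving any `module load` step to you or to pyrkit.

```shell
# Hypothetical helper (not part of pyrkit): print each command name
# from the argument list that cannot be found in $PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Report any of the tools named in this section that are not on $PATH.
missing="$(missing_tools jq python3)"
if [ -n "$missing" ]; then
    echo "Not found in \$PATH:" $missing
else
    echo "jq and python3 are available"
fi
```

On Biowulf, anything reported as missing here is what pyrkit would try to obtain through `module load`.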

2.2 Installation

Installation of pyrkit is easy! Please clone the repository from GitHub, create a virtual environment, and install any dependencies. Again, if you are on Biowulf, all you will need to do is clone the repository.

# Clone the Repository
git clone https://github.com/skchronicles/pyrkit.git

# Steps below are optional for Biowulf users
# Create a virtual environment
python3 -m venv .venv
# Activate the virtual environment
. .venv/bin/activate
# Update pip
pip install --upgrade pip
# Download Dependencies
pip install -r requirements.txt

3. Run pyrkit

3.1 Usage

usage: pyrkit -i INPUT_DIRECTORY -o OUTPUT_VAULT -r REQUEST_TEMPLATE
              -m MULTIQC_DIRECTORY -d DME_REPO [-p PROJECT_ID] [-n]
              [-l] [-v] [-h] [--version]

3.2 Required Arguments

Argument	Type	Description	Example
-i, --input-directory	Path	Pipeline output directory	`/scratch/RNA_hg38/`
-o, --output-vault	String	HPC DME vault to upload data	`/CCBR_Archive`
-r, --request-template	File	Project Request Template	`experiment_metadata.xlsx`
-m, --multiqc-directory	Path	MultiQC Output Directory	`/scratch/RNA_hg38/multiqc_data/`
-d, --dme-repo	Path	Path to an HPC DME toolkit install	`~/DME/HPC_DME_APIs/`

3.3 Options

Argument	Type	Description	Example
-p, --project-id	String	Project ID	`ccbr-123`
-n, --dry-run	Flag	Dry-run the entire pyrkit workflow	`-n`
-l, --local-run	Flag	Upload to DME without job submission	`-l`
-v, --validate	Flag	Validate entries before submission	`-v`
-h, --help	Flag	Display help message and exit	`-h`
--version	Flag	Display version information and exit	`--version`

3.4 Example

# Grab an interactive node or submit pyrkit command to cluster
# Do not run this on the head node!
sinteractive --mem=8g --cpus-per-task=2

# Runs pyrkit and submits a job to upload the data to HPC DME (add -n for a dry run instead)
./pyrkit -i /scratch/ccbr123/RNA_hg38/ \
         -o /CCBR_Archive \
         -r experiment_metadata.xlsx \
         -m /scratch/ccbr123/RNA_hg38/multiqc_data/ \
         -d ~/DME/HPC_DME_APIs/ \
         -p ccbr-123
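Since `-n` dry-runs the entire workflow, one cautious pattern is to issue the same command with `-n` first and without it afterwards. The sketch below assumes a hypothetical helper, `pyrkit_cmd`, that only prints the command line it would run; the paths and project ID are the ones from the example above, and nothing here is part of pyrkit itself.

```shell
# Hypothetical helper (not part of pyrkit): print the pyrkit command line
# for the example project, appending -n when the first argument is "dry".
pyrkit_cmd() {
    local mode
    mode="$1"
    # Rebuild the positional parameters as the command and its arguments.
    set -- ./pyrkit \
        -i /scratch/ccbr123/RNA_hg38/ \
        -o /CCBR_Archive \
        -r experiment_metadata.xlsx \
        -m /scratch/ccbr123/RNA_hg38/multiqc_data/ \
        -d ~/DME/HPC_DME_APIs/ \
        -p ccbr-123
    if [ "$mode" = "dry" ]; then
        set -- "$@" -n
    fi
    echo "$*"
}

pyrkit_cmd dry    # prints the dry-run command
pyrkit_cmd real   # prints the command for the actual upload
```

Replacing `echo "$*"` with `"$@"` would execute the command instead of printing it.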
