[WORKFLOWS-220] Create `py-orca` demonstration script #23
Codecov Report
@@ Coverage Diff @@
## main #23 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 28 28
Lines 854 869 +15
Branches 134 137 +3
=========================================
+ Hits 854 869 +15
This looks awesome! See discussion points.
@thomasyu888 @BWMac Could at least one of you try to run the script locally as per my instructions in the README? In theory, the
@BrunoGrandePhD Giving it a shot now! After working out a couple of kinks with Bruno, the whole process was executed successfully. I think the instructions are great and it's easy for someone to try it out for themselves.
The goal of this script is to demonstrate how you can use `py-orca` to process a dataset (in this case, RNA-seq) using a series of workflow runs: `nf-synstage`, `nf-core/rnaseq`, and `nf-synindex`. I made some minor modifications to the `py-orca` module to make it easier to write this script. Its current location is temporary. We can discuss whether we want to move this to a subfolder or a separate repository.

One key improvement over the scripts that I wrote for NTAP is that I wanted to create an actual DAG. Doing this in Airflow would make it challenging for users to experiment with the script. Instead, I decided to experiment with Metaflow, which makes it easy to test a DAG locally (but offers the chance to deploy it to AWS later). It's similar to Prefect and Dagster. We can revisit this once we settle on a strategy for consistency between Airflow and non-Airflow DAGs.
python3 demo.py run --dataset_id syn51514585
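For readers who want a feel for the stage ordering without installing anything, here is a minimal sketch of the three workflow runs named in the description arranged as a dependency graph. It uses Python's standard-library `graphlib` rather than Metaflow or `py-orca`, so it is not how `demo.py` is implemented. The dependency edges (each stage waits for the previous one) and the `run_stage` and `run_pipeline` helpers are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Stage names come from the PR description. The edges are an assumption:
# each stage is taken to depend on the one before it.
STAGE_DEPENDENCIES = {
    "nf-synstage": set(),
    "nf-core/rnaseq": {"nf-synstage"},
    "nf-synindex": {"nf-core/rnaseq"},
}


def run_stage(name: str, dataset_id: str) -> str:
    """Hypothetical stand-in for launching and monitoring one workflow run."""
    return f"{name} finished for {dataset_id}"


def run_pipeline(dataset_id: str) -> list[str]:
    """Run every stage once, in an order that respects the dependencies."""
    order = TopologicalSorter(STAGE_DEPENDENCIES).static_order()
    return [run_stage(name, dataset_id) for name in order]


if __name__ == "__main__":
    for message in run_pipeline("syn51514585"):
        print(message)
```

A real DAG framework adds what this sketch leaves out, such as persisted state between steps, retries, and the option to deploy remotely.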
Some of the components in this script (e.g. `RnaseqDataset` and `TowerRnaseqFlow`) could be reused in different contexts. We should also discuss if this is desirable and if so, where to store those abstract components (e.g. in `py-orca` or elsewhere).

`demo.py` in README

Command and Output