RNA insert metrics #234

morsecodist · 2020-01-03T21:17:54Z

Description

This change computes insert size metrics for all paired-end DNA samples, and for all paired-end RNA samples provided we have a GTF file for the host genome.
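Below is a minimal sketch of the selection rule described above, plus a toy summary of the kind of numbers an insert size metric reports. It reads the GTF requirement as applying only to RNA samples. `SampleInfo`, `should_compute_insert_size_metrics` and `summarize_insert_sizes` are hypothetical names rather than the pipeline's actual code, and the pipeline may delegate the statistics to an external tool.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev
from typing import Iterable, Optional


@dataclass
class SampleInfo:
    # Hypothetical fields; the real pipeline's sample attributes may differ.
    paired_end: bool
    nucleotide_type: str          # "DNA" or "RNA"
    host_gtf_path: Optional[str]  # None when no GTF exists for the host genome


def should_compute_insert_size_metrics(sample: SampleInfo) -> bool:
    """Paired-end DNA always qualifies; paired-end RNA only with a host GTF."""
    if not sample.paired_end:
        return False
    if sample.nucleotide_type == "DNA":
        return True
    if sample.nucleotide_type == "RNA":
        return sample.host_gtf_path is not None
    return False


def summarize_insert_sizes(template_lengths: Iterable[int]) -> dict:
    """Summary statistics over absolute SAM template lengths (TLEN).

    A TLEN of 0 (e.g. mate unmapped or on another reference) is skipped.
    """
    sizes = [abs(t) for t in template_lengths if t != 0]
    if not sizes:
        return {"count": 0}
    return {
        "count": len(sizes),
        "median": median(sizes),
        "mean": mean(sizes),
        "std_dev": pstdev(sizes),
    }
```

Pass one template length per pair (for example, only first-in-pair records), since SAM stores the same TLEN with opposite signs on both mates.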

Version

I have increased the appropriate version number in https://github.com/chanzuckerberg/idseq-dag/blob/master/idseq_dag/__init__.py. Guidelines here: https://github.com/chanzuckerberg/idseq-dag/blob/pr-template/README.md#release-notes
I have added release notes for my new version to https://github.com/chanzuckerberg/idseq-dag/blob/pr-template/README.md#release-notes

Tests

I have verified in IDseq staging that the pipeline still completes successfully:
- for single-end inputs
- for paired-end inputs
- for FASTQ inputs
- for FASTA inputs.
I have validated that my change does not introduce any correctness bugs to existing output types.
I have validated that my change does not introduce significant performance regressions or I have discussed with the team that the benefits of the change are substantial enough that we're comfortable accepting the size of the measured performance penalty.

This reverts commit 2f6edb8.

morsecodist · 2020-01-03T22:02:54Z

I have been unable to find any FASTA inputs despite searching all of our samples. Am I searching incorrectly?

jshoe · 2020-01-06T20:03:13Z

README.md

@@ -232,6 +232,9 @@ Version numbers for this repo take the form X.Y.Z.
 - We increase X for a paradigm shift in how the pipeline is conceived. Example: adding a de-novo assembly step and then reassigning hits based on the assembled contigs.
 Changes to X or Y force recomputation of all results when a sample is rerun using idseq-web. Changes to Z do not force recomputation when the sample is rerun - the pipeline will lazily reuse existing outputs in AWS S3.

+- 3.15.5


This should be 3.15.0, no?
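The README rule quoted above can be sketched as follows. Resetting Z to 0 when Y increases is the usual semantic-versioning convention the reviewer appears to be invoking; the helper names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse an 'X.Y.Z' string into three integers."""
    x, y, z = (int(part) for part in version.split("."))
    return x, y, z


def bump_minor(version: str) -> str:
    """Increase Y and reset Z to 0, e.g. 3.14.5 -> 3.15.0."""
    x, y, _ = parse_version(version)
    return f"{x}.{y + 1}.0"


def forces_recomputation(old: str, new: str) -> bool:
    """Per the README: a change to X or Y forces recomputation; Z alone does not."""
    return parse_version(old)[:2] != parse_version(new)[:2]
```

Under this rule both 3.15.5 and 3.15.0 force recomputation relative to 3.14.5, so the suggested fix is about convention rather than pipeline behavior.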

gregdingle

Good work! Please fix the version number, and I made some requests for clarification.

Also, it's not clear how you tested this in RNA and DNA modes. Are both represented in examples/generic_test_dag.json?

idseq_dag/util/s3.py

examples/generic_test_dag.json

gregdingle · 2020-01-06T21:41:45Z

idseq_dag/__init__.py

@@ -1,2 +1,2 @@
 ''' idseq_dag '''
-__version__ = "3.14.5"
+__version__ = "3.15.5"


still hasn't changed

Thanks for catching this
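A hypothetical consistency check that would catch an `__init__.py` version left behind when the README release notes move on. It assumes release notes are bullets of the form `- X.Y.Z` with the newest listed first; neither helper exists in idseq-dag.

```python
import re
from typing import Optional

# Matches: __version__ = "3.15.0"
VERSION_RE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)
# Matches a release-note bullet consisting only of a version, e.g. "- 3.15.0".
RELEASE_NOTE_RE = re.compile(r"^- (\d+\.\d+\.\d+)\s*$", re.MULTILINE)


def package_version(init_py_text: str) -> Optional[str]:
    match = VERSION_RE.search(init_py_text)
    return match.group(1) if match else None


def latest_release_note_version(readme_text: str) -> Optional[str]:
    # Assumption: the first matching bullet is the newest release.
    match = RELEASE_NOTE_RE.search(readme_text)
    return match.group(1) if match else None


def versions_consistent(init_py_text: str, readme_text: str) -> bool:
    pkg = package_version(init_py_text)
    return pkg is not None and pkg == latest_release_note_version(readme_text)
```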

idseq_dag/steps/run_star.py


gregdingle

Good comments but version number still hasn't fully changed.

gregdingle · 2020-01-07T21:20:02Z

idseq_dag/__init__.py

@@ -1,2 +1,2 @@
 ''' idseq_dag '''
-__version__ = "3.14.5"
+__version__ = "3.15.5"


still hasn't changed

boris-dimitrov · 2020-01-13T23:24:49Z

idseq_dag/util/s3.py

+def list_s3_keys(s3_path_prefix):
+    with botolock:
+        rate_limit_boto()
+        return _list_s3_keys(s3_path_prefix)


Sorry I'm late to the party here, but I have a number of concerns about the use of boto3: it tends to implicitly use a global session object, and that session cannot be shared between threads.

Is there a way to rewrite this without using boto3? If not, would you mind creating your own dedicated boto session for every client and paginator creation?

See for instance https://stackoverflow.com/questions/52820971/is-boto3-client-thread-safe, though that is by no means the only place where its thread safety is questioned. The concern that the implicit global boto session should not be used from multiple threads is documented in the official boto docs. All code in s3.py is meant to be thread-safe, and that matters even more here because the paginator is a long-lived object (a generator).

If you must use boto3, it seems to me that botolock and rate_limit_boto() are in the wrong place. They should protect lines 92 and 93 above, where the first boto3 operations occur, not this line. It's also unclear to me if the paginator
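A minimal sketch of the second option above, assuming boto3 stays: each call builds its own session, client and paginator while holding the lock, instead of relying on the implicit global session. `botolock` and `rate_limit_boto()` are stubs standing in for the real s3.py helpers, and `split_s3_path` and the `session_factory` parameter are hypothetical additions (the factory lets the sketch run without AWS access).

```python
import threading
from typing import Callable, List, Tuple

# Stand-ins for the module-level helpers in s3.py (assumed, not the real code).
botolock = threading.Lock()


def rate_limit_boto() -> None:
    """Placeholder for s3.py's throttle; a no-op in this sketch."""


def _new_session():
    # Imported lazily so the sketch can be exercised without boto3 installed.
    import boto3
    return boto3.session.Session()


def split_s3_path(s3_path_prefix: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    if not s3_path_prefix.startswith("s3://"):
        raise ValueError(f"not an S3 path: {s3_path_prefix}")
    bucket, _, prefix = s3_path_prefix[len("s3://"):].partition("/")
    return bucket, prefix


def list_s3_keys(s3_path_prefix: str,
                 session_factory: Callable = _new_session) -> List[str]:
    bucket, prefix = split_s3_path(s3_path_prefix)
    # The lock and rate limiter guard the first boto3 operations:
    # creating a dedicated session, its client and the paginator.
    with botolock:
        rate_limit_boto()
        session = session_factory()
        client = session.client("s3")
        paginator = client.get_paginator("list_objects_v2")
    # The session belongs to this call alone, so pages are fetched outside the
    # lock; collecting keys into a list means no paginator outlives the call.
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

Returning a list instead of a generator costs memory on very large prefixes but keeps every boto3 object local to one call. Page fetches after the first are further API calls that this sketch does not rate-limit.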

morsecodist added 14 commits January 2, 2020 14:38

Added insert size metrics for RNA

3953e51

s/additional_attributes/additional_files

689ff2b

Set collect metrics for

0d8920d

Fix typo

ef87b2f

Add back counts

5ffafa8

Log testing

2f6edb8

Revert "Log testing"

372db5e

This reverts commit 2f6edb8.

Fixed gtf check

f5de340

Testing

7fcfb10

Improved gtf check

c610640

RNA test

9db9d3a

Remove duplicate flag

fd89db3

No match test

16ea7d5

Cleanup

3343c22

morsecodist requested review from katrinakalantar, cdebourcy and a team January 3, 2020 21:17

More cleanup

9279d4f

jshoe reviewed Jan 6, 2020


Update README.md

f665ad4

gregdingle approved these changes Jan 6, 2020


morsecodist added 2 commits January 6, 2020 17:56

Added comments

a9bd48e

Merge branch 'rna-insert-metrics' of github.com:chanzuckerberg/idseq-dag into rna-insert-metrics

fa3d2a9

gregdingle reviewed Jan 7, 2020


Update __init__.py

178c509

morsecodist changed the title ~~Rna insert metrics~~ RNA insert metrics Jan 8, 2020

morsecodist merged commit 36e7ea8 into master Jan 8, 2020

morsecodist deleted the rna-insert-metrics branch January 8, 2020 17:15

boris-dimitrov reviewed Jan 13, 2020

