(PR 1 of 5) Fusion Analysis: filtering #92
Conversation
all.callers<-rbind(ar.total, sf.total)

#histology
clin<-read.delim(paste0(file.path("../../data/pbta-histologies.tsv")),stringsAsFactors = F,sep="\t")
Unfortunately I'm not going to be able to go through this whole script today, but this came to my attention because of #94 and I wanted to discuss it before making changes to the README.
I think this change would be equivalent to what you currently have:
clin<-read.delim("../../data/pbta-histologies.tsv",stringsAsFactors = F,sep="\t")
or if you wanted to use file.path:
clin<-read.delim(file.path("..", "..", "data", "pbta-histologies.tsv"),stringsAsFactors = F,sep="\t")
As far as I know, you don't need to treat the symlinks as any kind of special case when reading them into R. Provided bash download-data.sh ran to completion, you can use the path to the symlinks in ../../data and that will read in the equivalent file from the most recent release.
Please let me know if that works and if my explanation was helpful; it will help me formulate changes for #94.
I updated the file.path to file.path("..", "..", "data", "pbta-fusion-starfusion.tsv.gz") but it still doesn't read the file and fails CI - am I missing something?
I had the extra paste0 from the time I was running paste0("../../data/", Sys.readlink2("../../data/pbta-fusion-starfusion.tsv.gz")) to get the symbolic link target for gzfile(), which works in R 3.5.1, as does file.path("../../data/pbta-fusion-starfusion.tsv.gz").
Ah, okay, looking at the CI details: https://circleci.com/gh/AlexsLemonade/OpenPBTA-analysis/252?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
I think I might know what's going on now that I have that context. When calling this in CI with:
./scripts/run_in_ci.sh Rscript analyses/fusion_filtering/01-fusion-filtering.R
The working directory will be the root directory of the repository, so you will not have to traverse upward to ../../data. It would be data/pbta-fusion-starfusion.tsv.gz rather than ../../data/pbta-fusion-starfusion.tsv.gz.
I went back to another PR where the Rscript was being run similarly: https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/54/files#diff-1d37e48f9ceff6d8030570cd36286a61R20 and data was used: https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/54/files#diff-ad4abc42962164e2453a997a53b3505fR49
oh yes that makes sense, thanks!
Good to know, @jaclyn-taroni!
I missed the part that the scripts need to be runnable on the test datasets in the container; some parts of this code will not work with a small sample set. Let me know if I should edit those parts so the code passes CI. Thanks!
@kgaonkar6 could you elaborate on which parts of the code won't work with a differently sized dataset? These may be opportunities to improve the generality of the script.
Oh yeah, sure! I believe it's the part of the code below that looks for genes fused more than an arbitrary number of times (5) within a histology, which is saved in the variable "total" and written to fusion_all_list_filt.rds as our "Filtered_Fusions" list. Filtered_Fusion <- (fusions found in at least 2 samples per histology | fusions called by both callers | fusions found in the transcription factor or kinase gene list | fusions with either gene fused recurrently (>=5 times) per histology) & uniquely found in the histology. The code starts from https://github.com/kgaonkar6/OpenPBTA-analysis/blob/f89849df225f391b8b27115015d2829aa557e5c3/analyses/fusion_filtering/01-fusion-filtering.R#L143. The CI failed here because the res dataframe is empty and I'm adding a column "note" to it to capture which filtering step resulted in keeping the gene.
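A minimal sketch of that recurrence check and of guarding the empty-result case that tripped CI, assuming hypothetical column names (Histology, Gene1A) on the merged calls; this is not the script's actual code:

library(dplyr)

recurrence_cutoff <- 5  # the arbitrary per-histology cutoff discussed above
total <- all.callers %>%
  group_by(Histology, Gene1A) %>%
  summarize(times_fused = n(), .groups = "drop") %>%
  filter(times_fused >= recurrence_cutoff)

# Adding a column to a zero-row data frame errors in base R
# ("replacement has 1 row, data has 0"), so check nrow() first.
if (nrow(total) > 0) {
  total$note <- "recurrently fused within histology"
}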
Hi @kgaonkar6, I wanted to ask a few general questions about the pipeline before going through this script line-by-line. There may be opportunities to revisit the structure of the pipeline in service of making it more generalizable and I want to understand the current structure better before making any further recommendations.
- In looking ahead at https://github.com/d3b-center/fusion_filtering_pipeline, I understand there to be 3 filtering steps: 1) what is contained here (filtering artifacts), 2) filtering based on low expression in the RNA-seq data, and 3) another step based on expression. What is the rationale for splitting the filtering into these three parts? Without knowing more or fully digging into the pipeline, I might expect the filtering to happen as a single step after standardization and annotation.
- Is it fair to say that there are two kinds of filtering: filtering you would want to do regardless of the specifics of a project (TF/kinase, is this called by n callers?) and filtering that is specific to this project, or more generally specific to a given project (e.g., based on histologies)?
- We can always expect that folks will have RNA-seq expression estimates of some kind because these tools call fusions from RNA-seq data; did I understand that correctly?
Thank you! I'm looking forward to learning more about this.
#IGLC6 is expressed in rnaseq
Can you tell me why you are making this substitution? This comment could be more detailed.
Yes, this specific gene name with the "@" appears only in the STARFusion calls and not in the RSEM expression matrix I had, so I had to change the gene name specifically for STARFusion to match a known gene name.
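Something like the following, where the "@"-style source symbol (shown as "IGL@") and the Gene1A column name are my assumptions for illustration, not confirmed from the script:

# Sketch: rename the "@"-style STARFusion gene symbol to the symbol used
# in the RSEM expression matrix.
sf.total$Gene1A <- gsub("IGL@", "IGLC6", sf.total$Gene1A, fixed = TRUE)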
library(tidyr)
library(R.utils)

# star fusion
Based on this code and what you stated, it seems that the first step is standardizing the file formats and based on #39 (comment), you may ultimately want to accommodate more fusion callers.
With this in mind, you could consider the first step in your pipeline to be standardizing the files, where you have a script that accepts the input file path, the caller type (e.g., Arriba), and the output file path as arguments. There would be logic in the script based on the type argument and the output would be some standardized format that the rest of your pipeline would accept. I am essentially suggesting that you break this data cleaning/standardizing step out from any kind of filtering. Though you may want to consider the read-through and discordant reads steps as part of 'data cleaning/standardizing.'
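A rough skeleton of that suggestion, assuming optparse for the arguments; the option names and the column mappings below are placeholders, not the PR's actual code:

library(optparse)

option_list <- list(
  make_option("--fusionfile", type = "character", help = "raw caller output file"),
  make_option("--caller", type = "character", help = "caller type, e.g., arriba or starfusion"),
  make_option("--outputfile", type = "character", help = "path for the standardized output")
)
opt <- parse_args(OptionParser(option_list = option_list))

raw_calls <- read.delim(opt$fusionfile, stringsAsFactors = FALSE)

# All caller-specific logic is confined to this switch; everything
# downstream of standardization sees a single format.
standard_calls <- switch(tolower(opt$caller),
  arriba = data.frame(FusionName = paste0(raw_calls$gene1, "--", raw_calls$gene2)),
  starfusion = data.frame(FusionName = raw_calls$X.FusionName),
  stop("Unsupported caller: ", opt$caller)
)

saveRDS(standard_calls, opt$outputfile)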
good idea, @jaclyn-taroni!
I agree, that's a great idea. While creating the package I did make a note to keep the first function to only standardizing the 2 callers that we have, with just if statements to run according to the caller type, but I had also included the artifact filtering in there, which as you mentioned would need to be in the filtering step. I'll rewrite the first code for the fusion filtering as you mentioned, keeping the read-through and read-based filtering in the filtering steps for the aggregated standardized fusion calls.
@jaclyn-taroni I am also adding some context on the prioritization scheme. We are modeling it after work done with the PPTC paper (Figure 1 here), but have since removed known driver gene lists and instead try to generalize high-confidence fusions using known oncogene/TSG/COSMIC gene partners, which has enabled us to capture them all without using a list of known fusions for the specific histologies in question.
For the expression filtering, there are two things: 1) completely removing fusions where each gene has <1 FPKM, and 2) comparing the expression of each gene partner vs. the median expression of that gene in the histology to determine whether that partner is up- or down-regulated relative to the whole group. The rationale was based on what pedcbio does with oncoprints when displaying up/down-regulated gene expression: they compare within the group of interest. We are going to change this, though, to compare fusion gene partner expression vs. GTEx normal expression (still up for discussion). As you can imagine, there may be a common fusion in a histology whose kinase partner, for example, is upregulated; comparing the sample of interest against the histology may not show the large difference in expression simply because many samples share this high expression, leaving an inability to see an "up"- or "down"-regulated gene.
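To make the two checks concrete, here is a toy sketch with made-up data and column names (the real cutoffs and comparisons are as described above):

library(dplyr)

# Toy stand-in for annotated fusion calls with per-partner expression.
fusions <- data.frame(
  Histology   = c("HGG", "HGG", "LGG"),
  FPKM_Gene1A = c(5.0, 0.2, 8.1),
  FPKM_Gene1B = c(2.3, 4.0, 1.5)
)

# 1) Drop fusions whose partners fall below the 1 FPKM cutoff
#    (shown here as: keep only fusions where both partners are expressed).
expressed <- fusions %>%
  filter(FPKM_Gene1A >= 1, FPKM_Gene1B >= 1)

# 2) Flag each partner as up/down relative to the histology-wide median.
expressed <- expressed %>%
  group_by(Histology) %>%
  mutate(gene1A_status = ifelse(FPKM_Gene1A > median(FPKM_Gene1A), "up", "down")) %>%
  ungroup()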
Thanks for the link to the presentation @jharenza, that's helpful. Is it fair to say that 1) is a general step, not specific to any project, and 2) is a more project-specific step?
Yes, that's what I would think too.
@kgaonkar6 I'm sorry I forgot to ask this as part of my general questions about the pipeline -- what is the difference between the annotation steps in the 01-fusion-filtering.R and 04-annotate-filtered-fusion.R scripts?
Currently, 01-fusion-filtering.R performs the genelist-based filtering to capture all fusions in genes that are tumor suppressors/oncogenic/COSMIC/TCGA etc. and saves them as putative_driver_fusion.rds; additionally, it identifies recurrent fusions using the n callers, n samples per group, and n times a fused gene is recurrently observed in a group, and saves them as filtered_fusion.rds. 04-annotate-filtered-fusion.R only annotates the final fusions that passed the expression filtering with the genelists (TSGs, oncogenes, TCGA fusions, TF, kinase) to save as final annotated files. Additionally, I also had quite a few substitutions in the script to convert old/synonym gene names to HUGO-approved gene names.
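As a toy illustration of the genelist annotation step (the lists and column names below are made-up stand-ins, not the curated lists the pipeline reads):

oncogenes <- c("BRAF", "MYCN")  # stand-in for the real Oncogene list
tsgs      <- c("TP53", "NF1")   # stand-in for the real TSG list

fusions <- data.frame(Gene1A = c("BRAF", "TP53", "ABC1"))
fusions$Gene1A_anno <- ifelse(fusions$Gene1A %in% oncogenes, "Oncogene",
                       ifelse(fusions$Gene1A %in% tsgs, "TumorSuppressor", NA))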
@kgaonkar6 @jharenza thank you for answering those questions - it's been very helpful for me. I sketched out my thoughts on the structure and order of the pipeline if the goal is to reuse it broadly: I would make every step except the project-specific step in the red box as general as possible, where each of the inputs is standardized in some way and the scripts accept any cutoffs as command line arguments or options like we've discussed. So, you might prefer to accept the expression estimates in a different format (e.g., not RDS, use certain gene identifiers, etc.) than what is available as part of the OpenPBTA data and then, as part of this analysis module, perform some data cleaning step(s) to get it into the correct format for the general pipeline. (This has the added benefit of being a public example of how to get things in the correct format for your pipeline.) There also may be some things I've missed that are prohibitive. I wanted to put this out there to get your thoughts and see if this is helpful -- it will help guide me in the rest of the reviews. You have already tackled some of this with the addition of
thanks @jaclyn-taroni! This looks good, but I think the red box is not entirely project-specific - there is filtering for artifacts in addition to filtering based on the project. Maybe we should have a phone call about it, since there are so many steps, so we can hash out what can go in what order? We did a lot of editing along the way, mainly because when looking at many iterations of results we saw X artifact, or Y duplicate, or Z was missing. So I do worry a bit about changing the order of everything, but agree that we should standardize as much as possible so that if people have only 1 histology, for example, they can run it vs. if they have many, or if they just want to run the filtering+annotation, they can also do so. So maybe we split filtering into artifact removal+filtering, move annotation before filtering, and then add plotting?
Actually, @kgaonkar6 - why don't we make a list of every step of the pipeline post-standardizing caller output and mark each as general or project-specific?
Sounds good, that's a great idea @jharenza! I was just writing back to say that I believe we are in agreement that the filtering needs to be split up; it is probably a matter of how. We are definitely on the same page on this point:
Great idea! I've drafted a filtering table here: https://docs.google.com/spreadsheets/d/1fKvDkkGddyeQ-_BHlE06XfMz82SC_nAPo7zwximxJRo/edit#gid=0. The filtering criteria have values general or project-specific from what we have discussed above. I also added the Output-type column to show the resulting type of dataset and some more description. @jaclyn-taroni thanks for pointing out that 01-fusion-filtering.R was still in the PR. I've removed that now.
@kgaonkar6 thanks for putting that information together! Can you change the access settings for the sheet please?
Let me know if you have access now, thanks!
Thanks @kgaonkar6! I added some notes with additional details as well, so @jaclyn-taroni, please let us know if you have any questions about the steps!
Thank you @kgaonkar6. In my proposed structure, I noted two filtering steps - one that is general and one that is project-specific. Wondering if the following is possible -- could you put all the steps noted as general in the spreadsheet in a single script:
And these steps, marked as project-specific, in a different script/set of scripts/notebook(s) that takes the output from the above general script:
Above, @jharenza proposed:
I believe splitting things as outlined above meets part of this. What I'm not clear on is the rationale for moving annotation before filtering. If I understand the current pipeline in the d3b-center/fusion_filtering_pipeline repository and @kgaonkar6's comment here #92 (comment), the last two steps are annotation and plotting. It's possible that the thinking on this has changed since putting together the table, but I wanted to check in to make sure I understand.
Thank you @jaclyn-taroni! This makes sense. I'll gather the filtering steps and write a script that only does general filtering, using the standardized output from the 01-fusion-standardization.R script.
Hi @kgaonkar6, I'm glad we were able to get some more details about the structure of the pipeline hashed out! Thanks for your patience and willingness to walk me through things, I really appreciate it.
The documentation for 01-fusion-standardization.R looks pretty good. I've left a few questions and there are a few changes I would like to see made before merging:
- Making sure outputfile is included in the documentation
- Moving the column selection/filtering for standard_calls outside the logic and steps that are specific to a particular caller
- Some error-handling if someone passes a caller that isn't (currently) supported
  - Optionally, you might also consider forcing all the characters in the caller argument to uppercase or lowercase so STARFusion vs. STArfusion vs. STARfusion doesn't cause any headaches.
- Removing any steps you don't need (e.g., steps around breakpoints)
- A comment explaining the JunctionReadCount standardization for Arriba
I've also included some comments that are more for our discussion/information only, such as how one could do something using dplyr. Those are not things that need to get changed unless you'd prefer to change them.
Please let me know if anything needs clarification. Thank you!
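For the error-handling and case-folding items, something along these lines would work (a sketch; the supported-caller vector is my assumption):

supported_callers <- c("arriba", "starfusion")
caller <- tolower(opt$caller)  # fold case so STARFusion/STARfusion/STArfusion all match
if (!(caller %in% supported_callers)) {
  stop("Caller '", opt$caller, "' is not supported; expected one of: ",
       paste(supported_callers, collapse = ", "))
}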
opt <- parse_args(OptionParser(option_list=option_list))

inputfile<-opt$fusionfile
Suggested change:
inputfile <- opt$fusionfile
I find code to be much easier to read and understand if there are spaces around assignment operators and after commas.
Ah yup, makes sense - I'll keep that in mind for assignments. Maybe this is a dumb question, but does it matter in the review process if I commit the suggestion here, or should it be OK to fix it with all the other changes in the script?
Not a dumb question at all. Took me a while to figure it out. Overall the answer is no, it does not matter. Here's how it does make a difference as far as I can tell:
- If you do this from the user interface, you can add suggestions to a batch and then commit them in a single commit. This would also give me credit for the commit (which is not something I care about when using this feature, I just find it convenient). The other thing that is sometimes annoying is that if you have larger changes to make after committing the suggestions, you'll have to pull those before making changes locally.
- I've found that the downside of doing everything, including the suggestions, locally is that sometimes I miss things, but some folks like using the resolving-conversations feature to help with this.
Oh yes, all credit for the edits goes to you and @jharenza :) Thanks! Since I had to make a bunch of edits I think I'll use the resolve button to keep track of editing; hopefully I didn't miss any in this commit.
These changes look good! The comments are really nice. This is very close, two things I'd like to see tweaked before merging:
- Using either saveRDS or write.table to save the standardized output, but not both. I'd also include the expected file extension/file type in the documentation around outputfile.
- There's something off about the indentation throughout. In my personal experience, issues with indentation can sometimes happen if I switch between different text editors during my editing process. I've put a couple specific comments about how I'd expect some of the lines to look. If you are editing in RStudio you can select all and then use Ctrl + I (or Cmd + I on a Mac) to automatically fix the indentation for you. One of my favorite RStudio features :)
I've added some dplyr alternatives as comments as well. You don't have to make these changes, they are for your information if you are interested in adopting more dplyr into your workflows.
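As an example of the kind of dplyr alternative being suggested (toy data and column names, not the script's):

library(dplyr)

calls <- data.frame(
  Sample            = c("s1", "s2", "s3"),
  FusionName        = c("A--B", "C--D", "E--F"),
  JunctionReadCount = c(10, 3, 7)
)

# A readable pipeline-style filter-and-sort instead of nested base R calls.
calls %>%
  filter(JunctionReadCount >= 5) %>%
  arrange(desc(JunctionReadCount))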
You caught me on changing editors in between edits :D I was looking into the Atom editor, which I've been told has great features and packages for writing beautiful code. I'm just beginning to learn how to do things in Atom, so I'll stick to RStudio and use the Cmd + I feature. I'll also implement only 1 outputfile and the dplyr alternatives as suggested to make the code more readable. Thanks!
I was wondering if there should be a step after standardization and before filtering to merge the standardized fusion calls from STARFusion and Arriba? Or should the merge be included in 01-fusion-standardization?
Do you exclusively use merged files downstream of this step @kgaonkar6?
Yes, all the filtering is done on merged files.
I would say let's leave this as is and then add a merge step before filtering. We can always revisit, but I think it would be helpful for me to have eyes on the code that is exclusively for the merge step before commenting further.
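A minimal sketch of that merge step, assuming the two standardized outputs share one column layout and live at the scratch paths listed in the PR description (the merged output filename is a placeholder):

sf_standard     <- read.delim(file.path("..", "..", "scratch", "star.tsv"),
                              stringsAsFactors = FALSE)
arriba_standard <- read.delim(file.path("..", "..", "scratch", "arriba.tsv"),
                              stringsAsFactors = FALSE)

# Stack the two callers into one table for the downstream filtering steps.
all_callers <- rbind(sf_standard, arriba_standard)
saveRDS(all_callers, file.path("..", "..", "scratch", "standard-fusion-calls.rds"))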
Looks good to me 👍 I will fix the conflicts and then merge. Great work on these changes @kgaonkar6! Looking forward to seeing the next steps.
Thank you so much! This has been so helpful to me, especially to understand how to write good code :)
Purpose/implementation
_What scientific question is your analysis addressing? Run to standardize fusion calls from both the STARFusion and Arriba callers and filter for biologically relevant genes and recurrent status.
What was your approach?
Standardize caller format
- Added columns for Arriba:
  - SpaninngReadCount in arriba <- split_read1 + split_read2
  - DiscordantFragmentCount in arriba <- discordant_mates
  - Also added annotation from FusionAnnotator [column 27]
  - TODO: change input column name
- Added columns for STARFusion:
  - Confidence is set to NA
Expected outputs: RDS file per caller, standardized for the pipeline:
- ../../scratch/star.tsv
- ../../scratch/arriba.tsv
If this is not adding an analysis, describe your changes in this section._
Issue
What GitHub issue does your pull request address?
Planned Analysis: Filter and Annotate Fusions #39
Directions for reviewers
_Tell potential reviewers what kind of feedback you are soliciting.
Are there particular areas that need a closer look? Code optimization, filtering strategy
Is there something you want to discuss further?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?_
Results
If your pull request includes code that produces scientific results, please summarize the results here.
This can help facilitate discussion around interpretation.
Please state what kinds of results are included (e.g., table, figure).
Docker and continuous integration
Check all those that apply or remove this section if it is not applicable.
#Resolved in previous commit
@jaclyn-taroni Hi! I checked that the packages are available in the RStudio instance running the Docker image on http://localhost:8787/, but Sys.readlink2 didn't work in the CI checks for the script. I updated to file.path, which I saw is used in other pull requests, but that also failed CI. Both options run OK in the script on my laptop. I don't know how to get it to pass CI correctly.
Parts to this analysis