Addition of `01-transcriptomic-analysis-prep.Rmd` #83

cbethell · 2019-08-27T15:02:40Z

Purpose/implementation

The purpose of this draft PR is to conduct an unsupervised transcriptomic analysis using three dimensionality reduction techniques. The techniques used include PCA, t-SNE, and UMAP.

You can find the html output of this notebook here.

Issue

This draft PR addresses issue #9.

Directions for reviewers

Given the first set of visualizations produced, it was suggested that I backtrack using the Kids_First_Participant_ID and experimental_strategy information to find the sequencing strategies used for each observation. The unique strategies were found to be WGS, WXS, Panel, and RNA-Seq (which is renamed and treated as Unknown).

My attempt to correct for batch effects did not produce significant results according to the visualizations. Should I remove this attempt and open a separate issue for this purpose?
I have made this a draft PR because I believe the code can be further simplified. Please feel free to provide suggestions you believe may be useful on this matter.
Are there any additional steps I should add to this analysis?

Results

Visualizations of the reduction scores for each of the dimension reduction techniques executed in this notebook are available on the html output and in the directory analyses/unsupervised-trancriptomic-analysis/plots
The .PDF files in the plots directory include visualizations before and after the attempt to correct for batch effects. Those beginning with meta_grid represent the data before batch correction attempt, while those beginning with no_batch_grid represent the data after the batch correction attempt.

Docker and continuous integration

Check all those that apply or remove this section if it is not applicable.

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Pull Request Check List:

Run a linter (not done as yet)
Set the seed (if applicable)
Comments and/or documentation up to date
Double checked paths
Spell check any Rmd file or md file
Restart R and run all notebooks fresh and save
Connect the pertinent issues on ZenHub
Updated Dockerfile and built Docker image successfully

jaclyn-taroni · 2019-08-27T15:20:16Z

My attempt to correct for batch effects did not produce significant results according to the visualizations. Should I remove this attempt and open a separate issue for this purpose?

Yes, I would remove any attempts at batch correction from this notebook. The notebook is 600 lines long in its current state which will make review to find ways to simplify the code more difficult.

jharenza · 2019-08-27T15:35:04Z

Hi @cbethell! I think this is a great start and one thing we have been struggling with is that some of the RNA-Seq was a poly-A prep while the majority was stranded. I am not sure anyone has yet to figure out how to normalize across these two batches (eg I think UCSC Treehouse folks are also very interested in this). In some of our discussions, it has been suggested to subset the dataset for only those genes which will be captured in a polyA prep, and see if that helps, but you will lose a lot of data in that way.

jaclyn-taroni · 2019-08-27T15:56:10Z

One thing we have been struggling with is that some of the RNA-Seq was a poly-A prep while the majority was stranded.

I don't believe this is currently noted in the histologies file, is that correct @jharenza? We came up with an idea, reflected in this PR, that the other experimental strategies (e.g. WXS, WGS) linked to a participant ID would tell us about RNA-seq prep based on this section of the manuscript: https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md#data-generation This may be error-prone.

Can we add the prep information to the histologies file since we expect it will be important for folks analyzing the gene expression data?

cbethell · 2019-08-27T15:58:02Z

Hi @cbethell! I think this is a great start and one thing we have been struggling with is that some of the RNA-Seq was a poly-A prep while the majority was stranded. I am not sure anyone has yet to figure out how to normalize across these two batches (eg I think UCSC Treehouse folks are also very interested in this). In some of our discussions, it has been suggested to subset the dataset for only those genes which will be captured in a polyA prep, and see if that helps, but you will lose a lot of data in that way.

Hi @jharenza, this is good to know! I do hope we can find a more conservative method of solving this issue.

jharenza · 2019-08-27T16:02:10Z

@jaclyn-taroni agreed - I will add this in a V3 of the file, along with survival and sex prediction

jaclyn-taroni · 2019-08-27T16:02:39Z

Nice, thank you!

cgreene · 2019-08-27T16:46:37Z

ribo-deplete vs poly-a normalization came up on twitter this morning too: https://twitter.com/GreeneScientist/status/1166297770833121280?s=20

jharenza · 2019-08-27T18:56:05Z

@jaclyn-taroni I created a V3 clinical file to include predicted sex, overall survival, and RNA library type, which you can access immediately here if it will help with this analysis - we will add this to a V3 release, which may come sooner than extra variant calls cc: @yuankunzhu

- removed the attempt at batch correction from `01-transcriptomic-analysis.Rmd` and its html output - removed the installation of `limma` package from `Dockerfile` as the need for it was removed

jaclyn-taroni

Nice start @cbethell! Per your directions for reviewers, I've included comments about how I think things could be simplified/DRY'd up. I've also sketched out how you might change up the arrangement of the plots a bit, which you don't need to accomplish as part of this pull request but may find it fun and interesting to mess around with at this point.

jaclyn-taroni · 2019-08-28T10:46:06Z

.circleci/config.yml

+
+      - run:
+          name: Unsupervised Transcriptomic Analysis
+          command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd', clean = TRUE)"


Can you give this notebook, and possibly the analysis folder, a more descriptive name please? Consider that unsupervised analysis is quite broad and you may want to call this transcriptomic-dimension-reduction. I would say if you expect to try unsupervised methods in this module (e.g., hierarchical clustering) then the analysis folder name itself is probably a-okay.

jaclyn-taroni · 2019-08-28T10:49:29Z

analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd

+# magrittr pipe
+`%>%` <- dplyr::`%>%`
+
+reduction_fn <- function(name, id) {


Given #83 (comment), I would take this out for now. I'll also note that reduction_fn is not particularly descriptive and without reading the code, I expected for this to run the three dimension reduction techniques. In general, I like for functions to begin with a verb (plot_tsne_scatterplot) to be consistent with the style guide used at the CCDL.

jaclyn-taroni · 2019-08-28T10:52:16Z

analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd

+# Detect the ".git" folder -- this will in the project root directory.
+# Use this as the root directory to ensure proper execution, no matter where
+# it is called from.
+root_dir <- rprojroot::find_root(rprojroot::has_dir(".git"))


Because this is an R notebook and not a script, the working directory will be wherever the file is located. So this becomes less important and you could safely use the symlinked files with ../data. You don't have to change this, more of an FYI.

I actually made this change in the updated version I have yet to commit.

I didn't note it, but that means you would change the output directory stuff. You may have already caught this, but it's one of those things that I sometimes have trouble spotting during code review (e.g., the paths for plots would not be super obvious to me in the GH interface) and I don't think it'd be caught via CI.

gotcha 👍

jaclyn-taroni · 2019-08-28T10:52:54Z

analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd

+}
+```
+
+Read the metadata into a data.frame named `df2`.


Why df2 and not something more like metadata_df or histologies_df?

Good question, I'll make the appropriate change.

jaclyn-taroni · 2019-08-28T10:54:27Z

analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd

+respectively.
+```{r}
+# Read in dataset
+df2 <- data.frame(readr::read_tsv(file.path(root_dir, "data", "pbta-histologies.tsv")))


Personal preference here -- I would set data_dir <- file.path(root_dir, "data") above.

jaclyn-taroni · 2019-08-28T11:05:16Z

analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd

+rsem_pca_plot <- ggplot2::ggplot(
+  rsem_pca_data,
+  ggplot2::aes(
+    x = rsem_pca_data[, 1],


This process, where you specify a data.frame that contains the score information and use the first two columns as the x and y variables for your scatterplot, happens quite a bit. It seems that so long as you set up the data.frame correctly and in the same manner, you could supply that data.frame to a function that does your plotting.

Also, you should consider making the font size a bit bigger with

ggplot2::theme(text = ggplot2::element_text(size = 15))

jaclyn-taroni · 2019-08-28T11:24:44Z

analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd

+  ggplot2::aes(
+    x = rsem_pca_data[, 1],
+    y = rsem_pca_data[, 2],
+    color = method


Because this color argument the only thing that changes between the plot above and what you have here, you can add this as one of the arguments to your function (mentioned above). This is probably going to be tricky because the other arguments for x and y won't be strings so it would not be as easy as using aes_string() instead of aes(). Maybe it would look like, assuming the argument to the function you write is point_color = "method"...

Step 1 Add this to the beginning of the plotting function:

color_sym <- rlang::sym(point_color)

Step 2 the aes call would look something like:

ggplot2::aes( x = score_df[, 1], y = score_df[, 2], color = !!color_sym )

Untested but here's my reference: https://community.rstudio.com/t/how-to-turn-strings-from-function-arguments-into-column-names-with-dplyr/19438

jaclyn-taroni · 2019-08-28T11:27:40Z

analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd

+# Make a data.frame with PCA scores
+rsem_pca_data <- data.frame(rsem_pca$x[, 1:2])
+# Run the reduction_fn which aligns metadata and prepares data.frame for ggplot
+rsem_pca_data <- reduction_fn(rsem_pca_data, rsem_ID)


If you remove this function as suggested, you can probably replace it with joining whatever 'scores' data.frame you have with the histologies data.frame which contain what you need to color the points. I think this would be the procedure in every case (i.e., RSEM PCA, kallisto UMAP).

jaclyn-taroni · 2019-08-28T11:30:31Z

analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd

+It addresses [issue #9](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/9)
+in the Open-PBTA analysis repository. 
+
+## Output files:


I assume some of these came out with 0d74cc5?

That is correct, I will make the appropriate changes here.

jaclyn-taroni · 2019-08-28T11:54:19Z

analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd

+(poly-A versus stranded).
+
+```{r, fig.width = 17, fig.height = 14}
+# Plot grid with RSEM data colored by method vs tumor type


I have an idea about these figures could be laid out so you would be able to letter the panels:

I would use cowplot for this, which very well may be on the Docker image already as a dependency with the addition of colorblindr here, and I think the way to build this with cowplot is to follow the three steps I laid out here. I haven't worked with cowplot v1.0.0 yet, though, IIRC. Because of the MRAN snapshot constraint noted here in the project README, double check what version, if any, is installed on the image.

You might find the arranging plots in a grid and shared legends section of the cowplot references helpful.

This is not a must have, but it is an opportunity to play around with this kind of thing which will almost certainly pop up again in your work and maybe even in this project!

I like this idea!

jaclyn-taroni · 2019-08-28T12:15:54Z

Oh, forgot to note, you may want to try filtering out low count genes or genes missing from a large proportion of samples before dimension reduction and plotting.

jaclyn-taroni · 2019-08-28T12:29:33Z

I also to forgot to mention that you’ll need to set a seed to make this reproducible.

cbethell · 2019-08-28T13:17:40Z

I created a V3 clinical file to include predicted sex, overall survival, and RNA library type, which you can access immediately here if it will help with this analysis

The link in the comment above requires that I request permission to access. @jharenza would you be able to change the permissions to this clinical file?

jharenza · 2019-08-29T16:05:55Z

I created a V3 clinical file to include predicted sex, overall survival, and RNA library type, which you can access immediately here if it will help with this analysis

The link in the comment above requires that I request permission to access. @jharenza would you be able to change the permissions to this clinical file?

Changed permissions, and we just updated that file in the V3 release: #88 !

cbethell · 2019-08-29T16:07:21Z

Changed permissions, and we just updated that file in the V3 release: #88 !

Great, thank you!

- changed analysis directory name and name of `.circleci` run to “transcriptomic-dimension-reduction” - changed the data_dir file path - renamed `df2` `metadata_df` - created a function to specify the scores data.frame and use the first 2 columns as x and y for a ggplot - removed output files that are no longer produced from documentation - `cowplot` is used to arrange plots in a grid (I am still playing around with this to achieve the best possible layout for the plots and legend) - filtered out the low count genes (I am also still looking into whether or not the method used is the best) - set a seed - replaced `reduction_fn` with `align_metadata` (this will also be changed once

jaclyn-taroni

Nice job on 'functionalizing' the plotting! Having looked at the entire thing again with your changes, I have some thoughts on how I would organize this and what functions I would write. These may conflict slightly with the inline comments because I made those before I put it all together.

You essentially have 3 steps that you repeat for gene expression matrix (e.g., kallisto, RSEM) method (e.g., PCA, t-SNE) pairs: the dimension reduction itself that yields a data.frame that contains scores you will plot, joining the scores with the metadata, plotting the first two dimensions and the variable you use to color the points changes

So I propose that you have functions for each of those three steps that are then wrapped up in a function that puts it all together:

plot_dimension_reduction: make suggested tweaks to font size and add arguments for the x-axis and y-axis labels and title

perform_dimension_reduction

Arguments:
- transposed expression matrix
- method (e.g., PCA, tSNE, UMAP)
- seed with some default value
Returns:
- scores_df with a column that includes sample identifiers
Steps:
- sets seed using seed argument
- uses logic to get the correct scores_df based on the methods argument

align_metadata

Arguments:
- metadata_df
- scores_df
Returns:
- joined data.frame with both
Steps:
- inner_join

dimension_reduction_wrapper

Brings all of this together ☝️ , so we need to account for the individual arguments from these functions

Arguments:
- transposed expression matrix
- dimension reduction method
- seed with some default value
- metadata_df
Returns:
- list that contains n number of plots (however many different ways you end up coloring the scatter plot points
Steps:
- perform_dimension_reduction
- align_metadata using the scores_df from perform_dimension_reduction
- plot_dimension_reduction using align_metadata output n times, changing the point_color argument each time - you should be able to tell from the dimension reduction method argument how to label the x-axis, y-axis and title (for PCA I would expect those to be PC1, PC2, PCA, respectively)

What I'm proposing has a weakness in that it doesn't save any of the models (e.g., prcomp output) to file which may be useful. You could put that in perform_dimension_reduction and make that take a file name as an argument -> account for that in dimension_reduction_wrapper. There may also be a better way of breaking out the plotting from the wrapper function, but let's try starting with this.

jaclyn-taroni · 2019-08-30T11:19:47Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+  # Align the metadata with the `Biospecimen_ID` variable in the data.frame
+  metadata <- metadata_df %>%
+    dplyr::filter(Kids_First_Biospecimen_ID %in% name$Biospecimen_ID) %>%
+    dplyr::filter(!duplicated(Kids_First_Biospecimen_ID))


How often does this duplication happen?

Just once instance of this duplication in the RSEM and Kallisto data

Ah, believe that's tracked here #95

jaclyn-taroni · 2019-08-30T11:20:43Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+  # metadata's `Kids_First_Biospecimen_ID`
+  #
+  # Args:
+  #   name: Name of the data.frame containing dimension reduction scores


scores_df would be a much more descriptive variable name. I could not intuit what name was without reading your comments.

jaclyn-taroni · 2019-08-30T11:22:16Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+  #
+  # Args:
+  #   name: Name of the data.frame containing dimension reduction scores
+  #   id: Vector containing Biospecimen IDs extracted from the transposed


This one is also difficult to intuit.

jaclyn-taroni · 2019-08-30T11:27:23Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+    dplyr::filter(!duplicated(Kids_First_Biospecimen_ID))
+  # Filter the data.frame to contain only the `Biospecimen_ID` values common
+  # to the metadata
+  name <- name %>%


I believe you should be able to do a dplyr::inner_join somewhere in this function instead. It will do the filtering for you.

jaclyn-taroni · 2019-08-30T11:28:44Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+  name <- name %>%
+    dplyr::filter(Biospecimen_ID %in% metadata$Kids_First_Biospecimen_ID)
+  # Assign Kids_First_Participant_ID to column `type` of the data.frame
+  name$Kids_First_Participant_ID <-


Why does this need to be a factor?

It does not need to be, good point

jaclyn-taroni · 2019-08-30T11:46:57Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+exp_kallisto <- exp_kallisto[, -1] %>%
+  dplyr::filter(!duplicated(gene_id)) %>%
+  tibble::column_to_rownames("gene_id") %>%
+  na.omit()


#83 (comment)

Ah, yes it is, I'll use that function in both cases.

You don't have to, I just was making sure that I understood the use of na.omit() here.

Understood. However, I do prefer the use of complete.cases(.) here.

jaclyn-taroni · 2019-08-30T11:49:58Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+  tibble::column_to_rownames("gene_id") %>%
+  na.omit()
+
+# Filter out low count genes


This is a constant that would need to be changed in multiple places, so you should assign it to something like

gene_count_threshold <- 100

and then

kallisto_genes_to_keep <- rowSums(exp_kallisto) >= gene_count_threshold

jaclyn-taroni · 2019-08-30T11:57:34Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+                                                color = !!color_sym
+                                              )) +
+    ggplot2::geom_point() +
+    ggplot2::theme(legend.position = "none")


Can you make the font size bigger please? #83 (comment)

jaclyn-taroni · 2019-08-30T12:03:14Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+
+```{r}
+# Run PCA on kallisto
+kallisto_pca <- prcomp(transposed_kallisto_data)


You're repeating a lot of the same steps. Consider a function where one of the arguments is which dimension reduction method you want used and then you use logic to do the first two steps which are run the dimension reduction and set up the scores data.frame (also takes a transposed expression matrix as an argument)

if (method == "PCA" ){ prcomp_results <- prcomp(transposed_expression) scores_df <- data.frame(prcomp_results$x[, 1:2]) } else if { ...

And then the other steps that you repeat after setting up a score data.frame, which are the same, are 1) joining with metadata 2) plot with disease_type_new 3) plot with experimental_strategy

I would also have this function take seed as an argument with some default (could be 2019) and use set.seed(seed) within the function. You could then potentially use the function to do something like run t-SNE multiple times with the same seed to see how the plot changes.

Gotcha, made this change.

jaclyn-taroni · 2019-08-30T12:24:24Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis.Rmd

+# magrittr pipe
+`%>%` <- dplyr::`%>%`
+
+align_metadata <- function(name, id) {


Here's what I think this function is trying to accomplish:

Joining the relevant columns from the metadata to a data.frame that contains the scores

Changing certain columns to be factors

I don't think you need to only select the relevant columns -- you might decide that you want to color points using other columns in the future.

I think this should take the scores data.frame and the metadata data.frame as arguments, possibly with an additional argument that tells you what column the relevant sample identifiers are in. (Don't worry about that last part too much, let's try changing this incrementally.)

So I think you essentially want to do:

scores_meta_df <- scores_df %>% # using the by argument this way allows us to join two variables with different names together dplyr::inner_join(metadata_df, by = c("<sample identifier column name in scores_df>" = "<sample identifier column name in metadata_df>"))

Where you make sure that scores_df is set up to have sample identifiers in it as a column before you pass it to this function.

The sample identifiers I'm assuming would be biospecimen ids. The use of inner_join means it would drop out anything that didn't have a sample identifier that matched in both data.frames you are joining, i.e., all of the WXS, WGS biospecimen ids + their metadata should get dropped. (You won't need this information anymore because the selection strategy should be in the updated v3 files.)

This joined metadata, scores_meta_df, is what you want for plotting. I believe that passing a column that is a character and not a factor (provided you don't need to do any reordering of levels) shouldn't give you any trouble with your scatter plots.

- split the notebook into 2 separate notebooks (one for data prep and the other for plotting) - made the parallel changes to `.circleci/config.yml` - used `dply::inner_join` to merge the dimension reduction scores with the metadata - added tsv files of the dimension reduction score data.frames - Created the suggested functions, including a wrapper function to execute all of the functions - saved aligned dimension reduction score data.frames as `RDS` files to be read into the second notebook

jaclyn-taroni · 2019-09-10T21:21:04Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis-prep.Rmd

+- `analysis/transcriptomic-dimension-reduction/results/kallisto_tsne_model.RDS`
+- `analysis/transcriptomic-dimension-reduction/results/kallisto_umap_model.RDS`
+
+_Note that the tsv files holding the dimension reduction scores represent the scores for the RSEM (denoted by the output tsv files beginning with 15.17) and Kallisto expression data (denoted by the output tsv files beginning with 109.299)._


Why is naming the files this way rather than using rsem or kallisto in the filenames preferable?

The naming is a result of the as.name function. This was meant to be a temporary fix, until I find a better method of using the transposed_expression_matrix argument for naming.

…into transcriptomic-analysis

jaclyn-taroni · 2019-09-10T21:22:29Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis-prep.Rmd

+  umap_results <- umap::umap(transposed_expression_matrix)
+  dimension_reduction_df <- data.frame(umap_results$layout)
+  } else {
+    print ("Error")


You can use stop() here.

jaclyn-taroni · 2019-09-11T13:15:22Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis-prep.Rmd

+  ID <- rownames(transposed_expression_matrix)
+  # Perform dimension reduction 
+  if (method == "PCA" ){
+  prcomp_results <- prcomp(transposed_expression_matrix)


Indentation here and throughout this function.

jaclyn-taroni

Recording the outcomes/discussion from pair programming

jaclyn-taroni · 2019-09-11T13:17:41Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis-prep.Rmd

+  #                              
+
+  # Run `perform_dimension_reduction` function 
+  dimension_reduction_df <- perform_dimension_reduction(transposed_expression_matrix, method, seed = 2019)


Suggested change

dimension_reduction_df <- perform_dimension_reduction(transposed_expression_matrix, method, seed = 2019)

dimension_reduction_df <- perform_dimension_reduction(transposed_expression_matrix, method, seed = seed)

jaclyn-taroni · 2019-09-11T13:20:50Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis-prep.Rmd

+# magrittr pipe
+`%>%` <- dplyr::`%>%`
+
+perform_dimension_reduction <- function(transposed_expression_matrix, method, seed = 2019) {


Accept an argument that is the file name for the model RDS

jaclyn-taroni · 2019-09-11T13:25:00Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis-prep.Rmd

+  # Assign the relevant rownames(Kids_First_Biospecimen_ID) to a column
+  dimension_reduction_df$Kids_First_Biospecimen_ID <- ID
+  # Write the resulting data.frame to a tsv file
+  readr::write_tsv(dimension_reduction_df, 


I would expect the file written by this function would be the entire model as an RDS object. I would also not use the results_dir from the global environment but instead pass the filename (which contains the full path information) to the function.

jaclyn-taroni · 2019-09-11T13:25:45Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis-prep.Rmd

+    dplyr::inner_join(metadata_df, by = c("Kids_First_Biospecimen_ID")) 
+
+  # Write the resulting metadata aligned data.frame to a RDS file
+  readr::write_rds(aligned_scores_df, 


I would expect this to be a TSV file so that folks that are not R users could plot, say, PC1 vs. a variable of their interest.

jaclyn-taroni · 2019-09-11T13:31:04Z

analyses/transcriptomic-dimension-reduction/01-transcriptomic-analysis-prep.Rmd

+                   file.path(results_dir, file = paste(matrix_name, method, "scores.tsv", sep = "_")))
+
+  return(dimension_reduction_df)
+}


New line here

-Renamed the output files to better represent their content - Replaced `print("error")` with `stop` function - Fixed indentation - Included `filename` as an argument in the functions executed in this nb - edited the `.circleci/config.yml` to reflect only the file on this branch

- transformed the .Rmd file to R script in order to pass command line argument to `perform_dimension_reduction` function - made the appropriate change to `.circleci/config.yml` - changed file paths using `rprojroot` function - added `optparse` to Dockerfile (also needed to pass command line argument to function) - renamed the output files to better represent their content

…anscriptomic-analysis

- also fixed a type and some spacing

jaclyn-taroni

👍 excellent job on the progression of this PR @cbethell !

Addition of unsupervised-transcriptomic-analysis

f8d6424

cbethell requested a review from jaclyn-taroni August 27, 2019 15:02

Removed attempt at batch correction

0d74cc5

- removed the attempt at batch correction from `01-transcriptomic-analysis.Rmd` and its html output - removed the installation of `limma` package from `Dockerfile` as the need for it was removed

jaclyn-taroni reviewed Aug 28, 2019

View reviewed changes

Resolve merge conflicts in .circleci/config.yml and Dockerfile

2f873b3

cbethell and others added 2 commits August 29, 2019 12:22

Merge branch 'master' into transcriptomic-analysis

1557541

jaclyn-taroni reviewed Aug 30, 2019

View reviewed changes

cbethell changed the title ~~In Progress: Addition of unsupervised-transcriptomic-analysis~~ In Progress: Addition of 01-transcriptomic-analysis-prep.Rmd Sep 9, 2019

cbethell mentioned this pull request Sep 9, 2019

Addition of 02-transcriptomic-analysis-plotting.Rmd #100

Merged

10 tasks

jaclyn-taroni reviewed Sep 10, 2019

View reviewed changes

Merge branch 'master' of https://github.com/cbethell/OpenPBTA-analysis …

257f219

…into transcriptomic-analysis

jaclyn-taroni reviewed Sep 11, 2019

View reviewed changes

cbethell and others added 10 commits September 11, 2019 16:09

Merge branch 'master' into transcriptomic-analysis

ead625e

Merge remote-tracking branch 'origin/transcriptomic-analysis' into tr…

d1c6bd9

…anscriptomic-analysis

Bump down perplexity for CI

3c42afb

Changed short flag from pp to p

1d97a94

Added neighbors flag for UMAP n_neigbors parameter

d809d2c

updated output_dir to point to results directory

5169762

Updated results directory and removed models directory

9d02c75

- also fixed a type and some spacing

Reorganize, style updates

4663010

Changed perplexity in .circleci and comment line spacing in R script

7c14360

jaclyn-taroni marked this pull request as ready for review September 12, 2019 14:17

jaclyn-taroni changed the title ~~In Progress: Addition of 01-transcriptomic-analysis-prep.Rmd~~ Addition of 01-transcriptomic-analysis-prep.Rmd Sep 12, 2019

cbethell added 2 commits September 12, 2019 11:03

Removal of large dimension reduction model files

ade0c50

Addition of .gitignore to ignore large model output files

645551f

jaclyn-taroni approved these changes Sep 12, 2019

View reviewed changes

jaclyn-taroni merged commit b8bbb04 into AlexsLemonade:master Sep 12, 2019

This was referenced Sep 16, 2019

Proposed Analysis: poly-A vs stranded #109

Closed

Proposed Analysis: poly-A vs stranded/PNOC vs. CBTTC #110

Closed

cbethell deleted the transcriptomic-analysis branch December 18, 2019 19:11

	dimension_reduction_df <- perform_dimension_reduction(transposed_expression_matrix, method, seed = 2019)
	dimension_reduction_df <- perform_dimension_reduction(transposed_expression_matrix, method, seed = seed)

Addition of 01-transcriptomic-analysis-prep.Rmd #83

Addition of 01-transcriptomic-analysis-prep.Rmd #83

Conversation

cbethell commented Aug 27, 2019 • edited Loading

Purpose/implementation

Issue

Directions for reviewers

Results

Docker and continuous integration

Pull Request Check List:

jaclyn-taroni commented Aug 27, 2019

jharenza commented Aug 27, 2019

jaclyn-taroni commented Aug 27, 2019

cbethell commented Aug 27, 2019

jharenza commented Aug 27, 2019

jaclyn-taroni commented Aug 27, 2019

cgreene commented Aug 27, 2019 • edited Loading

jharenza commented Aug 27, 2019

jaclyn-taroni left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jaclyn-taroni commented Aug 28, 2019

jaclyn-taroni commented Aug 28, 2019

cbethell commented Aug 28, 2019

jharenza commented Aug 29, 2019

cbethell commented Aug 29, 2019

jaclyn-taroni left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jaclyn-taroni Sep 11, 2019 • edited Loading

Choose a reason for hiding this comment

jaclyn-taroni left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jaclyn-taroni left a comment

Choose a reason for hiding this comment

Addition of `01-transcriptomic-analysis-prep.Rmd` #83

Addition of `01-transcriptomic-analysis-prep.Rmd` #83

cbethell commented Aug 27, 2019 •

edited

Loading

cgreene commented Aug 27, 2019 •

edited

Loading

jaclyn-taroni left a comment •

edited

Loading

jaclyn-taroni Sep 11, 2019 •

edited

Loading