-
Notifications
You must be signed in to change notification settings - Fork 67
Addition of 01-transcriptomic-analysis-prep.Rmd
#83
Addition of 01-transcriptomic-analysis-prep.Rmd
#83
Conversation
Yes, I would remove any attempts at batch correction from this notebook. The notebook is 600 lines long in its current state which will make review to find ways to simplify the code more difficult. |
Hi @cbethell! I think this is a great start and one thing we have been struggling with is that some of the RNA-Seq was a poly-A prep while the majority was stranded. I am not sure anyone has yet to figure out how to normalize across these two batches (eg I think UCSC Treehouse folks are also very interested in this). In some of our discussions, it has been suggested to subset the dataset for only those genes which will be captured in a polyA prep, and see if that helps, but you will lose a lot of data in that way. |
I don't believe this is currently noted in the histologies file, is that correct @jharenza? We came up with an idea, reflected in this PR, that the other experimental strategies (e.g. WXS, WGS) linked to a participant ID would tell us about RNA-seq prep based on this section of the manuscript: https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md#data-generation This may be error-prone. Can we add the prep information to the histologies file since we expect it will be important for folks analyzing the gene expression data? |
Hi @jharenza, this is good to know! I do hope we can find a more conservative method of solving this issue. |
@jaclyn-taroni agreed - I will add this in a V3 of the file, along with survival and sex prediction |
Nice, thank you! |
ribo-deplete vs poly-a normalization came up on twitter this morning too: https://twitter.com/GreeneScientist/status/1166297770833121280?s=20 |
@jaclyn-taroni I created a V3 clinical file to include predicted sex, overall survival, and RNA library type, which you can access immediately here if it will help with this analysis - we will add this to a V3 release, which may come sooner than extra variant calls cc: @yuankunzhu |
- removed the attempt at batch correction from `01-transcriptomic-analysis.Rmd` and its html output - removed the installation of `limma` package from `Dockerfile` as the need for it was removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice start @cbethell! Per your directions for reviewers, I've included comments about how I think things could be simplified/DRY'd up. I've also sketched out how you might change up the arrangement of the plots a bit, which you don't need to accomplish as part of this pull request but may find it fun and interesting to mess around with at this point.
.circleci/config.yml
Outdated
|
||
- run: | ||
name: Unsupervised Transcriptomic Analysis | ||
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/unsupervised-transcriptomic-analysis/01-transcriptomic-analysis.Rmd', clean = TRUE)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give this notebook, and possibly the analysis folder, a more descriptive name please? Consider that unsupervised analysis is quite broad and you may want to call this transcriptomic-dimension-reduction
. I would say if you expect to try unsupervised methods in this module (e.g., hierarchical clustering) then the analysis folder name itself is probably a-okay.
# magrittr pipe | ||
`%>%` <- dplyr::`%>%` | ||
|
||
reduction_fn <- function(name, id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given #83 (comment), I would take this out for now. I'll also note that reduction_fn
is not particularly descriptive and without reading the code, I expected for this to run the three dimension reduction techniques. In general, I like for functions to begin with a verb (plot_tsne_scatterplot
) to be consistent with the style guide used at the CCDL.
# Detect the ".git" folder -- this will in the project root directory. | ||
# Use this as the root directory to ensure proper execution, no matter where | ||
# it is called from. | ||
root_dir <- rprojroot::find_root(rprojroot::has_dir(".git")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is an R notebook and not a script, the working directory will be wherever the file is located. So this becomes less important and you could safely use the symlinked files with ../data
. You don't have to change this, more of an FYI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually made this change in the updated version I have yet to commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't note it, but that means you would change the output directory stuff. You may have already caught this, but it's one of those things that I sometimes have trouble spotting during code review (e.g., the paths for plots would not be super obvious to me in the GH interface) and I don't think it'd be caught via CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha 👍
} | ||
``` | ||
|
||
Read the metadata into a data.frame named `df2`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why df2
and not something more like metadata_df
or histologies_df
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I'll make the appropriate change.
respectively. | ||
```{r} | ||
# Read in dataset | ||
df2 <- data.frame(readr::read_tsv(file.path(root_dir, "data", "pbta-histologies.tsv"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal preference here -- I would set data_dir <- file.path(root_dir, "data")
above.
rsem_pca_plot <- ggplot2::ggplot( | ||
rsem_pca_data, | ||
ggplot2::aes( | ||
x = rsem_pca_data[, 1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This process, where you specify a data.frame that contains the score information and use the first two columns as the x and y variables for your scatterplot, happens quite a bit. It seems that so long as you set up the data.frame correctly and in the same manner, you could supply that data.frame to a function that does your plotting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you should consider making the font size a bit bigger with
ggplot2::theme(text = ggplot2::element_text(size = 15))
ggplot2::aes( | ||
x = rsem_pca_data[, 1], | ||
y = rsem_pca_data[, 2], | ||
color = method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this color argument the only thing that changes between the plot above and what you have here, you can add this as one of the arguments to your function (mentioned above). This is probably going to be tricky because the other arguments for x
and y
won't be strings so it would not be as easy as using aes_string()
instead of aes()
. Maybe it would look like, assuming the argument to the function you write is point_color = "method"
...
Step 1 Add this to the beginning of the plotting function:
color_sym <- rlang::sym(point_color)
Step 2 the aes
call would look something like:
ggplot2::aes(
x = score_df[, 1],
y = score_df[, 2],
color = !!color_sym
)
Untested but here's my reference: https://community.rstudio.com/t/how-to-turn-strings-from-function-arguments-into-column-names-with-dplyr/19438
# Make a data.frame with PCA scores | ||
rsem_pca_data <- data.frame(rsem_pca$x[, 1:2]) | ||
# Run the reduction_fn which aligns metadata and prepares data.frame for ggplot | ||
rsem_pca_data <- reduction_fn(rsem_pca_data, rsem_ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remove this function as suggested, you can probably replace it with joining whatever 'scores' data.frame you have with the histologies data.frame which contain what you need to color the points. I think this would be the procedure in every case (i.e., RSEM PCA, kallisto UMAP).
It addresses [issue #9](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/9) | ||
in the Open-PBTA analysis repository. | ||
|
||
## Output files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume some of these came out with 0d74cc5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct, I will make the appropriate changes here.
(poly-A versus stranded). | ||
|
||
```{r, fig.width = 17, fig.height = 14} | ||
# Plot grid with RSEM data colored by method vs tumor type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an idea about these figures could be laid out so you would be able to letter the panels:
I would use cowplot
for this, which very well may be on the Docker image already as a dependency with the addition of colorblindr
here, and I think the way to build this with cowplot
is to follow the three steps I laid out here. I haven't worked with cowplot
v1.0.0
yet, though, IIRC. Because of the MRAN snapshot constraint noted here in the project README, double check what version, if any, is installed on the image.
You might find the arranging plots in a grid and shared legends section of the cowplot
references helpful.
This is not a must have, but it is an opportunity to play around with this kind of thing which will almost certainly pop up again in your work and maybe even in this project!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea!
Oh, forgot to note, you may want to try filtering out low count genes or genes missing from a large proportion of samples before dimension reduction and plotting. |
I also to forgot to mention that you’ll need to set a seed to make this reproducible. |
The link in the comment above requires that I request permission to access. @jharenza would you be able to change the permissions to this clinical file? |
Changed permissions, and we just updated that file in the V3 release: #88 ! |
Great, thank you! |
- changed analysis directory name and name of `.circleci` run to “transcriptomic-dimension-reduction” - changed the data_dir file path - renamed `df2` `metadata_df` - created a function to specify the scores data.frame and use the first 2 columns as x and y for a ggplot - removed output files that are no longer produced from documentation - `cowplot` is used to arrange plots in a grid (I am still playing around with this to achieve the best possible layout for the plots and legend) - filtered out the low count genes (I am also still looking into whether or not the method used is the best) - set a seed - replaced `reduction_fn` with `align_metadata` (this will also be changed once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job on 'functionalizing' the plotting! Having looked at the entire thing again with your changes, I have some thoughts on how I would organize this and what functions I would write. These may conflict slightly with the inline comments because I made those before I put it all together.
You essentially have 3 steps that you repeat for gene expression matrix (e.g., kallisto, RSEM) method (e.g., PCA, t-SNE) pairs: the dimension reduction itself that yields a data.frame that contains scores you will plot, joining the scores with the metadata, plotting the first two dimensions and the variable you use to color the points changes
So I propose that you have functions for each of those three steps that are then wrapped up in a function that puts it all together:
plot_dimension_reduction
: make suggested tweaks to font size and add arguments for the x-axis and y-axis labels and title
perform_dimension_reduction
- Arguments:
- transposed expression matrix
- method (e.g., PCA, tSNE, UMAP)
- seed with some default value
- Returns:
- scores_df with a column that includes sample identifiers
- Steps:
- sets seed using seed argument
- uses logic to get the correct scores_df based on the methods argument
align_metadata
- Arguments:
- metadata_df
- scores_df
- Returns:
- joined data.frame with both
- Steps:
- inner_join
dimension_reduction_wrapper
Brings all of this together ☝️ , so we need to account for the individual arguments from these functions
- Arguments:
- transposed expression matrix
- dimension reduction method
- seed with some default value
- metadata_df
- Returns:
- list that contains n number of plots (however many different ways you end up coloring the scatter plot points
- Steps:
perform_dimension_reduction
align_metadata
using the scores_df fromperform_dimension_reduction
plot_dimension_reduction
usingalign_metadata
output n times, changing the point_color argument each time - you should be able to tell from the dimension reduction method argument how to label the x-axis, y-axis and title (for PCA I would expect those to be PC1, PC2, PCA, respectively)
What I'm proposing has a weakness in that it doesn't save any of the models (e.g., prcomp
output) to file which may be useful. You could put that in perform_dimension_reduction
and make that take a file name as an argument -> account for that in dimension_reduction_wrapper
. There may also be a better way of breaking out the plotting from the wrapper function, but let's try starting with this.
# Align the metadata with the `Biospecimen_ID` variable in the data.frame | ||
metadata <- metadata_df %>% | ||
dplyr::filter(Kids_First_Biospecimen_ID %in% name$Biospecimen_ID) %>% | ||
dplyr::filter(!duplicated(Kids_First_Biospecimen_ID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often does this duplication happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just once instance of this duplication in the RSEM
and Kallisto
data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, believe that's tracked here #95
# metadata's `Kids_First_Biospecimen_ID` | ||
# | ||
# Args: | ||
# name: Name of the data.frame containing dimension reduction scores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scores_df
would be a much more descriptive variable name. I could not intuit what name
was without reading your comments.
# | ||
# Args: | ||
# name: Name of the data.frame containing dimension reduction scores | ||
# id: Vector containing Biospecimen IDs extracted from the transposed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is also difficult to intuit.
dplyr::filter(!duplicated(Kids_First_Biospecimen_ID)) | ||
# Filter the data.frame to contain only the `Biospecimen_ID` values common | ||
# to the metadata | ||
name <- name %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you should be able to do a dplyr::inner_join
somewhere in this function instead. It will do the filtering for you.
name <- name %>% | ||
dplyr::filter(Biospecimen_ID %in% metadata$Kids_First_Biospecimen_ID) | ||
# Assign Kids_First_Participant_ID to column `type` of the data.frame | ||
name$Kids_First_Participant_ID <- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be a factor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not need to be, good point
exp_kallisto <- exp_kallisto[, -1] %>% | ||
dplyr::filter(!duplicated(gene_id)) %>% | ||
tibble::column_to_rownames("gene_id") %>% | ||
na.omit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes it is, I'll use that function in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to, I just was making sure that I understood the use of na.omit()
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. However, I do prefer the use of complete.cases(.)
here.
tibble::column_to_rownames("gene_id") %>% | ||
na.omit() | ||
|
||
# Filter out low count genes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a constant that would need to be changed in multiple places, so you should assign it to something like
gene_count_threshold <- 100
and then
kallisto_genes_to_keep <- rowSums(exp_kallisto) >= gene_count_threshold
color = !!color_sym | ||
)) + | ||
ggplot2::geom_point() + | ||
ggplot2::theme(legend.position = "none") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the font size bigger please? #83 (comment)
|
||
```{r} | ||
# Run PCA on kallisto | ||
kallisto_pca <- prcomp(transposed_kallisto_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're repeating a lot of the same steps. Consider a function where one of the arguments is which dimension reduction method you want used and then you use logic to do the first two steps which are run the dimension reduction and set up the scores data.frame (also takes a transposed expression matrix as an argument)
if (method == "PCA" ){
prcomp_results <- prcomp(transposed_expression)
scores_df <- data.frame(prcomp_results$x[, 1:2])
} else if {
...
And then the other steps that you repeat after setting up a score data.frame, which are the same, are 1) joining with metadata 2) plot with disease_type_new
3) plot with experimental_strategy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also have this function take seed
as an argument with some default (could be 2019
) and use set.seed(seed)
within the function. You could then potentially use the function to do something like run t-SNE multiple times with the same seed to see how the plot changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, made this change.
# magrittr pipe | ||
`%>%` <- dplyr::`%>%` | ||
|
||
align_metadata <- function(name, id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's what I think this function is trying to accomplish:
- Joining the relevant columns from the metadata to a data.frame that contains the scores
- Changing certain columns to be factors
I don't think you need to only select the relevant columns -- you might decide that you want to color points using other columns in the future.
I think this should take the scores data.frame and the metadata data.frame as arguments, possibly with an additional argument that tells you what column the relevant sample identifiers are in. (Don't worry about that last part too much, let's try changing this incrementally.)
So I think you essentially want to do:
scores_meta_df <- scores_df %>%
# using the by argument this way allows us to join two variables with different names together
dplyr::inner_join(metadata_df, by = c("<sample identifier column name in scores_df>" = "<sample identifier column name in metadata_df>"))
Where you make sure that scores_df
is set up to have sample identifiers in it as a column before you pass it to this function.
The sample identifiers I'm assuming would be biospecimen ids. The use of inner_join
means it would drop out anything that didn't have a sample identifier that matched in both data.frames you are joining, i.e., all of the WXS, WGS biospecimen ids + their metadata should get dropped. (You won't need this information anymore because the selection strategy should be in the updated v3 files.)
This joined metadata, scores_meta_df
, is what you want for plotting. I believe that passing a column that is a character and not a factor (provided you don't need to do any reordering of levels) shouldn't give you any trouble with your scatter plots.
- split the notebook into 2 separate notebooks (one for data prep and the other for plotting) - made the parallel changes to `.circleci/config.yml` - used `dply::inner_join` to merge the dimension reduction scores with the metadata - added tsv files of the dimension reduction score data.frames - Created the suggested functions, including a wrapper function to execute all of the functions - saved aligned dimension reduction score data.frames as `RDS` files to be read into the second notebook
01-transcriptomic-analysis-prep.Rmd
- `analysis/transcriptomic-dimension-reduction/results/kallisto_tsne_model.RDS` | ||
- `analysis/transcriptomic-dimension-reduction/results/kallisto_umap_model.RDS` | ||
|
||
_Note that the tsv files holding the dimension reduction scores represent the scores for the RSEM (denoted by the output tsv files beginning with 15.17) and Kallisto expression data (denoted by the output tsv files beginning with 109.299)._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is naming the files this way rather than using rsem
or kallisto
in the filenames preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming is a result of the as.name
function. This was meant to be a temporary fix, until I find a better method of using the transposed_expression_matrix
argument for naming.
…into transcriptomic-analysis
umap_results <- umap::umap(transposed_expression_matrix) | ||
dimension_reduction_df <- data.frame(umap_results$layout) | ||
} else { | ||
print ("Error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use stop()
here.
ID <- rownames(transposed_expression_matrix) | ||
# Perform dimension reduction | ||
if (method == "PCA" ){ | ||
prcomp_results <- prcomp(transposed_expression_matrix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation here and throughout this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recording the outcomes/discussion from pair programming
# | ||
|
||
# Run `perform_dimension_reduction` function | ||
dimension_reduction_df <- perform_dimension_reduction(transposed_expression_matrix, method, seed = 2019) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dimension_reduction_df <- perform_dimension_reduction(transposed_expression_matrix, method, seed = 2019) | |
dimension_reduction_df <- perform_dimension_reduction(transposed_expression_matrix, method, seed = seed) |
# magrittr pipe | ||
`%>%` <- dplyr::`%>%` | ||
|
||
perform_dimension_reduction <- function(transposed_expression_matrix, method, seed = 2019) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accept an argument that is the file name for the model RDS
# Assign the relevant rownames(Kids_First_Biospecimen_ID) to a column | ||
dimension_reduction_df$Kids_First_Biospecimen_ID <- ID | ||
# Write the resulting data.frame to a tsv file | ||
readr::write_tsv(dimension_reduction_df, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the file written by this function would be the entire model as an RDS object. I would also not use the results_dir
from the global environment but instead pass the filename (which contains the full path information) to the function.
dplyr::inner_join(metadata_df, by = c("Kids_First_Biospecimen_ID")) | ||
|
||
# Write the resulting metadata aligned data.frame to a RDS file | ||
readr::write_rds(aligned_scores_df, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this to be a TSV file so that folks that are not R users could plot, say, PC1 vs. a variable of their interest.
file.path(results_dir, file = paste(matrix_name, method, "scores.tsv", sep = "_"))) | ||
|
||
return(dimension_reduction_df) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New line here
-Renamed the output files to better represent their content - Replaced `print("error")` with `stop` function - Fixed indentation - Included `filename` as an argument in the functions executed in this nb - edited the `.circleci/config.yml` to reflect only the file on this branch
- transformed the .Rmd file to R script in order to pass command line argument to `perform_dimension_reduction` function - made the appropriate change to `.circleci/config.yml` - changed file paths using `rprojroot` function - added `optparse` to Dockerfile (also needed to pass command line argument to function) - renamed the output files to better represent their content
…anscriptomic-analysis
- also fixed a type and some spacing
01-transcriptomic-analysis-prep.Rmd
01-transcriptomic-analysis-prep.Rmd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 excellent job on the progression of this PR @cbethell !
Purpose/implementation
The purpose of this draft PR is to conduct an unsupervised transcriptomic analysis using three dimensionality reduction techniques. The techniques used include PCA, t-SNE, and UMAP.
You can find the html output of this notebook here.
Issue
This draft PR addresses issue #9.
Directions for reviewers
Given the first set of visualizations produced, it was suggested that I backtrack using the
Kids_First_Participant_ID
andexperimental_strategy
information to find the sequencing strategies used for each observation. The unique strategies were found to beWGS
,WXS
,Panel
, andRNA-Seq
(which is renamed and treated asUnknown
).Results
Visualizations of the reduction scores for each of the dimension reduction techniques executed in this notebook are available on the html output and in the directory
analyses/unsupervised-trancriptomic-analysis/plots
The .PDF files in the
plots
directory include visualizations before and after the attempt to correct for batch effects. Those beginning withmeta_grid
represent the data before batch correction attempt, while those beginning withno_batch_grid
represent the data after the batch correction attempt.Docker and continuous integration
Check all those that apply or remove this section if it is not applicable.
Pull Request Check List: