Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create ComplexHeatmaps for Pilot Data plate 1 #27

Merged
merged 5 commits into from
Jan 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
745 changes: 745 additions & 0 deletions 5_analyze_data/notebooks/Heatmap_analysis/nf1_complexheatmap.ipynb

Large diffs are not rendered by default.

165 changes: 165 additions & 0 deletions 5_analyze_data/notebooks/Heatmap_analysis/nf1_complexheatmap.r
Original file line number Diff line number Diff line change
@@ -0,0 +1,165 @@
suppressPackageStartupMessages(library(ComplexHeatmap))
suppressPackageStartupMessages(library(dplyr))

# Set paths and constants
input_data_dir <- file.path("..", "..", "..", "4_processing_features", "data")
output_figure_dir <- "figures"

cp_heatmap_file_noext <- file.path(output_figure_dir, "cp_complex_heatmap")
dp_heatmap_file_noext <- file.path(output_figure_dir, "dp_complex_heatmap")

# Set heatmap colors
well_cols = c(
"C6" = "#E1DAAE",
"C7" = "#FF934F",
"D6" = "#CC2D35",
"D7" = "#058ED9",
"E6" = "#848FA2",
"E7" = "#2D3142",
"F6" = "#FFC857",
"F7" = "#5f7a12"
)
genotype_cols = c(
"Null" = "#785EF0",
"WT" = "#DC267F"
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you decide on these color specifically? Would it not be easy to use a cmap or preset color palette to use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be an R related thing I don't know about now that I think of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided on these colors for a separate project 😂 (https://github.com/broadinstitute/profiling-resistance-mechanisms/blob/master/3.resistance-signature/scripts/nbconverted/7.profile-heatmaps.r)

I picked them because they are compatible with different kinds of color blindness. I don't remember exactly, but I think I used https://colorbrewer2.org

# Load data
cp_file <- file.path(input_data_dir, "nf1_sc_norm_fs_cellprofiler.csv.gz")

cp_df <- readr::read_csv(
cp_file,
col_types = readr::cols(
.default="d",
Metadata_WellRow="c",
Metadata_WellCol="c",
Metadata_Well="c",
Metadata_gene_name="c",
Metadata_genotype="c"
)
) %>% dplyr::select(-...1) # Drop index col

print(dim(cp_df))
head(cp_df, 3)

# Split metadata and feature data
cp_metadata_df <- cp_df %>% dplyr::select(tidyr::starts_with("Metadata"))
cp_meta_cols <- colnames(cp_metadata_df)
cp_df <- cp_df %>% dplyr::select(-!!cp_meta_cols)

# Calculate correlation matrix from feature data
cp_cor_matrix <- t(cp_df) %>% cor()

print(dim(cp_cor_matrix))
head(cp_cor_matrix, 3)

ht <- Heatmap(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So since you make a correlation matrix in R and then make a heatmap, why is the one that I created in Python so different to this one? I see there is a lot of code to make this heatmap so is R better equipped to make a better correlation heatmap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scale in python is different from the scale in R. ComplexHeatmap auto scales, which is a caveat to interpretation.

It's also possible that the dendrogram ordering is different in python

cp_cor_matrix,
name = "Pearson\nCorrelation",
column_dend_side = "top",

clustering_method_columns = "average",
clustering_method_rows = "average",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this clustering method for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the method for hierarchical clustering (determines how the dendrogram is ordered). See https://jokergoo.github.io/ComplexHeatmap/reference/Heatmap.html


top_annotation = HeatmapAnnotation(
Genotype = cp_metadata_df$Metadata_genotype,
CellCount = anno_barplot(
cp_metadata_df$Metadata_number_of_singlecells,
height = unit(1, "cm")
),
Well = cp_metadata_df$Metadata_Well,

col = list(
Genotype = genotype_cols,
Well = well_cols
)
)
)

draw(ht)

# Save heatmap to file
pdf(paste0(cp_heatmap_file_noext, ".pdf"))
draw(ht)
dev.off()

png(paste0(cp_heatmap_file_noext, ".png"), width = 6.5, height = 6, units = "in", res = 500)
draw(ht)
dev.off()



# Load data
dp_file <- file.path(input_data_dir, "nf1_sc_norm_fs_deepprofiler_nuc.csv.gz")

dp_df <- readr::read_csv(
dp_file,
col_types = readr::cols(
.default="d",
Metadata_Plate="c",
Metadata_Well="c",
Metadata_Site="c",
Metadata_Plate_Map_Name="c",
Metadata_DNA="c",
Metadata_ER="c",
Metadata_Actin="c",
Metadata_Genotype="c",
Metadata_Genotype_Replicate="c",
Metadata_Model="c"
)
)

print(dim(dp_df))
head(dp_df, 3)

# Split metadata and feature data
dp_metadata_df <- dp_df %>% dplyr::select(tidyr::starts_with("Metadata"))
dp_meta_cols <- colnames(dp_metadata_df)
dp_meta_cols <- c(dp_meta_cols, c("Location_Center_X", "Location_Center_Y"))

dp_df <- dp_df %>% dplyr::select(-!!dp_meta_cols)

# Calculate number of single cells per well in DP data
dp_metadata_df <- dp_metadata_df %>%
dplyr::group_by(Metadata_Well) %>%
dplyr::add_tally(name = "Metadata_cell_count")

# Calculate correlation matrix from feature data
dp_cor_matrix <- t(dp_df) %>% cor()

print(dim(dp_cor_matrix))
head(dp_cor_matrix, 3)

ht <- Heatmap(
dp_cor_matrix,
name = "Pearson\nCorrelation",
column_dend_side = "top",

clustering_method_columns = "average",
clustering_method_rows = "average",

top_annotation = HeatmapAnnotation(
Genotype = dp_metadata_df$Metadata_Genotype,
CellCount = anno_barplot(
dp_metadata_df$Metadata_cell_count,
height = unit(1, "cm")
),
Well = dp_metadata_df$Metadata_Well,

col = list(
Genotype = genotype_cols,
Well = well_cols
)
)
)

draw(ht)

# Save heatmap to file
pdf(paste0(dp_heatmap_file_noext, ".pdf"))
draw(ht)
dev.off()

png(paste0(dp_heatmap_file_noext, ".png"), width = 6.5, height = 6, units = "in", res = 500)
draw(ht)
dev.off()
2 changes: 2 additions & 0 deletions 5_analyze_data/visualize_env.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
name: nf1_figures
channels:
- conda-forge
- bioconda
dependencies:
- conda-forge::jupyter
- conda-forge::jupyterlab
Expand All @@ -23,3 +24,4 @@ dependencies:
- conda-forge::r-reshape2
- conda-forge::r-cowplot
- conda-forge::r-patchwork
- bioconda::bioconductor-complexheatmap