no top_genes in outputs #81

Open
DiegoSafian opened this issue Apr 12, 2024 · 4 comments

@DiegoSafian

Hi,

I am running version 4.1 from the command line and I am getting all of the outputs except for top_genes. Do you know if there is something I can do to obtain it?
cnmf.txt

Kind regards,
Diego

@dylkot
Owner

dylkot commented May 7, 2024

Hi Diego, currently top_genes is created on the fly with the cnmf_obj.load_results() function in the Python environment and isn't created in any of the functions run from the command line. I'll consider adding something like the top_genes output to the consensus step in the future.

@dylkot dylkot closed this as completed May 7, 2024
@blain1995

Hi Dylan, thank you for creating such a fantastic tool. If you do have time, adding top_genes output for the command line would be really helpful! Thank you so much!

@dylkot
Owner

dylkot commented Sep 6, 2024

Yes, hopefully I'll get to this soon. In the meantime, the Python code to get this is below. I also asked ChatGPT to convert it to R, and its output follows the Python code:

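import pandas as pd

# spectra_scores_file: path to the cNMF gene spectra score file (set this yourself).
# This assumes genes are rows and GEPs are columns; if your file has GEPs as rows, add .T after read_csv.
spectra_scores = pd.read_csv(spectra_scores_file, sep='\t', index_col=0)
n_top_genes = 50
top_genes = []
for gep in spectra_scores.columns:
    # Gene names with the highest scores for this GEP, in descending order
    top_genes.append(list(spectra_scores.sort_values(by=gep, ascending=False).index[:n_top_genes]))

# One column per GEP, one row per rank
top_genes = pd.DataFrame(top_genes, index=spectra_scores.columns).T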
# R version: load required libraries
library(readr)
library(dplyr)

# Read the spectra scores from the file; the first column holds the gene names
n_top_genes <- 50
spectra_scores <- read_tsv(spectra_scores_file, col_names = TRUE)

# Initialize an empty list to store top genes
top_genes <- list()

# Loop through each GEP column, skipping the first column (gene names)
for (gep in colnames(spectra_scores)[-1]) {
  # Sort by this GEP's scores in descending order and keep the top n gene names
  top_genes[[gep]] <- spectra_scores %>%
    arrange(desc(.data[[gep]])) %>%
    slice_head(n = n_top_genes) %>%
    pull(1)  # Pull the first column (the gene names)
}

# Combine into a data frame with one column per GEP and one row per rank,
# matching the layout of the Python output (no extra transpose needed)
top_genes_df <- as.data.frame(do.call(cbind, top_genes))
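
For anyone who wants to check the logic before pointing it at real cNMF output, here is a minimal sketch of the same ranking wrapped in a function and run on a tiny made-up table. The function name `top_genes_per_gep` and the toy scores are assumptions for illustration only and are not part of cNMF. It assumes genes are the row index and GEPs are the columns.

```python
import pandas as pd


def top_genes_per_gep(spectra_scores: pd.DataFrame, n_top_genes: int = 50) -> pd.DataFrame:
    """Return one column per GEP and one row per rank (best gene first).

    Assumes genes are the row index and GEPs are the columns.
    """
    ranked = {
        # Sort this GEP's scores in descending order and keep the top gene names
        gep: spectra_scores[gep].sort_values(ascending=False).index[:n_top_genes].to_list()
        for gep in spectra_scores.columns
    }
    return pd.DataFrame(ranked)


# Toy scores, made up for illustration only (3 genes x 2 GEPs)
toy_scores = pd.DataFrame(
    {"1": [0.9, 0.1, 0.5], "2": [0.2, 0.8, 0.3]},
    index=["geneA", "geneB", "geneC"],
)

top_genes = top_genes_per_gep(toy_scores, n_top_genes=2)
print(top_genes)
```

With real data you would pass the table read by `pd.read_csv(spectra_scores_file, sep='\t', index_col=0)` instead of `toy_scores`, and could save the result with `top_genes.to_csv(...)` under whatever file name suits you.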

@dylkot dylkot reopened this Sep 6, 2024
@blain1995

Thank you very much for your quick and detailed response!
