Merge multiple KEMET results #11

Closed
mattoslmp opened this issue Sep 17, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@mattoslmp

Dear,
I ran KEMET on several samples. Could you give me some tips on how to merge the resulting tables into one?
Best regards,
Leandro.

@Matteopaluh
Owner

Matteopaluh commented Sep 19, 2022

Dear Leandro,
Thanks for using our tool.
How to summarize KEMET output into a single table depends on how you'd like the results summarized (i.e. the specific format).

I'd personally do it with a combination of bash commands that extract the columns of interest from the .tsv table files.
For example, I quickly tried these commands:

# move to the KEMET report folder
cd KEMET/reports_tsv

# create first column of summary file
echo samples > modules.start

# add module IDs to the summary file
# replace [NAME] with any single .tsv filename
cut -f1 [NAME] >> modules.start

# extract module completeness for each genome as a tmp file
# ${f:10:-4} drops the first 10 characters and the ".tsv" extension of the filename (needs bash >= 4.2)
for f in *.tsv; do echo "${f:10:-4}" > "$f.tmp"; cut -f3 "$f" >> "$f.tmp"; done

# create new folder for the result (no error if it already exists)
mkdir -p summary
# unite module IDs and the result for each genome
paste modules.start *.tmp > summary/summarized_table.tsv

# clean up tmp files
rm *.tmp modules.start
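
One caveat with the commands above: paste lines rows up purely by position, so the summary is only meaningful if every report lists the same modules in the same order. A minimal sketch of a check you could run before pasting is below. The function name check_same_modules is made up for illustration (it is not part of KEMET), and it assumes the module ID sits in column 1, as in the cut -f1 call above.

```shell
# Hypothetical helper (not part of KEMET): succeed only if every report
# has the same first column (module IDs, header included) as the first one.
# Plain POSIX sh; the variables are global because `local` is not POSIX.
check_same_modules() {
    ref="$1"
    shift
    status=0
    ref_ids="$(cut -f1 "$ref")"          # module ID column of the reference report
    for f in "$@"; do
        if [ "$(cut -f1 "$f")" != "$ref_ids" ]; then
            echo "module column differs from $ref: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run from the report folder, it could be used as: check_same_modules *.tsv && echo "safe to paste". The first file matched by the glob acts as the reference.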

Do you have anything specific in mind?

Best,
Matteo

@mattoslmp
Author

mattoslmp commented Sep 19, 2022

Dear Matteo, thank you for your attention and help. Your script worked perfectly; it was exactly what I needed.

I ended up writing a similar parser in R. I'll post it below in case anyone needs a second solution:

rm(list = ls())
library(dplyr)    # full_join(), select(), matches(), %>%
library(purrr)    # reduce()
library(readr)    # read_tsv()
library(stringr)  # str_remove()

# path: directory containing the KEMET results
kemet_dir <- "D:/ITV/KEMET_resultados/reports_tsv_KASS"
setwd(kemet_dir)

# list the report files once; `pattern` is a regular expression, not a glob
myfilenames <- list.files(path = kemet_dir, pattern = "\\.tsv$", full.names = TRUE)

# read every report and join them on the module ID
data_join <- myfilenames %>%
  lapply(read_tsv) %>%
  reduce(full_join, by = "Module_id") %>%
  unique()

modules_id <- data_join$Module_id         # column: Module_id
modules_names <- data_join$Module_name.x  # column: Module_name (as read from the first report)
# keep only the per-sample completeness columns
df <- data_join %>%
  select(matches("Completeness"))

# My KEMET result filenames follow the pattern: reportKMC_Ga0541012_bin.tsv
# sample name = whatever follows "reportKMC_", minus the ".tsv" extension
name_files <- sapply(strsplit(myfilenames, split = "reportKMC_", fixed = TRUE), function(x) x[2])
name_files <- str_remove(name_files, pattern = "\\.tsv$")
df2 <- data.frame(modules_id, modules_names, df)
colnames(df2) <- c("Module_id", "Module_name", name_files)
# write a true tab-separated file, without row names or quoting
write.table(df2, "Res_KEMET.tsv", sep = "\t", row.names = FALSE, quote = FALSE)
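
For anyone without R at hand, the key-based join above (matching rows on Module_id rather than on line position) can be roughly sketched with awk. This is only an illustration under assumptions: merge_by_module is a made-up name, each report is assumed to have one header line, the module ID in column 1 and the completeness in column 3, and modules missing from a report are filled with NA, similar to what full_join does.

```shell
# Hypothetical helper (not part of KEMET): merge the completeness column of
# several reports into one table keyed on the module ID.
merge_by_module() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                        # first line of each file: header row
            nfiles++
            n = FILENAME
            sub(/.*\//, "", n)            # drop the directory part
            sub(/\.tsv$/, "", n)          # drop the extension
            name[nfiles] = n
            next
        }
        {
            # remember each module ID once, in order of first appearance
            if (!($1 in seen)) { seen[$1] = 1; order[++nmod] = $1 }
            value[$1, nfiles] = $3        # completeness of this module in this file
        }
        END {
            header = "Module_id"
            for (j = 1; j <= nfiles; j++) header = header OFS name[j]
            print header
            for (i = 1; i <= nmod; i++) {
                row = order[i]
                for (j = 1; j <= nfiles; j++)
                    row = row OFS (((order[i], j) in value) ? value[order[i], j] : "NA")
                print row
            }
        }
    ' "$@"
}
```

It could be called as: merge_by_module reportKMC_*.tsv > Res_KEMET_awk.tsv (the output filename is just an example). Column headers are the file names without directory and extension, so the reportKMC_ prefix is kept.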

Best regards,
Leandro.

@Matteopaluh Matteopaluh added the enhancement New feature or request label Dec 13, 2022
@Matteopaluh Matteopaluh pinned this issue Dec 25, 2022
@Matteopaluh Matteopaluh changed the title Merge KEMET table Merge multiple KEMET results Dec 25, 2022
2 participants