Thank you for developing SemiBin2, an excellent tool that performs exceptionally well and generates a large number of high-quality MAGs.
We are therefore interested in applying SemiBin2 to our large datasets. Because analyzing large datasets is usually very time-consuming, we hope to streamline the pipeline as much as possible.
Sorting BAM files often consumes a significant amount of computational and storage resources (e.g., in our case the temporary files created during sorting are usually hundreds of GB per BAM). However, SemiBin2 does not seem to support unsorted BAM files as input: the following error occurs when running the "generate_sequence_features_single" module:
Input error: Chromosome k127_4971567 found in non-sequential lines. This suggests that the input file is not sorted correctly.
Are there any alternative tools or ways to generate "data.csv" and "data_split.csv" from unsorted BAM files? Or would it be possible to make simple modifications to the "generate_sequence_features_single" module so that it accepts unsorted BAM files?
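To make the question concrete, here is a minimal sketch of why a per-contig mean depth does not in principle need sorted input: each alignment simply adds its aligned length to a running total for its contig. This is an illustration only, not what SemiBin2 or Bedtools does internally; the function name `mean_depth_unsorted` and the toy records are hypothetical, and the result is an approximation that ignores details such as clipping, deletions, and any per-base statistics.

```python
from collections import defaultdict


def mean_depth_unsorted(alignments, contig_lengths):
    """Return {contig: mean depth} from alignments given in any order.

    alignments: iterable of (contig, start, end) with 0-based, half-open
        reference coordinates (hypothetical input format).
    contig_lengths: dict mapping contig name to its length in bp.
    """
    aligned_bases = defaultdict(int)
    for contig, start, end in alignments:
        # Order does not matter: we only accumulate a per-contig sum.
        aligned_bases[contig] += end - start
    # Contigs with no alignments get a depth of 0.0.
    return {
        name: aligned_bases[name] / length
        for name, length in contig_lengths.items()
    }


# Toy records, deliberately interleaved between contigs (i.e. "unsorted").
records = [
    ("k127_2", 0, 100),
    ("k127_1", 50, 150),
    ("k127_2", 100, 200),
    ("k127_1", 0, 100),
]
lengths = {"k127_1": 200, "k127_2": 400, "k127_3": 300}
depths = mean_depth_unsorted(records, lengths)
print(depths)
```

With a real BAM file, the tuples would have to come from a BAM reader that can iterate over an unsorted, unindexed file. Whether the resulting values are close enough to what SemiBin2 expects is exactly what I am unsure about.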
Thanks for the reply. I can now generate the tetramer frequencies in "data.csv". The abundance calculated by NGLess appears to follow the same trend as the abundance that SemiBin generates with Bedtools. Can the NGLess abundance therefore replace the Bedtools abundance?
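For reference, this is roughly how I would quantify "same trend": a Pearson correlation between the two abundance columns for the same contigs. It is only a sketch. The function name `pearson` and the numbers below are made up for illustration and are not real output from NGLess or Bedtools.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical per-contig abundances from the two tools (same contig order).
abundance_a = [1.0, 2.0, 3.0, 4.0]
abundance_b = [2.1, 3.9, 6.2, 8.0]
r = pearson(abundance_a, abundance_b)
print(round(r, 3))
```

A high correlation would show that the two estimates rank and scale contigs similarly, but it would not prove that the absolute values are interchangeable.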
Additionally, I noticed that "data_split.csv" appears to sample contigs from "data.csv" and then split each contig's abundance and tetramer frequencies into two values (the average of these two values seems to equal the value in "data.csv"). How is this done? Could you briefly explain the logic behind it?
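To illustrate what I think I am seeing (this is my guess, not taken from the SemiBin code): if a contig is cut at its midpoint and the mean depth is computed for each half, the average of the two halves equals the whole-contig mean whenever the halves have equal length. The function name `split_mean_depth` and the depth values are hypothetical.

```python
def split_mean_depth(per_base_depth):
    """Split a per-base depth list at its midpoint; return both half means."""
    mid = len(per_base_depth) // 2
    first, second = per_base_depth[:mid], per_base_depth[mid:]
    return sum(first) / len(first), sum(second) / len(second)


# Toy per-base depth for one contig of even length.
depth = [2, 2, 4, 4, 6, 6, 8, 8]
whole_mean = sum(depth) / len(depth)
half_1, half_2 = split_mean_depth(depth)
print(whole_mean, half_1, half_2, (half_1 + half_2) / 2)
```

For contigs of odd length the two halves differ by one base, so the plain average would only approximate the whole-contig value. Tetramer frequencies would also not average exactly, because k-mers spanning the cut point belong to neither half.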