The problem of the output #2

Open
xuxingyubio opened this issue Nov 22, 2023 · 3 comments

@xuxingyubio

Thank you for developing PRAWNS! I have a question about the output.

I used the following parameter: --min_perc 85.0

With this setting, why does retained_block_coords.csv contain retained blocks that do not appear in more than 85% of the samples, and whose coordinates appear to be consistent? Is the coordinate information given in the order of the sample files in --input? The results in metablock_coords.csv do seem to appear in more than 85% of the samples, but there the coordinates are inconsistent.

Below is a line from retained_block_coords.csv:
53,50,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3249343,3249392,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3249343,3249392,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3249343,3249392,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3249343,3249392,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3249343,3249392,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

@KiranJavkar
Owner

Hey! Thanks a lot for using PRAWNS! Apologies for replying a bit late.

To understand the situation with multiple 0s in the retained block coordinates, let me walk you through the process of block identification and metablock construction.

With your setting of min_perc, PRAWNS first locates the blocks present in at least 85% of the input genomes. Once these blocks are identified, PRAWNS attempts to merge them into metablocks wherever possible. I would encourage you to have a look at the PRAWNS manuscript if you would like to understand the metablock construction better.

During metablock construction, the corresponding constituent blocks from some genomes may not be merged into the metablock. This can happen for several reasons: not all "core" blocks were identified, the core blocks were not in the required relative orientations, or the separation between the collocated blocks was larger than the specified max_neighbor_distance.
The retained blocks are all the blocks (present in at least min_perc genomes) that weren't merged into metablocks in the corresponding genomes.
Accordingly, in your example, the block (block id 53, length 50) should have been merged into a metablock. The coordinates shown in this line are the genome coordinates where this block is found but NOT as part of the associated metablock region.

The logs generated during PRAWNS execution mention the blocks that went into the construction of each metablock. In case these logs are not available now, an easier way to determine this is to perform an alignment of the block FASTA sequence with the metablocks.fasta sequences. If this block is a core block in a metablock, the alignment should be an exact match there, otherwise, there could be a few mismatches.
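
As a lightweight stand-in for a full alignment, the exact-match part of this check can be sketched in plain Python. This is only a substring search, not an aligner, so it will not report near matches with a few mismatches; for those, a real alignment tool is still needed. The function names below are hypothetical helpers and not part of PRAWNS, and the reverse-complement search is an assumption about how a block might appear inside a metablock.

```python
def read_fasta(path):
    """Read a FASTA file into a dict of {header: uppercase sequence}."""
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:], []
            else:
                chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_hits(block_seq, metablocks):
    """Return (metablock_name, 0-based offset, strand) for each exact occurrence.

    Assumption: the block may occur on either strand, so both the block and
    its reverse complement are searched.
    """
    block_seq = block_seq.upper()
    hits = []
    for strand, query in (("+", block_seq), ("-", reverse_complement(block_seq))):
        for name, seq in metablocks.items():
            start = seq.find(query)
            while start != -1:
                hits.append((name, start, strand))
                start = seq.find(query, start + 1)
    return hits
```

A possible usage would be `find_exact_hits(block_sequence, read_fasta("metablocks.fasta"))`, where `block_sequence` is the 50 bp block extracted from one of the genomes. An empty result means no exact copy was found on either strand.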

Let me know if this helps. In case you need further detailed assistance with understanding the outputs, you may reach out to me over email and we can discuss this further.

Thanks again for using PRAWNS and all the best with your research :)

@xuxingyubio
Author

Thank you for your detailed answer. I want to confirm: are the coordinates given in the order of the sample files in --input?

@KiranJavkar
Owner

Yes, that is correct. Further, the FASTA sequences (contigs) within an input genome file are treated as concatenated in the order in which they appear in that file.
You can use the additional utility scripts to extract the corresponding sequences from the genomes of interest.
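
If you want to see which contig a concatenated coordinate falls in, a rough sketch is shown here. It rests on two assumptions that this thread does not confirm: that coordinates are 1-based and inclusive, and that contigs are joined back to back with no separator between them. The function name is a hypothetical helper, not part of PRAWNS.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_in_contigs(contig_lengths, position):
    """Map a position in the concatenated genome to (contig_index, offset).

    Assumptions (not confirmed by PRAWNS): `position` is 1-based, and contigs
    are concatenated in file order without padding. The returned contig index
    is 0-based and the returned offset is 1-based within that contig.
    """
    ends = list(accumulate(contig_lengths))  # 1-based end of each contig
    if position < 1 or not ends or position > ends[-1]:
        raise ValueError("position lies outside the concatenated genome")
    idx = bisect_right(ends, position - 1)
    contig_start = ends[idx - 1] if idx else 0
    return idx, position - contig_start
```

For contigs of lengths 100, 50 and 200, position 101 maps to the first base of the second contig under these assumptions. If the two ends of a block map to different contigs, the block spans a contig boundary in the concatenated view.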

For instance, in the earlier example, you can run the following command to extract the corresponding retained block sequence from genome number 39 (i.e. the 40th genome in the list; the genomes are 0-indexed):

./kmer_variant_coords_fasta_display.o PRAWNS_results/all_assembly_filepaths.txt 39 3249343 3249392

Retained block index: 53,
Length: 50,
Genome Coordinates ==>
Genome 0: 0,0,0,
Genome 1: 0,0,0,
Genome 2: 0,0,0,
Genome 3: 0,0,0,
Genome 4: 0,0,0,
Genome 5: 0,0,0,
Genome 6: 0,0,0,
Genome 7: 0,0,0,
Genome 8: 0,0,0,
Genome 9: 0,0,0,
Genome 10: 0,0,0,
Genome 11: 0,0,0,
Genome 12: 0,0,0,
Genome 13: 0,0,0,
Genome 14: 0,0,0,
Genome 15: 0,0,0,
Genome 16: 0,0,0,
Genome 17: 0,0,0,
Genome 18: 0,0,0,
Genome 19: 0,0,0,
Genome 20: 0,0,0,
Genome 21: 0,0,0,
Genome 22: 0,0,0,
Genome 23: 0,0,0,
Genome 24: 0,0,0,
Genome 25: 0,0,0,
Genome 26: 0,0,0,
Genome 27: 0,0,0,
Genome 28: 0,0,0,
Genome 29: 0,0,0,
Genome 30: 0,0,0,
Genome 31: 0,0,0,
Genome 32: 0,0,0,
Genome 33: 0,0,0,
Genome 34: 0,0,0,
Genome 35: 0,0,0,
Genome 36: 0,0,0,
Genome 37: 0,0,0,
Genome 38: 0,0,0,
Genome 39: 3249343,3249392,1
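
The same breakdown can be produced for any row with a short script. The sketch below assumes the layout shown above: block id, block length, then one triple per genome in input order, with a 0,0 start and end meaning the block is not listed as retained in that genome. The meaning of the third value in each triple is not stated in this thread, so it is carried along untouched. The function names are hypothetical helpers; only the utility's argument order (paths file, genome index, start, end) is taken from the command above.

```python
import shlex


def parse_retained_row(line):
    """Split one retained_block_coords.csv row into (block_id, length, triples)."""
    fields = [int(x) for x in line.strip().rstrip(",").split(",")]
    block_id, length, rest = fields[0], fields[1], fields[2:]
    if len(rest) % 3:
        raise ValueError("expected three values per genome after id and length")
    triples = [tuple(rest[i:i + 3]) for i in range(0, len(rest), 3)]
    return block_id, length, triples


def retained_genomes(triples):
    """Return (genome_index, start, end) for genomes with non-zero coordinates."""
    return [(g, s, e) for g, (s, e, _flag) in enumerate(triples) if (s, e) != (0, 0)]


def extraction_commands(line, paths_file,
                        tool="./kmer_variant_coords_fasta_display.o"):
    """Build one extraction command per genome that lists the block."""
    _, _, triples = parse_retained_row(line)
    return [
        " ".join(shlex.quote(str(arg)) for arg in (tool, paths_file, g, s, e))
        for g, s, e in retained_genomes(triples)
    ]
```

Calling `extraction_commands(row, "PRAWNS_results/all_assembly_filepaths.txt")` on a row of the file gives one command string per genome with non-zero coordinates, which can then be reviewed and run by hand.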
