Annotation file: Label (in numeric form) not displaying default color options #53

Closed
jrdickey9 opened this issue Apr 17, 2023 · 4 comments

Comments

@jrdickey9

Howdy there,

I am using VizBin to visualize and manually bin MAGs for host-associated bacterial populations. I have a single .fasta file that contains scaffolds from multiple samples. In other words, multiple per-sample .fasta files were combined into a single .fasta file so that I can bin genomes across all samples of interest. My goal was to create an annotation file that reflects this, with each sample, or label (#1-9), corresponding to a color. Any color, it really doesn't matter. I created the annotation file from the combined fasta file in an effort to maintain the scaffold order and to include other interesting properties such as length and GC content.

The issue I am having is that the annotation file seems to be working, I think, but the labels are not being read. To explain further, the size of each point appears to change with scaffold length, which is somewhat helpful, but the points are not colored by label.

In the future I would like to add a reference genome of my "bacteria of interest" as a marker to aid my binning (receive more complete bins and avoid contamination as much as possible). I have yet to add this to my annotation file since the labels aren't being read.

Beyond that, I have MANY scaffolds that I am inputting into VizBin (>4 million). I have set the minimum contig length to 2 kb or 3 kb. The annotation file and the fasta contain the same number of entries prior to input into VizBin. The minimum contig length does toss out plenty of scaffolds, but not so many that only one label is left.
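For anyone checking the same thing, here is a minimal sketch (plain Python, no VizBin API involved) that counts how many FASTA records survive a given length cutoff and how many data rows an annotation file has. The function names and the `min_len` parameter are illustrative assumptions, not part of VizBin. If the two numbers differ after the cutoff is applied, the annotation rows may no longer line up with the sequences that are kept.

```python
def count_fasta_records(handle, min_len=0):
    """Return (total_records, records_with_length >= min_len) for a FASTA handle."""
    total = kept = 0
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                total += 1
                kept += length >= min_len
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:  # account for the last record
        total += 1
        kept += length >= min_len
    return total, kept


def count_annotation_rows(handle, has_header=True):
    """Count non-empty data rows in an annotation file (header excluded)."""
    rows = sum(1 for line in handle if line.strip())
    return rows - 1 if has_header and rows else rows
```

Both functions take open file handles, so they can be called on the combined fasta and the annotation CSV with the same cutoff that is set in VizBin.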

Below are the head and tail of both my annotation file and the .fasta file.

A) head annotation file
label,length,gc
1,134077,45.42
1,87175,45.16
1,65686,45.71
1,52865,45.92
1,44948,34.86
1,42530,45.30
1,42475,46.38
1,40293,45.94
1,29404,48.00

B) tail annotation file
9,200,56.50
9,200,55.50
9,200,60.00
9,200,53.00
9,200,52.50
9,200,53.00
9,200,42.00
9,200,42.50
9,200,34.00
9,200,57.00

C) head fasta file

>D0_SEK2_2_scaffold_1_c1
CAATCGATACGACCCCGGAGAGCGGCTTTTGCTAAAACTCGAGCAGTTTCTTGAAAACTT
GCTTCTGATATGAAACTTTGAGTATTTAGAGATGCTTTCGTTATTCCCAATAAGATGGCT
CGATAACAGATCGCTTCTTCCAAAGCACGCCCTGTTCGTTCCGCCCGCAACAATTCAATT
AGTTCTCCGGGTGAAAAAACATTAGACATTCTATCTTCTGAAACCAACACTTTTGATGTT
ATTTGACGCACAATAATCTCTATATGTCGATTATGAATCTGCACTCCCTGAGATCGATAA
ACTTTTTGGATCTTATTAACCAAAGAGATACGACTTTGCACTATAGTTAGCTCAGCACCA
ATCAAGAATCCCCAAGGAATTCCAAGAATTTTTGCTATACGCTCGTTCCAACCCTCAATC
CTCTTTTCTAGGTTCATCGATATTGAATCAATCGAACGAACTTCTAACACTTGTTCCACT
TTTGGAAGACCTTGCGTTATATCTCCAGATCTCTATTTTTCATATATAAATGTAACTAAC

D) tail fasta file

>W_SEK2_D15_scaffold_211474_c1
ATATTCCACTCCCATATCTGTGTTATTATTCGTTGCAATAAACCTTCATTACACTTTATA
TGATAGCACGACCATATTCCACTCCCATATCTGTGTTATTATTCGTTGCAATAAACCTTC
ATTACACTTTATATGATAGCACGACCATATTCCACTCCCATATCTGTGTTATTATTCGTT
GCAATAAACCTTCATTACAC
>W_SEK2_D15_scaffold_113313_c1
CAGCAGACCGTGATGTCTTACGCCTGTGTTGCCCTCTACCGCTATGCGGTTGGTAAGCCA
GTGCCAGGGTTCGACCCAACGGCTATGCAGGGAGCGTTCCGAGTGAAGAAGCAGAAGTTC
ACCGGACAAGCCGGAGCCTAATTAGCGCCTAGGGCCACTCCGCGAACGAGAGCCTTCTGG
AAGTTCAGGTAAATGAACAC

Note: D0_SEK2_2 is a sample name that I replaced with 1 in the label column of the annotation file. I thought the software might not be reading the labels correctly because of the underscores or the mix of letters and numbers. However, even when the labels are just numbers, I still get no label colors.

Any help would be great,

Jonathan
Post Doc, UCSD

@jrdickey9
Author

PS: Here is the PNG output produced with the CSV-formatted annotation file and the fasta input.

Screenshot 2023-04-17 at 3 54 45 PM

@jrdickey9
Author

Resolved: size-filter the fasta file before loading it into VizBin, then build the annotation file from the size-filtered fasta. In VizBin, select the same minimum contig length and proceed.
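A minimal sketch of that workflow in plain Python, for illustration only. It assumes the sample name is everything before `_scaffold_` in the header (as in `D0_SEK2_2_scaffold_1_c1`) and that numeric labels are assigned in order of first appearance; the function names and the `min_len` default are hypothetical and not part of VizBin. It writes the filtered fasta and a `label,length,gc` annotation file in the same pass, so the two stay in the same order with one row per kept scaffold.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(seq):
    """GC content of a sequence as a percentage of its length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def filter_and_annotate(fasta_in, fasta_out, annot_out, min_len=2000):
    """Keep scaffolds with length >= min_len and write a matching annotation row for each."""
    labels = {}  # sample name -> numeric label (assumed naming scheme)
    kept = 0
    annot_out.write("label,length,gc\n")
    for header, seq in read_fasta(fasta_in):
        if len(seq) < min_len:
            continue
        sample = header.split("_scaffold_")[0]
        label = labels.setdefault(sample, len(labels) + 1)
        fasta_out.write(f">{header}\n{seq}\n")
        annot_out.write(f"{label},{len(seq)},{gc_percent(seq):.2f}\n")
        kept += 1
    return kept, labels
```

The three arguments are open file handles (input fasta, output fasta, output annotation CSV). The `min_len` passed here should be the same value chosen as the minimum contig length in VizBin.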

@claczny
Owner

claczny commented Apr 24, 2023

Hi Jonathan,

thank you for the issue and great to see that you have been able to resolve it.
I was off for a week, so could only reply now.

Indeed, this is the way that I'd have suggested to you too.
It is a point where the UX could surely be improved, but, unfortunately, I currently do not have resources available that I could dedicate to this.

As you mentioned this to be host-associated, my "suspicion" is that the big cluster in the middle might be genomic fragments from the host. Or maybe the distorted "C"-shaped cluster at 12 o'clock 🤔
Unless you already filtered out reads beforehand, in which case this is a different story 😉

Should you have further questions, please do not hesitate to ask.

Best wishes and stay safe,

Cedric

@jrdickey9
Author

jrdickey9 commented Apr 24, 2023 via email
