-
There's a cluster of repeat sequences, indicated by the dense parallel lines. If you hover your mouse cursor over the plot, the sequence coordinates will be shown, so you could slice the individual sequences to get the corresponding segments and examine them. To understand the "title" of the plot (i.e. what window, threshold and gap mean), look at the help on the dotplot method. That is, in a notebook cell:

```python
help(seqs.dotplot)
# or for prettier display inside a notebook you can do
seqs.dotplot?
```

As a side note, the default values are for DNA. I think those are too stringent for a comparison of protein sequences. I recommend reducing the window size but keeping the relative magnitude of the threshold the same. Better yet, I strongly encourage you to do the same plot with the corresponding DNA sequences.

One of the best ways to understand these sorts of techniques is to construct a small synthetic data set that has a property of interest. By using such small synthetic cases and modifying the settings, you will be able to interpret real data in a more informed manner.
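To make the window and threshold settings concrete, here is a minimal pure-Python sketch of windowed matching. It is not cogent3's implementation (that is faster and also handles the gap setting); the function name `window_matches` and the toy sequences are made up for illustration. The idea, as I read the plot title, is that something like Matched>=13/20 means a pair of 20-long windows is counted as a match when at least 13 positions are identical.

```python
def window_matches(seq1, seq2, window, threshold):
    """Return (i, j) start coordinates where seq1[i:i+window] and
    seq2[j:j+window] are identical at >= threshold positions.

    Illustrative brute-force version, NOT the cogent3 algorithm.
    """
    hits = []
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            # count positions within the two windows that are identical
            identical = sum(
                a == b
                for a, b in zip(seq1[i : i + window], seq2[j : j + window])
            )
            if identical >= threshold:
                hits.append((i, j))
    return hits


# hypothetical toy sequences differing at a single position
s1 = "GATTACA"
s2 = "GATCACA"

# every window of 4 spans the mismatch, so 3/4 identical is the best case
relaxed = window_matches(s1, s2, window=4, threshold=3)
strict = window_matches(s1, s2, window=4, threshold=4)
print(relaxed)
print(strict)
```

With the relaxed threshold the hits all lie on the main diagonal (i equals j), which is what gets drawn as a line. With the strict threshold nothing matches, because every window of 4 overlaps the one mismatch. Playing with the two numbers on small cases like this shows why the same sequences can look very different under different settings.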
-
What I meant by "...construct a small synthetic data set..." was to literally make a DNA sequence up. Even though there's only one sequence in this collection, a dotplot is possible since the sequence will just be compared to itself. In the following example I have constructed a synthetic sequence with a specific property that only manifests if the window size is exactly 4. I've also set the threshold to 4. Code is below.

```python
from cogent3 import make_unaligned_seqs

seqs = make_unaligned_seqs({"seq1": 'CACACCACTGCAGTCGGATAGACC'}, moltype="dna")
seqs.dotplot(window=4, threshold=4, rc=True)
```

I'll let you run that code yourself. If you take a look at the sequence object itself (not the collection), you will get colour information that can assist in seeing patterns. That is, in a Jupyter notebook, you can access

```python
seqs.seqs[0]
```

Can you figure out the "specific property" I added to the sequence?
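On the `rc=True` argument: as I understand it, this asks for the comparison against the reverse complement strand to be included in the plot as well, which only makes sense for nucleic acid sequences. If the term is unfamiliar, here is a small pure-Python sketch of what a reverse complement is. The helper `reverse_complement` and the toy sequence are made up for illustration (deliberately not the sequence above) and this is not how cogent3 computes it.

```python
# map each DNA base to its Watson-Crick partner
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the order (illustrative helper)."""
    return seq.translate(COMPLEMENT)[::-1]


toy = "AACCGT"  # hypothetical toy sequence
rc_toy = reverse_complement(toy)
print(toy, rc_toy)
```

Applying the function twice gives back the original sequence, which is a handy sanity check when you build your own synthetic cases.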
-
Can anyone guide me on interpreting the results of a dot plot? How can I best describe the dot plot output in my thesis?
![image](https://user-images.githubusercontent.com/107029444/172769351-8eebc316-40fc-495e-8edc-76603ee590b3.png)
![image](https://user-images.githubusercontent.com/107029444/172769527-7b43a620-76c6-40e2-8f7f-7bf13415e2f9.png)
As per my understanding, the straight line indicates matching amino acids and the short parallel lines indicate repeats. Thus, we can say the two sequences are similar, with tandem repeats at the end. At the top of the output it says Matched>=13/20. What does this mean?
Also, it says Gap<=0, but according to the alignment (Clustal Omega) there is one repeat missing in the sequence on the y-axis.