How to choose the right plasmids

pedroscampoy edited this page Nov 5, 2018 · 12 revisions

Summary

In summary, the best plasmids to use as scaffolds are the longest and most-covered ones in each size range. The recommended steps are as follows:

  1. Execute summary_table.sh and sort by either size or coverage percentage.
  2. Select the longest, most-covered plasmid in each size range, using the summary image to delimit each group.
  3. Analyze the contig distribution among the candidates using the individual images:
    • Contigs align completely: the contig track and the complete contig track are very similar.
    • Contigs do not overlap completely: small contigs should not be contained within a larger one.
  4. Check for the expected annotation, looking for plasmid features such as Inc groups or known phenotypic traits such as antibiotic resistance genes (ARGs).
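Step 1 can be sketched in Python as follows. This is a minimal illustration, not part of PlasmidID: it assumes the summary table has been saved as tab-separated text with one header line and five columns in the order shown on this page (accession, length, species, description, coverage percentage). That file layout, the species/description split in the sample rows, and the function names `load_summary` and `sort_summary` are assumptions made for this example.

```python
import csv
import io


def load_summary(handle):
    """Parse a tab-separated summary table into a list of dicts.

    Assumed (hypothetical) layout: accession, length, species,
    description, coverage percentage, preceded by one header line.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    rows = []
    for accession, length, species, description, coverage in reader:
        rows.append({
            "accession": accession,
            "length": int(length),
            "species": species,
            "description": description,
            "coverage": float(coverage),
        })
    return rows


def sort_summary(rows, key="coverage"):
    """Return the rows sorted in descending order by 'coverage' or 'length'."""
    if key not in ("coverage", "length"):
        raise ValueError("key must be 'coverage' or 'length'")
    return sorted(rows, key=lambda row: row[key], reverse=True)


# Three rows taken from the example table on this page, in arbitrary order.
# Splitting species from description after the binomial name is an assumption.
example = io.StringIO(
    "AC_Number\tLength\tSpecies\tDescription\tSEN\n"
    "K00826.1\t4012\tEscherichia coli\tplasmid pCM959\t95.8126\n"
    "NC_003384.1\t218160\tSalmonella enterica\t"
    "subsp. enterica serovar Typhi str. CT18 plasmid pHCM1\t98.667\n"
    "NC_003385.1\t106516\tSalmonella enterica\t"
    "subsp. enterica serovar Typhi str. CT18 plasmid pHCM2\t99.8498\n"
)
rows = load_summary(example)
by_coverage = [row["accession"] for row in sort_summary(rows, "coverage")]
by_length = [row["accession"] for row in sort_summary(rows, "length")]
print(by_coverage)
print(by_length)
```

Sorting by coverage puts the best-covered candidates first, while sorting by length makes the size ranges easier to spot.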

Choosing the plasmids in Salmonella enterica subsp. enterica serovar Typhi str. CT18 (ASM19599v1):

PlasmidID was executed on simulated reads from this sample. The parameters were 90% coverage and 90% clustering. The summary table was sorted by coverage percentage in descending order, resulting in this table:

| AC_Number | Length | Species | Description | SEN |
| NZ_LT904874.1 | 2090 | Salmonella enterica | subsp. enterica serovar Typhi strain ERL11909 genome assembly plasmid: 3 | 100 |
| NC_003385.1 | 106516 | Salmonella enterica | subsp. enterica serovar Typhi str. CT18 plasmid pHCM2 | 99.8498 |
| NZ_LT904880.1 | 106516 | Salmonella enterica | subsp. enterica serovar Typhi strain ty3-193 genome assembly plasmid: 3 | 99.8498 |
| NZ_LT883154.1 | 106706 | Salmonella enterica | subsp. enterica serovar Typhi strain ERL12148 genome assembly plasmid: 2 | 99.5155 |
| NZ_LT904895.1 | 106704 | Salmonella enterica | subsp. enterica serovar Typhi strain ERL12960 genome assembly plasmid: 2 | 99.4836 |
| NZ_LT904853.1 | 106704 | Salmonella enterica | subsp. enterica serovar Typhi strain TY585 genome assembly plasmid: 2 | 99.3384 |
| NC_003384.1 | 218160 | Salmonella enterica | subsp. enterica serovar Typhi str. CT18 plasmid pHCM1 | 98.667 |
| NC_013365.1 | 204604 | Escherichia coli | O111:H- str. 11128 plasmid pO111_1 DNA | 96.8637 |
| K00826.1 | 4012 | Escherichia coli | plasmid pCM959 | 95.8126 |
| NC_009981.1 | 208409 | Salmonella enterica | subsp. enterica serovar Choleraesuis plasmid pMAK1 | 92.5713 |
| NC_002305.1 | 180461 | Salmonella typhi | plasmid R27 | 90.8113 |

This table shows all plasmids in the database that were covered more than 90% after clustering at 90% homology. In this case there were 11 plasmids:

  • 5 with a size of ~106000 bases
  • 4 with a size of 180000 - 218000 bases
  • 1 with 2090 bases
  • 1 with 4012 bases
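The size ranges above can also be derived programmatically. The sketch below uses the accessions, lengths, and coverage values from the table on this page and starts a new group whenever a plasmid is more than 1.25 times longer than the previous one. Both the 1.25 ratio and the function names `group_by_size` and `best_candidate` are arbitrary choices for this illustration and are not part of PlasmidID.

```python
def group_by_size(plasmids, max_ratio=1.25):
    """Split (accession, length, coverage) tuples into size ranges.

    Plasmids are sorted by length; a new group starts whenever a plasmid
    is more than max_ratio times longer than the previous one.
    The default ratio is an illustrative assumption.
    """
    groups = []
    previous_length = None
    for plasmid in sorted(plasmids, key=lambda p: p[1]):
        if previous_length is None or plasmid[1] > previous_length * max_ratio:
            groups.append([])
        groups[-1].append(plasmid)
        previous_length = plasmid[1]
    return groups


def best_candidate(group):
    """Pick the most-covered plasmid in a group, breaking ties by length."""
    return max(group, key=lambda p: (p[2], p[1]))


# Accession, length, and coverage percentage from the table on this page.
plasmids = [
    ("NZ_LT904874.1", 2090, 100.0),
    ("NC_003385.1", 106516, 99.8498),
    ("NZ_LT904880.1", 106516, 99.8498),
    ("NZ_LT883154.1", 106706, 99.5155),
    ("NZ_LT904895.1", 106704, 99.4836),
    ("NZ_LT904853.1", 106704, 99.3384),
    ("NC_003384.1", 218160, 98.667),
    ("NC_013365.1", 204604, 96.8637),
    ("K00826.1", 4012, 95.8126),
    ("NC_009981.1", 208409, 92.5713),
    ("NC_002305.1", 180461, 90.8113),
]

groups = group_by_size(plasmids)
group_sizes = [len(group) for group in groups]
candidates = [best_candidate(group)[0] for group in groups]
print(group_sizes)
print(candidates)
```

NC_003385.1 and NZ_LT904880.1 have identical length and coverage, so the tie is resolved by input order. The result is only a shortlist: the individual images still need to be inspected before settling on a scaffold.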

In the summary image, those plasmids are represented as follows:

01_img_guide


The connections between those 11 plasmids gather them into 4 groups, color-coded by their contig composition:

  • Dark blue group
  • Light blue + pink + green + red
  • Orange
  • No color (discussed below)

01_img_guide


Checking the individual images of each group, we find the following:

Group 1, dark blue contig

NC_003385.1 NZ_LT904880.1 NZ_LT883154.1
NZ_LT904895.1 NZ_LT904853.1

All of these reference plasmids are very similar. Since each sequence starts at a different position, cd-hit was unable to cluster them into a single representative sequence. Lowering the clustering parameter below 90% may still reduce the number of entries in this group in favour of the largest. In this sample, NC_003385.1 was the actual plasmid, but all of them were useful as scaffolds. All reference plasmids revealed contig 20 as the sequence that belongs to the sample plasmid and, since the genes are annotated on the contig, the final result is independent of the scaffold.

Group 2, Light blue + pink + green + red

NC_003384.1 NC_013365.1
NC_009981.1 NC_002305.1

NC_003384.1 is the best option, since all contigs are correctly distributed and there is no overlap between them. NC_013365.1 and NC_009981.1 are also good scaffolds. These plasmids are similar but smaller, which is why contigs 44 and 43 partially overlap in the complete contig track. The main antibiotic resistance module (ARM) is also inserted in the opposite direction, but this does not affect annotation or plasmid reconstruction. NC_002305.1 is the worst option among the 4 plasmids in this group. It only reconstructs the backbone sequence and fails to reconstruct the ARM. The complete contig track shows a complete overlap between contigs 5 and 19, which is a symptom of a bad plasmid reconstruction. In such cases other options should be considered; in this example the other three options are all better, with NC_003384.1 being the best.
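The "complete overlap" symptom described above can be expressed as an interval containment check. The following sketch assumes contig alignments are available as (name, start, end) positions on the reference plasmid. That input format, the function name `contained_pairs`, and the coordinates used are all invented for illustration and are not taken from PlasmidID output.

```python
def contained_pairs(alignments):
    """Return (inner, outer) names where inner lies entirely within outer.

    alignments: list of (contig_name, start, end) positions on the
    reference plasmid, with start <= end. Alignments that wrap around
    the origin of a circular plasmid are not handled, and two identical
    intervals are reported in both directions.
    """
    pairs = []
    for i, (inner_name, inner_start, inner_end) in enumerate(alignments):
        for j, (outer_name, outer_start, outer_end) in enumerate(alignments):
            if i == j:
                continue  # do not compare an alignment with itself
            if outer_start <= inner_start and inner_end <= outer_end:
                pairs.append((inner_name, outer_name))
    return pairs


# Hypothetical coordinates, invented for illustration only.
alignments = [
    ("contig_A", 1, 50000),
    ("contig_B", 10000, 20000),  # lies entirely within contig_A
    ("contig_C", 45000, 90000),  # overlaps contig_A only partially
]
flagged = contained_pairs(alignments)
print(flagged)
```

A partial overlap such as contig_C on contig_A is not flagged, which matches the distinction made above between the partial overlap of contigs 44 and 43 and the complete overlap of contigs 5 and 19.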

Group 3, orange


01_img_guide


This plasmid seems to be another plasmid present in the sample. It is composed of contigs 67 and 61. Contig 67 is aligned completely, but contig 61 is much longer than the plasmid itself, which points to a false positive. Following the recommended steps, we could not find any plasmid-specific annotation. Finally, we BLASTed those contigs and none of them matched any plasmid; instead they matched chromosomal sequences. We can assume that this is either a wrong submission or a tiny plasmid that has partial identity with this sample.

Group 4, no color


01_img_guide


This is a clear example of an absent plasmid. Even though more than 90% of its small length is mapped, those sequences are not in a plasmid within the analyzed sample, because no contig matched this plasmid. This is why there is no track under the annotation track: no contigs met the threshold for each track, as explained in this wiki page.

Finally, we can assume this sample has 2 plasmids, very similar or identical to NC_003384.1 and NC_003385.1. Both are properly annotated, and this annotation is inherent to the sample, not to the reference plasmid, which is used only as a reference for the actual plasmid reconstruction.