# Plasmid reconstruction with MOB-Suite

A description of MOB-Suite, adapted from the paper: 

> MOB-Suite includes tools for the typing and reconstruction of plasmid sequences from WGS assemblies (https://github.com/phac-nml/mob-suite). The MOB-suite is a modular set of tools for the clustering, reconstruction and typing of plasmids from assemblies. It works on genome assemblies produced by any assembler. It uses a reference database approach for identifying contigs of plasmid origin and then aggregates the plasmid contigs into groups based on an internal clustering scheme.

The paper describing MOB-Suite: https://doi.org/10.1099/mgen.0.000206

Here we will use two tools from the MOB-Suite:

* `mob_typer`: Provides in silico predictions of the replicon family, relaxase type, mate-pair formation type and predicted transferability of the plasmid. 
* `mob_recon`: Reconstructs individual plasmid sequences from draft genome assemblies using the clustered plasmid reference databases provided by MOB-cluster.

## Using MOB-Suite with our Unicycler assembly


### mob_typer 

> Provides in silico predictions of the replicon family, relaxase type, mate-pair formation type and predicted transferability of the plasmid. 

This report is simpler than the `mob_recon` output described below, which makes it useful for a first look.

```bash
# --multi treats each sequence in the FASTA file as an independent record
mob_typer --multi --infile ori/sample_unicycler_assembly.fasta --out_file res/mobsuite_unicycler.txt
```

| sample_id                    | num_contigs | size    | gc          | md5                              | rep_type(s)          | rep_type_accession(s)                  | relaxase_type(s) | relaxase_type_accession(s) |
|------------------------------|-------------|---------|-------------|----------------------------------|----------------------|----------------------------------------|------------------|----------------------------|
| 1 length=5109618 depth=1.00x | 1           | 5109618 | 50.77150582 | d7228346cba063582741927f9d1a4a4e | -                    | -                                      | MOBP             | NC_021819_00066            |
| 2 length=135479 depth=1.00x  | 1           | 135479  | 52.48783944 | 873c56604165e850c630e1fb030aae6c | IncFIA,IncFIC,IncFII | MK878891_00042,CP014493_00001,KF954760 | MOBF             | NC_017627_00068            |
| 3 length=3962 depth=1.02x    | 1           | 3962    | 49.46996466 | f9594ad94a1fdbe60cffa34783d36d9b | rep_cluster_1778     | 001201__CP010878_00001                 | MOBQ             | NC_011411_00002            |


| sample_id                    | orit_accession(s)   | predicted_mobility | mash_nearest_neighbor | mash_neighbor_distance | mash_neighbor_identification   |
|------------------------------|---------------------|--------------------|-----------------------|------------------------|--------------------------------|
| 1 length=5109618 depth=1.00x | -                   | mobilizable        | MK439959              | 0.178173               | Escherichia coli AA860         |
| 2 length=135479 depth=1.00x  | KT754162            | conjugative        | HG941719              | 2.38E-05               | Escherichia coli O25b:H4-ST131 |
| 3 length=3962 depth=1.02x    | NC_010672,NC_016036 | mobilizable        | HG941720              | 0.000631368            | Escherichia coli O25b:H4-ST131 |
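
The full report has many columns, so it can help to pull out only the ones of interest. The following is a minimal sketch, assuming the report is tab-separated with a single header row; the helper name `select_columns` is hypothetical and not part of MOB-Suite.

```bash
# Print selected columns from a tab-separated file, chosen by header name.
# Usage: select_columns FILE COL1,COL2,...
# Assumption: FILE is tab-separated and its first line is the header.
select_columns() {
    local file="$1" wanted="$2"
    awk -F'\t' -v OFS='\t' -v wanted="$wanted" '
        NR == 1 {
            # Map each header name to its column position.
            n = split(wanted, names, ",")
            for (i = 1; i <= NF; i++) idx[$i] = i
            for (j = 1; j <= n; j++) {
                if (!(names[j] in idx)) {
                    printf "column not found: %s\n", names[j] > "/dev/stderr"
                    exit 1
                }
            }
        }
        {
            # Print the requested columns in the requested order (header included).
            for (j = 1; j <= n; j++)
                printf "%s%s", $(idx[names[j]]), (j < n ? OFS : ORS)
        }
    ' "$file"
}
```

For example, `select_columns res/mobsuite_unicycler.txt "sample_id,rep_type(s),relaxase_type(s)"` would print just those three columns.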


### mob_recon

> This tool reconstructs individual plasmid sequences from draft genome assemblies using the clustered plasmid reference databases provided by MOB-cluster. It will also automatically provide the full typing information provided by MOB-typer. It optionally can use a chromosome depletion strategy based on closed genomes or user supplied filter of sequences to ignore.

We can run it on our sample long-read assembly as follows:

```bash
# -u (--unicycler_contigs) reads circularity from the Unicycler FASTA headers
mob_recon -u --infile ../ori/sample_unicycler_assembly.fasta --outdir mobrecon_unicycler
```

`mob_recon` has an optional flag for assemblies generated with Unicycler: `-u` (or `--unicycler_contigs`) parses the circularity information directly from the FASTA headers of the unmodified Unicycler assembly. Since we *have* used Unicycler, we can use this flag. The resulting circularity status appears in `contig_report.txt`.
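
To see what `-u` has to work with, the assembly headers can be inspected before running `mob_recon`. This is a rough sketch, assuming that Unicycler marks circular contigs with a `circular=true` token in the FASTA header; that token and the helper name `list_circularity` are assumptions not taken from the text above.

```bash
# Print each contig name and whether its header carries a circularity flag.
# Assumption: circular contigs are marked with "circular=true" in the header.
list_circularity() {
    local fasta="$1"
    awk '/^>/ {
        status = ($0 ~ /circular=true/) ? "circular" : "not_flagged"
        # $1 is ">name"; drop the leading ">" to get the contig name.
        print substr($1, 2) "\t" status
    }' "$fasta"
}
```

For example, `list_circularity ../ori/sample_unicycler_assembly.fasta` lists one line per contig.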

`mob_recon` produces a variety of outputs:

* `contig_report.txt`: Describes the assignment of each contig to the chromosome or to a particular plasmid grouping.
* `mge.report.txt`: BLAST HSPs of detected MGEs/repetitive elements, with contextual information.
* `chromosome.fasta`: FASTA file of all contigs found to belong to the chromosome.
* `plasmid_(X).fasta`: Each plasmid group is written to an individual FASTA file containing its assigned contigs.
* `mobtyper_results.txt`: Aggregate MOB-typer report for all identified plasmids.
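
A quick way to check the reconstruction is to count how many contigs ended up in each FASTA file. The sketch below assumes only the file naming listed above (`chromosome.fasta`, `plasmid_(X).fasta`); the helper name `count_fasta_records` is hypothetical.

```bash
# Count the sequences in every FASTA file of a mob_recon output directory.
# Prints one "file<TAB>count" line per FASTA file.
count_fasta_records() {
    local outdir="$1" fasta
    for fasta in "$outdir"/*.fasta; do
        [ -e "$fasta" ] || continue   # skip if the glob matched nothing
        printf '%s\t%s\n' "$(basename "$fasta")" "$(grep -c '^>' "$fasta")"
    done
}
```

For example, `count_fasta_records mobrecon_unicycler` prints one line for the chromosome and one for each reconstructed plasmid group.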

**mge.report.txt**

| sample_id                 | molecule_type | contig_id                    | size    | gc       | md5                              | mge_id | mge_acs    | mge_type | mge_subtype | mge_length | mge_start | mge_end | contig_start | contig_end |
|---------------------------|---------------|------------------------------|---------|----------|----------------------------------|--------|------------|----------|-------------|------------|-----------|---------|--------------|------------|
| sample_unicycler_assembly | chromosome    | 1 length=5109618 depth=1.00x | 5109618 | 50.77151 | d7228346cba063582741927f9d1a4a4e | 1154   | JN157804   | ISEc23   | IS66        | 2534       | 2         | 2533    | 233668       | 236199     |
| sample_unicycler_assembly | chromosome    | 1 length=5109618 depth=1.00x | 5109618 | 50.77151 | d7228346cba063582741927f9d1a4a4e | 1399   | CP016586.1 | 16s-rRNA | 16S         | 1554       | 1         | 1554    | 243431       | 244984     |
| sample_unicycler_assembly | chromosome    | 1 length=5109618 depth=1.00x | 5109618 | 50.77151 | d7228346cba063582741927f9d1a4a4e | 1400   | CP016586.1 | 23s-rRNA | 23S         | 3027       | 659       | 3027    | 245967       | 248335     |
| sample_unicycler_assembly | chromosome    | 1 length=5109618 depth=1.00x | 5109618 | 50.77151 | d7228346cba063582741927f9d1a4a4e | 270    | KT777639   | ISEc12   | IS21        | 2582       | 1         | 2581    | 1479897      | 1482477    |
| sample_unicycler_assembly | chromosome    | 1 length=5109618 depth=1.00x | 5109618 | 50.77151 | d7228346cba063582741927f9d1a4a4e | 1247   | NC_002695  | ISEc5    | ISAs1       | 1291       | 1         | 1291    | 1647466      | 1648756    |
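
The report has one row per hit, so a tally by element type gives a quicker overview. The following is a minimal sketch, assuming the file is tab-separated with the header shown in the table above; the helper name `count_by_column` is hypothetical.

```bash
# Tally the values of one named column in a tab-separated file.
# Usage: count_by_column FILE COLUMN_NAME
# Assumption: FILE is tab-separated and its first line is the header.
count_by_column() {
    local file="$1" column="$2"
    awk -F'\t' -v col="$column" '
        NR == 1 {
            # Find the position of the requested column in the header.
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { count[$c]++ }
        END { for (k in count) printf "%s\t%d\n", k, count[k] }
    ' "$file" | LC_ALL=C sort
}
```

For example, `count_by_column mobrecon_unicycler/mge.report.txt mge_type` prints each `mge_type` value followed by the number of hits.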

## Using MOB-Suite with our Shovill assembly


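MOB-Suite works on assemblies produced by any assembler, so we can run the same reconstruction on our Shovill assembly. Because this assembly was not produced by Unicycler, we omit the `-u` flag.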


```bash
# --force overwrites the output directory if it already exists
mob_recon --infile ori/sample_shovill_assembly.fasta --outdir res/mobrecon_shovill --force
```