# Example 1

To give a sense of how GetOrganelle works and what its output files look like, this example assembles the _Arabidopsis thaliana_ chloroplast genome from [a simulated mini-dataset](https://github.com/Kinggerm/GetOrganelleGallery/tree/master/Test/reads).

 Computational Resource | Requirements
---- | ----
 System | Linux/MacOS
 Memory | ~600 MB
 CPU time | ~60 sec


### Downloading reads

Let's download the reads using `curl`:

In [5]:
%%bash
curl -L https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.1.fq.gz -o Arabidopsis_simulated.1.fq.gz
curl -L https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.2.fq.gz -o Arabidopsis_simulated.2.fq.gz

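Because an interrupted download can leave a truncated archive behind, it may be worth testing the two files before assembling. The following is a minimal sketch, not part of GetOrganelle: `check_gz` is a hypothetical helper name, and it relies only on `gzip -t`, which tests archive integrity without extracting anything.

```shell
# Hypothetical helper (not part of GetOrganelle): report whether a file
# is an intact gzip archive, using gzip's built-in integrity test.
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK      $1"
    else
        echo "BROKEN  $1"
        return 1
    fi
}

for f in Arabidopsis_simulated.1.fq.gz Arabidopsis_simulated.2.fq.gz; do
    # Only test files that are actually present in the current directory.
    if [ -e "$f" ]; then
        check_gz "$f" || echo "consider re-downloading $f" >&2
    fi
done
```

A file reported as `BROKEN` is most likely an incomplete download and can simply be fetched again with the same `curl` command.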

### Conducting plastome assembly

In [7]:
%%bash
get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10


GetOrganelle v1.7.1

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.7.3 (default, Mar 27 2019, 16:54:48)  [Clang 4.0.1 (tags/RELEASE_401/final)]
PYTHON LIBS: GetOrganelleLib 1.7.1; numpy 1.18.1; sympy 1.6.1; scipy 1.4.1; psutil 5.7.0
DEPENDENCIES: Bowtie2 2.3.5.1; SPAdes 3.12.0; Blast 2.9.0; Bandage 0.8.1
SEED  DB: embplant_pt customized; embplant_mt customized
LABEL DB: embplant_pt customized; embplant_mt customized
WORKING DIR: /Users/JJJ/Documents/Codes/Python/Workshop/Rings
/Users/JJJ/.pyenv/versions/anaconda3-5.3.1/bin/get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10

2020-07-25 23:25:03,186 - INFO: Pre-reading fastq ...
2020-07-25 23:25:03,187 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf')

### Running log

More information can be found at https://github.com/Kinggerm/GetOrganelle/wiki/Example-1.

![Running log](resources/pics/2.online.running.log.png)

### Brief description of the used options

Flag | Value | Description
---- | ---- | ----
`-1` | `Arabidopsis_simulated.1.fq.gz` | Input file with the forward paired-end reads (`*.fq/.gz/.tar.gz`)
`-2` | `Arabidopsis_simulated.2.fq.gz` | Input file with the reverse paired-end reads (`*.fq/.gz/.tar.gz`)
`-t` | `1` | Maximum threads to use. Default: 1
`-o` | `Arabidopsis_simulated.plastome` | Output directory
`-F` | `embplant_pt` | Target organelle genome type(s)
`-R` | `10` | Maximum extension rounds
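
To tie each flag to its meaning, the same command can be assembled from named variables. This is only a sketch: the variable names and the `build_cmd` helper are illustrative and not part of GetOrganelle, while the flags and values are exactly those listed in the table. The helper prints the command rather than running it, so it can be inspected first.

```shell
# Variable names are illustrative; flags and values come from the table above.
forward_reads="Arabidopsis_simulated.1.fq.gz"    # -1
reverse_reads="Arabidopsis_simulated.2.fq.gz"    # -2
threads=1                                        # -t
out_dir="Arabidopsis_simulated.plastome"         # -o
organelle_type="embplant_pt"                     # -F
max_rounds=10                                    # -R

# Hypothetical helper: print the full command line without executing it.
build_cmd() {
    echo "get_organelle_from_reads.py" \
        -1 "$forward_reads" -2 "$reverse_reads" \
        -t "$threads" -o "$out_dir" \
        -F "$organelle_type" -R "$max_rounds"
}

build_cmd
```

Piping the printed line to a shell (for example `build_cmd | sh`) would then start the assembly.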

#### Find more
For the frequently used options, run:

    get_organelle_from_reads.py -h

For all options, run:

    get_organelle_from_reads.py --help

### Output files

You will see the following files in the output directory `Arabidopsis_simulated.plastome`. 

In [16]:
%%bash
ls -lah Arabidopsis_simulated.plastome

total 31224
drwxr-xr-x  16 JJJ  staff   512B Jul 25 23:31 .
drwxr-xr-x  12 JJJ  staff   384B Jul 26 00:01 ..
-rw-r--r--   1 JJJ  staff   151K Jul 25 23:31 embplant_pt.K115.complete.graph1.1.path_sequence.fasta
-rw-r--r--   1 JJJ  staff   151K Jul 25 23:31 embplant_pt.K115.complete.graph1.2.path_sequence.fasta
-rw-r--r--   1 JJJ  staff   126K Jul 25 23:31 embplant_pt.K115.complete.graph1.selected_graph.gfa
-rw-r--r--   1 JJJ  staff   114K Jul 25 23:37 embplant_pt.K115.complete.graph1.selected_graph.png
-rw-r--r--   1 JJJ  staff   6.9M Jul 25 23:30 filtered_1_paired.fq
-rw-r--r--   1 JJJ  staff    25K Jul 25 23:30 filtered_1_unpaired.fq
-rw-r--r--   1 JJJ  staff   6.9M Jul 25 23:30 filtered_2_paired.fq
-rw-r--r--   1 JJJ  staff    19K Jul 25 23:30 filtered_2_unpaired.fq
-rw-r--r--   1 JJJ  staff   256K Jul 25 23:31 filtered_K115.assembly_graph.fastg
-rw-r--r--   1 JJJ  staff   4.3K Jul 25 23:31 filtered_K115.assembly_graph.fastg.extend-embplant_pt-embplant_mt.csv
-rw-r--r--   1 JJJ  staf

In samples with inverted repeats (IRs), two isomeric plastome sequences are generated that differ in the orientation of the small single-copy (SSC) region. Both configurations exist in the plant ([Palmer 1983](https://doi.org/10.1038%2F301092a0); [Walker et al. 2015](https://bsapubs.onlinelibrary.wiley.com/doi/full/10.3732/ajb.1500299); [Wang & Lanfear 2020](https://doi.org/10.1093/gbe/evz256)) and both are usable.
In practice, people usually make an arbitrary choice and use the one whose orientation matches a commonly used reference (e.g. [_Arabidopsis thaliana_ NC_000932.1](https://www.ncbi.nlm.nih.gov/nuccore/NC_000932.1/)).
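
Since the two isomers differ only in the orientation of one region, their total lengths would be expected to match. A quick sanity check, sketched under that assumption, is to count the bases in each `path_sequence.fasta` file; `fasta_len` is a hypothetical helper, not a GetOrganelle utility.

```shell
# Hypothetical helper: total number of sequence characters in a FASTA file,
# ignoring header lines that start with '>'.
fasta_len() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

for f in Arabidopsis_simulated.plastome/embplant_pt.K115.complete.graph1.*.path_sequence.fasta; do
    # If the glob matched nothing, the literal pattern is left in $f; skip it.
    if [ -e "$f" ]; then
        echo "$(fasta_len "$f")  $f"
    fi
done
```

If the two printed lengths differ, the files are probably not the pair of isomers described above and are worth inspecting by hand.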

The final assembly graph is rendered as a PNG image if `Bandage` is on the `PATH`; otherwise, a command like the following can be used to draw it.

In [15]:
%%bash
# Bandage image Arabidopsis_simulated.plastome/embplant_pt.K115.complete.graph1.selected_graph.gfa Arabidopsis_simulated.plastome/embplant_pt.K115.complete.graph1.selected_graph.png --names --depth --lengths --outline 0 --colour depth --depcollow "#D8D8D8" --depcolhi "#C60E29" --depvallow 10.0427 --depvalhi 20.583693 --fontsize 6 --iter 4
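
Whether the image can be drawn depends on `Bandage` being discoverable, so it can help to check the `PATH` first. The sketch below uses the standard `command -v` lookup; `on_path` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when an executable named $1 is found on PATH.
on_path() {
    command -v "$1" >/dev/null 2>&1
}

if on_path Bandage; then
    echo "Bandage found at: $(command -v Bandage)"
else
    echo "Bandage not found on PATH"
fi
```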

![](resources/pics/1.embplant_pt.K115.complete.graph1.selected_graph.png)