# Assembly with a reference genome

### Reference genome: _Quercus robur_

The _Quercus robur_ genome was downloaded from the Oak Genome Sequencing website. The site belongs to an international project to sequence the genome of the European oak (_Q. robur_), which is associated with institutions such as INRAE (Institut National de la Recherche pour l'Agriculture, l'Alimentation et l'Environnement) in France.

This genome was published by [Plomion et al. (2018)](https://www.nature.com/articles/s41477-018-0172-3).
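Before mapping reads, it can help to check the downloaded reference by counting its sequences and total length. The following is a minimal stdlib sketch. It assumes the file is an uncompressed FASTA stored at the `reference_sequence` path used later in this notebook. The function names are hypothetical helpers, not part of ipyrad.

```python
def summarize_fasta_lines(lines):
    """Count sequences and total bases in an iterable of FASTA lines."""
    n_seqs = 0
    total_len = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # every header line opens a new sequence
        else:
            total_len += len(line)  # sequence lines may be wrapped over several lines
    return n_seqs, total_len


def summarize_fasta(path):
    """Open a plain-text (uncompressed) FASTA file and summarize it."""
    with open(path) as handle:
        return summarize_fasta_lines(handle)
```

For example, `summarize_fasta("../../data/reference_genomes/Qrob_PM1N.fa")` would return the number of scaffolds or chromosomes and the assembly length in bases. The file is read line by line, so the whole genome is never held in memory.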


- Test with Trimmomatic02

- 79 fastq files of *Quercus macdougallii*
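The 79 input files are later selected with the glob `../../data/1.1.filter/*trim02.fastq.gz` (parameter `sorted_fastq_path`). ipyrad names each sample after its file minus the `.fastq.gz` suffix, which is why the stats tables list names such as `CR_01_S115.trim02`. The sketch below checks the file count and shows how those names arise. It assumes the notebook runs from the same working directory, and the helper names are hypothetical.

```python
import glob
import os


def list_trimmed_fastqs(pattern="../../data/1.1.filter/*trim02.fastq.gz"):
    """Return the sorted list of files matching the sorted_fastq_path glob."""
    return sorted(glob.glob(pattern))


def sample_names(paths, suffix=".fastq.gz"):
    """Strip the directory and the .fastq.gz suffix to obtain sample names."""
    names = []
    for path in paths:
        base = os.path.basename(path)
        if base.endswith(suffix):
            base = base[: -len(suffix)]
        names.append(base)
    return names
```

A quick check before step 1 would be `len(list_trimmed_fastqs()) == 79`.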

After step 6, the branches **ref_gen_rob_trim02_1** and **ref_gen_rob_trim02_2** were created from the main assembly. Each branch sets its own parameters before step 7:

- The first branch uses a clustering threshold (`clust_threshold`) of 0.70 and adds a population assignment file.
- The second branch uses a clustering threshold of 0.90 and sets the minimum number of samples per locus (`min_samples_locus`) to 79.
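Setting `min_samples_locus` to 79, the total number of samples, keeps only loci recovered in every sample. The idea can be sketched with toy data. This is a simplified illustration of the filter, not ipyrad's implementation, and the locus and sample names below are hypothetical.

```python
def filter_loci(locus_samples, min_samples_locus):
    """Keep loci recovered in at least `min_samples_locus` samples."""
    return {
        locus: samples
        for locus, samples in locus_samples.items()
        if len(samples) >= min_samples_locus
    }


# Hypothetical toy data: locus id -> set of samples in which the locus was recovered
toy = {
    "locus_1": {"CR_01", "CR_02", "CR_03"},
    "locus_2": {"CR_01", "CR_02"},
    "locus_3": {"CR_01"},
}

kept_lenient = filter_loci(toy, 2)  # a locus needs 2 of the 3 samples
kept_strict = filter_loci(toy, 3)   # every sample must be present, as with 79 of 79
```

The stricter threshold gives a matrix without missing data, but it usually retains far fewer loci.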

```python
# Installation commands, kept commented out (run once in the conda environment)
#conda install ipyrad -c ipyrad
#conda install toytree -c eaton-lab
#conda install entrez-direct -c bioconda
#conda install sra-tools -c bioconda
```

```python
# Import libraries
import ipyrad as ip
import ipyparallel as ipp
```

```python
## Create an Assembly object named ref_gen_rob_trim02
ref_gen_rob_trim02 = ip.Assembly("ref_gen_rob_trim02")
```

```
New Assembly: ref_gen_rob_trim02
```


```python
## print the parameters to the screen
ref_gen_rob_trim02.get_params()
```

```
0   assembly_name               ref_gen_rob_trim02
1   project_dir                 /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/bin
2   raw_fastq_path
3   barcodes_path
4   sorted_fastq_path
5   assembly_method             denovo
6   reference_sequence
7   datatype                    rad
8   restriction_overhang        ('TGCAG', '')
9   max_low_qual_bases          5
10  phred_Qscore_offset         33
11  mindepth_statistical        6
12  mindepth_majrule            6
```

```python
## set/modify parameters for this Assembly object
ref_gen_rob_trim02.set_params('project_dir', '../../data/1.3.assembly_variant_calling/ref_gen_qrob_trim02')
ref_gen_rob_trim02.set_params('sorted_fastq_path', "../../data/1.1.filter/*trim02.fastq.gz")
ref_gen_rob_trim02.set_params('assembly_method', "reference")
ref_gen_rob_trim02.set_params('reference_sequence', '../../data/reference_genomes/Qrob_PM1N.fa')
ref_gen_rob_trim02.set_params('datatype', 'ddrad')
ref_gen_rob_trim02.set_params('output_formats', ['p', 's', 'k', 'g', 'v'])

## print the parameters to the screen
ref_gen_rob_trim02.get_params()
```

```
0   assembly_name               ref_gen_rob_trim02
1   project_dir                 /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/1.3.assemble_variant_calling/1.3.2.ipyrad/ref_gen_qrob_trim02
2   raw_fastq_path
3   barcodes_path
4   sorted_fastq_path           /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/1.1.filter/*trim02.fastq.gz
5   assembly_method             reference
6   reference_sequence          /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/reference_genomes/Qrob_PM1N.fa
7   datatype                    ddrad
8   restriction_overhang        ('TGCAG', '')
9   max_low_qual_bases          5
10
```

```python
## run step 1 to load the reads and create Sample objects
ref_gen_rob_trim02.run("1", auto=True, force=True)
```

```
Parallel connection | n311: 12 cores
[####################] 100% 0:00:26 | loading reads        | s1 |
Parallel connection closed.
```


```python
ref_gen_rob_trim02.stats
```

| sample | state | reads_raw |
|---|---|---|
| CR_01_S115.trim02 | 1 | 315211 |
| CR_02_S127.trim02 | 1 | 325350 |
| CR_03_S139.trim02 | 1 | 278113 |
| CR_04_S151.trim02 | 1 | 312728 |
| CR_05_S163.trim02 | 1 | 308404 |
| CR_06_S175.trim02 | 1 | 278013 |
| CR_07_S186.trim02 | 1 | 293720 |
| CR_08_S104.trim02 | 1 | 276057 |
| CR_09_S116.trim02 | 1 | 289102 |
| CR_10_S128.trim02 | 1 | 274193 |
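A quick summary of sequencing depth helps spot under-sequenced samples before filtering. The sketch below copies `reads_raw` for the ten samples printed by `ref_gen_rob_trim02.stats`. It is only an illustration on those rows, not on all 79 samples.

```python
from statistics import mean

# reads_raw per sample, copied from the step-1 stats output (10 samples shown there)
reads_raw = {
    "CR_01_S115.trim02": 315211,
    "CR_02_S127.trim02": 325350,
    "CR_03_S139.trim02": 278113,
    "CR_04_S151.trim02": 312728,
    "CR_05_S163.trim02": 308404,
    "CR_06_S175.trim02": 278013,
    "CR_07_S186.trim02": 293720,
    "CR_08_S104.trim02": 276057,
    "CR_09_S116.trim02": 289102,
    "CR_10_S128.trim02": 274193,
}

total = sum(reads_raw.values())
average = mean(reads_raw.values())
smallest = min(reads_raw, key=reads_raw.get)  # sample with the fewest raw reads
largest = max(reads_raw, key=reads_raw.get)   # sample with the most raw reads
print(f"total={total} mean={average:.1f} min={smallest} max={largest}")
```

Because `.stats` is a pandas DataFrame, `ref_gen_rob_trim02.stats.reads_raw.describe()` gives the same kind of summary for every sample.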


```python
## run step 2 to filter and edit the reads
ref_gen_rob_trim02.run("2", auto=True, force=True)
```

```
Parallel connection | n311: 12 cores
[####################] 100% 0:01:50 | processing reads     | s2 |
Parallel connection closed.
```


```python
ref_gen_rob_trim02.stats
```

| sample | state | reads_raw | reads_passed_filter |
|---|---|---|---|
| CR_01_S115.trim02 | 2 | 315211 | 315194 |
| CR_02_S127.trim02 | 2 | 325350 | 325316 |
| CR_03_S139.trim02 | 2 | 278113 | 278091 |
| CR_04_S151.trim02 | 2 | 312728 | 312672 |
| CR_05_S163.trim02 | 2 | 308404 | 308357 |
| CR_06_S175.trim02 | 2 | 278013 | 277987 |
| CR_07_S186.trim02 | 2 | 293720 | 293700 |
| CR_08_S104.trim02 | 2 | 276057 | 276038 |
| CR_09_S116.trim02 | 2 | 289102 | 289086 |
| CR_10_S128.trim02 | 2 | 274193 | 274176 |
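The fraction of reads that survive step 2 shows how much the ipyrad filters removed from these already-trimmed files. The sketch below uses the ten rows printed above.

```python
# (reads_raw, reads_passed_filter) per sample, copied from the step-2 stats output
step2 = {
    "CR_01_S115.trim02": (315211, 315194),
    "CR_02_S127.trim02": (325350, 325316),
    "CR_03_S139.trim02": (278113, 278091),
    "CR_04_S151.trim02": (312728, 312672),
    "CR_05_S163.trim02": (308404, 308357),
    "CR_06_S175.trim02": (278013, 277987),
    "CR_07_S186.trim02": (293720, 293700),
    "CR_08_S104.trim02": (276057, 276038),
    "CR_09_S116.trim02": (289102, 289086),
    "CR_10_S128.trim02": (274193, 274176),
}

pass_rate = {s: passed / raw for s, (raw, passed) in step2.items()}
removed = {s: raw - passed for s, (raw, passed) in step2.items()}

for sample in sorted(pass_rate):
    print(f"{sample}: {pass_rate[sample]:.5f} ({removed[sample]} reads removed)")
```

At most a few dozen reads per sample are discarded, which is consistent with the reads having been trimmed with Trimmomatic beforehand.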


```python
## run step 3: map the reads to the reference and build clusters within each sample
ref_gen_rob_trim02.run("3", auto=True, force=True)
```

```
Parallel connection | n311: 12 cores
[####################] 100% 0:00:00 | indexing reference   | s3 |
[####################] 100% 0:02:24 | dereplicating        | s3 |
[####################] 100% 0:27:38 | mapping reads        | s3 |
[####################] 100% 0:06:03 | building clusters    | s3 |
[####################] 100% 0:00:08 | calc cluster stats   | s3 |
Parallel connection closed.
```


```python
ref_gen_rob_trim02.stats
```

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth |
|---|---|---|---|---|---|---|---|
| CR_01_S115.trim02 | 3 | 315211 | 315194 | 266591 | 48603 | 25024 | 9993 |
| CR_02_S127.trim02 | 3 | 325350 | 325316 | 260306 | 65010 | 25137 | 9861 |
| CR_03_S139.trim02 | 3 | 278113 | 278091 | 223657 | 54434 | 24360 | 8932 |
| CR_04_S151.trim02 | 3 | 312728 | 312672 | 253692 | 58980 | 25943 | 9870 |
| CR_05_S163.trim02 | 3 | 308404 | 308357 | 244388 | 63969 | 24894 | 9443 |
| CR_06_S175.trim02 | 3 | 278013 | 277987 | 238370 | 39617 | 24287 | 9195 |
| CR_07_S186.trim02 | 3 | 293720 | 293700 | 245877 | 47823 | 25097 | 9468 |
| CR_08_S104.trim02 | 3 | 276057 | 276038 | 216518 | 59520 | 24627 | 8646 |
| CR_09_S116.trim02 | 3 | 289102 | 289086 | 237655 | 51431 | 24989 | 9513 |
| CR_10_S128.trim02 | 3 | 274193 | 274176 | 216553 | 57623 | 23929 | 8813 |
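The mapping rate against the _Q. robur_ reference is a useful cross-species check. In the ten rows printed here, mapped plus unmapped reads add up exactly to `reads_passed_filter`, so the rate can be taken over the filtered reads. A sketch on those rows:

```python
# (reads_passed_filter, refseq_mapped_reads, refseq_unmapped_reads) per sample,
# copied from the step-3 stats output (10 samples shown there)
step3 = {
    "CR_01_S115.trim02": (315194, 266591, 48603),
    "CR_02_S127.trim02": (325316, 260306, 65010),
    "CR_03_S139.trim02": (278091, 223657, 54434),
    "CR_04_S151.trim02": (312672, 253692, 58980),
    "CR_05_S163.trim02": (308357, 244388, 63969),
    "CR_06_S175.trim02": (277987, 238370, 39617),
    "CR_07_S186.trim02": (293700, 245877, 47823),
    "CR_08_S104.trim02": (276038, 216518, 59520),
    "CR_09_S116.trim02": (289086, 237655, 51431),
    "CR_10_S128.trim02": (274176, 216553, 57623),
}

# Sanity check: mapped + unmapped reads account for every read that passed step 2
for sample, (passed, mapped, unmapped) in step3.items():
    assert mapped + unmapped == passed, sample

# Fraction of filtered reads that mapped to the Q. robur reference
mapping_rate = {s: mapped / passed for s, (passed, mapped, _) in step3.items()}

for sample, rate in sorted(mapping_rate.items(), key=lambda kv: kv[1]):
    print(f"{sample}: {rate:.1%}")
```

With the full DataFrame, `stats.refseq_mapped_reads / stats.reads_passed_filter` computes the same ratio for all samples.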


```python
## run step 4: jointly estimate heterozygosity (H) and sequencing error rate (E)
ref_gen_rob_trim02.run("4", auto=True, force=True)
```

```
Parallel connection | n311: 12 cores
[####################] 100% 0:03:40 | inferring [H, E]     | s4 |
Parallel connection closed.
```


```python
ref_gen_rob_trim02.stats
```

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est |
|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim02 | 4 | 315211 | 315194 | 266591 | 48603 | 25024 | 9993 | 0.01638 | 0.002155 |
| CR_02_S127.trim02 | 4 | 325350 | 325316 | 260306 | 65010 | 25137 | 9861 | 0.016669 | 0.002143 |
| CR_03_S139.trim02 | 4 | 278113 | 278091 | 223657 | 54434 | 24360 | 8932 | 0.01585 | 0.002189 |
| CR_04_S151.trim02 | 4 | 312728 | 312672 | 253692 | 58980 | 25943 | 9870 | 0.01607 | 0.002117 |
| CR_05_S163.trim02 | 4 | 308404 | 308357 | 244388 | 63969 | 24894 | 9443 | 0.017033 | 0.002142 |
| CR_06_S175.trim02 | 4 | 278013 | 277987 | 238370 | 39617 | 24287 | 9195 | 0.015838 | 0.00213 |
| CR_07_S186.trim02 | 4 | 293720 | 293700 | 245877 | 47823 | 25097 | 9468 | 0.0163 | 0.002147 |
| CR_08_S104.trim02 | 4 | 276057 | 276038 | 216518 | 59520 | 24627 | 8646 | 0.016483 | 0.002157 |
| CR_09_S116.trim02 | 4 | 289102 | 289086 | 237655 | 51431 | 24989 | 9513 | 0.016742 | 0.002131 |
| CR_10_S128.trim02 | 4 | 274193 | 274176 | 216553 | 57623 | 23929 | 8813 | 0.016643 | 0.002317 |
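Comparing `hetero_est` with `error_est` helps judge whether real heterozygous sites can be told apart from sequencing errors during consensus calling. The sketch below averages both estimates over the ten printed samples. For these rows, mean heterozygosity is about 7.6 times the mean error rate.

```python
from statistics import mean

# (hetero_est, error_est) per sample, copied from the step-4 stats output
step4 = {
    "CR_01_S115.trim02": (0.01638, 0.002155),
    "CR_02_S127.trim02": (0.016669, 0.002143),
    "CR_03_S139.trim02": (0.01585, 0.002189),
    "CR_04_S151.trim02": (0.01607, 0.002117),
    "CR_05_S163.trim02": (0.017033, 0.002142),
    "CR_06_S175.trim02": (0.015838, 0.00213),
    "CR_07_S186.trim02": (0.0163, 0.002147),
    "CR_08_S104.trim02": (0.016483, 0.002157),
    "CR_09_S116.trim02": (0.016742, 0.002131),
    "CR_10_S128.trim02": (0.016643, 0.002317),
}

mean_het = mean(h for h, _ in step4.values())
mean_err = mean(e for _, e in step4.values())
ratio = mean_het / mean_err
print(f"mean H={mean_het:.6f} mean E={mean_err:.6f} H/E={ratio:.2f}")
```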


```python
## run step 5: call consensus sequences for each sample
ref_gen_rob_trim02.run("5", auto=True, force=True)
```

```
Parallel connection | n311: 12 cores
[####################] 100% 0:00:08 | calculating depths   | s5 |
[####################] 100% 0:00:38 | chunking clusters    | s5 |
[####################] 100% 0:14:45 | consens calling      | s5 |
[####################] 100% 0:00:34 | indexing alleles     | s5 |
Parallel connection closed.
```


```python
ref_gen_rob_trim02.stats
```

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim02 | 5 | 315211 | 315194 | 266591 | 48603 | 25024 | 9993 | 0.01638 | 0.002155 | 8498 |
| CR_02_S127.trim02 | 5 | 325350 | 325316 | 260306 | 65010 | 25137 | 9861 | 0.016669 | 0.002143 | 8342 |
| CR_03_S139.trim02 | 5 | 278113 | 278091 | 223657 | 54434 | 24360 | 8932 | 0.01585 | 0.002189 | 7673 |
| CR_04_S151.trim02 | 5 | 312728 | 312672 | 253692 | 58980 | 25943 | 9870 | 0.01607 | 0.002117 | 8428 |
| CR_05_S163.trim02 | 5 | 308404 | 308357 | 244388 | 63969 | 24894 | 9443 | 0.017033 | 0.002142 | 7979 |
| CR_06_S175.trim02 | 5 | 278013 | 277987 | 238370 | 39617 | 24287 | 9195 | 0.015838 | 0.00213 | 7921 |
| CR_07_S186.trim02 | 5 | 293720 | 293700 | 245877 | 47823 | 25097 | 9468 | 0.0163 | 0.002147 | 8086 |
| CR_08_S104.trim02 | 5 | 276057 | 276038 | 216518 | 59520 | 24627 | 8646 | 0.016483 | 0.002157 | 7369 |
| CR_09_S116.trim02 | 5 | 289102 | 289086 | 237655 | 51431 | 24989 | 9513 | 0.016742 | 0.002131 | 8122 |
| CR_10_S128.trim02 | 5 | 274193 | 274176 | 216553 | 57623 | 23929 | 8813 | 0.016643 | 0.002317 | 7498 |
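One rough measure of step 5 is the share of high-depth clusters that end up as consensus sequences: `reads_consens` divided by `clusters_hidepth`. Clusters can be dropped by the consensus filters (for example, too many Ns or heterozygous sites), so the ratio is below 1. The sketch below computes it for the ten printed samples.

```python
# (clusters_hidepth, reads_consens) per sample, copied from the step-5 stats output
step5 = {
    "CR_01_S115.trim02": (9993, 8498),
    "CR_02_S127.trim02": (9861, 8342),
    "CR_03_S139.trim02": (8932, 7673),
    "CR_04_S151.trim02": (9870, 8428),
    "CR_05_S163.trim02": (9443, 7979),
    "CR_06_S175.trim02": (9195, 7921),
    "CR_07_S186.trim02": (9468, 8086),
    "CR_08_S104.trim02": (8646, 7369),
    "CR_09_S116.trim02": (9513, 8122),
    "CR_10_S128.trim02": (8813, 7498),
}

# Share of high-depth clusters retained as consensus sequences
consensus_yield = {s: cons / hidepth for s, (hidepth, cons) in step5.items()}

for sample, value in sorted(consensus_yield.items()):
    print(f"{sample}: {value:.3f}")
```

All ten samples fall within a narrow range, which suggests no single sample was disproportionately affected by the consensus filters.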


```python
## run step 6: build loci across samples from reads mapped to the same reference regions
ref_gen_rob_trim02.run("6", auto=True, force=True)
```

```
Parallel connection | n311: 12 cores
[####################] 100% 0:00:08 | concatenating bams   | s6 |
[####################] 100% 0:00:01 | fetching regions     | s6 |
[####################] 100% 0:00:40 | building database    | s6 |
Parallel connection closed.
```


```python
ref_gen_rob_trim02.stats
```

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim02 | 6 | 315211 | 315194 | 266591 | 48603 | 25024 | 9993 | 0.01638 | 0.002155 | 8498 |
| CR_02_S127.trim02 | 6 | 325350 | 325316 | 260306 | 65010 | 25137 | 9861 | 0.016669 | 0.002143 | 8342 |
| CR_03_S139.trim02 | 6 | 278113 | 278091 | 223657 | 54434 | 24360 | 8932 | 0.01585 | 0.002189 | 7673 |
| CR_04_S151.trim02 | 6 | 312728 | 312672 | 253692 | 58980 | 25943 | 9870 | 0.01607 | 0.002117 | 8428 |
| CR_05_S163.trim02 | 6 | 308404 | 308357 | 244388 | 63969 | 24894 | 9443 | 0.017033 | 0.002142 | 7979 |
| CR_06_S175.trim02 | 6 | 278013 | 277987 | 238370 | 39617 | 24287 | 9195 | 0.015838 | 0.00213 | 7921 |
| CR_07_S186.trim02 | 6 | 293720 | 293700 | 245877 | 47823 | 25097 | 9468 | 0.0163 | 0.002147 | 8086 |
| CR_08_S104.trim02 | 6 | 276057 | 276038 | 216518 | 59520 | 24627 | 8646 | 0.016483 | 0.002157 | 7369 |
| CR_09_S116.trim02 | 6 | 289102 | 289086 | 237655 | 51431 | 24989 | 9513 | 0.016742 | 0.002131 | 8122 |
| CR_10_S128.trim02 | 6 | 274193 | 274176 | 216553 | 57623 | 23929 | 8813 | 0.016643 | 0.002317 | 7498 |


```python
## Create a branch of the existing assembly 'ref_gen_rob_trim02'
ref_gen_rob_trim02_1 = ref_gen_rob_trim02.branch("ref_gen_rob_trim02_1")
### Set the clustering threshold to 0.70
### (clust_threshold is used in steps 3 and 6, so re-running only step 7 does not apply it)
ref_gen_rob_trim02_1.set_params("clust_threshold", "0.70")
### Specify the population assignment file for the analysis
ref_gen_rob_trim02_1.set_params('pop_assign_file', 'popmap_8sites_2zones_trim02.txt')
### Run step 7 (filter loci and write output files); auto starts and stops the
### parallel cluster, force overwrites existing step-7 results
ref_gen_rob_trim02_1.run("7", auto=True, force=True)

## Create a second branch of 'ref_gen_rob_trim02': clustering threshold 0.90 and
## loci required in at least 79 samples, then run step 7 with auto and force enabled
ref_gen_rob_trim02_2 = ref_gen_rob_trim02.branch("ref_gen_rob_trim02_2")
ref_gen_rob_trim02_2.set_params("min_samples_locus", 79)
ref_gen_rob_trim02_2.set_params("clust_threshold", "0.90")

ref_gen_rob_trim02_2.run("7", auto=True, force=True)
```

```
Parallel connection | n311: 12 cores
[####################] 100% 0:00:15 | applying filters     | s7 |
[####################] 100% 0:00:02 | building arrays      | s7 |
[####################] 100% 0:00:00 | writing conversions  | s7 |
[####################] 100% 0:00:03 | indexing vcf depths  | s7 |
[####################] 100% 0:00:01 | writing vcf output   | s7 |
Parallel connection closed.
Parallel connection | n311: 12 cores
[####################] 100% 0:00:15 | applying filters     | s7 |
[####################] 100% 0:00:02 | building arrays      | s7 |
[####################] 100% 0:00:00 | writing conversions  | s7 |
[####################] 100% 0:00:03 | indexing vcf depths  | s7 |
[####################] 100% 0:00:01 | writing vcf output   | s7 |
Parallel connection closed.
```
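The contents of `popmap_8sites_2zones_trim02.txt` are not shown in this notebook. Assume it follows the format described in the ipyrad documentation: one `sample population` pair per line, optionally followed by a line starting with `#` that gives the minimum number of samples per population (e.g. `# pop1:3 pop2:3`). Under that assumption, a small reader like the sketch below can check the file before step 7. The function and the population names in the example are hypothetical.

```python
def parse_popmap(lines):
    """Parse 'sample population' lines plus an optional '# pop:min ...' line."""
    populations = {}
    min_per_pop = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # assumed optional line with minimum samples per population, e.g. "# zoneA:3 zoneB:3"
            for item in line.lstrip("#").split():
                pop, minimum = item.split(":")
                min_per_pop[pop] = int(minimum)
            continue
        sample, pop = line.split()  # exactly two whitespace-separated fields
        populations.setdefault(pop, []).append(sample)
    return populations, min_per_pop
```

For example, `parse_popmap(open("popmap_8sites_2zones_trim02.txt"))` would give the samples per population. Comparing those names with `ref_gen_rob_trim02.stats.index` would flag samples missing from either side.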


```python
ref_gen_rob_trim02_1.stats
```

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim02 | 6 | 315211 | 315194 | 266591 | 48603 | 25024 | 9993 | 0.01638 | 0.002155 | 8498 |
| CR_02_S127.trim02 | 6 | 325350 | 325316 | 260306 | 65010 | 25137 | 9861 | 0.016669 | 0.002143 | 8342 |
| CR_03_S139.trim02 | 6 | 278113 | 278091 | 223657 | 54434 | 24360 | 8932 | 0.01585 | 0.002189 | 7673 |
| CR_04_S151.trim02 | 6 | 312728 | 312672 | 253692 | 58980 | 25943 | 9870 | 0.01607 | 0.002117 | 8428 |
| CR_05_S163.trim02 | 6 | 308404 | 308357 | 244388 | 63969 | 24894 | 9443 | 0.017033 | 0.002142 | 7979 |
| CR_06_S175.trim02 | 6 | 278013 | 277987 | 238370 | 39617 | 24287 | 9195 | 0.015838 | 0.00213 | 7921 |
| CR_07_S186.trim02 | 6 | 293720 | 293700 | 245877 | 47823 | 25097 | 9468 | 0.0163 | 0.002147 | 8086 |
| CR_08_S104.trim02 | 6 | 276057 | 276038 | 216518 | 59520 | 24627 | 8646 | 0.016483 | 0.002157 | 7369 |
| CR_09_S116.trim02 | 6 | 289102 | 289086 | 237655 | 51431 | 24989 | 9513 | 0.016742 | 0.002131 | 8122 |
| CR_10_S128.trim02 | 6 | 274193 | 274176 | 216553 | 57623 | 23929 | 8813 | 0.016643 | 0.002317 | 7498 |


```python
ref_gen_rob_trim02_2.stats
```

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim02 | 6 | 315211 | 315194 | 266591 | 48603 | 25024 | 9993 | 0.01638 | 0.002155 | 8498 |
| CR_02_S127.trim02 | 6 | 325350 | 325316 | 260306 | 65010 | 25137 | 9861 | 0.016669 | 0.002143 | 8342 |
| CR_03_S139.trim02 | 6 | 278113 | 278091 | 223657 | 54434 | 24360 | 8932 | 0.01585 | 0.002189 | 7673 |
| CR_04_S151.trim02 | 6 | 312728 | 312672 | 253692 | 58980 | 25943 | 9870 | 0.01607 | 0.002117 | 8428 |
| CR_05_S163.trim02 | 6 | 308404 | 308357 | 244388 | 63969 | 24894 | 9443 | 0.017033 | 0.002142 | 7979 |
| CR_06_S175.trim02 | 6 | 278013 | 277987 | 238370 | 39617 | 24287 | 9195 | 0.015838 | 0.00213 | 7921 |
| CR_07_S186.trim02 | 6 | 293720 | 293700 | 245877 | 47823 | 25097 | 9468 | 0.0163 | 0.002147 | 8086 |
| CR_08_S104.trim02 | 6 | 276057 | 276038 | 216518 | 59520 | 24627 | 8646 | 0.016483 | 0.002157 | 7369 |
| CR_09_S116.trim02 | 6 | 289102 | 289086 | 237655 | 51431 | 24989 | 9513 | 0.016742 | 0.002131 | 8122 |
| CR_10_S128.trim02 | 6 | 274193 | 274176 | 216553 | 57623 | 23929 | 8813 | 0.016643 | 0.002317 | 7498 |
