# Assembly with a reference genome

### Reference genome: _Quercus robur_

The _Quercus robur_ genome was downloaded from the Oak Genome Sequencing website, which belongs to an international project to sequence the genome of the European oak (_Q. robur_). The project is associated with institutions such as INRAE (Institut National de la Recherche pour l'Agriculture, l'Alimentation et l'Environnement) in France.

This genome was published by [Plomion et al. (2018)](https://www.nature.com/articles/s41477-018-0172-3).


- Test with Trimmomatic03

- 79 FASTQ files of *Quercus macdougallii*
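
Before running step 1 it can help to confirm that the glob passed to `set_params('sorted_fastq_path', ...)` really matches the 79 Trimmomatic03 files. The sketch below is a minimal check under that assumption; the helper names `list_fastq` and `sample_names` are hypothetical, and stripping `.fastq.gz` is only an assumption that happens to reproduce sample names such as `CR_01_S115.trim03` seen in the stats tables.

```python
import glob
import os

# Same glob pattern passed to set_params('sorted_fastq_path', ...)
PATTERN = "../../data/1.1.filter/*trim03.fastq.gz"
EXPECTED_FILES = 79  # number of Q. macdougallii FASTQ files in this test


def list_fastq(pattern):
    """Return the sorted list of files matching a glob pattern."""
    return sorted(glob.glob(pattern))


def sample_names(paths):
    """Hypothetical helper: drop the directory and the '.fastq.gz' suffix."""
    names = []
    for path in paths:
        base = os.path.basename(path)
        if base.endswith(".fastq.gz"):
            base = base[: -len(".fastq.gz")]
        names.append(base)
    return names


if __name__ == "__main__":
    files = list_fastq(PATTERN)
    print(f"{len(files)} files found (expected {EXPECTED_FILES})")
    print(sample_names(files)[:5])
```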

After step 6, the branches **ref_gen_rob_trim03_1** and **ref_gen_rob_trim03_2** were created. In them, the clustering threshold was set (0.70 and 0.90, respectively), a population assignment file was added (branch `_1`), and the minimum number of samples per locus was set to 79 (branch `_2`).
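
As we understand the ipyrad documentation, the population assignment file is plain text with one sample per line (sample name, then population name), optionally followed by a line starting with `#` that sets the minimum number of samples per population (e.g. `# pop1:3 pop2:3`). The sketch below parses such a file to check assignments and count samples per population. The contents of `popmap_8sites_2zones_trim03.txt` are not shown here, so the population names `zoneA`/`zoneB` are hypothetical.

```python
from collections import Counter


def parse_popmap(text):
    """Parse an ipyrad-style population map.

    Returns (assignments, minmap): a dict sample -> population, and a dict
    population -> minimum samples taken from an optional '#' line.
    """
    assignments = {}
    minmap = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Optional line such as "# pop1:3 pop2:3"
            for item in line.lstrip("#").split():
                pop, _, value = item.partition(":")
                minmap[pop] = int(value)
            continue
        sample, pop = line.split()[:2]
        assignments[sample] = pop
    return assignments, minmap


def samples_per_pop(assignments):
    """Count how many samples are assigned to each population."""
    return Counter(assignments.values())


# Hypothetical example: population names invented for illustration
example = """
CR_01_S115.trim03 zoneA
CR_02_S127.trim03 zoneA
CR_03_S139.trim03 zoneB
# zoneA:2 zoneB:1
"""

assign, mins = parse_popmap(example)
print(samples_per_pop(assign))
```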

In [1]:
#conda install ipyrad -c ipyrad
#conda install toytree -c eaton-lab
#conda install entrez-direct -c bioconda
#conda install sra-tools -c bioconda

In [2]:
# Import libraries
import ipyrad as ip
import ipyparallel as ipp

In [3]:
## Create an Assembly object named ref_gen_rob_trim03
ref_gen_rob_trim03 = ip.Assembly("ref_gen_rob_trim03")


New Assembly: ref_gen_rob_trim03


In [4]:
## prints the parameters to the screen
ref_gen_rob_trim03.get_params()

0   assembly_name               ref_gen_rob_trim03                           
1   project_dir                 /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/bin
2   raw_fastq_path                                                           
3   barcodes_path                                                            
4   sorted_fastq_path                                                        
5   assembly_method             denovo                                       
6   reference_sequence                                                       
7   datatype                    rad                                          
8   restriction_overhang        ('TGCAG', '')                                
9   max_low_qual_bases          5                                            
10  phred_Qscore_offset         33                                           
11  mindepth_statistical        6                                            
12  mindepth_majrule            6

In [5]:
## setting/modifying parameters for this Assembly object
ref_gen_rob_trim03.set_params('project_dir', '../../data/1.3.assembly_variant_calling/ref_gen_qrob_trim03')
ref_gen_rob_trim03.set_params('sorted_fastq_path', "../../data/1.1.filter/*trim03.fastq.gz")
ref_gen_rob_trim03.set_params('assembly_method', "reference")
ref_gen_rob_trim03.set_params('reference_sequence', '../../data/reference_genomes/Qrob_PM1N.fa')
ref_gen_rob_trim03.set_params('datatype', 'ddrad')
ref_gen_rob_trim03.set_params('output_formats', ['p', 's', 'k', 'g', 'v'])

## prints the parameters to the screen
ref_gen_rob_trim03.get_params()

0   assembly_name               ref_gen_rob_trim03                           
1   project_dir                 /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/1.3.assemble_variant_calling/1.3.2.ipyrad/ref_gen_qrob_trim03
2   raw_fastq_path                                                           
3   barcodes_path                                                            
4   sorted_fastq_path           /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/1.1.filter/*trim03.fastq.gz
5   assembly_method             reference                                    
6   reference_sequence          /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/reference_genomes/Qrob_PM1N.fa
7   datatype                    ddrad                                        
8   restriction_overhang        ('TGCAG', '')                                
9   max_low_qual_bases          5                                            
10 

In [6]:
## run step 1 to create Samples objects
ref_gen_rob_trim03.run("1", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:54 | loading reads        | s1 |
Parallel connection closed.


In [7]:
ref_gen_rob_trim03.stats

| sample | state | reads_raw |
|---|---|---|
| CR_01_S115.trim03 | 1 | 637962 |
| CR_02_S127.trim03 | 1 | 646858 |
| CR_03_S139.trim03 | 1 | 574664 |
| CR_04_S151.trim03 | 1 | 641301 |
| CR_05_S163.trim03 | 1 | 619059 |
| CR_06_S175.trim03 | 1 | 564596 |
| CR_07_S186.trim03 | 1 | 601967 |
| CR_08_S104.trim03 | 1 | 604989 |
| CR_09_S116.trim03 | 1 | 612353 |
| CR_10_S128.trim03 | 1 | 581934 |


In [8]:
## run step 2 to filter and trim reads
ref_gen_rob_trim03.run("2", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:04:08 | processing reads     | s2 |
Parallel connection closed.


In [9]:
ref_gen_rob_trim03.stats

| sample | state | reads_raw | reads_passed_filter |
|---|---|---|---|
| CR_01_S115.trim03 | 2 | 637962 | 637946 |
| CR_02_S127.trim03 | 2 | 646858 | 646822 |
| CR_03_S139.trim03 | 2 | 574664 | 574644 |
| CR_04_S151.trim03 | 2 | 641301 | 641250 |
| CR_05_S163.trim03 | 2 | 619059 | 619011 |
| CR_06_S175.trim03 | 2 | 564596 | 564564 |
| CR_07_S186.trim03 | 2 | 601967 | 601950 |
| CR_08_S104.trim03 | 2 | 604989 | 604973 |
| CR_09_S116.trim03 | 2 | 612353 | 612337 |
| CR_10_S128.trim03 | 2 | 581934 | 581914 |
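
Step 2 discards very few reads here (between 16 and 51 per sample among the ten shown), which is plausible if the reads had already been trimmed with Trimmomatic before entering ipyrad; that interpretation is ours, not something the notebook tests. The sketch below quantifies the per-sample loss from the values copied out of the table above.

```python
# reads_raw and reads_passed_filter copied from the step-2 stats table
# (only the 10 samples displayed; the full run has 79 samples).
step2 = {
    "CR_01_S115.trim03": (637962, 637946),
    "CR_02_S127.trim03": (646858, 646822),
    "CR_03_S139.trim03": (574664, 574644),
    "CR_04_S151.trim03": (641301, 641250),
    "CR_05_S163.trim03": (619059, 619011),
    "CR_06_S175.trim03": (564596, 564564),
    "CR_07_S186.trim03": (601967, 601950),
    "CR_08_S104.trim03": (604989, 604973),
    "CR_09_S116.trim03": (612353, 612337),
    "CR_10_S128.trim03": (581934, 581914),
}

for sample, (raw, passed) in step2.items():
    removed = raw - passed
    print(f"{sample}\t{removed:>3} reads removed\t{passed / raw:.4%} kept")
```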


In [10]:
## run step 3: map reads to the reference genome and build clusters within each sample
ref_gen_rob_trim03.run("3", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:03 | indexing reference   | s3 |
[####################] 100% 0:03:00 | dereplicating        | s3 |
[####################] 100% 0:32:11 | mapping reads        | s3 |
[####################] 100% 0:05:30 | building clusters    | s3 |
[####################] 100% 0:00:08 | calc cluster stats   | s3 |
Parallel connection closed.


In [11]:
ref_gen_rob_trim03.stats

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth |
|---|---|---|---|---|---|---|---|
| CR_01_S115.trim03 | 3 | 637962 | 637946 | 533469 | 104477 | 30680 | 14948 |
| CR_02_S127.trim03 | 3 | 646858 | 646822 | 512825 | 133997 | 31084 | 14590 |
| CR_03_S139.trim03 | 3 | 574664 | 574644 | 455291 | 119353 | 30186 | 14074 |
| CR_04_S151.trim03 | 3 | 641301 | 641250 | 514080 | 127170 | 32094 | 15254 |
| CR_05_S163.trim03 | 3 | 619059 | 619011 | 483313 | 135698 | 30808 | 14360 |
| CR_06_S175.trim03 | 3 | 564596 | 564564 | 478994 | 85570 | 30090 | 14128 |
| CR_07_S186.trim03 | 3 | 601967 | 601950 | 498160 | 103790 | 31452 | 14556 |
| CR_08_S104.trim03 | 3 | 604989 | 604973 | 469597 | 135376 | 31168 | 14456 |
| CR_09_S116.trim03 | 3 | 612353 | 612337 | 497077 | 115260 | 31240 | 14986 |
| CR_10_S128.trim03 | 3 | 581934 | 581914 | 452898 | 129016 | 30001 | 13920 |
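
In the step-3 table, `refseq_mapped_reads` and `refseq_unmapped_reads` add up exactly to `reads_passed_filter` for every sample shown, so the mapping rate of _Q. macdougallii_ reads to the _Q. robur_ reference can be read directly from these columns; for the ten samples listed it ranges from about 77.6% to 84.8%. The sketch below checks that identity and computes the rate from the copied values.

```python
# Copied from the step-3 stats table (10 of 79 samples):
# (reads_passed_filter, refseq_mapped_reads, refseq_unmapped_reads)
step3 = {
    "CR_01_S115.trim03": (637946, 533469, 104477),
    "CR_02_S127.trim03": (646822, 512825, 133997),
    "CR_03_S139.trim03": (574644, 455291, 119353),
    "CR_04_S151.trim03": (641250, 514080, 127170),
    "CR_05_S163.trim03": (619011, 483313, 135698),
    "CR_06_S175.trim03": (564564, 478994, 85570),
    "CR_07_S186.trim03": (601950, 498160, 103790),
    "CR_08_S104.trim03": (604973, 469597, 135376),
    "CR_09_S116.trim03": (612337, 497077, 115260),
    "CR_10_S128.trim03": (581914, 452898, 129016),
}

for sample, (passed, mapped, unmapped) in step3.items():
    # Every filtered read is either mapped or unmapped to the reference
    assert mapped + unmapped == passed, sample
    print(f"{sample}\t{mapped / passed:.1%} mapped to Q. robur")
```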


In [12]:
## run step 4: jointly estimate heterozygosity and sequencing error rate
ref_gen_rob_trim03.run("4", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:03:48 | inferring [H, E]     | s4 |
Parallel connection closed.


In [13]:
ref_gen_rob_trim03.stats

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est |
|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim03 | 4 | 637962 | 637946 | 533469 | 104477 | 30680 | 14948 | 0.014172 | 0.002459 |
| CR_02_S127.trim03 | 4 | 646858 | 646822 | 512825 | 133997 | 31084 | 14590 | 0.014546 | 0.002397 |
| CR_03_S139.trim03 | 4 | 574664 | 574644 | 455291 | 119353 | 30186 | 14074 | 0.014156 | 0.002499 |
| CR_04_S151.trim03 | 4 | 641301 | 641250 | 514080 | 127170 | 32094 | 15254 | 0.014439 | 0.002527 |
| CR_05_S163.trim03 | 4 | 619059 | 619011 | 483313 | 135698 | 30808 | 14360 | 0.014435 | 0.002514 |
| CR_06_S175.trim03 | 4 | 564596 | 564564 | 478994 | 85570 | 30090 | 14128 | 0.013709 | 0.002483 |
| CR_07_S186.trim03 | 4 | 601967 | 601950 | 498160 | 103790 | 31452 | 14556 | 0.014092 | 0.002512 |
| CR_08_S104.trim03 | 4 | 604989 | 604973 | 469597 | 135376 | 31168 | 14456 | 0.014368 | 0.002666 |
| CR_09_S116.trim03 | 4 | 612353 | 612337 | 497077 | 115260 | 31240 | 14986 | 0.014291 | 0.00248 |
| CR_10_S128.trim03 | 4 | 581934 | 581914 | 452898 | 129016 | 30001 | 13920 | 0.014625 | 0.002573 |
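
Step 4 jointly estimates per-sample heterozygosity (`hetero_est`) and sequencing error rate (`error_est`), which ipyrad then uses for consensus base calling in step 5. The sketch below summarizes both estimates for the ten samples shown; it only restates the table and says nothing about the remaining 69 samples.

```python
from statistics import mean

# (hetero_est, error_est) copied from the step-4 stats table
estimates = {
    "CR_01_S115.trim03": (0.014172, 0.002459),
    "CR_02_S127.trim03": (0.014546, 0.002397),
    "CR_03_S139.trim03": (0.014156, 0.002499),
    "CR_04_S151.trim03": (0.014439, 0.002527),
    "CR_05_S163.trim03": (0.014435, 0.002514),
    "CR_06_S175.trim03": (0.013709, 0.002483),
    "CR_07_S186.trim03": (0.014092, 0.002512),
    "CR_08_S104.trim03": (0.014368, 0.002666),
    "CR_09_S116.trim03": (0.014291, 0.00248),
    "CR_10_S128.trim03": (0.014625, 0.002573),
}

het = [h for h, _ in estimates.values()]
err = [e for _, e in estimates.values()]
print(f"heterozygosity: mean={mean(het):.6f} range={min(het)}-{max(het)}")
print(f"error rate:     mean={mean(err):.6f} range={min(err)}-{max(err)}")
print(f"mean H/E ratio: {mean(h / e for h, e in estimates.values()):.1f}")
```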


In [14]:
## run step 5: consensus base calling
ref_gen_rob_trim03.run("5", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:09 | calculating depths   | s5 |
[####################] 100% 0:00:34 | chunking clusters    | s5 |
[####################] 100% 0:22:32 | consens calling      | s5 |
[####################] 100% 0:00:30 | indexing alleles     | s5 |
Parallel connection closed.


In [15]:
ref_gen_rob_trim03.stats

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim03 | 5 | 637962 | 637946 | 533469 | 104477 | 30680 | 14948 | 0.014172 | 0.002459 | 13052 |
| CR_02_S127.trim03 | 5 | 646858 | 646822 | 512825 | 133997 | 31084 | 14590 | 0.014546 | 0.002397 | 12686 |
| CR_03_S139.trim03 | 5 | 574664 | 574644 | 455291 | 119353 | 30186 | 14074 | 0.014156 | 0.002499 | 12298 |
| CR_04_S151.trim03 | 5 | 641301 | 641250 | 514080 | 127170 | 32094 | 15254 | 0.014439 | 0.002527 | 13344 |
| CR_05_S163.trim03 | 5 | 619059 | 619011 | 483313 | 135698 | 30808 | 14360 | 0.014435 | 0.002514 | 12479 |
| CR_06_S175.trim03 | 5 | 564596 | 564564 | 478994 | 85570 | 30090 | 14128 | 0.013709 | 0.002483 | 12439 |
| CR_07_S186.trim03 | 5 | 601967 | 601950 | 498160 | 103790 | 31452 | 14556 | 0.014092 | 0.002512 | 12761 |
| CR_08_S104.trim03 | 5 | 604989 | 604973 | 469597 | 135376 | 31168 | 14456 | 0.014368 | 0.002666 | 12644 |
| CR_09_S116.trim03 | 5 | 612353 | 612337 | 497077 | 115260 | 31240 | 14986 | 0.014291 | 0.00248 | 13079 |
| CR_10_S128.trim03 | 5 | 581934 | 581914 | 452898 | 129016 | 30001 | 13920 | 0.014625 | 0.002573 | 12143 |
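
In every sample shown, `reads_consens` is smaller than `clusters_hidepth`; the difference presumably reflects clusters discarded by the step-5 consensus filters, which ipyrad controls with parameters such as `max_Ns_consens`, `max_Hs_consens` and `max_alleles_consens`. The sketch below computes the retained fraction, which is about 87% to 88% for these ten samples.

```python
# (clusters_hidepth, reads_consens) copied from the step-5 stats table
step5 = {
    "CR_01_S115.trim03": (14948, 13052),
    "CR_02_S127.trim03": (14590, 12686),
    "CR_03_S139.trim03": (14074, 12298),
    "CR_04_S151.trim03": (15254, 13344),
    "CR_05_S163.trim03": (14360, 12479),
    "CR_06_S175.trim03": (14128, 12439),
    "CR_07_S186.trim03": (14556, 12761),
    "CR_08_S104.trim03": (14456, 12644),
    "CR_09_S116.trim03": (14986, 13079),
    "CR_10_S128.trim03": (13920, 12143),
}

for sample, (hidepth, consens) in step5.items():
    # Fraction of high-depth clusters that yielded a consensus sequence
    print(f"{sample}\t{consens / hidepth:.1%} of high-depth clusters retained")
```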


In [16]:
## run step 6: build loci across samples (here from the reference-mapped BAM files)
ref_gen_rob_trim03.run("6", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:11 | concatenating bams   | s6 |
[####################] 100% 0:00:02 | fetching regions     | s6 |
[####################] 100% 0:00:43 | building database    | s6 |
Parallel connection closed.


In [17]:
ref_gen_rob_trim03.stats

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim03 | 6 | 637962 | 637946 | 533469 | 104477 | 30680 | 14948 | 0.014172 | 0.002459 | 13052 |
| CR_02_S127.trim03 | 6 | 646858 | 646822 | 512825 | 133997 | 31084 | 14590 | 0.014546 | 0.002397 | 12686 |
| CR_03_S139.trim03 | 6 | 574664 | 574644 | 455291 | 119353 | 30186 | 14074 | 0.014156 | 0.002499 | 12298 |
| CR_04_S151.trim03 | 6 | 641301 | 641250 | 514080 | 127170 | 32094 | 15254 | 0.014439 | 0.002527 | 13344 |
| CR_05_S163.trim03 | 6 | 619059 | 619011 | 483313 | 135698 | 30808 | 14360 | 0.014435 | 0.002514 | 12479 |
| CR_06_S175.trim03 | 6 | 564596 | 564564 | 478994 | 85570 | 30090 | 14128 | 0.013709 | 0.002483 | 12439 |
| CR_07_S186.trim03 | 6 | 601967 | 601950 | 498160 | 103790 | 31452 | 14556 | 0.014092 | 0.002512 | 12761 |
| CR_08_S104.trim03 | 6 | 604989 | 604973 | 469597 | 135376 | 31168 | 14456 | 0.014368 | 0.002666 | 12644 |
| CR_09_S116.trim03 | 6 | 612353 | 612337 | 497077 | 115260 | 31240 | 14986 | 0.014291 | 0.00248 | 13079 |
| CR_10_S128.trim03 | 6 | 581934 | 581914 | 452898 | 129016 | 30001 | 13920 | 0.014625 | 0.002573 | 12143 |


In [18]:
## Create a branch of the existing assembly 'ref_gen_rob_trim03'
ref_gen_rob_trim03_1 = ref_gen_rob_trim03.branch("ref_gen_rob_trim03_1")
### Set clust_threshold to 0.70 (the ipyrad docs list this parameter as affecting steps 3 and 6, so re-running only step 7 may leave the loci unchanged)
ref_gen_rob_trim03_1.set_params("clust_threshold", "0.70")
### Specify the population assignment file for the analysis
ref_gen_rob_trim03_1.set_params('pop_assign_file', 'popmap_8sites_2zones_trim03.txt')
### Run step 7 of the ipyrad workflow with auto and force options enabled
ref_gen_rob_trim03_1.run("7", auto=True, force=True)

## Create a second branch of 'ref_gen_rob_trim03', set min_samples_locus (79) and clust_threshold (0.90), and run step 7 with auto and force enabled
ref_gen_rob_trim03_2 = ref_gen_rob_trim03.branch("ref_gen_rob_trim03_2")
ref_gen_rob_trim03_2.set_params("min_samples_locus", 79)
ref_gen_rob_trim03_2.set_params("clust_threshold", "0.90")

ref_gen_rob_trim03_2.run("7", auto=True, force=True)



Parallel connection | n311: 12 cores
[####################] 100% 0:00:17 | applying filters     | s7 |
[####################] 100% 0:00:03 | building arrays      | s7 |
[####################] 100% 0:00:01 | writing conversions  | s7 |
[####################] 100% 0:00:05 | indexing vcf depths  | s7 |
[####################] 100% 0:00:03 | writing vcf output   | s7 |
Parallel connection closed.
Parallel connection | n311: 12 cores
[####################] 100% 0:00:17 | applying filters     | s7 |
[####################] 100% 0:00:03 | building arrays      | s7 |
[####################] 100% 0:00:01 | writing conversions  | s7 |
[####################] 100% 0:00:05 | indexing vcf depths  | s7 |
[####################] 100% 0:00:03 | writing vcf output   | s7 |
Parallel connection closed.
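
Because `output_formats` includes `'v'`, step 7 writes a VCF for each branch. A quick way to compare the two branches is to count VCF data records (non-header lines). The sketch below assumes, without confirmation in this notebook, that ipyrad places the file at `<project_dir>/<assembly_name>_outfiles/<assembly_name>.vcf`; `PROJECT_DIR` is the relative path passed to `set_params('project_dir', ...)` above.

```python
import gzip
import os


def count_vcf_lines(lines):
    """Count data records: non-empty lines that are not '#' header lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def count_vcf_records(path):
    """Open a plain or gzip-compressed VCF and count its data records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_vcf_lines(handle)


# ASSUMPTION: step-7 outputs live in <project_dir>/<name>_outfiles/<name>.vcf
PROJECT_DIR = "../../data/1.3.assembly_variant_calling/ref_gen_qrob_trim03"

for name in ("ref_gen_rob_trim03_1", "ref_gen_rob_trim03_2"):
    vcf = os.path.join(PROJECT_DIR, f"{name}_outfiles", f"{name}.vcf")
    if os.path.exists(vcf):
        print(f"{name}: {count_vcf_records(vcf)} VCF records")
    else:
        print(f"{name}: no VCF found at {vcf}")
```

Comparing the two counts gives a first indication of how much the stricter `min_samples_locus` of branch `_2` shrinks the data set.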


In [19]:
ref_gen_rob_trim03_1.stats

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim03 | 6 | 637962 | 637946 | 533469 | 104477 | 30680 | 14948 | 0.014172 | 0.002459 | 13052 |
| CR_02_S127.trim03 | 6 | 646858 | 646822 | 512825 | 133997 | 31084 | 14590 | 0.014546 | 0.002397 | 12686 |
| CR_03_S139.trim03 | 6 | 574664 | 574644 | 455291 | 119353 | 30186 | 14074 | 0.014156 | 0.002499 | 12298 |
| CR_04_S151.trim03 | 6 | 641301 | 641250 | 514080 | 127170 | 32094 | 15254 | 0.014439 | 0.002527 | 13344 |
| CR_05_S163.trim03 | 6 | 619059 | 619011 | 483313 | 135698 | 30808 | 14360 | 0.014435 | 0.002514 | 12479 |
| CR_06_S175.trim03 | 6 | 564596 | 564564 | 478994 | 85570 | 30090 | 14128 | 0.013709 | 0.002483 | 12439 |
| CR_07_S186.trim03 | 6 | 601967 | 601950 | 498160 | 103790 | 31452 | 14556 | 0.014092 | 0.002512 | 12761 |
| CR_08_S104.trim03 | 6 | 604989 | 604973 | 469597 | 135376 | 31168 | 14456 | 0.014368 | 0.002666 | 12644 |
| CR_09_S116.trim03 | 6 | 612353 | 612337 | 497077 | 115260 | 31240 | 14986 | 0.014291 | 0.00248 | 13079 |
| CR_10_S128.trim03 | 6 | 581934 | 581914 | 452898 | 129016 | 30001 | 13920 | 0.014625 | 0.002573 | 12143 |


In [20]:
ref_gen_rob_trim03_2.stats

| sample | state | reads_raw | reads_passed_filter | refseq_mapped_reads | refseq_unmapped_reads | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|---|---|
| CR_01_S115.trim03 | 6 | 637962 | 637946 | 533469 | 104477 | 30680 | 14948 | 0.014172 | 0.002459 | 13052 |
| CR_02_S127.trim03 | 6 | 646858 | 646822 | 512825 | 133997 | 31084 | 14590 | 0.014546 | 0.002397 | 12686 |
| CR_03_S139.trim03 | 6 | 574664 | 574644 | 455291 | 119353 | 30186 | 14074 | 0.014156 | 0.002499 | 12298 |
| CR_04_S151.trim03 | 6 | 641301 | 641250 | 514080 | 127170 | 32094 | 15254 | 0.014439 | 0.002527 | 13344 |
| CR_05_S163.trim03 | 6 | 619059 | 619011 | 483313 | 135698 | 30808 | 14360 | 0.014435 | 0.002514 | 12479 |
| CR_06_S175.trim03 | 6 | 564596 | 564564 | 478994 | 85570 | 30090 | 14128 | 0.013709 | 0.002483 | 12439 |
| CR_07_S186.trim03 | 6 | 601967 | 601950 | 498160 | 103790 | 31452 | 14556 | 0.014092 | 0.002512 | 12761 |
| CR_08_S104.trim03 | 6 | 604989 | 604973 | 469597 | 135376 | 31168 | 14456 | 0.014368 | 0.002666 | 12644 |
| CR_09_S116.trim03 | 6 | 612353 | 612337 | 497077 | 115260 | 31240 | 14986 | 0.014291 | 0.00248 | 13079 |
| CR_10_S128.trim03 | 6 | 581934 | 581914 | 452898 | 129016 | 30001 | 13920 | 0.014625 | 0.002573 | 12143 |
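
Branch `ref_gen_rob_trim03_2` sets `min_samples_locus` to 79, the total number of FASTQ files in this test, so only loci recovered in every sample should be kept. The toy sketch below illustrates that filtering rule on hypothetical locus-to-sample sets; it is a conceptual illustration, not ipyrad's step-7 implementation.

```python
def filter_loci(locus_samples, min_samples):
    """Keep loci observed in at least `min_samples` samples."""
    return {
        locus: samples
        for locus, samples in locus_samples.items()
        if len(samples) >= min_samples
    }


# Hypothetical toy data: 4 samples, 3 loci
toy = {
    "locus_1": {"s1", "s2", "s3", "s4"},
    "locus_2": {"s1", "s2", "s3"},
    "locus_3": {"s2", "s4"},
}

n_samples = 4
print(sorted(filter_loci(toy, n_samples)))  # ['locus_1']: present in all samples
print(sorted(filter_loci(toy, 2)))  # ['locus_1', 'locus_2', 'locus_3']
```

Setting the threshold equal to the number of samples removes missing data at the locus level but typically retains far fewer loci; a lower threshold trades completeness for more loci.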
