# Reference-guided assembly

### Reference genome: _Quercus lobata_

The _Q. lobata_ genome was downloaded from the [Valley Oak Genomic Resources](https://valleyoak.ucla.edu/genomic-resources/) page of the University of California, Los Angeles (UCLA).

This genome was published by [Sork et al. (2022)](https://www.nature.com/articles/s41467-022-29584-y).

- Test with **Trimmomatic 03**

- 79 fastq files of *Quercus macdougallii*

After step 6, the branches **ref_gen_qlob_trim03_1** and **ref_gen_qlob_trim03_2** were created. Each branch sets its own parameters before running step 7: branch 1 uses a clustering threshold of 0.70 and a population assignment file, and branch 2 uses a clustering threshold of 0.90 and a minimum of 79 samples per locus.
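The population assignment file mentioned above is a plain-text file. The sketch below shows one way to generate it, assuming the layout recalled from the ipyrad documentation: one `sample population` pair per line, followed by a final `# pop:min` line giving the minimum number of samples required per population. The helper `build_popmap` and the zone labels are hypothetical, and the exact format should be checked against the ipyrad documentation before use.

```python
from collections import Counter


def build_popmap(assignments, min_per_pop=1):
    """Return popmap text: one 'sample population' line per sample,
    then a '# pop:min' line with the minimum samples per population."""
    lines = [f"{sample} {pop}" for sample, pop in assignments.items()]
    counts = Counter(assignments.values())
    # Never require more samples than a population actually contains.
    minima = " ".join(
        f"{pop}:{min(min_per_pop, n)}" for pop, n in sorted(counts.items())
    )
    lines.append(f"# {minima}")
    return "\n".join(lines) + "\n"


# Hypothetical assignment: the zone labels are placeholders,
# not the groups used in this study.
example = {
    "CR_01_S115.trim03": "zone_A",
    "CR_02_S127.trim03": "zone_A",
    "CR_03_S139.trim03": "zone_B",
}
popmap_text = build_popmap(example, min_per_pop=2)
print(popmap_text)
```

The sample names must match the names ipyrad assigns in step 1 (here they carry the `.trim03` suffix), otherwise the samples would not be assigned to any population.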

In [1]:
#conda install ipyrad -c ipyrad
#conda install toytree -c eaton-lab
#conda install entrez-direct -c bioconda
#conda install sratools -c bioconda

In [2]:
# Import libraries
import ipyrad as ip
import ipyparallel as ipp

In [3]:
## Create an Assembly object named ref_gen_qlob_trim03
ref_gen_qlob_trim03 = ip.Assembly("ref_gen_qlob_trim03")


New Assembly: ref_gen_qlob_trim03


In [4]:
## prints the parameters to the screen
ref_gen_qlob_trim03.get_params()

0   assembly_name               ref_gen_qlob_trim03                          
1   project_dir                 /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/bin
2   raw_fastq_path                                                           
3   barcodes_path                                                            
4   sorted_fastq_path                                                        
5   assembly_method             denovo                                       
6   reference_sequence                                                       
7   datatype                    rad                                          
8   restriction_overhang        ('TGCAG', '')                                
9   max_low_qual_bases          5                                            
10  phred_Qscore_offset         33                                           
11  mindepth_statistical        6                                            
12  mindepth_majrule            6

In [5]:
## setting/modifying parameters for this Assembly object
ref_gen_qlob_trim03.set_params('project_dir', '../../data/1.3.assembly_variant_calling/ref_gen_qlob_trim03')
ref_gen_qlob_trim03.set_params('sorted_fastq_path', "../../data/1.1.filter/*trim03.fastq.gz")
ref_gen_qlob_trim03.set_params('assembly_method', "reference")
ref_gen_qlob_trim03.set_params('reference_sequence', '../../data/reference_genomes/Qlobata.v3.0.RptMsk4.0.6.on-RptMdl1.0.8.softmasked.fasta')
ref_gen_qlob_trim03.set_params('datatype', 'ddrad')
ref_gen_qlob_trim03.set_params('output_formats', ['p', 's', 'k', 'g', 'v'])

## prints the parameters to the screen
ref_gen_qlob_trim03.get_params()

0   assembly_name               ref_gen_qlob_trim03                          
1   project_dir                 /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/1.3.assemble_variant_calling/1.3.2.ipyrad/ref_gen_qlob_trim03
2   raw_fastq_path                                                           
3   barcodes_path                                                            
4   sorted_fastq_path           /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/1.1.filter/*trim03.fastq.gz
5   assembly_method             reference                                    
6   reference_sequence          /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/reference_genomes/Qlobata.v3.0.RptMsk4.0.6.on-RptMdl1.0.8.softmasked.fasta
7   datatype                    ddrad                                        
8   restriction_overhang        ('TGCAG', '')                                
9   max_low_qual_bases          5    
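Because the paths above are given relative to the notebook directory and one of them is a glob pattern, it can help to confirm what they resolve to before launching step 1. The following is a minimal sketch using only the standard library; `check_inputs` is a hypothetical helper and is not part of ipyrad.

```python
import glob
import os


def check_inputs(fastq_pattern, reference_path, expected_samples=None):
    """Return a list of problems found; an empty list means the inputs look fine."""
    problems = []
    fastqs = sorted(glob.glob(fastq_pattern))
    if not fastqs:
        problems.append(f"no files match {fastq_pattern}")
    elif expected_samples is not None and len(fastqs) != expected_samples:
        problems.append(
            f"expected {expected_samples} fastq files, found {len(fastqs)}"
        )
    if not os.path.isfile(reference_path):
        problems.append(f"reference not found: {reference_path}")
    return problems


if __name__ == "__main__":
    # Paths copied from the set_params calls above; 79 is the number of
    # fastq files stated at the top of this notebook.
    issues = check_inputs(
        "../../data/1.1.filter/*trim03.fastq.gz",
        "../../data/reference_genomes/"
        "Qlobata.v3.0.RptMsk4.0.6.on-RptMdl1.0.8.softmasked.fasta",
        expected_samples=79,
    )
    print(issues if issues else "inputs look fine")
```

Collecting the problems in a list rather than raising on the first one reports a missing reference and a wrong sample count in a single pass.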

In [6]:
## run step 1 to create Samples objects
ref_gen_qlob_trim03.run("1", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:55 | loading reads        | s1 |
Parallel connection closed.


In [7]:
ref_gen_qlob_trim03.stats

sample,state,reads_raw
CR_01_S115.trim03,1,637962
CR_02_S127.trim03,1,646858
CR_03_S139.trim03,1,574664
CR_04_S151.trim03,1,641301
CR_05_S163.trim03,1,619059
CR_06_S175.trim03,1,564596
CR_07_S186.trim03,1,601967
CR_08_S104.trim03,1,604989
CR_09_S116.trim03,1,612353
CR_10_S128.trim03,1,581934


In [8]:
## run step 2 to filter and trim the reads
ref_gen_qlob_trim03.run("2", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:04:06 | processing reads     | s2 |
Parallel connection closed.


In [9]:
ref_gen_qlob_trim03.stats

sample,state,reads_raw,reads_passed_filter
CR_01_S115.trim03,2,637962,637946
CR_02_S127.trim03,2,646858,646822
CR_03_S139.trim03,2,574664,574644
CR_04_S151.trim03,2,641301,641250
CR_05_S163.trim03,2,619059,619011
CR_06_S175.trim03,2,564596,564564
CR_07_S186.trim03,2,601967,601950
CR_08_S104.trim03,2,604989,604973
CR_09_S116.trim03,2,612353,612337
CR_10_S128.trim03,2,581934,581914
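The step 2 table can be summarized as the number of reads removed per sample. The sketch below does this for three rows copied from the output above; the column name `sample` and the helper `reads_removed` are choices made for this example only.

```python
import csv
import io

# Three rows copied from the step 2 stats shown above.
STATS = """sample,reads_raw,reads_passed_filter
CR_01_S115.trim03,637962,637946
CR_02_S127.trim03,646858,646822
CR_03_S139.trim03,574664,574644
"""


def reads_removed(stats_csv):
    """Map each sample to (reads removed, fraction of raw reads removed)."""
    result = {}
    for row in csv.DictReader(io.StringIO(stats_csv)):
        raw = int(row["reads_raw"])
        kept = int(row["reads_passed_filter"])
        result[row["sample"]] = (raw - kept, (raw - kept) / raw)
    return result


removed = reads_removed(STATS)
for sample, (n, frac) in removed.items():
    print(f"{sample}: {n} reads removed ({frac:.4%})")
```

For these rows step 2 removes only 16 to 36 reads per sample, which is consistent with the reads having been trimmed with Trimmomatic beforehand.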


In [10]:
## run step 3
ref_gen_qlob_trim03.run("3", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:00 | indexing reference   | s3 |
[####################] 100% 0:03:06 | dereplicating        | s3 |
[####################] 100% 0:29:22 | mapping reads        | s3 |
[####################] 100% 0:06:54 | building clusters    | s3 |
[####################] 100% 0:00:10 | calc cluster stats   | s3 |
Parallel connection closed.


In [11]:
ref_gen_qlob_trim03.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth
CR_01_S115.trim03,3,637962,637946,541604,96342,31562,15267
CR_02_S127.trim03,3,646858,646822,520850,125972,31623,14827
CR_03_S139.trim03,3,574664,574644,461864,112780,31010,14291
CR_04_S151.trim03,3,641301,641250,522068,119182,33019,15617
CR_05_S163.trim03,3,619059,619011,491241,127770,31639,14676
CR_06_S175.trim03,3,564596,564564,486888,77676,30631,14398
CR_07_S186.trim03,3,601967,601950,505248,96702,32034,14856
CR_08_S104.trim03,3,604989,604973,476311,128662,31944,14775
CR_09_S116.trim03,3,612353,612337,504966,107371,31901,15262
CR_10_S128.trim03,3,581934,581914,459759,122155,30746,14124
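A useful summary of step 3 is the share of filtered reads that mapped to the _Q. lobata_ reference. In the table above, `refseq_mapped_reads` plus `refseq_unmapped_reads` equals `reads_passed_filter`, so the rate can be computed directly. This is a sketch over three rows copied from the output; `mapping_rates` is a hypothetical helper.

```python
# (sample, reads_passed_filter, refseq_mapped_reads, refseq_unmapped_reads)
# Rows copied from the step 3 stats shown above.
ROWS = [
    ("CR_01_S115.trim03", 637946, 541604, 96342),
    ("CR_06_S175.trim03", 564564, 486888, 77676),
    ("CR_10_S128.trim03", 581914, 459759, 122155),
]


def mapping_rates(rows):
    """Return {sample: fraction of filtered reads mapped to the reference}."""
    rates = {}
    for sample, passed, mapped, unmapped in rows:
        # Sanity check: every filtered read is counted as mapped or unmapped.
        if mapped + unmapped != passed:
            raise ValueError(f"{sample}: mapped + unmapped != reads_passed_filter")
        rates[sample] = mapped / passed
    return rates


rates = mapping_rates(ROWS)
# List samples from lowest to highest mapping rate.
for sample, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{sample}: {rate:.1%} mapped")
```

For these three samples the rate falls between roughly 79% and 86%.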


In [12]:
## run step 4
ref_gen_qlob_trim03.run("4", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:03:39 | inferring [H, E]     | s4 |
Parallel connection closed.


In [13]:
ref_gen_qlob_trim03.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est
CR_01_S115.trim03,4,637962,637946,541604,96342,31562,15267,0.013961,0.002604
CR_02_S127.trim03,4,646858,646822,520850,125972,31623,14827,0.014529,0.002567
CR_03_S139.trim03,4,574664,574644,461864,112780,31010,14291,0.01398,0.002685
CR_04_S151.trim03,4,641301,641250,522068,119182,33019,15617,0.013717,0.002631
CR_05_S163.trim03,4,619059,619011,491241,127770,31639,14676,0.014121,0.002603
CR_06_S175.trim03,4,564596,564564,486888,77676,30631,14398,0.013615,0.002576
CR_07_S186.trim03,4,601967,601950,505248,96702,32034,14856,0.014242,0.002604
CR_08_S104.trim03,4,604989,604973,476311,128662,31944,14775,0.014405,0.002801
CR_09_S116.trim03,4,612353,612337,504966,107371,31901,15262,0.014303,0.002682
CR_10_S128.trim03,4,581934,581914,459759,122155,30746,14124,0.014181,0.002804


In [14]:
## run step 5
ref_gen_qlob_trim03.run("5", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:09 | calculating depths   | s5 |
[####################] 100% 0:00:23 | chunking clusters    | s5 |
[####################] 100% 0:22:37 | consens calling      | s5 |
[####################] 100% 0:00:27 | indexing alleles     | s5 |
Parallel connection closed.


In [15]:
ref_gen_qlob_trim03.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est,reads_consens
CR_01_S115.trim03,5,637962,637946,541604,96342,31562,15267,0.013961,0.002604,13384
CR_02_S127.trim03,5,646858,646822,520850,125972,31623,14827,0.014529,0.002567,12949
CR_03_S139.trim03,5,574664,574644,461864,112780,31010,14291,0.01398,0.002685,12550
CR_04_S151.trim03,5,641301,641250,522068,119182,33019,15617,0.013717,0.002631,13755
CR_05_S163.trim03,5,619059,619011,491241,127770,31639,14676,0.014121,0.002603,12922
CR_06_S175.trim03,5,564596,564564,486888,77676,30631,14398,0.013615,0.002576,12763
CR_07_S186.trim03,5,601967,601950,505248,96702,32034,14856,0.014242,0.002604,13063
CR_08_S104.trim03,5,604989,604973,476311,128662,31944,14775,0.014405,0.002801,12957
CR_09_S116.trim03,5,612353,612337,504966,107371,31901,15262,0.014303,0.002682,13368
CR_10_S128.trim03,5,581934,581914,459759,122155,30746,14124,0.014181,0.002804,12411


In [16]:
## run step 6
ref_gen_qlob_trim03.run("6", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:14 | concatenating bams   | s6 |
[####################] 100% 0:00:02 | fetching regions     | s6 |
[####################] 100% 0:00:34 | building database    | s6 |
Parallel connection closed.


In [17]:
ref_gen_qlob_trim03.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est,reads_consens
CR_01_S115.trim03,6,637962,637946,541604,96342,31562,15267,0.013961,0.002604,13384
CR_02_S127.trim03,6,646858,646822,520850,125972,31623,14827,0.014529,0.002567,12949
CR_03_S139.trim03,6,574664,574644,461864,112780,31010,14291,0.01398,0.002685,12550
CR_04_S151.trim03,6,641301,641250,522068,119182,33019,15617,0.013717,0.002631,13755
CR_05_S163.trim03,6,619059,619011,491241,127770,31639,14676,0.014121,0.002603,12922
CR_06_S175.trim03,6,564596,564564,486888,77676,30631,14398,0.013615,0.002576,12763
CR_07_S186.trim03,6,601967,601950,505248,96702,32034,14856,0.014242,0.002604,13063
CR_08_S104.trim03,6,604989,604973,476311,128662,31944,14775,0.014405,0.002801,12957
CR_09_S116.trim03,6,612353,612337,504966,107371,31901,15262,0.014303,0.002682,13368
CR_10_S128.trim03,6,581934,581914,459759,122155,30746,14124,0.014181,0.002804,12411


In [18]:
## Create a branch of the existing assembly 'ref_gen_qlob_trim03'
ref_gen_qlob_trim03_1 = ref_gen_qlob_trim03.branch("ref_gen_qlob_trim03_1")
### Set the clustering threshold parameter to 0.70 for SNP identification
ref_gen_qlob_trim03_1.set_params("clust_threshold", "0.70")
### Specify the population assignment file for the analysis
ref_gen_qlob_trim03_1.set_params('pop_assign_file', 'popmap_9sites_2zones_trim03.txt')
### Run step 7 of the ipyrad workflow with auto and force options enabled
ref_gen_qlob_trim03_1.run("7", auto=True, force=True)

## Create a branch of 'ref_gen_qlob_trim03', set parameters for clustering threshold (0.9) and minimum samples per locus (79), and run step 7 with auto and force options enabled
ref_gen_qlob_trim03_2 = ref_gen_qlob_trim03.branch("ref_gen_qlob_trim03_2")
ref_gen_qlob_trim03_2.set_params("min_samples_locus", 79)
ref_gen_qlob_trim03_2.set_params("clust_threshold", "0.90")

ref_gen_qlob_trim03_2.run("7", auto=True, force=True)



Parallel connection | n311: 12 cores
[####################] 100% 0:00:17 | applying filters     | s7 |
[####################] 100% 0:00:03 | building arrays      | s7 |
[####################] 100% 0:00:01 | writing conversions  | s7 |
[####################] 100% 0:00:06 | indexing vcf depths  | s7 |
[####################] 100% 0:00:03 | writing vcf output   | s7 |
Parallel connection closed.
Parallel connection | n311: 12 cores
[####################] 100% 0:00:17 | applying filters     | s7 |
[####################] 100% 0:00:03 | building arrays      | s7 |
[####################] 100% 0:00:01 | writing conversions  | s7 |
[####################] 100% 0:00:05 | indexing vcf depths  | s7 |
[####################] 100% 0:00:03 | writing vcf output   | s7 |
Parallel connection closed.


In [19]:
ref_gen_qlob_trim03_1.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est,reads_consens
CR_01_S115.trim03,6,637962,637946,541604,96342,31562,15267,0.013961,0.002604,13384
CR_02_S127.trim03,6,646858,646822,520850,125972,31623,14827,0.014529,0.002567,12949
CR_03_S139.trim03,6,574664,574644,461864,112780,31010,14291,0.01398,0.002685,12550
CR_04_S151.trim03,6,641301,641250,522068,119182,33019,15617,0.013717,0.002631,13755
CR_05_S163.trim03,6,619059,619011,491241,127770,31639,14676,0.014121,0.002603,12922
CR_06_S175.trim03,6,564596,564564,486888,77676,30631,14398,0.013615,0.002576,12763
CR_07_S186.trim03,6,601967,601950,505248,96702,32034,14856,0.014242,0.002604,13063
CR_08_S104.trim03,6,604989,604973,476311,128662,31944,14775,0.014405,0.002801,12957
CR_09_S116.trim03,6,612353,612337,504966,107371,31901,15262,0.014303,0.002682,13368
CR_10_S128.trim03,6,581934,581914,459759,122155,30746,14124,0.014181,0.002804,12411


In [20]:
ref_gen_qlob_trim03_2.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est,reads_consens
CR_01_S115.trim03,6,637962,637946,541604,96342,31562,15267,0.013961,0.002604,13384
CR_02_S127.trim03,6,646858,646822,520850,125972,31623,14827,0.014529,0.002567,12949
CR_03_S139.trim03,6,574664,574644,461864,112780,31010,14291,0.01398,0.002685,12550
CR_04_S151.trim03,6,641301,641250,522068,119182,33019,15617,0.013717,0.002631,13755
CR_05_S163.trim03,6,619059,619011,491241,127770,31639,14676,0.014121,0.002603,12922
CR_06_S175.trim03,6,564596,564564,486888,77676,30631,14398,0.013615,0.002576,12763
CR_07_S186.trim03,6,601967,601950,505248,96702,32034,14856,0.014242,0.002604,13063
CR_08_S104.trim03,6,604989,604973,476311,128662,31944,14775,0.014405,0.002801,12957
CR_09_S116.trim03,6,612353,612337,504966,107371,31901,15262,0.014303,0.002682,13368
CR_10_S128.trim03,6,581934,581914,459759,122155,30746,14124,0.014181,0.002804,12411
