# Reference-genome assembly

### Reference genome: _Quercus lobata_

The _Q. lobata_ genome was downloaded from the [Valley Oak Genomic Resources](https://valleyoak.ucla.edu/genomic-resources/) page of the University of California, Los Angeles (UCLA).

This genome was published by [Sork et al. (2022)](https://www.nature.com/articles/s41467-022-29584-y).

- Test with **Trimmomatic 02**

- 79 FASTQ files of *Quercus macdougallii*

After step 6, the branches **ref_gen_qlob_trim02_1** and **ref_gen_qlob_trim02_2** were created. In these branches, specific clustering thresholds were set, a population assignment file was added, and the minimum number of samples per locus was defined.
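
The population assignment file itself is not shown in this notebook. As a rough sketch of what such a file can look like, the snippet below builds one in the layout ipyrad reads: one `sample population` pair per line, followed by a final `#` line giving the minimum number of samples required per population. The helper `build_popfile`, the rule that the site code is the text before the first underscore, and the `XX` site are assumptions made for illustration only; the real `popmap_9sites_2zones_trim02.txt` may be organized differently.

```python
from collections import defaultdict


def build_popfile(sample_names, min_per_pop=1):
    """Return the text of an ipyrad-style population assignment file."""
    pops = defaultdict(list)
    for name in sample_names:
        # Assumption: the text before the first underscore is the site code.
        pops[name.split("_", 1)[0]].append(name)

    # One "sample population" pair per line, grouped by population.
    lines = [
        f"{name} {pop}"
        for pop, names in sorted(pops.items())
        for name in sorted(names)
    ]
    # Final line: minimum number of samples required in each population.
    lines.append("# " + " ".join(f"{pop}:{min_per_pop}" for pop in sorted(pops)))
    return "\n".join(lines) + "\n"


# "XX_01_S001.trim02" is a hypothetical sample used only to show a second site.
samples = ["CR_01_S115.trim02", "CR_02_S127.trim02", "XX_01_S001.trim02"]
popfile_text = build_popfile(samples)
```

Writing `popfile_text` to disk and passing that path to `set_params('pop_assign_file', ...)` would follow the same pattern as the branch cell further down.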

In [1]:
#conda install ipyrad -c ipyrad
#conda install toytree -c eaton-lab
#conda install entrez-direct -c bioconda
#conda install sratools -c bioconda

In [2]:
# Import libraries
import ipyrad as ip
import ipyparallel as ipp

In [3]:
## Create an Assembly object named ref_gen_qlob_trim02
ref_gen_qlob_trim02 = ip.Assembly("ref_gen_qlob_trim02")

New Assembly: ref_gen_qlob_trim02


In [4]:
## prints the parameters to the screen
ref_gen_qlob_trim02.get_params()

0   assembly_name               ref_gen_qlob_trim02                          
1   project_dir                 /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/bin
2   raw_fastq_path                                                           
3   barcodes_path                                                            
4   sorted_fastq_path                                                        
5   assembly_method             denovo                                       
6   reference_sequence                                                       
7   datatype                    rad                                          
8   restriction_overhang        ('TGCAG', '')                                
9   max_low_qual_bases          5                                            
10  phred_Qscore_offset         33                                           
11  mindepth_statistical        6                                            
12  mindepth_majrule            6

In [5]:
## setting/modifying parameters for this Assembly object
ref_gen_qlob_trim02.set_params('project_dir', '../../data/1.3.assemble_variant_calling/1.3.2.ipyrad/ref_gen_qlob_trim02')
ref_gen_qlob_trim02.set_params('sorted_fastq_path', "../../data/1.1.filter/*trim02.fastq.gz")
ref_gen_qlob_trim02.set_params('assembly_method', "reference")
ref_gen_qlob_trim02.set_params('reference_sequence', '../../data/reference_genomes/Qlobata.v3.0.RptMsk4.0.6.on-RptMdl1.0.8.softmasked.fasta')
ref_gen_qlob_trim02.set_params('datatype', 'ddrad')
ref_gen_qlob_trim02.set_params('output_formats', ['p', 's', 'k', 'g', 'v'])

## prints the parameters to the screen
ref_gen_qlob_trim02.get_params()

0   assembly_name               ref_gen_qlob_trim02                          
1   project_dir                 /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/1.3.assemble_variant_calling/1.3.2.ipyrad/ref_gen_qlob_trim02
2   raw_fastq_path                                                           
3   barcodes_path                                                            
4   sorted_fastq_path           /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/1.1.filter/*trim02.fastq.gz
5   assembly_method             reference                                    
6   reference_sequence          /media/jaz/n311y_pc/Bioinformatic/Qmacdougallii_genomics_and_environment/data/reference_genomes/Qlobata.v3.0.RptMsk4.0.6.on-RptMdl1.0.8.softmasked.fasta
7   datatype                    ddrad                                        
8   restriction_overhang        ('TGCAG', '')                                
9   max_low_qual_bases          5    
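
Because the parameters above are given as relative paths, it can be worth confirming that they resolve to real files before launching step 1. The sketch below does this with the standard library only. `check_inputs` is a hypothetical helper and not part of ipyrad; the two paths are copied from the cell above.

```python
import glob
import os


def check_inputs(fastq_glob, reference_path):
    """Return (number of FASTQ files matched, whether the reference file exists)."""
    return len(glob.glob(fastq_glob)), os.path.isfile(reference_path)


n_fastq, has_ref = check_inputs(
    "../../data/1.1.filter/*trim02.fastq.gz",
    "../../data/reference_genomes/"
    "Qlobata.v3.0.RptMsk4.0.6.on-RptMdl1.0.8.softmasked.fasta",
)
```

With the data set described at the top of this notebook, `n_fastq` would be expected to equal 79 and `has_ref` to be `True`; other values suggest the notebook is being run from a different working directory.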

In [6]:
## run step 1 to create Samples objects
ref_gen_qlob_trim02.run("1", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:25 | loading reads        | s1 |
Parallel connection closed.


In [7]:
ref_gen_qlob_trim02.stats

sample,state,reads_raw
CR_01_S115.trim02,1,315211
CR_02_S127.trim02,1,325350
CR_03_S139.trim02,1,278113
CR_04_S151.trim02,1,312728
CR_05_S163.trim02,1,308404
CR_06_S175.trim02,1,278013
CR_07_S186.trim02,1,293720
CR_08_S104.trim02,1,276057
CR_09_S116.trim02,1,289102
CR_10_S128.trim02,1,274193


In [8]:
## run step 2 to filter the reads
ref_gen_qlob_trim02.run("2", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:01:47 | processing reads     | s2 |
Parallel connection closed.


In [9]:
ref_gen_qlob_trim02.stats

sample,state,reads_raw,reads_passed_filter
CR_01_S115.trim02,2,315211,315194
CR_02_S127.trim02,2,325350,325316
CR_03_S139.trim02,2,278113,278091
CR_04_S151.trim02,2,312728,312672
CR_05_S163.trim02,2,308404,308357
CR_06_S175.trim02,2,278013,277987
CR_07_S186.trim02,2,293720,293700
CR_08_S104.trim02,2,276057,276038
CR_09_S116.trim02,2,289102,289086
CR_10_S128.trim02,2,274193,274176
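
The table above shows that step 2 removes very few reads, as expected for input that was already trimmed. A minimal sketch of how that retention could be quantified follows, using only the ten samples and two columns shown above. The index column is called `sample` here for readability, and `retention` is a hypothetical helper.

```python
import csv
import io

# Values copied from the step 2 stats table above (ten samples shown).
STATS = """
sample,reads_raw,reads_passed_filter
CR_01_S115.trim02,315211,315194
CR_02_S127.trim02,325350,325316
CR_03_S139.trim02,278113,278091
CR_04_S151.trim02,312728,312672
CR_05_S163.trim02,308404,308357
CR_06_S175.trim02,278013,277987
CR_07_S186.trim02,293720,293700
CR_08_S104.trim02,276057,276038
CR_09_S116.trim02,289102,289086
CR_10_S128.trim02,274193,274176
"""


def retention(stats_csv):
    """Map each sample to the fraction of raw reads that passed the filter."""
    rows = csv.DictReader(io.StringIO(stats_csv.strip()))
    return {
        r["sample"]: int(r["reads_passed_filter"]) / int(r["reads_raw"])
        for r in rows
    }


kept = retention(STATS)
lowest = min(kept, key=kept.get)  # sample that lost the largest share of reads
```

For these ten samples every retention value is above 99.9%, and `lowest` is `CR_04_S151.trim02`.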


In [10]:
## run step 3 to map the reads to the reference and build clusters
ref_gen_qlob_trim02.run("3", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:00 | indexing reference   | s3 |
[####################] 100% 0:02:20 | dereplicating        | s3 |
[####################] 100% 0:24:51 | mapping reads        | s3 |
[####################] 100% 0:07:53 | building clusters    | s3 |
[####################] 100% 0:00:07 | calc cluster stats   | s3 |
Parallel connection closed.


In [11]:
ref_gen_qlob_trim02.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth
CR_01_S115.trim02,3,315211,315194,269863,45331,25440,10097
CR_02_S127.trim02,3,325350,325316,263571,61745,25450,9889
CR_03_S139.trim02,3,278113,278091,226225,51866,24948,9069
CR_04_S151.trim02,3,312728,312672,256581,56091,26483,10025
CR_05_S163.trim02,3,308404,308357,247417,60940,25379,9497
CR_06_S175.trim02,3,278013,277987,241385,36602,24686,9356
CR_07_S186.trim02,3,293720,293700,248538,45162,25471,9583
CR_08_S104.trim02,3,276057,276038,218923,57115,25069,8774
CR_09_S116.trim02,3,289102,289086,240578,48508,25416,9672
CR_10_S128.trim02,3,274193,274176,219146,55030,24182,8902
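
A useful quantity to read off the step 3 table is the share of filtered reads that mapped to the _Q. lobata_ reference. The sketch below computes it for the ten samples shown, and also checks that mapped plus unmapped reads add up to the filtered total. The column name `sample` and the helper `mapping_rates` are assumptions for illustration.

```python
import csv
import io

# Values copied from the step 3 stats table above (ten samples shown).
STATS = """
sample,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads
CR_01_S115.trim02,315194,269863,45331
CR_02_S127.trim02,325316,263571,61745
CR_03_S139.trim02,278091,226225,51866
CR_04_S151.trim02,312672,256581,56091
CR_05_S163.trim02,308357,247417,60940
CR_06_S175.trim02,277987,241385,36602
CR_07_S186.trim02,293700,248538,45162
CR_08_S104.trim02,276038,218923,57115
CR_09_S116.trim02,289086,240578,48508
CR_10_S128.trim02,274176,219146,55030
"""


def mapping_rates(stats_csv):
    """Return {sample: mapped / filtered} and whether every row is consistent."""
    rates, consistent = {}, True
    for r in csv.DictReader(io.StringIO(stats_csv.strip())):
        passed = int(r["reads_passed_filter"])
        mapped = int(r["refseq_mapped_reads"])
        unmapped = int(r["refseq_unmapped_reads"])
        # Every filtered read should be counted as either mapped or unmapped.
        consistent = consistent and (mapped + unmapped == passed)
        rates[r["sample"]] = mapped / passed
    return rates, consistent


rates, consistent = mapping_rates(STATS)
```

For these samples the mapping rate falls between roughly 79% and 87%, with `CR_08_S104.trim02` the lowest and `CR_06_S175.trim02` the highest.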


In [12]:
## run step 4 to estimate heterozygosity (H) and error rate (E)
ref_gen_qlob_trim02.run("4", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:03:27 | inferring [H, E]     | s4 |
Parallel connection closed.


In [13]:
ref_gen_qlob_trim02.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est
CR_01_S115.trim02,4,315211,315194,269863,45331,25440,10097,0.014905,0.002223
CR_02_S127.trim02,4,325350,325316,263571,61745,25450,9889,0.015689,0.002126
CR_03_S139.trim02,4,278113,278091,226225,51866,24948,9069,0.014974,0.002173
CR_04_S151.trim02,4,312728,312672,256581,56091,26483,10025,0.0153,0.002209
CR_05_S163.trim02,4,308404,308357,247417,60940,25379,9497,0.015915,0.002132
CR_06_S175.trim02,4,278013,277987,241385,36602,24686,9356,0.015022,0.002069
CR_07_S186.trim02,4,293720,293700,248538,45162,25471,9583,0.015367,0.002133
CR_08_S104.trim02,4,276057,276038,218923,57115,25069,8774,0.015264,0.002185
CR_09_S116.trim02,4,289102,289086,240578,48508,25416,9672,0.015596,0.002113
CR_10_S128.trim02,4,274193,274176,219146,55030,24182,8902,0.015651,0.002327
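
The per-sample estimates from step 4 are easier to compare once they are summarized. As a small sketch, the snippet below averages `hetero_est` and `error_est` over the ten samples shown above; the variable names are illustrative only.

```python
from statistics import mean

# (hetero_est, error_est) pairs copied from the step 4 stats table above.
ESTIMATES = {
    "CR_01_S115.trim02": (0.014905, 0.002223),
    "CR_02_S127.trim02": (0.015689, 0.002126),
    "CR_03_S139.trim02": (0.014974, 0.002173),
    "CR_04_S151.trim02": (0.0153, 0.002209),
    "CR_05_S163.trim02": (0.015915, 0.002132),
    "CR_06_S175.trim02": (0.015022, 0.002069),
    "CR_07_S186.trim02": (0.015367, 0.002133),
    "CR_08_S104.trim02": (0.015264, 0.002185),
    "CR_09_S116.trim02": (0.015596, 0.002113),
    "CR_10_S128.trim02": (0.015651, 0.002327),
}

mean_hetero = mean(h for h, _ in ESTIMATES.values())
mean_error = mean(e for _, e in ESTIMATES.values())
# Ratio of estimated heterozygosity to estimated sequencing error.
ratio = mean_hetero / mean_error
```

Across these ten samples the mean heterozygosity estimate is about 0.0154 and the mean error estimate about 0.0022, so heterozygosity is roughly seven times the estimated error rate.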


In [14]:
## run step 5 to call consensus sequences
ref_gen_qlob_trim02.run("5", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:08 | calculating depths   | s5 |
[####################] 100% 0:00:41 | chunking clusters    | s5 |
[####################] 100% 0:14:32 | consens calling      | s5 |
[####################] 100% 0:00:34 | indexing alleles     | s5 |
Parallel connection closed.


In [15]:
ref_gen_qlob_trim02.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est,reads_consens
CR_01_S115.trim02,5,315211,315194,269863,45331,25440,10097,0.014905,0.002223,8703
CR_02_S127.trim02,5,325350,325316,263571,61745,25450,9889,0.015689,0.002126,8488
CR_03_S139.trim02,5,278113,278091,226225,51866,24948,9069,0.014974,0.002173,7879
CR_04_S151.trim02,5,312728,312672,256581,56091,26483,10025,0.0153,0.002209,8684
CR_05_S163.trim02,5,308404,308357,247417,60940,25379,9497,0.015915,0.002132,8147
CR_06_S175.trim02,5,278013,277987,241385,36602,24686,9356,0.015022,0.002069,8184
CR_07_S186.trim02,5,293720,293700,248538,45162,25471,9583,0.015367,0.002133,8313
CR_08_S104.trim02,5,276057,276038,218923,57115,25069,8774,0.015264,0.002185,7603
CR_09_S116.trim02,5,289102,289086,240578,48508,25416,9672,0.015596,0.002113,8349
CR_10_S128.trim02,5,274193,274176,219146,55030,24182,8902,0.015651,0.002327,7686
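
After step 5, one quick check is how many high-depth clusters ended up as consensus sequences. The sketch below computes `reads_consens / clusters_hidepth` for the ten samples shown; `consensus_yield` is a hypothetical name for that ratio, not an ipyrad statistic.

```python
# (clusters_hidepth, reads_consens) pairs copied from the step 5 stats table above.
COUNTS = {
    "CR_01_S115.trim02": (10097, 8703),
    "CR_02_S127.trim02": (9889, 8488),
    "CR_03_S139.trim02": (9069, 7879),
    "CR_04_S151.trim02": (10025, 8684),
    "CR_05_S163.trim02": (9497, 8147),
    "CR_06_S175.trim02": (9356, 8184),
    "CR_07_S186.trim02": (9583, 8313),
    "CR_08_S104.trim02": (8774, 7603),
    "CR_09_S116.trim02": (9672, 8349),
    "CR_10_S128.trim02": (8902, 7686),
}

# Fraction of high-depth clusters retained as consensus sequences, per sample.
consensus_yield = {
    sample: consens / hidepth for sample, (hidepth, consens) in COUNTS.items()
}
```

For these samples the ratio stays between about 0.86 and 0.87 in most cases, which suggests that consensus calling discards a similar share of clusters in every sample.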


In [16]:
## run step 6 to combine samples and build the across-sample database
ref_gen_qlob_trim02.run("6", auto=True, force=True)

Parallel connection | n311: 12 cores
[####################] 100% 0:00:08 | concatenating bams   | s6 |
[####################] 100% 0:00:01 | fetching regions     | s6 |
[####################] 100% 0:00:37 | building database    | s6 |
Parallel connection closed.


In [17]:
ref_gen_qlob_trim02.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est,reads_consens
CR_01_S115.trim02,6,315211,315194,269863,45331,25440,10097,0.014905,0.002223,8703
CR_02_S127.trim02,6,325350,325316,263571,61745,25450,9889,0.015689,0.002126,8488
CR_03_S139.trim02,6,278113,278091,226225,51866,24948,9069,0.014974,0.002173,7879
CR_04_S151.trim02,6,312728,312672,256581,56091,26483,10025,0.0153,0.002209,8684
CR_05_S163.trim02,6,308404,308357,247417,60940,25379,9497,0.015915,0.002132,8147
CR_06_S175.trim02,6,278013,277987,241385,36602,24686,9356,0.015022,0.002069,8184
CR_07_S186.trim02,6,293720,293700,248538,45162,25471,9583,0.015367,0.002133,8313
CR_08_S104.trim02,6,276057,276038,218923,57115,25069,8774,0.015264,0.002185,7603
CR_09_S116.trim02,6,289102,289086,240578,48508,25416,9672,0.015596,0.002113,8349
CR_10_S128.trim02,6,274193,274176,219146,55030,24182,8902,0.015651,0.002327,7686


In [18]:
## Branch 1: copy the existing assembly 'ref_gen_qlob_trim02' under a new name
ref_gen_qlob_trim02_1 = ref_gen_qlob_trim02.branch("ref_gen_qlob_trim02_1")
## Set the clustering threshold to 0.70
## (clust_threshold is normally applied during clustering in earlier steps, so it may have little effect on a run of step 7 alone)
ref_gen_qlob_trim02_1.set_params("clust_threshold", "0.70")
## Specify the population assignment file for the analysis
ref_gen_qlob_trim02_1.set_params('pop_assign_file', 'popmap_9sites_2zones_trim02.txt')
## Run step 7 (apply filters and write the output files)
ref_gen_qlob_trim02_1.run("7", auto=True, force=True)

## Branch 2: clustering threshold of 0.90 and a minimum of 79 samples per locus (that is, all 79 samples)
ref_gen_qlob_trim02_2 = ref_gen_qlob_trim02.branch("ref_gen_qlob_trim02_2")
ref_gen_qlob_trim02_2.set_params("min_samples_locus", 79)
ref_gen_qlob_trim02_2.set_params("clust_threshold", "0.90")

## Run step 7 for the second branch
ref_gen_qlob_trim02_2.run("7", auto=True, force=True)



Parallel connection | n311: 12 cores
[####################] 100% 0:00:15 | applying filters     | s7 |
[####################] 100% 0:00:02 | building arrays      | s7 |
[####################] 100% 0:00:01 | writing conversions  | s7 |
[####################] 100% 0:00:03 | indexing vcf depths  | s7 |
[####################] 100% 0:00:01 | writing vcf output   | s7 |
Parallel connection closed.
Parallel connection | n311: 12 cores
[####################] 100% 0:00:15 | applying filters     | s7 |
[####################] 100% 0:00:02 | building arrays      | s7 |
[####################] 100% 0:00:01 | writing conversions  | s7 |
[####################] 100% 0:00:03 | indexing vcf depths  | s7 |
[####################] 100% 0:00:01 | writing vcf output   | s7 |
Parallel connection closed.
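
The second branch sets `min_samples_locus` to 79, the same as the number of FASTQ files in this data set, so a locus is kept only if it is present in every sample. The toy sketch below illustrates that kind of filter on made-up data. It is not ipyrad's implementation, and the locus and sample names are hypothetical.

```python
def filter_loci(locus_samples, min_samples_locus):
    """Keep loci that are present in at least `min_samples_locus` samples."""
    return {
        locus: samples
        for locus, samples in locus_samples.items()
        if len(samples) >= min_samples_locus
    }


# Hypothetical data: which of four samples (A-D) have data at each locus.
toy_loci = {
    "locus_1": {"A", "B", "C", "D"},
    "locus_2": {"A", "B"},
    "locus_3": {"A", "B", "C"},
}

relaxed = filter_loci(toy_loci, 2)  # a locus needs data in only two samples
strict = filter_loci(toy_loci, 4)   # a locus needs data in all four samples
```

Here `relaxed` keeps all three loci while `strict` keeps only `locus_1`, which mirrors the trade-off between the number of loci and the amount of missing data when the threshold equals the total sample count.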


In [19]:
ref_gen_qlob_trim02_1.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est,reads_consens
CR_01_S115.trim02,6,315211,315194,269863,45331,25440,10097,0.014905,0.002223,8703
CR_02_S127.trim02,6,325350,325316,263571,61745,25450,9889,0.015689,0.002126,8488
CR_03_S139.trim02,6,278113,278091,226225,51866,24948,9069,0.014974,0.002173,7879
CR_04_S151.trim02,6,312728,312672,256581,56091,26483,10025,0.0153,0.002209,8684
CR_05_S163.trim02,6,308404,308357,247417,60940,25379,9497,0.015915,0.002132,8147
CR_06_S175.trim02,6,278013,277987,241385,36602,24686,9356,0.015022,0.002069,8184
CR_07_S186.trim02,6,293720,293700,248538,45162,25471,9583,0.015367,0.002133,8313
CR_08_S104.trim02,6,276057,276038,218923,57115,25069,8774,0.015264,0.002185,7603
CR_09_S116.trim02,6,289102,289086,240578,48508,25416,9672,0.015596,0.002113,8349
CR_10_S128.trim02,6,274193,274176,219146,55030,24182,8902,0.015651,0.002327,7686


In [20]:
ref_gen_qlob_trim02_2.stats

sample,state,reads_raw,reads_passed_filter,refseq_mapped_reads,refseq_unmapped_reads,clusters_total,clusters_hidepth,hetero_est,error_est,reads_consens
CR_01_S115.trim02,6,315211,315194,269863,45331,25440,10097,0.014905,0.002223,8703
CR_02_S127.trim02,6,325350,325316,263571,61745,25450,9889,0.015689,0.002126,8488
CR_03_S139.trim02,6,278113,278091,226225,51866,24948,9069,0.014974,0.002173,7879
CR_04_S151.trim02,6,312728,312672,256581,56091,26483,10025,0.0153,0.002209,8684
CR_05_S163.trim02,6,308404,308357,247417,60940,25379,9497,0.015915,0.002132,8147
CR_06_S175.trim02,6,278013,277987,241385,36602,24686,9356,0.015022,0.002069,8184
CR_07_S186.trim02,6,293720,293700,248538,45162,25471,9583,0.015367,0.002133,8313
CR_08_S104.trim02,6,276057,276038,218923,57115,25069,8774,0.015264,0.002185,7603
CR_09_S116.trim02,6,289102,289086,240578,48508,25416,9672,0.015596,0.002113,8349
CR_10_S128.trim02,6,274193,274176,219146,55030,24182,8902,0.015651,0.002327,7686
