### Running ipyrad in IPython (or Jupyter notebooks)
All of the code in this notebook is written in IPython. It assumes that you have already started an ipcluster instance in a separate terminal, which launches a number of parallel engines, using a command similar to the one commented below.

In [None]:
## 
##   ipcluster start --n 4 --daemonize
##
##

### import ipyrad in IPython


In [10]:
import ipyrad as ip
print(ip.__version__)

0.3.32


### Let's assemble some data

In [11]:
## create a new Assembly class object
data1 = ip.Assembly("test")

  New Assembly: test


In [12]:
## set new params 
data1.set_params("project_dir", "api-test")
data1.set_params("raw_fastq_path", "ipsimdata/rad_example_R1_.fastq.gz")
data1.set_params("barcodes_path", "ipsimdata/rad_example_barcodes.txt")

## print the params
data1.get_params()

  0   assembly_name               test                                         
  1   project_dir                 ./api-test                                   
  2   raw_fastq_path              ./ipsimdata/rad_example_R1_.fastq.gz         
  3   barcodes_path               ./ipsimdata/rad_example_barcodes.txt         
  4   sorted_fastq_path                                                        
  5   assembly_method             denovo                                       
  6   reference_sequence                                                       
  7   datatype                    rad                                          
  8   restriction_overhang        ('TGCAG', '')                                
  9   max_low_qual_bases          5                                            
  10  phred_Qscore_offset         33                                           
  11  mindepth_statistical        6                                            
  12  mindepth_majrule            6     
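
When several parameters need to change at once, the repeated `set_params` calls can be driven from a single list. The following is a minimal sketch, not part of ipyrad: `apply_params` is a hypothetical helper, and it relies only on the `set_params(name, value)` call shown in the cell above.

```python
def apply_params(assembly, params):
    """Call assembly.set_params(name, value) once per (name, value) pair.

    `assembly` is assumed to be any object with a set_params method,
    such as the Assembly object created above. This helper is
    hypothetical and is not part of ipyrad.
    """
    for name, value in params:
        assembly.set_params(name, value)
    return assembly


# The same three settings used above, kept in order as (name, value) pairs.
new_params = [
    ("project_dir", "api-test"),
    ("raw_fastq_path", "ipsimdata/rad_example_R1_.fastq.gz"),
    ("barcodes_path", "ipsimdata/rad_example_barcodes.txt"),
]

# With ipyrad installed, this would be used as:
#   data1 = ip.Assembly("test")
#   apply_params(data1, new_params)
```

Keeping the settings in one list makes it easier to reuse the same configuration for more than one Assembly.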

### assemble the data set
Note: the spacing in the progress bars below is not yet uniform; this still needs work.

In [13]:
data1.run('1234567')


  Assembly: test

  [####################] 100%  chunking large files  | 0:00:00 
  [####################] 100%  sorting reads         | 0:00:05 
  [####################] 100%  writing/compressing   | 0:00:04 

  [####################] 100%  processing reads      | 0:00:44 

  [####################] 100%  dereplicating         | 0:00:00 
  [####################] 100%  clustering            | 0:00:00 
  [####################] 100%  chunking              | 0:00:00 
  [####################] 100%  aligning              | 0:00:46 
  [####################] 100%  concatenating         | 0:00:00 

  [####################] 100%  inferring [H, E]      | 0:01:08 

  [####################] 100%  consensus calling     | 0:00:35 
  [####################] 100%  concat/shuffle input  | 0:00:00 
  [####################] 100%  clustering across     | 0:00:01 
  [####################] 100%  building clusters     | 0:00:02 
  [####################] 100%  aligning clusters     | 0:00:08 
  [##############

### branch the data set


In [18]:
## create new branch
data2 = data1.branch("data2")

## modify the branch params
data2.set_params("clust_threshold", "0.90")

## rerun steps 6 and 7; force=True because the branch already holds results for these steps
data2.run("67", force=True)


  Assembly: data2
  [####################] 100%  concat/shuffle input  | 0:00:00 
  [####################] 100%  clustering across     | 0:00:01 
  [####################] 100%  building clusters     | 0:00:02 
  [####################] 100%  aligning clusters     | 0:00:09 
  [####################] 100%  indexing clusters     | 0:00:19 
  [####################] 100%  building database     | 0:00:05 
  [####################] 100%  filtering loci        | 0:00:00 
  [####################] 100%  building loci/stats   | 0:00:01 
  [####################] 100%  building vcf file     | 0:00:10 
  [####################] 100%  writing outfiles      | 0:00:01 
  Outfiles written to: ~/Documents/ipyrad/tests/api-test/data2_outfiles
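
The branch-then-rerun pattern above extends naturally to several clustering thresholds. The sketch below is an assumption-laden illustration rather than an ipyrad feature: `branch_over_thresholds` and its `prefix` argument are hypothetical, and it uses only the `branch`, `set_params`, and `run(..., force=True)` calls that appear in the cell above.

```python
def branch_over_thresholds(assembly, prefix, thresholds, steps="67"):
    """Create one branch per clustering threshold and rerun `steps` on it.

    `assembly` is assumed to behave like the Assembly object used above.
    This helper is hypothetical and is not part of ipyrad.
    """
    branches = {}
    for threshold in thresholds:
        # e.g. prefix "data1" and threshold 0.85 give the name "data1_c85"
        name = "{}_c{}".format(prefix, int(round(threshold * 100)))
        branch = assembly.branch(name)
        branch.set_params("clust_threshold", str(threshold))
        # force=True, as above, so existing results for these steps are redone
        branch.run(steps, force=True)
        branches[name] = branch
    return branches


# With ipyrad installed, this would be used as:
#   results = branch_over_thresholds(data1, "data1", [0.85, 0.90])
```

Each branch keeps its own name, so the resulting assemblies can be compared side by side.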


### View assembly stats

In [19]:
print(data1.stats)

      state  reads_raw  reads_filtered  clusters_total  clusters_hidepth  \
1A_0      6      20046           20046            1000              1000   
1B_0      6      19932           19932            1000              1000   
1C_0      6      20007           20007            1000              1000   
1D_0      6      19946           19946            1000              1000   
2E_0      6      19839           19839            1000              1000   
2F_0      6      19950           19950            1000              1000   
2G_0      6      19844           19844            1000              1000   
2H_0      6      20102           20102            1000              1000   
3I_0      6      20061           20061            1000              1000   
3J_0      6      19961           19961            1000              1000   
3K_0      6      20188           20188            1000              1000   
3L_0      6      20012           20012            1000              1000   

      heter
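
As a quick check on the table above, the raw read counts can be summarized in a few lines. This sketch copies the `reads_raw` column by hand into a plain dictionary so that it runs without ipyrad. The printout looks like a pandas DataFrame, but that is an assumption, so nothing here depends on it.

```python
# reads_raw values copied from the printed stats table above
reads_raw = {
    "1A_0": 20046, "1B_0": 19932, "1C_0": 20007, "1D_0": 19946,
    "2E_0": 19839, "2F_0": 19950, "2G_0": 19844, "2H_0": 20102,
    "3I_0": 20061, "3J_0": 19961, "3K_0": 20188, "3L_0": 20012,
}

counts = list(reads_raw.values())
total = sum(counts)
mean = total / float(len(counts))

# sample names with the fewest and the most raw reads
lowest = min(reads_raw, key=reads_raw.get)
highest = max(reads_raw, key=reads_raw.get)

print("samples: {}".format(len(counts)))
print("total raw reads: {}".format(total))
print("mean raw reads: {:.1f}".format(mean))
print("fewest reads: {} ({})".format(lowest, reads_raw[lowest]))
print("most reads: {} ({})".format(highest, reads_raw[highest]))
```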

### Access statistics
You can see all of the attributes and functions available to Assembly class objects by typing the object name followed by a period and pressing Tab.

In [25]:
## data1.<tab>
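
Outside an interactive session, roughly the same listing can be produced with Python's built-in `dir()`. The sketch below uses a small stand-in class so that it runs without ipyrad; `public_names` and `Example` are hypothetical names introduced only for illustration.

```python
def public_names(obj):
    """Return the attribute and method names of obj that do not start with '_'."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Example(object):
    """Stand-in object; with ipyrad installed you would pass data1 instead."""

    def __init__(self):
        self.stats = None
        self._hidden = 1

    def run(self):
        pass


print(public_names(Example()))
```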

### Plot stats
The toyplot package is a very cool library for making interactive figures.
Click on the figure below to see an example.

In [47]:
import toyplot

## plot some of the stats results
canvas = toyplot.Canvas(width=300, height=300)
axes = canvas.cartesian(xlabel="error rate estimate", 
                        ylabel="heterozygosity")
axes.y.label.style = {"font-size": "14px"}
axes.x.label.style = {"font-size": "14px"}
axes.scatterplot(data1.stats.error_est,
                 data1.stats.hetero_est,
                 size=7.5)

<toyplot.mark.Scatterplot at 0x7f43cefe8190>