### Running ipyrad in IPython (or Jupyter notebooks)
All of the code in this notebook is written in IPython. We assume that you have already started an ipcluster instance in a separate terminal, which launches a number of parallel engines, using a command similar to the one in the comment below.

In [None]:
## 
##   ipcluster start --n 4 --daemonize
##
##
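
If you would rather launch the engines from Python than type the command by hand, the same command can be assembled as an argument list. This is only a sketch: `ipcluster_command` is a hypothetical helper name, and it simply reproduces the flags from the comment above.

```python
import subprocess


def ipcluster_command(n_engines):
    """Return the argument list for the ipcluster command shown above."""
    if n_engines < 1:
        raise ValueError("need at least one engine")
    return ["ipcluster", "start", "--n", str(n_engines), "--daemonize"]


command = ipcluster_command(4)
print(" ".join(command))

# Uncomment to actually start the engines (requires ipcluster on your PATH):
# subprocess.check_call(command)
```

The launch line is left commented out so that running the cell never starts a second cluster by accident.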

### Import ipyrad in IPython


In [1]:
import ipyrad as ip
print(ip.__version__)

0.5.15


### Let's assemble some data

In [2]:
## create a new Assembly class object
data1 = ip.Assembly("data1")

  New Assembly: data1


In [3]:
## set new params 
data1.set_params("project_dir", "api-test")
data1.set_params("raw_fastq_path", "ipsimdata/rad_example_R1_.fastq.gz")
data1.set_params("barcodes_path", "ipsimdata/rad_example_barcodes.txt")

## print the params
data1.get_params()

  0   assembly_name               data1                                        
  1   project_dir                 ./api-test                                   
  2   raw_fastq_path              ./ipsimdata/rad_example_R1_.fastq.gz         
  3   barcodes_path               ./ipsimdata/rad_example_barcodes.txt         
  4   sorted_fastq_path                                                        
  5   assembly_method             denovo                                       
  6   reference_sequence                                                       
  7   datatype                    rad                                          
  8   restriction_overhang        ('TGCAG', '')                                
  9   max_low_qual_bases          5                                            
  10  phred_Qscore_offset         33                                           
  11  mindepth_statistical        6                                            
  12  mindepth_majrule            6     
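
When several parameters change at once, it can help to keep them in one list and apply them in a loop. The sketch below assumes only that the object has a `set_params(param, value)` method, as used in the cell above. `apply_params` and `RecordingAssembly` are hypothetical names; the latter is a stand-in so the example runs without ipyrad installed.

```python
class RecordingAssembly(object):
    """Hypothetical stand-in for ip.Assembly so this sketch runs without ipyrad."""

    def __init__(self, name):
        self.name = name
        self.params = {}

    def set_params(self, param, value):
        # Just record the value; the real object also validates it.
        self.params[param] = value


def apply_params(assembly, settings):
    """Call assembly.set_params(param, value) for each (param, value) pair."""
    for param, value in settings:
        assembly.set_params(param, value)
    return assembly


# Same values as the cell above, kept in one place and applied in order
SETTINGS = [
    ("project_dir", "api-test"),
    ("raw_fastq_path", "ipsimdata/rad_example_R1_.fastq.gz"),
    ("barcodes_path", "ipsimdata/rad_example_barcodes.txt"),
]

demo = apply_params(RecordingAssembly("demo"), SETTINGS)
print(sorted(demo.params))
```

In the notebook itself you would pass `data1` instead of the stand-in, i.e. `apply_params(data1, SETTINGS)`.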

### Assemble the data set
Note: the spacing in the progress bars is not yet uniform; this is something we still have to work on.

In [4]:
data1.run('1234567', force=True)


  Assembly: data1
  [####################] 100%  chunking large files  | 0:00:00 | s1 | 
  [####################] 100%  sorting reads         | 0:00:02 | s1 | 
  [####################] 100%  writing/compressing   | 0:00:00 | s1 | 
  [####################] 100%  processing reads      | 0:00:02 | s2 | 
  [####################] 100%  dereplicating         | 0:00:00 | s3 | 
  [####################] 100%  clustering            | 0:00:01 | s3 | 
  [####################] 100%  building clusters     | 0:00:00 | s3 | 
  [####################] 100%  chunking              | 0:00:00 | s3 | 
  [####################] 100%  aligning              | 0:00:08 | s3 | 
  [####################] 100%  concatenating         | 0:00:00 | s3 | 
  [####################] 100%  inferring [H, E]      | 0:00:03 | s4 | 
  [####################] 100%  calculating depths    | 0:00:00 | s5 | 
  [####################] 100%  chunking clusters     | 0:00:00 | s5 | 
  [####################] 100%  consens calling       | 0:0
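
The progress lines carry an elapsed time and a step label (`s1`, `s2`, ...), so a saved log can be summarized per step. The following is a rough sketch that assumes the line layout printed above stays the same; `seconds_per_step` is a hypothetical helper, and only the complete lines from the output above are included.

```python
import re

# Complete progress lines copied from the run above
PROGRESS = """
  [####################] 100%  chunking large files  | 0:00:00 | s1 |
  [####################] 100%  sorting reads         | 0:00:02 | s1 |
  [####################] 100%  writing/compressing   | 0:00:00 | s1 |
  [####################] 100%  processing reads      | 0:00:02 | s2 |
  [####################] 100%  dereplicating         | 0:00:00 | s3 |
  [####################] 100%  clustering            | 0:00:01 | s3 |
  [####################] 100%  building clusters     | 0:00:00 | s3 |
  [####################] 100%  chunking              | 0:00:00 | s3 |
  [####################] 100%  aligning              | 0:00:08 | s3 |
  [####################] 100%  concatenating         | 0:00:00 | s3 |
  [####################] 100%  inferring [H, E]      | 0:00:03 | s4 |
  [####################] 100%  calculating depths    | 0:00:00 | s5 |
  [####################] 100%  chunking clusters     | 0:00:00 | s5 |
"""

# bar, percent, task name, h:mm:ss, step label
LINE_RE = re.compile(
    r"\]\s+\d+%\s+(?P<task>.+?)\s*\|\s*"
    r"(?P<h>\d+):(?P<m>\d\d):(?P<s>\d\d)\s*\|\s*(?P<step>s\d)\s*\|"
)


def seconds_per_step(text):
    """Sum the elapsed seconds of all progress lines that share a step label."""
    totals = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip blank or unrelated lines
        seconds = (int(match.group("h")) * 3600
                   + int(match.group("m")) * 60
                   + int(match.group("s")))
        step = match.group("step")
        totals[step] = totals.get(step, 0) + seconds
    return totals


totals = seconds_per_step(PROGRESS)
for step in sorted(totals):
    print(step, totals[step])
```

For the lines shown here, step 3 (clustering and aligning) accounts for most of the elapsed time.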

### Branch the data set


In [6]:
## create new branch
data2 = data1.branch("data2", ["1A_0", "1B_0", "1C_0", "1D_0"])

## modify the branch params
data2.set_params("clust_threshold", "0.90")

## run steps 6 and 7, using force because these steps were already run before branching
data2.run("67", force=True)


  Assembly: data2
  [####################] 100%  concat/shuffle input  | 0:00:00 
  [####################] 100%  clustering across     | 0:00:00 
  [####################] 100%  building clusters     | 0:00:00 
  [####################] 100%  aligning clusters     | 0:00:01 
  [####################] 100%  indexing clusters     | 0:00:01 
  [####################] 100%  building database     | 0:00:00 
  [####################] 100%  filtering loci        | 0:00:00 
  [####################] 100%  building loci/stats   | 0:00:01 
  [####################] 100%  building vcf file     | 0:00:00 
  [####################] 100%  writing outfiles      | 0:00:01 
  Outfiles written to: ~/Documents/ipyrad/tests/api-test/data2_outfiles
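
The same branch-then-modify pattern extends to several clustering thresholds at once. The sketch below assumes the `branch(name, samples)` and `set_params` calls used above, and it also assumes the assembly exposes its name as a `.name` attribute. `branch_by_threshold` and `RecordingAssembly` are hypothetical names; the latter is a stand-in so the example runs without ipyrad.

```python
class RecordingAssembly(object):
    """Hypothetical stand-in for ip.Assembly so this sketch runs without ipyrad."""

    def __init__(self, name):
        self.name = name
        self.params = {}
        self.samples = None

    def branch(self, newname, subsamples=None):
        # Copy the parameters into a new, independently editable object.
        child = RecordingAssembly(newname)
        child.params = dict(self.params)
        child.samples = subsamples
        return child

    def set_params(self, param, value):
        self.params[param] = value


def branch_by_threshold(assembly, samples, thresholds):
    """Create one branch per clustering threshold and return them by name."""
    branches = {}
    for threshold in thresholds:
        # e.g. 0.90 -> "data1_c90"
        name = "{}_c{}".format(assembly.name, int(round(threshold * 100)))
        child = assembly.branch(name, samples)
        child.set_params("clust_threshold", "{:.2f}".format(threshold))
        branches[name] = child
    return branches


branches = branch_by_threshold(
    RecordingAssembly("data1"),
    ["1A_0", "1B_0", "1C_0", "1D_0"],
    [0.85, 0.90],
)
print(sorted(branches))
```

Each returned branch would still need its own `run(...)` call, as in the cell above.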


### View assembly stats

In [5]:
print(data1.stats)

      state  reads_raw  reads_passed_filter  clusters_total  clusters_hidepth  \
1A_0      6      19862                19862            1000              1000   
1B_0      6      20043                20043            1000              1000   
1C_0      6      20136                20136            1000              1000   
1D_0      6      19966                19966            1000              1000   
2E_0      6      20017                20017            1000              1000   
2F_0      6      19933                19933            1000              1000   
2G_0      6      20030                20030            1000              1000   
2H_0      6      20199                20199            1000              1000   
3I_0      6      19885                19885            1000              1000   
3J_0      6      19822                19822            1000              1000   
3K_0      6      19965                19965            1000              1000   
3L_0      6      20008      

### Access statistics
You can see all of the attributes and functions available to Assembly class objects by using tab-completion after the object. 

In [8]:
## data1.<tab>
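
Outside an interactive session, where tab-completion is not available, Python's built-in `dir()` gives a similar listing. This is a small sketch with a hypothetical helper name, `public_names`; it is shown on a built-in object so that it runs without ipyrad.

```python
def public_names(obj):
    """List attribute names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Demonstrated on a built-in object; in the notebook you would call
# public_names(data1) instead.
names = public_names(complex(1, 2))
print(names)
```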

In [22]:
print(data1.stats_dfs.s5.describe())

       clusters_total  filtered_by_depth  filtered_by_maxH  filtered_by_maxN  \
count            12.0               12.0              12.0              12.0   
mean           1000.0                0.0               0.0               0.0   
std               0.0                0.0               0.0               0.0   
min            1000.0                0.0               0.0               0.0   
25%            1000.0                0.0               0.0               0.0   
50%            1000.0                0.0               0.0               0.0   
75%            1000.0                0.0               0.0               0.0   
max            1000.0                0.0               0.0               0.0   

       reads_consens        nsites     nhetero  heterozygosity  
count           12.0     12.000000   12.000000       12.000000  
mean          1000.0  91016.333333  164.333333        0.001806  
std              0.0      8.927316    9.237604        0.000102  
min           1000.

### Plot stats
The toyplot package is a very cool library for making interactive figures.
Click on the figure below to see an example.

In [28]:
import toyplot

## set canvas size/type
canvas = toyplot.Canvas(width=300, height=300)

## set the axes
axes = canvas.cartesian(xlabel="error rate estimate", 
                        ylabel="heterozygosity", 
                        gutter=75)                     
axes.y.label.style = {"font-size": "16px"}
axes.x.label.style = {"font-size": "16px"}

## plot the data points
axes.scatterplot(data1.stats.error_est,
                 data1.stats.hetero_est,
                 size=7.5)

<toyplot.mark.Scatterplot at 0x7f20cc0fddd0>