# Plotly Demonstration
## Data Science Meetup, September 28th 2016

If you haven't already, get set up using these steps:
1. Register for Plotly: https://plot.ly/
2. Generate API Keys: https://plot.ly/settings/api
3. Install Plotly  
   ```$ sudo pip install plotly```
   
4. Store API keys to ~/.plotly/.credentials  
   ```
   $ ipython
   In [1]: import plotly
   In [2]: plotly.tools.set_credentials_file(username='YourAccount', api_key='YourKey')
   In [3]: quit()
   ```



In [1]:
# Import one of these two.
#import plotly.plotly as py        # plotly.plotly handles all connections to Plotly servers.
import plotly.offline as pyo      # plotly.offline handles offline plotting requests.

import plotly.graph_objs as go    # plotly.graph_objs holds components used for constructing plots.

pyo.init_notebook_mode()          # initialize offline ipython plotting.

In [2]:
import pandas as pd        # for creating dataframes.
from math import log10     # for log transforming values.

## Example: Interactive Mutations Ratio Plot

### Introduction
When RNA (four bases) is translated into protein (20 amino acids, or AAs), it is read in sets of three bases, where each triplet encodes one AA (or a start/stop signal). You might immediately recognize a discrepancy here: 4^3 = 64 unique triplets, but only 20 amino acids. As such, some triplets encode the same AA. If a DNA mutation occurs such that the new triplet encodes the SAME amino acid, it is known as 'synonymous' or 'silent' ('Ks'), because the final product remains unchanged. Conversely, if a mutation changes the encoding to a NEW amino acid, it is known as 'non-synonymous' ('Kn'). In theory, because synonymous mutations do not change the final product, they are not subject to selection and can accumulate roughly linearly with elapsed time. By comparing two duplicate genes ('homologs'), it is possible to roughly assess the age of the duplication event by counting the number of synonymous mutations per synonymous site. Additionally, by assessing the Kn to Ks ratio of a gene pair, it is possible to determine the type of selection acting on that pair.
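
The triplet arithmetic above can be checked in a few lines of Python. This is only a minimal sketch: `CODON_TABLE` is a deliberately tiny excerpt of the standard genetic code (alanine, aspartate, and glutamate only), and `classify_mutation` is a hypothetical helper written for this illustration, not part of any library.

```python
from itertools import product

# All possible RNA triplets: 4 bases over 3 positions gives 4**3 codons.
codons = [''.join(p) for p in product('ACGU', repeat=3)]
print(len(codons))

# Tiny excerpt of the standard genetic code (illustration only).
CODON_TABLE = {
    'GCU': 'Ala', 'GCC': 'Ala', 'GCA': 'Ala', 'GCG': 'Ala',
    'GAU': 'Asp', 'GAC': 'Asp',
    'GAA': 'Glu', 'GAG': 'Glu',
}

def classify_mutation(before, after):
    """Hypothetical helper: label a codon change as synonymous or not."""
    same_aa = CODON_TABLE[before] == CODON_TABLE[after]
    return 'synonymous' if same_aa else 'non-synonymous'

print(classify_mutation('GCU', 'GCC'))  # Ala -> Ala
print(classify_mutation('GAU', 'GAA'))  # Asp -> Glu
```

A real analysis would use the full 64-codon table and count changes per site across aligned sequences; the sketch only shows why some single-base changes are silent.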

### Goals:
1. Extract data from a text file into a pandas DataFrame using Python.
2. Use Plotly components to create an interactive histogram for exploring different mutation ratios/transformations.
3. Use Plotly's offline IPython module to insert our histogram inline.

### Initial Steps:
1. Visit https://genomevolution.org/r/lg3t
2. Download Raw Data  
   "__Links and Downloads__ click here to see more..." -> "Results with synonymous/non-synonymous rate values"
3. Rename file to "mutation-data.txt" (for simplicity's sake)

In [3]:
# Load our Ks & Kn values from the raw data file using pandas' 'read_table' function.
vals = pd.read_table('mutation-data.txt',      # input filepath
                     sep='\t',                 # value separator
                     header=None,              # no header
                     usecols=[0,1],            # use only first two columns
                     names=["Ks", "Kn"],       # name our columns for easy access
                     skip_blank_lines=True,    # ignore any empty lines
                     comment='#')              # ignore any content prefaced with a '#'
vals.head()

|   | Ks     | Kn     |
|---|--------|--------|
| 0 | 1.3716 | 0.2046 |
| 1 | 0.1777 | 0.0690 |
| 2 | 0.4467 | 0.0684 |
| 3 | 0.2366 | 0.0602 |
| 4 | 0.1075 | 0.0180 |
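
To see what the parsing options do without downloading the file, the same call can be tried on a small in-memory sample. This is a sketch under assumptions: the sample layout (a `#`-prefixed header line, tab-separated fields, Ks and Kn in the first two columns) and the gene identifiers are invented for illustration and may not match the real download exactly. It uses `pd.read_csv` with `sep='\t'`, which behaves like `read_table` here.

```python
import io
import pandas as pd

# Assumed layout: '#' comment line, tab-separated fields, Ks and Kn first.
# The gene identifiers are made up for illustration.
sample = (
    "#Ks\tKn\tgene_a\tgene_b\n"
    "1.3716\t0.2046\tgeneA1\tgeneB1\n"
    "\n"
    "0.1777\t0.069\tgeneA2\tgeneB2\n"
)

demo = pd.read_csv(io.StringIO(sample),
                   sep='\t',               # tab-separated values
                   header=None,            # no header row in the data
                   usecols=[0, 1],         # keep only the first two columns
                   names=['Ks', 'Kn'],     # label the kept columns
                   skip_blank_lines=True,  # drop the empty line
                   comment='#')            # drop the commented line
print(demo)
```

Only the two data rows should survive: the comment line and the blank line are skipped, and the extra columns are dropped.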


In [4]:
vals.shape

(21475, 2)
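
The Kn to Ks ratio described in the introduction can be computed directly from these two columns. The sketch below uses only the five rows shown by `vals.head()`; the `pairs` frame and the `Kn_Ks` column name are hypothetical. By the usual convention, a ratio below 1 suggests purifying selection, a ratio near 1 suggests neutral evolution, and a ratio above 1 suggests positive selection.

```python
import pandas as pd

# The five gene pairs shown by vals.head() above.
pairs = pd.DataFrame({
    'Ks': [1.3716, 0.1777, 0.4467, 0.2366, 0.1075],
    'Kn': [0.2046, 0.0690, 0.0684, 0.0602, 0.0180],
})

# Skip pairs with Ks == 0 to avoid dividing by zero.
nonzero = pairs[pairs['Ks'] > 0].copy()

# 'Kn_Ks' is a hypothetical column name for the per-pair ratio.
nonzero['Kn_Ks'] = nonzero['Kn'] / nonzero['Ks']
print(nonzero)

# True if every pair in this sample has a ratio below 1.
print((nonzero['Kn_Ks'] < 1).all())
```

The same two lines would apply to the full `vals` dataframe, assuming it was loaded as shown earlier.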

In [5]:
# Create a 'trace' for each of our datasets of interest.
ks_trace = go.Histogram(name="Ks",           # give the dataset a descriptive name.
                        x=vals['Ks'],        # select our 'Ks' column from the dataframe as our X-values.
                        histnorm='percent')  # normalize by percentage.

kn_trace = go.Histogram(name="Kn", 
                        x=vals['Kn'], 
                        histnorm='percent', 
                        visible=False)

logks_trace = go.Histogram(name='log10(Ks)', 
                           x=[log10(k) for k in vals['Ks'] if k > 0],   # log-transform positive values only.
                           histnorm='percent', 
                           visible=False)

logkn_trace = go.Histogram(name='log10(Kn)', 
                           x=[log10(k) for k in vals['Kn'] if k > 0],   # log10 is undefined for values <= 0.
                           histnorm='percent', 
                           visible=False)

# Compile all traces into a list.
data = [ks_trace, kn_trace, logks_trace, logkn_trace]
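
As an alternative to the list comprehensions used for the log traces, the transform can be vectorized with NumPy. This is a sketch rather than a drop-in replacement: the `ks` series below is a small stand-in for `vals['Ks']`, and its `0.0` and `1.0` entries are made-up values added to show the filtering.

```python
import numpy as np
import pandas as pd

# Stand-in for vals['Ks']; 0.0 and 1.0 are made-up illustration values.
ks = pd.Series([1.3716, 0.1777, 0.0, 1.0], name='Ks')

# log10 is undefined for values <= 0, so filter before transforming.
positive = ks[ks > 0]
log_ks = np.log10(positive)
print(log_ks)
```

The result keeps the original row labels, so it is easy to see which gene pairs were dropped by the filter.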

In [6]:
# Define our plot's Layout()
layout = go.Layout(
    title = 'Mutation Ratios of Syntenic Gene Pairs<br>Maize & Sorghum',   # give our plot a title.
    xaxis = {'title': 'Ratio'},                                            # give our x-axis a label.
    yaxis = {'title': 'Percentage of Gene Pairs'},                         # give our y-axis a label.
    updatemenus=list([                                                     # create a dropdown menu.
        dict(
            x=-0.05,          # define our dropdown menu coordinates.
            y=1,
            yanchor='top',    # anchor coordinates at top.
            buttons=list([
                dict(
                    args=['visible', [True, False, False, False]],  # set an argument & the values for each trace.
                    label='Ks',                                     # set the display name on our dropdown.
                    method='restyle'                                # use the 'restyle' function (updates data).
                ),
                dict(
                    args=['visible', [False, True, False, False]],
                    label='Kn',
                    method='restyle'
                ),
                dict(
                    args=['visible', [False, False, True, False]],
                    label='log10(Ks)',
                    method='restyle'
                ),
                dict(
                    args=['visible', [False, False, False, True]],
                    label='log10(Kn)',
                    method='restyle'
                )
            ])
        )
    ]),
)
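
The four button dictionaries above differ only in their label and in which position of the visibility list is `True`. They could be generated in a loop, as in the sketch below. `make_visibility_buttons` is a hypothetical helper written for this illustration; it is plain Python and not part of Plotly.

```python
def make_visibility_buttons(labels):
    """Hypothetical helper: build one 'restyle' button per trace label."""
    buttons = []
    for i, label in enumerate(labels):
        # Show only the i-th trace; hide all the others.
        mask = [j == i for j in range(len(labels))]
        buttons.append(dict(args=['visible', mask],
                            label=label,
                            method='restyle'))
    return buttons

buttons = make_visibility_buttons(['Ks', 'Kn', 'log10(Ks)', 'log10(Kn)'])
for button in buttons:
    print(button['label'], button['args'][1])
```

The resulting list could be passed as the `buttons` value of the `updatemenus` entry, provided the labels are given in the same order as the traces in `data`.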

In [7]:
fig = go.Figure(data=data, layout=layout)  # compile our data & layout into a figure.
pyo.iplot(fig, filename='datasci-demo')    # using the offline module, create an inline plot.