# HW2
When collecting and analyzing data, you will not always have a complete dataset or every tool readily at your disposal. This homework is meant to teach flexibility: how to look up unfamiliar Python functions and put them to use. Here we introduce pandas as a way to read incomplete data.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%config InlineBackend.figure_format = 'retina'  # use this for hi-dpi displays
%matplotlib inline

In [2]:
sns.set_style('whitegrid')  # set the plotting style

# Dataset description

> This experimental dataset has been acquired by **Yazan Alhadid** (yalhadid@ucla.edu).
> For more information on the science behind this kind of measurement see:
> - A Novel Initiation Pathway in Escherichia Coli Transcription, (2016), Lerner/Chung et al. 
> doi:[10.1101/042432](http://dx.doi.org/10.1101/042432) 

The dataset below contains two experimental kinetic curves representing DNA transcription by RNA polymerase (RNAP)
starting from two different initial states, ITC2 and ITC7 (meaning that the RNAP has transcribed
2 or 7 nucleotides, respectively). Interestingly,
the ITC7 kinetics are slower than the ITC2 kinetics.

Each number in the table is the result of an smFRET measurement and represents
a transcription efficiency measured after a certain amount of time.
Note that for some time points the measurement was performed in only
one configuration (either ITC2 or ITC7), so there are missing data points (NaN, not-a-number).

When fitting this dataset, we need to make sure we handle the NaN values correctly.
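As a minimal sketch of what handling NaN can look like in pandas, the example below builds a small table with gaps and removes them one column at a time. The numbers and the name `toy` are invented for illustration and are not the homework data; only the column names ITC2 and ITC7 come from the description above.

```python
import numpy as np
import pandas as pd

# Toy values invented for illustration; they are NOT the homework data.
toy = pd.DataFrame(
    {"ITC2": [0.10, 0.35, np.nan, 0.80],
     "ITC7": [0.05, np.nan, 0.40, 0.60]},
    index=pd.Index([1.0, 2.0, 5.0, 10.0], name="time"),
)

# Count the missing values in each column.
n_missing = toy.isna().sum()

# Drop NaN one column at a time, so that a gap in ITC7 does not
# discard a valid ITC2 point (and vice versa).
itc2 = toy["ITC2"].dropna()
itc7 = toy["ITC7"].dropna()

print(n_missing)
print(itc2)
```

Calling `toy.dropna()` on the whole table would instead keep only the rows where both columns are present, which throws away usable measurements when each curve is fitted separately.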


## Data format

The data were saved from Excel in CSV (comma-separated values) format.
We will load them into a pandas DataFrame, a Python object designed
to hold tabular data.

### Difference between DataFrame and array

A numpy array is a container for data of a uniform type (for example integers, floats, etc.).
A numpy array can be 1-D (a vector), 2-D or N-D.
Conversely, a pandas DataFrame is a table (similar to a 2-D array)
in which each column can be of a different type.

In a 2-D array, you access the columns (and rows)
with a numerical index (0, 1, ...), so you need to know the meaning of
each row/column.

Conversely, in a pandas DataFrame each column has a name, and you can select
a column by name. A DataFrame also has a special set of row labels called the Index, displayed as the leftmost column.
The index is used to select rows in the table. In our example the index is the time axis
that is common to the two columns representing the two sets of measurements.
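The difference between positional and named access can be sketched as follows. The numbers and the names `arr` and `df_toy` are made up for illustration; they are not the homework data.

```python
import numpy as np
import pandas as pd

# Toy numbers invented for illustration. In the array, you have to
# remember that column 0 is time, column 1 is ITC2, column 2 is ITC7.
arr = np.array([[1.0, 0.10, 0.05],
                [2.0, 0.35, 0.20],
                [5.0, 0.60, 0.40]])
second_column = arr[:, 1]        # positional access only

# The same numbers as a DataFrame with named columns and a time index.
df_toy = pd.DataFrame(arr[:, 1:], columns=["ITC2", "ITC7"],
                      index=pd.Index(arr[:, 0], name="time"))
itc2 = df_toy["ITC2"]            # select a column by name
row_at_2 = df_toy.loc[2.0]       # select a row by its index label

print(df_toy)
```

Both `second_column` and `itc2` hold the same values, but only the DataFrame version records what those values mean.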

In [3]:
#Please read in the csv file provided and show the data table using pandas.
#If you are having trouble, try googling the answer.




## Making plots from DataFrames

### Method 1: use DataFrame.plot

A nice feature of DataFrames is that they can be quickly plotted:

In [4]:
#Try the plot method of your imported data below. E.g. if "d" is the DataFrame you imported, simply call "d.plot()"

Note that the data of the two columns are plotted against the time axis (the index)
and the two columns are automatically labeled in the legend. Also the x-axis has
been labeled with the name associated with the index column.
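To see this behavior without the homework file, here is a small sketch on made-up numbers (the name `toy` and its values are assumptions for illustration only). The `marker="o"` argument is an optional choice, passed through to matplotlib, that keeps isolated points visible where NaN values interrupt the connecting line.

```python
import numpy as np
import pandas as pd

# Toy values invented for illustration; they are NOT the homework data.
toy = pd.DataFrame(
    {"ITC2": [0.10, 0.35, np.nan, 0.80],
     "ITC7": [0.05, np.nan, 0.40, 0.60]},
    index=pd.Index([1.0, 2.0, 5.0, 10.0], name="time"),
)

# DataFrame.plot returns a matplotlib Axes, which can be customized further.
ax = toy.plot(marker="o")
ax.set_ylabel("transcription efficiency")
```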


> **Note for advanced users**
>
>In the command above, we used `df.plot`, meaning that the function `plot` comes from
>the DataFrame itself. Functions of this kind, which belong to an object (and perform some
>operation on that object), are called methods. In other words, DataFrame has a plot method,
>so it knows how to plot itself.

# Fitting the data

To fit the data we use the `lmfit` library.

In [5]:
import lmfit
print('lmfit version:', lmfit.__version__)

lmfit version: 0.9.5


## Defining the model

We want to fit the data with an exponential curve that starts at 0 for $t=0$ and goes
asymptotically to a value > 0 as $t \to \infty$.

What kind of function satisfies this? Can you find one in the literature?
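One candidate, offered as a sketch rather than as the required answer, is a saturating exponential $A\,(1 - e^{-t/\tau})$. The function name `saturating_exp` and the parameter names `amplitude` and `tau` are chosen here for illustration; the check below only confirms that this form meets the two conditions stated above.

```python
import numpy as np

def saturating_exp(t, amplitude, tau):
    """Rise from 0 at t=0 toward `amplitude` with time constant `tau`."""
    return amplitude * (1 - np.exp(-t / tau))

# Evaluate at t=0, at one time constant, and at a very long time.
t = np.array([0.0, 1.0, 1e6])
y = saturating_exp(t, amplitude=0.8, tau=1.0)
print(y)
```

The first value is exactly 0 and the last is indistinguishable from `amplitude`, so the curve has the required start and asymptote.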

In [6]:
#Define your fit function here and fit it to each of the data sets.
#Plot the data, the fitted function, and your residuals
#Don't forget, you have some NaN values. How will you omit those?

## Fit comparison

As a last step we create a single plot which shows the two datasets and
the fitted curves.

In [7]:
#Plot both data sets with their residuals

Voila! You have just done science!