# Introduction to Data Analysis in Python
---
## R. Burke Squires

### NIAID Bioinformatics and Computational Biosciences Branch

# Outline:

- Pandas
    - Importing Data
    - Removing missing values
    - Fun with Columns
    - Filtering
    - Grouping
    - Plotting
    - Getting data out
    - Reading and writing to Excel

---

# An Introduction to Pandas

**Presentation originally developed by Michael Hansen, modified slightly by Jeff Shelton**

**pandas** is a Python package providing fast, flexible, and expressive data structures designed to make working with *relational* or *labeled* data both easy and intuitive. It is a fundamental high-level building block for doing practical, real-world data analysis in Python. 

pandas is well suited for:

- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

## Key features:
    
- Easy handling of **missing data**
- **Size mutability**: columns can be inserted and deleted from DataFrame and higher dimensional objects
- Automatic and explicit **data alignment**: objects can be explicitly aligned to a set of labels, or the data can be aligned automatically
- Powerful, flexible **group by functionality** to perform split-apply-combine operations on data sets
- Intelligent label-based **slicing, fancy indexing, and subsetting** of large data sets
- Intuitive **merging and joining** data sets
- Flexible **reshaping and pivoting** of data sets
- **Hierarchical labeling** of axes
- Robust **IO tools** for loading data from flat files, Excel files, databases, and HDF5
- **Time series functionality**: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.

## Data Source

When dealing with numeric matrices and vectors in Python, [NumPy](http://www.numpy.org/) makes life a lot easier.
For more complex data, however, it leaves a lot to be desired.
If you're used to working with [data frames in R](http://www.r-tutor.com/r-introduction/data-frame), doing data analysis directly with NumPy feels like a step back.

Fortunately, some nice folks have written the [Python Data Analysis Library](http://pandas.pydata.org/) (a.k.a. pandas).
Pandas provides an R-like `DataFrame`, produces high quality plots with [matplotlib](http://matplotlib.org/), and integrates nicely with other libraries that expect NumPy arrays.

In this tutorial, we'll go through the basics of pandas using a year's worth of weather data from [Weather Underground](http://www.wunderground.com/).
Pandas has a **lot** of functionality, so we'll only be able to cover a small fraction of what you can do.
Check out the (very readable) [pandas docs](http://pandas.pydata.org/pandas-docs/stable/) if you want to learn more.

In [None]:
from IPython.display import HTML
HTML('<iframe src="http://pandas.pydata.org" width="800" height="350"></iframe>')

## Getting Started

OK, let's get started by importing the pandas library. If pandas isn't installed yet, you can install it with conda first (skip this cell otherwise):

In [None]:
!conda install -y pandas

In [None]:
import pandas

Next, let's read in [our data](data/weather_year.csv).
Because it's in a CSV file, we can use pandas' `read_csv` function to pull it directly into a [DataFrame](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe).

In [None]:
!head data/weather_year.csv

In [None]:
data = pandas.read_csv("data/weather_year.csv")

In [None]:
type(data)

To read the documentation for `read_csv`, run `pandas.read_csv?` in IPython, or call `help()`:

In [None]:
help(pandas.read_csv)

We can get a summary of the DataFrame by asking for some information:

In [None]:
data.info()

This gives us a lot of insight. 

- First, we can see that there are 366 rows (entries) -- a year and a day's worth of weather. Each column is printed along with however many "non-null" values are present.
- We'll talk more about [null (or missing) values in pandas](http://pandas.pydata.org/pandas-docs/stable/missing_data.html) later, but for now we can note that only the "Max Gust SpeedMPH" and "Events" columns have fewer than 366 non-null values.
- Lastly, the data types (dtypes) of the columns are printed at the very bottom. We can see that there are 4 `float64`, 16 `int64`, and 3 `object` columns.

In [None]:
len(data)

Using `len` on a DataFrame will give you the number of rows. You can get the column names using the `columns` property.

In [None]:
data.columns

Columns can be accessed in two ways. The first is using the DataFrame like a dictionary with string keys:

In [None]:
data["EDT"]

You can get multiple columns out at the same time by passing in a list of strings.

In [None]:
data[["EDT", "Mean TemperatureF"]]

## Dot Notation

The second way to access columns is using the dot syntax. This only works if:
- the column name is also a valid Python variable name (e.g., no spaces), and 
- it doesn't collide with an existing DataFrame attribute or method name (e.g., `count`, `sum`).

In [None]:
data.EDT

We'll mostly use the dot syntax here because IPython can auto-complete the names. The first pandas method we'll learn about is `head()`, which returns the first 5 items in a column (or the first 5 rows of a DataFrame).

In [None]:
data.EDT.head()

Passing in a number `n` gives us the first `n` items in the column. There is also a corresponding `tail()` method that gives the *last* `n` items or rows.
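
Here is a minimal sketch of `head(n)` and `tail(n)` on a small DataFrame; the column names mimic the weather file, but the values are made up for illustration:

```python
import pandas

# Made-up values; only the column names mimic the weather file
df = pandas.DataFrame({
    "EDT": ["2012-03-10", "2012-03-11", "2012-03-12", "2012-03-13",
            "2012-03-14", "2012-03-15", "2012-03-16"],
    "Max TemperatureF": [56, 48, 58, 75, 79, 80, 67],
})

first_three = df.EDT.head(3)  # first 3 items of the column
last_two = df.EDT.tail(2)     # last 2 items of the column
last_rows = df.tail()         # last 5 rows of the DataFrame (n defaults to 5)

print(first_three)
print(last_two)
print(last_rows)
```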

In [None]:
data[['EDT', 'Max TemperatureF']].max()

This also works with the dictionary syntax.

In [None]:
data["Mean TemperatureF"].head()

In [None]:
data.info()

In [None]:
data[" Mean Humidity"]

## Exercise 1:

How would we get the second to last date (EDT) in the dataset?

In [None]:
data.EDT[len(data) - 2]  # index len(data) - 1 is the last date

If the data in the column is numeric, you can use `describe()` to get some stats on it.

In [None]:
data["Mean TemperatureF"].describe()

In [None]:
data.describe()

## Fun with Columns

The column names in `data` are a little unwieldy, so we're going to rename them. First we will automate the renaming, and then, to make the names easier to read, we will assign a new list of column names to the `columns` property of the DataFrame.

In [None]:
data.columns

In [None]:
def rename_dataframe_columns(df):
    """
    This function renames columns by stripping leading spaces, replacing
    the remaining spaces with underscores, and making everything lower case.
    The DataFrame is modified in place.

    df: pandas DataFrame as input
    """
    cols = df.columns
    new_column_names = []

    for col in cols:
        # strip leading spaces, make lowercase, replace spaces with underscores
        new_col = col.lstrip().lower().replace(" ", "_")
        new_column_names.append(new_col)

    df.columns = new_column_names

In [None]:
rename_dataframe_columns(data)

In [None]:
data.columns
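
The same renaming can also be sketched without an explicit loop, using the vectorized `.str` string methods that pandas provides on the column `Index`. The column names below are hypothetical stand-ins for the raw weather headers:

```python
import pandas

# Hypothetical raw headers (note the leading space, as in " Mean Humidity")
df = pandas.DataFrame(columns=["EDT", "Max TemperatureF", " Mean Humidity"])

# Same steps as rename_dataframe_columns, applied to the whole Index at once:
# strip leading spaces, lower-case, then replace spaces with underscores
df.columns = (
    df.columns.str.lstrip()
    .str.lower()
    .str.replace(" ", "_", regex=False)
)
print(list(df.columns))
```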

Instead of these automatically generated names, we will assign a shorter name to every column:

In [None]:
data.columns = ["date", "max_temp", "mean_temp", "min_temp", "max_dew",
                "mean_dew", "min_dew", "max_humidity", "mean_humidity",
                "min_humidity", "max_pressure", "mean_pressure",
                "min_pressure", "max_visibility", "mean_visibility",
                "min_visibility", "max_wind", "mean_wind", "min_wind",
                "precipitation", "cloud_cover", "events", "wind_dir"]

The new names must be listed in the same order as the original columns.

To rename only one or a few columns, pass a dictionary that maps old names to new names to `rename()`:

    data = data.rename(columns={
        'col1 old name': 'col1 new name',
        'col2 old name': 'col2 new name',
        'col3 old name': 'col3 new name'
    })

Let's take another look at our DataFrame summary.

In [None]:
data.info()

Now our columns can all be accessed using the dot syntax!

In [None]:
data.max_temp.head()

There are lots of useful methods on columns, such as `std()` to get the standard deviation. Most of pandas' methods will happily ignore missing values like `NaN`.

In [None]:
data.mean_temp.std()

Some methods, like `plot()` and `hist()`, produce plots using [matplotlib](http://matplotlib.org/).

To make plots using Matplotlib, you must first enable IPython's matplotlib mode. To do this, run the `%matplotlib inline` magic command to enable plotting in the current Notebook. \[If that doesn't work (because you have an older version of IPython), try `%pylab inline`. You may also have to restart the IPython kernel.\]

We'll go over plotting in more detail later.

In [None]:
%matplotlib inline
data.mean_temp.hist()

If you want to add labels and save the plot as a `png` file that is 800 by 600 pixels (8 by 6 inches at 100 dots per inch):

In [None]:
ax = data.mean_temp.hist()   # get plot axes object

ax.set_xlabel('Daily Mean Temperature (F)')
ax.set_ylabel('# of Occurrences')
ax.set_title('Mean Temperature Histogram')

fig = ax.get_figure()        # get plot figure object
fig.set_size_inches(8, 6)    # set plot size in inches
fig.savefig('MeanTempHistogram.png', dpi=100)  # 8 in x 100 dpi = 800 px, 6 in x 100 dpi = 600 px

In [None]:
!ls -l

In [None]:
type(ax)

In [None]:
data.mean_temp.hist?

In [None]:
ax = data.mean_temp.hist(bins=20)

ax.set_xlabel("Daily Mean Temperature (F)")
ax.set_ylabel("# of Occurrences")
ax.set_title("Mean Temperature")

fig = ax.get_figure()
fig.set_size_inches(8,6)
fig.savefig('MeanTemperature.jpg', dpi=300)

By the way, many of the column-specific methods also work on the entire DataFrame. Instead of a single number, you'll get a result for each column.

In [None]:
data.std(numeric_only=True)  # one standard deviation per numeric column

## Exercise 2:

What is the range of temperatures in the dataset?

*Hint: columns have `max()` and `min()` methods.*

In [None]:
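lowest = data.min_temp.min()    # coldest daily low
highest = data.max_temp.max()   # hottest daily high
lowest, highest, highest - lowest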

## Bulk Operations with `apply()`

Methods like `sum()` and `std()` work on entire columns. We can run our own functions across all values in a column (or row) using `apply()`.

To give you an idea of how this works, let's consider the "date" column in our DataFrame (formerly "EDT").

In [None]:
data.date.head()

We can use the `values` property of the column to get a NumPy array of the column's values. Inspecting the first value reveals that these are strings with a particular format.

In [None]:
first_date = data.date.values[0]
first_date

In [None]:
type(first_date)

The `strptime` function from the `datetime` module will make quick work of this date string. There are many [more format codes available](http://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior) for `strptime`.

In [None]:
# Import the datetime class from the datetime module
from datetime import datetime

# Convert date string to datetime object
datetime.strptime(first_date, "%Y-%m-%d")

Using the `apply()` method, which takes a function (**without** the parentheses), we can apply `strptime` to each value in the column. We'll overwrite the string date values with their Python `datetime` equivalents.

In [None]:
# Define a function to convert strings to dates
def string_to_date(date_string):
    return datetime.strptime(date_string, "%Y-%m-%d")

# Run the function on every date string and overwrite the column
data.date = data.date.apply(string_to_date)
data.date.head()
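
As an alternative to `apply()` with `strptime`, pandas' own `to_datetime()` can parse a whole column of date strings in one vectorized call. A minimal sketch, using a few hypothetical date strings in the same `%Y-%m-%d` format:

```python
import pandas

# Hypothetical date strings in the same format as the original EDT column
dates = pandas.Series(["2012-03-10", "2012-03-11", "2012-03-12"])

# Parse the whole column at once instead of calling strptime row by row
parsed = pandas.to_datetime(dates, format="%Y-%m-%d")
print(parsed.dtype)  # a datetime64 dtype
```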

In [None]:
data.info()

Let's go one step further. Each row in our DataFrame represents the weather from a single day. Each row in a DataFrame is associated with an *index*, which is a label that identifies that row.

Our row indices up to now have been auto-generated by pandas, and are simply integers from 0 to 365. If we use dates instead of integers for our index, we will get some extra benefits from pandas when plotting later on. Overwriting the index is as easy as assigning to the `index` property of the DataFrame.

In [None]:
data.index = data.date
data.info()
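
Pandas also provides `set_index()`, which moves a column into the index and removes it from the regular columns in one step, so no separate `drop()` is needed. A sketch on a hypothetical two-day table:

```python
import pandas

# Hypothetical two-day table with an already-parsed date column
df = pandas.DataFrame({
    "date": pandas.to_datetime(["2012-08-18", "2012-08-19"]),
    "max_temp": [86, 83],
})

# Move "date" into the index; it no longer appears among the columns
df = df.set_index("date")
print(df.columns.tolist())
print(df.loc[pandas.Timestamp(2012, 8, 19), "max_temp"])
```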

Now we can quickly look up a row by its date with the `loc[]` property \[[see docs](http://pandas.pydata.org/pandas-docs/stable/indexing.html)], which locates records by label.

In [None]:
data.loc[datetime(2012, 8, 19)]

We can also access a row (or range of rows) with the `iloc[]` property, which locates records by integer position.

In [None]:
data.max_temp.iloc[7:15]

With all of the dates in the index now, we no longer need the "date" column. Let's drop it.

In [None]:
data = data.drop("date", axis=1)
data.columns

Note that we need to pass in `axis=1` in order to drop a column. For more details, check out the [documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html) for `drop`. The index values can now be accessed as `data.index.values`.

## Exercise 3:

Print out the cloud cover for each day in May.

*Hint: you can make datetime objects with the `datetime(year, month, day)` function*

*For extra credit, try using the `date_range()` function; see [pandas.date_range](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.date_range.html)*

In [None]:
datetime(2012, 5, 1)  # May 1st of 2012
print(type(data.index))

In [None]:
pandas.date_range?

In [None]:
#data.cloud_cover.loc[datetime(2012, 5, 1)]
rng = pandas.date_range('5/1/2012','5/31/2012',freq='D') 
data.cloud_cover.loc[rng]
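
With a `DatetimeIndex`, `loc[]` also accepts date strings: a slice between two dates includes both endpoints, and a partial string such as `"2012-05"` selects the whole month. A sketch with made-up cloud-cover values (applied to the real data, the same lookups would go through `data.cloud_cover`):

```python
import pandas

# Made-up daily cloud-cover values (0-8) from April 29 to June 2, 2012
idx = pandas.date_range("2012-04-29", "2012-06-02", freq="D")
cloud_cover = pandas.Series([i % 9 for i in range(len(idx))], index=idx)

# Slice between two date strings; both endpoints are included
may_slice = cloud_cover.loc["2012-05-01":"2012-05-31"]

# Partial date string: every day in May 2012
may_partial = cloud_cover.loc["2012-05"]

print(may_slice.equals(may_partial))
```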

## Handling Missing Values

Pandas considers values like `NaN` and `None` to represent missing data. The `count()` function [[see docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.count.html)] counts the values that are **not** missing, so comparing it with the number of rows shows how many are missing. With `axis=0` (the default), the count runs down the rows, giving one total per column.

In [None]:
data.count(axis=0)

It is pretty obvious that there are a lot of `NaN` entries in the `events` column; 204 to be exact. Let's take a look at a few values from the `events` column:

In [None]:
data.events.head(10)

This isn't exactly what we want. One option is to drop all rows in the DataFrame with missing "events" values using the `dropna()` function \[[see docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html)].

In [None]:
data.dropna(subset=["events"]).info()

Note that this didn't affect `data`; we're just looking at a copy.

Instead of dropping rows with missing values, let's fill them with empty strings (you'll see why in a moment). This is easily done with the `fillna()` function. We'll go ahead and overwrite the "events" column, replacing the `NaN` values with empty strings.

In [None]:
data.events = data.events.fillna("")
data.events.head(10)

In [None]:
data.info()

Now we repeat the `count` function for the `events` column:

In [None]:
data.events.count()

As desired, there are no longer any missing entries in the `events` column. Why did we not need the `axis=0` parameter this time?
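
`count()` tallies the values that are present. To tally the missing ones directly, `isna()` marks every missing cell as `True`, and summing those booleans down each column counts them. A sketch on a small, hypothetical frame:

```python
import numpy as np
import pandas

# Hypothetical frame with two missing "events" values (NaN and None)
df = pandas.DataFrame({
    "max_temp": [56, 48, 58, 75],
    "events": ["Rain", np.nan, None, "Fog-Rain"],
})

present = df.count()       # non-missing values per column
missing = df.isna().sum()  # missing values per column
print(missing)
```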

## Iteratively Accessing Rows

You can iterate over each row in the DataFrame with `iterrows()`. Note that this function returns **both** the index and the row. Also, you must access columns in the row you get back from `iterrows()` with the dictionary syntax.

In [None]:
num_rain = 0
for idx, row in data.iterrows():
    if "Rain" in row["events"]:
        num_rain += 1

"Days with rain: {0}".format(num_rain)

## Exercise 4:

Was there any November rain?

*Hint*: check out the `strftime()` function on `datetime` objects and the [documentation](http://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior)

## Filtering

Most of your time using pandas will likely be devoted to selecting rows of interest from a DataFrame. In addition to strings, the dictionary syntax accepts boolean filters built from conditions like:

In [None]:
data.max_temp <= 32

In [None]:
freezing_days = data[data.max_temp <= 32]
freezing_days.info()

In [None]:
type(freezing_days)

We get back another DataFrame with fewer rows (21 in this case). This DataFrame can be filtered down even more by adding the constraint that the minimum temperature be at least 20 degrees, in addition to the maximum being at or below freezing.

In [None]:
freezing_days.min_temp >= 20

In [None]:
cold_days = freezing_days[freezing_days.min_temp >= 20]
cold_days.info()

To see the high and low temperatures for the selected days:

In [None]:
cold_days[["max_temp","min_temp"]]

Using boolean operations, we could have chosen to apply both filters to the original DataFrame at the same time.

In [None]:
data[(data.max_temp <= 32) & (data.min_temp >= 20)]
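
The same combined filter can also be written as a string expression with `DataFrame.query()`, where `and`/`or` are allowed inside the string. A sketch on made-up temperatures, checking that both forms select the same rows:

```python
import pandas

# Made-up temperatures for four days
df = pandas.DataFrame({"max_temp": [30, 35, 28, 32], "min_temp": [22, 25, 15, 20]})

# Boolean-mask form, as used above
mask_result = df[(df.max_temp <= 32) & (df.min_temp >= 20)]

# Equivalent string-expression form
query_result = df.query("max_temp <= 32 and min_temp >= 20")

print(query_result)
```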

The `isin()` method builds a filter that keeps rows whose value appears in a given list:

In [None]:
temps_of_interest = [32, 31]
data[data.max_temp.isin(temps_of_interest)]


In [None]:
data[data.max_temp.isin([31, 29, 32]) & data.min_temp.isin([21,22,23])]

It's important to understand what's really going on underneath with filtering. Let's look at what kind of object we actually get back when creating a filter.

In [None]:
temp_max = data.max_temp <= 32
type(temp_max)

This is a pandas `Series` object, which is the one-dimensional equivalent of a DataFrame. Because our DataFrame uses datetime objects for the index, the filter's index is a `DatetimeIndex` (very old versions of pandas reported such a `Series` as a specialized `TimeSeries` object).

What's inside the filter?

In [None]:
temp_max

Our filter is nothing more than a `Series` with a *boolean value for every item in the index*. When we "run the filter" like so:

In [None]:
data[temp_max].info()

pandas lines up the rows of the DataFrame and the filter using the index, and then keeps the rows with a `True` filter value. That's it.

Let's create another filter.

In [None]:
temp_min = data.min_temp >= 20
temp_min

Now we can see what the boolean operations are doing. Something like `&` (**not** `and`)...

In [None]:
temp_min & temp_max

...is just lining up the two filters using the index, performing a boolean AND operation, and returning the result as another `Series`.

We can do other boolean operations too, like OR:

In [None]:
temp_min | temp_max

Because the result is just another `Series`, we have all of the regular pandas functions at our disposal. The `any()` function returns `True` if any value in the `Series` is `True`.

In [None]:
temp_both = temp_min & temp_max
temp_both.any()
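
Besides `&` and `|`, the `~` operator negates a filter, and because `True` counts as 1, calling `sum()` on a boolean `Series` counts how many rows pass. A sketch with two hypothetical filters:

```python
import pandas

# Hypothetical filters for five days
cold = pandas.Series([True, False, True, False, True])
dry = pandas.Series([True, True, False, False, True])

both = cold & dry    # element-wise AND
either = cold | dry  # element-wise OR
not_cold = ~cold     # element-wise NOT

print(both.sum())  # number of True values
print(both.all())  # True only if every value is True
```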

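Some conditions, like whether an "events" string contains "Rain", can't be expressed with a simple comparison operator. We can wrap such a test in an `apply()` call fairly easily, though: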

In [None]:
data[data.events.apply(lambda e: "Rain" in e)].info()
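
Pandas also offers vectorized string methods under the `.str` accessor, so a substring test does not need a `lambda`. A sketch with hypothetical event strings; `na=False` makes missing values count as "no match" (in `data`, the missing events were already filled with empty strings):

```python
import numpy as np
import pandas

# Hypothetical "events" strings, including one missing value
events = pandas.Series(["Rain", "", "Fog-Rain", "Snow", np.nan])

# Vectorized substring test; na=False treats the missing value as "no match"
has_rain = events.str.contains("Rain", na=False)
print(has_rain.sum())
```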

## Apply a function to your data

Before starting the exercise, let's convert the precipitation column in the dataset to floating point numbers. It's currently full of strings because of the "T" value, which stands for "trace amount of precipitation."

In [None]:
data.precipitation.head()

We'll replace "T" with a very small number, and convert the rest of the strings to floats:

In [None]:
# Convert precipitation to floating point number
# "T" means "trace of precipitation"
def precipitation_to_float(precip_str):
    if precip_str == "T":
        return 1e-10  # Very small value
    return float(precip_str)

data.precipitation = data.precipitation.apply(precipitation_to_float)
data.precipitation.head()
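
A vectorized alternative, sketched on hypothetical values: `replace()` swaps the exact string `"T"` for a tiny number (still as a string), and `astype(float)` then converts the whole column at once.

```python
import pandas

# Hypothetical precipitation strings, where "T" means a trace amount
precip = pandas.Series(["0.00", "T", "0.27", "1.05"])

# Replace exact "T" entries, then convert every string to a float in one step
as_float = precip.replace("T", "1e-10").astype(float)
print(as_float.tolist())
```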

## Exercise 5:

What was the coldest and hottest it ever got when there was no cloud cover and no precipitation?

## Grouping

Besides `apply()`, another great DataFrame function is `groupby()`.
It will group a DataFrame by one or more columns, and let you iterate through each group.

As an example, let's group our DataFrame by the "cloud_cover" column (a value ranging from 0 to 8).

In [None]:
cover_temps = {}
for cover, cover_data in data.groupby("cloud_cover"):
    cover_temps[cover] = cover_data.mean_temp.mean()  # The mean mean temp!
cover_temps

When you iterate through the result of `groupby()`, you will get a tuple.
The first item is the column value, and the second item is a filtered DataFrame (where the column equals the first tuple value).

You can group by more than one column as well.
In this case, the first tuple item returned by `groupby()` will itself be a tuple with the value of each column.

In [None]:
for (cover, events), group_data in data.groupby(["cloud_cover", "events"]):
    
    print("Cover: {0}, Events: {1}, Count: {2}".format(cover, events, len(group_data)))
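
Looping is handy for inspecting each group, but for simple summaries you can call the aggregation directly on the grouped object. A sketch on a few hypothetical rows:

```python
import pandas

# Hypothetical weather rows
df = pandas.DataFrame({
    "cloud_cover": [0, 0, 3, 3, 3],
    "events": ["", "Rain", "", "Rain", "Rain"],
    "mean_temp": [50.0, 60.0, 40.0, 44.0, 48.0],
})

# One mean temperature per cloud_cover value, without an explicit loop
cover_temps = df.groupby("cloud_cover")["mean_temp"].mean()

# Number of rows for every (cloud_cover, events) combination
counts = df.groupby(["cloud_cover", "events"]).size()

print(cover_temps)
print(counts)
```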

## Creating New Columns

Weather events in our DataFrame are stored in strings like "Rain-Thunderstorm" to represent that it rained and there was a thunderstorm that day. Let's split them out into boolean "rain", "thunderstorm", etc. columns.

First, let's discover the different kinds of weather events we have with `unique()`.

In [None]:
data.events.unique()

Looks like we have "Rain", "Thunderstorm", "Fog", and "Snow" events. Creating a new column for each of these event kinds is a piece of cake with the dictionary syntax.

In [None]:
for event_kind in ["Rain", "Thunderstorm", "Fog", "Snow"]:
    col_name = event_kind.lower()  # Turn "Rain" into "rain", etc.
    data[col_name] = data.events.apply(lambda e: event_kind in e)
data.info()

Our new columns show up at the bottom. We can access them now with the dot syntax.

In [None]:
data.rain.head()

In [None]:
new_df = data[['rain', 'thunderstorm']]
new_df.sum()

In [None]:
type(new_df)

We can also do cool things like find out how many `True` values there are (i.e., how many days had rain)...

In [None]:
data.rain.sum()

...and get all the days that had both rain and snow!

In [None]:
data[data.rain & data.snow].info()

## Exercise 6:

Was the mean temperature more variable on days with rain and snow than on days with just rain or just snow?

*Hint: don't forget the `std()` function*

## Plotting

We've already seen how the `hist()` function makes generating histograms a snap. Let's look at the `plot()` function now.

In [None]:
data.max_temp.plot()

That one line of code did a **lot** for us. First, it created a nice looking line plot using the maximum temperature column from our DataFrame. Second, because we used `datetime` objects in our index, pandas labeled the x-axis appropriately.

Pandas is smart too. If we look at a shorter span of time, the x-axis labels change:

In [None]:
data.max_temp.tail(200).head(100).plot()

In [None]:
data.max_temp.tail()

In [None]:
data.max_temp.tail().plot()

Prefer a bar plot? Pandas has got you covered.

In [None]:
data.max_temp.tail().plot(kind="bar", rot=10)

The `plot()` function returns a matplotlib `Axes` object (shown as `AxesSubplot` in older versions of matplotlib). You can pass this object into subsequent calls to `plot()` in order to compose plots.

Although `plot()` takes a variety of parameters to customize your plot, users familiar with matplotlib will feel right at home with the `Axes` object.

In [None]:
ax = data.max_temp.plot(title="Min and Max Temperatures")
data.min_temp.plot(style="red", ax=ax)
ax.set_ylabel("Temperature (F)")

## Exercise 7:

Add the mean temperature to the previous plot using a green line. Also, add a legend with the `legend()` method of `ax`.

## Getting Data Out

Writing data out in pandas is as easy as getting data in. To save our DataFrame out to a new csv file, we can just do this:

In [None]:
data.to_csv("weather-mod.csv")
#data.to_csv("data/weather-mod.csv")

Want to make that tab separated instead? No problem.

In [None]:
data.to_csv("data/weather-mod.tsv", sep="\t")
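
To check what was written, the file can be read back with `read_csv()` using the same separator. The sketch below uses an in-memory `io.StringIO` buffer in place of a real file and a small hypothetical frame. Note that `read_csv()` turns empty fields into `NaN` by default, so the empty-string events come back as missing values (passing `keep_default_na=False` keeps them as empty strings):

```python
import io

import pandas

# Hypothetical frame with a date index, shaped like the weather data after re-indexing
df = pandas.DataFrame(
    {"max_temp": [74, 78], "events": ["Rain", ""]},
    index=pandas.to_datetime(["2012-03-10", "2012-03-11"]),
)
df.index.name = "date"

buffer = io.StringIO()  # stands in for a real .tsv file on disk
df.to_csv(buffer, sep="\t")

buffer.seek(0)  # rewind before reading
round_trip = pandas.read_csv(buffer, sep="\t", index_col="date", parse_dates=True)
print(round_trip)
```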

There's also support for [reading and writing Excel files](http://pandas.pydata.org/pandas-docs/stable/io.html#excel-files), if you need it.

## Reading Excel Files

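The `read_excel()` function can read Excel 2003 (`.xls`) and Excel 2007+ (`.xlsx`) files and uses the same parsing code as `read_csv()` to convert tabular data into a DataFrame. It relies on an optional reader package: `xlrd` for `.xls` files (older pandas versions also used `xlrd` for `.xlsx`; current versions use `openpyxl`). See the pandas cookbook for some advanced strategies.

Besides `read_excel()`, you can also read Excel files using the `ExcelFile` class. With `read_excel()`, the sheet to read can be given by name or by 0-based index, as the following cells show: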

In [None]:
# using the read_excel function
pandas.read_excel('../data/gapminder_life_expectancy_at_birth.xlsx', sheet_name=0, index_col=None, na_values=[''])

In [None]:
# Using the sheet name:
# mock_dataframe = pandas.read_excel('data/mock_data.xls', sheet_name='Sheet1', index_col=None, na_values=['NA'])

# Using the sheet index:
pandas.read_excel('../data/gapminder_life_expectancy_at_birth.xlsx', sheet_name=0, index_col=None, na_values=['NA'])

# Using all default values:
pandas.read_excel('../data/gapminder_life_expectancy_at_birth.xlsx')

## Miscellanea

We've only covered a small fraction of the pandas library here.
Before I wrap up, however, there are a few miscellaneous tips I'd like to go over.

First, it can be confusing to know when an operation will modify a DataFrame and when it will return a copy to you.
Pandas behavior here is largely inherited from NumPy's view-versus-copy rules, and some situations are unintuitive.

For example, what do you think will happen here?

In [None]:
for idx, row in data.iterrows():
    row["max_temp"] = 0
data.max_temp.head()

Contrary to what you might expect, modifying `row` did **not** modify `data`!
This is because `row` is a copy, and does not point back to the original DataFrame.

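Here's the right way to do it: select the cell with both the row label and the column name in a single `loc[]` call, so the assignment goes to `data` itself.

In [None]:
for idx, row in data.iterrows():
    data.loc[idx, "max_temp"] = 0
(data.max_temp != 0).any()  # Any rows with max_temp not equal to zero?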
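
For a simple update like this, no loop is needed at all. A sketch of two vectorized alternatives on a small, made-up frame: assign a scalar to the whole column, or combine `loc[]` with a boolean mask to change only the rows that match:

```python
import pandas

# Made-up temperatures
df = pandas.DataFrame({"max_temp": [74, 30, 85], "min_temp": [50, 20, 60]})

# Set one column for every row at once
df["max_temp"] = 0

# Set values only where a condition holds, using loc[row_mask, column]
df.loc[df.min_temp < 30, "min_temp"] = 30

print(df)
```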

When using `apply()`, the default behavior is to go over columns.

In [None]:
data.apply(lambda c: c.name)

You can make `apply()` go over rows by passing `axis=1`.

In [None]:
data['pressure_diff'] = data.apply(lambda r: r["max_pressure"] - r["min_pressure"], axis=1)
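
For plain arithmetic between columns, a row-wise `apply()` is usually unnecessary: subtracting one column from another works element by element and is typically much faster. A sketch with hypothetical pressure values, checking that both approaches agree:

```python
import pandas

# Hypothetical pressure readings
df = pandas.DataFrame({"max_pressure": [30.1, 30.3], "min_pressure": [29.9, 30.0]})

# Row-wise apply, as above
row_wise = df.apply(lambda r: r["max_pressure"] - r["min_pressure"], axis=1)

# Same arithmetic on whole columns at once
vectorized = df.max_pressure - df.min_pressure

print(row_wise.equals(vectorized))
```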

When you call `drop()`, though, it's flipped: to drop a column, you need to pass `axis=1`.

In [None]:
data.drop("events", axis=1).columns
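
Recent versions of pandas also let `drop()` take a `columns=` keyword, which avoids having to remember which axis number means what. A sketch on a hypothetical one-row frame:

```python
import pandas

# Hypothetical one-row frame
df = pandas.DataFrame({"max_temp": [74], "events": ["Rain"], "rain": [True]})

# These two calls are equivalent; both return a new DataFrame
by_axis = df.drop("events", axis=1)
by_keyword = df.drop(columns="events")

print(by_keyword.columns.tolist())
```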

## Resources

- [Learn Pandas](https://bitbucket.org/hrojas/learn-pandas)
- [Compute](http://nbviewer.ipython.org/urls/bitbucket.org/hrojas/learn-pandas/raw/master/lessons/Cookbook%20-%20Compute.ipynb)
- [Merge](http://nbviewer.ipython.org/urls/bitbucket.org/hrojas/learn-pandas/raw/master/lessons/Cookbook%20-%20Merge.ipynb)
- [Select](http://nbviewer.ipython.org/urls/bitbucket.org/hrojas/learn-pandas/raw/master/lessons/Cookbook%20-%20Select.ipynb)
- [Sort](http://nbviewer.ipython.org/urls/bitbucket.org/hrojas/learn-pandas/raw/master/lessons/Cookbook%20-%20Sort.ipynb)
- [Timeseries](http://nbviewer.ipython.org/github/changhiskhan/talks/blob/master/pydata2012/pandas_timeseries.ipynb)
- [Statistics in Python](http://www.randalolson.com/2012/08/06/statistical-analysis-made-easy-in-python/)
- [Python for Data Analysis: the landscape of tutorials](http://datacommunitydc.org/blog/2013/07/python-for-data-analysis-the-landscape-of-tutorials/)

## Questions and Answers?

My email address: richard dot squires at nih dot gov
    
ScienceApps at niaid dot nih dot gov