# pandas - Python Data Analysis Library
`pandas` is a software library written for data manipulation and analysis. It provides the `DataFrame` object for manipulating numerical tables and time series data. A pandas dataframe combines aspects of MATLAB-style indexing with functionality similar to that of the statistical programming language `R`.
Because `pandas` is a third-party library rather than part of Python's standard library, you must always import it first. By convention it is imported under the alias `pd`.

In [None]:
import pandas as pd

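`pandas` provides an easy-to-use function, `read_table`, which is similar to MATLAB's `readtable` function. Here it reads a whitespace-delimited text file: `header=None` says the file has no header row, `index_col=0` makes the first column the row index, and `sep` is set to the regular expression `\s+` so that any run of whitespace separates fields.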

In [None]:
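# read a whitespace-delimited table with no header row; the first column becomes the index
# (a raw string r'\s+' avoids Python's invalid-escape-sequence warning)
df = pd.read_table('data/GlobalTempbyMonth.txt', header=None, index_col=0, sep=r'\s+')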
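To see what those three arguments do without needing the data file, here is a minimal sketch that parses a few lines of text from memory. The contents of `raw` and the name `df_demo` are hypothetical: the values are made up and the real file's layout is assumed, not known.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/GlobalTempbyMonth.txt (made-up values)
raw = """\
2018/01   0.78  0.81  0.75
2018/02   0.80  0.79  0.83
2018/03   0.91  0.95  0.89
"""

df_demo = pd.read_table(
    io.StringIO(raw),
    header=None,   # no header row: columns get integer labels 0, 1, 2, ...
    index_col=0,   # column 0 (the dates) becomes the row index
    sep=r"\s+",    # any run of whitespace separates fields
)

print(df_demo.columns.tolist())  # [1, 2, 3]
print(df_demo.index.tolist())    # ['2018/01', '2018/02', '2018/03']
```

Because column 0 is consumed as the index, the remaining columns keep their original integer labels, starting at 1. The dates stay plain strings unless you ask pandas to parse them.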

In [None]:
# show data
df

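Notice that the dataframe has row and column labels, which can be either strings or numbers. Because we passed `header=None`, the columns are labeled with integers (starting at 1, since column 0 became the index). Because of `index_col=0`, the row index of `df` is the first column of the file: the dates.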

In [None]:
# show index
df.index

## Indexing into a dataframe
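The two methods that are essential for accessing information in a dataframe are `loc` and `iloc`. `loc` selects by label: the row and column names, which may be strings or numbers (the columns here are labeled with integers). `iloc` selects by integer position, starting at 0, regardless of the labels. Using `iloc` lets you index a dataframe much like a matrix in MATLAB, except that positions start at 0 rather than 1.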

In [None]:
# all columns of the row labeled '2018/03'
df.loc['2018/03']

In [None]:
# a single value: row labeled '2018/03', column labeled 1
df.loc['2018/03',1]

In [None]:
# every row of the fourth column by position (position 3)
df.iloc[:,3]
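The difference between labels and positions matters most when the column labels are themselves integers. A minimal sketch on a small hypothetical frame (`df_demo` and its values are made up) whose columns are labeled 1, 2, 3, as `header=None` with `index_col=0` produces. Under that assumption, `df.iloc[:, 3]` above selects the column labeled 4, not the one labeled 3.

```python
import pandas as pd

# Hypothetical frame: integer column labels 1, 2, 3 sit at positions 0, 1, 2
df_demo = pd.DataFrame(
    [[0.78, 0.81, 0.75],
     [0.80, 0.79, 0.83],
     [0.91, 0.95, 0.89]],
    index=["2018/01", "2018/02", "2018/03"],
    columns=[1, 2, 3],
)

# loc uses labels: row '2018/03', column labelled 1
print(df_demo.loc["2018/03", 1])   # 0.91
# iloc uses positions: third row, first column, i.e. the same cell
print(df_demo.iloc[2, 0])          # 0.91
# The column labelled 1 is the column at position 0
print(df_demo.loc[:, 1].equals(df_demo.iloc[:, 0]))  # True

# loc slices include the end label; iloc slices exclude the end position
print(len(df_demo.loc["2018/01":"2018/03"]))  # 3
print(len(df_demo.iloc[0:2]))                  # 2
```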

## Adding to Dataframes
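You can add a column to a dataframe by assigning to a new column name, as below. Dataframes also come with built-in methods such as `mean`, `min`, and `max` that operate on the whole dataframe at once. The `axis` argument sets the direction: `axis=0` aggregates down each column, while `axis=1` aggregates across each row, giving one value per date here.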

In [None]:
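# add summary columns computed from the original data columns only, so that
# 'average' is not fed into 'min', and 'average'/'min' are not fed into 'max'
data_cols = list(df.columns)  # snapshot taken before any summary column exists
df['average'] = df[data_cols].mean(axis=1)  # row-wise mean
df['min'] = df[data_cols].min(axis=1)       # row-wise minimum
df['max'] = df[data_cols].max(axis=1)       # row-wise maximum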
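A minimal sketch of the `axis` argument on a tiny hypothetical frame (`df_demo`, `data_cols`, and the values are made up for illustration). It also shows the pattern of computing every summary from a fixed list of data columns, so that columns added earlier do not leak into later summaries.

```python
import pandas as pd

# Hypothetical data: two rows (dates) and three numeric columns
df_demo = pd.DataFrame(
    {1: [1.0, 4.0], 2: [2.0, 5.0], 3: [3.0, 6.0]},
    index=["2018/01", "2018/02"],
)

# axis=0 aggregates down each column: one result per column
col_means = df_demo.mean(axis=0)   # 1 -> 2.5, 2 -> 3.5, 3 -> 4.5
# axis=1 aggregates across each row: one result per row
row_means = df_demo.mean(axis=1)   # '2018/01' -> 2.0, '2018/02' -> 5.0

# Snapshot the data columns before adding any summary columns
data_cols = list(df_demo.columns)
df_demo["average"] = df_demo[data_cols].mean(axis=1)
df_demo["min"] = df_demo[data_cols].min(axis=1)
df_demo["max"] = df_demo[data_cols].max(axis=1)
print(df_demo)
```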

In [None]:
df['average']

In [None]:
# sort by average
df2 = df.sort_values(by='average')

In [None]:
df2
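`sort_values` returns a new, sorted dataframe and leaves the original untouched; by default it sorts in ascending order. A minimal sketch of the descending variant and of `nlargest`, a shortcut for the top rows by one column, on hypothetical data (`df_demo` and its values are made up):

```python
import pandas as pd

# Hypothetical row averages
df_demo = pd.DataFrame(
    {"average": [0.78, 0.95, 0.62]},
    index=["2018/01", "2018/02", "2018/03"],
)

asc = df_demo.sort_values(by="average")                    # smallest first (default)
desc = df_demo.sort_values(by="average", ascending=False)  # largest first
top2 = df_demo.nlargest(2, "average")                      # the two largest rows

print(asc.index.tolist())   # ['2018/03', '2018/01', '2018/02']
print(desc.index.tolist())  # ['2018/02', '2018/01', '2018/03']
print(top2.index.tolist())  # ['2018/02', '2018/01']
```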

In [None]:
# attribute-style access to a column; equivalent to df['average']
df.average
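Attribute access only works when the column name is a valid Python identifier that does not clash with an existing dataframe attribute or method. The integer-labeled columns cannot be reached this way, and neither can the `min` and `max` columns added above, because `df.min` and `df.max` are methods. A minimal sketch on a hypothetical frame (`df_demo` and its values are made up):

```python
import pandas as pd

df_demo = pd.DataFrame({"average": [0.5, 0.7], "min": [0.1, 0.2]})

# 'average' does not clash with anything, so both forms give the same column
print(df_demo.average.equals(df_demo["average"]))  # True

# 'min' clashes with the DataFrame.min method: the attribute is the method,
# while bracket access always returns the column
print(callable(df_demo.min))    # True
print(df_demo["min"].tolist())  # [0.1, 0.2]
```

Bracket access is the safer default, especially when creating columns: assigning to a new attribute name does not add a column.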

## The Sky's the Limit
Dataframes can be indexed with boolean (logical) expressions, much like logical indexing in MATLAB. This is useful for filtering the rows that meet a condition.

In [None]:
# boolean indexing: keep only the rows whose average exceeds 0.9
df[df.average>0.9]
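The expression inside the brackets is a boolean Series aligned on the row index. A minimal sketch of combining conditions and of passing the mask to `loc` together with a column selection, on hypothetical data (`df_demo`, `mask`, and the values are made up):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"average": [0.78, 0.95, 0.91, 0.62], "max": [0.9, 1.1, 1.0, 0.7]},
    index=["2017/12", "2018/01", "2018/02", "2018/03"],
)

mask = df_demo.average > 0.9   # boolean Series, one entry per row
print(mask.tolist())           # [False, True, True, False]

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
# because these operators bind more tightly than comparisons like > and <
both = df_demo[(df_demo.average > 0.9) & (df_demo["max"] < 1.05)]
print(both.index.tolist())     # ['2018/02']

# loc takes the mask for the rows and a label for the column
print(df_demo.loc[mask, "average"].tolist())  # [0.95, 0.91]
```

Note the use of `df_demo["max"]` rather than `df_demo.max`, which would be the method.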

You can manipulate, transform, and group the contents of a dataframe using any of the tools in Python. Can you explain what the following lines do?

In [None]:
# group by year and calculate the average temperature
df['year'] = list(map(lambda x:x[:4], df.index))
year_average = df.groupby(df.year).average.mean()
year_average
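A related formulation, sketched on a small hypothetical frame (`df_demo`, `summary`, and the values are made up): the `.str` accessor slices every index label at once, and `agg` computes several statistics per group in one call. This assumes, as above, that the index labels are strings of the form `'YYYY/MM'`.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"average": [0.5, 0.75, 1.0, 1.25]},
    index=["2017/11", "2017/12", "2018/01", "2018/02"],
)

# Vectorized string slicing: first four characters of each index label
df_demo["year"] = df_demo.index.str[:4]

# One row per year, with several aggregations of 'average' at once
summary = df_demo.groupby("year")["average"].agg(["mean", "min", "max"])
print(summary)
```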