In [None]:
import pandas as pd

A `Series` is a one-dimensional array of labeled data.

In [None]:
series = pd.Series(range(10))

A series has values and an index. Unless one is given, the index is automatically generated as ordinal values starting at 0.

In [None]:
series

The values in the series are stored as a `numpy` array.

In [None]:
series.values

In [None]:
type(series.values)

By default the index holds ordinal values, but `pandas` stores it compactly as a `RangeIndex`, similar to Python's built-in `range`.

In [None]:
series.index
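A short sketch to make the `RangeIndex` point concrete, assuming a reasonably recent `pandas` where `RangeIndex` exposes `start`, `stop`, and `step`. The default index records only where the range starts and stops, yet still behaves like the labels 0 through 9.

```python
import pandas as pd

series = pd.Series(range(10))

# The default index is a RangeIndex: like Python's range, it keeps only
# start, stop, and step instead of storing all ten labels
idx = series.index
print(type(idx).__name__, idx.start, idx.stop, idx.step)

# It still iterates like the ordinal values 0, 1, ..., 9
print(list(idx) == list(range(10)))
```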

The index can also be set explicitly. Here the index is every other uppercase letter.

In [None]:
import string
series = pd.Series(range(10), index=[string.ascii_uppercase[x] for x in range(0, 20, 2)])

In [None]:
series

In [None]:
series.index

The index property can also be assigned after the fact.

In [None]:
series.index = 'zero one two three four five six seven eight nine'.split(' ')

In [None]:
series
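A minimal sketch of reassigning the index, reusing the labels above. The new labels must match the length of the series, and `pandas` raises a `ValueError` otherwise. The `three` and `rejected` variables are only there to make the outcome visible.

```python
import string
import pandas as pd

# Every other uppercase letter: 'A', 'C', 'E', ..., 'S'
series = pd.Series(range(10), index=list(string.ascii_uppercase[0:20:2]))

# Replacing the index works when the number of labels matches
series.index = 'zero one two three four five six seven eight nine'.split()
three = series['three']
print(three)  # 3

# A list of the wrong length is rejected instead of being truncated or padded
rejected = False
try:
    series.index = ['only', 'three', 'labels']
except ValueError as err:
    rejected = True
    print('ValueError:', err)
```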

A series can also be created from existing sequences.

In [None]:
mem_rain_avg = [3.98, 4.93, 5.16, 5.50, 5.25, 3.63, 4.59, 2.88, 3.09, 3.98, 5.49, 5.74]
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

In [None]:
mem_rain = pd.Series(mem_rain_avg, index=months)

mem_rain

Elements of the series can be accessed by their index labels.

In [None]:
mem_rain['Jun']

Label-based slicing works much like slicing a built-in list, except that both endpoints are included.

In [None]:
mem_rain['Mar':'Nov']
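To see the endpoint rule side by side, here is a small sketch comparing a label slice with `loc` against a position slice with `iloc` over the same stretch of months. The notebook's `mem_rain['Mar':'Nov']` is the label form.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
mem_rain = pd.Series([3.98, 4.93, 5.16, 5.50, 5.25, 3.63,
                      4.59, 2.88, 3.09, 3.98, 5.49, 5.74], index=months)

# Label-based slice: both 'Mar' and 'Nov' are included
by_label = mem_rain.loc['Mar':'Nov']
# Position-based slice: like a list, the stop position (10, 'Nov') is excluded
by_position = mem_rain.iloc[2:10]

print(len(by_label), len(by_position))            # 9 8
print(by_label.index[-1], by_position.index[-1])  # Nov Oct
```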

Now let's create a dictionary from the rainfall and months. The `zip` function pairs up the elements of its arguments into tuples. Iterating over those tuples, we can build a dictionary with a dictionary comprehension, whose syntax is similar to a list comprehension.

In [None]:
mem_rain_dict = {month:rain for (month, rain) in zip(months, mem_rain_avg)}

In [None]:
mem_rain_dict
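As a quick check on the comprehension, a sketch showing that passing the zipped pairs straight to `dict` builds the same mapping, and that `zip` in Python 3 produces its tuples lazily, one at a time, rather than building a list.

```python
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
mem_rain_avg = [3.98, 4.93, 5.16, 5.50, 5.25, 3.63,
                4.59, 2.88, 3.09, 3.98, 5.49, 5.74]

# The notebook's dictionary comprehension...
mem_rain_dict = {month: rain for (month, rain) in zip(months, mem_rain_avg)}
# ...builds the same mapping as passing the zipped pairs straight to dict()
same = dict(zip(months, mem_rain_avg))
print(mem_rain_dict == same)  # True

# In Python 3, zip returns a lazy iterator that yields one tuple at a time
pairs = zip(months, mem_rain_avg)
print(next(pairs))  # ('Jan', 3.98)
```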

Then create a `Series` from that dictionary.

In [None]:
mem_rain_from_dict = pd.Series(mem_rain_dict)

In [None]:
mem_rain_from_dict

Depending on your versions of Python and `pandas`, this series may not be in chronological order. Older `pandas` releases sorted the keys of a dictionary when building a `Series`, so the months came out alphabetically. Since Python 3.7, dictionaries preserve insertion order, and `pandas` 0.23 and later keep that order, so with current versions the series stays chronological.
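A minimal check of the ordering behavior, assuming Python 3.7+ and `pandas` 0.23 or later: the series keeps the dictionary's insertion order, and alphabetical order has to be requested explicitly with `sort_index`.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
mem_rain_avg = [3.98, 4.93, 5.16, 5.50, 5.25, 3.63,
                4.59, 2.88, 3.09, 3.98, 5.49, 5.74]
mem_rain_dict = dict(zip(months, mem_rain_avg))

# With Python 3.7+ and pandas 0.23+, the series keeps the dict's insertion order
mem_rain_from_dict = pd.Series(mem_rain_dict)
print(list(mem_rain_from_dict.index) == months)  # True

# Alphabetical order now has to be asked for explicitly
print(mem_rain_from_dict.sort_index().index[:3].tolist())  # ['Apr', 'Aug', 'Dec']
```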

And let's get the snowfall totals in a `Series`.

In [None]:
mem_snow_avg = [1.9, 1.3, 0.4, 0, 0, 0, 0, 0, 0, 0, 0, 0.2]
mem_snow = pd.Series(mem_snow_avg, index=months)

mem_snow

Now I want to see the rain and snowfall together. I can do this with a `DataFrame`, which is a 2-d table of data. A `DataFrame` can be created in a number of ways; here I am using a dictionary whose keys become the column headers and whose values can be dictionaries or `Series`.

Notice that the dictionary and the series do not need to list the months in the same order.

In [None]:
mem_precip = pd.DataFrame({'rain': mem_rain_dict, 'snow': mem_snow})

mem_precip

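The data frame aligns the two columns by index label, so each month's rain and snow values end up in the same row. The resulting row order depends on how `pandas` combines the two indexes and is not guaranteed to be chronological, so check the index.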

In [None]:
mem_precip.index

To put the rows in chronological order, use `reindex` with the list of months.

In [None]:
mem_precip.reindex(months)

In IPython and Jupyter, the underscore `_` holds the output of the most recently executed cell, so it can capture the reindexed frame.

In [None]:
mem_precip = _
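A small sketch of label alignment and `reindex`, using a few months from the notebook's data. March is left out of the snow series on purpose (an illustrative assumption, not the real data) to show that a label with no match becomes `NaN` rather than being filled by position.

```python
import pandas as pd

rain = pd.Series({'Jan': 3.98, 'Feb': 4.93, 'Mar': 5.16})
# Listed in a different order, and March deliberately omitted
snow = pd.Series({'Feb': 1.3, 'Jan': 1.9})

df = pd.DataFrame({'rain': rain, 'snow': snow})

# Rows are matched by label, not by position
print(df.loc['Jan', 'snow'])           # 1.9
# A label missing from one input is filled with NaN
print(pd.isna(df.loc['Mar', 'snow']))  # True

# reindex returns a new frame in the requested order; df itself is unchanged,
# which is why the result has to be assigned back to a name
ordered = df.reindex(['Jan', 'Feb', 'Mar'])
print(ordered.index.tolist())          # ['Jan', 'Feb', 'Mar']
print(ordered is df)                   # False
```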

`describe` gives a summary of some descriptive statistics for each numeric column.

In [None]:
mem_precip.describe()

Or compute a single statistic for one column.

In [None]:
mem_precip.rain.mean()
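A sketch connecting the two cells above, built from the notebook's rain and snow figures: `describe` produces one row per statistic, and its `mean` row agrees with calling `mean()` on the column directly.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
mem_precip = pd.DataFrame(
    {'rain': [3.98, 4.93, 5.16, 5.50, 5.25, 3.63,
              4.59, 2.88, 3.09, 3.98, 5.49, 5.74],
     'snow': [1.9, 1.3, 0.4, 0, 0, 0, 0, 0, 0, 0, 0, 0.2]},
    index=months)

summary = mem_precip.describe()
# One row per statistic, one column per numeric column
print(summary.index.tolist())
# The 'mean' row matches calling mean() on the column directly
print(abs(summary.loc['mean', 'rain'] - mem_precip.rain.mean()) < 1e-12)  # True
```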

Creating a new column is as easy as:

In [None]:
mem_precip['total'] = mem_precip.rain + mem_precip.snow

In [None]:
mem_precip
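For comparison, a sketch of `DataFrame.assign`, which adds a column to a copy instead of modifying the frame in place. The `snow_share` column is a hypothetical name chosen only for illustration, and the data is a three-month subset of the notebook's figures.

```python
import pandas as pd

mem_precip = pd.DataFrame(
    {'rain': [3.98, 4.93, 5.16], 'snow': [1.9, 1.3, 0.4]},
    index=['Jan', 'Feb', 'Mar'])

# Column arithmetic is element-wise, matched up by index label
mem_precip['total'] = mem_precip.rain + mem_precip.snow

# assign() adds a column to a copy, leaving the original frame untouched
with_share = mem_precip.assign(snow_share=mem_precip.snow / mem_precip.total)

print(mem_precip.columns.tolist())  # ['rain', 'snow', 'total']
print(with_share.columns.tolist())  # ['rain', 'snow', 'total', 'snow_share']
```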

The values in the data frame are a `numpy` array and can be treated as such.

In [None]:
mem_precip.values

In [None]:
type(mem_precip.values)

In [None]:
mem_snow_values = mem_precip.values[:,1]

In [None]:
mem_snow_values

In [None]:
mem_snow_values.mean()
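A caution about the positional column `[:, 1]` above, as a small sketch on a three-month subset: that position means `snow` only because of the current column order, while selecting by name does not depend on it. It also uses `to_numpy()`, which the `pandas` documentation recommends over the `.values` attribute.

```python
import pandas as pd

mem_precip = pd.DataFrame(
    {'rain': [3.98, 4.93, 5.16], 'snow': [1.9, 1.3, 0.4]},
    index=['Jan', 'Feb', 'Mar'])
mem_precip['total'] = mem_precip.rain + mem_precip.snow

arr = mem_precip.to_numpy()
print(type(arr).__name__, arr.shape)  # ndarray (3, 3)

# Column 1 is 'snow' only because of the current column order...
by_position = arr[:, 1]
# ...whereas selecting by name keeps working if the columns are reordered
by_name = mem_precip['snow'].to_numpy()
print((by_position == by_name).all())  # True
```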

To select rows, and optionally columns, by integer position, use `iloc`.

In [None]:
mem_precip.iloc[2]

In [None]:
mem_precip.iloc[2, 1]

`iloc` also works with slices; as with lists, the stop position is excluded.

In [None]:
mem_precip.iloc[:3, :2]

To select by index label instead, use `loc`. Label slices include both endpoints.

In [None]:
mem_precip.loc['Mar']

In [None]:
mem_precip.loc['Mar', 'snow']

In [None]:
mem_precip.loc[:'Mar', :'snow']
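A sketch confirming that the `iloc` and `loc` slices above select the same block, on a four-month subset of the data: the positional slice stops one short of its stop position, while the label slice includes its stop label.

```python
import pandas as pd

mem_precip = pd.DataFrame(
    {'rain': [3.98, 4.93, 5.16, 5.50],
     'snow': [1.9, 1.3, 0.4, 0.0]},
    index=['Jan', 'Feb', 'Mar', 'Apr'])
mem_precip['total'] = mem_precip.rain + mem_precip.snow

# Positions: rows 0-2 and columns 0-1; the stop position is excluded
by_position = mem_precip.iloc[:3, :2]
# Labels: rows through 'Mar' and columns through 'snow'; the stop label is included
by_label = mem_precip.loc[:'Mar', :'snow']

print(by_position.equals(by_label))  # True
print(by_label.columns.tolist())     # ['rain', 'snow']
```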

Show which months have snowfall.

In [None]:
mem_precip.snow > 0

Use that boolean series to select only those rows.

In [None]:
mem_precip[mem_precip.snow > 0]

The average total precipitation for months in which there is snow:

In [None]:
_.total.mean()

Use `loc` to select only certain columns of those rows.

In [None]:
mem_precip.loc[mem_precip.snow > 0, ['rain', 'total']]

And sum each column.

In [None]:
_.sum()
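Boolean masks can also be combined. A short sketch on the notebook's data: `&`, `|`, and `~` work element-wise, and each comparison needs its own parentheses because `&` binds more tightly than `>`.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
mem_precip = pd.DataFrame(
    {'rain': [3.98, 4.93, 5.16, 5.50, 5.25, 3.63,
              4.59, 2.88, 3.09, 3.98, 5.49, 5.74],
     'snow': [1.9, 1.3, 0.4, 0, 0, 0, 0, 0, 0, 0, 0, 0.2]},
    index=months)

# Months with some snow AND more than 5 inches of rain
snowy_and_wet = mem_precip[(mem_precip.snow > 0) & (mem_precip.rain > 5)]
# ~ negates a mask: months with no snow at all
no_snow = mem_precip[~(mem_precip.snow > 0)]

print(snowy_and_wet.index.tolist())  # ['Mar', 'Dec']
print(len(no_snow))                  # 8
```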

#### Visualizing Data Frames

Our friends from `matplotlib`

In [None]:
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

It's possible to plot the raw values from the data frame directly with `matplotlib`.

In [None]:
fig, ax = plt.subplots()
x = np.arange(len(mem_precip))  # one x position per month
ax.plot(x, mem_precip.rain)
ax.plot(x, mem_precip.snow)
# Set the tick positions first, then label each tick with its month
ax.set_xticks(x)
ax.set_xticklabels(mem_precip.index)
ax.legend(['Rainfall', 'Snowfall'])

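But it's easier to use the data frame's `plot` method, which takes the x-axis labels from the index and the legend entries from the column names.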

In [None]:
mem_precip[['rain', 'snow']].plot()

In [None]:
mem_precip[['rain', 'snow']].plot.bar()

The `plot` methods also accept a colormap through the `cmap` keyword, just like in `matplotlib`.

In [None]:
mem_precip[['rain', 'snow']].plot.bar(cmap='RdYlBu')

In [None]:
mem_precip.rain.plot.pie()

Use `autopct` to label each wedge with that month's percentage of the yearly rainfall.

In [None]:
mem_precip.rain.plot.pie(autopct='%1.1f%%')
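As a rough sketch of what `autopct` computes: `matplotlib` formats each wedge's percentage of the total with the given %-format string, so the labels can be reproduced by hand. This mirrors the pie's arithmetic on the notebook's rainfall data; it is not `matplotlib`'s own code.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
mem_rain = pd.Series([3.98, 4.93, 5.16, 5.50, 5.25, 3.63,
                      4.59, 2.88, 3.09, 3.98, 5.49, 5.74], index=months)

# Each wedge's share of the yearly total, in percent
shares = mem_rain / mem_rain.sum() * 100
# autopct='%1.1f%%' applies this %-format to each share
labels = ['%1.1f%%' % pct for pct in shares]
print(labels[0], labels[-1])  # 7.3% 10.6%
```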

And we can get really fancy. Here I find the row with the most rainfall and take its position. To separate that month from the rest of the pie, I build `callouts`, an array of floats with one entry per wedge giving how far that wedge is pushed out from the center, as a fraction of the radius. The rainiest month gets a larger value than the rest. When rendering the chart, I again show the percentages, pass the callouts to the `explode` keyword (as a tuple), rotate the chart 90 degrees with `startangle` so that January starts at the top, and give the figure an equal height and width with `figsize` so that the pie is a circle rather than an oval.

In [None]:
# Position of the row with the most rainfall
idx = mem_precip.rain.values.argmax()
# Offset only that wedge; zeros leave the other wedges in place
callouts = np.zeros(len(mem_precip))
callouts[idx] = .4
mem_precip.rain.plot.pie(autopct='%1.1f%%', explode=tuple(callouts), startangle=90, figsize=(5, 5))
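The callout logic can be packaged as a small reusable function. This is a sketch under assumptions: `explode_max` is a hypothetical helper, not part of `pandas` or `matplotlib`, and it only builds the tuple that the notebook passes to `explode`.

```python
import numpy as np
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
mem_rain = pd.Series([3.98, 4.93, 5.16, 5.50, 5.25, 3.63,
                      4.59, 2.88, 3.09, 3.98, 5.49, 5.74], index=months)


def explode_max(values, offset=0.4):
    """Hypothetical helper: offset only the wedge for the largest value."""
    callouts = np.zeros(len(values))
    # argmax gives the position of the largest value (the first one on ties)
    callouts[np.argmax(values.to_numpy())] = offset
    # A tuple, matching what the notebook passes to explode
    return tuple(float(c) for c in callouts)


explode = explode_max(mem_rain)
print(mem_rain.idxmax(), explode.index(0.4))  # Dec 11
```

With this helper, the pie call could pass `explode=explode_max(mem_precip.rain)` instead of building `callouts` inline.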