# Working with Data Series and Frames

A pandas data frame is essentially a “smart” spreadsheet: a labeled table with columns (variables), rows (observations), and a multitude of built-in operations.


The pandas module adds two new containers to Python's already rich set of data structures: Series and DataFrame.

A series is a one-dimensional, labeled (in other words, indexed) vector. 

A frame is a table with labeled rows and columns, not unlike an Excel spreadsheet or MySQL table. Each frame column is a series.

With a few exceptions, pandas treats frames and series similarly.
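
To make the relationship between the two containers concrete, here is a minimal sketch. The frame, its column names, and its second column are made up for illustration; only the first three inflation values come from this notebook.

```python
import pandas as pd

# Toy frame: the "other" column holds arbitrary placeholder numbers.
frame = pd.DataFrame(
    {"inflation": [2.2, 3.4, 2.8], "other": [1.0, 2.0, 3.0]},
    index=[1999, 2000, 2001],
)

column = frame["inflation"]               # selecting one column yields a Series
print(type(column).__name__)              # Series
print(column.index.equals(frame.index))   # True: the column shares the frame's row labels
```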


Both containers have built-in support for a variety of data wrangling operations:
+ Single-level and hierarchical indexing
+ Handling missing data
+ Arithmetic and Boolean operations on entire columns and tables
+ Database-type operations (such as merging and aggregation)
+ Plotting individual columns and whole tables
+ Reading data from files and writing data to files
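
A few of the operations listed above can be sketched on a tiny series. This is only an illustration on a three-item toy series (the missing value is placed there on purpose), not a tour of the full API.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.2, np.nan, 2.8], index=[1999, 2000, 2001])

n_missing = s.isna().sum()   # handling missing data: count the NaNs
filled = s.fillna(0.0)       # ... or replace them with a chosen value
doubled = s * 2              # arithmetic on the entire series at once
above = s > 2.5              # Boolean comparison, item by item
mean = s.mean()              # aggregation; NaN is skipped by default

print(n_missing)             # 1
print(above.tolist())        # [False, False, True]
```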

## Series

A series is a one-dimensional data vector. Just like numpy arrays, series are homogeneous: all series items must belong to the same data type.
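
A quick way to see the homogeneity rule at work is to look at the `dtype` that pandas picks for different inputs. The three toy series below are assumptions made up for this sketch.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])               # all integers
mixed_numbers = pd.Series([1, 2.5, 3])    # integers and a float
mixed_types = pd.Series([1, "two", 3.0])  # numbers and a string

print(ints.dtype)           # an integer dtype (typically int64)
print(mixed_numbers.dtype)  # float64: the integers are promoted to floats
print(mixed_types.dtype)    # object: the only type that fits every item
```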

Let’s use a tuple of recent U.S. inflation data to illustrate a pandas data series.

You can create a simple series from any sequence: a list, a tuple, or an array.
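
As a small check of that claim, the sketch below builds the same series from a list, a tuple, and a numpy array (using the first three inflation values) and compares the results.

```python
import numpy as np
import pandas as pd

data = [2.2, 3.4, 2.8]

from_list = pd.Series(data)
from_tuple = pd.Series(tuple(data))
from_array = pd.Series(np.array(data))

# All three have the same values, the same dtype, and the same default index.
print(from_list.equals(from_tuple) and from_tuple.equals(from_array))  # True
```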

### Getting Used to the pandas Series Data Structure

In [2]:
import pandas as pd
import numpy as np

# The last value is wrong; we will fix it later!
inflation = pd.Series((2.2, 3.4, 2.8, 1.6, 2.3, 2.7, 3.4, 3.2, 2.8, 3.8,
                       -0.4, 1.6, 3.2, 2.1, 1.5, 1.5))
inflation

0     2.2
1     3.4
2     2.8
3     1.6
4     2.3
5     2.7
6     3.4
7     3.2
8     2.8
9     3.8
10   -0.4
11    1.6
12    3.2
13    2.1
14    1.5
15    1.5
dtype: float64

In [3]:
inflation.values

array([ 2.2,  3.4,  2.8,  1.6,  2.3,  2.7,  3.4,  3.2,  2.8,  3.8, -0.4,
        1.6,  3.2,  2.1,  1.5,  1.5])

In [4]:
inflation.index

RangeIndex(start=0, stop=16, step=1)

In [5]:
inflation.index.values

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15], dtype=int64)

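All these arrays (and the series attributes that they represent) are mutable. (The in-place assignment shown here relies on `values` returning a writable array, as it did in the pandas version used for this notebook; newer releases with copy-on-write enabled may return a read-only array instead.)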

In [6]:
inflation.values[-1] = 1.6
inflation

0     2.2
1     3.4
2     2.8
3     1.6
4     2.3
5     2.7
6     3.4
7     3.2
8     2.8
9     3.8
10   -0.4
11    1.6
12    3.2
13    2.1
14    1.5
15    1.6
dtype: float64
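
If writing through `values` is not an option, for example because the array is read-only in your pandas version, the same fix can be made through the series itself. This is a sketch on a shortened toy series, using the positional indexer `iloc`.

```python
import pandas as pd

s = pd.Series((2.2, 3.4, 1.5))   # the last value is wrong again
s.iloc[-1] = 1.6                 # assign by position through the series

print(s.tolist())                # [2.2, 3.4, 1.6]
```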

Let’s create a series with a **customized index** by passing a dictionary to the Series constructor.
You could create another series of years and keep the two series together, but someone taught me that parallel series are a sure path to an eventual disaster.

The dictionary keys become the series index, an indivisible part of the series:

In [7]:
inflation = pd.Series({1999: 2.2, 2000: 3.4, 2001: 2.8, 2002: 1.6, 2003: 2.3, 2004: 2.7,
                       2005: 3.4, 2006: 3.2, 2007: 2.8, 2008: 3.8, 2009: -0.4, 2010: 1.6,
                       2011: 3.2, 2012: 2.1, 2013: 1.5, 2014: 1.6, 2015: np.nan})
inflation


1999    2.2
2000    3.4
2001    2.8
2002    1.6
2003    2.3
2004    2.7
2005    3.4
2006    3.2
2007    2.8
2008    3.8
2009   -0.4
2010    1.6
2011    3.2
2012    2.1
2013    1.5
2014    1.6
2015    NaN
dtype: float64

Alternatively, you can create a new index from any sequence and then attach it to an existing series:

In [8]:
inflation.index = pd.Index(range(1999, 2016))
inflation

1999    2.2
2000    3.4
2001    2.8
2002    1.6
2003    2.3
2004    2.7
2005    3.4
2006    3.2
2007    2.8
2008    3.8
2009   -0.4
2010    1.6
2011    3.2
2012    2.1
2013    1.5
2014    1.6
2015    NaN
dtype: float64
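
One thing to keep in mind when attaching an index this way: the new index must have exactly as many labels as the series has items. The sketch below, on a three-item toy series, shows a matching index being accepted and an overly long one being rejected.

```python
import pandas as pd

s = pd.Series([2.2, 3.4, 2.8])
s.index = pd.Index(range(1999, 2002))      # three labels for three values
print(list(s.index))                       # [1999, 2000, 2001]

rejected = False
try:
    s.index = pd.Index(range(1999, 2010))  # eleven labels: too many
except ValueError as err:
    rejected = True
    print("rejected:", err)
```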

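You can also assign a value to a single item through its index label (here the assignment simply restates the missing 2015 value):

In [9]: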
inflation[2015] = np.nan
inflation

1999    2.2
2000    3.4
2001    2.8
2002    1.6
2003    2.3
2004    2.7
2005    3.4
2006    3.2
2007    2.8
2008    3.8
2009   -0.4
2010    1.6
2011    3.2
2012    2.1
2013    1.5
2014    1.6
2015    NaN
dtype: float64
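
Assignment by label does a little more than the cell above shows. In the sketch below (a two-item toy series, chosen for brevity), assigning to an existing label overwrites the value, while assigning to a new label appends an item. The `isna` and `dropna` methods then help locate or drop the missing value.

```python
import numpy as np
import pandas as pd

s = pd.Series({2013: 1.5, 2014: 1.5})
s[2014] = 1.6        # existing label: the value is overwritten
s[2015] = np.nan     # new label: the series grows by one item

print(len(s))                      # 3
print(s.isna().sum())              # 1
print(s.dropna().index.tolist())   # [2013, 2014]
```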

Both the series values and the index can have names, which are read and assigned through the `name` attribute of the series and of its index:

In [10]:
inflation.index.name = "Year"
inflation.name = "%"
inflation

Year
1999    2.2
2000    3.4
2001    2.8
2002    1.6
2003    2.3
2004    2.7
2005    3.4
2006    3.2
2007    2.8
2008    3.8
2009   -0.4
2010    1.6
2011    3.2
2012    2.1
2013    1.5
2014    1.6
2015    NaN
Name: %, dtype: float64
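
The names are more than decoration: they are carried along when a series is turned into a frame. A minimal sketch on a two-item toy series, using `to_frame` and `reset_index`:

```python
import pandas as pd

s = pd.Series({1999: 2.2, 2000: 3.4})
s.index.name = "Year"
s.name = "%"

frame = s.to_frame()         # the series name becomes the column label
print(list(frame.columns))   # ['%']
print(frame.index.name)      # Year

flat = s.reset_index()       # the index turns into a regular column
print(list(flat.columns))    # ['Year', '%']
```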

In [11]:
inflation.head()

Year
1999    2.2
2000    3.4
2001    2.8
2002    1.6
2003    2.3
Name: %, dtype: float64

In [12]:
inflation.tail()

Year
2011    3.2
2012    2.1
2013    1.5
2014    1.6
2015    NaN
Name: %, dtype: float64

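An index is an index is an index ... With the years as integer labels, a single integer in square brackets is looked up as a label, while an integer slice is still interpreted by position (at least in the pandas version used here):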

In [27]:
print(inflation[2011])
# print(inflation[0])  # this no longer works: 0 is not an index label
print(inflation[0:2])  # but this still works
print(inflation[2011:2013])  # but slicing with the new index labels does not work

3.2
Year
1999    2.2
2000    3.4
Name: %, dtype: float64
Series([], Name: %, dtype: float64)
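
The ambiguity goes away with the explicit indexers: `loc` always works with labels, and `iloc` always works with positions. A sketch on a four-item excerpt of the inflation data:

```python
import pandas as pd

s = pd.Series({2010: 1.6, 2011: 3.2, 2012: 2.1, 2013: 1.5})

print(s.loc[2011])                  # 3.2, lookup by label
print(s.iloc[0])                    # 1.6, lookup by position
print(s.loc[2011:2013].tolist())    # [3.2, 2.1, 1.5], a label slice includes both ends
print(s.iloc[0:2].tolist())         # [1.6, 3.2], a positional slice excludes the end
```

Note that a label slice with `loc` includes its right endpoint, unlike ordinary Python slicing.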
