# Data Structures
In __pandas__, there are two workhorse data structures: _Series_ and _DataFrame_.

## Series
A Series is a one-dimensional array-like object containing an array of data and an associated array of data labels, called its _index_. By __default__, the index runs from 0 to _N-1_, where N is the number of elements in the data.
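To see the default index in action, here is a minimal sketch that builds a Series without passing any labels (the values and the name `s` are arbitrary, chosen only for illustration):

```python
import pandas as pd

# No index is passed, so pandas assigns the default labels 0 .. N-1
s = pd.Series([4, 7, -5, 3])
print(s.index)        # RangeIndex(start=0, stop=4, step=1)
print(list(s.index))  # [0, 1, 2, 3]
```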

In [1]:
import pandas as pd

In [2]:
arr = pd.Series([-2, 90, 3, 4], index=['a', 'd', 'b', 'c'])
arr

a    -2
d    90
b     3
c     4
dtype: int64

In [3]:
arr.values

array([-2, 90,  3,  4])

In [4]:
arr.index

Index(['a', 'd', 'b', 'c'], dtype='object')
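The index is what lets you select values by label. The sketch below reuses the same data as `arr` above; it also shows that an Index object is immutable, so labels cannot be changed element by element (the flag `index_is_mutable` is a hypothetical helper name used only to record the outcome):

```python
import pandas as pd

arr = pd.Series([-2, 90, 3, 4], index=['a', 'd', 'b', 'c'])

# Select a single value by its index label
print(arr['d'])          # 90

# Select several values; the result follows the requested label order
print(arr[['c', 'a']])

# An Index is immutable: assigning to one of its elements raises TypeError
try:
    arr.index[0] = 'z'
    index_is_mutable = True
except TypeError as exc:
    index_is_mutable = False
    print('TypeError:', exc)
```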

If you think of a _Series_ as a fixed-length, ordered dictionary that maps index labels to data values, the following expression makes sense:

In [5]:
'b' in arr

True
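The dictionary analogy goes both ways: a Series can be built directly from a dict, and `in` checks the index labels, just as it checks dict keys. A short sketch using the same labels and values as `arr`:

```python
import pandas as pd

# Build a Series from a dict: keys become index labels, values become the data
arr = pd.Series({'a': -2, 'd': 90, 'b': 3, 'c': 4})

# Like a dict, `in` tests the index labels, not the stored values
print('b' in arr)          # True  ('b' is a label)
print(90 in arr)           # False (90 is a value, not a label)
print(90 in arr.values)    # True  (search the values explicitly)
```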

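A Series automatically aligns differently indexed data by label in arithmetic operations. The result's index is the union of both indexes, and any label that appears in only one operand yields NaN, which is also why the result's dtype becomes float64.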

In [9]:
population1 = pd.Series([500, 45, 345, 55], index=['Nanjing', "Madison", "San Diego", "Twin City"])
population2 = pd.Series([1000, 900, 23, 1500], index=['BeiJing', 'Nanjing', 'Madison', "Shanghai"])

In [10]:
population1

Nanjing      500
Madison       45
San Diego    345
Twin City     55
dtype: int64

In [11]:
population2

BeiJing     1000
Nanjing      900
Madison       23
Shanghai    1500
dtype: int64

In [12]:
population1 + population2

BeiJing         NaN
Madison        68.0
Nanjing      1400.0
San Diego       NaN
Shanghai        NaN
Twin City       NaN
dtype: float64
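If the NaN entries are unwanted, the method form of the operator accepts a fill value. The sketch below assumes that a city missing from one Series should count as 0 for that side, which is a modelling choice rather than something pandas decides for you:

```python
import pandas as pd

population1 = pd.Series([500, 45, 345, 55],
                        index=['Nanjing', 'Madison', 'San Diego', 'Twin City'])
population2 = pd.Series([1000, 900, 23, 1500],
                        index=['BeiJing', 'Nanjing', 'Madison', 'Shanghai'])

# `+` aligns on the union of labels; a label missing from either side gives NaN
total_plus = population1 + population2

# `add` with fill_value treats a label missing from one side as 0 before adding;
# only a label missing from BOTH sides would still become NaN
total_add = population1.add(population2, fill_value=0)
print(total_add)
```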

In [13]:
population1.name = "Population"
population1.index.name = "City"

In [14]:
population1

City
Nanjing      500
Madison       45
San Diego    345
Twin City     55
Name: Population, dtype: int64
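Both names are metadata that travel with the data. As a small sketch, converting the named Series into a DataFrame reuses them: the Series name becomes the column label, and the index name either stays on the row index or becomes a column of its own:

```python
import pandas as pd

population1 = pd.Series([500, 45, 345, 55],
                        index=['Nanjing', 'Madison', 'San Diego', 'Twin City'])
population1.name = 'Population'
population1.index.name = 'City'

# One-column DataFrame: the Series name becomes the column label
df = population1.to_frame()
print(df.columns.tolist())   # ['Population']
print(df.index.name)         # City

# reset_index turns the named index into a regular column
flat = population1.reset_index()
print(flat.columns.tolist()) # ['City', 'Population']
```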

## DataFrame
A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, and so on).
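A quick way to see that each column keeps its own type is to inspect `dtypes`. In this sketch the `rating` column is hypothetical, added only so that the frame mixes strings, integers, and floats:

```python
import pandas as pd

# Each column can hold a different type
frame = pd.DataFrame({
    'city': ['Nanjing', 'Shenzhen', 'Madison'],
    'year': [2007, 2011, 2015],
    'rating': [4.5, 3.8, 4.1],   # hypothetical float column, for illustration only
})
print(frame.dtypes)

# Taken on its own, each column is a Series
print(isinstance(frame['year'], pd.Series))  # True
```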

In [16]:
data = {'city': ['Nanjing', 'Shenzhen', 'Madison', 'Seattle', "Santa Cruz"],
        'year': [2007, 2011, 2015, 2017, 2013]}
frame = pd.DataFrame(data)

In [17]:
frame

         city  year
0     Nanjing  2007
1    Shenzhen  2011
2     Madison  2015
3     Seattle  2017
4  Santa Cruz  2013

In [18]:
frame.columns

Index(['city', 'year'], dtype='object')
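The column order comes from the order of the keys in the dict. As a sketch, the `columns` argument of the constructor selects and orders columns explicitly; the `country` name below is hypothetical and, since it is not a key in `data`, pandas fills it with NaN:

```python
import pandas as pd

data = {'city': ['Nanjing', 'Shenzhen', 'Madison', 'Seattle', 'Santa Cruz'],
        'year': [2007, 2011, 2015, 2017, 2013]}

# By default the columns follow the dict's key order
print(pd.DataFrame(data).columns.tolist())   # ['city', 'year']

# `columns` picks and orders columns; an unknown name becomes an all-NaN column
frame2 = pd.DataFrame(data, columns=['year', 'city', 'country'])
print(frame2.columns.tolist())               # ['year', 'city', 'country']
```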

In [19]:
frame['city']

0       Nanjing
1      Shenzhen
2       Madison
3       Seattle
4    Santa Cruz
Name: city, dtype: object

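A column can also be accessed as an attribute of the DataFrame. This only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.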

In [20]:
frame.city

0       Nanjing
1      Shenzhen
2       Madison
3       Seattle
4    Santa Cruz
Name: city, dtype: object
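The attribute shortcut breaks down for some names. In the sketch below (column names made up for illustration), `count` collides with the built-in `DataFrame.count` method and `first name` contains a space, while bracket indexing handles both:

```python
import pandas as pd

df = pd.DataFrame({'count': [3, 5], 'first name': ['Ann', 'Bo']})

# 'count' clashes with the DataFrame.count method, so attribute access
# returns the method rather than the column
print(callable(df.count))          # True

# 'first name' contains a space, so it cannot be written as an attribute at all

# Bracket indexing works for any column label
print(df['count'].tolist())        # [3, 5]
print(df['first name'].tolist())   # ['Ann', 'Bo']
```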

In [21]:
# Select the row whose index label is 3 (.ix was removed in pandas 1.0; use .loc or .iloc)
frame.loc[3]

city    Seattle
year       2017
Name: 3, dtype: object
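With the default index, labels and positions coincide, so `.loc[3]` and `.iloc[3]` return the same row. The difference shows up with non-default labels; this sketch assumes a hypothetical index of 10, 20, 30:

```python
import pandas as pd

frame = pd.DataFrame({'city': ['Nanjing', 'Shenzhen', 'Madison'],
                      'year': [2007, 2011, 2015]},
                     index=[10, 20, 30])   # hypothetical non-default labels

# .loc selects by index label
print(frame.loc[20, 'city'])    # Shenzhen

# .iloc selects by 0-based integer position, ignoring the labels
print(frame.iloc[2]['city'])    # Madison
```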

In [22]:
frame.values

array([['Nanjing', 2007],
       ['Shenzhen', 2011],
       ['Madison', 2015],
       ['Seattle', 2017],
       ['Santa Cruz', 2013]], dtype=object)
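The result above has dtype=object because NumPy needs a single dtype for the whole array, and strings and integers share none narrower than object. A minimal sketch using `to_numpy()`, the method recommended in recent pandas versions over `.values`:

```python
import pandas as pd

frame = pd.DataFrame({'city': ['Nanjing', 'Shenzhen'], 'year': [2007, 2011]})

# Mixed column types force the common dtype: object
print(frame.to_numpy().dtype)            # object

# Selecting only numeric columns keeps a numeric dtype
print(frame[['year']].to_numpy().dtype)  # int64
```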