# Introduction to Data Structures

## Learning Outcomes


[Pandas online reference](http://pandas.pydata.org/pandas-docs/stable/)

By the end of the workshop, students will have gained an appreciation of, and hands-on practical experience with, the following topics:
* Series
  * From ndarray
  * From dict
  * From scalar value
  * Series is ndarray-like
  * Series is dict-like
  * Vectorized operations and label alignment with Series
  * Name attribute
* DataFrame
  * From dict of Series or dicts
  * From dict of ndarrays / lists
  * From a list of dicts
  * Column selection, addition, deletion

In [1]:
import pandas as pd
import numpy as np
print("Pandas version : {}".format(pd.__version__))
print("Numpy version : {}".format(np.__version__))

Pandas version : 0.22.0
Numpy version : 1.14.3


# Series

## Intro



Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the **index**. The basic method to create a Series is to call:

`s = pd.Series(data, index=index)`

Here, `data` can be many different things:
* a Python dict
* an ndarray
* a scalar value (like 5)

The passed **index** is a list of axis labels. Thus, this separates into a few cases depending on what **data** is:

## From ndarray



If data is an ndarray, **index** must be the same length as **data**. If no index is passed, one will be created having values `[0, ..., len(data) - 1]`.

In [2]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [3]:
s

a   -0.106659
b    0.582483
c   -1.085806
d    1.932296
e   -0.786370
dtype: float64

In [4]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [19]:
s.index[0] = 'z'

TypeError: Index does not support mutable operations

In [21]:
mod_ind = list(s.index)

In [22]:
mod_ind

['a', 'b', 'c', 'd', 'e']

In [23]:
mod_ind[0] = 'z'
mod_ind

['z', 'b', 'c', 'd', 'e']

In [24]:
s.index = mod_ind

In [25]:
s

z   -0.106659
b    0.582483
c   -1.085806
d    1.932296
e   -0.786370
dtype: float64
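
Copying the index to a list works, but a label can also be changed without rebuilding the whole index by hand. The following is a minimal sketch, assuming a small hand-made Series `t` (hypothetical values, not the random `s` above). `Series.rename` accepts a dict mapping old labels to new ones and returns a new Series:

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
t = pd.Series([10.0, 20.0, 30.0], index=['a', 'b', 'c'])

# rename() returns a new Series; the original index is left untouched.
renamed = t.rename({'a': 'z'})

print(list(renamed.index))
print(list(t.index))
```

Because the original Series is not modified, this form is convenient when the old labels are still needed elsewhere.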

If no index is provided, the constructor creates one for you:

In [26]:
pd.Series(np.random.randn(5))

0   -0.490286
1   -2.152821
2   -0.264440
3   -1.887953
4    0.919149
dtype: float64

## From dict



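If data is a dict and an **index** is passed, the values in data corresponding to the labels in the index are pulled out; labels that are missing from the dict get `NaN`. Otherwise, an index is constructed from the sorted keys of the dict, if possible (this is the behavior of the pandas 0.22 used here; later versions keep the dict's insertion order).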

In [27]:
d = {'a' : 0., 'b' : 1., 'c' : 2.}

In [28]:
type(d)

dict

In [29]:
pd.Series(d)

a    0.0
b    1.0
c    2.0
dtype: float64

In [30]:
pd.Series(d, index=['b', 'c', 'd', 'a']) # arrange index in this order.

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

## From scalar value



If data is a scalar value, an index must be provided. The value is repeated to match the length of **index**:

In [31]:
pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])

a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

## Series is ndarray-like

In [32]:
s

z   -0.106659
b    0.582483
c   -1.085806
d    1.932296
e   -0.786370
dtype: float64

In [33]:
s[0]

-0.10665921464675526

In [34]:
s[:3]

z   -0.106659
b    0.582483
c   -1.085806
dtype: float64

In [35]:
s[s > s.median()]

b    0.582483
d    1.932296
dtype: float64

In [36]:
s[[4, 3, 1]]

e   -0.786370
d    1.932296
b    0.582483
dtype: float64

In [37]:
np.exp(s)

z    0.898832
b    1.790479
c    0.337630
d    6.905348
e    0.455495
dtype: float64
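
The integer lookups above (`s[0]`, `s[[4, 3, 1]]`) rely on pandas treating integers as positions when the index holds string labels. In newer pandas releases this fallback is reported as deprecated, so explicit positional indexing with `.iloc` is the safer spelling. A sketch with a hypothetical Series `t` (fixed values, so the results are reproducible):

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
t = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=['a', 'b', 'c', 'd', 'e'])

first = t.iloc[0]            # scalar at position 0
picked = t.iloc[[4, 3, 1]]   # positions 4, 3, 1, in that order
above = t[t > t.median()]    # boolean masks work as before

print(first)
print(list(picked.index))
print(list(above.index))
```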

## Series is dict-like

In [39]:
s['z']

-0.10665921464675526

In [40]:
s

z   -0.106659
b    0.582483
c   -1.085806
d    1.932296
e   -0.786370
dtype: float64

In [41]:
s['e'] = 12.

In [42]:
s

z    -0.106659
b     0.582483
c    -1.085806
d     1.932296
e    12.000000
dtype: float64

In [43]:
'e' in s

True

In [44]:
'f' in s

False

If a label is not contained, an exception is raised:

In [45]:
s['f']

KeyError: 'f'

With the `get` method, a missing label returns `None` or a specified default instead of raising:

In [46]:
s.get('f')

In [47]:
s.get('f', 'OMG')

'OMG'

## Vectorized operations and label alignment with Series

In [49]:
s

z    -0.106659
b     0.582483
c    -1.085806
d     1.932296
e    12.000000
dtype: float64

In [48]:
s + s

z    -0.213318
b     1.164966
c    -2.171612
d     3.864592
e    24.000000
dtype: float64

In [50]:
s * 2

z    -0.213318
b     1.164966
c    -2.171612
d     3.864592
e    24.000000
dtype: float64

The following skips the first element:

In [52]:
s[1:]

b     0.582483
c    -1.085806
d     1.932296
e    12.000000
dtype: float64

The following skips the last element:

In [53]:
s[:-1]

z   -0.106659
b    0.582483
c   -1.085806
d    1.932296
dtype: float64

In [51]:
s[1:] + s[:-1]

b    1.164966
c   -2.171612
d    3.864592
e         NaN
z         NaN
dtype: float64

The result of an operation between unaligned Series will have the **union** of the indexes involved. If a label is not found in one Series or the other, the result will be marked as missing `NaN`. Being able to write code without doing any explicit data alignment grants immense freedom and flexibility in interactive data analysis and research. The integrated data alignment features of the pandas data structures set pandas apart from the majority of related tools for working with labeled data.
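
The union behavior is easier to see with fixed values. This sketch assumes two hypothetical Series, `x` and `y`, that share only some labels. It also shows `Series.add` with `fill_value`, which is one way to avoid `NaN` when a label is missing on one side:

```python
import pandas as pd

# Hypothetical data: 'a' appears only in x, 'd' only in y.
x = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
y = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

total = x + y                    # 'a' and 'd' have no partner, so NaN
filled = x.add(y, fill_value=0)  # treat a missing partner as 0 instead

print(total)
print(filled)
```

Whether filling with 0 is appropriate depends on the analysis; `NaN` is the safer default because it makes the mismatch visible.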

## Name attribute

In [54]:
s = pd.Series(np.random.randn(5), name='something')

In [55]:
s

0    0.504974
1    0.333896
2   -0.183734
3    0.475643
4   -0.812643
Name: something, dtype: float64

In [56]:
s.name

'something'
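
A Series can also be given a different name with `rename`, which returns a new object and leaves the original alone. The sketch below uses hypothetical variable names (`s_named`, `s_other`) and also shows that the name becomes the column label when the Series is turned into a DataFrame with `to_frame`:

```python
import pandas as pd

s_named = pd.Series([1, 2, 3], name='something')
s_other = s_named.rename('different')  # new Series; same data, new name

print(s_named.name)
print(s_other.name)
print(list(s_named.to_frame().columns))
```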

# DataFrame

## Intro



**DataFrame** is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input:
* Dict of 1D ndarrays, lists, dicts, or Series
* 2-D numpy.ndarray
* Structured or record ndarray
* A Series
* Another DataFrame

Along with the data, you can optionally pass **index** (row labels) and **columns** (column labels) arguments. If you pass an index and / or columns, you are guaranteeing the index and / or columns of the resulting DataFrame. Thus, a dict of Series plus a specific index will discard all data not matching up to the passed index.

If axis labels are not passed, they will be constructed from the input data based on common sense rules.


## From dict of Series or dicts



The result **index** will be the **union** of the indexes of the various Series. If there are any nested dicts, these will be first converted to Series. If no columns are passed, the columns will be the sorted list of dict keys.

In [57]:
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

In [58]:
d

{'one': a    1.0
 b    2.0
 c    3.0
 dtype: float64, 'two': a    1.0
 b    2.0
 c    3.0
 d    4.0
 dtype: float64}

In [59]:
df = pd.DataFrame(d)

In [60]:
df

|  | one | two |
| --- | --- | --- |
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |


In [61]:
pd.DataFrame(d, index=['d', 'b', 'a'])

|  | one | two |
| --- | --- | --- |
| d | NaN | 4.0 |
| b | 2.0 | 2.0 |
| a | 1.0 | 1.0 |


In [62]:
pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

|  | two | three |
| --- | --- | --- |
| d | 4.0 | NaN |
| b | 2.0 | NaN |
| a | 1.0 | NaN |


The row and column labels can be accessed respectively by accessing the **index** and **columns** attributes:

In [63]:
df.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [64]:
df.columns

Index(['one', 'two'], dtype='object')

## From dict of ndarrays / lists

In [65]:
d = {'one' : [1., 2., 3., 4.],
     'two' : [4., 3., 2., 1.]}

In [66]:
pd.DataFrame(d)

|  | one | two |
| --- | --- | --- |
| 0 | 1.0 | 4.0 |
| 1 | 2.0 | 3.0 |
| 2 | 3.0 | 2.0 |
| 3 | 4.0 | 1.0 |


In [67]:
pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

|  | one | two |
| --- | --- | --- |
| a | 1.0 | 4.0 |
| b | 2.0 | 3.0 |
| c | 3.0 | 2.0 |
| d | 4.0 | 1.0 |


## From a list of dicts

In [68]:
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

In [71]:
pd.DataFrame(data2)

|  | a | b | c |
| --- | --- | --- | --- |
| 0 | 1 | 2 | NaN |
| 1 | 5 | 10 | 20.0 |


In [72]:
pd.DataFrame(data2, index=['first', 'second'])

|  | a | b | c |
| --- | --- | --- | --- |
| first | 1 | 2 | NaN |
| second | 5 | 10 | 20.0 |


In [73]:
pd.DataFrame(data2, columns=['a', 'b'])

|  | a | b |
| --- | --- | --- |
| 0 | 1 | 2 |
| 1 | 5 | 10 |


There are many other ways to construct a DataFrame. Do consult the online [guide](http://pandas.pydata.org/pandas-docs/stable/dsintro.html).
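
Two of the other inputs listed in the DataFrame intro, a 2-D ndarray and a structured ndarray, can be sketched as follows. The array contents, labels, and variable names here are hypothetical and chosen only for illustration:

```python
import numpy as np
import pandas as pd

# 2-D ndarray: row and column labels are supplied explicitly.
arr = np.arange(6).reshape(3, 2)
df_arr = pd.DataFrame(arr, index=['x', 'y', 'z'], columns=['one', 'two'])

# Structured ndarray: the field names become the column labels.
rec = np.array([(1, 2.0), (2, 3.0)], dtype=[('A', 'i4'), ('B', 'f4')])
df_rec = pd.DataFrame(rec)

print(df_arr)
print(df_rec)
```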

## Column selection, addition, deletion



You can treat a DataFrame semantically like a dict of like-indexed Series objects. Getting, setting, and deleting columns works with the same syntax as the analogous dict operations:

In [74]:
df

|  | one | two |
| --- | --- | --- |
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |


In [75]:
df['three'] = df['one'] * df['two']

In [76]:
df

|  | one | two | three |
| --- | --- | --- | --- |
| a | 1.0 | 1.0 | 1.0 |
| b | 2.0 | 2.0 | 4.0 |
| c | 3.0 | 3.0 | 9.0 |
| d | NaN | 4.0 | NaN |


In [77]:
df['flag'] = df['one'] > 2

In [78]:
df

|  | one | two | three | flag |
| --- | --- | --- | --- | --- |
| a | 1.0 | 1.0 | 1.0 | False |
| b | 2.0 | 2.0 | 4.0 | False |
| c | 3.0 | 3.0 | 9.0 | True |
| d | NaN | 4.0 | NaN | False |


Columns can be deleted or popped like with a dict:

In [79]:
del df['two']

In [80]:
df

|  | one | three | flag |
| --- | --- | --- | --- |
| a | 1.0 | 1.0 | False |
| b | 2.0 | 4.0 | False |
| c | 3.0 | 9.0 | True |
| d | NaN | NaN | False |


In [81]:
three = df.pop('three')

In [82]:
df

|  | one | flag |
| --- | --- | --- |
| a | 1.0 | False |
| b | 2.0 | False |
| c | 3.0 | True |
| d | NaN | False |


In [83]:
three

a    1.0
b    4.0
c    9.0
d    NaN
Name: three, dtype: float64

In [84]:
type(three)

pandas.core.series.Series

When inserting a scalar value, it will naturally be propagated to fill the column:

In [85]:
df['foo'] = 'bar'
df

|  | one | flag | foo |
| --- | --- | --- | --- |
| a | 1.0 | False | bar |
| b | 2.0 | False | bar |
| c | 3.0 | True | bar |
| d | NaN | False | bar |


When inserting a Series that does not have the same index as the DataFrame, it will be conformed to the DataFrame’s index:

In [86]:
df['one_trunc'] = df['one'][:2]
df

|  | one | flag | foo | one_trunc |
| --- | --- | --- | --- | --- |
| a | 1.0 | False | bar | 1.0 |
| b | 2.0 | False | bar | 2.0 |
| c | 3.0 | True | bar | NaN |
| d | NaN | False | bar | NaN |


You can insert raw ndarrays but their length must match the length of the DataFrame’s index.
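
A short sketch of the length rule, using a hypothetical four-row DataFrame named `frame`. A raw ndarray carries no labels, so it is assigned by position and cannot be conformed to the index the way a Series is:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0]}, index=['a', 'b', 'c', 'd'])

# Length 4 matches the index, so the values are assigned by position.
frame['ok'] = np.array([10, 20, 30, 40])

# Length 3 does not match the index; pandas raises ValueError.
try:
    frame['bad'] = np.array([1, 2, 3])
    length_error = None
except ValueError as exc:
    length_error = exc

print(frame)
print(type(length_error).__name__)
```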

By default, columns get inserted at the end. The insert function is available to insert at a particular location in the columns:

In [87]:
df.insert(1, 'bar', df['one'] * 1.5)  # df.insert(loc, column, value, allow_duplicates=False)
df

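|  | one | bar | flag | foo | one_trunc |
| --- | --- | --- | --- | --- | --- |
| a | 1.0 | 1.5 | False | bar | 1.0 |
| b | 2.0 | 3.0 | False | bar | 2.0 |
| c | 3.0 | 4.5 | True | bar | NaN |
| d | NaN | NaN | False | bar | NaN |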


## Indexing / Selection




| Operation | Syntax | Result |
| --- | --- | --- |
| Select column | `df[col]` | Series |
| Select row by label | `df.loc[label]` | Series |
| Select row by integer location | `df.iloc[loc]` | Series |
| Slice rows | `df[5:10]` | DataFrame |
| Select rows by boolean vector | `df[bool_vec]` | DataFrame |

Row selection, for example, returns a Series whose index is the columns of the DataFrame:

In [88]:
df

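|  | one | bar | flag | foo | one_trunc |
| --- | --- | --- | --- | --- | --- |
| a | 1.0 | 1.5 | False | bar | 1.0 |
| b | 2.0 | 3.0 | False | bar | 2.0 |
| c | 3.0 | 4.5 | True | bar | NaN |
| d | NaN | NaN | False | bar | NaN |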


In [89]:
df.loc['b']

one              2
bar              3
flag         False
foo            bar
one_trunc        2
Name: b, dtype: object

In [90]:
df.iloc[2]

one             3
bar           4.5
flag         True
foo           bar
one_trunc     NaN
Name: c, dtype: object
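
The remaining rows of the table (column selection, row slicing, and boolean selection) can be sketched the same way. The DataFrame `frame` here is hypothetical and uses fixed values so the selected rows are predictable:

```python
import pandas as pd

frame = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0],
                      'two': [4.0, 3.0, 2.0, 1.0]},
                     index=['a', 'b', 'c', 'd'])

col = frame['one']        # Series
rows = frame[1:3]         # DataFrame holding rows 'b' and 'c'
mask = frame['one'] > 2   # boolean Series aligned with the rows
selected = frame[mask]    # DataFrame holding rows 'c' and 'd'

print(type(col).__name__)
print(list(rows.index))
print(list(selected.index))
```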

## Data alignment and arithmetic



Operations between DataFrame objects automatically align on **both the columns and the index (row labels)**. Again, the resulting object will have the union of the column and row labels.

In [91]:
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

In [92]:
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])

In [93]:
df

|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| 0 | -0.383076 | -1.233236 | -2.065057 | 0.546332 |
| 1 | -0.257518 | 1.937704 | -1.696795 | -0.872099 |
| 2 | 1.174938 | -0.541473 | -0.329249 | 1.106674 |
| 3 | 0.493192 | -0.661042 | -0.921277 | -1.294797 |
| 4 | 0.441714 | 1.699518 | -0.890056 | 1.514154 |
| 5 | 0.617807 | 0.613376 | -1.224374 | -1.075437 |
| 6 | -0.260978 | -0.465404 | -0.349489 | -1.25224 |
| 7 | -0.599106 | -1.269217 | 0.618379 | -0.798279 |
| 8 | 0.586456 | -0.647067 | -1.336722 | 0.896365 |
| 9 | -0.079082 | 0.686816 | 1.000471 | -0.377698 |


In [94]:
df2

|  | A | B | C |
| --- | --- | --- | --- |
| 0 | 0.152927 | -0.534944 | 1.126506 |
| 1 | 0.469932 | -0.32913 | -1.326366 |
| 2 | -1.20917 | 0.034882 | -0.75161 |
| 3 | 0.075705 | 1.191429 | -0.417639 |
| 4 | -1.23092 | -0.510479 | -0.941617 |
| 5 | 1.091165 | 0.763735 | -0.407439 |
| 6 | 0.343113 | 0.153749 | 0.726411 |


In [95]:
df + df2

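|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| 0 | -0.23015 | -1.768179 | -0.938551 | NaN |
| 1 | 0.212414 | 1.608574 | -3.023161 | NaN |
| 2 | -0.034232 | -0.506592 | -1.080859 | NaN |
| 3 | 0.568897 | 0.530387 | -1.338916 | NaN |
| 4 | -0.789205 | 1.18904 | -1.831674 | NaN |
| 5 | 1.708973 | 1.377111 | -1.631813 | NaN |
| 6 | 0.082135 | -0.311655 | 0.376922 | NaN |
| 7 | NaN | NaN | NaN | NaN |
| 8 | NaN | NaN | NaN | NaN |
| 9 | NaN | NaN | NaN | NaN |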


When doing an operation between DataFrame and Series, the default behavior is to align the Series **index** on the DataFrame **columns**, thus broadcasting row-wise. For example:

In [97]:
df

|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| 0 | -0.383076 | -1.233236 | -2.065057 | 0.546332 |
| 1 | -0.257518 | 1.937704 | -1.696795 | -0.872099 |
| 2 | 1.174938 | -0.541473 | -0.329249 | 1.106674 |
| 3 | 0.493192 | -0.661042 | -0.921277 | -1.294797 |
| 4 | 0.441714 | 1.699518 | -0.890056 | 1.514154 |
| 5 | 0.617807 | 0.613376 | -1.224374 | -1.075437 |
| 6 | -0.260978 | -0.465404 | -0.349489 | -1.25224 |
| 7 | -0.599106 | -1.269217 | 0.618379 | -0.798279 |
| 8 | 0.586456 | -0.647067 | -1.336722 | 0.896365 |
| 9 | -0.079082 | 0.686816 | 1.000471 | -0.377698 |


In [96]:
df - df.iloc[0]

|  | A | B | C | D |
| --- | --- | --- | --- | --- |
| 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.125558 | 3.170939 | 0.368263 | -1.418431 |
| 2 | 1.558014 | 0.691762 | 1.735808 | 0.560342 |
| 3 | 0.876268 | 0.572194 | 1.14378 | -1.841128 |
| 4 | 0.824791 | 2.932754 | 1.175001 | 0.967822 |
| 5 | 1.000884 | 1.846612 | 0.840683 | -1.621769 |
| 6 | 0.122099 | 0.767832 | 1.715568 | -1.798572 |
| 7 | -0.216029 | -0.035981 | 2.683437 | -1.344611 |
| 8 | 0.969533 | 0.586169 | 0.728335 | 0.350033 |
| 9 | 0.303994 | 1.920052 | 3.065529 | -0.92403 |


For explicit control over the matching and broadcasting behavior, see the section on flexible binary operations.
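
As a rough illustration of that explicit control, the arithmetic methods such as `sub` take an `axis` argument that chooses which labels the Series is matched against. The DataFrame `frame` below is hypothetical, with fixed values so the two results can be compared by eye:

```python
import pandas as pd

frame = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                      'B': [10.0, 20.0, 30.0]},
                     index=['x', 'y', 'z'])

# Default: the Series index ('A', 'B') is matched against the columns,
# so the first row is subtracted from every row.
by_row = frame - frame.iloc[0]

# axis=0: the Series index ('x', 'y', 'z') is matched against the row labels,
# so column 'A' is subtracted from every column.
by_col = frame.sub(frame['A'], axis=0)

print(by_row)
print(by_col)
```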

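Operations with scalars are just as you would expect. In the cells that follow, `df` is an 8 × 3 DataFrame indexed by dates; the cell that created it is not shown: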

In [106]:
df

|  | A | B | C |
| --- | --- | --- | --- |
| 2000-01-01 | 0.291732 | 0.943954 | -1.324885 |
| 2000-01-02 | -1.28931 | 2.585367 | 0.384785 |
| 2000-01-03 | 1.976148 | 1.837944 | -2.517211 |
| 2000-01-04 | 0.449238 | -0.430547 | -0.42154 |
| 2000-01-05 | -1.660431 | -1.235621 | -0.167141 |
| 2000-01-06 | -0.735085 | 0.07064 | -1.037234 |
| 2000-01-07 | 0.200631 | 1.028939 | 0.204247 |
| 2000-01-08 | -0.359582 | 1.043468 | -0.278628 |


In [107]:
df * 5 + 2

|  | A | B | C |
| --- | --- | --- | --- |
| 2000-01-01 | 3.45866 | 6.71977 | -4.624427 |
| 2000-01-02 | -4.446548 | 14.926837 | 3.923925 |
| 2000-01-03 | 11.880741 | 11.18972 | -10.586056 |
| 2000-01-04 | 4.246188 | -0.152734 | -0.1077 |
| 2000-01-05 | -6.302156 | -4.178105 | 1.164295 |
| 2000-01-06 | -1.675427 | 2.353199 | -3.186168 |
| 2000-01-07 | 3.003156 | 7.144697 | 3.021236 |
| 2000-01-08 | 0.202089 | 7.217342 | 0.606862 |


In [108]:
1 / df

|  | A | B | C |
| --- | --- | --- | --- |
| 2000-01-01 | 3.427805 | 1.059374 | -0.754782 |
| 2000-01-02 | -0.775609 | 0.386792 | 2.598854 |
| 2000-01-03 | 0.506035 | 0.544086 | -0.397265 |
| 2000-01-04 | 2.225993 | -2.322627 | -2.372254 |
| 2000-01-05 | -0.602253 | -0.80931 | -5.98297 |
| 2000-01-06 | -1.360386 | 14.156333 | -0.964103 |
| 2000-01-07 | 4.984267 | 0.971875 | 4.896027 |
| 2000-01-08 | -2.781005 | 0.958342 | -3.589021 |


In [109]:
df ** 4

|  | A | B | C |
| --- | --- | --- | --- |
| 2000-01-01 | 0.007243 | 0.793969 | 3.081153 |
| 2000-01-02 | 2.763305 | 44.677515 | 0.021922 |
| 2000-01-03 | 15.250289 | 11.411144 | 40.14936 |
| 2000-01-04 | 0.040729 | 0.034362 | 0.031576 |
| 2000-01-05 | 7.601225 | 2.330994 | 0.00078 |
| 2000-01-06 | 0.291979 | 2.5e-05 | 1.157461 |
| 2000-01-07 | 0.00162 | 1.12088 | 0.00174 |
| 2000-01-08 | 0.016718 | 1.185543 | 0.006027 |


Boolean operators work as well:

In [117]:
df1 = pd.DataFrame({'a' : [1, 0, 1], 'b' : [0, 1, 0] }, dtype=bool)

In [118]:
df2 = pd.DataFrame({'a' : [0, 1, 1], 'b' : [1, 1, 0] }, dtype=bool)

In [119]:
df1

|  | a | b |
| --- | --- | --- |
| 0 | True | False |
| 1 | False | True |
| 2 | True | False |


In [120]:
df2

|  | a | b |
| --- | --- | --- |
| 0 | False | True |
| 1 | True | True |
| 2 | True | False |


In [121]:
df1 & df2

|  | a | b |
| --- | --- | --- |
| 0 | False | False |
| 1 | False | True |
| 2 | True | False |


[XOR operations](https://stackoverflow.com/questions/14526584/what-does-the-xor-operator-do/14526640)

In [122]:
df1 ^ df2

|  | a | b |
| --- | --- | --- |
| 0 | True | True |
| 1 | True | False |
| 2 | False | False |


In [115]:
df1 | df2

|  | a | b |
| --- | --- | --- |
| 0 | True | True |
| 1 | True | True |
| 2 | True | True |


In [123]:
~df1  # element-wise logical NOT

|  | a | b |
| --- | --- | --- |
| 0 | False | True |
| 1 | True | False |
| 2 | False | True |


## Transposing

To transpose, access the `T` attribute, as you would with an ndarray:

In [126]:
df[:5]

|  | A | B | C |
| --- | --- | --- | --- |
| 2000-01-01 | 0.291732 | 0.943954 | -1.324885 |
| 2000-01-02 | -1.28931 | 2.585367 | 0.384785 |
| 2000-01-03 | 1.976148 | 1.837944 | -2.517211 |
| 2000-01-04 | 0.449238 | -0.430547 | -0.42154 |
| 2000-01-05 | -1.660431 | -1.235621 | -0.167141 |


In [127]:
df[:5].T

|  | 2000-01-01 00:00:00 | 2000-01-02 00:00:00 | 2000-01-03 00:00:00 | 2000-01-04 00:00:00 | 2000-01-05 00:00:00 |
| --- | --- | --- | --- | --- | --- |
| A | 0.291732 | -1.28931 | 1.976148 | 0.449238 | -1.660431 |
| B | 0.943954 | 2.585367 | 1.837944 | -0.430547 | -1.235621 |
| C | -1.324885 | 0.384785 | -2.517211 | -0.42154 | -0.167141 |


## DataFrame interoperability with NumPy functions



Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions can be used with no issues on DataFrame, assuming the data within are numeric:

In [128]:
np.exp(df)

|  | A | B | C |
| --- | --- | --- | --- |
| 2000-01-01 | 1.338744 | 2.570124 | 0.265833 |
| 2000-01-02 | 0.275461 | 13.268162 | 1.469298 |
| 2000-01-03 | 7.214899 | 6.283606 | 0.080684 |
| 2000-01-04 | 1.567117 | 0.650153 | 0.656036 |
| 2000-01-05 | 0.190057 | 0.290654 | 0.84608 |
| 2000-01-06 | 0.479464 | 1.073195 | 0.354434 |
| 2000-01-07 | 1.222174 | 2.798097 | 1.226601 |
| 2000-01-08 | 0.697968 | 2.839047 | 0.756822 |


In [129]:
np.asarray(df)

array([[ 0.29173192,  0.94395401, -1.32488537],
       [-1.28930955,  2.58536731,  0.38478499],
       [ 1.97614824,  1.83794409, -2.5172112 ],
       [ 0.44923764, -0.43054688, -0.42154   ],
       [-1.66043122, -1.23562109, -0.16714106],
       [-0.73508544,  0.07063976, -1.03723361],
       [ 0.20063129,  1.02893939,  0.20424725],
       [-0.35958225,  1.04346836, -0.27862751]])

The dot method on DataFrame implements matrix multiplication:

In [130]:
df.T.dot(df)

|  | A | B | C |
| --- | --- | --- | --- |
| A | 9.321338 | 2.211637 | -4.865223 |
| B | 2.211637 | 14.817874 | -4.648149 |
| C | -4.865223 | -4.648149 | 9.640569 |


Similarly, the dot method on Series implements dot product:

In [131]:
s1 = pd.Series(np.arange(5,10))

In [132]:
s1.dot(s1)

255

## Console display



Very large DataFrames are truncated when displayed in the console. You can also get a summary using `info()`.

In [133]:
df = pd.DataFrame(np.random.randn(500, 4), columns=['A', 'B', 'C', 'D'])

In [134]:
print(df)

            A         B         C         D
0   -0.856843 -1.068949 -0.833794 -0.288582
1   -0.159036  0.184452 -0.820928  1.174150
2   -1.171890 -0.685827  1.263608  0.000914
3    1.477287  0.066760  0.053417 -1.127408
4    0.676262  0.024737  1.768461  1.457240
5    0.092648 -0.123026 -1.782111  2.100467
6   -1.199846 -0.256040  0.557455 -1.348760
7    1.875909 -0.272914 -0.800270 -0.154401
8   -0.108462  1.181654 -0.545205  0.563896
9   -0.119332 -1.437341  0.107047 -1.035811
10  -0.013148  0.320056 -2.513591  1.259411
11   2.517169  0.214897 -1.021986 -0.262164
12   2.405141  0.191959 -2.020024  0.113615
13  -0.009625  1.078980  0.643264 -0.676977
14   0.721228 -0.017914 -1.356100  1.945069
15   0.533011  0.338964 -0.350769 -1.505536
16   0.808081  0.661511  0.458973  0.500839
17   0.591187 -0.151879  0.005175  2.052637
18   0.300000 -0.423149  0.802199 -0.565240
19   0.096239 -0.970986  0.116054  0.342849
20  -1.644028  0.103736  0.156602  0.549821
21   0.982177  0.565276 -0.62562

In [135]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 4 columns):
A    500 non-null float64
B    500 non-null float64
C    500 non-null float64
D    500 non-null float64
dtypes: float64(4)
memory usage: 15.7 KB


In [136]:
print(df.iloc[-20:, :12].to_string())

            A         B         C         D
480 -1.132031 -0.195468  0.059605  0.778816
481 -1.562094 -1.651011 -0.230847 -0.971640
482  0.042592  0.723569 -2.044183 -0.549077
483 -1.024983 -0.830365 -0.876432  0.120420
484  1.107465 -1.454727 -1.000799 -0.428882
485  1.182665 -0.861935 -3.342719  2.298219
486 -0.336618  1.721098 -0.551971 -0.267299
487 -0.654249  0.105189  0.039562  0.275228
488 -0.744251 -0.373349  0.252721  0.909697
489 -0.359583 -0.307848  1.689361 -0.694235
490 -0.280900  1.133732 -0.929365  0.471828
491  0.682179  0.887725  2.536359  0.790019
492  0.065653  0.375410 -0.651512 -0.534771
493  0.970626 -0.312228  0.824873  0.060058
494  0.594176 -0.472459 -0.356829 -0.676115
495  0.339018 -0.962056 -0.766998 -0.658631
496 -1.541919  1.344653  0.270481 -1.445945
497  0.399963 -0.196979  0.404211 -0.584701
498  0.410004  0.998686 -2.605658  0.505163
499 -0.379931  0.336479 -2.220719 -0.071166
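
How many rows the console shows before truncating is controlled by the `display.max_rows` option. A minimal sketch, assuming a fresh 500-row DataFrame named `big` (hypothetical), that lowers the limit temporarily with `pd.option_context`:

```python
import numpy as np
import pandas as pd

big = pd.DataFrame(np.random.randn(500, 4), columns=['A', 'B', 'C', 'D'])

# Temporarily cap the number of rows shown; the previous setting is
# restored when the with-block exits.
with pd.option_context('display.max_rows', 10):
    short_text = repr(big)

print(short_text)
```

Using a context manager rather than `pd.set_option` keeps the change local, which is handy in a notebook where later cells should keep the default display.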


## DataFrame column attribute access and IPython completion



If a DataFrame column label is a valid Python variable name, the column can be accessed like an attribute:

In [137]:
df = pd.DataFrame({'foo1' : np.random.randn(5),
                   'foo2' : np.random.randn(5)})

In [138]:
df

|  | foo1 | foo2 |
| --- | --- | --- |
| 0 | -0.932314 | -1.489616 |
| 1 | -1.595774 | 1.541437 |
| 2 | 0.408341 | 0.337898 |
| 3 | 0.081706 | 0.552303 |
| 4 | 1.25289 | -0.57279 |


In [139]:
df.foo1

0   -0.932314
1   -1.595774
2    0.408341
3    0.081706
4    1.252890
Name: foo1, dtype: float64

In [140]:
df['foo1']

0   -0.932314
1   -1.595774
2    0.408341
3    0.081706
4    1.252890
Name: foo1, dtype: float64
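
Attribute access has limits worth knowing. The sketch below uses a hypothetical DataFrame `frame` with one column whose name collides with a DataFrame method (`count`) and one whose name is not a valid identifier (`my col`); in both cases bracket access is the reliable form:

```python
import pandas as pd

frame = pd.DataFrame({'foo1': [1, 2, 3],
                      'count': [4, 5, 6],     # same name as a DataFrame method
                      'my col': [7, 8, 9]})   # not a valid Python identifier

print(frame.foo1.tolist())       # attribute access works for 'foo1'
print(callable(frame.count))     # the attribute is the method, not the column
print(frame['count'].tolist())   # bracket access always reaches the column
print(frame['my col'].tolist())
```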

***