## Accessing a DataFrame

In [1]:
import pandas as pd
data = {'one':pd.Series([1.,2.,3.],index=["a","b","c"]),
        "two":pd.Series([2.,3.,4.],index=["b","c","d"])}
df = pd.DataFrame(data)
print(df)

   one  two
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0


In [2]:
# Select rows with a positional slice
df[0:2]

   one  two
a  1.0  NaN
b  2.0  2.0


In [3]:
# Select a column by label
df['one']

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

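## Summary
#### <font color="red">Indexing a DataFrame with [ ] cannot select a single row directly. Rows can only be selected with a slice (an integer-position slice such as df[0:2], or a label slice such as df['a':'b']) or with a boolean mask. Columns are selected by label.</font>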
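
The four forms of `[ ]` indexing mentioned above can be compared side by side. This is a minimal sketch that rebuilds the same `df` as cell 1; the variable names (`col`, `by_position`, `by_label`, `by_mask`) are illustrative only.

```python
import pandas as pd

# Same frame as in cell 1: the two Series are aligned on the union of their indexes.
df = pd.DataFrame({
    "one": pd.Series([1., 2., 3.], index=["a", "b", "c"]),
    "two": pd.Series([2., 3., 4.], index=["b", "c", "d"]),
})

col = df["one"]              # a single label selects a column (returns a Series)
by_position = df[0:2]        # an integer slice selects rows by position
by_label = df["a":"b"]       # a label slice selects rows; the end label is included
by_mask = df[df["one"] > 1]  # a boolean Series keeps the rows where it is True

print(by_position.index.tolist())  # ['a', 'b']
print(by_label.index.tolist())     # ['a', 'b']
print(by_mask.index.tolist())      # ['b', 'c']
```

Because `[ ]` switches between columns and rows depending on the key type, `.loc` and `.iloc` are usually the clearer choice once row selection is involved.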

In [4]:
# Select multiple columns
df[["one","two"]]

   one  two
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0


## <font color="red">Selecting elements</font>
- loc

In [5]:
print(df.loc[:,["one"]])

   one
a  1.0
b  2.0
c  3.0
d  NaN


In [6]:
print(df.loc["a","one"])

1.0


In [7]:
print(df.index)
print(df.columns)

Index(['a', 'b', 'c', 'd'], dtype='object')
Index(['one', 'two'], dtype='object')


In [8]:
help(df.loc)

Help on _LocIndexer in module pandas.core.indexing object:

class _LocIndexer(_LocationIndexer)
 |  Purely label-location based indexer for selection by label.
 |  
 |  ``.loc[]`` is primarily label based, but may also be used with a
 |  boolean array.
 |  
 |  Allowed inputs are:
 |  
 |  - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
 |    interpreted as a *label* of the index, and **never** as an
 |    integer position along the index).
 |  - A list or array of labels, e.g. ``['a', 'b', 'c']``.
 |  - A slice object with labels, e.g. ``'a':'f'`` (note that contrary
 |    to usual python slices, **both** the start and the stop are included!).
 |  - A boolean array.
 |  - A ``callable`` function with one argument (the calling Series, DataFrame
 |    or Panel) and that returns valid output for indexing (one of the above)
 |  
 |  ``.loc`` will raise a ``KeyError`` when the items are not found.
 |  
 |  See more at :ref:`Selection by Label <indexing.label>`
 |  
 |  Method 
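
The allowed inputs listed in the `.loc` help text can each be tried on the example frame. The following is a small sketch, not an exhaustive reference; the variable names and the missing label `"z"` are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "one": pd.Series([1., 2., 3.], index=["a", "b", "c"]),
    "two": pd.Series([2., 3., 4.], index=["b", "c", "d"]),
})

scalar = df.loc["a", "one"]                        # single row label + single column label
rows = df.loc[["a", "c"]]                          # list of row labels
inclusive = df.loc["a":"c", "one"]                 # label slice: both 'a' and 'c' are included
masked = df.loc[df["one"] > 1, ["two"]]            # boolean mask on rows, list of columns
via_callable = df.loc[lambda d: d["two"].notna()]  # callable that returns a boolean mask

# A label that is not in the index raises KeyError.
try:
    df.loc["z"]
    missing = False
except KeyError:
    missing = True

print(scalar)                   # 1.0
print(inclusive.index.tolist()) # ['a', 'b', 'c']
print(missing)                  # True
```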

## df.iloc: access by integer position

In [18]:
help(df.iloc)

Help on _iLocIndexer in module pandas.core.indexing object:

class _iLocIndexer(_LocationIndexer)
 |  Purely integer-location based indexing for selection by position.
 |  
 |  ``.iloc[]`` is primarily integer position based (from ``0`` to
 |  ``length-1`` of the axis), but may also be used with a boolean
 |  array.
 |  
 |  Allowed inputs are:
 |  
 |  - An integer, e.g. ``5``.
 |  - A list or array of integers, e.g. ``[4, 3, 0]``.
 |  - A slice object with ints, e.g. ``1:7``.
 |  - A boolean array.
 |  - A ``callable`` function with one argument (the calling Series, DataFrame
 |    or Panel) and that returns valid output for indexing (one of the above)
 |  
 |  ``.iloc`` will raise ``IndexError`` if a requested indexer is
 |  out-of-bounds, except *slice* indexers which allow out-of-bounds
 |  indexing (this conforms with python/numpy *slice* semantics).
 |  
 |  See more at :ref:`Selection by Position <indexing.integer>`
 |  
 |  Method resolution order:
 |      _iLocIndexer
 |     

In [24]:
print(df.iloc[0,0])
print(df.iloc[0,1])
print(df.iloc[0:1,1:2])
print(df.iloc[[0,2],[0,1]])

1.0
nan
   two
a  NaN
   one  two
a  1.0  NaN
c  3.0  3.0
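
The out-of-bounds rule quoted in the `.iloc` help text is easy to miss, so here is a short sketch of it on the same frame. The positions `10` and `2:10` are arbitrary values picked to fall outside the four rows.

```python
import pandas as pd

df = pd.DataFrame({
    "one": pd.Series([1., 2., 3.], index=["a", "b", "c"]),
    "two": pd.Series([2., 3., 4.], index=["b", "c", "d"]),
})

first = df.iloc[0]                # first row as a Series
picked = df.iloc[[0, 2], [0, 1]]  # rows 0 and 2, columns 0 and 1
clipped = df.iloc[2:10]           # a slice past the end is clipped, not an error

# A single integer past the end raises IndexError.
try:
    df.iloc[10]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(clipped.index.tolist())  # ['c', 'd']
print(out_of_bounds)           # True
```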


## df.ix: mixed label and positional access (deprecated in pandas 0.20, removed in 1.0)

In [29]:
# Note: .ix only exists in pandas < 1.0; on newer versions use .loc / .iloc instead
print(df.ix[1,1])
print(df.ix[1,"one"])
print(df.ix[1,"two"])
print(df.ix[0:2,["one","two"]])

2.0
2.0
2.0
   one  two
a  1.0  NaN
b  2.0  2.0
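
Since `.ix` is no longer available in current pandas, the four lookups above can be rewritten with `.iloc` and `.loc`. This is one possible translation, assuming the intent was "row at position 1" and "first two rows"; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "one": pd.Series([1., 2., 3.], index=["a", "b", "c"]),
    "two": pd.Series([2., 3., 4.], index=["b", "c", "d"]),
})

v1 = df.iloc[1, 1]                           # position + position
v2 = df.iloc[1, df.columns.get_loc("one")]   # position + label, converted to a position
v3 = df.loc[df.index[1], "two"]              # position converted to a label + label
block = df.iloc[0:2][["one", "two"]]         # first two rows, then columns by label

print(v1, v2, v3)            # 2.0 2.0 2.0
print(block.index.tolist())  # ['a', 'b']
```

Converting explicitly with `df.index[...]` or `df.columns.get_loc(...)` makes it visible which axis is addressed by position and which by label, which is the ambiguity that `.ix` hid.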


In [27]:
df

   one  two
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0


## Conditional selection with df.loc

In [30]:
print(df.loc[df.one>1])

   one  two
b  2.0  2.0
c  3.0  3.0


In [31]:
print(df.iloc[df.one>1])

ValueError: iLocation based boolean indexing cannot use an indexable as a mask
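
The error above comes from passing a boolean Series, which carries its own index, to `.iloc`. A plain boolean array is accepted, so one workaround (a sketch, assuming you really need `.iloc` rather than `.loc`) is to strip the index with `to_numpy()`:

```python
import pandas as pd

df = pd.DataFrame({
    "one": pd.Series([1., 2., 3.], index=["a", "b", "c"]),
    "two": pd.Series([2., 3., 4.], index=["b", "c", "d"]),
})

mask = df["one"] > 1                  # boolean Series; NaN > 1 evaluates to False

rows_loc = df.loc[mask]               # .loc accepts the Series directly
rows_iloc = df.iloc[mask.to_numpy()]  # .iloc needs a plain boolean array

print(rows_iloc.index.tolist())       # ['b', 'c']
```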

In [32]:
# Note: .ix only exists in pandas < 1.0; df.loc[df.one>1] is the equivalent today
print(df.ix[df.one>1])

   one  two
b  2.0  2.0
c  3.0  3.0
