# Data Indexing and Selection

## Data Selection in Series


**Series as dictionary**

In [2]:
import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
'a' in data
data.keys()
list(data.items())

[('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]

Series objects can even be modified with a dictionary-like syntax. Just as you can extend a dictionary by assigning to a new key, you can extend a Series by assigning to a new index value:

In [3]:
data['e'] = 1.25

**Series as one-dimensional array**

In [None]:
data['a':'c']                       # slicing by explicit index
data[0:2]                           # slicing by implicit integer index
data[(data > 0.3) & (data < 0.8)]   # masking
data[['a', 'e']]                    # fancy indexing

**Pay attention!**
Of these, slicing is the most likely source of confusion. When slicing with an explicit index (i.e., data['a':'c']), the final index is *included* in the slice, whereas when slicing with an implicit index (i.e., data[0:2]), the final index is *excluded* from the slice.
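
The difference between the two slicing conventions can be checked directly. The following is a minimal sketch that rebuilds the four-element Series from the first cell; the names by_label and by_position are introduced here only for illustration.

```python
import pandas as pd

# Same Series as in the first cell of this section.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Label-based slice: the stop label 'c' is included.
by_label = data['a':'c']

# Position-based slice: the stop position 2 is excluded.
by_position = data[0:2]

print(list(by_label.index))     # ['a', 'b', 'c']
print(list(by_position.index))  # ['a', 'b']
```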

**Indexers: loc, iloc, and ix**  
* The **loc** attribute allows indexing and slicing that always references the explicit index.
* The **iloc** attribute allows indexing and slicing that always references the implicit Python-style index.
* The **ix** indexer was a hybrid of the two. It was deprecated in pandas 0.20 and removed in pandas 1.0, so use **loc** or **iloc** instead.
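
The distinction matters most when the explicit index is itself made of integers. Here is a small sketch, assuming an illustrative Series named s whose labels are 1, 3, 5 (it is not part of the cells above), in which loc and iloc give different answers for the same number:

```python
import pandas as pd

# An integer index that does not start at 0 makes the two conventions disagree.
s = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

print(s.loc[1])           # explicit label 1       -> a
print(s.iloc[1])          # implicit position 1    -> b
print(list(s.loc[1:3]))   # labels 1 to 3, inclusive -> ['a', 'b']
print(list(s.iloc[1:3]))  # positions 1 and 2        -> ['b', 'c']
```

Spelling out loc or iloc keeps the intent visible to the reader, which is why they are usually preferred over bare brackets when the index holds integers.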

## Data Selection in DataFrame

**DataFrame as a dictionary**

In [4]:
area = pd.Series({'California': 423967, 'Texas': 695662, 'New York': 141297, 'Florida': 170312, 'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135})
data = pd.DataFrame({'area':area, 'pop':pop})

In [6]:
data['area']                                   # dictionary-style column access
data.area                                      # equivalent attribute-style access
data['density'] = data['pop'] / data['area']   # add a new column

Attribute-style access is not possible if the column names are not strings, or if they conflict with methods of the DataFrame. For example, DataFrame has a pop() method, so data.pop refers to that method rather than to the 'pop' column.  
In particular, avoid the temptation to assign columns via attribute (i.e., use data['pop'] = z rather than data.pop = z).
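
The name clash can be seen in a short sketch. The two-row frame below, named states, is a cut-down illustration built from the figures above rather than the full data object:

```python
import pandas as pd

# Two-row illustration using the California and Texas figures from above.
states = pd.DataFrame(
    {'area': [423967, 695662], 'pop': [38332521, 26448193]},
    index=['California', 'Texas'],
)

# 'area' does not clash with any DataFrame attribute, so both styles agree.
print(states.area.equals(states['area']))   # True

# 'pop' clashes with the DataFrame.pop method, so attribute access finds the method.
print(callable(states.pop))                 # True

# Dictionary-style access still returns the column.
print(type(states['pop']).__name__)         # Series
```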

**DataFrame as two-dimensional array**

In [None]:
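data.values                      # underlying NumPy array
data.T                           # transpose
data.iloc[:3, :2]                # first three rows, first two columns (by position)
data.loc[:'Illinois', :'pop']    # rows through 'Illinois', columns through 'pop' (by label)
data.iloc[:3].loc[:, :'pop']     # replaces data.ix[:3, :'pop']; .ix was removed in pandas 1.0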

Any of the familiar NumPy-style data access patterns can be used with these indexers. For example, in the **loc** indexer we can combine masking and fancy indexing as in the following:

In [None]:
data.loc[data.density > 100, ['pop', 'density']]   # mask rows, then select columns by name
data.iloc[0, 2] = 90                               # the same indexers can also set values

**Additional indexing conventions**

While direct indexing (e.g., data['area']) refers to columns, direct slicing refers to rows:

In [33]:
data['Florida': 'Illinois']

|  | area | pop | density |
|---|---|---|---|
| Florida | 170312 | 19552860 | 114.806121 |
| Illinois | 149995 | 12882135 | 85.883763 |


Such slices can also refer to rows by number rather than by index:

In [34]:
data[1: 3]

|  | area | pop | density |
|---|---|---|---|
| Texas | 695662 | 26448193 | 38.01874 |
| New York | 141297 | 19651127 | 139.076746 |


Similarly, direct masking operations are also interpreted row-wise rather than column-wise:

In [35]:
data[data.density > 100]

|  | area | pop | density |
|---|---|---|---|
| New York | 141297 | 19651127 | 139.076746 |
| Florida | 170312 | 19552860 | 114.806121 |


**Index Alignment**

For binary operations on two Series or DataFrame objects, Pandas aligns the indices while performing the operation.

In [10]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B
A.add(B, fill_value=0)
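
To make the alignment visible, the sketch below repeats the two Series from the cell above and prints both results. The names total and filled are introduced here only for illustration.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Plain addition: labels 0 and 3 exist on only one side, so they become NaN.
total = A + B
print(total.tolist())    # [nan, 5.0, 9.0, nan]

# With fill_value=0 the missing entry is treated as 0 before adding.
filled = A.add(B, fill_value=0)
print(filled.tolist())   # [2.0, 5.0, 9.0, 5.0]
```

The result index is the union of the two input indices, and the values become floats because NaN cannot be stored in an integer column.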

In [26]:
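import numpy as np
rng = np.random.RandomState()   # assumption: the original seed is not shown, so values will differ from the output below
A = pd.DataFrame(rng.randint(0, 20, (2, 2)), columns=list('AB'))
B = pd.DataFrame(rng.randint(0, 10, (3, 3)), columns=list('BAC'))
A + B
fill = A.stack().mean()   # mean of all values in A; could be passed as fill_value instead of 0
A.add(B, fill_value=0)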

|  | A | B | C |
|---|---|---|---|
| 0 | 18.0 | 20.0 | 8.0 |
| 1 | 18.0 | 20.0 | 2.0 |
| 2 | 4.0 | 6.0 | 8.0 |


**Operations Between DataFrame and Series**

In [57]:
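import numpy as np
A = np.random.RandomState().randint(10, size=(3, 4))   # assumption: a 3x4 integer array, matching the shape of the output below; the original values are not shown
df = pd.DataFrame(A, columns=list('QRST'))
df - df.iloc[0]   # subtract the first row from every row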

|  | Q | R | S | T |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | -5 | 8 | 5 | 1 |
| 2 | -2 | 0 | 0 | -2 |


If you would instead like to operate column-wise, you can use the object methods mentioned earlier, while specifying the _axis_ keyword:

In [71]:
df.subtract(df['R'], axis=0)

|  | Q | R | S | T |
|---|---|---|---|---|
| 0 | 5 | 0 | 2 | 7 |
| 1 | -8 | 0 | -1 | 0 |
| 2 | 3 | 0 | 2 | 5 |
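
Because the cells above use random data, the same two operations are sketched here on a small hand-written frame so that the results can be verified by eye. The frame demo and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Small hand-written frame (hypothetical values, not the random data used above).
demo = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=list('XYZ'))

# Default behaviour: the Series index is matched against the column labels,
# so the first row is subtracted from every row.
by_row = demo - demo.iloc[0]
print(by_row.values.tolist())   # [[0, 0, 0], [3, 3, 3]]

# axis=0: the Series index is matched against the row labels,
# so column 'Y' is subtracted from every column.
by_col = demo.subtract(demo['Y'], axis=0)
print(by_col.values.tolist())   # [[-1, 0, 1], [-1, 0, 1]]
```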


Note that these _DataFrame/Series_ operations, like the operations discussed above, will automatically align indices between the two elements:

In [72]:
halfrow = df.iloc[0, ::2]
df - halfrow

|  | Q | R | S | T |
|---|---|---|---|---|
| 0 | 0.0 | NaN | 0.0 | NaN |
| 1 | -5.0 | NaN | 5.0 | NaN |
| 2 | -2.0 | NaN | 0.0 | NaN |
