# Data Indexing and Selection

Here we'll look at means of accessing and modifying values in Pandas `Series` and `DataFrame` objects that parallel the indexing patterns of NumPy arrays. If you have used the NumPy patterns, the corresponding patterns in Pandas will feel very familiar, though there are a few quirks to be aware of.

## Data Selection in Series

As we saw in the previous section, a `Series` object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python dictionary. Keeping these two overlapping analogies in mind will help us understand the patterns of data indexing and selection in these arrays.

### Series as dictionary

Like a dictionary, the `Series` object provides a mapping from a collection of keys to a collection of values:

In [2]:
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [3]:
data['b']

0.5

In [4]:
'a' in data

True

In [8]:
data.keys()

Index(['a', 'b', 'c', 'd'], dtype='object')

In [9]:
data.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [7]:
list(data.items())


[('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]

`Series` objects can even be modified with a dictionary-like syntax. Just as you can extend a dictionary by assigning to a new key, you can extend a `Series` by assigning to a new index value:

In [12]:
data['e'] = 1.25
data

a    0.25
b    0.50
c    0.75
d    1.00
e    1.25
dtype: float64

This easy mutability of the objects is a convenient feature: under the hood, Pandas makes the decisions about memory layout and data copying that might need to take place, so the user generally does not need to worry about these issues.
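To make the dictionary analogy concrete, here is a minimal sketch using a small, hypothetical `prices` series (the name and values are made up for illustration). It shows that assigning to an existing label overwrites the value, assigning to a new label extends the `Series`, and membership tests look at the index labels rather than the values:

```python
import pandas as pd

# Hypothetical example data, not taken from the text above
prices = pd.Series([0.25, 0.5], index=['a', 'b'])

prices['b'] = 0.55   # existing label: the stored value is overwritten
prices['c'] = 0.75   # new label: the Series grows by one entry

# items() yields (label, value) pairs, like dict.items()
print(list(prices.items()))

# `in` checks the index labels (the "keys"), not the values
print('c' in prices, 0.75 in prices)
```

As with a dictionary, the membership test answers "is this key present?", so a value such as `0.75` is not found unless it also happens to be a label.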

### Series as one-dimensional array

A `Series` builds on this dictionary-like interface and provides array-style item selection via the same basic mechanisms as NumPy arrays - that is, *slices*, *masking*, and *fancy indexing*. Examples of these are as follows:

In [13]:
# slicing by explicit index
data['a':'c']

a    0.25
b    0.50
c    0.75
dtype: float64

In [14]:
# slicing by implicit integer index
data[0:2]

a    0.25
b    0.50
dtype: float64

In [17]:
# masking 
data[(data > 0.3) & (data < 0.8)]

b    0.50
c    0.75
dtype: float64

In [18]:
# fancy indexing
data[['a', 'e']]

a    0.25
e    1.25
dtype: float64

In [20]:
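# fancy indexing by implicit integer position
# (newer pandas versions may warn here; data.iloc[[0, 4]] is the explicit form)
data[[0, 4]]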

a    0.25
e    1.25
dtype: float64

Among these, slicing may be the source of the most confusion. Notice that when slicing with an explicit index (i.e., `data['a':'c']`), the final index is *included* in the slice, while when slicing with an implicit index (i.e., `data[0:2]`), the final index is *excluded* from the slice.
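The difference between the two slicing conventions is easiest to see side by side. The following sketch rebuilds the five-element `data` series from above and compares the labels returned by each kind of slice; the variable names `by_label` and `by_position` are introduced here only for illustration:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0, 1.25],
                 index=['a', 'b', 'c', 'd', 'e'])

by_label = data['a':'c']    # explicit (label) slice: endpoint 'c' is included
by_position = data[0:2]     # implicit (integer) slice: position 2 is excluded

print(list(by_label.index))     # three labels: a, b, c
print(list(by_position.index))  # two labels: a, b
```

A label slice cannot reasonably stop "one before" a label, since labels have no inherent successor, which is one way to remember why the endpoint is included.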

### Indexers: loc, iloc, and ix

These slicing and indexing conventions can be a source of confusion. For example, if your `Series` has an explicit integer index, an indexing operation such as `data[1]` will use the explicit indices, while a slicing operation like `data[1:3]` will use the implicit Python-style index.

In [21]:
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data

1    a
3    b
5    c
dtype: object

In [22]:
# explicit index when indexing
data[1]

'a'

In [24]:
# implicit index when slicing
data[1:3]

3    b
5    c
dtype: object

Because of this potential confusion in the case of integer indexes, Pandas provides some special *indexer* attributes that explicitly expose certain indexing schemes. These are not functional methods, but attributes that expose a particular slicing interface to the data in the `Series`.

First, the `loc` attribute allows indexing and slicing that always references the explicit index:

In [30]:
# Explicit indexing
data.loc[1:2]

1    a
dtype: object

In [29]:
# Implicit indexing
data[1:2]

3    b
dtype: object

The `iloc` attribute allows indexing and slicing that always references the implicit Python-style index (with 0 as the first item, and so on):

In [32]:
data.iloc[1]

'b'

In [37]:
data[1:2]

3    b
dtype: object

In [38]:
# implicit indexing
data.iloc[1:3]

3    b
5    c
dtype: object

In [40]:
# explicit indexing
data.loc[1:3]

1    a
3    b
dtype: object

A third indexing attribute, `ix`, is a hybrid of the two, and for `Series` objects is equivalent to standard `[]`-based indexing. The purpose of the `ix` indexer will become more apparent in the context of `DataFrame` objects, which we will discuss in a moment. Note that `ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so it is only available in older versions.

One guiding principle of Python code is that "explicit is better than implicit." The explicit nature of `loc` and `iloc` makes them very useful in maintaining clean and readable code. Especially in the case of integer indexes, I recommend using them both to make code easier to read and understand and to prevent subtle bugs due to the mixed indexing/slicing convention.
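As a compact recap, the sketch below applies `loc` and `iloc` to the integer-indexed series from above, using the same integer keys with each indexer so the two conventions can be compared directly. It is only an illustration of the behavior described in this section:

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Scalar access: the same kind of key means different things
label_hit = data.loc[3]        # label 3 -> 'b'
position_hit = data.iloc[2]    # position 2 (third item) -> 'c'

# Slicing: loc includes the end label, iloc excludes the end position
label_slice = list(data.loc[1:3])      # labels 1 through 3 -> ['a', 'b']
position_slice = list(data.iloc[1:3])  # positions 1 and 2 -> ['b', 'c']

print(label_hit, position_hit)
print(label_slice, position_slice)
```

Spelling out `loc` or `iloc` at each call site means a reader never has to know how the index was constructed to tell which convention is in play.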