# Selection

## Indexing and selecting data

The axis labeling information in pandas objects serves many purposes:

* Identifies data.
* Enables automatic and explicit data alignment.
* Allows getting and setting subsets of the data set.

Learn how to slice, dice, and generally get and set subsets of pandas objects.
<br><br>
Explore!!

## Selection by label

.loc is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found. <br><br>Allowed inputs are:

* A single label, e.g. 5 or 'a' <br>(Note that 5 is interpreted as a label of the index, not as an integer position along the index.)
<br>
* A list or array of labels, e.g. ['a', 'b', 'c'].
<br>
* A slice object with labels, e.g. 'a':'f' <br>(Note that, contrary to usual Python slices, both the start and the stop are included when present in the index. See Slicing with labels.)
<br>
* A boolean array.
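The four input kinds listed above can be tried on a small frame. This is a minimal sketch on made-up data (the `toy` frame and its `value` column are assumptions for illustration, not part of the TED dataset); the integer labels are deliberately different from the row positions so that the label-versus-position distinction is visible.

```python
import pandas as pd

# Hypothetical toy data: the index labels are 10, 20, 30, 40 (not 0..3)
toy = pd.DataFrame({"value": [1, 2, 3, 4]}, index=[10, 20, 30, 40])

single = toy.loc[20]                 # a single label -> Series for that row
several = toy.loc[[10, 30]]          # a list of labels -> DataFrame
sliced = toy.loc[10:30]              # a label slice -> both 10 and 30 are included
masked = toy.loc[toy["value"] > 2]   # a boolean array -> rows where it is True

print(sliced.index.tolist())
print(masked.index.tolist())
```

Note that `toy.loc[10:30]` returns three rows, whereas a positional slice such as `toy.iloc[0:2]` returns two, because positional slices exclude the stop.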

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns

import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df = pd.read_csv('../data/ted.csv')

In [None]:
# information about the row labels (the index)
df.index

In [None]:
# [row label] (with the default index, labels coincide with row numbers)
df.loc[0]

In [None]:
# What happens?  --> error, but why?
# df.loc[-1]
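
One way to see why the commented-out call fails: `.loc` looks up -1 as a label, and a default index has no such label. The sketch below uses a made-up `toy` frame (an assumption for illustration) rather than the TED data.

```python
import pandas as pd

# Hypothetical toy data with the default index: labels 0, 1, 2
toy = pd.DataFrame({"value": [10, 20, 30]})

try:
    toy.loc[-1]          # -1 is treated as a label, and no row is labeled -1
    raised = False
except KeyError:
    raised = True

# Two ways to reach the last row instead
last_by_label = toy.loc[toy.index[-1]]   # look up the last label explicitly
last_by_position = toy.iloc[-1]          # or select by position

print(raised)
```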

In [None]:
# Let's look at the top & bottom rows
df.head(3)

In [None]:
df.tail(3)

In [None]:
# fix: use the label of the last row (see df.tail above)
df.loc[2549]

In [None]:
# pass a list of labels to get a DataFrame instead of a Series
df.loc[[2549], :]

In [None]:
# [multiple rows]
df.loc[[0, 2549]]

In [None]:
# explicit [multiple rows]
df.loc[[0, 2549], :]  # [[rows], columns]

In [None]:
# But what if the row labels are not numbers?
...
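
As one possible illustration of non-numeric row labels, the sketch below builds a small made-up frame and uses a string column as the index. The `toy` frame and its `speaker` and `views` columns are hypothetical and are not taken from the TED dataset.

```python
import pandas as pd

# Hypothetical toy data, indexed by a string column
toy = pd.DataFrame(
    {"speaker": ["Ann", "Bob", "Cy"], "views": [100, 250, 75]}
).set_index("speaker")

row = toy.loc["Bob"]              # a single string label -> Series
rows = toy.loc[["Ann", "Cy"], :]  # a list of string labels -> DataFrame
first = toy.iloc[0]               # positions still work through .iloc

print(rows)
```

With string labels, `.loc` takes the strings, while integer positions are only meaningful to `.iloc`.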

In [None]:
df.columns

In [None]:
# [rows, [columns]]
df.loc[:, ['comments', 'event']]

In [None]:
df['event'] == 'TED2017'

In [None]:
# a boolean array
df.loc[df['event'] == 'TED2017']

In [None]:
# build a boolean mask and keep only the rows where it is True
mask = (df['num_speaker'] < 7) & (df['event'] == 'TED2017')

df.loc[mask]

## Selection by position

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. 
<br>
.iloc will raise IndexError if a requested indexer is out-of-bounds, except for slice indexers, which allow out-of-bounds indexing <br>(this conforms with Python/NumPy slice semantics). <br><br>Allowed inputs are:

* An integer, e.g. 5.
<br>
* A list or array of integers, e.g. [4, 3, 0].
<br>
* A slice object with ints, e.g. 1:7.
<br>
* A boolean array.
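
The boolean-array input is the least obvious of the four, so here is a minimal sketch on made-up data (the `toy` frame is an assumption for illustration). In the pandas versions I am aware of, `.iloc` rejects a boolean Series that carries its own index, so the mask is converted to a plain NumPy array first.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# Convert the boolean Series to a NumPy array before passing it to .iloc
mask = (toy["value"] > 20).to_numpy()
selected = toy.iloc[mask]

# A plain list of booleans, one per row, also works
flags = [True, False, True, False]
every_other = toy.iloc[flags]

print(selected)
```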

In [None]:
# row
row = df.iloc[0]
row

In [None]:
type(row)

In [None]:
df.iloc[-1]

In [None]:
# multiple rows
rows = df.iloc[0:3]
rows

In [None]:
# integer slices behave like NumPy/Python slices (the stop is excluded)
df.iloc[3:5, 0:2]

In [None]:
# [rows, [columns]]
df.iloc[:, [0, 3]]
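
The out-of-bounds rule stated at the start of this section can be checked directly. This is a small sketch on made-up data (the `toy` frame is an assumption for illustration): a slice that runs past the end is silently clipped, while a single out-of-range position raises IndexError.

```python
import pandas as pd

# Hypothetical toy data with three rows (positions 0, 1, 2)
toy = pd.DataFrame({"value": [10, 20, 30]})

# A slice past the end is clipped to the rows that exist
clipped = toy.iloc[1:10]

# A single out-of-range position is an error
try:
    toy.iloc[10]
    raised = False
except IndexError:
    raised = True

print(len(clipped), raised)
```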