## Subsetting Data
### Working with pandas
*Curtis Miller*

In this notebook we will subset `Series` and `DataFrame`s in a variety of ways.

We start by creating some `Series` and `DataFrame`s to work with.

In [1]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np

In [3]:
srs = Series(np.arange(5),
             index=["alpha", "beta", "gamma", "delta", "epsilon"])
srs

alpha      0
beta       1
gamma      2
delta      3
epsilon    4
dtype: int32

In [4]:
srs[:2]

alpha    0
beta     1
dtype: int32

In [5]:
srs[["beta", "delta"]]

beta     1
delta    3
dtype: int32

In [6]:
srs["beta":"delta"]     # Select everything BETWEEN (and
                        # including) beta and delta

beta     1
gamma    2
delta    3
dtype: int32
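Only label-based slicing includes the end point. Positional slicing follows the usual Python rule and excludes it. Here is a short sketch that rebuilds the same `srs` so it runs on its own and contrasts the two. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

srs = pd.Series(np.arange(5),
                index=["alpha", "beta", "gamma", "delta", "epsilon"])

# Positional slicing: the stop position (3) is excluded, like a Python list
by_position = srs.iloc[1:3]          # beta, gamma

# Label slicing: the stop label ("delta") is included
by_label = srs.loc["beta":"delta"]   # beta, gamma, delta

print(list(by_position.index))
print(list(by_label.index))
```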

In [7]:
srs[srs > 3]    # Select elements of srs greater than 3

epsilon    4
dtype: int32

In [8]:
srs > 3    # A look at the indexing object

alpha      False
beta       False
gamma      False
delta      False
epsilon     True
dtype: bool
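Boolean masks like this one can be combined with `&` (and), `|` (or) and `~` (not). The sketch below uses the same `srs`, and the variable names are illustrative. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==` in Python.

```python
import numpy as np
import pandas as pd

srs = pd.Series(np.arange(5),
                index=["alpha", "beta", "gamma", "delta", "epsilon"])

# Positive AND even: each comparison is wrapped in parentheses
even_positive = srs[(srs > 0) & (srs % 2 == 0)]   # gamma, epsilon

# Smaller than 1 OR greater than 3
ends = srs[(srs < 1) | (srs > 3)]                 # alpha, epsilon

# NOT greater than 3: ~ inverts the mask
not_big = srs[~(srs > 3)]                         # alpha through delta

print(even_positive)
print(ends)
print(not_big)
```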

Consider the following `Series`. Notice that its index contains the integers 0 through 4, but in a shuffled order.

In [9]:
srs2 = Series(["zero", "one", "two", "three", "four"],
              index=[3, 2, 4, 0, 1])

srs2

3     zero
2      one
4      two
0    three
1     four
dtype: object

What will the following do?

In [10]:
srs2[2:4]    # Ambiguous: by position or by label? Here pandas slices by position

4      two
0    three
dtype: object

In [11]:
srs2.iloc[2:4]    # Always positional: positions 2 and 3

4      two
0    three
dtype: object

In [12]:
srs2.loc[2:4]    # Always label-based: from label 2 through label 4, inclusive

2    one
4    two
dtype: object
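The ambiguity goes further than slices. With an integer index, a *scalar* key in `[]` is looked up by label, even though the integer slice above was taken by position. A minimal sketch using the same `srs2`:

```python
import pandas as pd

srs2 = pd.Series(["zero", "one", "two", "three", "four"],
                 index=[3, 2, 4, 0, 1])

# With an integer index, a scalar key in [] is treated as a LABEL...
print(srs2[2])         # 'one'  (the element whose label is 2)
# ...while .iloc always means position and .loc always means label
print(srs2.iloc[2])    # 'two'  (the third element)
print(srs2.loc[2])     # 'one'
```

Writing `.iloc` or `.loc` explicitly whenever the index holds integers avoids relying on either rule.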

Now let's work with `DataFrame`s.

In [13]:
df = DataFrame(np.arange(21).reshape(7, 3),
               columns=['AAA', 'BBB', 'CCC'],
               index=["alpha", "beta", "gamma", "delta",
                      "epsilon", "zeta", "eta"])
df

         AAA  BBB  CCC
alpha      0    1    2
beta       3    4    5
gamma      6    7    8
delta      9   10   11
epsilon   12   13   14
zeta      15   16   17
eta       18   19   20


In [14]:
df.AAA

alpha       0
beta        3
gamma       6
delta       9
epsilon    12
zeta       15
eta        18
Name: AAA, dtype: int32

In [None]:
df['AAA']    # Same column as df.AAA

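Attribute access such as `df.AAA` is a convenience that only works for some column names. Brackets always work. The sketch below uses a hypothetical `frame` whose made-up columns `"my col"` and `"count"` show the two common failure cases: a name containing a space, and a name that collides with a `DataFrame` method.

```python
import pandas as pd

# Hypothetical columns chosen to show where attribute access breaks down
frame = pd.DataFrame({"AAA": [1, 2], "my col": [3, 4], "count": [5, 6]})

print(frame.AAA.equals(frame["AAA"]))   # True: both return the same column

# frame.my col would be a SyntaxError; names with spaces need brackets
print(frame["my col"].tolist())          # [3, 4]

# frame.count is the DataFrame.count method, not the "count" column
print(callable(frame.count))             # True
print(frame["count"].tolist())           # [5, 6]
```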
In [15]:
df[['BBB', 'CCC']]

         BBB  CCC
alpha      1    2
beta       4    5
gamma      7    8
delta     10   11
epsilon   13   14
zeta      16   17
eta       19   20


In [16]:
df.iloc[1:3, 1:2]    # Rows at positions 1-2, column at position 1 (stops excluded)

       BBB
beta     4
gamma    7


In [17]:
df.loc['beta':'delta', 'BBB':'CCC']    # Label slices include both end points

       BBB  CCC
beta     4    5
gamma    7    8
delta   10   11

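`.loc` also accepts a boolean `Series` as the row selector. This combines the filtering from the `Series` examples with column selection. Here is a sketch on a rebuilt copy of `df`; the name `subset` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(21).reshape(7, 3),
                  columns=["AAA", "BBB", "CCC"],
                  index=["alpha", "beta", "gamma", "delta",
                         "epsilon", "zeta", "eta"])

# Rows where AAA exceeds 9, restricted to the columns AAA and CCC
subset = df.loc[df["AAA"] > 9, ["AAA", "CCC"]]
print(subset)
```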

In [18]:
df.iloc[:, 1:3]

         BBB  CCC
alpha      1    2
beta       4    5
gamma      7    8
delta     10   11
epsilon   13   14
zeta      16   17
eta       19   20


In [19]:
df.iloc[:, 1:3].loc[['alpha', 'gamma', 'zeta']]    # Mixing: columns by position, then rows by label

       BBB  CCC
alpha    1    2
gamma    7    8
zeta    16   17

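Chaining `.iloc` and `.loc` works, but the same subset can also be requested in one call by translating one axis into the other's terms. One possible sketch turns the positional column slice into labels with `df.columns[1:3]` and then passes everything to `.loc`. The names `cols` and `single_step` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(21).reshape(7, 3),
                  columns=["AAA", "BBB", "CCC"],
                  index=["alpha", "beta", "gamma", "delta",
                         "epsilon", "zeta", "eta"])

# Column labels at positions 1 and 2 ('BBB', 'CCC'), then one label-based lookup
cols = df.columns[1:3]
single_step = df.loc[["alpha", "gamma", "zeta"], cols]
print(single_step)
```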

In [20]:
df2 = df.iloc[:, 1:3].loc[['alpha', 'gamma', 'zeta']].copy()    # Independent copy of the subset

df2

       BBB  CCC
alpha    1    2
gamma    7    8
zeta    16   17

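Calling `.copy()` makes `df2` an independent object, so later assignments to it cannot reach back into `df`. It also avoids the `SettingWithCopyWarning` that some pandas versions may raise after chained selections. A quick sketch that checks this on rebuilt objects:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(21).reshape(7, 3),
                  columns=["AAA", "BBB", "CCC"],
                  index=["alpha", "beta", "gamma", "delta",
                         "epsilon", "zeta", "eta"])
df2 = df.iloc[:, 1:3].loc[["alpha", "gamma", "zeta"]].copy()

df2.loc["alpha", "BBB"] = 100   # modify only the copy
print(df2.loc["alpha", "BBB"])  # 100
print(df.loc["alpha", "BBB"])   # 1: the original DataFrame is unchanged
```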

Let's now look at changing the contents of `DataFrame`s.

In [21]:
df2['CCC'] = Series({'alpha': 11, 'gamma': 18, 'zeta': 5})    # Matched to rows by index label

df2

       BBB  CCC
alpha    1   11
gamma    7   18
zeta    16    5

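Assigning a `Series` to a column aligns on the index, not on order. Rows with no matching label get `NaN`, and labels that are not in the `DataFrame` are dropped. In the sketch below, the label `"omega"` and its value are hypothetical, chosen only to show the dropped case.

```python
import pandas as pd

df2 = pd.DataFrame({"BBB": [1, 7, 16], "CCC": [2, 8, 17]},
                   index=["alpha", "gamma", "zeta"])

# Out of order on purpose; 'gamma' is missing and hypothetical 'omega' is extra
df2["CCC"] = pd.Series({"zeta": 5, "alpha": 11, "omega": 99})
print(df2)   # gamma's CCC becomes NaN, so the column turns float
```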

In [22]:
df2.iloc[1, 1] = 2
df2

       BBB  CCC
alpha    1   11
gamma    7    2
zeta    16    5

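For a single cell, pandas also offers `.iat` (by position) and `.at` (by label), which are scalar-only counterparts of `.iloc` and `.loc`. Here is a sketch on a `df2` rebuilt with the values from the previous step:

```python
import pandas as pd

df2 = pd.DataFrame({"BBB": [1, 7, 16], "CCC": [11, 18, 5]},
                   index=["alpha", "gamma", "zeta"])

df2.iat[1, 1] = 2           # positional, same cell as df2.iloc[1, 1] = 2
df2.at["zeta", "BBB"] = 0   # label-based, same cell as df2.loc["zeta", "BBB"]
print(df2)
```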

In [23]:
df2.iloc[:, 1] = 0    # Broadcast 0 to every row of column CCC
df2

       BBB  CCC
alpha    1    0
gamma    7    0
zeta    16    0

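Assignment through `.loc` can also target only the rows that satisfy a condition, by combining a boolean mask with a column label. A sketch on a rebuilt `df2` that sets `CCC` only where `BBB` exceeds 5:

```python
import pandas as pd

df2 = pd.DataFrame({"BBB": [1, 7, 16], "CCC": [0, 0, 0]},
                   index=["alpha", "gamma", "zeta"])

# Rows selected by the mask, column selected by label
df2.loc[df2["BBB"] > 5, "CCC"] = -1
print(df2)   # alpha keeps 0; gamma and zeta become -1
```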