In [1]:
import pandas as pd

## Selection
We often want to select specific columns or rows from the DataFrame for processing. The selection can be done based on column or row label, or based on a positional index (number). 

Let's make an example DataFrame to experiment with:

In [2]:
example_df = pd.DataFrame({"A":[1.1,2.1,3.1],"B":[12,22,32]},
                    index=['obj1','obj2','obj3']) #example DataFrame
print(example_df)

        A   B
obj1  1.1  12
obj2  2.1  22
obj3  3.1  32


The columns of `example_df` can be accessed like dictionary fields: 

In [3]:
example_df["A"]

obj1    1.1
obj2    2.1
obj3    3.1
Name: A, dtype: float64

Note that the row labels are maintained when selecting a column. The output behaves like a one-dimensional DataFrame, but its actual Pandas class is `Series`.

You can also select multiple columns at once:

In [4]:
example_df[["B","A"]]

       B    A
obj1  12  1.1
obj2  22  2.1
obj3  32  3.1
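
To make the difference between the two selection styles concrete, here is a minimal sketch that rebuilds the example DataFrame and inspects the returned types. The variable names `single`, `as_frame` and `reordered` are only illustrative.

```python
import pandas as pd

example_df = pd.DataFrame({"A": [1.1, 2.1, 3.1], "B": [12, 22, 32]},
                          index=["obj1", "obj2", "obj3"])

single = example_df["A"]            # a string key returns a Series
as_frame = example_df[["A"]]        # a one-element list returns a DataFrame
reordered = example_df[["B", "A"]]  # columns come back in the requested order

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(list(reordered.columns))  # ['B', 'A']
```

A list of column names always yields a DataFrame, even when the list holds a single name.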


The rows can be selected by slicing, as we've done with lists: 

In [5]:
example_df[0:2] # selects rows 0 and 1

        A   B
obj1  1.1  12
obj2  2.1  22


Note that this selection is based on position and thus the row labels are ignored. 
Unlike with lists, however, using a single integer to select a single row is **not** allowed: Pandas interprets the integer as a column label and raises a `KeyError`.

In [6]:
example_df[2] # not allowed, use instead: example_df[2:3]

KeyError: 2
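
The failing call and two working alternatives can be sketched as follows. The flag `raised` and the other variable names are illustrative, not part of the notebook.

```python
import pandas as pd

example_df = pd.DataFrame({"A": [1.1, 2.1, 3.1], "B": [12, 22, 32]},
                          index=["obj1", "obj2", "obj3"])

try:
    example_df[2]   # treated as a column label, and no column is called 2
    raised = False
except KeyError:
    raised = True

third_row_frame = example_df[2:3]      # slice: a one-row DataFrame
third_row_series = example_df.iloc[2]  # positional lookup: a Series

print(raised)                 # True
print(third_row_frame.shape)  # (1, 2)
```

The slice keeps the two-dimensional shape, while `iloc[2]` returns the row as a one-dimensional `Series`.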

Pandas recommends using the `loc` property of the DataFrame for label-based selection and the `iloc` property for position-based selection. 
With these properties you can specify the desired row(s) and, optionally, the desired column(s):

In [7]:
example_df.loc['obj3']

A     3.1
B    32.0
Name: obj3, dtype: float64
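
The value `32.0` for `B` may look surprising, since column `B` holds integers. A single row comes back as a `Series`, which has one dtype for all its values, so the mix of floats and integers is upcast to `float64`. A small sketch of this behaviour, with illustrative variable names:

```python
import pandas as pd

example_df = pd.DataFrame({"A": [1.1, 2.1, 3.1], "B": [12, 22, 32]},
                          index=["obj1", "obj2", "obj3"])

row = example_df.loc["obj3"]        # whole row: one Series, one dtype
cell = example_df.loc["obj3", "B"]  # single cell: keeps the column's dtype

print(example_df.dtypes)  # A is a float column, B an integer column
print(row.dtype)          # float64
print(cell)               # 32
```

Selecting the cell directly with a row and column label avoids the conversion.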

Note that `loc` always matches on the row labels (here strings), never on positions. 

You can also slice with labels. However, note that in contrast to slicing with positions, the final element is included in the result:

In [8]:
example_df.loc[:'obj2'] # slice until obj2

        A   B
obj1  1.1  12
obj2  2.1  22
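
The difference in end-point handling can be checked side by side. This is a minimal sketch; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

example_df = pd.DataFrame({"A": [1.1, 2.1, 3.1], "B": [12, 22, 32]},
                          index=["obj1", "obj2", "obj3"])

by_label = example_df.loc["obj1":"obj2"]  # label slice: obj2 is included
by_position = example_df[0:1]             # position slice: position 1 is excluded

print(list(by_label.index))     # ['obj1', 'obj2']
print(list(by_position.index))  # ['obj1']
```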


With `loc` you can also select rows and columns at the same time:

In [9]:
example_df.loc[:'obj2', 'B'] # slice until obj2, take column B

obj1    12
obj2    22
Name: B, dtype: int64
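
Besides single labels and slices, `loc` also accepts lists of labels for rows, columns, or both. The sketch below goes slightly beyond the cell above and uses illustrative variable names.

```python
import pandas as pd

example_df = pd.DataFrame({"A": [1.1, 2.1, 3.1], "B": [12, 22, 32]},
                          index=["obj1", "obj2", "obj3"])

# A list of row labels with a single column label gives a Series.
column = example_df.loc[["obj1", "obj3"], "B"]

# Lists for both rows and columns give a DataFrame.
subset = example_df.loc[["obj1", "obj3"], ["B"]]

print(list(column.index))  # ['obj1', 'obj3']
print(subset.shape)        # (2, 1)
```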

The `iloc` property works the same way as `loc`, except that it takes integer positions instead of labels and, as with list slicing, excludes the end point of a slice.

In [10]:
example_df.iloc[2] # take third row

A     3.1
B    32.0
Name: obj3, dtype: float64

In [11]:
example_df.iloc[:2] # take first two rows

        A   B
obj1  1.1  12
obj2  2.1  22


In [12]:
example_df.iloc[:2,1] # take first two rows and second column

obj1    12
obj2    22
Name: B, dtype: int64
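
Each of the three `iloc` calls above has a `loc` counterpart from earlier in this section. A minimal sketch that checks the pairs return the same data, using illustrative variable names:

```python
import pandas as pd

example_df = pd.DataFrame({"A": [1.1, 2.1, 3.1], "B": [12, 22, 32]},
                          index=["obj1", "obj2", "obj3"])

# Position-based call on the left, label-based equivalent on the right.
same_row = example_df.iloc[2].equals(example_df.loc["obj3"])
same_rows = example_df.iloc[:2].equals(example_df.loc[:"obj2"])
same_cells = example_df.iloc[:2, 1].equals(example_df.loc[:"obj2", "B"])

print(same_row, same_rows, same_cells)  # True True True
```

Note that the position slice `:2` stops before position 2, while the label slice `:"obj2"` includes `obj2`, so both select the first two rows.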

You can also select rows and columns by combining selections:

In [13]:
example_df["A"][1:3] # select column A, rows 2 and 3

obj2    2.1
obj3    3.1
Name: A, dtype: float64

In [14]:
example_df.iloc[0]["B"] # select row 1, column B

12.0

In [15]:
example_df.loc["obj2":].iloc[:, 0]  # select every row from obj2 onward, and the first column

obj2    2.1
obj3    3.1
Name: A, dtype: float64
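
Combined selections can usually be written as a single `loc` or `iloc` call as well. The sketch below compares both forms. It also shows why `example_df.iloc[0]["B"]` returned `12.0`: the intermediate row is a `float64` `Series`. Variable names are illustrative.

```python
import pandas as pd

example_df = pd.DataFrame({"A": [1.1, 2.1, 3.1], "B": [12, 22, 32]},
                          index=["obj1", "obj2", "obj3"])

chained = example_df.iloc[0]["B"]  # row first (upcast to float), then column
direct = example_df.iloc[0, 1]     # row and column in one call

combined = example_df.loc["obj2":].iloc[:, 0]
single_call = example_df.loc["obj2":, "A"]

print(chained)                       # 12.0
print(direct)                        # 12
print(combined.equals(single_call))  # True
```

A single call avoids the intermediate object and keeps the original dtype of the column.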

We've now seen many ways of indexing. Let's review them with a few questions:

In [16]:
print(example_df)

        A   B
obj1  1.1  12
obj2  2.1  22
obj3  3.1  32


##### Which code **does not** return column B?

In [17]:
%%mc
example_df["B"]
example_df.iloc[:,1]
example_df.loc["B"]
example_df.loc[:,"B"]


In [18]:
%%check 
hashresult == 3487754587



##### Which **does not** return the first row?

In [19]:
%%mc
example_df[0]
example_df[:1]
example_df.iloc[:1]
example_df.loc["obj1"]


In [20]:
%%check 
hashresult == 783618061



##### Which command(s) return the value `3.1`?

In [21]:
%%mmc
example_df[2:3]["A"]
example_df["A"][2:3]
example_df.iloc[2,0]
example_df.loc["obj3",'A']


In [22]:
%%check 
hashresult == 121948600



##### How many cells are returned by `example_df["B"][:2]`?

In [23]:
%%slider 
1 6


In [24]:
%%check 
hashresult==3306475475

