<img src="images/Project_logos.png" width="500" height="300" align="center">

## Indexing and selecting data

In [None]:
# Set up an example DataFrame

import pandas as pd
import numpy as np

index_array = np.array(['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5', 'Row 6'])
column_list = ['A', 'B', 'C', 'D']
df = pd.DataFrame(np.random.randn(6, 4), index=index_array, columns=column_list)

### Selecting columns

You can select a single column by name, using either bracket or attribute notation, and several columns by passing a list of names:

In [None]:
df['A']

In [None]:
df.A

In [None]:
df[['B', 'C']]

In [None]:
df.loc[:, ["A", "B"]]

# loc selects rows and columns together, so the leading : is needed to select all rows
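The difference between the bracket forms above is easiest to see in the type of the result. The following is a minimal sketch, assuming a small hand-built DataFrame called `demo` (a hypothetical stand-in for `df`, with fixed values so the result is reproducible):

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with fixed values (not the random df above)
demo = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=['Row 1', 'Row 2', 'Row 3'],
    columns=['A', 'B', 'C', 'D'],
)

single = demo['A']               # a column name gives a Series
frame = demo[['A']]              # a list of names gives a DataFrame, even with one name
both = demo.loc[:, ['A', 'B']]   # same columns as demo[['A', 'B']]

print(type(single).__name__, type(frame).__name__)
print(both.equals(demo[['A', 'B']]))
```

Attribute access such as `demo.A` only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so bracket notation is the safer default.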

### Selecting rows

In [None]:
# Select based on row label
df.loc['Row 2']

In [None]:
# Select based on row position (positions start at 0, so this returns the third row, 'Row 3')
df.iloc[2]
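As a quick check of label-based versus position-based row selection, here is a small sketch that assumes a hypothetical frame `demo` with the same style of row labels as `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with fixed values
demo = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=['Row 1', 'Row 2', 'Row 3'],
    columns=['A', 'B', 'C', 'D'],
)

by_label = demo.loc['Row 2']   # the row whose label is 'Row 2'
by_position = demo.iloc[2]     # the row at position 2, i.e. the third row

# Each result is a Series whose name is the label of the selected row
print(by_label.name)
print(by_position.name)
```

Both calls return a `Series` indexed by the column names, so a single value can be read with, for example, `by_label['A']`.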

### Slicing

In [None]:
# Slice over rows based on position
df[1:3] # End index is not inclusive

In [None]:
# Slice over rows based on labels
df.loc['Row 2':'Row 4'] # End index is inclusive

In [None]:
# Slice over rows and columns based on position
df.iloc[3:5, 0:2]

In [None]:
# Select all rows for the 2nd-4th columns
df.iloc[:, 1:4]
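The different end-point rules for position and label slices are worth checking side by side. A minimal sketch, assuming a hypothetical six-row frame `demo` labelled like `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with fixed values
demo = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=[f'Row {i}' for i in range(1, 7)],
    columns=['A', 'B', 'C', 'D'],
)

by_position = demo.iloc[1:3]           # positions 1 and 2; the end position is excluded
by_label = demo.loc['Row 2':'Row 4']   # 'Row 2' through 'Row 4'; the end label is included

print(list(by_position.index))
print(list(by_label.index))
```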

### Selecting by criteria

You can select rows based on whether they meet a condition.

In [None]:
df[df["C"] > 0]

In [None]:
df2 = df.copy()

# You can add a column like this

df2["E"] = ["one", "one", "two", "three", "four", "three"]
df2[df2["E"].isin(["two", "four"])]
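Row selection by condition works because the comparison produces a boolean `Series`, and indexing with it keeps only the rows marked `True`. The sketch below assumes a hypothetical frame `demo`, and also shows two conditions combined with `&`, which is not used in the cells above:

```python
import pandas as pd

# Hypothetical example frame with fixed values
demo = pd.DataFrame({
    'C': [1.5, -0.2, 0.0, 3.1],
    'E': ['one', 'two', 'two', 'four'],
})

mask = demo['C'] > 0      # boolean Series, one entry per row
positive = demo[mask]     # keeps only the rows where mask is True

# Combine conditions with & (and) or | (or); wrap each comparison in parentheses
combined = demo[(demo['C'] > 0) & demo['E'].isin(['two', 'four'])]

print(mask.tolist())
print(list(positive.index))
print(list(combined.index))
```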

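You can also select individual values in a DataFrame based on whether they meet a condition. The result keeps the original shape, and values that do not meet the condition are replaced with `NaN`.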

In [None]:
df[df < 0]
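To see what happens to the values that fail the condition, here is a small sketch on a hypothetical two-by-two frame `demo`. It also compares the result with `DataFrame.where`, which gives the same output:

```python
import pandas as pd

# Hypothetical example frame with fixed values
demo = pd.DataFrame({'A': [1.0, -2.0], 'B': [-3.0, 4.0]})

negatives = demo[demo < 0]   # same shape as demo; non-negative entries become NaN

print(negatives)
print(negatives.equals(demo.where(demo < 0)))
```

Because the shape is preserved, this form is useful for masking values rather than for dropping rows.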

### Exercise
From DataFrame `df`:<br>
a) Select column C<br>
b) Select the 4th and 5th rows<br>
c) Select rows where column B is less than 0<br>
d) Select values that are more than 0.5


In [None]:
# Space to complete the exercise

### Identifying smallest or largest values
We can find the index of the smallest value using `idxmin` and the index of the largest value using `idxmax`.

In [None]:
# Find index of the smallest value in each column
df.idxmin()

In [None]:
# Find the column label of the largest value in each row
df.idxmax(axis=1)
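Since `df` holds random numbers, the output of the two cells above changes on every run. The following sketch uses a hypothetical frame `demo` with fixed values so the returned labels can be checked by eye:

```python
import pandas as pd

# Hypothetical example frame with fixed values
demo = pd.DataFrame(
    {'A': [3, 1, 2], 'B': [0, 5, 4]},
    index=['Row 1', 'Row 2', 'Row 3'],
)

smallest_per_column = demo.idxmin()       # row label of the minimum in each column
largest_per_row = demo.idxmax(axis=1)     # column label of the maximum in each row

print(smallest_per_column.to_dict())  # {'A': 'Row 2', 'B': 'Row 1'}
print(largest_per_row.to_dict())      # {'Row 1': 'A', 'Row 2': 'B', 'Row 3': 'B'}
```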

We can extract the rows with the n smallest or largest values in a `Series` or specified column of a `DataFrame`.

In [None]:
# Find the rows with the 3 smallest values in column B
df.nsmallest(3, 'B')

In [None]:
# Find the rows with the 4 largest values in column C.
# keep='all' ensures that if the 5th largest value is equal to the 4th largest value, that row will also be included.
df.nlargest(4, 'C', keep='all')
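Ties are unlikely with random data, so the effect of `keep='all'` is easier to see on a hypothetical frame `demo` that contains a deliberate tie. A minimal sketch using `nlargest`:

```python
import pandas as pd

# Hypothetical example frame with a tie for second place in column C
demo = pd.DataFrame(
    {'C': [5, 3, 3, 1]},
    index=['Row 1', 'Row 2', 'Row 3', 'Row 4'],
)

default_keep = demo.nlargest(2, 'C')            # keep='first': exactly 2 rows
keep_all = demo.nlargest(2, 'C', keep='all')    # tied rows are all kept, so 3 rows

print(list(default_keep.index))  # ['Row 1', 'Row 2']
print(list(keep_all.index))      # ['Row 1', 'Row 2', 'Row 3']
```

With `keep='all'` the result can contain more than `n` rows, which is worth remembering if later code assumes a fixed length.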

Most DataFrame operations do not modify the DataFrame in place. They return a new object, so the result must be assigned to a variable in order to be used.
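As an illustration, the sketch below uses `sort_values` on a hypothetical frame `demo`. The method is chosen only as an example of an operation that returns a new object:

```python
import pandas as pd

# Hypothetical example frame with fixed values
demo = pd.DataFrame({'A': [3, 1, 2]})

demo.sort_values('A')                 # result is discarded; demo is unchanged
sorted_demo = demo.sort_values('A')   # assign the result to keep it

print(demo['A'].tolist())         # [3, 1, 2]
print(sorted_demo['A'].tolist())  # [1, 2, 3]
```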