# 9. DataFrame Selection, covering `[]`, `.loc[]`, `.iloc[]`, `.at[]`, and `.iat[]`.

-----

This is the *most important* fundamental skill in Pandas. Pandas offers several tools for selecting data from a DataFrame, and using the right one is key to writing clean, fast, and bug-free code.

Think of your DataFrame as a spreadsheet.

  * **`[]` (Square Brackets):** This is a "convenience" tool. It's great for one simple task: grabbing **columns** by name (e.g., `df['Age']`). It's *terrible* and *ambiguous* for almost everything else.
  * **`.loc[]`:** This is your "by **L**abel" selector. You use it when you want to select data based on the *names* of the rows (the index) and columns. This is the most common and intuitive method.
  * **`.iloc[]`:** This is your "by **I**nteger **Loc**ation" selector. You use it when you want to select data based on its *position* (row 0, row 1, column 0, etc.), just like a Python list or NumPy array.
  * **`.at[]` / `.iat[]`:** These are high-speed, "turbo-charged" versions of `.loc` and `.iloc` that *only* get or set a *single cell*.

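**How It Works in Memory**: When you use `.loc['Row_A']`, Pandas looks the label up in the index (which works like a high-speed dictionary or hash map) to find the row's position, then reads that position from the underlying arrays. When you use `.iloc[0]`, Pandas skips the lookup and goes directly to the 0th position. Because `.loc` and `.iloc` are explicit, there is never any doubt about what is being selected. The `[]` operator has to infer what you mean from the type of its argument (a column label, a list of labels, a slice, or a boolean mask), which makes its behavior harder to predict. For plain column access such as `df['Age']`, it is not inherently slower.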
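
As a rough illustration of the label-to-position idea above, the sketch below uses `Index.get_loc` to translate a label into an integer position and then reaches the same cell through `.loc` and `.iloc`. The small `df` is made up for this example only.

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration
df = pd.DataFrame({'Age': [25, 30, 22]}, index=['a', 'b', 'c'])

pos = df.index.get_loc('b')      # index lookup: label 'b' -> integer position
by_label = df.loc['b', 'Age']    # label-based access
by_pos = df.iloc[pos, 0]         # position-based access to the same cell

print(pos, by_label, by_pos)
```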

**When to Use This**:

  * Use `df['ColumnName']` to quickly select one column.
  * Use `df[['Col1', 'Col2']]` to select multiple columns.
  * Use **`.loc`** for *all other selections* where you are using **labels** (e.g., `df.loc['Row_A']`, `df.loc[df['Age'] > 25]`).
  * Use **`.iloc`** when you need to select by **position** (e.g., "get the first 5 rows and first 2 columns": `df.iloc[0:5, 0:2]`).
  * Use **`.at`** and **`.iat`** when you need to get or set a *single value* as fast as possible.

-----

### 0\. Syntax & Parameters

The main accessors are `.loc` (for labels) and `.iloc` (for integer position).

```python
# 1. Square Brackets []
dataframe['col_name']                # Selects one column (returns a Series)
dataframe[['col1', 'col2']]          # Selects multiple columns (returns a DataFrame)
dataframe[start:end]                 # Slices ROWS by position or label (ambiguous!)

# 2. .loc[row_indexer, column_indexer]
# Both indexers use LABELS
dataframe.loc[row_label, col_label]
# row_indexer/column_indexer can be:
# - A single label: 'Row_A'
# - A list of labels: ['Row_A', 'Row_C']
# - A slice of labels: 'Row_A':'Row_D' (this slice is INCLUSIVE)
# - A boolean mask: df['Age'] > 30

# 3. .iloc[row_indexer, column_indexer]
# Both indexers use integer POSITIONS
dataframe.iloc[row_pos, col_pos]
# row_indexer/column_indexer can be:
# - A single integer: 0
# - A list of integers: [0, 2]
# - A slice of integers: 0:5 (this slice is EXCLUSIVE, just like Python)

# 4. .at[row_label, col_label]
# Optimized .loc for a SINGLE cell
dataframe.at[row_label, col_label]

# 5. .iat[row_pos, col_pos]
# Optimized .iloc for a SINGLE cell
dataframe.iat[row_pos, col_pos]
```

-----

### 1\. Basic Example

We'll use a sample DataFrame. Note the *labels* of the index (`'a'`, `'b'`, `'c'`) vs. the *positions* (`0`, `1`, `2`).

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Clara'],
     'Age': [25, 30, 22],
     'City': ['New York', 'Los Angeles', 'Chicago']},
    index=['a', 'b', 'c']
)
print("--- Original DataFrame ---")
print(df)

# --- Basic `[]` selection ---

# Example 1: Select a single column
col_age = df['Age']
print("\n--- Example 1: df['Age'] (Selects a Series) ---")
print(col_age)

# Example 2: Select multiple columns
cols_name_city = df[['Name', 'City']]
print("\n--- Example 2: df[['Name', 'City']] (Selects a DataFrame) ---")
print(cols_name_city)

# --- Basic .loc and .iloc ---

# Example 3: Select a single row by LABEL
row_a = df.loc['a']
print("\n--- Example 3: df.loc['a'] (Selects a Series) ---")
print(row_a)

# Example 4: Select a single row by POSITION
row_0 = df.iloc[0]
print("\n--- Example 4: df.iloc[0] (Selects a Series) ---")
print(row_0)
```

**Output:**

```
--- Original DataFrame ---
    Name  Age         City
a  Alice   25     New York
b    Bob   30  Los Angeles
c  Clara   22      Chicago

--- Example 1: df['Age'] (Selects a Series) ---
a    25
b    30
c    22
Name: Age, dtype: int64

--- Example 2: df[['Name', 'City']] (Selects a DataFrame) ---
    Name         City
a  Alice     New York
b    Bob  Los Angeles
c  Clara      Chicago

--- Example 3: df.loc['a'] (Selects a Series) ---
Name       Alice
Age           25
City    New York
Name: a, dtype: object

--- Example 4: df.iloc[0] (Selects a Series) ---
Name       Alice
Age           25
City    New York
Name: a, dtype: object
```

**Explanation:**

  * `[]` with a single string `df['Age']` selects that column.
  * `[]` with a *list of strings* `df[['Name', 'City']]` selects those columns.
  * `.loc['a']` finds the row with **label** `'a'`.
  * `.iloc[0]` finds the row at **position** `0` (which happens to be 'a').

-----

### 2\. Intermediate Example

Now we use the full `[row, column]` syntax for `.loc` and `.iloc`.

```python
# --- Select a single cell ---

# Example 5: Select cell by LABEL
cell_loc = df.loc['b', 'Name']
print("\n--- Example 5: df.loc['b', 'Name'] ---")
print(cell_loc)

# Example 6: Select cell by POSITION
cell_iloc = df.iloc[1, 0] # Row 1, Column 0
print("\n--- Example 6: df.iloc[1, 0] ---")
print(cell_iloc)

# --- Select a subset using lists ---

# Example 7: Select rows 'a', 'c' and columns 'Name', 'Age' by LABEL
subset_loc = df.loc[['a', 'c'], ['Name', 'Age']]
print("\n--- Example 7: .loc[['a', 'c'], ['Name', 'Age']] ---")
print(subset_loc)

# Example 8: Select rows 0, 2 and columns 0, 1 by POSITION
subset_iloc = df.iloc[[0, 2], [0, 1]]
print("\n--- Example 8: .iloc[[0, 2], [0, 1]] ---")
print(subset_iloc)
```

**Output:**

```
--- Example 5: df.loc['b', 'Name'] ---
Bob

--- Example 6: df.iloc[1, 0] ---
Bob

--- Example 7: .loc[['a', 'c'], ['Name', 'Age']] ---
    Name  Age
a  Alice   25
c  Clara   22

--- Example 8: .iloc[[0, 2], [0, 1]] ---
    Name  Age
a  Alice   25
c  Clara   22
```

**Explanation:**
Notice that `df.loc['b', 'Name']` and `df.iloc[1, 0]` both returned `'Bob'`. This shows the two different ways to "point" to the same cell.

-----

### 3\. Advanced or Tricky Case

Slicing is where `.loc` and `.iloc` are very different.

```python
# --- Slicing ---

# Example 9: Slicing with .loc (by LABEL)
# Note: Slicing with .loc is INCLUSIVE of the end label!
slice_loc = df.loc['a':'b', 'Name':'Age']
print("\n--- Example 9: .loc['a':'b', 'Name':'Age'] (INCLUSIVE) ---")
print(slice_loc)

# Example 10: Slicing with .iloc (by POSITION)
# Note: Slicing with .iloc is EXCLUSIVE of the end position (like Python)
slice_iloc = df.iloc[0:2, 0:2] # Rows 0, 1 and Cols 0, 1
print("\n--- Example 10: .iloc[0:2, 0:2] (EXCLUSIVE) ---")
print(slice_iloc)

# --- Using .at and .iat ---

# Example 11: Fast cell access by LABEL
cell_at = df.at['b', 'Name']
print("\n--- Example 11: df.at['b', 'Name'] ---")
print(cell_at)

# Example 12: Fast cell access by POSITION
cell_iat = df.iat[1, 0]
print("\n--- Example 12: df.iat[1, 0] ---")
print(cell_iat)

# --- The "Ambiguous" `[]` slice ---

# Example 13: Slicing with `[]`
# This is confusing! `[]` can also slice ROWS.
slice_bracket = df['a':'c'] # This slices by LABEL (inclusive)
print("\n--- Example 13: df['a':'c'] (Ambiguous row slice) ---")
print(slice_bracket)
```

**Output:**

```
--- Example 9: .loc['a':'b', 'Name':'Age'] (INCLUSIVE) ---
    Name  Age
a  Alice   25
b    Bob   30

--- Example 10: .iloc[0:2, 0:2] (EXCLUSIVE) ---
    Name  Age
a  Alice   25
b    Bob   30

--- Example 11: df.at['b', 'Name'] ---
Bob

--- Example 12: df.iat[1, 0] ---
Bob

--- Example 13: df['a':'c'] (Ambiguous row slice) ---
    Name  Age         City
a  Alice   25     New York
b    Bob   30  Los Angeles
c  Clara   22      Chicago
```

**Explanation:**

  * **`.loc['a':'b']` included `'b'`**.
  * **`.iloc[0:2]` excluded `2`** (it only got 0 and 1).
  * Example 13 shows why `[]` is confusing. `df['Age']` selects a column, but `df['a':'c']` selects *rows*. This ambiguity is why you should **avoid** `[]` for row selection and use `.loc` or `.iloc`.
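
To make the ambiguity concrete, the sketch below slices the same frame with `[]` twice: once with string bounds and once with integer bounds. On this string-labelled index, the string slice behaves like `.loc` (end included) and the integer slice behaves like `.iloc` (end excluded). The frame is a cut-down version of the sample data.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 22]}, index=['a', 'b', 'c'])

label_rows = list(df['a':'b'].index)    # string slice: by label, end included
pos_rows = list(df[0:1].index)          # integer slice: by position, end excluded

loc_rows = list(df.loc['a':'b'].index)  # explicit label slice
iloc_rows = list(df.iloc[0:1].index)    # explicit position slice

print(label_rows, loc_rows)
print(pos_rows, iloc_rows)
```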

-----

### 4\. Real-World Use Case

The most important use of `.loc` is for **setting values** based on a condition (Boolean Indexing, which is the next topic).

**Example 14: Setting a single value**

```python
print("\n--- Example 14: Setting a value ---")
print("Before:\n", df)

# Use .loc to set Alice's age
df.loc['a', 'Age'] = 26
print("\nAfter:\n", df)
```

**Output:**

```
--- Example 14: Setting a value ---
Before:
     Name  Age         City
a  Alice   25     New York
b    Bob   30  Los Angeles
c  Clara   22      Chicago

After:
     Name  Age         City
a  Alice   26     New York
b    Bob   30  Los Angeles
c  Clara   22      Chicago
```

**Example 15: Creating a new column based on a condition (Teaser)**

```python
# This is called Boolean Indexing
print("\n--- Example 15: Setting with a condition ---")
df.loc[df['Age'] > 25, 'Generation'] = 'Older'
df.loc[df['Age'] <= 25, 'Generation'] = 'Younger'
print(df)
```

**Output:**

```
--- Example 15: Setting with a condition ---
    Name  Age         City Generation
a  Alice   26     New York      Older
b    Bob   30  Los Angeles      Older
c  Clara   22      Chicago    Younger
```

**Explanation:**
This is the power of `.loc`. We used `df.loc[row_indexer, col_indexer] = value`. The `row_indexer` was a boolean mask (`df['Age'] > 25`), the `col_indexer` was the new column name, and we set the value.
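
It can help to look at the mask on its own and at what happens between the two assignments. In the sketch below (which rebuilds the sample data with Alice's updated age), the first `.loc` assignment creates the column and leaves a missing value in the rows the mask did not match; the second assignment fills them in.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Clara'], 'Age': [26, 30, 22]},
    index=['a', 'b', 'c']
)

mask = df['Age'] > 25                    # boolean Series aligned to df's index
print(mask.tolist())

df.loc[mask, 'Generation'] = 'Older'     # creates the column; unmatched rows are missing
missing_after_first = df['Generation'].isna().tolist()
print(missing_after_first)

df.loc[~mask, 'Generation'] = 'Younger'  # ~mask inverts the condition
print(df['Generation'].tolist())
```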

-----

### 5\. Common Mistakes / Pitfalls

**Mistake 16: The `SettingWithCopyWarning` (The \#1 Mistake in Pandas)**

```python
# Wrong code
print("\n--- Mistake 16: SettingWithCopyWarning ---")
df_slice = df[df['Age'] > 25]  # 1. Filter into a new object (here, a copy)

# 2. Try to modify the filtered result.
# This can emit SettingWithCopyWarning. It is a warning, not an exception,
# so a try/except block would not catch it.
df_slice['Status'] = 'Adult'

print("\nOriginal df is NOT changed:")
print(df)
```

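**Why it happens:** The selection and the assignment happen in two separate `[]` steps, which is "chained indexing" in disguise. Pandas can't always tell whether `df_slice` is a new copy or just a "view" of the original `df`, so it warns you that the change may not reach `df`. Here the boolean filter produced a copy, so only `df_slice` gets the new column.

**Example 17: Corrected code:**
Use `.loc` for *both* the row and column selection in a *single operation*.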

```python
# Correct code
df.loc[df['Age'] > 25, 'Status'] = 'Adult'
print("\n--- Example 17: Correct way to set values ---")
print(df)
```
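
If the goal is the opposite, a separate subset that you can modify without touching `df`, one common approach is to take an explicit `.copy()`. The sketch below assumes that is what you want and rebuilds a small version of the sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Clara'], 'Age': [26, 30, 22]},
    index=['a', 'b', 'c']
)

adults = df[df['Age'] > 25].copy()   # explicit, independent copy
adults['Status'] = 'Adult'           # modifies only the copy

print(adults)
print('Status' in df.columns)        # the original has no 'Status' column
```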

**Mistake 18: `df[0]` (Trying to select a row with `[]`)**

```python
# Wrong code
try:
    df[0]
except KeyError as e:
    print(f"\n--- Mistake 18: {e} ---")
```

**Error/Wrong Output:** `KeyError: 0`
**Why it happens:** `df[key]` tries to find a **column** named `key`. Since there's no column named `0`, it fails.
**Correction:** Use `df.iloc[0]` for the first row *by position* or `df.loc['a']` for the row *by label*.

**Mistake 19: Using `.loc` with integer positions**

```python
# Wrong code
try:
    df.loc[0, 'Name'] # Our index is ['a', 'b', 'c'], not [0, 1, 2]
except KeyError as e:
    print(f"\n--- Mistake 19: {e} ---")
```

**Error/Wrong Output:** `KeyError: 0`
**Why it happens:** `.loc` is for **L**abels. The *label* `0` does not exist in our index.
**Correction:** Use `df.iloc[0, 0]` or `df.loc['a', 'Name']`.
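
The mix-up is harder to spot when the index labels are themselves integers, because `.loc[0]` no longer raises an error; it simply returns the row *labelled* `0`. A small sketch with a hypothetical `df_int` whose integer labels are deliberately out of order:

```python
import pandas as pd

# Hypothetical frame: integer labels that do not match the row positions
df_int = pd.DataFrame({'Name': ['Alice', 'Bob', 'Clara']}, index=[2, 0, 1])

by_label = df_int.loc[0, 'Name']   # row whose LABEL is 0
by_pos = df_int.iloc[0, 0]         # row at POSITION 0

print(by_label, by_pos)
```

This situation comes up naturally after sorting or filtering a DataFrame that started with the default `0, 1, 2, ...` index.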

-----

### 6\. Key Terms (Explained Simply)

  * **Label-based Selection:** Selecting data using the *names* of the rows/columns (e.g., `'Age'`, `'a'`). Done with **`.loc`**.
  * **Position-based Selection:** Selecting data using the *integer position* (0, 1, 2...). Done with **`.iloc`**.
  * **Indexer:** The "thing" you put inside the brackets (e.g., a single label, a list, a slice).
  * **Slice:** A range of values (e.g., `0:5` or `'a':'d'`).
  * **Boolean Mask:** A Series/list of `True`/`False` values used for filtering (e.g., `df['Age'] > 25`).
  * **View vs. Copy:** A "view" is a window into the original data. A "copy" is a new, separate piece of data. Modifying a view changes the original; modifying a copy does not. This is a complex topic, but using `.loc` properly avoids the common `SettingWithCopyWarning`.

-----

### 7\. Best Practices

  * **Always be explicit.** Use `.loc` or `.iloc` for any row selection.
  * Use `[]` (`df['col']`) mainly for selecting columns. Row slicing with `[]` (e.g., `df[0:5]`) works, but it is ambiguous, so prefer `.loc` or `.iloc` for rows.
  * **Never use chained indexing to *set* values:**
      * **BAD:** `df[df['Age'] > 25]['Status'] = 'Adult'` (This assigns to a temporary copy, so `df` is left unchanged, and Pandas typically warns you).
      * **GOOD:** `df.loc[df['Age'] > 25, 'Status'] = 'Adult'`
  * **Know your index:** Before using `.loc`, check `df.index` to see what your labels are.

-----

### 8\. Mini Summary

  * **`[]`**: For selecting **columns** by name. `df['Col1']` or `df[['Col1', 'Col2']]`.
  * **`.loc[rows, cols]`**: For **L**abels. Slicing is **inclusive**. (e.g., `df.loc['a':'c']` gets a, b, *and* c).
  * **`.iloc[rows, cols]`**: For **I**nteger **Loc**ations. Slicing is **exclusive**. (e.g., `df.iloc[0:3]` gets 0, 1, 2).
  * **`.at` / `.iat`**: Fast, single-cell "get" or "set" only.
  * To avoid most selection errors, set values with a single `.loc` call.
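
The earlier examples only *read* with `.at` and `.iat`, so here is a brief sketch of the "set" side mentioned in the summary. It uses a trimmed copy of the sample data; the new ages are arbitrary values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Clara'], 'Age': [25, 30, 22]},
    index=['a', 'b', 'c']
)

df.at['a', 'Age'] = 26   # set one cell by row label and column label
df.iat[2, 1] = 23        # set one cell by position: row 2, column 1 ('Age')

print(df['Age'].tolist())
```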

-----

### 10\. Practice Tasks

**Data for Tasks:**

```python
df_practice = pd.DataFrame(
    {'Product': ['Apple', 'Banana', 'Carrot', 'Donut'],
     'Price': [0.5, 0.4, 0.2, 1.0],
     'Category': ['Fruit', 'Fruit', 'Veg', 'Bakery']},
    index=['p1', 'p2', 'p3', 'p4']
)
```

**Task 20 (Easy):**
Select the 'Product' and 'Price' columns from `df_practice` into a new DataFrame `df_easy`.

**Task 21 (Medium):**
Select the rows with labels 'p1' and 'p4', and the columns 'Product' and 'Category'. Do this in a single `.loc` command.

**Task 22 (Hard):**
Select the *first two rows* (`p1`, `p2`) and the *last column* (`Category`) using **only `.iloc`**.

**Bonus Task 23:**
Use `.loc` to *set* the 'Price' for 'Donut' (index 'p4') to `1.25`.

-----

### 11\. Recommended Next Topic

You've just seen a teaser for the most powerful part of `.loc`: using boolean masks. The next logical step is to dive deep into that.

**Recommended:** **Boolean indexing**

-----

### 12\. Quick Reference Card

| Selection Task | `[]` (Avoid for rows) | `.loc[]` (Labels) | `.iloc[]` (Integers) |
| :--- | :--- | :--- | :--- |
| **One Column** | `df['A']` | `df.loc[:, 'A']` | `df.iloc[:, 0]` |
| **Multi Columns**| `df[['A', 'B']]` | `df.loc[:, ['A', 'B']]` | `df.iloc[:, [0, 1]]` |
| **One Row** | `df['a':'a']` (Ugly) | `df.loc['a']` | `df.iloc[0]` |
| **Multi Rows** | `df['a':'c']` (Slice) | `df.loc[['a', 'c']]` | `df.iloc[[0, 2]]` |
| **Row Slice** | `df[0:2]` (Pos) | `df.loc['a':'c']` (Label) | `df.iloc[0:3]` (Pos) |
| **Row + Col** | (Don't) | `df.loc['a', 'A']` | `df.iloc[0, 0]` |
| **Row/Col Slice** | (Don't) | `df.loc['a':'c', 'A':'B']`| `df.iloc[0:3, 0:2]` |
| **Boolean Rows** | `df[df['A'] > 1]` | `df.loc[df['A'] > 1]` | (Don't) |

-----

### 13\. Common Interview Questions

1.  **What's the difference between `.loc` and `.iloc`?**
      * `.loc` selects by **label** (e.g., `'Age'`, `'Row_A'`). Its label-slicing is *inclusive*.
      * `.iloc` selects by **integer position** (e.g., `0`, `1`). Its integer-slicing is *exclusive*, just like in Python.
2.  **What is the `SettingWithCopyWarning` and how do you fix it?**
      * It's a warning that you *might* be trying to modify a *copy* of a DataFrame, not the original.
      * It happens when you use "chained indexing" to set a value (e.g., `df[...][...] = ...`).
      * You fix it by using `.loc` in a *single* operation: `df.loc[row_indexer, col_indexer] = value`.
3.  **How would you select the 1st, 3rd, and 5th rows and the 'Name' and 'Age' columns?**
      * **Using `.loc` (translating row positions to labels via `df.index`):** `df.loc[df.index[[0, 2, 4]], ['Name', 'Age']]` (this assumes the index labels are unique).
      * **Using `.iloc` (if you know column positions):** `df.iloc[[0, 2, 4], [0, 1]]` (assuming 'Name' and 'Age' are cols 0 and 1).
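
A quick way to convince yourself that the two answers to question 3 agree is to run both on a five-row frame. The `people` DataFrame below is invented for this check, and `Index.get_indexer` is used as one possible way to look up column positions instead of hard-coding them.

```python
import pandas as pd

# Hypothetical five-row frame so that rows 1, 3, and 5 exist
people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Clara', 'Dan', 'Eve'],
     'Age': [25, 30, 22, 41, 35],
     'City': ['New York', 'Los Angeles', 'Chicago', 'Boston', 'Denver']},
    index=['a', 'b', 'c', 'd', 'e']
)

rows = [0, 2, 4]  # 1st, 3rd, and 5th rows as integer positions

# Approach 1: translate row positions to labels, then use .loc
via_loc = people.loc[people.index[rows], ['Name', 'Age']]

# Approach 2: translate column labels to positions, then use .iloc
col_pos = people.columns.get_indexer(['Name', 'Age'])
via_iloc = people.iloc[rows, col_pos]

print(via_loc)
print(via_loc.equals(via_iloc))
```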

-----

### 14\. Performance Considerations

  * **Time Complexity:**
      * `.at[label]` / `.iat[pos]`: **O(1)** (Constant time). The fastest.
      * `.loc[label]` / `.iloc[pos]`: **O(1)** (Constant time) on average for a single item (using the index's hash map or array position).
      * Slicing (`:`): Very fast, often **O(k)** where k is the slice size.
      * Boolean Mask (`df[df['A'] > 5]`): **O(n)**, where 'n' is the number of rows. It must check the condition for every row.
  * **Memory Usage (View vs. Copy):**
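      * **View (Fast, shares memory):** Slicing *often* returns a "view." In classic Pandas, modifying a view can change the original DataFrame, and `SettingWithCopyWarning` exists because it is not always obvious whether you hold a view or a copy. With Copy-on-Write enabled (optional in Pandas 2.x and planned as the default in 3.0), modifying a selection never changes the original. Examples: `df.iloc[0:5]`, `df.loc['a':'d']`.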
      * **Copy (Slower, new memory):** Selecting with a *list* or a *boolean mask* almost *always* returns a "copy." Modifying a copy will *not* change the original. Examples: `df[['A', 'B']]`, `df.loc[['a', 'c']]`, `df[df['A'] > 5]`.
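
If you want to see the single-cell speed difference on your own machine, a rough timing sketch like the one below can help. The frame size and repeat count are arbitrary choices, and the actual numbers will vary by Pandas version and hardware, so treat the output as indicative only.

```python
import timeit

import pandas as pd

# Hypothetical 1000-row frame with string labels 'r0' ... 'r999'
df = pd.DataFrame({'Age': range(1000)}, index=[f'r{i}' for i in range(1000)])

# Time repeated single-cell reads with each accessor
t_loc = timeit.timeit(lambda: df.loc['r500', 'Age'], number=2000)
t_at = timeit.timeit(lambda: df.at['r500', 'Age'], number=2000)

# Both accessors must return the same value
same_value = df.loc['r500', 'Age'] == df.at['r500', 'Age']

print(f".loc: {t_loc:.4f}s  .at: {t_at:.4f}s  same value: {same_value}")
```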

-----

### 15\. When NOT to Use This

  * **Don't use `[]` for row selection.** `df[0]` will fail, looking for a *column* named `0`. The only exception is slicing (e.g., `df[0:5]`), but this is ambiguous and `df.iloc[0:5]` is preferred.
  * **Don't use `.loc` with integer positions.** It's for labels.
  * **Don't use `.iloc` with labels.** It's for integer positions.
  * **Don't use `.at` or `.iat` for slices or multi-cell selection.** They are *only* for single cells.
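  * **Don't use chained indexing to *set* data.** (e.g., `df.iloc[0:5]['Age'] = 30`). This is a "chained assignment": depending on whether the first step returns a view or a copy, it may not update `df`, and it typically triggers `SettingWithCopyWarning`. Use a single indexer instead, such as `df.loc[df.index[0:5], 'Age'] = 30` (assuming unique index labels) or `df.iloc[0:5, df.columns.get_loc('Age')] = 30`.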
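
As a sketch of the single-step alternative to chained assignment, the example below sets `'Age'` for the first five rows by position, using `Index.get_loc` to find the column's position. The six-row frame is made up for the demonstration.

```python
import pandas as pd

# Hypothetical six-row frame with the default integer index
df = pd.DataFrame(
    {'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
     'Age': [20, 21, 22, 23, 24, 25]}
)

age_pos = df.columns.get_loc('Age')   # column label -> integer position
df.iloc[0:5, age_pos] = 30            # one indexing step, so df itself is updated

print(df['Age'].tolist())
```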