# 2. Indexing
-----

In Pandas, **indexing** is just a fancy word for "selecting" or "looking up" data. Think of your Series as a tall filing cabinet. The **index** is the set of labels on the drawers (`'A'`, `'B'`, `'C'`). **Indexing** is the act of pulling open a specific drawer to get the file inside.

`.loc`, `.iloc`, `.at`, and `.iat` are the specific tools Pandas gives you to do this. They let you be precise, telling Pandas *exactly* how you want to find your data: by its **label** (like `loc` or `at`) or by its **position** (like `iloc` or `iat`).

**How It Works in Memory**: When you use `.loc['A']`, Pandas uses the Series index like a hash map (or a dictionary) to find the position associated with the label `'A'`, then reads that slot. This lookup takes constant time on average. When you use `.iloc[0]`, Pandas goes directly to the first slot in the underlying NumPy array, which is also constant time. `.at` and `.iat` are stripped-down versions of these for accessing only *one* value. They are faster mainly because they skip the extra logic `.loc` and `.iloc` need to handle lists, slices, and boolean masks.
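The label lookup described above can be approximated with the public `Index.get_loc` method, which returns the integer position of a label. This is a sketch of the idea, not of Pandas' internal code path; the Series `s` and its labels are illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['A', 'B', 'C'])

# Index.get_loc translates a label into its integer position
pos = s.index.get_loc('B')
print(pos)  # 1

# A label lookup is equivalent to: find the position, then read that slot
assert s.loc['B'] == s.iloc[pos] == s.to_numpy()[pos]
print(s.loc['B'], s.iloc[pos])  # 20 20
```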

**When to Use This**:

  * Use **`.loc`** when you want to select data using the **index labels** (e.g., `s.loc['Mon']`, `s.loc['Alice']`). This is the most common and intuitive way to select data.
  * Use **`.iloc`** when you want to select data using its **integer position** (e.g., `s.iloc[0]` for the 1st item, `s.iloc[5]` for the 6th item), just like a Python list.
  * Use **`.at`** as a high-speed replacement for `.loc` when you need to get or set *only one single value* by its **label**.
  * Use **`.iat`** as a high-speed replacement for `.iloc` when you need to get or set *only one single value* by its **position**.

-----

### 0\. Syntax & Parameters

The main accessors are `.loc` (for labels) and `.iloc` (for integer position).

```python
series.loc[label]
series.iloc[position]
series.at[label]
series.iat[position]
```

  * **`series.loc[label]`**

      * **What it does:** Selects data based on its **index label(s)**.
      * **`label`**: Can be a single label (`'a'`), a list of labels (`['a', 'c']`), or a slice of labels (`'a':'c'`).
      * **Default value:** N/A (you must provide a label).
      * **When you would use it:** Always, when you know the index label. This is the preferred method for label-based selection.

  * **`series.iloc[position]`**

      * **What it does:** Selects data based on its **integer position** (starting from 0).
      * **`position`**: Can be a single integer (`0`), a list of integers (`[0, 2]`), or a slice of integers (`0:3`).
      * **Default value:** N/A (you must provide a position).
      * **When you would use it:** When you need to select data by its *order*, regardless of what the index labels are (e.g., "get the first 5 items").

  * **`series.at[label]`**

      * **What it does:** A high-speed, optimized version of `.loc` for accessing a **single scalar value** only.
      * **`label`**: A single index label (e.g., `'a'`). *It cannot be a list or a slice.*
      * **Default value:** N/A.
      * **When you would use it:** In performance-critical code where you need to get or set *one specific item* by its label, often inside a loop (though loops are generally avoided).

  * **`series.iat[position]`**

      * **What it does:** A high-speed, optimized version of `.iloc` for accessing a **single scalar value** only.
      * **`position`**: A single integer position (e.g., `0`). *It cannot be a list or a slice.*
      * **Default value:** N/A.
      * **When you would use it:** In performance-critical code where you need to get or set *one specific item* by its position (e.g., `s.iat[0]`).
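The key forms listed above can be compared side by side. A short sketch with an illustrative four-item Series; the variable names are made up for this example.

```python
import pandas as pd

s = pd.Series([100, 200, 300, 400], index=['a', 'b', 'c', 'd'])

# .loc: single label, list of labels, label slice (end label included)
one = s.loc['b']             # scalar: 200
picked = s.loc[['a', 'c']]   # Series with labels a, c
ranged = s.loc['a':'c']      # Series with labels a, b, c

# .iloc: single position, list of positions, position slice (end excluded)
first = s.iloc[0]            # scalar: 100
picked_pos = s.iloc[[0, 2]]  # Series with labels a, c
ranged_pos = s.iloc[0:3]     # positions 0, 1, 2 -> labels a, b, c

# .at / .iat: exactly one label / one position
single_at = s.at['d']        # 400
single_iat = s.iat[-1]       # 400 (negative positions count from the end)

print(ranged, ranged_pos, sep="\n\n")
```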

-----

### 1\. Basic Example

Let's create a simple Series to see `.loc` and `.iloc` in action.

**Example 1: Basic selection with `.loc` and `.iloc`**

```python
import pandas as pd
import numpy as np

# A Series of student scores
# The index labels are 'Alice', 'Bob', 'Clara'
# The positions are 0, 1, 2
scores = pd.Series([85, 92, 78], index=['Alice', 'Bob', 'Clara'])

print("--- The Series ---")
print(scores)

# --- .loc: Select by LABEL ---
# Get the score for 'Bob'
bob_score_loc = scores.loc['Bob']
print(f"\n.loc['Bob'] -> {bob_score_loc}")

# --- .iloc: Select by POSITION ---
# Get the score at position 1 (the 2nd item)
bob_score_iloc = scores.iloc[1]
print(f"\n.iloc[1] -> {bob_score_iloc}")
```

**Output:**

```
--- The Series ---
Alice    85
Bob      92
Clara    78
dtype: int64

.loc['Bob'] -> 92

.iloc[1] -> 92
```

**Explanation:**
'Bob' sits at position `1`, so both calls return `92`, but they ask different questions. `.loc['Bob']` asks for the value with the *label* 'Bob', while `.iloc[1]` asks for the value at the 2nd *position*. Here the answers happen to match, but the two accessors are conceptually different.

**Example 2: Using `.at` and `.iat` for the same thing**

```python
# --- .at: Fast LABEL selection ---
bob_score_at = scores.at['Bob']
print(f"\n.at['Bob'] -> {bob_score_at}")

# --- .iat: Fast POSITION selection ---
bob_score_iat = scores.iat[1]
print(f"\n.iat[1] -> {bob_score_iat}")
```

**Output:**

```
.at['Bob'] -> 92

.iat[1] -> 92
```

**Explanation:**
For selecting a single value, `.at` and `.iat` do the same job as `.loc` and `.iloc` but are faster. You can't use them to select multiple items.
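To see the speed difference on your own setup, a rough benchmark with the standard-library `timeit` module is enough. This is a sketch, not a rigorous benchmark: the gap varies by machine and Pandas version, so no particular timing is claimed here.

```python
import timeit

import pandas as pd

scores = pd.Series([85, 92, 78], index=['Alice', 'Bob', 'Clara'])

# Both accessors return the same value for a single label
assert scores.at['Bob'] == scores.loc['Bob']

# Time many single-value lookups with each accessor.
# Absolute numbers depend on your machine and pandas version.
n = 20_000
t_loc = timeit.timeit(lambda: scores.loc['Bob'], number=n)
t_at = timeit.timeit(lambda: scores.at['Bob'], number=n)
print(f".loc: {t_loc:.3f}s   .at: {t_at:.3f}s   for {n:,} lookups")
```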

-----

### 2\. Intermediate Example

The biggest difference between `.loc` and `.iloc` appears when *slicing* (selecting a range of values).

**Example 3: Slicing with `.loc` and `.iloc`**

```python
# A Series with a numeric index
s = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

print("--- The Series ---")
print(s)

# --- .loc: Slicing by LABEL ---
# Get all items from label 2 to label 4
# Note: .loc is INCLUSIVE of the end label
s_loc_slice = s.loc[2:4]
print("\n--- .loc[2:4] (Labels 2, 3, 4) ---")
print(s_loc_slice)

# --- .iloc: Slicing by POSITION ---
# Get the items from position 2 up to, but not including, position 4
# Note: .iloc is EXCLUSIVE of the end position (like Python)
s_iloc_slice = s.iloc[2:4]
print("\n--- .iloc[2:4] (Positions 2, 3) ---")
print(s_iloc_slice)
```

**Output:**

```
--- The Series ---
1    10
2    20
3    30
4    40
5    50
dtype: int64

--- .loc[2:4] (Labels 2, 3, 4) ---
2    20
3    30
4    40
dtype: int64

--- .iloc[2:4] (Positions 2, 3) ---
3    30
4    40
dtype: int64
```

**Explanation:**
This is the most critical difference to remember:

1.  **`s.loc[2:4]`** selected *labels* 2, 3, and 4. It **includes** the end label (`4`).
2.  **`s.iloc[2:4]`** selected *positions* 2 and 3. It **excludes** the end position (`4`). This is standard Python slicing behavior. The items at positions 2 and 3 are `30` (label 3) and `40` (label 4).
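One way to internalize the inclusive/exclusive rule is to rebuild the `.loc` slice from Example 3 with `.iloc`. The sketch below looks up each endpoint label's position with `Index.get_loc` and adds 1 to the stop position to make the end inclusive.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

# Translate the label endpoints into positions
start = s.index.get_loc(2)  # position 1
stop = s.index.get_loc(4)   # position 3

# .loc includes the end label, so the positional slice needs stop + 1
by_label = s.loc[2:4]
by_position = s.iloc[start:stop + 1]

print(by_position)
assert by_label.equals(by_position)  # labels 2, 3, 4 with values 20, 30, 40
```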

**Example 4: Slicing with non-numeric labels**

```python
scores = pd.Series([85, 92, 78, 95, 88], index=['Alice', 'Bob', 'Clara', 'David', 'Eva'])

print("--- The Series ---")
print(scores)

# .loc can slice by label
print("\n--- .loc['Bob':'David'] ---")
print(scores.loc['Bob':'David'])
```

**Output:**

```
--- The Series ---
Alice    85
Bob      92
Clara    78
David    95
Eva      88
dtype: int64

--- .loc['Bob':'David'] ---
Bob      92
Clara    78
David    95
dtype: int64
```

**Explanation:**
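`.loc` slicing works with text labels too. It selected all items from 'Bob' *up to and including* 'David', following the order in which the labels are stored. If both endpoint labels exist in the index, the slice works even on an unsorted index; if an endpoint is missing, the index must be sorted, otherwise Pandas raises a `KeyError`. `.iloc` cannot take these text labels at all, because it only understands integer positions (the equivalent positional slice here is `scores.iloc[1:4]`).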
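When exactly does label slicing need a sorted index? A sketch with an illustrative unsorted index: slicing works when both endpoint labels exist, raises a `KeyError` when one of them is missing, and works again once the index is sorted.

```python
import pandas as pd

unsorted = pd.Series([1, 2, 3, 4], index=['d', 'b', 'a', 'c'])

# Both endpoints exist: the slice follows the stored order, sorted or not
print(unsorted.loc['b':'c'])  # labels b, a, c

# An endpoint missing from an unsorted index raises KeyError
missing_raised = False
try:
    unsorted.loc['b':'z']
except KeyError as e:
    missing_raised = True
    print("KeyError for missing endpoint:", e)

# After sort_index(), a missing endpoint is placed by sorted order instead
sorted_s = unsorted.sort_index()
print(sorted_s.loc['b':'z'])  # labels b, c, d
```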

-----

### 3\. Advanced or Tricky Case

The most confusing scenario is when a Series has a default integer index (`0, 1, 2, 3...`). Here, the *label* and the *position* are the same, which can be ambiguous.

**Example 5: Ambiguity with default integer index**

```python
s = pd.Series(['a', 'b', 'c', 'd'])
print("--- The Series ---")
print(s)

# Label 1 is also at position 1
print("\n.loc[1] (Label 1):", s.loc[1])
print(".iloc[1] (Position 1):", s.iloc[1])
```

**Output:**

```
--- The Series ---
0    a
1    b
2    c
3    d
dtype: object

.loc[1] (Label 1): b
.iloc[1] (Position 1): b
```

**Explanation:**
In this case, `.loc[1]` and `.iloc[1]` give the same result. But now, watch what happens if we *drop* a label.

**Example 6: The "Gotcha" - `.loc` vs `.iloc` after dropping an index**

```python
s = pd.Series(['a', 'b', 'c', 'd'])
s_dropped = s.drop(0) # Drop the item with index LABEL 0

print("\n--- Dropped Series ---")
print(s_dropped)

# --- .iloc[0] is always the FIRST item ---
# The first item in s_dropped is 'b'
print("\n.iloc[0] (First item):", s_dropped.iloc[0])

# --- .loc[0] looks for LABEL 0 ---
# The label 0 no longer exists!
try:
    s_dropped.loc[0]
except KeyError as e:
    print(f"\n.loc[0]: Error! {e}")
```

**Output:**

```
--- Dropped Series ---
1    b
2    c
3    d
dtype: object

.iloc[0] (First item): b

.loc[0]: Error! 0
```

**Explanation:**
This is why you *must* be explicit.

  * `s_dropped.iloc[0]` worked fine. It means "give me the item at the first *position*," which is now `'b'`.
  * `s_dropped.loc[0]` failed. It means "give me the item with the *label* `0`." Since we dropped that label, Pandas raises a `KeyError`.

This example shows the real-world difference: `.iloc` is about *order*, `.loc` is about *labels*.
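If you want labels and positions to agree again after dropping items, one common option is to renumber the index with `reset_index(drop=True)`. This is a sketch of that fix; it discards the original labels, so it only makes sense when they carry no meaning.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'])
s_dropped = s.drop(0)  # labels are now 1, 2, 3

# reset_index(drop=True) throws away the old labels and renumbers from 0,
# so labels and positions line up again
s_renumbered = s_dropped.reset_index(drop=True)
print(s_renumbered)

assert s_renumbered.loc[0] == s_renumbered.iloc[0] == 'b'
```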

-----

### 4\. Real-World Use Case

`.loc` and `.at` are fantastic for updating values in your data.

**Example 7: Updating a value by label**

```python
scores = pd.Series([85, 92, 78], index=['Alice', 'Bob', 'Clara'])
print("--- Before Update ---")
print(scores)

# Update Bob's score using his name (the label)
scores.loc['Bob'] = 95 

print("\n--- After .loc Update ---")
print(scores)

# Give Clara 5 extra credit points using .at for speed
scores.at['Clara'] = scores.at['Clara'] + 5

print("\n--- After .at Update ---")
print(scores)
```

**Output:**

```
--- Before Update ---
Alice    85
Bob      92
Clara    78
dtype: int64

--- After .loc Update ---
Alice    85
Bob      95
Clara    78
dtype: int64

--- After .at Update ---
Alice    85
Bob      95
Clara    83
dtype: int64
```

**Explanation:**
We used `.loc['Bob']` on the left side of the `=` to assign a new value (`95`) directly to that label. This is far safer and clearer than trying to find the position. We then used `.at['Clara']` to read and write a single value, which is the fastest way to perform this "get and set" operation.
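`.loc` assignment is not limited to one label. A sketch, using the same illustrative scores as Example 7, of bulk updates with a list of labels and with a boolean mask, plus a single positional write with `.iat`:

```python
import pandas as pd

scores = pd.Series([85, 92, 78], index=['Alice', 'Bob', 'Clara'])

# .loc also accepts a list of labels on the left side of =
scores.loc[['Alice', 'Clara']] = [90, 80]

# ...or a boolean mask: add 5 points to everyone below 90
below_90 = scores < 90
scores.loc[below_90] = scores.loc[below_90] + 5

# .iat writes a single value by position (position 2 is Clara)
scores.iat[2] = scores.iat[2] + 1

print(scores)  # Alice 90, Bob 92, Clara 86
```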

**Example 8: Getting the Top 3 items**

```python
# .iloc is perfect for positional selection after sorting
sales = pd.Series([100, 50, 200, 75], index=['d', 'a', 'c', 'b'])
sorted_sales = sales.sort_values(ascending=False)

print("--- Sorted Sales ---")
print(sorted_sales)

# Get the top 3 items by position
top_3 = sorted_sales.iloc[0:3]

print("\n--- Top 3 (using .iloc[0:3]) ---")
print(top_3)
```

**Output:**

```
--- Sorted Sales ---
c    200
d    100
b     75
a     50
dtype: int64

--- Top 3 (using .iloc[0:3]) ---
c    200
d    100
b     75
dtype: int64
```

**Explanation:**
After sorting, we don't care what the labels are; we just want the first 3 items. `.iloc[0:3]` ("items from position 0 up to, but not including, position 3") is the perfect tool for this.
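`.iloc` also takes negative positions, which covers the "last N items" case. For the specific top-N task, Pandas additionally provides `Series.nlargest`. A short sketch on the same sales data:

```python
import pandas as pd

sales = pd.Series([100, 50, 200, 75], index=['d', 'a', 'c', 'b'])
sorted_sales = sales.sort_values(ascending=False)

# Negative positions count from the end, as with Python lists
bottom_2 = sorted_sales.iloc[-2:]   # labels b, a
last_value = sorted_sales.iloc[-1]  # 50

# Series.nlargest(n) is a built-in shortcut for "sort descending, take the first n"
top_3 = sales.nlargest(3)

print(bottom_2, last_value, top_3, sep="\n\n")
assert top_3.equals(sorted_sales.iloc[0:3])
```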

-----

### 5\. Common Mistakes / Pitfalls

**Mistake 9: Using `.at` or `.iat` for slicing**

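```python
s = pd.Series([1, 2, 3, 4])

# .at and .iat accept exactly one scalar key. The exception raised for a
# slice differs between the two accessors (and, for .at, between pandas and
# Python versions), so catch Exception here just to show what goes wrong.
try:
    s.at[0:2]  # Wrong! .at is for single values only
except Exception as e:
    print(f"Error with .at[0:2]: {type(e).__name__}")

try:
    s.iat[0:2]  # Wrong! .iat is for single values only
except Exception as e:
    print(f"\nError with .iat[0:2]: {type(e).__name__}: {e}")
```

**Error/Wrong Output:**

```
Error with .at[0:2]: KeyError

Error with .iat[0:2]: ValueError: iAt based indexing can only have integer indexers
```

**Why it happens:**
`.at` and `.iat` are *only* for single scalar values. They are not designed to handle slices, lists, or any other complex input. The exact exception `.at` raises depends on your Pandas and Python versions (`KeyError` and `InvalidIndexError` are both possible), which is why the demo catches `Exception`.
**Corrected code:** Use `.loc` or `.iloc` for slicing, keeping in mind that they return different results here: `s.loc[0:2]` returns labels 0, 1, and 2 (end label included), while `s.iloc[0:2]` returns positions 0 and 1 (end position excluded).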

**Mistake 10: Using `.loc` with a position on a text index**

```python
scores = pd.Series([85, 92, 78], index=['Alice', 'Bob', 'Clara'])

try:
    # Trying to get the "first" item using .loc
    scores.loc[0] 
except KeyError as e:
    print(f"Error with .loc[0]: {e}")
```

**Error/Wrong Output:**

```
Error with .loc[0]: 0
```

**Why it happens:**
`.loc` *only* looks at labels. It checked the index `['Alice', 'Bob', 'Clara']` for the *label* `0` and couldn't find it, resulting in a `KeyError`.
**Corrected code:**
`scores.iloc[0]` (this correctly gets the first item by position).
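If you need a label-based accessor but only know a position, you can translate the position into a label first with `s.index[pos]`. A minimal sketch with the same illustrative scores:

```python
import pandas as pd

scores = pd.Series([85, 92, 78], index=['Alice', 'Bob', 'Clara'])

# s.index[pos] returns the label stored at that position
first_label = scores.index[0]  # 'Alice'

# The label can now be used with .loc (or .at)
value_by_label = scores.loc[first_label]
value_by_position = scores.iloc[0]

print(first_label, value_by_label, value_by_position)  # Alice 85 85
```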

-----

### 6\. Key Terms (Explained Simply)

  * **Label-based Indexing:** Selecting data using the index *label* (e.g., 'Alice', '2025-11-17'). This is done with **`.loc`** and **`.at`**.
  * **Position-based Indexing:** Selecting data using the integer *position* (e.g., 0, 1, 2). This is done with **`.iloc`** and **`.iat`**.
  * **Index:** The labels for the rows.
  * **Label:** A single value in the index (e.g., 'Alice').
  * **Position (or Integer-location):** The item's place in the Series, starting from 0 (e.g., the 1st item is position 0, the 2nd is position 1).
  * **Slicing:** Selecting a range of data (e.g., `s.iloc[0:3]` or `s.loc['A':'D']`).
  * **Inclusive:** *Includes* the end point (like `.loc` slicing).
  * **Exclusive:** *Excludes* the end point (like `.iloc` slicing).

-----

### 7\. Best Practices

  * **Be Explicit:** Always use `.loc` or `.iloc` for selection. Avoid `series[key]`. Using `[]` alone is ambiguous and can lead to label/position mix-ups like the one in Example 6.
  * **Use `.loc` for Labels:** Even if your index *is* integers (like `[1, 2, 3]`), use `.loc` when you mean to select by that label.
  * **Use `.iloc` for Position:** Use `.iloc` when you mean "get the 1st item" or "get the last 5 items," regardless of what the labels are.
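  * **Use `.at`/`.iat` for Speed:** If you need to get or set a *single value* many times (e.g., in one of the rare loops you cannot vectorize), use `.at` or `.iat`. They are typically faster because they skip the extra logic `.loc`/`.iloc` need to handle lists, slices, and boolean masks.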
  * **Check Your Index:** Be aware of your index type. If it's integers, be extra careful to know if you're using a *label* (`.loc`) or a *position* (`.iloc`).

-----

### 8\. Mini Summary

  * **`.loc`** = **L**abel. (`s.loc['Alice']`). Slicing is **inclusive** (`.loc['A':'C']` includes 'C').
  * **`.iloc`** = **I**nteger **loc**ation (position). (`s.iloc[0]`). Slicing is **exclusive** (`.iloc[0:3]` does *not* include position 3).
  * **`.at`** = **A**ccess **T**arget (by label). Fast, single-value `.loc`.
  * **`.iat`** = **I**nteger **A**ccess **T**arget (by position). Fast, single-value `.iloc`.
  * The biggest pitfall is confusing labels and positions, especially on a numeric index.

-----

### 10\. Practice Tasks

**Data for Tasks:**
`s = pd.Series([100, 101, 102, 103, 104], index=['a', 'b', 'c', 'd', 'e'])`

**Task 11 (Easy):**
Using the Series `s`, select the value `102` using its **label**.

**Task 12 (Medium):**
Using the Series `s`, select the items `[101, 102, 103]` using **integer positions**.

**Task 13 (Hard):**
Using the Series `s`, select all items from label `'b'` to label `'d'` (inclusive) using **label-based slicing**.

**Bonus Task 14:**
Using the Series `s`, update the value at label `'e'` to `999` using the *fastest possible method*. Print the resulting Series.

-----

### 11\. Recommended Next Topic

Now that you know how to create a Series and select data from it, the next logical step is to learn about its main properties and how to perform calculations with it.

**Recommended:** **Series Operations (Arithmetic, Alignment, and `.fillna()`)**

-----

### 12\. Quick Reference Card

| Method | Selects By... | Use Case | Slicing |
| :--- | :--- | :--- | :--- |
| **`.loc`** | Label | General selection by label | `s.loc['A':'C']` **(Inclusive)** |
| **`.iloc`** | Position (Integer) | General selection by position | `s.iloc[0:3]` **(Exclusive)** |
| **`.at`** | Label | FAST *single value* get/set | N/A |
| **`.iat`** | Position (Integer) | FAST *single value* get/set | N/A |

-----

### 13\. Common Interview Questions

1.  **What is the main difference between `.loc` and `.iloc`?**
      * `.loc` selects data by **label**.
      * `.iloc` selects data by **integer position**.
      * The key difference is in slicing: `.loc['A':'C']` *includes* 'C', while `.iloc[0:3]` *excludes* 3.
2.  **When would you use `.at` instead of `.loc`?**
      * You use `.at` when you need to get or set a **single value** and performance is critical. It is typically faster than `.loc` for this specific task because it skips the logic `.loc` needs to handle lists, slices, and boolean masks.
3.  **What happens if you have an integer index `[0, 1, 3]` and you try to use `s.loc[1]` vs `s.iloc[1]`?**
      * `s.loc[1]` will find the item with the *label* `1` and return it.
      * `s.iloc[1]` will find the item at the second *position* (which has the label `1`) and return it.
      * *Follow-up:* What if you try `s.iloc[2]`? It would return the item at the third *position*, which has the label `3`.
      * *Follow-up:* What if you try `s.loc[2]`? It would fail with a `KeyError` because there is no *label* `2`.
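The scenario in question 3 is easy to verify. A sketch with an illustrative Series whose integer index skips the label 2:

```python
import pandas as pd

s = pd.Series(['x', 'y', 'z'], index=[0, 1, 3])

print(s.loc[1], s.iloc[1])  # y y  (label 1 sits at position 1)
print(s.iloc[2])            # z    (third position, whose label is 3)

try:
    s.loc[2]                # there is no label 2
except KeyError as e:
    print("KeyError:", e)   # KeyError: 2
```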

-----

### 14\. Performance Considerations

  * **Time Complexity:**
      * `.iloc[i]` and `.iat[i]` are **O(1)** (constant time). They are simple array lookups.
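      * `.loc['label']` and `.at['label']` are **O(1)** on average for a unique index, because Pandas looks labels up in a hash table. That table is built lazily on the first lookup, which is a one-time O(n) cost.
      * For very large sorted (monotonic) indexes, Pandas may use a binary search instead of the hash table, making a lookup **O(log n)**.
      * Slicing (`:`) is generally very fast: a contiguous slice usually avoids copying the data, and a label slice only adds the cost of locating its two endpoint labels.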
  * **Memory Usage (Copy vs. View):**
      * Selecting a single value (`.at`, `.iat`, `.loc[i]`, `.iloc[i]`) returns the value itself (a copy).
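      * Selecting a *slice* (e.g., `s.iloc[1:5]`) typically returns a **view** of the original Series. This is memory-efficient because no new data is copied.
      * **CRITICAL:** Without Copy-on-Write, modifying a view (e.g., `my_view = s.iloc[1:5]; my_view.iloc[0] = 99`) can silently change the original `s`. `SettingWithCopyWarning` is raised only in some of these situations (mainly chained assignment on DataFrames), so do not rely on it. With Copy-on-Write (opt-in in pandas 2.x via `pd.options.mode.copy_on_write = True`, and the default in pandas 3.0), the original is never changed this way. If you need an independent subset, call `.copy()`.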
      * Selecting multiple, non-contiguous items (e.g., `s.iloc[[0, 2, 5]]`) will almost always return a **copy**.
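You can check whether a selection shares memory with the original by passing the underlying arrays to NumPy's `np.shares_memory`. This is a diagnostic sketch, not a guarantee: whether a slice is a view depends on the Pandas version and Copy-on-Write settings, so the first result is printed without a claimed value. Calling `.copy()` is the version-independent way to get an independent subset.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])

# A contiguous slice typically shares memory with the original...
print(np.shares_memory(s.to_numpy(), s.iloc[1:5].to_numpy()))

# ...while a list of positions (fancy indexing) always copies
print(np.shares_memory(s.to_numpy(), s.iloc[[0, 2, 5]].to_numpy()))  # False

# To modify a subset without any risk of touching `s`, copy explicitly
subset = s.iloc[1:5].copy()
subset.iloc[0] = 99
print(s.iloc[1], subset.iloc[0])  # 2 99
```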

-----

### 15\. When NOT to Use This

  * **Do not use `[]` for selection.** Avoid `s[0]` or `s['a']`. It's ambiguous and can lead to bugs: `s[0]` might mean "position 0" or "label 0" depending on the index. Since pandas 2.1, `s[0]` falling back to a position on a non-integer index emits a `FutureWarning`. `.loc`/`.iloc` remove this ambiguity.
  * **Do not use `.at` or `.iat` to select more than one item.** They will fail. They are *only* for single scalar values.
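  * **Do not use `.loc` to select by position.** It will either fail with a `KeyError` (if the index is text, or the number is not a label) or silently return the wrong item (if the index is numeric but its labels no longer match the positions, e.g., after dropping or sorting).
  * **Do not use `.iloc` to select by label.** With a text index it raises an error, since it only accepts integer positions; with an integer index it silently treats your label as a position.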
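A sketch of the `[]` ambiguity with an illustrative integer index in reverse order. Because the index holds integers, `s[0]` is treated as a label, which silently differs from the first position:

```python
import pandas as pd

# An integer index whose labels do not match the positions
s = pd.Series(['a', 'b', 'c'], index=[2, 1, 0])

# With an integer index, s[0] is a LABEL lookup (same as .loc)...
print(s[0], s.loc[0])  # c c

# ...so it disagrees with the first POSITION
print(s.iloc[0])       # a
```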