# 🐼 **df.loc VS df.iloc**

## **Quick overview (one-line)**

- `.loc` → label-based indexing (rows and columns by index/column names). Inclusive slicing.

- `.iloc` → integer position-based indexing (rows and columns by integer positions). Exclusive end for slices (like normal Python ranges).

## **1) Basic rules & differences**

- `.loc[row_selector, col_selector]`: selectors are labels (index values and column names). Row and column slices are inclusive: `df.loc[2:5]` includes index 5.

- `.iloc[row_selector, col_selector]`: selectors are integer positions (0-based). Slicing is half-open: `df.iloc[2:5]` includes positions 2,3,4 (not 5).

- Either axis selector (rows/cols) can be omitted:

    - `df.loc[:, 'colname']` select column by label.

    - `df.iloc[0]` returns the first row (as Series).

- You can pass lists, arrays, or boolean masks to both. (`.iloc` expects the mask as a boolean array or list, not a boolean Series.)

## **2) Sample DataFrame: use while reading examples**

In [1]:
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        'age': [25, 30, 22, 40, 28],
        'city':['Delhi','Mumbai','Kolkata','Chennai','Bengaluru'],
        'salary': [50000, 60000, 42000, 80000, 55000]
    },
    index = ['a', 'b', 'c', 'd', 'e']
)

# Index labels: ['a','b','c','d','e']. Column labels: ['age','city','salary']


## **3) Selection patterns**

#### **Single column**

- **Label:**

In [2]:
df.loc[:, 'age'] # returns Series of ages [label based]
df['age']   # same as above

a    25
b    30
c    22
d    40
e    28
Name: age, dtype: int64

- **Position**

In [3]:
df.iloc[:, 0] # first column by position

a    25
b    30
c    22
d    40
e    28
Name: age, dtype: int64

#### **Single Row**

- **Label:**

In [5]:
df.loc['b']   #row with index label 'b' (Series)

age           30
city      Mumbai
salary     60000
Name: b, dtype: object
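
The positional counterpart of the row lookup above is `df.iloc[1]`. The following is a minimal sketch that rebuilds the sample DataFrame so it runs on its own; the variable names `row_by_label`, `row_by_pos`, and `last_row` are illustrative only.

```python
import pandas as pd

# Same sample data as in section 2
df = pd.DataFrame(
    {
        'age': [25, 30, 22, 40, 28],
        'city': ['Delhi', 'Mumbai', 'Kolkata', 'Chennai', 'Bengaluru'],
        'salary': [50000, 60000, 42000, 80000, 55000],
    },
    index=['a', 'b', 'c', 'd', 'e'],
)

row_by_label = df.loc['b']   # row whose index label is 'b'
row_by_pos = df.iloc[1]      # second row by position (0-based)
last_row = df.iloc[-1]       # negative positions count from the end
```

Both lookups return the same Series here because label `'b'` happens to sit at position 1.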

#### **Row and Column both**

- **Single element**

In [6]:
df.loc['c', 'city'] # 'Kolkata'

'Kolkata'

In [8]:
df.iloc[2, 1]  # also 'Kolkata' (row position 2 = third row, column position 1 = second column)

'Kolkata'

#### **Multiple rows/columns**

- **Labels:**

In [9]:
df.loc[['a', 'c', 'e'], ['age', 'salary']]

|   | age | salary |
|---|-----|--------|
| a | 25 | 50000 |
| c | 22 | 42000 |
| e | 28 | 55000 |


- **Positions:**

In [10]:
df.iloc[[0,2,4], [0,2]]

|   | age | salary |
|---|-----|--------|
| a | 25 | 50000 |
| c | 22 | 42000 |
| e | 28 | 55000 |


## **4) Boolean indexing & masks**

**Boolean masks are extremely common in data science for filtering.**

In [11]:
mask = df['age'] > 25
df.loc[mask]
df.loc[mask, ['city', 'salary']]

|   | city | salary |
|---|------|--------|
| b | Mumbai | 60000 |
| d | Chennai | 80000 |
| e | Bengaluru | 55000 |


👆 **Important: a boolean array mask must have the same length as the axis it filters. A boolean Series is aligned to the index by label, so prefer Series masks when the index aligns, and use `.values` carefully because it discards the labels.**
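
As a rough illustration of the alignment point, the sketch below uses a small three-row frame (not the sample `df`) and a mask whose index is deliberately in a different order. The names `by_label` and `by_position` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'age': [25, 30, 22]}, index=['a', 'b', 'c'])

# A boolean Series is matched to df's index by label before filtering,
# so the order of the mask's own index does not matter.
mask = pd.Series([True, False, True], index=['c', 'b', 'a'])
by_label = df.loc[mask]            # keeps rows 'a' and 'c'

# A plain boolean array carries no labels: it is applied by position
# and needs exactly len(df) entries.
by_position = df.iloc[(df['age'] > 24).to_numpy()]   # keeps rows 'a' and 'b'
```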

#### **combine conditions:**

In [12]:
df.loc[(df['age'] > 25) & (df['salary'] > 50000)]

|   | age | city | salary |
|---|-----|------|--------|
| b | 30 | Mumbai | 60000 |
| d | 40 | Chennai | 80000 |
| e | 28 | Bengaluru | 55000 |


In [14]:
df.loc[(df['city'] == 'Mumbai') | (df['city'] == 'Bengaluru')]

|   | age | city | salary |
|---|-----|------|--------|
| b | 30 | Mumbai | 60000 |
| e | 28 | Bengaluru | 55000 |


👆 **Use parentheses around conditions because `&` and `|` have higher precedence than comparison operators: without them, `a > 25 & b > 50000` is parsed as `a > (25 & b) > 50000`.**
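
A short sketch of what goes wrong without the parentheses, using a small stand-in frame rather than the sample `df`. The flag `failed` exists only to record the outcome.

```python
import pandas as pd

df = pd.DataFrame({'age': [25, 30, 22], 'salary': [50000, 60000, 42000]})

# With parentheses: two boolean Series combined element-wise.
ok = df.loc[(df['age'] > 24) & (df['salary'] > 50000)]

# Without parentheses Python evaluates `24 & df['salary']` first; the chained
# comparison that remains then needs a single True/False from a whole Series.
try:
    df.loc[df['age'] > 24 & df['salary'] > 50000]
    failed = False
except ValueError:
    failed = True   # "The truth value of a Series is ambiguous"
```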

## **5) Fancy indexing (lists, arrays) and .isin**

- **Select Multiple Labels:**

In [None]:
df.loc[['b', 'd']] #row labels

|   | age | city | salary |
|---|-----|------|--------|
| b | 30 | Mumbai | 60000 |
| d | 40 | Chennai | 80000 |


**Select columns in order (reorder)**

In [17]:
df.loc[:, ['salary', 'age']] #column labels

|   | salary | age |
|---|--------|-----|
| a | 50000 | 25 |
| b | 60000 | 30 |
| c | 42000 | 22 |
| d | 80000 | 40 |
| e | 55000 | 28 |


##### **Use `.isin()`**

In [18]:
df.loc[df['city'].isin(['Delhi', 'Kolkata'])]

|   | age | city | salary |
|---|-----|------|--------|
| a | 25 | Delhi | 50000 |
| c | 22 | Kolkata | 42000 |


## **6) Slicing rules (inclusive vs exclusive)**

##### **`.loc` uses inclusive slicing:**

In [19]:
df.loc['b':'d'] # includes 'b', 'c', 'd'

|   | age | city | salary |
|---|-----|------|--------|
| b | 30 | Mumbai | 60000 |
| c | 22 | Kolkata | 42000 |
| d | 40 | Chennai | 80000 |


In [20]:
df.loc[:, 'age' : 'salary'] #includes 'age' through 'salary'

|   | age | city | salary |
|---|-----|------|--------|
| a | 25 | Delhi | 50000 |
| b | 30 | Mumbai | 60000 |
| c | 22 | Kolkata | 42000 |
| d | 40 | Chennai | 80000 |
| e | 28 | Bengaluru | 55000 |


##### **`.iloc` uses Python-like slice (end excluded):**

In [21]:
df.iloc[1: 4] #rows with positions 1,2,3,not 4

|   | age | city | salary |
|---|-----|------|--------|
| b | 30 | Mumbai | 60000 |
| c | 22 | Kolkata | 42000 |
| d | 40 | Chennai | 80000 |


In [22]:
df.iloc[:, 0:2] #cols positions 0  and 1

|   | age | city |
|---|-----|------|
| a | 25 | Delhi |
| b | 30 | Mumbai |
| c | 22 | Kolkata |
| d | 40 | Chennai |
| e | 28 | Bengaluru |


- **Tricky case: if your index is integer-based (e.g., 0,1,2,3), `.loc[1:3]` will select labels 1..3 inclusive, while `.iloc[1:3]` will select positions 1 and 2. This is a common source of confusion.**
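
The tricky case can be reproduced with a frame that uses the default integer index. This is a small sketch; `df_int` and `shuffled` are names invented for the illustration.

```python
import pandas as pd

df_int = pd.DataFrame({'val': [10, 20, 30, 40, 50]})  # default index 0..4

by_label = df_int.loc[1:3]    # labels 1, 2, 3 (end included)  -> 3 rows
by_pos = df_int.iloc[1:3]     # positions 1, 2 (end excluded)  -> 2 rows

# After sorting, labels travel with their rows, so label 0 and position 0 differ.
shuffled = df_int.sort_values('val', ascending=False)
label_zero = shuffled.loc[0, 'val']   # row labelled 0 (the original first row)
pos_zero = shuffled.iloc[0, 0]        # whatever row is now first
```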

----

## **7) Assignment and SettingWithCopyWarning: the most important gotcha**

**When you assign to an object that may be a copy of part of a DataFrame rather than the DataFrame itself, pandas may warn:**

```text
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer, col_indexer] = value instead
```

#### **Causes**

- **Chained indexing like `df[df['age']>25]['salary'] = 0` can produce the warning because `df[df['age']>25]` may return a copy; the assignment then lands on that copy and the original is not changed.**

#### **Correct ways**

**Use `.loc` for assignment:**


In [24]:
df['salary'] = df['salary'].astype(float)  # cast first: the 10% raise produces floats, not int64
df.loc[df['age'] > 25, 'salary'] = df['salary'] * 1.1

**Or explicitly copy before modifying:**

In [25]:
subset = df[df['age'] > 25].copy()

subset['salary'] = subset['salary'] * 1.1

#### 🌟 **Avoid chained indexing. Always prefer one expression with `.loc`.**

### **Why `.loc` helps**

**A single `.loc[rows, cols] = value` is one set operation on the original DataFrame, so pandas knows you intend to modify the original, not a temporary object.**
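
The difference can be observed directly. The sketch below uses a three-row stand-in frame and silences the warning only so the demonstration stays quiet; `unchanged` and `changed` are illustrative names.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'age': [25, 30, 22], 'salary': [50000, 60000, 42000]})

# Chained indexing: the boolean filter builds a new object, and the
# assignment lands on that temporary object instead of df.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # hide the warning for this demo only
    df[df['age'] > 24]['salary'] = 0
unchanged = df['salary'].tolist()

# Single .loc call: one assignment applied to df itself.
df.loc[df['age'] > 24, 'salary'] = 0
changed = df['salary'].tolist()
```

After the chained attempt the salaries are untouched; after the `.loc` assignment the two matching rows are zeroed.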

## **8) .at and .iat: optimized single-element access**

- `.at[label_row, label_col]`: fast label-based scalar access. Faster than `.loc[row, col]` for scalars.

- `.iat[row_pos, col_pos]`: fast integer-position scalar access.

Use when reading or writing single cells many times (but vectorized operations are preferred).

In [26]:
val = df.at['b', 'salary']

val

np.float64(72600.0)

In [28]:
df.at['b', 'salary'] = 65000
df.loc['b','salary']

np.float64(65000.0)

In [30]:
val2 = df.iat[1, 2]

val2

np.float64(65000.0)

## **9) MultiIndex (hierarchical) with .loc and .iloc**

- **For MultiIndex rows or columns, .loc supports tuple labels:**

In [33]:
# MultiIndex example for rows

idx = pd.MultiIndex.from_tuples(
    [('A', 'x'), ('A','y'), ('B', 'x')],
    names = ['group', 'subgroup']
    
    )

data = {
    'value' : [10, 20, 30],
    'value2' : [100, 200, 300]
}

# build the DataFrame on the MultiIndex
df_m = pd.DataFrame(data, index = idx)

df_m.loc[('A', 'x')]  # selects a single MultiIndex row (returns a Series)
df_m.loc['A']  # selects all rows in group 'A' (partial indexing)

| subgroup | value | value2 |
|----------|-------|--------|
| x | 10 | 100 |
| y | 20 | 200 |


- `.iloc` still indexes integer positions (works the same).

- For MultiIndex columns, use tuples in `.loc[:, ('colA','subcol1')]`.

Partial indexing with `.loc` on MultiIndex is powerful: `df.loc['A']` returns all rows whose first level is `'A'`.
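
For the MultiIndex-columns case mentioned above, a minimal sketch might look like the following. The frame `df_mc` and its labels (`colA`, `colB`, `subcol1`, `subcol2`) are hypothetical, chosen to match the placeholder names used in the bullet.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [('colA', 'subcol1'), ('colA', 'subcol2'), ('colB', 'subcol1')]
)
df_mc = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['r1', 'r2'], columns=cols)

one_col = df_mc.loc[:, ('colA', 'subcol1')]   # full tuple -> a single column (Series)
group = df_mc.loc[:, 'colA']                  # first level only -> all 'colA' sub-columns
by_pos = df_mc.iloc[:, 0]                     # .iloc ignores the levels entirely
```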

## **10) Date/time partial string indexing with .loc**

**If index is DatetimeIndex:**

In [35]:
dates = pd.date_range('2020-01-01', periods = 6, freq = 'D')
df_time = pd.DataFrame(
    {'val': range(6)}, index = dates
)

df_time.loc['2020-01-03'] #single day


val    2
Name: 2020-01-03 00:00:00, dtype: int64

In [36]:
df_time.loc['2020-01'] #whole month (partial string)


|   | val |
|---|-----|
| 2020-01-01 | 0 |
| 2020-01-02 | 1 |
| 2020-01-03 | 2 |
| 2020-01-04 | 3 |
| 2020-01-05 | 4 |
| 2020-01-06 | 5 |


In [37]:
df_time.loc['2020-01-02' : '2020-01-04'] #inclusive

|   | val |
|---|-----|
| 2020-01-02 | 1 |
| 2020-01-03 | 2 |
| 2020-01-04 | 3 |


## **11) View vs Copy: pragmatic rules**

- There is no guaranteed rule whether `.loc`/`.iloc` returns a view or a copy in all cases; it depends on internal memory layout and operations.

- **Practical advice:**

    - If you plan to modify the result and want to modify original data, use `.loc[...] = ...` on the original DataFrame (not chained).

    - If you need an independent object to modify, call `.copy()` explicitly.

    - For slicing rows by position, `df.iloc[2:5]` often returns a view but may be a copy; treat it as possibly a copy.

    - Use `.loc` assignment to modify original safely:
```python
    df.loc[mask, 'col'] = new_values
```
- When in doubt, use `df_subset = df.loc[...].copy()`: explicit and safe.

## **12) Performance considerations**

- `.loc` uses label lookup; cost depends on index type (hash lookup for labels vs positional).

- `.iloc` uses integer-based selection; slightly faster for pure positional access.

- `.at/.iat` are fastest for single-element access.

- Bulk operations (vectorized assignments, boolean mask assignments) are fast. Avoid Python loops with `.loc` per-row assignments.

- Reindexing/index alignment operations (e.g., `df.reindex`) are more expensive than direct `.loc` selection.
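
To make the "avoid per-row loops" advice concrete, here is a sketch that performs the same update twice, once row by row and once with a single mask assignment. No timings are claimed; the point is only that both produce the same frame, and the second does it in one call. The frame and the names `looped` and `vectorized` are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [25, 30, 22, 40], 'salary': [50000.0, 60000.0, 42000.0, 80000.0]}
)

# Row by row: one label lookup and one write per row.
looped = df.copy()
for label in looped.index:
    if looped.loc[label, 'age'] > 25:
        looped.loc[label, 'salary'] = looped.loc[label, 'salary'] * 1.1

# Vectorized: one boolean mask, one assignment.
vectorized = df.copy()
mask = vectorized['age'] > 25
vectorized.loc[mask, 'salary'] = vectorized.loc[mask, 'salary'] * 1.1
```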

## **13) Practical Data-Science examples & recipes**

**(A) Feature selection**

In [38]:
feature_cols = ['age', 'salary']
x = df.loc[:, feature_cols]
y = df.loc[:, 'city']


**(B) Train/test split preserving index**

If you want to use indices to join predictions back:

In [None]:
# Assumes df has a datetime 'date' column (the sample df above does not have one)
train = df.loc[df['date'] < '2020-01-01']
test = df.loc[df['date'] >= '2020-01-01']
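
Since the sample `df` has no `date` column, the split above cannot be run as written. The sketch below uses a hypothetical `events` frame to show the idea, including how the preserved index lets predictions be joined back. The prediction values are placeholders, not model output.

```python
import pandas as pd

# Hypothetical data with a 'date' column
events = pd.DataFrame(
    {
        'date': pd.to_datetime(['2019-11-15', '2019-12-20', '2020-01-05', '2020-02-10']),
        'amount': [10, 20, 30, 40],
    },
    index=['r1', 'r2', 'r3', 'r4'],
)

split_date = pd.Timestamp('2020-01-01')
train = events.loc[events['date'] < split_date]
test = events.loc[events['date'] >= split_date]

# Placeholder predictions that carry the test rows' index labels
preds = pd.Series([0.3, 0.7], index=test.index, name='pred')

# The join aligns on index, so each prediction lands on its original row;
# rows that were not in the test set get NaN.
events_with_pred = events.join(preds)
```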

**(C) Imputation on subset of rows**

In [42]:
df.loc[df['age'].isna(), 'age'] = df['age'].median()

**(D) One-hot encoding selected columns**

In [48]:
cols_to_encode = ['city']
dummies = pd.get_dummies(df.loc[:, cols_to_encode], drop_first = True)

**(E) Group-wise transformations**

In [49]:
df['salary_norm'] = df['salary'] / df.groupby('city')['salary'].transform('mean')

In [51]:
df[['salary', 'salary_norm']]

|   | salary | salary_norm |
|---|--------|-------------|
| a | 50000.0 | 1.0 |
| b | 65000.0 | 1.0 |
| c | 42000.0 | 1.0 |
| d | 96800.0 | 1.0 |
| e | 66550.0 | 1.0 |


**(F) Replace values in-place safely**

In [56]:
df.loc[df['salary'] < 0, 'salary'] = np.nan

**(G) Reorder columns before ML model**

In [58]:
col_order = ['salary', 'age', 'city']

x = df.loc[:, col_order]

x

|   | salary | age | city |
|---|--------|-----|------|
| a | 50000.0 | 25 | Delhi |
| b | 65000.0 | 30 | Mumbai |
| c | 42000.0 | 22 | Kolkata |
| d | 96800.0 | 40 | Chennai |
| e | 66550.0 | 28 | Bengaluru |


**(H) Subset for visualization**

In [59]:
subset = df.loc[df['age'].between(25, 35), ['age', 'salary']]
subset

|   | age | salary |
|---|-----|--------|
| a | 25 | 50000.0 |
| b | 30 | 65000.0 |
| e | 28 | 66550.0 |


## **14) Advanced patterns & tips**

`.loc` **with callable**

`.loc` accepts a callable that receives the DataFrame and returns a valid indexer (here, a boolean mask):

In [60]:
df.loc[lambda d: d['age'] > 25, :]

|   | age | city | salary | salary_norm |
|---|-----|------|--------|-------------|
| b | 30 | Mumbai | 65000.0 | 1.0 |
| d | 40 | Chennai | 96800.0 | 1.0 |
| e | 28 | Bengaluru | 66550.0 | 1.0 |


#### **Use `.query()` for readable filter expressions**

In [62]:
df.query('age > 25 and salary > 50000')

# Equivalent to df.loc[(df['age']>25) & (df['salary']>50000)]

|   | age | city | salary | salary_norm |
|---|-----|------|--------|-------------|
| b | 30 | Mumbai | 65000.0 | 1.0 |
| d | 40 | Chennai | 96800.0 | 1.0 |
| e | 28 | Bengaluru | 66550.0 | 1.0 |


**`.query()` is often more readable for long boolean conditions and can be faster on large DataFrames; beware of column names with spaces or Python keywords (use backticks).**
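
A small sketch of the backtick rule for a column name that contains a space. The `people` frame and its `monthly pay` column are hypothetical; the sample `df` has no such column.

```python
import pandas as pd

people = pd.DataFrame(
    {'age': [25, 30, 40], 'monthly pay': [50000, 60000, 80000]}
)

# Backticks let query() refer to a column whose name is not a valid identifier.
high = people.query('age > 25 and `monthly pay` > 55000')

# The same filter written with .loc, for comparison.
same = people.loc[(people['age'] > 25) & (people['monthly pay'] > 55000)]
```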

#### **Reindexing columns to ensure column order and presence**

In [63]:
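desired = ['age', 'salary', 'experience']
# reindex adds the missing 'experience' column as NaN and drops columns not listed;
# df.loc[:, desired] on its own would raise KeyError for the missing label
df = df.reindex(columns = desired)

df.loc[:, desired]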

|   | age | salary | experience |
|---|-----|--------|------------|
| a | 25 | 50000.0 | NaN |
| b | 30 | 65000.0 | NaN |
| c | 22 | 42000.0 | NaN |
| d | 40 | 96800.0 | NaN |
| e | 28 | 66550.0 | NaN |


##### **When index is integer labels (0..n-1), be explicit**

If your index is `0..n-1` and you want position-based access use `.iloc` to avoid confusion between label and position.

## **15) Common mistakes & how to avoid them**

#### 1. **Chained indexing:**

- Bad: `df[df['age']>25]['salary'] = 0`

- Good: `df.loc[df['age']>25, 'salary'] = 0`

#### 2. **Confusing .loc and .iloc with integer indexes:**

- If index labels are integers, `.loc[2]` selects label `2`, `.iloc[2]` selects the third row by position.

#### 3. **Assuming slice behavior same:**

- `.loc[1:3]` includes `3`; `.iloc[1:3]` excludes `3`.

#### 4. **Expecting view/copy behavior:**

- Explicitly `.copy()` when you need a separate object.

## **16) Quick practical examples**

```python
# Select rows 'b' to 'd' and columns 'age' & 'salary'
df.loc['b':'d', ['age','salary']]

# Select first 3 rows and last column by position
df.iloc[:3, -1]

# Set salary to 0 for ages < 23
df.loc[df['age'] < 23, 'salary'] = 0

# Add a new column using vectorized op
df.loc[:, 'salary_plus_tax'] = df['salary'] * 1.05

# Use .at to set single value
df.at['c', 'salary'] = 45000

# MultiIndex selection example (row level)
# df_m.loc[('A', 'x'), 'value'] or df_m.loc['A'] for partial selection
```

## **17) Best practices / Cheat-sheet ✅**

- Use `.loc` for label-based work (most common in practice).

- Use `.iloc` for position-based indexing (e.g., if you loop over columns by position).

- Use `.at / .iat` for single scalar access (faster).

- **Avoid chained indexing**: use `.loc[...] = ...` for assignments to the original DataFrame.

- **Explicitly** `.copy()` when you need an independent object.

- Use **boolean masks** for filtering; wrap conditions with parentheses.

- Use `.query()` for complex filters (readability).

- For time-series indexing, partial-string indexing with `.loc` on a `DatetimeIndex` is concise (e.g. `df_time.loc['2020-01']`).

- If index is integer-based and you want positions, use `.iloc` to avoid ambiguity.

- Prefer vectorized operations over row-wise loops.

## **18) Short troubleshooting checklist**

If something unexpected happens:

- Did you use label vs position accidentally? (loc vs iloc)

- Is your index integer-based? Could that cause confusion?

- Are you seeing `SettingWithCopyWarning`? Use `.loc` assignments or `.copy()`.

- Did a slice unexpectedly include/exclude an endpoint? Check inclusive (`.loc`) vs exclusive (`.iloc`).

## **19) Extra: converting examples into ML pipeline steps**

1. Select features: `X = df.loc[:, feature_cols]`

2. Train/test split by time or strata: `train = df.loc[df['date'] < split_date]`

3. Impute missing: `df.loc[df['col'].isna(), 'col'] = ...`

4. Normalize columns in-place: `df.loc[:, num_cols] = scaler.fit_transform(df.loc[:, num_cols])`

5. One-hot encode specific columns and concat: `df = pd.concat([df, pd.get_dummies(df.loc[:, cat_cols])], axis=1)`
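
Strung together, these steps might look like the sketch below. It is pandas-only: a plain z-score stands in for the `scaler` in step 4 (an assumption, not the fitted scaler the step refers to), the time-based split of step 2 is omitted because the data has no date column, and the `raw` frame, `df_ml`, and `feature_cols` are invented for the illustration.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame(
    {
        'age': [25.0, np.nan, 22.0, 40.0],
        'salary': [50000.0, 60000.0, 42000.0, 80000.0],
        'city': ['Delhi', 'Mumbai', 'Delhi', 'Chennai'],
    }
)
num_cols = ['age', 'salary']
cat_cols = ['city']
df_ml = raw.copy()   # work on an explicit copy

# Step 3: impute missing ages with the median
df_ml.loc[df_ml['age'].isna(), 'age'] = df_ml['age'].median()

# Step 4: standardize numeric columns (z-score used in place of a scaler object)
nums = df_ml.loc[:, num_cols]
df_ml.loc[:, num_cols] = (nums - nums.mean()) / nums.std()

# Step 5: one-hot encode and concatenate
dummies = pd.get_dummies(df_ml.loc[:, cat_cols])
df_ml = pd.concat([df_ml, dummies], axis=1)

# Step 1: select the final feature matrix by label
feature_cols = num_cols + list(dummies.columns)
X = df_ml.loc[:, feature_cols]
```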

# **🐼 Pandas `.loc` and `.iloc`: View vs Copy Explained**

The **pandas** `.loc` and `.iloc` indexers can return either a **view** or a **copy** of a subset of the original DataFrame, depending on the operation and context.  
Let's break it down 👇

---

### 🔍 Selection (Retrieval)

When you use `.loc` or `.iloc` **purely for selecting data** (without assigning new values):

- Slices **may return a view** of the original DataFrame; selections made with lists, arrays, or boolean masks return a **copy**.  
- ⚠️ This means that if you modify the returned object, you *might* also modify the original DataFrame.  
- However, this **is not guaranteed**: pandas may return a **copy** instead (for performance or internal optimization reasons).

---

### ✍️ Assignment (Setting Values)

When you use `.loc` or `.iloc` **to assign new values** to a subset of the DataFrame:

- They **generally operate directly on the original DataFrame**.  
- ✅ This means the changes you make are **reflected in the original DataFrame**, not in a separate copy.  
- This distinction is **crucial for in-place modifications**.

---

### ⚠️ Important Note: `SettingWithCopyWarning`

Pandas issues a **`SettingWithCopyWarning`** when it suspects a **"chained assignment"**, meaning the operation might **not be affecting the original DataFrame** as you intended.

💡 This warning is a **reminder** to:

- Ensure you're **explicitly working on a copy** if that's your goal.  
- Or, better yet, use a **single `.loc` or `.iloc` operation** for assignment to avoid confusion.

---

### 🧭 In Summary

| Operation Type | Behavior | Effect on Original DataFrame |
|----------------|-----------|------------------------------|
| **Selection** (Retrieval) | Often returns a **view**, but sometimes a **copy** | Changes *may* or *may not* affect the original |
| **Assignment** (Setting Values) | Operates **directly** on the original DataFrame | Changes are **reflected** in the original |

---

### 🧩 Key Takeaway

> `.loc[...]` and `.iloc[...]` are primarily used for **accessing** and **manipulating** data within an existing DataFrame.  
> When used for **selection**, they may provide a view or a copy.  
> When used for **assignment**, they typically **modify the original DataFrame directly**.
