# Data Selection Techniques in Pandas: .loc[], .iloc[], and [] Bracket Notation

### What Is Data Selection in Pandas?

Once we’ve loaded and organized our data in a DataFrame, the next most important task is **selecting specific rows, columns, or values** to explore, clean, analyze, or manipulate. Pandas gives us multiple tools for this — the most powerful and commonly used being:

- `.loc[]` — Label-based selection
- `.iloc[]` — Integer-based selection
- `[]` — Bracket notation (quick access, often shorthand)

Understanding when and how to use each one is essential for mastering data analysis. Each method has its strengths and can be used depending on whether we want to select by **row label**, **row position**, or just **column name**.

Let’s explore all of them step-by-step with real examples using the Titanic dataset.

### `[]` Bracket Notation (Quick Column Access)

Bracket notation is the simplest and most common way to access **columns**: pass one column name as a string to get a single column, or a list of names to get several.

In [1]:
import pandas as pd
df = pd.read_csv("data/train.csv")

# Accessing a single column
df['Name']

# Accessing multiple columns
df[['Name', 'Age']]

| | Name | Age |
| --- | --- | --- |
| 0 | Braund, Mr. Owen Harris | 22.0 |
| 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 38.0 |
| 2 | Heikkinen, Miss. Laina | 26.0 |
| 3 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 |
| 4 | Allen, Mr. William Henry | 35.0 |
| ... | ... | ... |
| 886 | Montvila, Rev. Juozas | 27.0 |
| 887 | Graham, Miss. Margaret Edith | 19.0 |
| 888 | Johnston, Miss. Catherine Helen "Carrie" | NaN |
| 889 | Behr, Mr. Karl Howell | 26.0 |


### Limitations:

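- A single label or a list of labels selects **columns** only; there is no way to pick one row by its label with plain brackets.
- `df[0]` raises a `KeyError` unless a column is actually named `0`, because the integer is treated as a column label.
- Slices (`df[0:5]`) and boolean masks (`df[df['Age'] > 60]`) are the exceptions: inside `[]` they select **rows**. This is easy to confuse with column access, so `.loc[]` and `.iloc[]` are usually the clearer choice for rows.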
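
The behaviour of plain brackets can be checked on a small frame. The sketch below uses hypothetical toy data (a few shortened names, not the real `train.csv`) and assumes the default integer index:

```python
import pandas as pd

# Hypothetical toy data standing in for a few Titanic rows
df = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
    "Sex": ["male", "female", "female", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# A bare integer is looked up as a column label, so this lookup fails
try:
    df[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# A slice inside [] selects rows by position (end excluded)
first_two = df[0:2]

# A boolean Series inside [] keeps the rows where it is True
females = df[df["Sex"] == "female"]

print(lookup_failed)
print(first_two)
print(females)
```

The same brackets therefore mean "column" for labels and "row" for slices and masks, which is why the explicit indexers are often easier to read.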

### `.loc[]`: Label-Based Selection

The `.loc[]` method is used to select rows and columns **by labels** (i.e., index values or column names).

**Syntax:**

```python
df.loc[row_label, column_label]
```

**Example:**

In [2]:
# Select a single row by index label
print(df.loc[0])

# Select multiple rows by index labels
print(df.loc[[0, 1, 2]])

# Select specific rows and columns
print(df.loc[0:4, ['Name', 'Sex', 'Age']])

# Select all rows but only certain columns
print(df.loc[:, ['Survived', 'Fare']])

PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                               22.0
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0 

### Advantages:

- Works well with **custom indexes**
- Very intuitive and readable when selecting **named rows and columns**
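
As a rough illustration of the custom-index case, the sketch below builds a hypothetical three-row frame whose row labels are ticket codes (an assumption made for the example; the tutorial's `df` keeps the default integer index):

```python
import pandas as pd

# Hypothetical toy data; ticket codes are used as row labels
df = pd.DataFrame(
    {
        "Name": ["Braund", "Cumings", "Heikkinen"],
        "Age": [22.0, 38.0, 26.0],
        "Fare": [7.25, 71.2833, 7.925],
    },
    index=["A/5 21171", "PC 17599", "STON/O2. 3101282"],
)

# Single cell, addressed by row label and column label
cumings_fare = df.loc["PC 17599", "Fare"]

# Label slices work on both axes and include the end label
subset = df.loc["A/5 21171":"PC 17599", "Name":"Age"]

print(cumings_fare)
print(subset)
```

Because the selection names the row and the column, it keeps working even if the rows are later reordered.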

### `.iloc[]`: Integer-Based Selection

The `.iloc[]` method is used to select **rows and columns by integer position**.

**Syntax:**

```python
df.iloc[row_index, column_index]
```

**Example:**

In [3]:
# First row
print(df.iloc[0])

# First 5 rows
print(df.iloc[0:5])

# First 5 rows, first 3 columns
print(df.iloc[0:5, 0:3])

# Single value at row position 10, column position 5 (the 11th row, 6th column: Age)
print(df.iloc[10, 5])

PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                               22.0
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female 

### Use `.iloc[]` when:

- You want positional slicing (just like arrays)
- You're not sure what the index or column labels are
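
A minimal sketch of both situations, assuming a hypothetical toy frame whose row labels are non-contiguous (as they would be after a filter):

```python
import pandas as pd

# Hypothetical toy data with gaps in the row labels, as left behind by a filter
df = pd.DataFrame(
    {
        "Name": ["Cumings", "Heikkinen", "Futrelle", "Johnson"],
        "Age": [38.0, 26.0, 35.0, 27.0],
    },
    index=[1, 2, 3, 8],
)

first_row = df.iloc[0]          # first row, whatever its label is
last_two = df.iloc[-2:]         # negative positions count from the end
last_col = df.iloc[:, -1]       # last column without knowing its name
picked = df.iloc[[0, 3], [0]]   # lists of positions work on both axes

print(first_row)
print(last_two)
print(last_col)
print(picked)
```

None of these selections needs to know that the labels are 1, 2, 3 and 8 or that the last column is called `Age`.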

### Difference Between `.loc[]` and `.iloc[]`

| Feature | `.loc[]` | `.iloc[]` |
| --- | --- | --- |
| Selection type | Label-based | Position-based |
| Index used | Custom or default index labels | Integer positions (0-based) |
| Slice endpoints | Start and end labels are both **included** | End position is **excluded** |
| Best use case | Readable, works well with custom IDs | Slicing by position, regardless of labels |
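
With the default index, labels and positions are the same numbers, which hides most of these differences. The sketch below uses a hypothetical frame whose labels (10 to 50) deliberately differ from the positions (0 to 4); the fare values are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data; labels 10..50 deliberately differ from positions 0..4
df = pd.DataFrame(
    {"Fare": [7.25, 71.28, 7.93, 53.10, 8.05]},
    index=[10, 20, 30, 40, 50],
)

by_label = df.loc[10:30]      # labels 10, 20, 30 (end label included)
by_position = df.iloc[0:3]    # positions 0, 1, 2 (end position excluded)
same_rows = by_label.equals(by_position)

# The same numbers mean different things to each indexer
one_row = df.iloc[1:3]        # positions 1 and 2, i.e. labels 20 and 30
no_rows = df.loc[1:3]         # no labels fall between 1 and 3, so nothing matches

print(same_rows)
print(one_row)
print(len(no_rows))
```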

### Real-World Examples (Using Titanic)

1. Selecting all **female passengers**:

In [4]:
df_female = df.loc[df['Sex'] == 'female']
print(df_female.head())

   PassengerId  Survived  Pclass  \
1            2         1       1   
2            3         1       3   
3            4         1       1   
8            9         1       3   
9           10         1       2   

                                                Name     Sex   Age  SibSp  \
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
8  Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  female  27.0      0   
9                Nasser, Mrs. Nicholas (Adele Achem)  female  14.0      1   

   Parch            Ticket     Fare Cabin Embarked  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
8      2            347742  11.1333   NaN        S  
9      0            237736  30.0708   NaN        C  


2. Selecting rows **with age above 60**, but only show Name, Age, Fare:

In [5]:
df_senior = df.loc[df['Age'] > 60, ['Name', 'Age', 'Fare']]
print(df_senior)

                                          Name   Age      Fare
33                       Wheadon, Mr. Edward H  66.0   10.5000
54              Ostby, Mr. Engelhart Cornelius  65.0   61.9792
96                   Goldschmidt, Mr. George B  71.0   34.6542
116                       Connors, Mr. Patrick  70.5    7.7500
170                  Van der hoef, Mr. Wyckoff  61.0   33.5000
252                  Stead, Mr. William Thomas  62.0   26.5500
275          Andrews, Miss. Kornelia Theodosia  63.0   77.9583
280                           Duane, Mr. Frank  65.0    7.7500
326                  Nysveen, Mr. Johan Hansen  61.0    6.2375
438                          Fortune, Mr. Mark  64.0  263.0000
456                  Millet, Mr. Francis Davis  65.0   26.5500
483                     Turkula, Mrs. (Hedwig)  63.0    9.5875
493                    Artagaveytia, Mr. Ramon  71.0   49.5042
545               Nicholson, Mr. Arthur Ernest  64.0   26.0000
555                         Wright, Mr. George  62.0   
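
Examples 1 and 2 each filter on a single condition. One possible way to combine conditions is sketched below on a hypothetical four-row frame (names shortened, values picked for illustration); `&` stands for "and", `|` for "or":

```python
import pandas as pd

# Hypothetical toy data with a few Titanic-like columns
df = pd.DataFrame({
    "Name": ["Wheadon", "Andrews", "Turkula", "Nasser"],
    "Sex": ["male", "female", "female", "female"],
    "Age": [66.0, 63.0, 63.0, 14.0],
    "Fare": [10.5, 77.9583, 9.5875, 30.0708],
})

# Each comparison yields a boolean Series; & means "and", | means "or".
# The parentheses are required because & and | bind tighter than == and >.
senior_women = df.loc[(df["Sex"] == "female") & (df["Age"] > 60), ["Name", "Age"]]
young_or_pricey = df.loc[(df["Age"] < 18) | (df["Fare"] > 50), ["Name", "Fare"]]

print(senior_women)
print(young_or_pricey)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the operators are used here.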

3. Selecting **first 10 rows and first 4 columns** using `.iloc[]`:

In [6]:
print(df.iloc[0:10, 0:4])

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   
5            6         0       3   
6            7         0       1   
7            8         0       3   
8            9         1       3   
9           10         1       2   

                                                Name  
0                            Braund, Mr. Owen Harris  
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  
2                             Heikkinen, Miss. Laina  
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  
4                           Allen, Mr. William Henry  
5                                   Moran, Mr. James  
6                            McCarthy, Mr. Timothy J  
7                     Palsson, Master. Gosta Leonard  
8  Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  
9                Nasser, Mrs. Nicholas (Adele Achem) 

### Exercises

Q1. Select and display only the 'Name', 'Age', and 'Sex' columns for the first 7 rows using `.loc[]`.

In [7]:
import pandas as pd

df = pd.read_csv("data/train.csv")

selected_data = df.loc[0:6, ["Name", "Age", "Sex"]]
print(selected_data)

                                                Name   Age     Sex
0                            Braund, Mr. Owen Harris  22.0    male
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0  female
2                             Heikkinen, Miss. Laina  26.0  female
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  35.0  female
4                           Allen, Mr. William Henry  35.0    male
5                                   Moran, Mr. James   NaN    male
6                            McCarthy, Mr. Timothy J  54.0    male


Q2. Use `.iloc[]` to get the rows at positions 10 through 20 and the columns at positions 0 through 5 (both ranges inclusive).

In [8]:
subset = df.iloc[10:21, 0:6]
print(subset)

    PassengerId  Survived  Pclass  \
10           11         1       3   
11           12         1       1   
12           13         0       3   
13           14         0       3   
14           15         0       3   
15           16         1       2   
16           17         0       3   
17           18         1       2   
18           19         0       3   
19           20         1       3   
20           21         0       2   

                                                 Name     Sex   Age  
10                    Sandstrom, Miss. Marguerite Rut  female   4.0  
11                           Bonnell, Miss. Elizabeth  female  58.0  
12                     Saundercock, Mr. William Henry    male  20.0  
13                        Andersson, Mr. Anders Johan    male  39.0  
14               Vestrom, Miss. Hulda Amanda Adolfina  female  14.0  
15                   Hewlett, Mrs. (Mary D Kingcome)   female  55.0  
16                               Rice, Master. Eugene    male   2

Q3. Select all rows where 'Fare' is greater than 100 and display 'Name', 'Fare', and 'Pclass'.

In [9]:
high_fare_passengers = df.loc[df["Fare"] > 100, ["Name", "Fare", "Pclass"]]
print(high_fare_passengers)

                                                  Name      Fare  Pclass
27                      Fortune, Mr. Charles Alexander  263.0000       1
31      Spencer, Mrs. William Augustus (Marie Eugenie)  146.5208       1
88                          Fortune, Miss. Mabel Helen  263.0000       1
118                           Baxter, Mr. Quigg Edmond  247.5208       1
195                               Lurette, Miss. Elise  146.5208       1
215                            Newell, Miss. Madeleine  113.2750       1
258                                   Ward, Miss. Anna  512.3292       1
268      Graham, Mrs. William Thompson (Edith Junkins)  153.4625       1
269                             Bissette, Miss. Amelia  135.6333       1
297                       Allison, Miss. Helen Loraine  151.5500       1
299    Baxter, Mrs. James (Helene DeLaudeniere Chaput)  247.5208       1
305                     Allison, Master. Hudson Trevor  151.5500       1
306                            Fleming, Miss. Marga

Q4. Use `[]` bracket notation to select and display the 'Survived' column.

In [10]:
survived_column = df["Survived"]
print(survived_column.head())

0    0
1    1
2    1
3    1
4    0
Name: Survived, dtype: int64


Q5. What’s the difference in the result between `df.loc[0:5]` and `df.iloc[0:5]`?

**Answer:**

- `df.loc[0:5]` selects **rows with labels from 0 to 5, inclusive**. It includes **both** the start and end labels, so with the default index it returns 6 rows.
- `df.iloc[0:5]` selects **rows by integer position from 0 up to but not including 5**. It includes **only** positions 0 to 4, so it returns 5 rows.

In [11]:
print(df.loc[0:5])
print(df.iloc[0:5])

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   
5            6         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   
5                                   Moran, Mr. James    male   NaN      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN       

### Summary

Data selection is at the core of working with any dataset. Whether we're analyzing data, applying filters, or preparing it for machine learning, being able to select specific rows and columns is absolutely essential. In Pandas, we’re provided with multiple tools for this: bracket notation (`[]`), `.loc[]`, and `.iloc[]`.

The simplest way to access columns is by using the bracket notation like `df['Name']`, which quickly returns a single column as a Series. We can even select multiple columns by wrapping them in a list. With a label or a list of labels, this notation only reaches columns; slices and boolean masks inside `[]` do select rows, but `.loc[]` and `.iloc[]` are the clearer tools for row selection.
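
The difference between passing a string and passing a list shows up in the type of the result. A small sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"Name": ["Braund", "Cumings"], "Age": [22.0, 38.0]})

single = df["Name"]           # a string gives back a Series
as_frame = df[["Name"]]       # a one-item list gives back a DataFrame
pair = df[["Name", "Age"]]    # a longer list gives back a DataFrame too

print(type(single).__name__)
print(type(as_frame).__name__)
print(pair.shape)
```

This matters when later code expects a DataFrame, for example a function that takes a table of features rather than a single column.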

To access both rows and columns with full flexibility, we use `.loc[]` and `.iloc[]`. The `.loc[]` method allows label-based selection, meaning we select rows and columns by their index names or column names. This is particularly useful when we’ve set a custom index (such as `PassengerId`, which is unique for every passenger). It’s intuitive and powerful for real-world datasets.

On the other hand, `.iloc[]` is position-based, like slicing in a Python list. It’s great for numerical slicing and when we don’t know the exact labels. A key thing to remember is that `.loc[]` includes the end index in a range, while `.iloc[]` excludes it (just like Python slicing).

By mastering these techniques, we can navigate, filter, and manipulate datasets with confidence. In AI/ML projects, selecting the right data — whether it’s features or training subsets — is often the first step toward building smarter systems.