# 📊 Data Selection & Filtering in Pandas

Selecting the right rows and columns is the first step in analyzing any dataset. Pandas gives you several powerful ways to do this.

---

## 🔹 Selecting Rows & Columns

### Selecting Columns

```python
df["column_name"]        # Single column (as Series)
df[["col1", "col2"]]     # Multiple columns (as DataFrame)
```
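
As a small illustration of the difference between the two forms, the sketch below uses a made-up two-column frame (the names `people`, `Name`, and `Age` are assumptions for the example, not part of any real dataset). Single brackets give back a Series, while a list of names gives back a DataFrame, even when the list holds only one column.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [31, 24]})

as_series = people["Age"]      # single brackets -> Series
as_frame = people[["Age"]]     # list of names   -> DataFrame with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```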

### Selecting Rows by Index

Use `.loc[]` (label-based) and `.iloc[]` (position-based):

```python
df.loc[0]                # Row whose index label is 0
df.iloc[0]               # First row (by position)
```
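
With the default index (0, 1, 2, ...) labels and positions happen to coincide, which can hide the difference. The sketch below, built on made-up frames named `scores` and `shuffled`, shows the two accessors diverging once the index is not the default one.

```python
import pandas as pd

# Hypothetical frame with string labels instead of the default 0, 1, 2, ...
scores = pd.DataFrame({"Score": [90, 75, 60]}, index=["a", "b", "c"])

by_label = scores.loc["b"]      # row whose label is "b"
by_position = scores.iloc[1]    # second row, whatever its label is

print(by_label["Score"], by_position["Score"])  # 75 75

# With a shuffled integer index, label 0 and position 0 are different rows
shuffled = pd.DataFrame({"Score": [90, 75, 60]}, index=[2, 0, 1])
print(shuffled.loc[0, "Score"])   # 75 (row labelled 0)
print(shuffled.iloc[0, 0])        # 90 (row in first position)
```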

### Selecting Specific Rows and Columns

```python
df.loc[0, "Name"]        # Value at row label 0, column 'Name'
df.iloc[0, 1]            # Value at row position 0, column position 1
```

You can also **slice**:

```python
df.loc[0:2, ["Name", "Age"]]   # Row labels 0 through 2 (end included), selected columns
df.iloc[0:2, 0:2]              # Row and column positions 0 and 1 (end excluded)
```
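
The two slices above do not return the same number of rows: `.loc` includes the end label, while `.iloc` follows the usual Python rule and excludes the end position. A minimal check on a made-up four-row frame (the `people` data is an assumption for the example):

```python
import pandas as pd

# Hypothetical sample data with the default integer index
people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Cara", "Dan"], "Age": [31, 24, 45, 38]}
)

label_slice = people.loc[0:2, ["Name", "Age"]]   # labels 0, 1 and 2
position_slice = people.iloc[0:2, 0:2]           # positions 0 and 1 only

print(len(label_slice))     # 3
print(len(position_slice))  # 2
```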

---

## ⚡ Fast Access: `.at` and `.iat`

These are optimized for **single element access**:

```python
df.at[0, "Name"]       # Fast label-based access
df.iat[0, 1]           # Fast position-based access
```
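
A short sketch of how these accessors relate to `.loc` and `.iloc`, using a made-up `people` frame (an assumption for the example). For a single cell they return the same scalar, and they can also be used to assign one value.

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [31, 24]})

# Reading one cell: .at matches .loc, .iat matches .iloc
print(people.at[0, "Name"], people.loc[0, "Name"])  # Alice Alice
print(people.iat[0, 1], people.iloc[0, 1])          # 31 31

# Writing one cell by label
people.at[1, "Age"] = 25
print(people.iat[1, 1])  # 25
```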

---

## 🔍 Filtering with Conditions

### Simple Condition

```python
df[df["Age"] > 30]
```

### Multiple Conditions (AND / OR)

```python
df[(df["Age"] > 25) & (df["City"] == "Dhaka")]
df[(df["Name"] == "Bob") | (df["Age"] < 30)]
```
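
Each condition above is wrapped in parentheses because `&` and `|` bind more tightly than comparison operators. The sketch below, on a made-up `people` frame (an assumption for the example), shows what happens when the parentheses are dropped: Python groups the expression as `Age > (25 & Age) < 40`, a chained comparison that needs the truth value of a whole Series and therefore raises `ValueError`.

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Age": [31, 24, 45]})

# With parentheses, each comparison is evaluated first and then combined
ok = people[(people["Age"] > 25) & (people["Age"] < 40)]
print(list(ok["Name"]))  # ['Alice']

# Without parentheses, the expression is grouped differently and fails
try:
    people[people["Age"] > 25 & people["Age"] < 40]
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```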

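> ✅ **Use parentheses around each condition!** `&` and `|` bind more tightly than comparisons such as `>` and `==`, so conditions without parentheses are grouped incorrectly.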

---

## 🔎 Querying with `.query()`

The `.query()` method lets you filter DataFrame rows with a string expression. It is often more readable and more concise than boolean indexing.

### ✅ Example

```python
df.query("Age > 25 and City == 'Dhaka'")
```

### 🔁 Dynamic column names

```python
col = "Age"
df.query(f"{col} > 25")
```
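
To check that `.query()` and boolean indexing select the same rows, the sketch below runs both on a made-up `people` frame (the data and column names are assumptions for the example). Note that interpolating a column name with an f-string, as above, only works directly when the name is a valid Python identifier.

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Cara"],
        "Age": [31, 24, 45],
        "City": ["Dhaka", "Dhaka", "Khulna"],
    }
)

col = "Age"
via_query = people.query(f"{col} > 25 and City == 'Dhaka'")
via_mask = people[(people[col] > 25) & (people["City"] == "Dhaka")]

print(via_query.equals(via_mask))  # True
print(list(via_query["Name"]))     # ['Alice']
```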

---

## 🧠 Rules and Tips for `.query()`

1. **Column names become variables**

   ```python
   df.query("age > 25 and city == 'Dhaka'")
   ```

2. **String values must be in quotes**

   ```python
   df.query("name == 'Harry'")
   df.query('city == "Mumbai"')  # Mix quotes if needed
   ```

3. **Use backticks for special column names**

   ```python
   df.query("`first name` == 'Alice'")
   ```

4. **Reference Python variables with `@`**

   ```python
   age_limit = 30
   df.query("age > @age_limit")
   ```

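5. **Logical operators**

   * Prefer: `and`, `or`, `not`
   * `&`, `|`, `~` are also accepted, but inside `.query()` the `&` and `|` operators take the precedence of `and` and `or`, so an expression can mean something different from its plain-Python reading

   ❌ Works, but harder to read:

   ```python
   df.query("age > 30 & city == 'Dhaka'")
   ```

   ✅ Clearer:

   ```python
   df.query("age > 30 and city == 'Dhaka'")
   ```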

6. **Chained comparisons**

   ```python
   df.query("25 < age <= 40")
   ```

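7. **Avoid reserved keywords as column names** (if you cannot rename the column, wrap it in backticks)

   ```python
   df.query("`class` == 'Physics'")  # `class` is a Python keyword, so it needs backticks
   ```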

8. **Case-sensitive**

   ```python
   df.query("City == 'dhaka'")  # ❌ no match if the actual value is 'Dhaka'
   ```

9. **Returns a copy, not a view**

   ```python
   filtered = df.query("age < 50")
   ```
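
Several of the rules above can be tried together on one small frame. The sketch below is only an illustration: the `staff` data, its column names, and the `age_limit` variable are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; "first name" contains a space on purpose
staff = pd.DataFrame(
    {
        "first name": ["Alice", "Bob", "Cara"],
        "age": [31, 24, 45],
        "city": ["Dhaka", "Mumbai", "Dhaka"],
    }
)

age_limit = 30

over_limit = staff.query("age > @age_limit")            # rule 4: Python variable via @
named_alice = staff.query("`first name` == 'Alice'")    # rule 3: backticks for special names
mid_range = staff.query("25 < age <= 40")               # rule 6: chained comparison
young_dhaka = staff.query("city == 'Dhaka' and not (age > 40)")  # rule 5: and / not

print(list(over_limit["first name"]))   # ['Alice', 'Cara']
print(len(named_alice))                 # 1
print(list(mid_range["age"]))           # [31]
print(list(young_dhaka["first name"]))  # ['Alice']
```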

---

## ✅ Summary

* Use `df[col]`, `.loc[]`, `.iloc[]`, `.at[]`, `.iat[]` to access data
* Filter with logical conditions or `.query()` for readable code
* Mastering selection makes the rest of pandas feel easy

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv("data2.csv")

In [4]:
df

| | Actor | Film | Year | Genre | BoxOffice(INR Crore) | IMDb |
|---|---|---|---|---|---|---|
| 0 | Shah Rukh Khan | Pathaan | 2023 | Action | 1050 | 7.2 |
| 1 | Salman Khan | Tiger Zinda Hai | 2017 | Action | 565 | 6.0 |
| 2 | Aamir Khan | Dangal | 2016 | Biography | 2024 | 8.4 |
| 3 | Ranbir Kapoor | Brahmastra | 2022 | Fantasy | 431 | 5.6 |
| 4 | Ranveer Singh | Padmaavat | 2018 | Historical | 585 | 7.0 |
| 5 | Ayushmann Khurrana | Andhadhun | 2018 | Thriller | 111 | 8.3 |
| 6 | Rajkummar Rao | Stree | 2018 | Horror Comedy | 180 | 7.5 |
| 7 | Hrithik Roshan | War | 2019 | Action | 475 | 6.5 |
| 8 | Akshay Kumar | Good Newwz | 2019 | Comedy | 318 | 7.0 |
| 9 | Kartik Aaryan | Bhool Bhulaiyaa 2 | 2022 | Horror Comedy | 266 | 5.9 |


In [5]:
df["Actor"]

0         Shah Rukh Khan
1            Salman Khan
2             Aamir Khan
3          Ranbir Kapoor
4          Ranveer Singh
5     Ayushmann Khurrana
6          Rajkummar Rao
7         Hrithik Roshan
8           Akshay Kumar
9          Kartik Aaryan
10          Varun Dhawan
11         Vicky Kaushal
Name: Actor, dtype: object

In [6]:
df["Actor"][0]

'Shah Rukh Khan'

In [7]:
type(df["Actor"])

pandas.core.series.Series

In [8]:
df[["Actor", "IMDb"]]

| | Actor | IMDb |
|---|---|---|
| 0 | Shah Rukh Khan | 7.2 |
| 1 | Salman Khan | 6.0 |
| 2 | Aamir Khan | 8.4 |
| 3 | Ranbir Kapoor | 5.6 |
| 4 | Ranveer Singh | 7.0 |
| 5 | Ayushmann Khurrana | 8.3 |
| 6 | Rajkummar Rao | 7.5 |
| 7 | Hrithik Roshan | 6.5 |
| 8 | Akshay Kumar | 7.0 |
| 9 | Kartik Aaryan | 5.9 |


In [9]:
df.loc[1]        # Select by index label; if the labels were 'a', 'b', 'c', you would pass those instead

Actor                       Salman Khan
Film                    Tiger Zinda Hai
Year                               2017
Genre                            Action
BoxOffice(INR Crore)                565
IMDb                                6.0
Name: 1, dtype: object

In [10]:
df.iloc[1]       # Select by integer position (the second row)

Actor                       Salman Khan
Film                    Tiger Zinda Hai
Year                               2017
Genre                            Action
BoxOffice(INR Crore)                565
IMDb                                6.0
Name: 1, dtype: object

In [11]:
df.loc[8, 'IMDb']       # Single value: row label 8, column 'IMDb'

7.0

In [12]:
df.iloc[8]              # All columns of the row at position 8

Actor                   Akshay Kumar
Film                      Good Newwz
Year                            2019
Genre                         Comedy
BoxOffice(INR Crore)             318
IMDb                             7.0
Name: 8, dtype: object

In [13]:
df.iloc[8, 5]          # Row position 8, column position 5

7.0

In [14]:
df.loc[:2, ["Actor", "Film", "IMDb"]]

| | Actor | Film | IMDb |
|---|---|---|---|
| 0 | Shah Rukh Khan | Pathaan | 7.2 |
| 1 | Salman Khan | Tiger Zinda Hai | 6.0 |
| 2 | Aamir Khan | Dangal | 8.4 |


In [15]:
df.at[0, "Actor"]

'Shah Rukh Khan'

In [16]:
df.iat[0, 0]

'Shah Rukh Khan'

In [17]:
df[df["IMDb"] > 7]

| | Actor | Film | Year | Genre | BoxOffice(INR Crore) | IMDb |
|---|---|---|---|---|---|---|
| 0 | Shah Rukh Khan | Pathaan | 2023 | Action | 1050 | 7.2 |
| 2 | Aamir Khan | Dangal | 2016 | Biography | 2024 | 8.4 |
| 5 | Ayushmann Khurrana | Andhadhun | 2018 | Thriller | 111 | 8.3 |
| 6 | Rajkummar Rao | Stree | 2018 | Horror Comedy | 180 | 7.5 |
| 11 | Vicky Kaushal | Uri: The Surgical Strike | 2019 | Action | 342 | 8.2 |


In [18]:
df[df["IMDb"] > 7]["Actor"]

0         Shah Rukh Khan
2             Aamir Khan
5     Ayushmann Khurrana
6          Rajkummar Rao
11         Vicky Kaushal
Name: Actor, dtype: object
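
The same result can be obtained in one step by passing the boolean mask and the column name to `.loc` together. The sketch below retypes the first three rows of the table shown earlier into a small frame named `films` (that name is an assumption for the example) so it runs on its own.

```python
import pandas as pd

# First three rows of the dataset shown earlier, retyped by hand
films = pd.DataFrame(
    {
        "Actor": ["Shah Rukh Khan", "Salman Khan", "Aamir Khan"],
        "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal"],
        "IMDb": [7.2, 6.0, 8.4],
    }
)

chained = films[films["IMDb"] > 7]["Actor"]           # filter rows, then pick the column
single_step = films.loc[films["IMDb"] > 7, "Actor"]   # rows and column in one call

print(single_step.equals(chained))  # True
print(list(single_step))            # ['Shah Rukh Khan', 'Aamir Khan']
```

For reading values the two forms are interchangeable; the single `.loc` call is generally the safer habit when you later want to assign to the selection.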

In [19]:
df[(df["IMDb"] > 7) & (df["Year"] > 2018)]

| | Actor | Film | Year | Genre | BoxOffice(INR Crore) | IMDb |
|---|---|---|---|---|---|---|
| 0 | Shah Rukh Khan | Pathaan | 2023 | Action | 1050 | 7.2 |
| 11 | Vicky Kaushal | Uri: The Surgical Strike | 2019 | Action | 342 | 8.2 |


In [20]:
df.query("Year > 2018 and IMDb > 7")

| | Actor | Film | Year | Genre | BoxOffice(INR Crore) | IMDb |
|---|---|---|---|---|---|---|
| 0 | Shah Rukh Khan | Pathaan | 2023 | Action | 1050 | 7.2 |
| 11 | Vicky Kaushal | Uri: The Surgical Strike | 2019 | Action | 342 | 8.2 |
