 
# Data Selection & Filtering

Selecting the right rows and columns is *the first step* in analyzing any dataset. Pandas gives you several powerful ways to do this.

---

## Selecting Rows & Columns

### Selecting Columns

```python
df["column_name"]        # Single column (as Series)
df[["col1", "col2"]]     # Multiple columns (as DataFrame)
```

### Selecting Rows by Index

Use `.loc[]` (label-based) and `.iloc[]` (position-based):

```python
df.loc[0]                # Row whose index label is 0 (the first row only with the default index)
df.iloc[0]               # First row (by position)
```
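
The two accessors agree only when the index is the default 0, 1, 2, ... range. A minimal sketch of how they diverge otherwise, using a small made-up DataFrame (the `people` data and its index labels are illustrative assumptions, not the dataset used later in this guide):

```python
import pandas as pd

# Hypothetical data: the index labels are deliberately not 0, 1, 2
people = pd.DataFrame(
    {"Name": ["Asha", "Bob", "Chen"], "Age": [31, 24, 45]},
    index=[10, 20, 30],
)

first_by_position = people.iloc[0]   # row at position 0
row_with_label_20 = people.loc[20]   # row whose index label is 20

print(first_by_position["Name"])     # Asha
print(row_with_label_20["Name"])     # Bob

# people.loc[0] would raise KeyError here, because no row is labelled 0
```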

### Select Specific Rows and Columns

```python
df.loc[0, "Name"]        # Value at row 0, column 'Name'
df.iloc[0, 1]            # Value at row 0, column at index 1
```

You can also slice:

```python
df.loc[0:2, ["Name", "Age"]]   # Rows labelled 0 through 2 (end label included), selected columns
df.iloc[0:2, 0:2]              # First 2 rows and first 2 columns (end position excluded)
```
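
The endpoint rule is easy to get wrong, so here is a small check. It is a sketch on a made-up five-row frame (`scores` is a hypothetical name) with the default integer index:

```python
import pandas as pd

# Hypothetical five-row frame with the default integer index
scores = pd.DataFrame({"Name": list("ABCDE"), "Score": [90, 85, 70, 60, 95]})

by_label = scores.loc[0:2]      # labels 0, 1 and 2 (end label included)
by_position = scores.iloc[0:2]  # positions 0 and 1 (end position excluded)

print(len(by_label))     # 3
print(len(by_position))  # 2
```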

---

## Fast Access: `.at` and `.iat`

These are optimized for **single element access**:

```python
df.at[0, "Name"]       # Fast label-based access
df.iat[0, 1]           # Fast position-based access
```
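
Both accessors take exactly one row and one column, and they can write a value as well as read it. A short sketch on made-up data (the `people` frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({"Name": ["Asha", "Bob"], "Age": [31, 24]})

name = people.at[0, "Name"]   # read one cell by row label and column label
age = people.iat[1, 1]        # read one cell by row position and column position

people.at[1, "Age"] = 25      # write one cell in place
print(name, age, people.at[1, "Age"])
```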

---

## Filtering with Conditions

### Simple Condition

```python
df[df["Age"] > 30]
```

### Multiple Conditions (AND / OR)

```python
df[(df["Age"] > 25) & (df["City"] == "Delhi")]
df[(df["Name"] == "Bob") | (df["Age"] < 30)]
```

> Use parentheses around each condition! In Python, `&` and `|` bind more tightly than comparisons such as `>` and `==`.
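
One way to sidestep the precedence issue is to give each condition its own name and combine the resulting boolean Series. The sketch below uses a made-up `people` frame (an assumption for illustration) and also shows `~` for negation:

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({
    "Name": ["Asha", "Bob", "Chen", "Dev"],
    "Age": [31, 24, 45, 28],
    "City": ["Delhi", "Mumbai", "Delhi", "Pune"],
})

# Each comparison returns a boolean Series (a "mask")
over_25 = people["Age"] > 25
in_delhi = people["City"] == "Delhi"

both = people[over_25 & in_delhi]      # AND
either = people[over_25 | in_delhi]    # OR
not_delhi = people[~in_delhi]          # NOT

print(both["Name"].tolist())       # ['Asha', 'Chen']
print(either["Name"].tolist())     # ['Asha', 'Chen', 'Dev']
print(not_delhi["Name"].tolist())  # ['Bob', 'Dev']
```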

---

## Querying with `.query()`

The `.query()` method filters DataFrame rows using a string expression. It is a SQL-like, often more concise alternative to boolean indexing:

```python
df.query("Age > 25 and City == 'Delhi'")
```

Dynamic column names:

```python
col = "Age"
df.query(f"{col} > 25")
```
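
Plain f-string interpolation only works when the column name is a valid Python identifier. A hedged sketch of a more defensive pattern wraps the name in backticks and passes the value with `@` instead of formatting it into the string. The `films` frame, `col`, and `threshold` are illustrative assumptions:

```python
import pandas as pd

# Hypothetical data: one column name contains a space
films = pd.DataFrame({"Box Office": [1050, 565, 111], "IMDb": [7.2, 6.0, 8.3]})

col = "Box Office"   # column name chosen at runtime
threshold = 500      # comparison value chosen at runtime

# Backticks protect the column name; @ reads the Python variable
hits = films.query(f"`{col}` > @threshold")
print(len(hits))  # 2
```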



The main **rules and tips** for using `.query()`:

---

### **1. Column names become variables**
You can reference column names directly in the query string:

```python
df.query("age > 25 and city == 'Delhi'")
```

---

### **2. String values must be in quotes**
Use **single** or **double** quotes around strings in the expression:

```python
df.query("name == 'Harry'")
```

The quotes inside the expression must differ from the quotes that delimit it, so swap them as needed:

```python
df.query('city == "Mumbai"')
```

---

### **3. Use backticks for column names with spaces or special characters**
If a column name has spaces, use backticks (`` ` ``):

```python
df.query("`first name` == 'Alice'")
```

---

### **4. You can use `@` to reference Python variables**
To pass external variables into `.query()`:

```python
age_limit = 30
df.query("age > @age_limit")
```
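
The `@` prefix also works with collections, which pairs naturally with the `in` operator. A small sketch on made-up data (`people`, `age_limit`, and `cities` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({
    "name": ["Asha", "Bob", "Chen", "Dev"],
    "age": [31, 24, 45, 28],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
})

age_limit = 25
cities = ["Delhi", "Pune"]

# Both Python variables are read from the surrounding scope via @
result = people.query("age > @age_limit and city in @cities")
print(result["name"].tolist())  # ['Asha', 'Chen', 'Dev']
```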

---

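### **5. Logical operators**
Use these:
- `and`, `or`, `not` to combine conditions
- `==`, `!=`, `<`, `>`, `<=`, `>=` to compare values

Inside `.query()`, `&` and `|` are also accepted and are given the same precedence as `and` and `or`, so the parentheses required in boolean indexing are not needed. The word forms are usually easier to read.

Works, but harder to read:
```python
df.query("age > 30 & city == 'Delhi'")
```

Preferred:
```python
df.query("age > 30 and city == 'Delhi'")  # ✅
```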

---

### **6. Chained comparisons**
Just like Python:

```python
df.query("25 < age <= 40")
```
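
A chained comparison is shorthand for two conditions joined with AND. The sketch below checks that it selects the same rows as the equivalent boolean mask, using a made-up `people` frame (an assumption for illustration):

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({
    "name": ["Asha", "Bob", "Chen", "Dev"],
    "age": [31, 24, 45, 28],
})

chained = people.query("25 < age <= 40")
masked = people[(people["age"] > 25) & (people["age"] <= 40)]

print(chained["name"].tolist())  # ['Asha', 'Dev']
print(chained.equals(masked))    # True
```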

---

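### **7. Avoid using reserved keywords as column names**
A column named after a Python keyword (`class`, `lambda`, etc.) cannot be written as a bare name in the expression. Renaming the column is the most robust option. Backtick quoting, shown here, is a common workaround, but whether it accepts keywords has depended on the pandas version, so verify it on yours: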

```python
df.query("`class` == 'Physics'")
```

---

### **8. Case-sensitive**
Column names and string values are case-sensitive:

```python
df.query("City == 'delhi'")  # ❌ if actual value is 'Delhi'
```

---

### **9. `.query()` returns a copy, not a view**
The result is a new DataFrame and the original is left unchanged. To keep the filtered rows, assign the result to a variable:

```python
filtered = df.query("age < 50")
```
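
A quick way to confirm this behaviour is to compare row counts before and after. A minimal sketch on made-up data (the `people` frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({"name": ["Asha", "Bob", "Chen"], "age": [31, 64, 45]})

filtered = people.query("age < 50")
print(len(people), len(filtered))  # 3 2  (the original still has every row)

# To replace the original, assign the result back to the same name
people = people.query("age < 50")
print(len(people))  # 2
```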

---


## Summary

- Use `df[col]`, `.loc[]`, `.iloc[]`, `.at[]`, `.iat[]` to access data  
- Filter with logical conditions or `.query()` for readable code  
- Mastering selection makes the rest of pandas feel easy
 

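## Worked Example

```python
import pandas as pd

df = pd.read_csv("data.csv")
df
```

| | Actor | Film | Year | Genre | BoxOffice(INR Crore) | IMDb |
|---|---|---|---|---|---|---|
| 0 | Shah Rukh Khan | Pathaan | 2023 | Action | 1050 | 7.2 |
| 1 | Salman Khan | Tiger Zinda Hai | 2017 | Action | 565 | 6.0 |
| 2 | Aamir Khan | Dangal | 2016 | Biography | 2024 | 8.4 |
| 3 | Ranbir Kapoor | Brahmastra | 2022 | Fantasy | 431 | 5.6 |
| 4 | Ranveer Singh | Padmaavat | 2018 | Historical | 585 | 7.0 |
| 5 | Ayushmann Khurrana | Andhadhun | 2018 | Thriller | 111 | 8.3 |
| 6 | Rajkummar Rao | Stree | 2018 | Horror Comedy | 180 | 7.5 |
| 7 | Hrithik Roshan | War | 2019 | Action | 475 | 6.5 |
| 8 | Akshay Kumar | Good Newwz | 2019 | Comedy | 318 | 7.0 |
| 9 | Kartik Aaryan | Bhool Bhulaiyaa 2 | 2022 | Horror Comedy | 266 | 5.9 |

Only rows 0 to 9 appear in this output; the `df["Actor"]` output shows that the DataFrame has 12 rows.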


```python
df["Actor"]
```

```
0         Shah Rukh Khan
1            Salman Khan
2             Aamir Khan
3          Ranbir Kapoor
4          Ranveer Singh
5     Ayushmann Khurrana
6          Rajkummar Rao
7         Hrithik Roshan
8           Akshay Kumar
9          Kartik Aaryan
10          Varun Dhawan
11         Vicky Kaushal
Name: Actor, dtype: object
```

```python
df["Actor"][3]
```

```
'Ranbir Kapoor'
```

```python
type(df["Actor"])
```

```
pandas.core.series.Series
```

```python
df[["Actor", "IMDb"]]
```

| | Actor | IMDb |
|---|---|---|
| 0 | Shah Rukh Khan | 7.2 |
| 1 | Salman Khan | 6.0 |
| 2 | Aamir Khan | 8.4 |
| 3 | Ranbir Kapoor | 5.6 |
| 4 | Ranveer Singh | 7.0 |
| 5 | Ayushmann Khurrana | 8.3 |
| 6 | Rajkummar Rao | 7.5 |
| 7 | Hrithik Roshan | 6.5 |
| 8 | Akshay Kumar | 7.0 |
| 9 | Kartik Aaryan | 5.9 |


```python
df.loc[1]
```

```
Actor                       Salman Khan
Film                    Tiger Zinda Hai
Year                               2017
Genre                            Action
BoxOffice(INR Crore)                565
IMDb                                6.0
Name: 1, dtype: object
```

```python
df.iloc[1]
```

```
Actor                       Salman Khan
Film                    Tiger Zinda Hai
Year                               2017
Genre                            Action
BoxOffice(INR Crore)                565
IMDb                                6.0
Name: 1, dtype: object
```

```python
df.loc[8, "Actor"]
```

```
'Akshay Kumar'
```

```python
df.iloc[8, 0]
```

```
'Akshay Kumar'
```

```python
df.loc[0:2, ["Actor", "IMDb", "Film"]]
```

| | Actor | IMDb | Film |
|---|---|---|---|
| 0 | Shah Rukh Khan | 7.2 | Pathaan |
| 1 | Salman Khan | 6.0 | Tiger Zinda Hai |
| 2 | Aamir Khan | 8.4 | Dangal |

```python
df.iloc[0:2, [0, 5, 1]]
```

| | Actor | IMDb | Film |
|---|---|---|---|
| 0 | Shah Rukh Khan | 7.2 | Pathaan |
| 1 | Salman Khan | 6.0 | Tiger Zinda Hai |


```python
df.iloc[0:2, [0:2, 5]]
```

```
SyntaxError: invalid syntax (3030104152.py, line 1)
```
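
Slice syntax such as `0:2` is only valid directly inside indexing brackets, not as an element of a list literal, which is why the last cell fails. One possible workaround is NumPy's `np.r_`, which expands slices and single positions into one integer array. The sketch below uses a small stand-in frame (`films`, with placeholder values) that only borrows the six column names from the dataset above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with the same six column names; the values are placeholders
films = pd.DataFrame(
    [["Actor A", "Film A", 2020, "Action", 100, 7.0],
     ["Actor B", "Film B", 2021, "Comedy", 200, 6.5],
     ["Actor C", "Film C", 2022, "Drama", 300, 8.0]],
    columns=["Actor", "Film", "Year", "Genre", "BoxOffice(INR Crore)", "IMDb"],
)

# np.r_ expands the slice 0:2 and appends 5, giving array([0, 1, 5])
cols = np.r_[0:2, 5]
subset = films.iloc[0:2, cols]

print(subset.columns.tolist())  # ['Actor', 'Film', 'IMDb']
```

A plain-Python alternative with the same effect is `[*range(0, 2), 5]`.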