Data Selection & Filtering
Selecting the right rows and columns is the first step in analyzing any dataset.
Pandas gives you several ways to do this: bracket indexing, .loc and .iloc, .at and .iat, boolean masks, and .query().

In [1]:
import pandas as pd

In [9]:
df = pd.read_csv(r"C:\Users\rosha\Pandas practice\researchdata.csv")  

In [10]:
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9


In [11]:
df['Actor']

0         Shah Rukh Khan
1            Salman Khan
2             Aamir Khan
3          Ranbir Kapoor
4          Ranveer Singh
5     Ayushmann Khurrana
6          Rajkummar Rao
7         Hrithik Roshan
8           Akshay Kumar
9          Kartik Aaryan
10          Varun Dhawan
11         Vicky Kaushal
Name: Actor, dtype: object

In [14]:
type(df['Actor'])

pandas.core.series.Series

In [12]:
df['Actor'][6]

'Rajkummar Rao'

In [13]:
type(df['Actor'][6])

str

In [15]:
df[['Film', 'Genre']]

Unnamed: 0,Film,Genre
0,Pathaan,Action
1,Tiger Zinda Hai,Action
2,Dangal,Biography
3,Brahmastra,Fantasy
4,Padmaavat,Historical
5,Andhadhun,Thriller
6,Stree,Horror Comedy
7,War,Action
8,Good Newwz,Comedy
9,Bhool Bhulaiyaa 2,Horror Comedy
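
The difference between single and double brackets is easy to miss, so here is a minimal sketch of it. It rebuilds three rows from the table above by hand (the name `films` is just an illustrative stand-in for `df`) so that it runs without the CSV file.

```python
import pandas as pd

# Three rows copied from the table above; `films` is a stand-in for df.
films = pd.DataFrame({
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal"],
    "Genre": ["Action", "Action", "Biography"],
    "IMDb": [7.2, 6.0, 8.4],
})

one_col_series = films["Film"]        # single brackets -> Series
one_col_frame = films[["Film"]]       # list with one name -> one-column DataFrame
two_cols = films[["Film", "Genre"]]   # list of names -> DataFrame

print(type(one_col_series).__name__)  # Series
print(type(one_col_frame).__name__)   # DataFrame
print(two_cols.shape)                 # (3, 2)
```

The inner brackets are an ordinary Python list, so the column order in the result follows the order of the list.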


In [16]:
df.loc[7]                # Row with index label 7

Actor                   Hrithik Roshan
Film                               War
Year                              2019
Genre                           Action
BoxOffice(INR Crore)               475
IMDb                               6.5
Name: 7, dtype: object

In [17]:
df.iloc[5]             # Row at integer position 5 (the sixth row)

Actor                   Ayushmann Khurrana
Film                             Andhadhun
Year                                  2018
Genre                             Thriller
BoxOffice(INR Crore)                   111
IMDb                                   8.3
Name: 5, dtype: object
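
With the default 0, 1, 2, ... index, labels and positions happen to coincide, which hides the difference between .loc and .iloc. The sketch below assumes a small hand-built frame (`films`, illustrative only) that uses film titles as row labels, so the two accessors no longer look alike.

```python
import pandas as pd

# Film titles as the index, so labels are strings and positions are integers.
films = pd.DataFrame(
    {
        "Actor": ["Shah Rukh Khan", "Salman Khan", "Aamir Khan"],
        "Year": [2023, 2017, 2016],
        "IMDb": [7.2, 6.0, 8.4],
    },
    index=["Pathaan", "Tiger Zinda Hai", "Dangal"],
)

by_label = films.loc["Dangal"]   # row whose label is "Dangal"
by_position = films.iloc[2]      # third row, whatever its label is

print(by_label["Actor"])         # Aamir Khan
print(by_position.name)          # Dangal

try:
    films.loc[2]                 # 2 is a position here, not a label
except KeyError:
    print("loc[2] fails: no row is labelled 2")
```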

In [18]:
df.loc[9, "IMDb"]        # Value at row label 9, column 'IMDb'

np.float64(5.9)

In [19]:
df.iloc[0, 4]          # Value at row position 0, column position 4

np.int64(1050)

In [22]:
df.loc[0:2, ["Film", "IMDb"]]   # Rows labelled 0 to 2 (end label included), selected columns

Unnamed: 0,Film,IMDb
0,Pathaan,7.2
1,Tiger Zinda Hai,6.0
2,Dangal,8.4


In [23]:
df.iloc[0:2, 0:2]              # Rows and columns by position (end position excluded)

Unnamed: 0,Actor,Film
0,Shah Rukh Khan,Pathaan
1,Salman Khan,Tiger Zinda Hai
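
The two slices above use the same 0:2 but return different numbers of rows. A small sketch of that rule, again on a hand-built stand-in frame (`films`) rather than the full CSV:

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal", "Brahmastra"],
    "IMDb": [7.2, 6.0, 8.4, 5.6],
})

label_slice = films.loc[0:2]      # labels 0, 1 and 2: the end label is included
position_slice = films.iloc[0:2]  # positions 0 and 1: the end position is excluded

print(len(label_slice))     # 3
print(len(position_slice))  # 2
```

.iloc follows the usual Python slicing convention, while .loc includes both endpoints because labels do not have to be consecutive integers.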


In [25]:
df.at[3, "Film"]       # Fast label-based access

'Brahmastra'

In [26]:
df.iat[4, 5]      # Fast position-based access

np.float64(7.0)

In [28]:
df[df['IMDb']>7]     # Rows where IMDb is greater than 7

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2


In [29]:
df[df['IMDb']>7]['Actor']             # Only the Actor column of those rows

0         Shah Rukh Khan
2             Aamir Khan
5     Ayushmann Khurrana
6          Rajkummar Rao
11         Vicky Kaushal
Name: Actor, dtype: object
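
The cell above filters the rows first and then picks a column in a second step. As a sketch of an alternative, .loc accepts the boolean mask and the column name together. The frame `films` and the name `high_rated` below are illustrative stand-ins.

```python
import pandas as pd

films = pd.DataFrame({
    "Actor": ["Shah Rukh Khan", "Salman Khan", "Aamir Khan", "Ranbir Kapoor"],
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal", "Brahmastra"],
    "IMDb": [7.2, 6.0, 8.4, 5.6],
})

high_rated = films["IMDb"] > 7                # boolean Series, one value per row

chained = films[high_rated]["Actor"]          # two steps: filter rows, then pick column
single_step = films.loc[high_rated, "Actor"]  # one step: rows and column together

print(single_step.tolist())         # ['Shah Rukh Khan', 'Aamir Khan']
print(chained.equals(single_step))  # True
```

For reading, both forms give the same result. The single-step .loc form is generally the safer habit when you later want to assign to the selection.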

In [30]:
df[(df['IMDb']>7) & (df['Year']>2017)]       # AND operator: IMDb greater than 7
                                               # and released after 2017

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2


In [31]:
df[(df['IMDb']>7) | (df['Year']>2017)]              # OR operator: IMDb greater than 7 or released after 2017

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
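
Each condition in the cells above is wrapped in parentheses because & and | bind more tightly than > in Python. One way to keep this readable is to name each mask first. The sketch below does that on a hand-built stand-in frame (`films`) and also shows ~ (NOT) and .isin(), which the cells above do not use.

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal", "Brahmastra", "Andhadhun"],
    "Year": [2023, 2017, 2016, 2022, 2018],
    "Genre": ["Action", "Action", "Biography", "Fantasy", "Thriller"],
    "IMDb": [7.2, 6.0, 8.4, 5.6, 8.3],
})

high_rated = films["IMDb"] > 7    # named masks avoid nested parentheses
recent = films["Year"] > 2017

both = films[high_rated & recent]     # AND
either = films[high_rated | recent]   # OR
not_high = films[~high_rated]         # NOT
picked = films[films["Genre"].isin(["Action", "Thriller"])]  # membership test

print(both["Film"].tolist())      # ['Pathaan', 'Andhadhun']
print(not_high["Film"].tolist())  # ['Tiger Zinda Hai', 'Brahmastra']
```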


The .query() method in pandas lets you filter DataFrame rows using a string expression.
It is a more readable, SQL-like, and often more concise alternative to boolean indexing.

In [34]:
df.query("Year > 2018")

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2


In [36]:
df.query("Year > 2019 and IMDb > 6")

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2


Here are the main rules and tips for using .query() in pandas:

1. Column names become variables
 You can reference column names directly in the query string:
 df.query("age > 25 and city == 'Delhi'")

2. String values must be in quotes
 Use single or double quotes around strings in the expression:
 df.query("name == 'Harry'")
 The quotes inside the expression must differ from the ones that delimit it, so mix the two kinds:
 df.query('city == "Mumbai"')

3. Use backticks for column names with spaces or special characters
 If a column name has spaces, use backticks (`):
 df.query("`first name` == 'Alice'")

4. You can use @ to reference Python variables
 To pass external variables into the query string, prefix them with @:
 age_limit = 30
 df.query("age > @age_limit")

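5. Logical operators
 Prefer and, or, not over &, |, ~. The comparison operators ==, !=, <, >, <=, >= work as in Python.
 Inside query(), & and | take the precedence of and/or, so both lines below run, but the second reads more clearly:
 df.query("age > 30 & city == 'Delhi'")    # accepted, but easy to misread
 df.query("age > 30 and city == 'Delhi'")  # preferred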

6. Chained comparisons
 These work just like in Python:
 df.query("25 < age <= 40")

7. Avoid using reserved keywords as column names
 If you have a column named class, lambda, etc., you’ll need to use backticks:
 df.query("`class` == 'Physics'")

8. Case-sensitive
 Column names and string values are case-sensitive:
 df.query("City == 'delhi'")  # matches nothing if the actual value is 'Delhi'

9. query() returns a copy, not a view
 The result is a new DataFrame. The original is left unfiltered, so assign the result to a name to keep it:
 filtered = df.query("age < 50")
 To make the copy explicit, call .copy() on the result:
 filtered = df.query("age < 50").copy()
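
The rules above use hypothetical columns such as age and city. As a sketch of how several of them look on the film data from this notebook, the block below rebuilds four rows by hand (`films`, `min_rating` and the `Recent` column are illustrative names, not part of the original notebook). It assumes a reasonably recent pandas, since backtick quoting of names with parentheses is not available in very old versions.

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal", "Brahmastra"],
    "Year": [2023, 2017, 2016, 2022],
    "Genre": ["Action", "Action", "Biography", "Fantasy"],
    "BoxOffice(INR Crore)": [1050, 565, 2024, 431],
    "IMDb": [7.2, 6.0, 8.4, 5.6],
})

min_rating = 7  # ordinary Python variable, referenced with @ below

action = films.query("Genre == 'Action'")                    # rule 2: quoted string
blockbusters = films.query("`BoxOffice(INR Crore)` > 1000")  # rule 3: backticks
well_rated = films.query("IMDb > @min_rating")               # rule 4: @variable
mid_years = films.query("2016 < Year <= 2022")               # rule 6: chained comparison
wrong_case = films.query("Genre == 'action'")                # rule 8: case differs, no rows

# rule 9: the result is a separate DataFrame, so editing it leaves films untouched
subset = films.query("Year > 2020").copy()
subset["Recent"] = True

print(blockbusters["Film"].tolist())  # ['Pathaan', 'Dangal']
print(mid_years["Film"].tolist())     # ['Tiger Zinda Hai', 'Brahmastra']
print(len(wrong_case))                # 0
print("Recent" in films.columns)      # False
```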