# Why use df.query?

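1. Readability: A query string such as 'int_column > 3' reads closer to the condition it expresses than the equivalent df[df['int_column'] > 3].
2. Simplification: Column names are referenced directly inside the expression, so compound filters need fewer brackets and no repeated df[...] lookups.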
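3. Avoiding Ambiguity: Inside a query string, and/or (and &/|) bind more loosely than comparisons, so each condition does not need its own parentheses as it does with boolean-mask indexing.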
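4. Efficiency: When the optional numexpr package is installed, df.query can evaluate expressions on large DataFrames faster and with fewer temporary arrays. On small DataFrames the parsing overhead usually makes it slower than plain boolean indexing.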
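5. Reduced Risk of Code Injection: Passing values through @-prefixed variables keeps them out of the query text, which is safer than building the string by concatenation. The query string itself is still evaluated as an expression, so untrusted input should never be used as the query text.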
6. Easier Parameterization: Local Python variables can be referenced inside the query with the @ prefix.
7. Integration with NumPy Functions: A set of element-wise math functions such as sqrt, abs, and log can be called directly inside a query expression.
8. Consistency: Promotes a consistent filtering syntax in your codebase.
9. Interactivity: Useful for interactive exploration of data.
10. Compatibility: df.query operates on any DataFrame, regardless of the source or file format the data was loaded from.
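
As a rough illustration of the readability and precedence points in the list above, the sketch below writes the same filter twice, once with boolean-mask indexing and once with df.query. The thresholds are arbitrary, and the DataFrame is the sample one used throughout this notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'int_column': [1, 2, 3, 4, 5],
    'float_column': [1.1, 2.2, 3.3, 4.4, 5.5],
    'string_column': ['A', 'B', 'C', 'D', 'E'],
    'bool_column': [True, False, True, False, True],
})

# Boolean-mask indexing: each comparison needs its own parentheses,
# because & binds more tightly than > and <.
masked = df[(df['int_column'] > 2) & (df['float_column'] < 5.0)]

# The same filter as a query string: column names are used directly,
# and `and` binds more loosely than the comparisons.
queried = df.query('int_column > 2 and float_column < 5.0')

print(masked.equals(queried))  # True
```

Both forms select the rows with index 2 and 3; the choice between them is mostly a matter of style for a DataFrame this small.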

In [1]:
import pandas as pd
import numpy as np

In [2]:
data = {
    'int_column': [1, 2, 3, 4, 5],
    'float_column': [1.1, 2.2, 3.3, 4.4, 5.5],
    'string_column': ['A', 'B', 'C', 'D', 'E'],
    'bool_column': [True, False, True, False, True],
}

df = pd.DataFrame(data)
df.head()

Unnamed: 0,int_column,float_column,string_column,bool_column
0,1,1.1,A,True
1,2,2.2,B,False
2,3,3.3,C,True
3,4,4.4,D,False
4,5,5.5,E,True


In [7]:
df.query('int_column > 3')

Unnamed: 0,int_column,float_column,string_column,bool_column
3,4,4.4,D,False
4,5,5.5,E,True
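
Beyond a single comparison, a query expression can also chain comparisons and use arithmetic on columns. The following is a small sketch on the same sample data; the bounds and the arithmetic are made up purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'int_column': [1, 2, 3, 4, 5],
    'float_column': [1.1, 2.2, 3.3, 4.4, 5.5],
    'string_column': ['A', 'B', 'C', 'D', 'E'],
    'bool_column': [True, False, True, False, True],
})

# Chained comparison, as in plain Python: 1 < x <= 4
between = df.query('1 < int_column <= 4')
print(list(between.index))  # [1, 2, 3]

# Arithmetic on columns inside the expression
scaled = df.query('int_column * 2 > float_column + 3')
print(list(scaled.index))  # [3, 4]
```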


In [9]:
df.query('float_column != 2.2')

Unnamed: 0,int_column,float_column,string_column,bool_column
0,1,1.1,A,True
2,3,3.3,C,True
3,4,4.4,D,False
4,5,5.5,E,True


In [11]:
df.query("(int_column == 5) and (string_column == 'E')")

Unnamed: 0,int_column,float_column,string_column,bool_column
4,5,5.5,E,True
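
The combined condition above can be written in a few equivalent ways. The sketch below assumes the same sample DataFrame and shows that & inside a query behaves like and, and that a boolean column can be used as a condition on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'int_column': [1, 2, 3, 4, 5],
    'float_column': [1.1, 2.2, 3.3, 4.4, 5.5],
    'string_column': ['A', 'B', 'C', 'D', 'E'],
    'bool_column': [True, False, True, False, True],
})

# `and` and `&` are interchangeable inside a query string, and neither
# requires parentheses around the individual comparisons.
with_and = df.query("int_column == 5 and string_column == 'E'")
with_amp = df.query("int_column == 5 & string_column == 'E'")
print(with_and.equals(with_amp))  # True

# A boolean column is already a condition; `not` negates it.
flagged = df.query('bool_column and int_column > 1')
unflagged = df.query('not bool_column')
print(list(flagged.index))    # [2, 4]
print(list(unflagged.index))  # [1, 3]
```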


In [12]:
df.query("string_column == 'B'")

Unnamed: 0,int_column,float_column,string_column,bool_column
1,2,2.2,B,False


In [14]:
df.query('int_column in [3, 4]')

Unnamed: 0,int_column,float_column,string_column,bool_column
2,3,3.3,C,True
3,4,4.4,D,False


In [16]:
df.query('int_column not in [5, 2]')

Unnamed: 0,int_column,float_column,string_column,bool_column
0,1,1.1,A,True
2,3,3.3,C,True
3,4,4.4,D,False
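
For comparison, the in and not in queries above correspond to Series.isin in boolean-mask indexing. This sketch, again on the sample DataFrame, checks that both routes return the same rows.

```python
import pandas as pd

df = pd.DataFrame({
    'int_column': [1, 2, 3, 4, 5],
    'float_column': [1.1, 2.2, 3.3, 4.4, 5.5],
    'string_column': ['A', 'B', 'C', 'D', 'E'],
    'bool_column': [True, False, True, False, True],
})

# `in` inside a query matches Series.isin
in_query = df.query('int_column in [3, 4]')
in_mask = df[df['int_column'].isin([3, 4])]
print(in_query.equals(in_mask))  # True

# `not in` matches the negated isin mask
not_in_query = df.query('int_column not in [5, 2]')
not_in_mask = df[~df['int_column'].isin([5, 2])]
print(not_in_query.equals(not_in_mask))  # True
```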


In [18]:
var = 5
df.query('int_column == @var')

Unnamed: 0,int_column,float_column,string_column,bool_column
4,5,5.5,E,True
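
The @ prefix is not limited to scalars. As a sketch, the example below passes a list and a float through local variables; the names allowed and threshold are hypothetical and chosen only for this illustration. Keeping values in variables, rather than formatting them into the query string, also avoids quoting mistakes.

```python
import pandas as pd

df = pd.DataFrame({
    'int_column': [1, 2, 3, 4, 5],
    'float_column': [1.1, 2.2, 3.3, 4.4, 5.5],
    'string_column': ['A', 'B', 'C', 'D', 'E'],
    'bool_column': [True, False, True, False, True],
})

# Hypothetical parameters, e.g. values collected elsewhere in a program
allowed = ['A', 'C', 'E']
threshold = 2.0

# Several @variables can appear in one expression
result = df.query('string_column in @allowed and float_column > @threshold')
print(list(result.index))  # [2, 4]
```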


In [20]:
df.query("string_column.str.contains('A')")

Unnamed: 0,int_column,float_column,string_column,bool_column
0,1,1.1,A,True
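
Other .str methods can be used in the same way. The sketch below passes engine='python' explicitly, which is a cautious choice rather than a requirement: depending on the pandas version and whether numexpr is installed, string-accessor calls may or may not work with the default engine.

```python
import pandas as pd

df = pd.DataFrame({
    'int_column': [1, 2, 3, 4, 5],
    'float_column': [1.1, 2.2, 3.3, 4.4, 5.5],
    'string_column': ['A', 'B', 'C', 'D', 'E'],
    'bool_column': [True, False, True, False, True],
})

# str.contains treats its pattern as a regular expression by default,
# so 'A|C' matches either letter.
either = df.query("string_column.str.contains('A|C')", engine='python')
print(list(either.index))  # [0, 2]

# Other string methods, such as startswith, follow the same pattern.
starts = df.query("string_column.str.startswith('E')", engine='python')
print(list(starts.index))  # [4]
```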


In [21]:
df.query("(int_column == 1 and float_column == 1.1) or string_column == 'A'")

Unnamed: 0,int_column,float_column,string_column,bool_column
0,1,1.1,A,True
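
When and and or are mixed, and binds more tightly than or, just as in plain Python. The sketch below uses values chosen only to make the difference visible (9.9 does not occur in float_column) and shows how moving the parentheses changes the result.

```python
import pandas as pd

df = pd.DataFrame({
    'int_column': [1, 2, 3, 4, 5],
    'float_column': [1.1, 2.2, 3.3, 4.4, 5.5],
    'string_column': ['A', 'B', 'C', 'D', 'E'],
    'bool_column': [True, False, True, False, True],
})

# Without parentheses this is read as (a and b) or c
implicit = df.query(
    "int_column == 1 and float_column == 9.9 or string_column == 'B'"
)
print(list(implicit.index))  # [1]

# Grouping the `or` first gives a different filter: a and (b or c)
grouped = df.query(
    "int_column == 1 and (float_column == 9.9 or string_column == 'B')"
)
print(list(grouped.index))  # []
```

Writing the parentheses explicitly, even when they match the default grouping, makes the intended filter clear to the reader.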
