#Pandas.DataFrame.query() by Examples

---

**The pandas DataFrame.query() method selects the rows that satisfy an expression (a condition on one or more columns) and returns a new DataFrame. To filter the existing DataFrame in place instead, pass the inplace=True argument.**

In [0]:

# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print(df)

   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500


In [0]:
# Below are the quick examples

# Query Rows using DataFrame.query()
df2=df.query("Courses == 'Spark'")
df2

  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000


In [0]:
# Using variable
value='Spark'
df2=df.query("Courses == @value")
df2

  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000


In [0]:
# inplace
df2 = df.copy()
df2.query("Courses == 'Spark'",inplace=True)
df2

  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000


In [0]:
# Not equals, in & multiple conditions
df2 = df.query("Courses != 'Spark'")
df2

   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500


In [0]:
df.query("Courses in ('Spark','PySpark')")

   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300


In [0]:
df2 = df.query("Fee >= 23000")
df2

   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500


In [0]:
df.query("Fee >= 23000 and Fee <= 24000")

  Courses    Fee Duration  Discount
2  Hadoop  23000   30days      1000
3  Python  24000     None      1200


##Using DataFrame.query()

**Following is the syntax of DataFrame.query() method.**


# query() method syntax
DataFrame.query(expr, inplace=False, **kwargs)


- expr – A string expression containing the condition(s) used to select rows.

- inplace – Defaults to False. When set to True, the referring DataFrame is modified in place and query() returns None.
- **kwargs – Keyword arguments that are passed on to eval().
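
The parameters above can be sketched with a small example. It rebuilds the sample DataFrame from the first cell, shows that query() returns None when inplace=True, and passes engine="python" as one keyword argument that is forwarded to eval(). The choice of engine="python" here is an assumption made so that the method call Duration.isna() is accepted inside the expression.

```python
import numpy as np
import pandas as pd

# Same sample data as the DataFrame created at the top of this notebook
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", None, np.nan],
    "Discount": [1000, 2300, 1000, 1200, 2500],
})

# inplace=True filters df_copy itself; the call then returns None
df_copy = df.copy()
returned = df_copy.query("Fee > 24000", inplace=True)
print(returned)
print(df_copy["Courses"].tolist())

# engine="python" is forwarded to eval(); it allows method calls such as isna()
missing = df.query("Duration.isna()", engine="python")
print(missing["Courses"].tolist())
```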

---

**DataFrame.query() takes a condition as a string expression and selects the matching rows from a DataFrame. The expression can contain one or multiple conditions.**

In [0]:

# Query all rows with Courses equals 'Spark'
df2=df.query("Courses == 'Spark'")
print(df2)

  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000


##To use a Python variable in the expression, prefix its name with the @ character.

In [0]:

# Query Rows by using Python variable
value='Spark'
df2=df.query("Courses == @value")
print(df2)

  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000
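
As a further sketch of the @ prefix, several local variables can be referenced in one expression. The variable names min_fee and max_discount are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Same sample data as the DataFrame created at the top of this notebook
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", None, np.nan],
    "Discount": [1000, 2300, 1000, 1200, 2500],
})

# Hypothetical thresholds held in ordinary Python variables
min_fee = 23000
max_discount = 2000

# Names prefixed with @ are looked up as Python variables;
# bare names such as Fee and Discount are looked up as columns
result = df.query("Fee >= @min_fee and Discount <= @max_discount")
print(result)
```

Keeping the thresholds outside the string avoids building the expression with string formatting.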


###The above examples return a new DataFrame after filtering the rows. To update the existing DataFrame instead, use inplace=True.

In [0]:

# Replace the existing DataFrame in place
df2 = df.copy()
print(f'{df2}\n')
df2.query("Courses == 'Spark'",inplace=True)
print(df2)


   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500

  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000


###To select rows where a column value is not equal to a given value, use the != operator.

In [0]:

# not equals condition
df2=df.query("Courses != 'Spark'")
df2

   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500


###Select Rows Based on List of Column Values


**If you want to select the rows whose column value matches any value in a list, use the in operator; it checks whether each column value is contained in the list.**

In [0]:
# Query Rows by list of values
print(df.query("Courses in ('Spark','PySpark')"))

   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300


####You can also keep the list of values in a Python variable and reference it with @.

In [0]:
# Query Rows by list of values
values=['Spark','PySpark']
print(df.query("Courses in @values"))

   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300


####To select rows whose column value is not in a list, use the not in operator.

In [0]:
# Query Rows not in list of values
values=['Spark','PySpark']
print(df.query("Courses not in @values"))

  Courses    Fee Duration  Discount
2  Hadoop  23000   30days      1000
3  Python  24000     None      1200
4  Pandas  26000      NaN      2500
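
As a side note to the in and not in examples above, query() also treats == and != against a list as membership tests. The following is a minimal sketch on the same sample data.

```python
import numpy as np
import pandas as pd

# Same sample data as the DataFrame created at the top of this notebook
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", None, np.nan],
    "Discount": [1000, 2300, 1000, 1200, 2500],
})

# Explicit membership test
in_result = df.query("Courses in ['Spark', 'PySpark']")

# == against a list behaves like "in"; != against a list behaves like "not in"
eq_result = df.query("Courses == ['Spark', 'PySpark']")
ne_result = df.query("Courses != ['Spark', 'PySpark']")

print(in_result.equals(eq_result))
print(ne_result["Courses"].tolist())
```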


**If a column name contains spaces or other special characters, surround the name with backtick (`) characters in the expression.**

In [0]:
# Rebuild the DataFrame with a column name that contains a space
df2 = pd.DataFrame(df.values, columns=['Courses', 'Courses Fee', 'Duration', 'Discount'])
print(df2)

   Courses Courses Fee Duration Discount
0    Spark       22000   30days     1000
1  PySpark       25000   50days     2300
2   Hadoop       23000   30days     1000
3   Python       24000     None     1200
4   Pandas       26000      NaN     2500


In [0]:

# Using columns with special characters
print(df2.query("`Courses Fee` >= 23000"))


   Courses Courses Fee Duration Discount
1  PySpark       25000   50days     2300
2   Hadoop       23000   30days     1000
3   Python       24000     None     1200
4   Pandas       26000      NaN     2500
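
One detail worth noting about the example above: building a DataFrame from df.values turns every column into object dtype, because df.values is a single mixed-type array. A possible alternative, sketched below, is DataFrame.rename(), which keeps the numeric dtypes. The variable names rebuilt and renamed are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as the DataFrame created at the top of this notebook
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", None, np.nan],
    "Discount": [1000, 2300, 1000, 1200, 2500],
})

# Approach used above: the fee column ends up with object dtype
rebuilt = pd.DataFrame(df.values, columns=["Courses", "Courses Fee", "Duration", "Discount"])
print(rebuilt["Courses Fee"].dtype)

# Alternative: rename only the one column, so the integer dtype is preserved
renamed = df.rename(columns={"Fee": "Courses Fee"})
print(renamed["Courses Fee"].dtype)

# Backticks work the same way on the renamed DataFrame
result = renamed.query("`Courses Fee` >= 23000")
print(result)
```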


##Query with Multiple Conditions

**With pandas, as with any table-like structure, you often need to select rows based on multiple conditions, possibly across multiple columns. You can combine the conditions in a single query() expression as below.**

In [0]:
# Query by multiple conditions
print(df2.query("`Courses Fee` >= 23000 and `Courses Fee` <= 24000"))

  Courses Courses Fee Duration Discount
2  Hadoop       23000   30days     1000
3  Python       24000     None     1200
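
The example above combines two conditions on one column. As a sketch of the other combinations, the snippet below uses and, or, not and a chained comparison across the Fee and Discount columns of the original sample DataFrame.

```python
import numpy as np
import pandas as pd

# Same sample data as the DataFrame created at the top of this notebook
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", None, np.nan],
    "Discount": [1000, 2300, 1000, 1200, 2500],
})

# Both conditions must hold
both = df.query("Fee >= 23000 and Discount < 2000")

# At least one condition must hold
either = df.query("Fee < 23000 or Discount > 2400")

# Negate a condition
negated = df.query("not (Fee >= 23000)")

# Chained comparison, equivalent to "Fee >= 23000 and Fee <= 24000"
chained = df.query("23000 <= Fee <= 24000")

for name, frame in [("both", both), ("either", either),
                    ("negated", negated), ("chained", chained)]:
    print(name, frame["Courses"].tolist())
```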


##Query Rows using apply()


**pandas.DataFrame.apply() calls a function along an axis of the DataFrame. With the default axis=0 the function receives one column at a time, so the example below filters every column with the same boolean mask and returns the rows where Courses is one of the specified string values.**

In [0]:
# By using a lambda function (each call receives one column)
print(df.apply(lambda col: col[df['Courses'].isin(['Spark','PySpark'])]))

   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300


##Other Examples using df[] and loc[]

In [0]:
# Other examples you can try to query rows
print(df[df["Courses"] == 'Spark'])
print()

value='PySpark'
print(df.loc[df['Courses'] == value])
print()


print(df.loc[df['Courses'] != 'Spark'])

  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000

   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300

   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500


In [0]:
values = ['Spark', 'Python']
print(df.loc[df['Courses'].isin(values)])
print()
print(df.loc[~df['Courses'].isin(values)])


  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000
3  Python  24000     None      1200

   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
4   Pandas  26000      NaN      2500


In [0]:
print(df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)])
print()
print(df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000 )])

  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000
2  Hadoop  23000   30days      1000
3  Python  24000     None      1200

   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500
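
The boolean-mask selections above have direct query() counterparts. The sketch below checks, on the same sample data, that both styles return the same rows.

```python
import numpy as np
import pandas as pd

# Same sample data as the DataFrame created at the top of this notebook
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", None, np.nan],
    "Discount": [1000, 2300, 1000, 1200, 2500],
})

# Two conditions: boolean mask with & versus query() with "and"
by_mask = df.loc[(df["Discount"] >= 1200) & (df["Fee"] >= 23000)]
by_query = df.query("Discount >= 1200 and Fee >= 23000")
print(by_mask.equals(by_query))

# Negated membership: ~isin() versus "not in"
values = ["Spark", "Python"]
by_isin = df.loc[~df["Courses"].isin(values)]
by_not_in = df.query("Courses not in @values")
print(by_isin.equals(by_not_in))
```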


In [0]:
# Select rows whose value contains a substring
print(df[df['Courses'].str.contains("Spark")])
print()
# Select after converting values to lowercase (case-insensitive match)
print(df[df['Courses'].str.lower().str.contains("spark")])
print()
# Select rows whose value starts with a prefix
print(df[df['Courses'].str.startswith("P")])

   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300

   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300

   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500
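
The string matches above can also be written inside query(). This is a minimal sketch that passes engine="python", an assumption made so that the .str method calls are accepted inside the expression.

```python
import numpy as np
import pandas as pd

# Same sample data as the DataFrame created at the top of this notebook
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", None, np.nan],
    "Discount": [1000, 2300, 1000, 1200, 2500],
})

# Substring match
contains = df.query("Courses.str.contains('Spark')", engine="python")

# Case-insensitive match after lowercasing
contains_ci = df.query("Courses.str.lower().str.contains('spark')", engine="python")

# Prefix match
starts = df.query("Courses.str.startswith('P')", engine="python")

print(contains["Courses"].tolist())
print(contains_ci["Courses"].tolist())
print(starts["Courses"].tolist())
```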
