# High-Performance Pandas: eval() and query()

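Pandas includes some experimental tools that give you direct access to C-speed operations without the costly allocation of intermediate arrays. These are the **eval()** and **query()** functions, which rely on the Numexpr package. Numexpr evaluates a compound expression element by element, without building a full-sized temporary array for each subexpression, as the comparison with plain NumPy below illustrates.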

In [1]:
import numpy as np
import pandas as pd

In [None]:
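import numexpr
rng = np.random.RandomState(42)
x = rng.rand(1000000)  # sample data (assumed): uniform random values in [0, 1)
y = rng.rand(1000000)
mask = (x > 0.5) & (y < 0.5)  # NumPy builds a temporary array for each comparison
mask_number = numexpr.evaluate('(x > 0.5) & (y < 0.5)')  # Numexpr avoids the full-sized temporaries
np.allclose(mask, mask_number)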
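
The same idea carries over to DataFrames through pd.eval(), which takes the expression as a string. The following is a minimal sketch, assuming four hypothetical random frames (`a`, `b`, `c`, `d`) that are not part of the original example. It only checks that the string expression gives the same result as the plain pandas expression; it says nothing about timing.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed; the data itself is arbitrary

# Four hypothetical frames of the same shape (illustrative names).
a, b, c, d = (pd.DataFrame(rng.rand(1000, 10)) for _ in range(4))

# Plain pandas: each "+" produces a temporary DataFrame.
plain_sum = a + b + c + d

# pd.eval() receives the expression as a string and resolves the
# names a, b, c, d from the calling scope.
eval_sum = pd.eval('a + b + c + d')

sums_match = np.allclose(plain_sum, eval_sum)
```

Any speed or memory benefit depends on the size of the frames and on whether Numexpr is installed, so it is worth timing both forms on your own data.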

**Operations supported by pd.eval()**

In [None]:
rng = np.random.RandomState(42)
df1, df2, df3, df4, df5 = (pd.DataFrame(rng.randint(0, 1000, (100, 3))) for i in range(5))

In [None]:
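pd.eval('-df1 * df2 / (df3 + df4) - df5')  # arithmetic operators
pd.eval('df1 < df2 <= df3 != df4')  # comparison operators, including chained comparisons
pd.eval('(df1 < 0.5) & (df2 < 0.5) | (df3 < df4)')  # bitwise operators
pd.eval('(df1 < 0.5) and (df2 < 0.5) or (df3 < df4)')  # literal and/or in Boolean expressions
pd.eval('df2.T[0] + df3.iloc[1]')  # object attributes and indices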
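
To make the comparison and Boolean forms above concrete, here is a small sketch that checks two of them against their plain pandas equivalents. The frames `f1` to `f4` are hypothetical stand-ins filled with uniform random values, chosen so that the `< 0.5` tests are meaningful.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(1)
f1, f2, f3, f4 = (pd.DataFrame(rng.rand(50, 3)) for _ in range(4))

# A chained comparison versus its explicit pairwise expansion.
chained = pd.eval('f1 < f2 <= f3 != f4')
expanded = (f1 < f2) & (f2 <= f3) & (f3 != f4)

# Literal and/or inside pd.eval() versus the bitwise operators.
literal = pd.eval('(f1 < 0.5) and (f2 < 0.5) or (f3 < f4)')
bitwise = (f1 < 0.5) & (f2 < 0.5) | (f3 < f4)

chained_matches = np.array_equal(chained, expanded)
literal_matches = np.array_equal(literal, bitwise)
```

Note that the literal `and`/`or` forms only work inside the expression string; applied directly to DataFrames in ordinary Python code, they raise an error.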

**DataFrame.eval() for Column-Wise Operations**

In [None]:
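rng = np.random.RandomState(42)
df = pd.DataFrame(rng.rand(1000, 3), columns=['A', 'B', 'C'])  # sample data (assumed): uniform random columns
(df['A'] + df['B']) / (df['C'] - 1)
pd.eval("(df.A + df.B) / (df.C - 1)")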

The DataFrame.eval() method allows much more succinct evaluation of expressions with the columns:

In [None]:
df.eval('(A + B) / (C - 1)')

**1. Assignment in DataFrame.eval()**  
We can use df.eval() to create a new column 'D' and assign to it a value computed from the other columns:

In [None]:
df.eval('D = (A + B) / C', inplace=True)

In the same way, any existing column can be modified:

In [None]:
df.eval('D = (A - B) / C', inplace=True)
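
The examples above pass inplace=True, which modifies the DataFrame directly. As a minimal sketch of the alternative, assuming a small hypothetical frame named `frame`, the same assignment without inplace=True leaves the original untouched and returns a new DataFrame that carries the extra column:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(2)
frame = pd.DataFrame(rng.rand(5, 3), columns=['A', 'B', 'C'])

# With the default inplace=False, eval() returns a new DataFrame
# that includes the assigned column D.
with_d = frame.eval('D = (A + B) / C')

original_columns = list(frame.columns)  # frame itself is unchanged
new_columns = list(with_d.columns)
d_matches = np.allclose(with_d['D'], (frame['A'] + frame['B']) / frame['C'])
```

Returning a new object fits method chaining, while inplace=True avoids holding a second copy of a large frame.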

**2. Local variables in DataFrame.eval()**  
The DataFrame.eval() method supports an additional syntax that lets it work with local Python variables. Consider the following:

In [None]:
column_mean = df.mean(axis=1)  # mean across the columns of each row
result1 = df['A'] + column_mean
result2 = df.eval('A + @column_mean')
np.allclose(result1, result2)

The @ character here marks a variable name rather than a column name, and lets you efficiently evaluate expressions involving the two "namespaces": the namespace of columns, and the namespace of Python objects.   
Notice that this @ character is only supported by the DataFrame.eval() method, not by the pandas.eval() function, because pandas.eval() only has access to the one (Python) namespace.
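
The difference between the two namespaces can be sketched as follows. The names `frame` and `row_mean` are hypothetical. In DataFrame.eval() the column is written bare and the Python variable takes the @ prefix, while in pandas.eval() both are ordinary Python names and no prefix is used.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(3)
frame = pd.DataFrame(rng.rand(100, 3), columns=['A', 'B', 'C'])
row_mean = frame.mean(axis=1)

# DataFrame.eval(): A is looked up among the columns,
# @row_mean among the Python variables.
via_method = frame.eval('A + @row_mean')

# pandas.eval(): everything is a Python name, so the column is
# reached through the frame and row_mean needs no prefix.
via_function = pd.eval('frame.A + row_mean')

results_match = np.allclose(via_method, via_function)
```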

**DataFrame.query() Method**

In [None]:
result1 = df[(df.A < 0.5) & (df.B < 0.5)]
result2 = pd.eval('df[(df.A < 0.5) & (df.B < 0.5)]')
np.allclose(result1, result2)

As with the example used in our discussion of DataFrame.eval(), this is an expression involving columns of the DataFrame. It cannot be expressed using the DataFrame.eval() syntax, however! Instead, for this type of filtering operation, you can use the **query()** method:

In [None]:
df.query('A < 0.5 and B < 0.5')

Besides being a more efficient computation than the masking expression, this is much easier to read and understand. Note that the **query()** method also accepts the @ prefix to mark local variables:

In [None]:
Cmean = df['C'].mean()
result1 = df[(df.A < Cmean) & (df.B < Cmean)]
result2 = df.query('A < @Cmean and B < @Cmean')
np.allclose(result1, result2)
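
Because @ refers to variables in the scope where query() is called, it also works with function arguments. The sketch below wraps the filter in a hypothetical helper, `rows_below`, which is not part of pandas, and checks the result against the equivalent masking expression.

```python
import numpy as np
import pandas as pd

def rows_below(frame, threshold):
    """Hypothetical helper: rows where both A and B are below threshold."""
    # @threshold is resolved from this function's local variables.
    return frame.query('A < @threshold and B < @threshold')

rng = np.random.RandomState(4)
frame = pd.DataFrame(rng.rand(200, 3), columns=['A', 'B', 'C'])
cutoff = frame['C'].mean()

queried = rows_below(frame, cutoff)
masked = frame[(frame.A < cutoff) & (frame.B < cutoff)]

same_index = queried.index.equals(masked.index)
same_values = np.allclose(queried, masked)
```

Keeping the threshold as a parameter avoids building the query string by hand with string formatting.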