# Pandas eval expression evaluation


[Original Material](https://pandas.pydata.org/pandas-docs/stable/enhancingperf.html)

The top-level function pandas.eval() implements expression evaluation of Series and DataFrame objects.

The point of using eval() for expression evaluation rather than plain Python is two-fold: 1) large DataFrame objects are evaluated more efficiently and 2) large arithmetic and boolean expressions are evaluated all at once by the underlying engine (by default numexpr is used for evaluation).
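
As a minimal sketch of the basic call (assuming pandas and NumPy are installed; the frame names `a`, `b`, `c` and their sizes are made up for illustration), the expression is passed to pd.eval() as a string and the names inside it are looked up in the calling scope:

```python
import numpy as np
import pandas as pd

# Illustrative frames; names and shapes are arbitrary.
rng = np.random.default_rng(0)
a = pd.DataFrame(rng.standard_normal((5, 3)))
b = pd.DataFrame(rng.standard_normal((5, 3)))
c = pd.DataFrame(rng.standard_normal((5, 3)))

# Plain Python: each `+` is evaluated separately and builds an intermediate frame.
expected = a + b + c

# pd.eval: the whole expression is handed to the engine as one string.
# With `engine` left unset, pandas uses numexpr if it is installed and
# falls back to the pure-Python engine otherwise.
result = pd.eval("a + b + c")

# The two approaches should agree up to floating-point rounding.
print(np.allclose(result, expected))
```

On frames this small the eval() call is expected to be slower than plain Python; the sketch only shows the calling convention.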

**Note** You should not use eval() for simple expressions or for expressions involving small DataFrames. In fact, eval() is many orders of magnitude slower for smaller expressions/objects than plain ol’ Python. A good rule of thumb is to only use eval() when you have a DataFrame with more than 10,000 rows.

eval() supports all arithmetic expressions supported by the engine in addition to some extensions available only in pandas.

**Note** The larger the frame and the larger the expression the more speedup you will see from using eval().

**Note** Operations such as
```text
1 and 2  # would parse to 1 & 2, but should evaluate to 2
3 or 4  # would parse to 3 | 4, but should evaluate to 3
~1  # this is okay, but slower when using eval
```
should be performed in Python. An exception will be raised if you try to perform any boolean/bitwise operations with scalar operands that are not of type bool or np.bool_. Again, you should perform these kinds of operations in plain Python.
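
The mismatch described in this note can be seen in plain Python alone. The short sketch below uses no pandas; it only contrasts the logical operators with the bitwise ones they would be rewritten to:

```python
# Logical operators return one of their operands (short-circuit semantics).
logical_and = 1 and 2   # 2
logical_or = 3 or 4     # 3

# Bitwise operators combine the integers bit by bit.
bitwise_and = 1 & 2     # 0  (0b01 & 0b10)
bitwise_or = 3 | 4      # 7  (0b011 | 0b100)

print(logical_and, bitwise_and)
print(logical_or, bitwise_or)
```

Because the two families give different answers on non-boolean scalars, rewriting `and`/`or` to `&`/`|` is only safe when the operands are boolean.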

## Supported Syntax

These operations are supported by pandas.eval():

- Arithmetic operations except for the left shift (<<) and right shift (>>) operators, e.g., df + 2 * pi / s ** 4 % 42 - the_golden_ratio
- Comparison operations, including chained comparisons, e.g., 2 < df < df2
- Boolean operations, e.g., df < df2 and df3 < df4 or not df_bool
- list and tuple literals, e.g., [1, 2] or (1, 2)
- Attribute access, e.g., df.a
- Subscript expressions, e.g., df[0]
- Simple variable evaluation, e.g., pd.eval('df') (this is not very useful)
- Math functions: sin, cos, exp, log, expm1, log1p, sqrt, sinh, cosh, tanh, arcsin, arccos, arctan, arccosh, arcsinh, arctanh, abs and arctan2.
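
A few of the forms listed above, sketched on a small frame. The frame names, column names, and expressions are illustrative assumptions, not taken from the original material:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 2)), columns=["a", "b"])
df2 = df + 1.0

# Arithmetic on whole frames
arith = pd.eval("df + 2 * df2")

# Chained comparison, treated as (0 < df) & (df < df2)
chained = pd.eval("0 < df < df2")

# Boolean operation: `and` combines the two comparison results element-wise
both = pd.eval("df > 0 and df2 > 0")

# Attribute access to columns
col_sum = pd.eval("df.a + df.b")

# One of the supported math functions
root = pd.eval("sqrt(df * df)")

print(arith.shape, chained.shape, both.shape, col_sum.shape, root.shape)
```

Each result should match the equivalent plain-Python expression, for example `(df > 0) & (df2 > 0)` for the boolean case.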


This Python syntax is not allowed:

- Expressions
    - Function calls other than math functions.
    - is/is not operations
    - if expressions
    - lambda expressions
    - list/set/dict comprehensions
    - Literal dict and set expressions
    - yield expressions
    - Generator expressions
    - Boolean expressions consisting of only scalar values
- Statements
    - Neither simple nor compound statements are allowed. This includes things like for, while, and if.
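
To see the restrictions in practice, the sketch below feeds a few disallowed expressions to pd.eval() and records what happens. The example expressions are illustrative, and the exact exception type is treated as version-dependent (an assumption), so the code catches `Exception` broadly and reports the type name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["a", "b"])

unsupported = [
    "df if True else df + 1",   # if expression
    "[x for x in df]",          # list comprehension
    "(lambda x: x + 1)(df)",    # lambda expression
]

rejected = {}
for expr in unsupported:
    try:
        pd.eval(expr)
    except Exception as exc:  # exact type may differ between pandas versions
        rejected[expr] = type(exc).__name__

for expr, name in rejected.items():
    print(f"{expr!r} -> {name}")
```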

```python
import pandas as pd
import numpy as np

nrows, ncols = 20000, 100

df1, df2, df3, df4 = [pd.DataFrame(np.random.randn(nrows, ncols)) for _ in range(4)]
```

### Adding multiple DataFrames

Speedup of roughly 2.7x (51 ms vs. 18.9 ms in the timings below)

```python
%timeit df1 + df2 + df3 + df4
```

```text
51 ms ± 6.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

```python
%timeit pd.eval('df1 + df2 + df3 + df4')
```

```text
18.9 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```


### Boolean Operation

Speedup of roughly 3.5x (73.8 ms vs. 21 ms in the timings below)

```python
%timeit (df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)
```

```text
73.8 ms ± 7.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

```python
%timeit pd.eval('(df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)')
```

```text
21 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
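
The `%timeit` magic only exists inside IPython/Jupyter. For a plain script, a rough equivalent can be sketched with the standard-library `timeit` module. The helper names `plain` and `with_eval` are hypothetical, and the numbers will vary with hardware and with whether numexpr is installed:

```python
import timeit

import numpy as np
import pandas as pd

nrows, ncols = 20000, 100
rng = np.random.default_rng(0)
frames = [pd.DataFrame(rng.standard_normal((nrows, ncols))) for _ in range(4)]


def plain(a, b, c, d):
    # Plain Python: every `+` is evaluated one at a time.
    return a + b + c + d


def with_eval(a, b, c, d):
    # pd.eval resolves the names a, b, c, d from this function's local scope.
    return pd.eval("a + b + c + d")


calls = 3  # calls per timing run; kept small so the sketch finishes quickly
t_plain = min(timeit.repeat(lambda: plain(*frames), number=calls, repeat=3)) / calls
t_eval = min(timeit.repeat(lambda: with_eval(*frames), number=calls, repeat=3)) / calls

print(f"plain Python: {t_plain * 1e3:.1f} ms per call")
print(f"pd.eval:      {t_eval * 1e3:.1f} ms per call")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; it is a measurement choice made here, not something the original material prescribes.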

