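**Need:** efficiently evaluate compound expressions such as ``mask = (x > 0.5) & (y < 0.5)``.

NumPy evaluates this expression one operation at a time, roughly as:
tmp1 = (x > 0.5)
tmp2 = (y < 0.5)
mask = tmp1 & tmp2

**Execution cost**

==> Every intermediate step explicitly allocates a full-size temporary array in memory.

==> These extra allocations and passes over memory add computational overhead, which grows with the size of the arrays.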
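A minimal NumPy-only sketch of the alternative: evaluate the same mask block by block, so that only block-sized temporaries ever exist. The helper ``chunked_mask`` and its ``chunk`` parameter are hypothetical names for illustration, 1-D inputs are assumed, and this is a simplified version of the blocking idea that Numexpr relies on, not Numexpr's actual implementation.

```python
import numpy as np


def chunked_mask(x, y, chunk=4096):
    """Compute (x > 0.5) & (y < 0.5) one block at a time (1-D inputs assumed)."""
    out = np.empty(x.shape, dtype=bool)       # the only full-size allocation
    for start in range(0, x.size, chunk):
        stop = start + chunk                   # slicing past the end is safe
        # Temporaries created here are at most `chunk` elements long.
        out[start:stop] = (x[start:stop] > 0.5) & (y[start:stop] < 0.5)
    return out


rng = np.random.RandomState(0)
x = rng.rand(1_000_003)   # deliberately not a multiple of `chunk`
y = rng.rand(1_000_003)

# Plain NumPy: (x > 0.5) and (y < 0.5) are each materialized as full-size arrays.
mask = (x > 0.5) & (y < 0.5)
print(np.array_equal(chunked_mask(x, y), mask))   # True
```

The block size trades loop overhead against temporary size; tools like Numexpr pick it so that each block fits in the CPU cache.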


# High-Performance Pandas: eval() and query()

## ``pandas.eval()`` for Efficient Operations

The ``eval()`` function in Pandas uses string expressions to efficiently compute operations using ``DataFrame``s.
For example, consider the following ``DataFrame``s:

In [1]:
import numpy as np
import pandas as pd


In [2]:
nrows, ncols = 100000, 100
rng = np.random.RandomState(42)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows, ncols))
                      for i in range(4))

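Each ``DataFrame`` holds 100,000 rows and 100 columns of random values; ``df1`` is shown first. To compute the sum of all four ``DataFrame``s using the typical Pandas approach, we can simply write the sum (timed below):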

In [3]:
df1

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.374540 | 0.950714 | 0.731994 | 0.598658 | 0.156019 | 0.155995 | 0.058084 | 0.866176 | 0.601115 | 0.708073 | ... | 0.119594 | 0.713245 | 0.760785 | 0.561277 | 0.770967 | 0.493796 | 0.522733 | 0.427541 | 0.025419 | 0.107891 |
| 1 | 0.031429 | 0.636410 | 0.314356 | 0.508571 | 0.907566 | 0.249292 | 0.410383 | 0.755551 | 0.228798 | 0.076980 | ... | 0.093103 | 0.897216 | 0.900418 | 0.633101 | 0.339030 | 0.349210 | 0.725956 | 0.897110 | 0.887086 | 0.779876 |
| 2 | 0.642032 | 0.084140 | 0.161629 | 0.898554 | 0.606429 | 0.009197 | 0.101472 | 0.663502 | 0.005062 | 0.160808 | ... | 0.030500 | 0.037348 | 0.822601 | 0.360191 | 0.127061 | 0.522243 | 0.769994 | 0.215821 | 0.622890 | 0.085347 |
| 3 | 0.051682 | 0.531355 | 0.540635 | 0.637430 | 0.726091 | 0.975852 | 0.516300 | 0.322956 | 0.795186 | 0.270832 | ... | 0.990505 | 0.412618 | 0.372018 | 0.776413 | 0.340804 | 0.930757 | 0.858413 | 0.428994 | 0.750871 | 0.754543 |
| 4 | 0.103124 | 0.902553 | 0.505252 | 0.826457 | 0.320050 | 0.895523 | 0.389202 | 0.010838 | 0.905382 | 0.091287 | ... | 0.455657 | 0.620133 | 0.277381 | 0.188121 | 0.463698 | 0.353352 | 0.583656 | 0.077735 | 0.974395 | 0.986211 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | 0.071979 | 0.439323 | 0.188588 | 0.586705 | 0.640611 | 0.662409 | 0.318503 | 0.600419 | 0.609742 | 0.390592 | ... | 0.122887 | 0.491140 | 0.032855 | 0.567250 | 0.428673 | 0.421092 | 0.021024 | 0.398596 | 0.405897 | 0.869783 |
| 99996 | 0.313411 | 0.010490 | 0.469216 | 0.600825 | 0.451085 | 0.496918 | 0.983128 | 0.422056 | 0.719077 | 0.045588 | ... | 0.072444 | 0.715574 | 0.300257 | 0.087290 | 0.130703 | 0.549202 | 0.287877 | 0.589258 | 0.516884 | 0.254370 |
| 99997 | 0.560873 | 0.647396 | 0.043068 | 0.282439 | 0.042950 | 0.346690 | 0.954034 | 0.603182 | 0.447768 | 0.888498 | ... | 0.880079 | 0.508377 | 0.442052 | 0.621332 | 0.314942 | 0.131085 | 0.697310 | 0.111705 | 0.397560 | 0.988347 |
| 99998 | 0.710115 | 0.067999 | 0.611329 | 0.136199 | 0.054724 | 0.018160 | 0.911428 | 0.762005 | 0.245312 | 0.891027 | ... | 0.249632 | 0.894231 | 0.342761 | 0.844330 | 0.659797 | 0.835561 | 0.117920 | 0.211202 | 0.931760 | 0.296913 |


In [4]:
%timeit df1 + df2 + df3 + df4

193 ms ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


The same result can be computed via ``pd.eval`` by constructing the expression as a string:

In [5]:
%timeit pd.eval('df1 + df2 + df3 + df4')

104 ms ± 8.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [6]:
df1 + df2

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.301078 | 1.333176 | 1.603463 | 1.360130 | 0.484844 | 1.144815 | 0.178822 | 1.225081 | 1.555577 | 0.712784 | ... | 0.430059 | 1.530233 | 1.691532 | 0.672755 | 1.543484 | 1.294976 | 0.989557 | 0.433453 | 0.730529 | 0.595565 |
| 1 | 0.746596 | 1.127358 | 1.218888 | 0.828092 | 1.490152 | 1.229622 | 0.429451 | 0.844914 | 0.509903 | 0.220628 | ... | 0.526131 | 1.029756 | 1.164077 | 0.972181 | 0.573872 | 0.857131 | 1.270501 | 1.094534 | 1.319478 | 0.997979 |
| 2 | 1.617828 | 0.134042 | 0.254313 | 1.057008 | 1.464738 | 0.661747 | 0.782577 | 1.023670 | 0.848179 | 0.780149 | ... | 0.187322 | 0.809664 | 1.234689 | 1.156358 | 0.675640 | 1.244769 | 0.911580 | 0.675087 | 0.751111 | 0.747014 |
| 3 | 0.421140 | 1.442721 | 1.433321 | 1.400884 | 1.307772 | 1.183608 | 0.540550 | 1.248817 | 0.987035 | 0.317875 | ... | 1.304103 | 0.979170 | 1.216443 | 0.855481 | 0.679234 | 1.852634 | 1.715034 | 0.714021 | 1.256312 | 1.325709 |
| 4 | 0.898077 | 1.617197 | 1.157995 | 1.466456 | 1.121863 | 1.118848 | 0.857809 | 0.420576 | 1.751593 | 0.579845 | ... | 0.804718 | 1.606243 | 0.666652 | 0.616132 | 1.108881 | 1.352141 | 1.389189 | 0.387744 | 1.850711 | 1.933147 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | 1.045469 | 1.070759 | 0.909209 | 0.728634 | 0.996060 | 1.300733 | 0.917771 | 0.867989 | 1.324900 | 1.300132 | ... | 0.843519 | 0.560771 | 0.600764 | 0.941086 | 0.941878 | 0.739284 | 0.375729 | 1.184349 | 1.031524 | 1.763640 |
| 99996 | 1.017933 | 0.946832 | 1.180836 | 1.096137 | 0.480894 | 0.704690 | 1.656817 | 1.198553 | 1.590214 | 0.133691 | ... | 0.708479 | 1.248276 | 1.044345 | 0.829245 | 1.097878 | 1.547446 | 0.920054 | 0.669320 | 0.806724 | 0.637229 |
| 99997 | 1.210628 | 1.000250 | 0.157345 | 1.109083 | 0.806835 | 0.515458 | 1.358310 | 0.683078 | 0.617398 | 1.842429 | ... | 1.232751 | 1.144176 | 1.116789 | 0.911396 | 0.778269 | 0.489770 | 1.649387 | 1.091178 | 0.583648 | 0.988828 |
| 99998 | 1.685482 | 0.363327 | 1.254268 | 0.657311 | 0.647627 | 0.234716 | 1.114813 | 1.380831 | 0.436370 | 1.744830 | ... | 0.478893 | 1.354087 | 0.944448 | 1.696175 | 1.400087 | 1.568309 | 0.534606 | 0.796227 | 1.717350 | 0.458296 |


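**The ``eval()`` version of this expression takes roughly half the time here (about 104 ms vs. 193 ms) and uses much less memory**, because it does not materialize every intermediate sum as a full-size temporary ``DataFrame``, while **giving the same result**: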

In [7]:
np.allclose(df1 + df2 + df3 + df4,
            pd.eval('df1 + df2 + df3 + df4'))

True
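``pd.eval()`` also accepts an ``engine`` argument. With the default (``engine=None``), Pandas uses Numexpr when that package is installed and falls back to ``'python'`` otherwise; ``engine='python'`` evaluates the parsed expression with ordinary operations, so it is mainly useful for checking results rather than for speed. The small frames below are chosen only for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(1000, 10)) for _ in range(4))

expected = df1 + df2 + df3 + df4

# Default engine: Numexpr if installed, otherwise the pure-Python engine.
default_result = pd.eval('df1 + df2 + df3 + df4')
# Explicitly request the pure-Python engine.
python_result = pd.eval('df1 + df2 + df3 + df4', engine='python')

print(np.allclose(expected, default_result), np.allclose(expected, python_result))
```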

### Operations supported by ``pd.eval()``

``pd.eval()`` supports a wide range of operations.

In [8]:
df1, df2, df3, df4, df5 = (pd.DataFrame(rng.randint(0, 1000, (100, 3)))
                           for i in range(5))

#### Arithmetic operators
``pd.eval()`` supports all arithmetic operators. For example:

In [25]:
df1

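|  | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 180 | 112 | 748 |
| 1 | 447 | 205 | 487 |
| 2 | 656 | 100 | 98 |
| 3 | 90 | 450 | 613 |
| 4 | 529 | 224 | 530 |
| ... | ... | ... | ... |
| 95 | 31 | 787 | 643 |
| 96 | 984 | 624 | 352 |
| 97 | 283 | 543 | 751 |
| 98 | 5 | 142 | 278 |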


In [10]:
result1 = -df1 * df2 / (df3 + df4) - df5
result2 = pd.eval('-df1 * df2 / (df3 + df4) - df5')
np.allclose(result1, result2)

True

#### Comparison operators
``pd.eval()`` supports all comparison operators, including **chained expressions:**

In [11]:
result1 = (df1 < df2) & (df2 <= df3) & (df3 != df4)
result2 = pd.eval('df1 < df2 <= df3 != df4')
np.allclose(result1, result2)

True

#### Bitwise operators
``pd.eval()`` supports the ``&`` and ``|`` bitwise operators:

In [12]:
result1 = (df1 < 0.5) & (df2 < 0.5) | (df3 < df4)
result2 = pd.eval('(df1 < 0.5) & (df2 < 0.5) | (df3 < df4)')
np.allclose(result1, result2)

True

In addition, it supports the use of the literal ``and`` and ``or`` in Boolean expressions:

In [13]:
result3 = pd.eval('(df1 < 0.5) and (df2 < 0.5) or (df3 < df4)')
np.allclose(result1, result3)

True

#### Object attributes and indices

``pd.eval()`` supports access to object attributes via the ``obj.attr`` syntax, and indexes via the ``obj[index]`` syntax:

In [14]:
result1 = df2.T[0] + df3.iloc[1]
result2 = pd.eval('df2.T[0] + df3.iloc[1]')
np.allclose(result1, result2)

True

#### Other operations
Other operations, such as arbitrary function calls (apart from a small set of math functions), conditional statements, loops, and other more involved constructs, are **currently not implemented** in ``pd.eval()``.
If you'd like to execute these more complicated types of expressions, you can use the Numexpr library itself, which ``pd.eval()`` uses as its default engine when it is installed.
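Recent Pandas versions do accept a fixed set of math functions by name inside ``pd.eval()`` (``sqrt``, ``exp``, ``log``, ``sin``, ``arctan2`` and a few others). A minimal check, assuming such a version and two float-valued frames made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
a = pd.DataFrame(rng.rand(100, 3))
b = pd.DataFrame(rng.rand(100, 3))

# `sqrt` and `exp` are resolved by pd.eval itself; the string needs no `np.` prefix.
result = pd.eval('sqrt(a) + exp(b)')
print(np.allclose(result, np.sqrt(a) + np.exp(b)))   # True
```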

## ``DataFrame.eval()`` for Column-Wise Operations

Just as Pandas has a top-level ``pd.eval()`` function, ``DataFrame``s have an ``eval()`` method that works in similar ways.
The benefit of the ``eval()`` method is that columns can be referred to *by name*.
We'll use this labeled array as an example:

In [None]:
df = pd.DataFrame(rng.rand(1000, 3), columns=['A', 'B', 'C'])
df.head()

Using ``pd.eval()`` as above, we can compute expressions with the three columns like this:

In [16]:
result1 = (df['A'] + df['B']) / (df['C'] - 1)
result2 = pd.eval("(df.A + df.B) / (df.C - 1)")
np.allclose(result1, result2)

True

The ``DataFrame.eval()`` method allows much more succinct evaluation of expressions with the columns:

In [17]:
result3 = df.eval('(A + B) / (C - 1)')
np.allclose(result1, result3)

True

**Notice** that we treat ***column names as variables*** within the evaluated expression, so the string reads just like the underlying formula.
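This only works directly for column names that are valid Python identifiers. For other names, Pandas 0.25 and later accept backtick quoting in ``eval()`` and ``query()`` strings; the column labels below are made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(5, 2), columns=['unit price', 'qty'])

# 'unit price' contains a space, so it is wrapped in backticks inside the string.
total = df.eval('`unit price` * qty')
print(np.allclose(total, df['unit price'] * df['qty']))   # True
```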

### Assignment in DataFrame.eval()

In addition to the options just discussed, ``DataFrame.eval()`` also allows assignment to any column.
Let's use the ``DataFrame`` from before, which has columns ``'A'``, ``'B'``, and ``'C'``:

In [None]:
df.head()


We can use ``df.eval()`` to create a new column ``'D'`` and assign to it a value computed from the other columns:

In [None]:
df.eval('D = (A + B) / C', inplace=True)
df.head()

In the same way, any existing column can be modified:

In [20]:
df.eval('D = (A - B) / C', inplace=True)
df.head()

|  | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.375506 | 0.406939 | 0.069938 | -0.449425 |
| 1 | 0.069087 | 0.235615 | 0.154374 | -1.078728 |
| 2 | 0.677945 | 0.433839 | 0.652324 | 0.374209 |
| 3 | 0.264038 | 0.808055 | 0.347197 | -1.566886 |
| 4 | 0.589161 | 0.252418 | 0.557789 | 0.603708 |
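Without ``inplace=True``, ``DataFrame.eval()`` returns a modified copy and leaves the original untouched; it also accepts several assignments, one per line, where later lines may use columns created by earlier ones. A sketch with a freshly built frame (the column names ``E`` and ``F`` are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
df = pd.DataFrame(rng.rand(1000, 3), columns=['A', 'B', 'C'])

# Every line is an assignment; F uses the column E created on the line before.
df2 = df.eval('''
E = A + B
F = E / C
''')

print(list(df.columns), list(df2.columns))   # ['A', 'B', 'C'] ['A', 'B', 'C', 'E', 'F']
print(np.allclose(df2['F'], (df['A'] + df['B']) / df['C']))   # True
```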


### Local variables in DataFrame.eval()

The ``DataFrame.eval()`` method supports an additional syntax that lets it work with local Python variables.
Consider the following:

In [21]:
column_mean = df.mean(axis=1)  # mean across the columns of each row
result1 = df['A'] + column_mean
result2 = df.eval('A + @column_mean')
np.allclose(result1, result2)

True

**The ``@`` character here marks a variable name** rather than a *column name*, and lets you efficiently evaluate expressions involving the two "namespaces": the namespace of columns, and the namespace of Python objects.
Notice that this ``@`` character is only supported by the ``DataFrame.eval()`` *method*, not by the ``pandas.eval()`` *function*, because the ``pandas.eval()`` function only has access to the one (Python) namespace.
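Because ``pd.eval()`` sees only the Python namespace, the same local variable is referenced there by its bare name, and an ``@`` in a top-level ``pd.eval()`` call is rejected with a ``SyntaxError``. A minimal comparison, with a freshly built frame standing in for ``df``:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
df = pd.DataFrame(rng.rand(1000, 3), columns=['A', 'B', 'C'])
row_mean = df.mean(axis=1)   # mean across the columns of each row

# DataFrame.eval: bare names are columns, '@name' is a Python variable.
via_method = df.eval('A + @row_mean')
# pd.eval: there is only the Python namespace, so no '@' is needed.
via_function = pd.eval('df.A + row_mean')
print(np.allclose(via_method, via_function))   # True

# Using '@' in a top-level pd.eval() call is not allowed.
try:
    pd.eval('df.A + @row_mean')
    at_rejected = False
except SyntaxError:
    at_rejected = True
print(at_rejected)   # True
```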

## DataFrame.query() Method

The ``DataFrame`` has another method based on evaluated strings, called the ``query()`` method.
Consider the following:

In [22]:
result1 = df[(df.A < 0.5) & (df.B < 0.5)]
result2 = pd.eval('df[(df.A < 0.5) & (df.B < 0.5)]')
np.allclose(result1, result2)

True

In [23]:
result2 = df.query('A < 0.5 and B < 0.5')
result2
#np.allclose(result1, result2)

|  | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.375506 | 0.406939 | 0.069938 | -0.449425 |
| 1 | 0.069087 | 0.235615 | 0.154374 | -1.078728 |
| 7 | 0.406639 | 0.128631 | 0.160742 | 1.729526 |
| 8 | 0.020236 | 0.354904 | 0.067919 | -4.927445 |
| 16 | 0.110796 | 0.100477 | 0.561988 | 0.018362 |
| ... | ... | ... | ... | ... |
| 984 | 0.154935 | 0.096410 | 0.795801 | 0.073542 |
| 985 | 0.126921 | 0.443428 | 0.859320 | -0.368322 |
| 988 | 0.381958 | 0.058112 | 0.917827 | 0.352839 |
| 994 | 0.132644 | 0.472306 | 0.778643 | -0.436223 |


Compared to the masking expression, this is not only a **more efficient computation** but also much **easier to read and understand**.
Note that the ``query()`` method also accepts the ``@`` flag to mark local variables:

In [24]:
Cmean = df['C'].mean()
result1 = df[(df.A < Cmean) & (df.B < Cmean)]
result2 = df.query('A < @Cmean and B < @Cmean')
np.allclose(result1, result2)
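``query()`` strings can also combine ``in`` with ``@`` to test membership against a Python list, and mix it with ordinary comparisons. A sketch with a made-up ``'key'`` column and a hypothetical list ``wanted``:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': rng.choice(['a', 'b', 'c', 'd'], size=100),
                   'A': rng.rand(100)})
wanted = ['a', 'c']

# 'key in @wanted' behaves like df['key'].isin(wanted).
via_query = df.query('key in @wanted and A < 0.5')
via_mask = df[df['key'].isin(wanted) & (df['A'] < 0.5)]

print(via_query.equals(via_mask))   # True
```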

## Performance: When to Use These Functions
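These functions are worth using when they actually **save computation time or memory**: that is mainly the case for compound expressions on large arrays, where every intermediate result would otherwise be allocated as a full-size temporary. For small ``DataFrame``s, the overhead of parsing the expression string can make ``eval()`` and ``query()`` slower than the plain Pandas equivalent.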
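A rough way to check whether ``eval()`` pays off for your own data is to time both forms at a couple of sizes. The helper ``time_sum`` and the sizes below are illustrative assumptions; absolute numbers depend on the machine and on whether Numexpr is installed, so no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd


def time_sum(nrows, ncols=10, repeat=5):
    """Return mean seconds per call for the plain and the pd.eval() forms."""
    rng = np.random.RandomState(0)
    frames = {name: pd.DataFrame(rng.rand(nrows, ncols))
              for name in ('df1', 'df2', 'df3', 'df4')}

    def plain():
        return frames['df1'] + frames['df2'] + frames['df3'] + frames['df4']

    def with_eval():
        # local_dict tells pd.eval where to look up the names in the string.
        return pd.eval('df1 + df2 + df3 + df4', local_dict=frames)

    assert np.allclose(plain(), with_eval())   # same result either way
    t_plain = timeit.timeit(plain, number=repeat) / repeat
    t_eval = timeit.timeit(with_eval, number=repeat) / repeat
    return t_plain, t_eval


for n in (100, 100_000):
    t_plain, t_eval = time_sum(n)
    print(f'{n:>7} rows: plain {t_plain * 1e3:.2f} ms, eval {t_eval * 1e3:.2f} ms')
```

Passing ``local_dict`` explicitly matters here: the names inside the string are not otherwise visible to ``pd.eval()`` from within the nested helper functions.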