# High Performance 

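mask = (x > 0.5) & (y < 0.5)

Example: df = df[mask]

Each subexpression is evaluated separately, so the intermediate results are held in memory. The mask above is roughly equivalent to: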

```python
tmp1 = (x > 0.5)  # first temporary boolean array
tmp2 = (y < 0.5)  # second temporary boolean array
mask = tmp1 & tmp2
```

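`pd.eval("...")` takes the whole expression as a string and evaluates it elementwise, using numexpr when it is installed, so full-size intermediate arrays are not allocated.

This pays off mainly for compound expressions on large objects.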
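
As a minimal sketch of the compound mask above, the following compares the plain expression with `pd.eval`. The Series `x` and `y` are made-up sample data, and passing them through `local_dict` is just one way to make the names visible to the expression; `pd.eval` can also pick them up from the calling scope.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two random Series of equal length.
rng = np.random.default_rng(0)
x = pd.Series(rng.random(100_000))
y = pd.Series(rng.random(100_000))

# Plain pandas: each comparison builds a temporary boolean Series first.
mask_plain = (x > 0.5) & (y < 0.5)

# pd.eval: the whole expression is handed over as a single string.
mask_eval = pd.eval("(x > 0.5) & (y < 0.5)", local_dict={"x": x, "y": y})

# Both approaches should select exactly the same rows.
same = np.array_equal(mask_plain.to_numpy(), mask_eval.to_numpy())
print(same)
```

The resulting mask can be used for indexing in the usual way, for example `x[mask_eval]`.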

In [2]:
import pandas as pd
import numpy as np

nrows, ncols = 1000000, 100
df1, df2, df3, df4 = [pd.DataFrame(np.random.randn(nrows,ncols)) for _ in range(4)]
df1.head()

In [11]:
%timeit df1 + df2 + df3 + df4
%timeit pd.eval("df1 + df2 + df3 + df4")

1.62 s ± 603 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
334 ms ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [12]:
plain = df1 + df2 + df3 + df4
sum_eval = pd.eval("df1 + df2 + df3 + df4")

sum_eval.equals(plain)

True
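
Outside IPython the `%timeit` magic is not available. A rough equivalent with the standard-library `timeit` module could look like the sketch below. The frames are deliberately much smaller than the ones above so it runs quickly, and the sizes and repeat counts are arbitrary choices. Timings depend on the machine and on whether numexpr is installed, so no particular speedup is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller, hypothetical frames so the benchmark finishes quickly.
nrows, ncols = 100_000, 10
rng = np.random.default_rng(0)
frames = {
    name: pd.DataFrame(rng.standard_normal((nrows, ncols)))
    for name in ("df1", "df2", "df3", "df4")
}

def plain_sum():
    return frames["df1"] + frames["df2"] + frames["df3"] + frames["df4"]

def eval_sum():
    # local_dict makes the four frames visible to the expression string.
    return pd.eval("df1 + df2 + df3 + df4", local_dict=frames)

# Best of 3 repeats, 5 calls each; report seconds per call.
t_plain = min(timeit.repeat(plain_sum, number=5, repeat=3)) / 5
t_eval = min(timeit.repeat(eval_sum, number=5, repeat=3)) / 5
print(f"plain: {t_plain:.4f} s, pd.eval: {t_eval:.4f} s")

# Compare the two results with a tolerance rather than exact equality.
close = np.allclose(plain_sum().to_numpy(), eval_sum().to_numpy())
print(close)
```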

In [15]:
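# df.eval() evaluates an expression string against the frame's columns
# randint's upper bound is exclusive, so 7 is needed for six-sided dice
rolls = pd.DataFrame(np.random.randint(1, 7, (6, 3)), columns=["Die1", "Die2", "Die3"])
rolls.eval("Sum = Die1 + Die2 + Die3", inplace=True)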
rolls

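|   | Die1 | Die2 | Die3 | Sum |
|---|------|------|------|-----|
| 0 | 5 | 4 | 2 | 11 |
| 1 | 1 | 2 | 3 | 6 |
| 2 | 5 | 1 | 5 | 11 |
| 3 | 4 | 5 | 2 | 11 |
| 4 | 3 | 2 | 5 | 10 |
| 5 | 2 | 5 | 5 | 12 |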


In [22]:
# reference local Python variables inside the expression with the @ prefix
high = 11
rolls.eval("Winner = Sum > @high", inplace=True)
rolls

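|   | Die1 | Die2 | Die3 | Sum | Winner |
|---|------|------|------|-----|--------|
| 0 | 5 | 4 | 2 | 11 | False |
| 1 | 1 | 2 | 3 | 6 | False |
| 2 | 5 | 1 | 5 | 11 | False |
| 3 | 4 | 5 | 2 | 11 | False |
| 4 | 3 | 2 | 5 | 10 | False |
| 5 | 2 | 5 | 5 | 12 | True |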
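
The cells above modify `rolls` in place. As a hedged sketch with fixed, made-up dice values (so the result is reproducible), the same expressions can be run without `inplace=True`, in which case `eval` returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

# Hypothetical fixed rolls instead of random ones.
rolls = pd.DataFrame({"Die1": [5, 1, 2], "Die2": [4, 2, 5], "Die3": [2, 3, 5]})
high = 11

# Without inplace=True, eval returns a copy that includes the new column.
with_sum = rolls.eval("Sum = Die1 + Die2 + Die3")

# @high refers to the local Python variable, not to a column.
with_winner = with_sum.eval("Winner = Sum > @high")

print(with_winner)
print(list(rolls.columns))  # the original frame still has only the dice columns
```

Returning a copy is convenient when the derived columns are only needed for one step of an analysis.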


In [23]:
# traditional way: boolean indexing
rolls[rolls["Sum"] <= high]

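|   | Die1 | Die2 | Die3 | Sum | Winner |
|---|------|------|------|-----|--------|
| 0 | 5 | 4 | 2 | 11 | False |
| 1 | 1 | 2 | 3 | 6 | False |
| 2 | 5 | 1 | 5 | 11 | False |
| 3 | 4 | 5 | 2 | 11 | False |
| 4 | 3 | 2 | 5 | 10 | False |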


# Query 

In [24]:
rolls.query("Sum <= @high")

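|   | Die1 | Die2 | Die3 | Sum | Winner |
|---|------|------|------|-----|--------|
| 0 | 5 | 4 | 2 | 11 | False |
| 1 | 1 | 2 | 3 | 6 | False |
| 2 | 5 | 1 | 5 | 11 | False |
| 3 | 4 | 5 | 2 | 11 | False |
| 4 | 3 | 2 | 5 | 10 | False |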


In [3]:
os = pd.read_csv("athlete_events.csv")
os.head()

|   | ID | Name | Sex | Age | Height | Weight | Team | NOC | Games | Year | Season | City | Sport | Event | Medal |
|---|----|------|-----|-----|--------|--------|------|-----|-------|------|--------|------|-------|-------|-------|
| 0 | 1 | A Dijiang | M | 24.0 | 180.0 | 80.0 | China | CHN | 1992 Summer | 1992 | Summer | Barcelona | Basketball | Basketball Men's Basketball | NaN |
| 1 | 2 | A Lamusi | M | 23.0 | 170.0 | 60.0 | China | CHN | 2012 Summer | 2012 | Summer | London | Judo | Judo Men's Extra-Lightweight | NaN |
| 2 | 3 | Gunnar Nielsen Aaby | M | 24.0 | NaN | NaN | Denmark | DEN | 1920 Summer | 1920 | Summer | Antwerpen | Football | Football Men's Football | NaN |
| 3 | 4 | Edgar Lindenau Aabye | M | 34.0 | NaN | NaN | Denmark/Sweden | DEN | 1900 Summer | 1900 | Summer | Paris | Tug-Of-War | Tug-Of-War Men's Tug-Of-War | Gold |
| 4 | 5 | Christine Jacoba Aaftink | F | 21.0 | 185.0 | 82.0 | Netherlands | NED | 1988 Winter | 1988 | Winter | Calgary | Speed Skating | Speed Skating Women's 500 metres | NaN |


In [6]:
%timeit os[os["NOC"] == "SWE"]
%timeit os.query("NOC == 'SWE'")

13 ms ± 30.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.13 ms ± 317 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [7]:
%timeit os[os["Height"] > 180]
%timeit os.query("Height > 180")

11.3 ms ± 330 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.8 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [10]:
%timeit os[(os["Sex"] == "F") & (os["Height"] > 180) & (os["NOC"] == "SWE")]
%timeit os.query("Sex == 'F' & Height > 180 & NOC == 'SWE'")

22.9 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
10.1 ms ± 95.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
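
To make the equivalence of the two spellings concrete, here is a small sketch on a made-up stand-in for the athletes table (the rows and the `min_height` variable are invented for illustration). It checks that the compound `query` string selects the same rows as the chained boolean masks.

```python
import pandas as pd

# Hypothetical miniature of the athletes data; not taken from the real CSV.
athletes = pd.DataFrame(
    {
        "Sex": ["F", "M", "F", "F"],
        "Height": [185.0, 190.0, 170.0, 182.0],
        "NOC": ["SWE", "SWE", "SWE", "NED"],
    }
)
min_height = 180

# Chained boolean masks: each comparison needs its own parentheses.
by_mask = athletes[
    (athletes["Sex"] == "F")
    & (athletes["Height"] > min_height)
    & (athletes["NOC"] == "SWE")
]

# query(): column names are used directly, @ pulls in a local variable.
by_query = athletes.query("Sex == 'F' and Height > @min_height and NOC == 'SWE'")

print(by_query)
print(by_mask.equals(by_query))
```

The sketch uses `and` inside the query string; the `&` spelling from the cell above is an alternative way to combine the same conditions.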
