# Efficiency Techniques Application

This notebook runs several of our functions in their Cython-optimized form and compares their running times with those of the original versions.

In [47]:
import re
from typing import List

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from scipy.stats import ttest_1samp
from scipy.stats import ttest_ind
from scipy.stats import wilcoxon
from scipy.stats import mannwhitneyu
from sklearn.utils import resample

In [61]:
import efficiency.py_efficiency as ef
import final_func as fn

### Comparison

- Without Cython: `fn` (the original `final_func` module)
- With Cython: `ef` (the `efficiency.py_efficiency` module)
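The cells below rely on the IPython `%timeit` magic. Outside IPython, a rough equivalent can be sketched with the standard-library `timeit` module. The helper names `time_call` and `compare` are hypothetical, and the demo times two stand-in functions rather than `fn` and `ef`, so treat it as an illustration of the measurement approach only.

```python
import statistics
import timeit


def time_call(func, *args, repeat=100, **kwargs):
    """Time `repeat` single calls of func; return (mean, std) in seconds."""
    runs = timeit.repeat(lambda: func(*args, **kwargs), repeat=repeat, number=1)
    return statistics.mean(runs), statistics.pstdev(runs)


def compare(before, after, *args, repeat=100, **kwargs):
    """Time two implementations on the same arguments."""
    b_mean, b_std = time_call(before, *args, repeat=repeat, **kwargs)
    a_mean, a_std = time_call(after, *args, repeat=repeat, **kwargs)
    return {
        "before": (b_mean, b_std),
        "after": (a_mean, a_std),
        "speedup": b_mean / a_mean,  # > 1 means `after` is faster
    }


def slow_sum(values):
    """Stand-in for an unoptimized function (hypothetical)."""
    total = 0
    for v in values:
        total += v
    return total


# Stand-in comparison: an explicit Python loop versus the built-in sum.
result = compare(slow_sum, sum, list(range(100_000)), repeat=20)
print(result)
```

With the notebook's modules loaded, the same helper could be called as `compare(fn.merge_data, ef.merge_data, [pit, results, status])`.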

In [78]:
# Load data
pit = pd.read_csv('data/pit_stops.csv')
results = pd.read_csv('data/results.csv')
status = pd.read_csv('data/status.csv')
lap = pd.read_csv('data/lap_times.csv')

### 1. function: merge_data()

In [63]:
# Before
%timeit -r 100 -n 1 fn.merge_data([pit, results, status])

16.5 ms ± 1.24 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [64]:
# After
%timeit -r 100 -n 1 ef.merge_data([pit, results, status])

16.8 ms ± 901 µs per loop (mean ± std. dev. of 100 runs, 1 loop each)


### 2. function: process_data()

In [65]:
merge_df = fn.merge_data([pit, results, status])

In [66]:
# After
%timeit -r 100 -n 1 ef.process_data(merge_df)

187 ms ± 10.2 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [67]:
# Before
%timeit -r 100 -n 1 fn.process_data(merge_df)

183 ms ± 4.05 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [68]:
merge_df = fn.process_data(merge_df)

### 3. function: pit_stop_group()

In [69]:
# After
%timeit -r 100 -n 1 ef.pit_stop_group(merge_df)

4.99 ms ± 499 µs per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [70]:
# Before
%timeit -r 100 -n 1 fn.pit_stop_group(merge_df)

5.17 ms ± 654 µs per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [71]:
# Before
%timeit -r 100 -n 1 fn.pit_stop_group(merge_df, by='total_stops')

4.12 ms ± 745 µs per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [72]:
# After
%timeit -r 100 -n 1 ef.pit_stop_group(merge_df, by='total_stops')

4.73 ms ± 1.01 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [73]:
df_group = fn.pit_stop_group(merge_df, by='total_stops')

### 4. function: front_back_division()

In [74]:
# Before
%timeit -r 100 -n 1 fn.front_back_division(merge_df, top_num=5)

13.4 ms ± 2 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [75]:
# After
%timeit -r 100 -n 1 ef.front_back_division(merge_df, top_num=5)

13.3 ms ± 1.49 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [76]:
# Before
%timeit -r 100 -n 1 fn.front_back_division(merge_df, select_col='abs_deviation_mean', top_num=5)

4.51 ms ± 627 µs per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [77]:
# After
%timeit -r 100 -n 1 ef.front_back_division(merge_df, select_col='abs_deviation_mean', top_num=5)

5.07 ms ± 1.17 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)


### 5. function: lap_data_process()

In [79]:
# Before
%timeit -r 100 -n 1 fn.lap_data_process(results, lap)

1.01 s ± 25.1 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)


In [80]:
# After
%timeit -r 100 -n 1 ef.lap_data_process(results, lap)

894 ms ± 68.8 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)


### Conclusion

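Our project involves little heavy computation and already runs quickly. After compiling the functions with Cython, most timings are essentially unchanged: in six of the seven comparisons the difference between `fn` and `ef` is smaller than one standard deviation of the runs, and several Cython versions are slightly slower than the originals. The only clear gain is `lap_data_process()`, which drops from 1.01 s to 894 ms (about 11% faster).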
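To make the comparison concrete, the sketch below tabulates the `%timeit` results reported in this notebook and computes the relative change of `ef` against `fn`. The "exceeds noise" check (difference in means larger than the larger of the two standard deviations) is a rough heuristic chosen for illustration, not a formal significance test, and the function name `summarize` is hypothetical.

```python
# Mean and standard deviation in milliseconds, copied from the %timeit
# output in this notebook. Each entry: (fn mean, fn std, ef mean, ef std).
timings_ms = {
    "merge_data": (16.5, 1.24, 16.8, 0.901),
    "process_data": (183.0, 4.05, 187.0, 10.2),
    "pit_stop_group": (5.17, 0.654, 4.99, 0.499),
    "pit_stop_group(by='total_stops')": (4.12, 0.745, 4.73, 1.01),
    "front_back_division": (13.4, 2.0, 13.3, 1.49),
    "front_back_division(select_col=...)": (4.51, 0.627, 5.07, 1.17),
    "lap_data_process": (1010.0, 25.1, 894.0, 68.8),
}


def summarize(fn_mean, fn_std, ef_mean, ef_std):
    """Return (percent change of ef relative to fn, exceeds-noise flag)."""
    change_pct = (ef_mean - fn_mean) / fn_mean * 100
    # Heuristic: treat the change as real only if it is larger than the
    # larger of the two standard deviations.
    exceeds_noise = abs(ef_mean - fn_mean) > max(fn_std, ef_std)
    return change_pct, exceeds_noise


summary = {name: summarize(*vals) for name, vals in timings_ms.items()}
for name, (pct, flag) in summary.items():
    # Negative percentages mean the Cython version (ef) was faster.
    print(f"{name:40s} {pct:+6.1f}%  exceeds noise: {flag}")
```

Under this heuristic only `lap_data_process` shows a change larger than the run-to-run variation.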