## Requirements

#### VS Code extensions: Jupyter, Python. Each needs "Install in Dev Container"; check the extensions after starting the Dev Container.
#### Kernel: Python 3.8+

## Setup

In [1]:
import os
from sys import path as sys_path
from shutil import rmtree

sys_path.insert(0, '.') # Import repo's evadb

from evadb import connect
from evadb.mojo import MOJO_BUILTINS_PATH
from evadb.functions.function_bootstrap_queries import Similarity_function_query, Text_feat_function_query
if os.path.exists("evadb_data"):
    rmtree("evadb_data", ignore_errors=True)
print("⏳ Establishing evadb connection...")
cursor = connect().cursor()
cursor.query(Similarity_function_query).execute()

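# Pre-download the sentence-transformers model so the download does not happen during the benchmark.
# Run it in a separate process so that importing sentence_transformers here does not skew the Python benchmark.
os.system('python3 -c \'__import__("sentence_transformers").SentenceTransformer("all-MiniLM-L6-v2").encode(["hi"])\'')
pass  # Trailing statement so the cell does not display os.system's return code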

⏳ Establishing evadb connection...



## Benchmark: Load and Setup Time: Python vs Mojo

### Python

In [2]:
%%time
cursor.query(Text_feat_function_query).execute()
pythonQ = cursor.query("SELECT Similarity(SentenceFeatureExtractor('hi').features, SentenceFeatureExtractor('bye').features)")

CPU times: user 2.24 s, sys: 1.34 s, total: 3.59 s
Wall time: 1.67 s


### Mojo

In [3]:
%%time
cursor.query(f"CREATE OR REPLACE FUNCTION SentenceTransformerFeatureExtractor IMPL '{MOJO_BUILTINS_PATH}'").execute()
mojoQ = cursor.query("SELECT Similarity(SentenceTransformerFeatureExtractor('hi').features, SentenceTransformerFeatureExtractor('bye').features)")

CPU times: user 80.7 ms, sys: 1.74 ms, total: 82.4 ms
Wall time: 3.65 s


## Benchmark: Function Execution Time: Python vs Mojo

### Python

In [4]:
%%timeit
pythonQ.df()

698 ms ± 25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Mojo

In [5]:
%%timeit
mojoQ.df()

148 ms ± 5.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
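
Outside a notebook, figures in the same "mean ± std. dev." form can be approximated with the standard library. The sketch below is only an illustration: `run_query` is a hypothetical stand-in for `pythonQ.df()` or `mojoQ.df()`, and the choice of 7 runs of 10 loops simply mirrors the Mojo cell above.

```python
import statistics
import timeit


def run_query():
    """Hypothetical stand-in for pythonQ.df() or mojoQ.df()."""
    return sum(i * i for i in range(10_000))


def benchmark(func, repeat=7, number=10):
    # timeit.repeat returns, for each of `repeat` runs,
    # the total time taken by `number` calls of func.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


mean_s, std_s = benchmark(run_query)
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms per loop")
```

Swapping `run_query` for the real query call would give numbers comparable to the `%%timeit` output, assuming the same connection and warm caches.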


## Mojo Features: multiple functions per binary, support for non-builtin functions

In [6]:
%%time
cursor.query("CREATE OR REPLACE FUNCTION CustomSentenceTransformerFeatureExtractor1 IMPL './mojo-demo/CustomSourceSTFES'").execute()

CPU times: user 73.2 ms, sys: 522 µs, total: 73.8 ms
Wall time: 3.6 s


<evadb.models.storage.batch.Batch at 0x7f7790e2e650>

In [7]:
%%time
cursor.query("CREATE OR REPLACE FUNCTION CustomSentenceTransformerFeatureExtractor2 IMPL './mojo-demo/CustomSourceSTFES'").execute()

CPU times: user 50.3 ms, sys: 3.42 ms, total: 53.7 ms
Wall time: 371 ms


<evadb.models.storage.batch.Batch at 0x7f7790e2eef0>

In [8]:
%%time
cursor.query("SELECT Similarity(CustomSentenceTransformerFeatureExtractor1('hi').features, CustomSentenceTransformerFeatureExtractor2('bye').features)").df()

CPU times: user 328 ms, sys: 0 ns, total: 328 ms
Wall time: 257 ms


|   | distance |
|---|----------|
| 0 | 1.262829 |


## Stop all Mojo processes (done by atexit in regular Python scripts)

In [9]:
from evadb.mojo import MojoController
MojoController.stop_all()
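
A notebook kernel stays alive after the last cell, so exit handlers do not fire until the kernel shuts down, which is why the explicit call is needed here. As a rough sketch of the `atexit` pattern mentioned in the heading, the example below uses `DemoController`, a hypothetical stand-in class; it is not EvaDB's `MojoController` and makes no claim about how that class is implemented.

```python
import atexit


class DemoController:
    """Hypothetical stand-in for a process controller (not EvaDB's MojoController)."""

    _running = []

    @classmethod
    def start(cls, name):
        # Record a (pretend) worker process as running.
        cls._running.append(name)

    @classmethod
    def stop_all(cls):
        # Stop every recorded worker and report which ones were stopped.
        stopped = list(cls._running)
        cls._running.clear()
        return stopped


# In a regular script, this runs the cleanup when the interpreter exits.
atexit.register(DemoController.stop_all)

DemoController.start("worker-1")
```

Calling `DemoController.stop_all()` by hand, as the cell above does with the real controller, is harmless in this sketch: the handler registered with `atexit` then finds nothing left to stop.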

## Stop EvaDB

In [10]:
cursor.close()
if os.path.exists("evadb_data"):
    rmtree("evadb_data", ignore_errors=True)

# Benchmark Conclusion

## Function Load & Setup

### Function load and setup is faster in Python (1.67 s vs. 3.65 s), as some of the imports are shared. However, Mojo supports multiple functions in a single binary, and only the first function loaded from each binary incurs the large overhead: for ./mojo-demo/CustomSourceSTFES, the first function load took 3.6 s, while the second took 371 ms, which is primarily the model load.

## Function Execution

### Function execution was about 4.7x faster in Mojo (148 ms ± 5.43 ms vs. 698 ms ± 25 ms in Python), even though the input and output for Mojo had to be serialized and deserialized on each end. This suggests there is room for improvement in the Python implementation, which should be at least as fast, because most of the execution in the Mojo process actually happens through CPython.
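
The ratios behind these conclusions can be checked with a few lines of arithmetic. The timings below are copied from the cell outputs in this notebook; the variable names are only illustrative.

```python
# Wall-clock timings copied from the cell outputs above, in seconds.
python_setup_s = 1.67
mojo_setup_s = 3.65
first_load_s = 3.6     # first function loaded from ./mojo-demo/CustomSourceSTFES
second_load_s = 0.371  # second function loaded from the same binary
python_exec_s = 0.698
mojo_exec_s = 0.148

setup_ratio = mojo_setup_s / python_setup_s
reload_ratio = first_load_s / second_load_s
exec_speedup = python_exec_s / mojo_exec_s

print(f"Mojo setup took {setup_ratio:.1f}x as long as Python setup")
print(f"First load took {reload_ratio:.1f}x as long as the second load")
print(f"Mojo execution was {exec_speedup:.1f}x faster than Python")
```

These are single-run figures for setup and means over 7 runs for execution, so the ratios are indicative rather than precise.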