Speed test with SBERT embeddings on different machines

In [1]:
import pandas as pd
import numpy as np

from pathlib import Path

In [2]:
# load the dataset

dataset_path = Path('../../dataset/topic_modelling/top_10_games/00_Terraria.pkl')

dataset = pd.read_pickle(dataset_path)

dataset.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Index: 75499 entries, 57735 to 133233
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   index         75499 non-null  int64 
 1   app_id        75499 non-null  int64 
 2   app_name      75499 non-null  object
 3   review_text   75499 non-null  object
 4   review_score  75499 non-null  int64 
 5   review_votes  75499 non-null  int64 
dtypes: int64(4), object(2)
memory usage: 4.0+ MB


In [3]:
X = dataset['review_text'].values
X = X[:5000]        # take the first 5,000 reviews as an example

In [4]:
import platform
import torch

if platform.system() in ('Linux', 'Windows'):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
else:
    # macOS: use the Metal (MPS) backend only if it is actually available
    device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')

print(device)

cpu


In [6]:
import warnings

def check_max_local_length(max_seq_length, texts):
    """Warn if the longest text (in whitespace-separated words) exceeds the model's token limit."""
    max_local_length = np.max([len(t.split()) for t in texts])
    if max_local_length > max_seq_length:
        # warnings.warn() emits a UserWarning by default, so filter on that category
        warnings.simplefilter("always", UserWarning)
        warnings.warn(
            f"The longest document in your collection has {max_local_length} words, but the model "
            f"truncates its input to {max_seq_length} tokens."
        )

In [5]:
# load sbert module and test embedding

import warnings
from sentence_transformers import SentenceTransformer

BATCH_SIZE = 16

model = SentenceTransformer('all-MiniLM-L6-v2')     # our default model

check_max_local_length(model.max_seq_length, X)

X_embeddings = np.array(
    model.encode(X, batch_size=BATCH_SIZE, show_progress_bar=True)
)

Batches: 100%|██████████| 313/313 [01:20<00:00,  3.87it/s]


Windows, CPU-only observations (batch size = 16)
- The beginning is slow, around 1-2 s/it
- It then speeds up to 4-6 it/s
- The speed seems to depend on the length of the reviews

Overall, 5000 reviews take 02:27 (as low as 01:20 in some runs) on a CPU-only machine
- This should be acceptable for deployment in the VM
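
To put the two wall-clock times above on the same scale as the progress bar, they can be converted into reviews per second. This is a minimal sketch that uses only the figures quoted above; the helper name `reviews_per_second` is hypothetical and not part of the notebook.

```python
def reviews_per_second(n_reviews: int, elapsed: str) -> float:
    """Convert a tqdm-style 'MM:SS' elapsed time into reviews per second."""
    minutes, seconds = elapsed.split(":")
    total_seconds = int(minutes) * 60 + int(seconds)
    return n_reviews / total_seconds


N_REVIEWS = 5000
slow = reviews_per_second(N_REVIEWS, "02:27")   # slowest run noted above
fast = reviews_per_second(N_REVIEWS, "01:20")   # fastest run noted above
print(f"slow run: {slow:.1f} reviews/s, fast run: {fast:.1f} reviews/s")
```

This gives roughly 34 reviews per second for the slow run and 62.5 for the fast one.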

Memory is not an issue: running this notebook in WSL consumes <=3 GB of RAM before executing the cell above.  
It can be executed in an 8 GB WSL environment, with actual RAM usage around 2-3 GB.

In [7]:
# load sbert module and test embedding

import warnings
from sentence_transformers import SentenceTransformer

BATCH_SIZE = 1      # intentionally set to 1 to test real-time response

model = SentenceTransformer('all-MiniLM-L6-v2')     # our default model

check_max_local_length(model.max_seq_length, X)

X_embeddings = np.array(
    model.encode(X, batch_size=BATCH_SIZE, show_progress_bar=True)
)

Batches: 100%|██████████| 5000/5000 [01:56<00:00, 42.86it/s] 


With batch size = 1, around 15 reviews per second can be processed in the 8 GB, CPU-only WSL environment (about 35 reviews per second for shorter reviews). The run above averaged about 43 reviews per second overall (5000 reviews in 01:56).

From the middle to the end of the run (cache locality? or are the reviews simply shorter?), the speed can reach 80-90 reviews per second.
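
One way to separate the two explanations is to time the encoder on groups of reviews of similar length: if throughput falls steadily as the groups get longer, review length is the likely cause rather than caching. The following is a sketch only. `throughput_by_length` and its `encode_fn` parameter are hypothetical names; `encode_fn` stands for any callable that embeds a list of texts.

```python
import time
from typing import Callable, List, Sequence, Tuple


def throughput_by_length(
    texts: Sequence[str],
    encode_fn: Callable[[List[str]], object],
    n_buckets: int = 4,
) -> List[Tuple[int, int, float]]:
    """Time encode_fn on length-sorted buckets of texts.

    Returns one (min_words, max_words, reviews_per_second) tuple per bucket,
    ordered from the shortest reviews to the longest.
    """
    # Sort by whitespace word count so each bucket holds similar-length texts
    ordered = sorted(texts, key=lambda t: len(t.split()))
    bucket_size = max(1, -(-len(ordered) // n_buckets))  # ceiling division
    results = []
    for start in range(0, len(ordered), bucket_size):
        bucket = ordered[start:start + bucket_size]
        t0 = time.perf_counter()
        encode_fn(bucket)
        elapsed = time.perf_counter() - t0
        lengths = [len(t.split()) for t in bucket]
        rate = len(bucket) / elapsed if elapsed > 0 else float("inf")
        results.append((min(lengths), max(lengths), rate))
    return results
```

With the objects from the cell above, a call could look like `throughput_by_length(list(X), lambda batch: model.encode(batch, batch_size=1))`. The first bucket also absorbs any warm-up cost, so it may be worth running the function twice and comparing the second run.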

In [8]:
# average length of the reviews
print('Average length:', np.mean([len(t.split()) for t in X]))
print()

# median length of the reviews
print('Median length:', np.median([len(t.split()) for t in X]))
print()

Average length: 61.839

Median length: 25.0
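
A mean (about 62 words) well above the median (25 words) indicates a right-skewed length distribution: most reviews are short, and a few are very long. A slightly fuller summary shows how long the tail is and what share of reviews exceed the model limit. This is a sketch using only the standard library. `length_summary` is a hypothetical helper, and it counts whitespace-separated words, which generally undercounts model tokens.

```python
import statistics
from typing import Dict, Sequence


def length_summary(texts: Sequence[str], max_seq_length: int) -> Dict[str, float]:
    """Summarise word-count lengths and the share of texts above a limit."""
    lengths = sorted(len(t.split()) for t in texts)
    deciles = statistics.quantiles(lengths, n=10)  # 9 cut points; needs >= 2 texts
    over = sum(1 for n in lengths if n > max_seq_length)
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "p90": deciles[-1],                 # 90th percentile of word counts
        "max": lengths[-1],
        "share_over_limit": over / len(lengths),
    }
```

In this notebook it could be called as `length_summary(list(X), model.max_seq_length)`.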

