# Mock Vector Db Handler

This class is a mock handler that simulates a vector database. It is intended mainly for testing and development.
It provides text embedding, Hierarchical Navigable Small World (HNSW) search,
and basic data management in a simulated environment that behaves like a vector database.


```python
import sys
sys.path.append('../')
from python_modules.mock_vector_database import MockVecDbHandler
```



## Usage examples

The examples cover:
1. Basic data insertion and retrieval
2. Text embedding and searching
3. Advanced filtering and removal
4. Testing the HNSW search algorithm
5. Simulating database connection and persistence
6. Error handling and tolerance
7. Logging and monitoring
8. Customization and configuration


### 1. Basic Data Insertion and Retrieval

```python
# Initialization
handler = MockVecDbHandler()
# Initialize empty database
handler.establish_connection()

# Insert data, embedding the "text" field of each record
values_dict = {
    "record1": {"text": "Sample text 1"},
    "record2": {"text": "Sample text 2"}
}
handler.insert_values_dict(values_dict, "text")

# Retrieve data
handler.filter_keys(subkey="text", subvalue="Sample text 1")
handler.search_database_keys(query='text')
results = handler.get_dict_results(return_keys_list=["text"])
print(results)
```

```
Load pretrained SentenceTransformer: all-MiniLM-L6-v2
Use pytorch device: cpu

[{'text': 'Sample text 1'}]
```
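To make the data layout concrete, here is a minimal stdlib-only sketch of what `insert_values_dict(values_dict, "text")` appears to do, judging by the records printed in the later examples: each record keeps its fields and gains an `embedding` entry computed from the chosen field. The helpers `toy_embed`, `insert_values`, and `filter_keys` are hypothetical, and the letter-count "embedding" is only a stand-in for the SentenceTransformer model the handler actually loads.

```python
from typing import Callable, List


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in embedding: counts of each letter a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def insert_values(store: dict, values: dict, var_for_embedding: str,
                  embed: Callable[[str], List[float]] = toy_embed) -> None:
    """Copy each record into the store and attach an embedding of one field."""
    for key, record in values.items():
        entry = dict(record)  # avoid mutating the caller's dict
        entry["embedding"] = embed(record[var_for_embedding])
        store[key] = entry


def filter_keys(store: dict, subkey: str, subvalue) -> List[str]:
    """Return the keys of records whose `subkey` field equals `subvalue`."""
    return [k for k, rec in store.items() if subkey in rec and rec[subkey] == subvalue]


store: dict = {}
insert_values(store, {"record1": {"text": "Sample text 1"},
                      "record2": {"text": "Sample text 2"}}, "text")
matches = filter_keys(store, "text", "Sample text 1")
print([{"text": store[k]["text"]} for k in matches])  # [{'text': 'Sample text 1'}]
```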





### 2. Text Embedding and Searching

```python
# Text Embedding
query = "Sample query"
embedded_query = handler.embed(query)

# Search Database
handler.search_database(query)
search_results = handler.get_dict_results(return_keys_list=["text"])

# Display Results
print(search_results)
```

```
[{'text': 'Sample text 1'}]
```
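Semantic search over such records usually ranks them by cosine similarity between the query embedding and each stored embedding. The brute-force sketch below shows that ranking step. `cosine` and `search` are hypothetical names, this notebook does not say which similarity measure `search_database` uses, and the toy letter-count embedding again replaces the real model.

```python
import math
from typing import Dict, List


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in embedding: counts of each letter a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(store: Dict[str, dict], query: str, k: int = 1) -> List[str]:
    """Return the keys of the k records most similar to the query text."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda key: cosine(q, store[key]["embedding"]), reverse=True)
    return ranked[:k]


texts = {"record1": "Sample text 1", "record2": "Another text", "record3": "Zebra jazz"}
store = {key: {"text": t, "embedding": toy_embed(t)} for key, t in texts.items()}
print(search(store, "sample", k=1))  # ['record1']
```

Exhaustive scoring like this is exact but costs one comparison per record, which is why approximate indexes such as HNSW exist.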





### 3. Advanced Filtering and Removal

```python
# Advanced Filtering
filter_criteria = {"text": "Sample text 1"}
handler.filter_database(filter_criteria)
filtered_data = handler.filtered_data
print(filtered_data)

# Data Removal
handler.remove_from_database(filter_criteria)
print(handler.data)
```

```
{'record1': {'text': 'Sample text 1', 'embedding': array([-1.71357505e-02,  5.65761924e-02, -3.18394508e-03,  2.31849682e-02,
        2.30688341e-02,  2.13305708e-02,  1.15358695e-01,  6.18852377e-02,
        ...
```
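`filter_database(filter_criteria)` and `remove_from_database(filter_criteria)` both take a dict of field/value pairs. One plausible reading, sketched below with hypothetical helpers, is that a record matches when every criterion field is present and equal; the actual matching rules of `MockVecDbHandler` are not shown here and may differ (for example, by supporting partial matches).

```python
from typing import Dict


def matches(record: dict, criteria: dict) -> bool:
    """True when every criterion field is present in the record with an equal value."""
    return all(field in record and record[field] == value
               for field, value in criteria.items())


def filter_records(data: Dict[str, dict], criteria: dict) -> Dict[str, dict]:
    """Keep only the records that match all criteria."""
    return {key: rec for key, rec in data.items() if matches(rec, criteria)}


def remove_records(data: Dict[str, dict], criteria: dict) -> Dict[str, dict]:
    """Return a new dict without the matching records (the input is left unchanged)."""
    return {key: rec for key, rec in data.items() if not matches(rec, criteria)}


data = {"record1": {"text": "Sample text 1"}, "record2": {"text": "Sample text 2"}}
criteria = {"text": "Sample text 1"}
print(filter_records(data, criteria))  # {'record1': {'text': 'Sample text 1'}}
print(remove_records(data, criteria))  # {'record2': {'text': 'Sample text 2'}}
```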

### 4. Testing the HNSW Search Algorithm

```python
import numpy as np

# Compute embeddings for two sample texts with the handler's own embedder
# (assumes embed() returns one 1-D vector per input string)
texts = ["Sample text 1", "Sample text 2"]
embeddings = [handler.embed(text) for text in texts]

# Insert data with embeddings (this replaces the handler's in-memory data)
data_with_embeddings = {"record1": {"embedding": embeddings[0]}, "record2": {"embedding": embeddings[1]}}
handler.data = data_with_embeddings

# HNSW search: querying with embeddings[0], so the top hit is expected to be index 0
query_embedding = embeddings[0]
labels, distances = handler.hnsw_search(query_embedding, np.array(embeddings), k=1)
print(labels, distances)
```
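Because HNSW is an approximate method, a common way to test it is to compare its neighbours against an exact brute-force search over the same vectors and report recall. The sketch below computes exact k nearest neighbours by squared Euclidean distance plus a recall@k score. The helper names are hypothetical, and the distance metric `hnsw_search` uses is not stated in this notebook, so align the metric before comparing results.

```python
from typing import List, Sequence, Tuple


def exact_knn(query: Sequence[float], vectors: Sequence[Sequence[float]],
              k: int = 1) -> Tuple[List[int], List[float]]:
    """Return (labels, distances) of the k vectors closest to `query` by squared L2."""
    scored = [(sum((q - v) ** 2 for q, v in zip(query, vec)), i)
              for i, vec in enumerate(vectors)]
    scored.sort()  # ascending distance; ties broken by index
    top = scored[:k]
    return [i for _, i in top], [d for d, _ in top]


def recall_at_k(approx_labels: Sequence[int], exact_labels: Sequence[int]) -> float:
    """Fraction of the exact neighbours that the approximate search also returned."""
    if not exact_labels:
        return 1.0
    return len(set(approx_labels) & set(exact_labels)) / len(exact_labels)


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
labels, distances = exact_knn([0.9, 0.1], vectors, k=2)
print(labels)  # [1, 0]
```

In a test, feed the same query and vectors to `hnsw_search` and check that `recall_at_k` stays above a threshold you choose.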

### 5. Simulating Database Connection and Persistence

```python
# Establish Connection
handler.establish_connection()

# Change and Persist Data
handler.insert_values_dict({"record3": {"text": "New sample text"}}, "text")
handler.save_data()

# Reload Data
handler.establish_connection()
print(handler.data)
```

```
{'record1': {'text': 'Sample text 1', 'embedding': array([-1.71357505e-02,  5.65761924e-02, -3.18394508e-03,  2.31849682e-02,
        2.30688341e-02,  2.13305708e-02,  1.15358695e-01,  6.18852377e-02,
        ...
```
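This example relies on `save_data()` writing the records somewhere that `establish_connection()` reads back. The storage format is not shown in this notebook. A minimal sketch of such a round trip, assuming a JSON file and converting embeddings to plain lists of floats (NumPy arrays are not JSON-serializable), could look like this; `save_records`, `load_records`, and the file name are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict


def save_records(path: str, data: Dict[str, dict]) -> None:
    """Write records to JSON, turning any embedding sequence into a list of floats."""
    serializable = {}
    for key, rec in data.items():
        rec = dict(rec)  # copy so the caller's records are not modified
        if "embedding" in rec:
            rec["embedding"] = [float(x) for x in rec["embedding"]]
        serializable[key] = rec
    with open(path, "w", encoding="utf-8") as f:
        json.dump(serializable, f)


def load_records(path: str) -> Dict[str, dict]:
    """Return the saved records, or an empty store if nothing was persisted yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mock_db.json")  # hypothetical file name
    empty = load_records(path)
    save_records(path, {"record3": {"text": "New sample text", "embedding": (0.5, -1.0)}})
    reloaded = load_records(path)

print(reloaded)  # {'record3': {'text': 'New sample text', 'embedding': [0.5, -1.0]}}
```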




### 6. Error Handling and Tolerance

```python
# Simulate Errors
# Point the handler at an invalid embeddings URL and switch to the OpenAI backend to trigger an error
handler.embeddings_url = "http://invalid_url"
handler.model_type = 'openAI'
try:
    handler.embed("Test text")
except Exception as e:
    print("Error occurred:", e)
```

```
Error occurred: An exception has occurred during embedding!
```
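The cell above catches a bare `Exception`, which also hides unrelated bugs. A narrower pattern, sketched here assuming you wrap the embedding call yourself, converts backend failures into one dedicated exception type while chaining the original cause. `EmbeddingError`, `safe_embed`, and `broken_backend` are hypothetical; the message only mirrors the one printed above and this is not the handler's actual code.

```python
from typing import Callable


class EmbeddingError(RuntimeError):
    """Raised when the embedding backend fails."""


def safe_embed(embed_fn: Callable[[str], object], text: str):
    """Call the backend, re-raising expected failures as EmbeddingError."""
    try:
        return embed_fn(text)
    except (ConnectionError, TimeoutError, ValueError) as exc:
        # Chain the original error so the root cause stays visible in tracebacks
        raise EmbeddingError("An exception has occurred during embedding!") from exc


def broken_backend(text: str):
    """Hypothetical backend that cannot reach its server."""
    raise ConnectionError("cannot reach http://invalid_url")


caught = None
try:
    safe_embed(broken_backend, "Test text")
except EmbeddingError as err:
    caught = err
    print("Error occurred:", err)
```

Exceptions outside the listed types (for example a `TypeError` from a coding mistake) still propagate, so real bugs are not silently reported as embedding failures.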


### 7. Logging and Monitoring

```python
# Monitoring Operations
# Restore the embedding backend changed in example 6, then perform an operation and check logs
handler.model_type = 'sentence_transformer'
handler.insert_values_dict({"record4": {"text": "Another text"}}, "text")
```
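To assert on log output in tests rather than reading it by eye, one option is to attach an in-memory handler to Python's standard `logging` module. This sketch assumes the component logs through `logging`; the `Load pretrained SentenceTransformer` lines in example 1 appear to come from the sentence-transformers library's own logger. The logger name `mock_vector_database`, the `ListHandler` class, and `insert_with_logging` are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list instead of printing them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


logger = logging.getLogger("mock_vector_database")  # hypothetical logger name
logger.setLevel(logging.INFO)
capture = ListHandler()
capture.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(capture)


def insert_with_logging(store: dict, key: str, text: str) -> None:
    """Hypothetical operation that logs what it does."""
    store[key] = {"text": text}
    logger.info("Inserted %s", key)


store = {}
insert_with_logging(store, "record4", "Another text")
logger.removeHandler(capture)  # detach so later code is unaffected
print(capture.messages)  # ['INFO:Inserted record4']
```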


### 8. Customization and Configuration

```python
# Custom Settings: use a different SentenceTransformer model
custom_handler = MockVecDbHandler(st_model_name='distilbert-base-nli-mean-tokens')
# Initialize the database first, as in example 1; the original run skipped this step
# and raised TypeError: 'NoneType' object is not iterable
custom_handler.establish_connection()

# Operation with Custom Settings
custom_handler.insert_values_dict({"record5": {"text": "Custom model text"}}, "text")
custom_handler.search_database("Custom model text")
results = custom_handler.get_dict_results(return_keys_list=["text"])
print(results)
```
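Constructor keywords such as `st_model_name` are the configuration surface shown here, and the log in example 1 suggests the default model is `all-MiniLM-L6-v2`. One way to keep such settings explicit and testable is a frozen dataclass with defaults that individual runs override. The `HandlerSettings` class is hypothetical; only the `st_model_name`, `model_type`, and `embeddings_url` names and the `'sentence_transformer'`/`'openAI'` backend values come from this notebook.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional

# Assumption: only the two backends that appear in this notebook are accepted
KNOWN_BACKENDS = {"sentence_transformer", "openAI"}


@dataclass(frozen=True)
class HandlerSettings:
    st_model_name: str = "all-MiniLM-L6-v2"
    model_type: str = "sentence_transformer"
    embeddings_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail early on typos instead of at the first embedding call
        if self.model_type not in KNOWN_BACKENDS:
            raise ValueError(f"unknown model_type: {self.model_type!r}")


defaults = HandlerSettings()
custom = replace(defaults, st_model_name="distilbert-base-nli-mean-tokens")
print(asdict(custom))
```

The chosen value can then be passed on as `MockVecDbHandler(st_model_name=custom.st_model_name)`, as in the cell above; because `replace` re-runs validation, an invalid override is rejected immediately.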