# Benchmark (Merge, Save, Load)
[Vector Stores](https://python.langchain.com/docs/integrations/vectorstores)

Of the vector stores chosen for this benchmark, only FAISS supports merging two stores.

In [20]:
from langchain_community.embeddings import GPT4AllEmbeddings
import markdown_splitter as mdS
import os
import subprocess
import tracemalloc

# The query targets content from data/reusables/open-source/open-source-guide-general.md
query = "Where can I find guidance on creating and nurturing an open source project"

embeddings = GPT4AllEmbeddings()

folder_name = "Local"

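# Clone the GitHub docs repository once; skip the clone if the folder already exists
if not os.path.exists(folder_name):
    os.makedirs(folder_name)
    # check=True raises an error if the clone fails instead of continuing silently
    subprocess.run(["git", "clone", "https://github.com/github/docs.git", folder_name], check=True)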

docs = mdS.create_db(path=folder_name, glob="**/*.md", split=None)

bert_load_from_file: gguf version     = 2
bert_load_from_file: gguf alignment   = 32
bert_load_from_file: gguf data offset = 695552
bert_load_from_file: model name           = BERT
bert_load_from_file: model architecture   = bert
bert_load_from_file: model file type      = 1
bert_load_from_file: bert tokenizer vocab = 30522


100%|█████████▉| 5951/5956 [00:02<00:00, 2466.25it/s]


In [21]:
# Split the documents into two halves so that two separate stores can be merged later
half = len(docs) // 2

docsPart1 = docs[:half]
docsPart2 = docs[half:]

## FAISS

In [3]:
tracemalloc.start()
print(tracemalloc.get_traced_memory())

# Store
from langchain_community.vectorstores import FAISS as VectorStore
docsearchPart1 = await VectorStore.afrom_documents(docsPart1, embeddings)
docsearchPart2 = await VectorStore.afrom_documents(docsPart2, embeddings)

print(tracemalloc.get_traced_memory())
tracemalloc.stop()

(928, 11633)
(130635632, 151633809)
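
The tuples printed above are `(current, peak)` traced memory in bytes, as returned by `tracemalloc.get_traced_memory()`. Since the same start/print/stop pattern recurs in every cell, it could be wrapped in a small context manager. The sketch below is one possible way to do that; `traced_memory` and the sample workload are hypothetical and not part of the notebook. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so memory allocated natively inside FAISS is probably not reflected in these numbers.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def traced_memory(label):
    """Measure current and peak traced memory (in bytes) for the enclosed block.

    Hypothetical helper: yields a dict that is filled in when the block exits.
    """
    stats = {}
    tracemalloc.start()
    try:
        yield stats
    finally:
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        stats["current"] = current
        stats["peak"] = peak
        print(f"{label}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")


# Stand-in workload; in the notebook this would be the store, merge, or load step.
with traced_memory("allocate list") as stats:
    data = [bytes(1000) for _ in range(10_000)]
```

Reporting megabytes with a label makes the per-step numbers easier to compare than raw byte tuples.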


In [4]:
tracemalloc.start()
print(tracemalloc.get_traced_memory())

# Merge the second store into the first (modifies docsearchPart1 in place)
docsearchPart1.merge_from(docsearchPart2)

print(tracemalloc.get_traced_memory())
tracemalloc.stop()

(1056, 11761)
(1425961, 2242909)


In [6]:
tracemalloc.start()
print(tracemalloc.get_traced_memory())

# Save
docsearchPart1.save_local("faiss_index")

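# Load (recent langchain_community releases additionally require
# allow_dangerous_deserialization=True here, because the docstore is pickled)
new_db = VectorStore.load_local("faiss_index", embeddings)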

print(tracemalloc.get_traced_memory())
tracemalloc.stop()

(44103486, 45295178)
(78782387, 87635148)
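
To complement the memory figures, query latency can be recorded per call rather than as a single total for the whole cell. The following is a minimal sketch using only the standard library; `time_calls` is a hypothetical helper, and the lambda is a stand-in for a call such as `new_db.similarity_search(query)`.

```python
import statistics
import time


def time_calls(fn, repeats=10):
    """Call fn() `repeats` times and return the wall-clock duration of each call in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload; replace the lambda with the search call being benchmarked.
timings = time_calls(lambda: sum(range(100_000)), repeats=10)
print(f"mean={statistics.mean(timings) * 1e3:.2f} ms, "
      f"median={statistics.median(timings) * 1e3:.2f} ms, "
      f"max={max(timings) * 1e3:.2f} ms")
```

Keeping the individual timings makes it possible to see whether the first call is slower than the rest, which a single total hides.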


In [8]:
%%time
# Run the query 10 times for timing; the final call below displays the results
for _ in range(10):
    new_db.similarity_search(query)

new_db.similarity_search(query)

CPU times: user 2.6 s, sys: 21 ms, total: 2.62 s
Wall time: 699 ms


[Document(page_content="For more information on open source, specifically how to create and grow an open source project, we've created [Open Source Guides](https://opensource.guide/) that will help you foster a healthy open source community."),
 Document(page_content="For more information on open source, specifically how to create and grow an open source project, we've created [Open Source Guides](https://opensource.guide/) that will help you foster a healthy open source community by recommending best practices for creating and maintaining repositories for your open source project."),
 Document(page_content='There are a variety of ways that you can contribute to open source projects.', metadata={'Header 2': 'Validating an issue or pull request'}),
 Document(page_content='- "[Your Code of Conduct](https://opensource.guide/code-of-conduct/)" from the Open Source Guides\n- "[Building Welcoming Communities](https://opensource.guide/building-community/)" from the Open Source Guides\n- "[Lea