In [2]:
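from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Newer LangChain releases deprecate this import in favour of `from langchain_ollama import OllamaEmbeddings`
from langchain_community.embeddings import OllamaEmbeddings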

In [3]:
loader = TextLoader(
    file_path="E:/Downloads/Code Basics Resume Project Challenge/metadata.txt",
    encoding="utf-8",  # assumes the file is UTF-8; avoids mojibake when Windows defaults to cp1252
)

document = loader.load()
document

[Document(metadata={'source': 'E:/Downloads/Code Basics Resume Project Challenge/metadata.txt'}, page_content='Engineering Data: \n\n- Documents FinSolve’s complete technical architecture and engineering processes.\n- Includes microservices, CI/CD pipelines, security models, and compliance (GDPR, DPDP, PCI-DSS).\n- Covers development standards, DevOps practices, monitoring, and future tech roadmap (AI, blockchain).\n- Owned by the Engineering Team; updated quarterly.\n- Access is restricted to Engineering Team and C-Level Executives due to high sensitivity.\n- Serves as a reference for audits, onboarding, scaling, and system maintenance.\n\n\nFinance Department Data\n\n- Documents FinSolve’s quarterly financial performance for the year 2024.\n- Includes revenue, income, gross margin, marketing spend, vendor costs, and cash flow data.\n- Provides detailed expense breakdowns and risk mitigation strategies for each quarter.\n- Owned by the Finance Team; updated quarterly.\n- Access is
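On Windows, reading a UTF-8 file without an explicit encoding can turn a curly apostrophe into `â€™`, because Python then falls back to the locale codec, which is often cp1252. That is why passing `encoding="utf-8"` to `TextLoader` matters there. A minimal standard-library sketch of the mix-up and how it can be reversed (the sample string is illustrative):

```python
# The right single quotation mark (U+2019) is three bytes in UTF-8: E2 80 99.
original = "FinSolve\u2019s"

# Decoding those UTF-8 bytes as cp1252 maps each byte to its own character,
# producing the familiar "â€™" mojibake.
garbled = original.encode("utf-8").decode("cp1252")

# Reversing the mistake (re-encode as cp1252, decode as UTF-8) recovers the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```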

In [9]:
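splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,    # maximum characters per chunk
    chunk_overlap=50,  # up to 50 characters repeated between neighbouring chunks to keep context
)

splitted_doc = splitter.split_documents(document)
splitted_doc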

[Document(metadata={'source': 'E:/Downloads/Code Basics Resume Project Challenge/metadata.txt'}, page_content='Engineering Data:'),
 Document(metadata={'source': 'E:/Downloads/Code Basics Resume Project Challenge/metadata.txt'}, page_content='- Documents FinSolve’s complete technical architecture and engineering processes.\n- Includes microservices, CI/CD pipelines, security models, and compliance (GDPR, DPDP, PCI-DSS).\n- Covers development standards, DevOps practices, monitoring, and future tech roadmap (AI, blockchain).'),
 Document(metadata={'source': 'E:/Downloads/Code Basics Resume Project Challenge/metadata.txt'}, page_content='- Owned by the Engineering Team; updated quarterly.\n- Access is restricted to Engineering Team and C-Level Executives due to high sensitivity.\n- Serves as a reference for audits, onboarding, scaling, and system maintenance.'),
 Document(metadata={'source': 'E:/Downloads/Code Basics Resume Project Challenge/metadata.txt'}, page_content='Finance Departm
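`RecursiveCharacterTextSplitter` tries a list of separators in order (by default paragraph breaks `"\n\n"`, then newlines, then spaces, then single characters) and only falls back to a finer separator when a piece is still longer than `chunk_size`. That is why the first chunk above is just the heading. The function below is a hypothetical, simplified stand-in written for illustration: it ignores `chunk_overlap` and whitespace stripping, whereas LangChain's splitter also carries trailing text from one chunk into the next, up to about `chunk_overlap` characters.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitting (no overlap): prefer coarse
    separators and recurse with finer ones only for pieces still too long."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard-cut the text into fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging pieces into this chunk
            continue
        if current:
            chunks.append(current)  # close the chunk that was being built
        if len(part) <= chunk_size:
            current = part
        else:
            # This piece alone is too long: split it with the next, finer separator.
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


text = "Engineering Data:\n\n- Owned by the Engineering Team; updated quarterly."
for chunk in recursive_split(text, chunk_size=40):
    print(repr(chunk))
```

With `chunk_size=40` the heading stays on its own (split at `"\n\n"`), and the 51-character bullet is broken at spaces into two pieces.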

### Ollama Embeddings:

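Ollama runs open-source models, including embedding models, locally on your own machine. LangChain's `OllamaEmbeddings` class sends text to a running Ollama server and returns one embedding vector per input; those vectors can then be stored in a vector database and searched. Here, I am going to use one of my favourite embedding models from Ollama, `nomic-embed-text`, which has to be pulled first with `ollama pull nomic-embed-text`.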
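Under the hood, an embeddings wrapper talks to the local Ollama server over HTTP. A minimal standard-library sketch, assuming Ollama's `/api/embed` endpoint on the default port 11434; the helper names `build_embed_request` and `embed` are hypothetical, and LangChain's class may call a different endpoint internally. Only `build_embed_request` runs without a server.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_embed_request(texts, model="nomic-embed-text", url=OLLAMA_URL):
    """Build (but do not send) a POST request asking Ollama to embed a batch of texts."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def embed(texts, model="nomic-embed-text", url=OLLAMA_URL):
    """Send the request; the JSON reply holds one vector per input under "embeddings"."""
    with urllib.request.urlopen(build_embed_request(texts, model, url)) as resp:
        return json.loads(resp.read())["embeddings"]


# Usage (needs `ollama serve` running and `ollama pull nomic-embed-text` done first):
# vectors = embed(["Engineering Data:", "Finance Department Data"])
```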

In [12]:
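embeddings = OllamaEmbeddings(model="nomic-embed-text")  # needs a running Ollama server
# embed_documents expects a list of strings, so pass each chunk's text, not the Document objects
embed_docs = embeddings.embed_documents([doc.page_content for doc in splitted_doc])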

In [16]:
embed_docs

[[0.49308913946151733,
  0.7686890363693237,
  -2.7216639518737793,
  -1.0686156749725342,
  0.9821287393569946,
  -0.2816486358642578,
  0.14071878790855408,
  0.12401368468999863,
  -0.5363777279853821,
  -0.22546806931495667,
  -0.12855255603790283,
  0.20276857912540436,
  2.1535391807556152,
  0.5416999459266663,
  -0.16100960969924927,
  0.4835439920425415,
  -0.1498449146747589,
  -1.4101601839065552,
  0.24420836567878723,
  0.48189520835876465,
  0.19165897369384766,
  -0.7227255702018738,
  -0.6575222611427307,
  -0.565708577632904,
  2.4482409954071045,
  0.15589101612567902,
  -0.05922819301486015,
  -0.3083973228931427,
  -0.9064579010009766,
  -0.6021285057067871,
  0.37460342049598694,
  -0.1490212082862854,
  0.3774639964103699,
  -0.2091749757528305,
  -1.6450741291046143,
  -0.9560970664024353,
  0.2652163803577423,
  0.1831929087638855,
  -1.084469199180603,
  0.32526153326034546,
  0.8134956955909729,
  -0.7630727291107178,
  -0.9329143166542053,
  -0.35822945833206
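Each inner list above is the embedding of one chunk, and `len(embed_docs[0])` gives its dimensionality. Vector stores rank chunks by comparing such vectors with a distance or similarity measure. A minimal plain-Python sketch of cosine similarity, one common measure; the helper name `cosine_similarity` is hypothetical and the 2-D vectors are toy stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-D vectors standing in for real embeddings.
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

With the notebook's variables, `cosine_similarity(embed_docs[0], embed_docs[1])` would compare the first two chunks; values closer to 1.0 mean the vectors point in more similar directions.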

### ChromaDB:

Chroma is an AI-native, open-source vector database focused on developer productivity and happiness. It is licensed under Apache 2.0.

We can save the vector database to local disk by passing the `persist_directory` parameter, so the stored chunks and their embeddings can be reloaded later instead of being recomputed.
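In a later session, `Chroma(persist_directory="./vectordb_fin_metadata", embedding_function=embeddings)` reopens the saved collection. To show the idea of persistence without Chroma or Ollama, here is a toy sketch that writes chunk texts and vectors to a JSON file and reads them back; the file format and the helpers `save_store` and `load_store` are hypothetical and bear no relation to Chroma's on-disk layout.

```python
import json
import os
import tempfile


def save_store(path, texts, vectors):
    """Write chunk texts and their vectors to a JSON file (hypothetical toy format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"texts": texts, "vectors": vectors}, f)


def load_store(path):
    """Read the texts and vectors back, as a later session would."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["texts"], data["vectors"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.json")
    save_store(path, ["Engineering Data:"], [[0.1, -0.2, 0.3]])
    texts, vectors = load_store(path)
    print(texts, vectors)
```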

In [18]:
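from langchain_chroma import Chroma
# Embeds every chunk with `embeddings` and writes the collection to ./vectordb_fin_metadata
vectordb = Chroma.from_documents(documents=splitted_doc, embedding=embeddings, persist_directory="./vectordb_fin_metadata")
vectordb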

<langchain_chroma.vectorstores.Chroma at 0x1976c90a3c0>

In [15]:
vectordb.similarity_search(query="how many departments are there in finsolve technologies?")[0].page_content

'- Documents FinSolve’s complete technical architecture and engineering processes.\n- Includes microservices, CI/CD pipelines, security models, and compliance (GDPR, DPDP, PCI-DSS).\n- Covers development standards, DevOps practices, monitoring, and future tech roadmap (AI, blockchain).'
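`similarity_search` embeds the query with the same model and returns the `k` stored chunks closest to it (4 by default); `[0]` keeps only the best match. Here the top hit is an engineering chunk, which shows that retrieval returns the nearest chunk rather than a direct answer to a counting question. Chroma uses an approximate HNSW index with L2 distance by default; the sketch below swaps in an exact brute-force scan using Euclidean distance (`math.dist`) just to show the ranking idea. The function `top_k` and the toy vectors are hypothetical.

```python
import math


def top_k(query_vec, doc_vecs, texts, k=4):
    """Brute-force nearest neighbours: rank every stored vector by Euclidean distance."""
    scored = sorted(
        zip(texts, doc_vecs),
        key=lambda pair: math.dist(query_vec, pair[1]),  # smaller distance = closer
    )
    return [text for text, _ in scored[:k]]


# Toy 2-D "embeddings" for three chunks.
texts = ["engineering chunk", "finance chunk", "hr chunk"]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(top_k([0.8, 0.2], doc_vecs, texts, k=2))
```

An exhaustive scan like this is fine for a handful of chunks; an index such as HNSW trades a little accuracy for much faster lookups on large collections.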