In [26]:
# Enterprise Search Cookbook

Enterprise Search is a foundation for building Retrieval-Augmented Generation (RAG) pipelines. It answers questions using the content of your own documents and offers a simple API for indexing and querying document collections.
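
The retrieve-then-generate idea behind RAG can be sketched without any libraries. This is a toy illustration, not how Enterprise Search works internally (its logs further down show a dense embedding model and a Qdrant vector store). The `ToyIndex` class and `build_prompt` helper are hypothetical, and bag-of-words cosine similarity stands in for learned embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class ToyIndex:
    """Hypothetical in-memory index: bag-of-words vectors scored by cosine similarity."""

    def __init__(self):
        self.docs = {}  # doc_id -> (original text, term counts)

    def add(self, doc_id, text):
        self.docs[doc_id] = (text, Counter(tokenize(text)))

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def query(self, question, top_k=2):
        """Return up to top_k doc ids with a non-zero similarity, best first."""
        q = Counter(tokenize(question))
        scored = [(self._cosine(q, counts), doc_id) for doc_id, (_, counts) in self.docs.items()]
        scored.sort(reverse=True)
        return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def build_prompt(index, question, top_k=2):
    """Retrieval step of RAG: put the top-k documents into the prompt sent to an LLM."""
    context = "\n".join(index.docs[d][0] for d in index.query(question, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


index = ToyIndex()
index.add("a", "Attention weights values by query-key similarity.")
index.add("b", "Redis is an in-memory key-value store.")
print(index.query("How does attention work?", top_k=1))  # ['a']
```

In a real pipeline the string from `build_prompt` would be sent to an LLM; replacing the word counts with embedding vectors and the dictionary with a vector database gives roughly the shape of what the setup cells below configure.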

In [27]:
## 2. Setup and Prerequisites

# First, let's set up our environment:
# Make sure conda is installed on your system before running this cell.

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting package metadata (current_repodata.json): done
Solving environment: done



## Package Plan ##

  environment location: /home/deval/Document/Work/miniconda3/envs/es_env

  added / updated specs:
    - python=3.9


The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main 
  _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 
  ca-certificates    pkgs/main/linux-64::ca-certificates-2024.9.24-h06a4308_0 
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0 
  libffi             pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 
  libgomp            pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 
  li


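The conda output above shows an environment named `es_env` pinned to `python=3.9`. Assuming the notebook kernel is meant to run inside it, a quick sanity check can catch a kernel started from the wrong environment. `check_environment` is a hypothetical helper that relies on the `CONDA_DEFAULT_ENV` variable conda sets on activation; that variable may be unset when Jupyter launches the kernel through a kernelspec rather than from an activated shell, so treat a warning here as a prompt to look, not a hard failure.

```python
import os
import sys


def check_environment(expected_env="es_env", expected_python=(3, 9), environ=None, version_info=None):
    """Return a list of human-readable problems; an empty list means the kernel looks right."""
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    # conda exports the name of the active environment in CONDA_DEFAULT_ENV
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    # Compare only major.minor, matching the python=3.9 spec
    if tuple(version_info[:2]) != tuple(expected_python):
        found = ".".join(map(str, version_info[:2]))
        wanted = ".".join(map(str, expected_python))
        problems.append(f"Python {found} found, expected {wanted}")
    return problems


for problem in check_environment():
    print("Warning:", problem)
```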

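The `huggingface/tokenizers` warning in this cell's output names its own fix: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it runs in the first cell of a fresh kernel, before anything loads a tokenizer (here, before importing `llamasearch`). Setting it to `"false"` silences the warning by turning off tokenizer-level parallelism, which may make tokenization somewhat slower.

```python
import os

# Must run before any Hugging Face tokenizer is loaded (e.g. before importing llamasearch);
# "false" disables tokenizer parallelism so forking (e.g. via ! shell commands) no longer warns.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```
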
In [1]:
## 3. Configuration

# Let's set up the configuration:

# !cp .env.example .env

# Edit the .env file with your settings.
# You can append a setting with: !echo "KEY=VALUE" >> .env

# Stop any containers left over from a previous run, then start Redis and Qdrant in the background
!docker-compose -f docker/docker-compose.yml down
!docker-compose -f docker/docker-compose.yml up -d redis qdrant

# Set up the LLM (Ollama example)
!docker-compose -f docker/docker-compose-ollama.yml up -d

Stopping redis_doc_store     ... 
Stopping qdrant_vector_store ... 
Removing redis_doc_store     ... 
Removing qdrant_vector_store ... 
Removing network docker_default ... done
Creating network "docker_default" with the default driver
Creating qdrant_vector_store ... 
Creating redis_doc_store     ... 
ollama is up-to-date

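Appending with `echo "KEY=VALUE" >> .env` adds another line every time the cell is re-run, which can leave duplicate entries whose precedence depends on the loader. A small Python helper can update a key in place instead. `set_env_value` is hypothetical and handles only plain `KEY=VALUE` lines (no `export` prefixes or quoting rules); `KEY` and `VALUE` are placeholders, not real LlamaSearch settings.

```python
from pathlib import Path


def set_env_value(path, key, value):
    """Set KEY=VALUE in a .env file, replacing existing KEY lines instead of appending duplicates."""
    env_path = Path(path)
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        # Match on the text before the first '='; commented lines ("# KEY=...") are left alone
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    env_path.write_text("\n".join(lines) + "\n")
```

For example: `set_env_value(".env", "KEY", "VALUE")`.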

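`docker-compose up -d` returns once the containers are started, not when the services inside them accept connections, so it can help to wait for the ports before building the pipeline. An open port is a necessary but not sufficient sign of readiness. A minimal sketch: `wait_for_port` and `report_services` are hypothetical helpers, and the port numbers are assumptions, namely the Redis (6379) and Qdrant (6333) defaults plus Ollama's 11434 as seen in the pipeline logs below. Check `docker/docker-compose.yml` for the host ports actually published.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds; return False once timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def report_services(host="localhost", ports=None, timeout=10.0):
    """Print and return whether each service port accepts TCP connections."""
    # Assumed host ports: Redis and Qdrant defaults; 11434 is the Ollama URL in the setup logs.
    ports = ports or {"redis": 6379, "qdrant": 6333, "ollama": 11434}
    status = {}
    for name, port in ports.items():
        status[name] = wait_for_port(host, port, timeout=timeout)
        print(f"{name}:{port} {'ready' if status[name] else 'NOT reachable'}")
    return status
```

Call `report_services()` after the `up -d` commands.
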
In [2]:
## 4. Configuring and Running the Pipeline

# First, let's update some important configuration settings
from llamasearch.settings import config
from llamasearch.pipeline import Pipeline, setup_global_embed_model

# Override important settings
config.application.data_path = "./data/test_docs/"  # Path to your documents
config.vector_store_config.collection_name = "my_test_collection"
# config.embedding.model = "sentence-transformers/all-MiniLM-L6-v2"  # A smaller, faster model for testing
# config.vector_store_config.vector_size = 384  # Must match the embedding model's output dimension (384 for all-MiniLM-L6-v2)

# Set up the global embedding model
global_embed_model = setup_global_embed_model(config)

# Now, set up the pipeline (setup also ingests the documents in data_path):
import asyncio
import nest_asyncio

async def run_pipeline():
    tenant_id = "test_tenant"
    pipeline = Pipeline(config, tenant_id, global_embed_model)
    await pipeline.setup()
    return pipeline

# Jupyter already runs an event loop; nest_asyncio allows nested run_until_complete/asyncio.run calls
nest_asyncio.apply()

pipeline = asyncio.get_event_loop().run_until_complete(run_pipeline())

print("Pipeline setup complete. Ready for queries!")



Path Configuration:
BASE_PATH:     /home/deval/Documents/Work/Deval/ES/LlamaSearch_final/LlamaSearch
--------------------------------------------------------------------------------
Resolved Paths:
Config File:   /home/deval/Documents/Work/Deval/ES/LlamaSearch_final/LlamaSearch/config/config.dev.yaml
Data Path:     /home/deval/Documents/Work/Deval/ES/LlamaSearch_final/LlamaSearch/data/test_docs
Log Directory: /home/deval/Documents/Work/Deval/ES/LlamaSearch_final/LlamaSearch/data/app/logs




2024-10-14 14:32:11,922 - [INFO] - Using embedding model: Alibaba-NLP/gte-Qwen2-1.5B-instruct
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  3.18it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-10-14 14:32:19,306 - [INFO] - Running model ajindal/llama3.1-storm:8b on http://localhost:11434
2024-10-14 14:32:19,307 - [INFO] - Multi tenancy: True
2024-10-14 14:32:19,307 - [INFO] - Setting up Qdrant index...
2024-10-14 14:32:19,335 - [DEBUG] - Collection names: []
2024-10-14 14:32:19,335 - [INFO] - Collection

Creating vector store for collection my_test_collection

2024-10-14 14:32:20,842 - [INFO] - Qdrant index setup completed.
2024-10-14 14:32:20,843 - [INFO] - Setting up Docstore...
2024-10-14 14:32:20,843 - [INFO] - Docstore setup completed.
2024-10-14 14:32:20,844 - [INFO] - Setting up Parser...
2024-10-14 14:32:21,036 - [INFO] - Parser setup completed.
2024-10-14 14:32:21,037 - [INFO] - Setting up Documents...
2024-10-14 14:32:22,044 - [INFO] - Documents setup completed.
2024-10-14 14:32:22,044 - [INFO] - Setting up Index creation...
2024-10-14 14:32:22,045 - [INFO] - Index creation setup completed.
2024-10-14 14:32:22,046 - [INFO] - Setting up Ingestion pipeline...
2024-10-14 14:32:22,071 - [INFO] - Ingesting 43 nodes for 38 chunks
2024-10-14 14:32:22,072 - [INFO] - Adding nodes to index...
Generating embeddings: 100%|█████████

Pipeline setup complete. Ready for queries!
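
Why `nest_asyncio.apply()`? Jupyter's kernel already runs an asyncio event loop, and plain asyncio refuses to start another one from inside a running loop. The stdlib-only sketch below reproduces that error by calling `asyncio.run` from inside a coroutine, which stands in for a notebook cell (`simulate_notebook_cell` is a hypothetical name).

```python
import asyncio


async def answer():
    return 42


async def simulate_notebook_cell():
    """Inside a running loop (like a Jupyter cell), asyncio.run raises RuntimeError."""
    coro = answer()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a "never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(simulate_notebook_cell())
print(message)  # asyncio.run() cannot be called from a running event loop
```

IPython also supports top-level `await` in notebook cells, so `pipeline = await run_pipeline()` is an alternative that avoids patching the event loop.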

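The commented-out lines in the pipeline cell pair a smaller embedding model with `vector_size = 384`; the two must agree, because a Qdrant collection is created with a fixed vector size and rejects vectors of any other length. A hedged sketch of a pre-flight check: `check_vector_size` is hypothetical and accepts any callable that maps a string to a list of floats. If `global_embed_model` is a LlamaIndex embedding object (an assumption; the notebook does not say), its `get_text_embedding` method has that shape.

```python
def check_vector_size(embed_fn, configured_size, probe="dimension probe"):
    """Embed one probe string and compare its length with the configured vector size.

    embed_fn: any callable str -> list[float] (hypothetical stand-in for the real model).
    Returns the actual dimension, or raises ValueError on a mismatch.
    """
    actual = len(embed_fn(probe))
    if actual != configured_size:
        raise ValueError(
            f"embedding model returns {actual}-dim vectors but vector_size is {configured_size}; "
            "update the config (an existing collection keeps its original size)"
        )
    return actual
```

Under the assumption above, usage would be `check_vector_size(global_embed_model.get_text_embedding, config.vector_store_config.vector_size)`. After switching models, a new `collection_name` is likely the simplest way to avoid reusing a collection built for the old size.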

In [3]:
## 5. Querying and Results

# Let's perform a query:

async def perform_query(pipeline, query):
    response = await pipeline.perform_query_async(query)
    return response

query = "How does attention mechanism work in transformer architecture?"
# nest_asyncio (applied in the previous cell) lets asyncio.run work inside Jupyter's running loop
response = asyncio.run(perform_query(pipeline, query))

print("Query:", query)
print("Response:", response)

Query: How does attention mechanism work in transformer architecture?
Response: The attention mechanism works by computing a weighted sum of values based on a compatibility function between a query and a set of key-value pairs. In the Transformer architecture, this is done using scaled dot-product attention, where the input consists of queries and keys of dimension $d_k$, and values of dimension $d_v$. The attention function is computed as $\mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$, where $Q$, $K$, and $V$ are matrices of queries, keys, and values, respectively.

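`perform_query_async` is a coroutine, so several questions can be issued from one cell. A minimal sketch, assuming the pipeline tolerates concurrent queries (the notebook does not confirm this): `run_queries` is a hypothetical helper that caps concurrency with a semaphore, applies a per-query timeout, and collects exceptions instead of aborting the whole batch. With a single local Ollama model, requests may be served one at a time anyway, so the main benefit is the error handling.

```python
import asyncio


async def run_queries(pipeline, queries, max_concurrency=2, timeout=120.0):
    """Run queries through pipeline.perform_query_async with bounded concurrency.

    Returns a dict mapping each query to its response, or to the exception it raised.
    """
    semaphore = asyncio.Semaphore(max_concurrency)  # created inside the running loop

    async def one(query):
        async with semaphore:
            return await asyncio.wait_for(pipeline.perform_query_async(query), timeout)

    results = await asyncio.gather(*(one(q) for q in queries), return_exceptions=True)
    return dict(zip(queries, results))
```

In the notebook (with `nest_asyncio` applied): `answers = asyncio.run(run_queries(pipeline, [query, "What is multi-head attention?"]))`.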

In [4]:
# Pretty-print the source documents retrieved as context for the response
pipeline.pretty_print_context(response)


--- Document Information ---
+-------------------------------+-----------------+--------------------------------------+
| File Name                     | Last Modified   | Doc ID                               |
+-------------------------------+-----------------+--------------------------------------+
| attention_is_all_you_need.pdf | 2024-10-04      | aa1f72b2-3b1d-444f-b367-adb17b8fb29b |
+-------------------------------+-----------------+--------------------------------------+
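
`pretty_print_context` prints one row per source document. To keep those rows as data instead (for logging or tests), something along these lines could work, assuming you can obtain each retrieved chunk's metadata as a dict. The key names `file_name`, `last_modified_date`, and `doc_id` are assumptions modelled on the table's columns, not a documented LlamaSearch response format, and `summarize_sources` is hypothetical.

```python
def summarize_sources(chunk_metadata):
    """Collapse per-chunk metadata dicts into one (file_name, last_modified, doc_id) row per document.

    Several retrieved chunks can come from the same file; rows keep first-seen order.
    """
    rows = {}
    for meta in chunk_metadata:
        doc_id = meta.get("doc_id")
        if doc_id not in rows:
            rows[doc_id] = (meta.get("file_name"), meta.get("last_modified_date"), doc_id)
    return list(rows.values())
```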


In [5]:
# Cleanup
asyncio.run(pipeline.cleanup())

print("Pipeline cleaned up.")

Pipeline cleaned up.
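
The notebook calls `setup()` and `cleanup()` in separate cells, so an exception in between may leave resources such as client connections open. A minimal sketch of pairing them with an async context manager: `managed` is a hypothetical helper that relies only on the `setup()` and `cleanup()` coroutines used above.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def managed(pipeline):
    """Await pipeline.setup() on entry and always await pipeline.cleanup() on exit."""
    await pipeline.setup()
    try:
        yield pipeline
    finally:
        await pipeline.cleanup()  # runs even if the body raised
```

Inside a coroutine, `async with managed(Pipeline(config, tenant_id, global_embed_model)) as pipeline:` would then guarantee cleanup even if a query raises.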
