# Chapter 4

## Query Enhancement

Improving the quality of the data improves the quality of the generated response. Another lever is the quality of the query that the LLM sees.

We cannot expect the user to phrase the query in the best possible way. Often the user is not even sure what to ask. Query enhancement, as the name suggests, is an intermediate step that uses an LLM to improve the quality of the query. The enhancement can take several forms:
- making the query grammatically correct
- breaking down a complex query into relevant sub-queries
- extracting the intent of the query (this can be used to return a formatted answer in the case of nefarious queries)
- if you have a chat history, augmenting the query with past queries and generated answers or retrieved contexts
- extracting keywords (about your product or anything related to your application) and passing them along with the query to your LLM

One can imagine many other ways to enhance the query or to extract useful information from it.
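As an illustration of the chat-history item in the list above, here is a minimal sketch that folds recent turns into the query text. The `Turn` dataclass, the `augment_with_history` helper, and the example conversation are all hypothetical and are not part of the course scripts. In practice the augmented text would typically be handed to an LLM to rewrite into a standalone query.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One past exchange in the conversation (hypothetical structure)."""

    query: str
    answer: str


def augment_with_history(query: str, history: list[Turn], max_turns: int = 2) -> str:
    """Prepend the most recent turns so that a follow-up query is self-contained."""
    # Guard against max_turns == 0, because history[-0:] would return everything.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = []
    for turn in recent:
        lines.append(f"Previous question: {turn.query}")
        lines.append(f"Previous answer: {turn.answer}")
    lines.append(f"Current question: {query}")
    return "\n".join(lines)


# Made-up conversation: the follow-up only makes sense with the earlier turn.
history = [Turn("How do I log images with wandb?", "Use wandb.Image inside wandb.log.")]
augmented_query = augment_with_history("And in lightning?", history)
print(augmented_query)
```

Capping the number of turns keeps the augmented query short, which matters because it is sent to the LLM on every request.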

In [1]:
%load_ext autoreload
%autoreload 2

import asyncio
import json
import pathlib

import cohere
import weave

import wandb

In [2]:
WANDB_ENTITY = "rag-course"
WANDB_PROJECT = "dev"

wandb.require("core")

run = wandb.init(
    entity=WANDB_ENTITY,
    project=WANDB_PROJECT,
    group="Chapter 4",
)


weave_client = weave.init(f"{WANDB_ENTITY}/{WANDB_PROJECT}")

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mayut[0m ([33mrag-course[0m). Use [1m`wandb login --relogin`[0m to force relogin


weave version 0.50.12 is available!  To upgrade, please run:
 $ pip install weave --upgrade
Logged in as Weights & Biases user: ayut.
View Weave data at https://wandb.ai/rag-course/dev/weave


We will download the chunked data from Chapter 3. The chunks were produced with a semantic chunking strategy.

In [3]:
# Reload the data from Chapter 3
chunked_artifact = run.use_artifact(
    f"{WANDB_ENTITY}/{WANDB_PROJECT}/chunked_data:latest", type="dataset"
)
artifact_dir = chunked_artifact.download()
chunked_data_file = pathlib.Path(f"{artifact_dir}/documents.jsonl")
chunked_data = list(map(json.loads, chunked_data_file.read_text().splitlines()))
chunked_data[:2]



[{'cleaned_content': 'Anonymous Mode Are you publishing code that you want anyone to be able to run easily? Use Anonymous Mode to let someone run your code, see a W&B dashboard, and visualize results without needing to create a W&B account first. Allow results to be logged in Anonymous Mode with wandb.init(anonymous="allow") :::info Publishing a paper? Please cite W&B, and if you have questions about how to make your code accessible while using W&B, reach out to us at support@wandb.com.\n::: How does someone without an account see results? If someone runs your script and you have to set anonymous="allow":  Auto-create temporary account: W&B checks for an account that\'s already signed in. If there\'s no account, we automatically create a new anonymous account and save that API key for the session. Log results quickly: The user can run and re-run the script, and automatically see results show up in the W&B dashboard UI.\nThese unclaimed anonymous runs will be available for 7 days. Claim

In our use case, we will use the query enhancement stage to:
- identify the language of the query (our documentation is in English, Japanese and Korean, and we want to answer in the language of the query)
- identify the intent of the query (a user might ask something that is not related to our documentation)
- generate sub-queries (break down the main query into smaller queries) to retrieve more context for our LLM

This additional information will be used to inform the response generator and to improve the retrieval process.

In [4]:
from scripts.query_enhancer import QueryEnhancer
from scripts.utils import display_source

query_enhancer = QueryEnhancer()
# Pass the class, not the instance, so that its source code can be looked up
display_source(QueryEnhancer)


In [5]:
response = await query_enhancer.predict("How do I log images in lightning with wandb?")

🍩 https://wandb.ai/rag-course/dev/r/call/ea1eea43-20b2-4d59-b7e1-6a4cd26c7fae


Look at the response below:

- We identified the query as English.
- We derived a few sub-queries that make sense.
- We classified the intent based on the guidelines in our intent classification prompt.

In [6]:
response

{'query': 'How do I log images in lightning with wandb?',
 'language': 'en',
 'search_queries': ['wandb lightning log image',
  'wandb log image in lightning module',
  'wandb log image in pytorch lightning',
  'wandb log image',
  'wandb log image in pytorch'],
 'intents': [{'intent': 'integrations',
   'reason': 'The user is asking about logging images in Lightning, which is a specific integration with Weights & Biases. They want to know how to use Weights & Biases with Lightning to log images.'}]}

Our retriever will remain the same. We now have 5 sub-queries to retrieve for, but we can simply run them through the retriever one by one.

Let us use our BM25-based retriever from Chapter 2 and index our chunked data.
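Retrieving for each sub-query in turn can be sketched as follows. The `toy_retrieve` function is a hypothetical word-overlap scorer standing in for a real retriever, and the three-document corpus is made up for illustration. It is not the course's retriever.

```python
def toy_retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by the number of query words they share (stand-in for BM25)."""
    words = set(query.lower().split())
    scored = [(len(words & set(doc.lower().split())), doc) for doc in corpus]
    # Drop documents with no overlap, then sort by score (stable for ties).
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Made-up corpus for illustration only.
corpus = [
    "wandb log image with wandb.Image",
    "pytorch lightning integration uses WandbLogger",
    "sweeps configuration reference",
]
search_queries = ["wandb lightning log image", "wandb log image in pytorch"]

# Run the same retriever once per sub-query and keep the results per query.
results_per_query = {q: toy_retrieve(q, corpus) for q in search_queries}
for q, docs in results_per_query.items():
    print(q, "->", docs)
```

Note that the same document can come back for several sub-queries, so the merged results usually need deduplication before they are sent to the LLM.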

In [8]:
from scripts.retriever import BM25Retriever
retriever = BM25Retriever()
retriever.index_data(chunked_data)

Since we have extracted more information from the query, such as its language and intent, we write `QueryEnhanedResponseGenerator`, which uses a new system prompt augmented with the language and intent information.

Look at line 24.

In [9]:
from scripts.response_generator import QueryEnhanedResponseGenerator
display_source(QueryEnhanedResponseGenerator)
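The idea of augmenting a system prompt with the language and intents can be sketched as below. The template wording and the `build_system_prompt` helper are assumptions made for illustration. The actual prompt lives in `prompts/query_enhanced_system.txt` and may be structured differently.

```python
# Hypothetical template; the placeholders are filled per query.
PROMPT_TEMPLATE = (
    "Answer the question using the provided documentation snippets.\n"
    "Respond in this language: {language}\n"
    "The query was classified with these intents:\n{intents}"
)


def build_system_prompt(language: str, intents: list[dict]) -> str:
    """Fill the template with the detected language and one line per intent."""
    intent_lines = "\n".join(
        f"- {item['intent']}: {item['reason']}" for item in intents
    )
    return PROMPT_TEMPLATE.format(language=language, intents=intent_lines)


system_prompt = build_system_prompt(
    "en",
    [{"intent": "integrations", "reason": "The user asks about Lightning."}],
)
print(system_prompt)
```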


The `QueryEnhancedRAGPipeline` runs through the different `search_queries` (sub-queries) and retrieves the chunks for each one. It also deduplicates the chunks so that we don't end up sending the same chunk twice.

Note lines 23-27. We check whether the extracted intent is in a list of intents to avoid. If it is, we skip retrieval and can return a formatted answer such as "This query is not related to Weights and Biases. Can you please ask again?"

In [10]:
from scripts.rag_pipeline import QueryEnhancedRAGPipeline
display_source(QueryEnhancedRAGPipeline)
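The two checks described above, the intent gate and the deduplication, can be sketched in isolation as follows. This is a simplified stand-in and not the course's `QueryEnhancedRAGPipeline`: the intent labels in `INTENTS_TO_AVOID`, the `text` key on chunks, and `fake_retrieve` are all assumptions made for illustration.

```python
INTENTS_TO_AVOID = {"unrelated", "nefarious_query"}  # hypothetical labels
FALLBACK_ANSWER = (
    "This query is not related to Weights and Biases. Can you please ask again?"
)


def should_skip_retrieval(intents: list[dict]) -> bool:
    """Return True if any extracted intent is one we do not want to answer."""
    return any(item["intent"] in INTENTS_TO_AVOID for item in intents)


def deduplicate(chunks: list[dict]) -> list[dict]:
    """Keep the first occurrence of each chunk text, preserving order."""
    seen = set()
    unique = []
    for chunk in chunks:
        if chunk["text"] not in seen:
            seen.add(chunk["text"])
            unique.append(chunk)
    return unique


def gather_context(enhanced_query: dict, retrieve):
    """Return deduplicated chunks, or None when retrieval should be skipped."""
    if should_skip_retrieval(enhanced_query["intents"]):
        return None
    chunks = []
    for sub_query in enhanced_query["search_queries"]:
        chunks.extend(retrieve(sub_query))
    return deduplicate(chunks)


def fake_retrieve(query: str) -> list[dict]:
    """Toy retriever: one shared chunk plus one chunk specific to the query."""
    return [{"text": "chunk about wandb.Image"}, {"text": f"chunk for: {query}"}]


allowed = {"search_queries": ["a", "b"], "intents": [{"intent": "integrations"}]}
blocked = {"search_queries": ["a"], "intents": [{"intent": "unrelated"}]}

context = gather_context(allowed, fake_retrieve)
skipped = gather_context(blocked, fake_retrieve)
print(len(context), FALLBACK_ANSWER if skipped is None else skipped)
```

Returning `None` when the gate fires lets the caller decide how to respond, for example with the fallback answer, without paying for retrieval or generation.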

Let us initialize the response generator and our RAG pipeline and run it on one query.

In [13]:
# Let's load the new system prompt
QUERY_ENHANCED_PROMPT = pathlib.Path("prompts/query_enhanced_system.txt").read_text()

response_generator = QueryEnhanedResponseGenerator(
    model="command-r-plus", prompt=QUERY_ENHANCED_PROMPT, client=cohere.AsyncClient()
)

In [15]:
rag_pipeline = QueryEnhancedRAGPipeline(
    query_enhancer=query_enhancer,
    retriever=retriever,
    response_generator=response_generator,
)

response = await rag_pipeline.predict("How do I log images in lightning with wandb?")
response

## Evaluate and Compare

In [None]:
eval_dataset = weave.ref(
    "weave:///rag-course/dev/object/Dataset:9O0EmmPINmYjgbXW3kucVrDxlTUQJQs0fVZYJj2mtOk"
).get()

In [None]:
from scripts.response_metrics import ALL_METRICS as RESPONSE_METRICS

response_evaluations = weave.Evaluation(
    name="Response_Evaluation",
    dataset=eval_dataset,
    scorers=RESPONSE_METRICS,
    preprocess_model_input=lambda x: {"query": x["question"]},
)
# The notebook already runs an event loop, so await the coroutine directly
query_enhanced_response_scores = await response_evaluations.evaluate(rag_pipeline)

Exercise 