## Indexing
- Clean and extract text
- Segment text into chunks
- Encode these chunks into vectors
- Store vectors in databases

In [1]:
# Imports
import faiss
import numpy as np
from transformers import AutoTokenizer, AutoModel
import textract




In [2]:
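# Initialize tokenizer and model for encoding text into vectors
import torch

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
model = AutoModel.from_pretrained('distilbert-base-uncased')

def encode_text(text):
    # Tokenize a single string, truncating to the model's 512-token limit
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
    # Gradients are not needed when the model is only used for encoding
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the token embeddings into one vector of shape (1, hidden_size)
    return outputs.last_hidden_state.mean(dim=1).numpy()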
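
`encode_text` averages over every token position, which is fine for one text at a time because no padding is involved. If several texts were encoded in one padded batch, the padding positions would skew that average. The following is a minimal sketch of masked mean pooling, written with plain Python lists standing in for tensors; the function name `masked_mean_pool` and the toy values are assumptions for illustration, not part of the notebook.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padded positions."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for j in range(dim):
                total[j] += vec[j]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]

# Toy example: the last row stands in for a padding token
tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]
print(masked_mean_pool(tokens, mask))  # [2.0, 3.0]
```

With tensors, the same idea is usually expressed by multiplying the hidden states by the attention mask and dividing by the mask's sum.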




In [6]:
# Extract text from a PDF (Sample source)
text = textract.process("./data/deep-learning.pdf", method="pdfminer").decode()

# Segment the text into fixed-size chunks of 500 characters (words may be split at the boundaries)
chunks = [text[i:i+500] for i in range(0, len(text), 500)]
print('Sample chunk:')
print(chunks[1])


Sample chunk:
atrices
2.3
. . . . . . . . . . . . . . . . . . . .
2.4
Linear Dependence and Span . . . . . . . . . . . . . . . . . . . .
Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.5
Special Kinds of Matrices and Vectors
2.6
. . . . . . . . . . . . . . .
2.7
Eigendecomposition . . . . . . . . . . . . . . . . . . . . . . . . . .
Singular Value Decomposition . . . . . . . . . . . . . . . . . . . .
2.8
The Moore-Penrose Pseudoinverse . . . . . . . . . . . . . . . . . .
2.9
2.10 The 
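
The sample chunk starts mid-word ("atrices") because slicing at fixed character offsets ignores word boundaries. One possible alternative, sketched below, splits on whitespace and lets consecutive chunks share a few words so that context is not lost at the seams. The function name `chunk_words` and its parameters `max_chars` and `overlap_words` are hypothetical, and note that splitting on whitespace also collapses the original line breaks.

```python
def chunk_words(text, max_chars=500, overlap_words=10):
    """Split text on whitespace into chunks of at most max_chars characters.

    Consecutive chunks share up to overlap_words words. A single word longer
    than max_chars is kept whole, so that chunk may exceed the limit.
    """
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = start
        length = 0
        # Grow the chunk word by word until the character budget is reached
        while end < len(words):
            extra = len(words[end]) + (1 if end > start else 0)
            if length + extra > max_chars and end > start:
                break
            length += extra
            end += 1
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break
        # Step back by overlap_words, but always advance by at least one word
        start = max(end - overlap_words, start + 1)
    return chunks

print(chunk_words("one two three four five six", max_chars=13, overlap_words=1))
# ['one two three', 'three four', 'four five six']
```

The result could replace the list comprehension that builds `chunks`; the rest of the pipeline only needs a list of strings.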


In [7]:
# Encoding and indexing
dim = model.config.hidden_size
index = faiss.IndexFlatL2(dim) # Using L2 distance for simplicity

for chunk in chunks:
    vec = encode_text(chunk)
    index.add(vec)

# Save the index
faiss.write_index(index, "./data/store.faiss")



### Handling retrieval
- Encode the user query
- Compute distances between the query vector and the chunk vectors (the index uses L2 distance, so smaller means more similar)
- Retrieve the top K similar chunks
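
To make the steps above concrete, here is a minimal brute-force sketch of what a flat L2 index does conceptually: compute the squared Euclidean distance from the query to every stored vector and keep the `k` smallest. It uses plain Python lists rather than FAISS, and the name `l2_search` and the toy vectors are assumptions for illustration only.

```python
def l2_search(vectors, query, k=5):
    """Return (distances, indices) of the k vectors closest to query by squared L2 distance."""
    scored = []
    for i, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Toy example with three 2-dimensional "document" vectors
docs = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
distances, indices = l2_search(docs, [0.9, 0.1], k=2)
print(indices)  # [1, 0]
```

`faiss.IndexFlatL2` performs the same exhaustive comparison and likewise reports squared distances, only far more efficiently.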

In [9]:
# Handling retrieval
def retrieve(query, k=5):
    query_vec = encode_text(query)
    # D holds the L2 distances and I the matching chunk indices, both of shape (1, k)
    D, I = index.search(query_vec, k)
    return [chunks[i] for i in I[0]], D[0]

# Example
query = "What is RAG in AI?"
retrieved_chunks, distances = retrieve(query=query)
print("Retrieved chunks:")
print(retrieved_chunks, distances)

Retrieved chunks:
['orld. For example, Cyc failed to understand a story\nabout a person named Fred shaving in the morning (\n). Its inference\nengine detected an inconsistency in the story: it knew that people do not have\nelectrical parts, but because Fred was holding an electric razor, it believed the\nentity “FredWhileShaving” contained electrical parts. It therefore asked whether\nFred was still a person while he was shaving.\n\nLinde 1992\n\n,\n\nThe diﬃculties faced by systems relying on hard-coded knowledge suggest\nthat', 'ge and acquiring knowledge can be done via learning,\nwhich has motivated the development of large-scale deep architectures. However,\nthere are diﬀerent kinds of knowledge. Some knowledge can be implicit, sub-\nconscious, and diﬃcult to verbalize—such as how to walk, or how a dog looks\ndiﬀerent from a cat. Other knowledge can be explicit, declarative, and relatively\nstraightforward to put into words—every day commonsense knowledge, like “a cat\nis a kind o

### Generation
- Combine the query and retrieved texts into a coherent prompt
- Generate a response using a text-generation model
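
As a sketch of the first step, the helper below assembles a prompt from the query and the retrieved chunks while keeping the context under a character budget, so that a small model's context window is less likely to be exceeded. The name `build_prompt`, the `max_context_chars` parameter, and the "Context / Question / Answer" layout are assumptions for illustration; the notebook itself uses a simpler concatenation.

```python
def build_prompt(query, retrieved_chunks, max_context_chars=1500):
    """Assemble a prompt, adding chunks in ranked order until the budget is used up."""
    context_parts = []
    used = 0
    for chunk in retrieved_chunks:
        cleaned = " ".join(chunk.split())  # collapse newlines and repeated spaces
        if used + len(cleaned) > max_context_chars:
            break
        context_parts.append(cleaned)
        used += len(cleaned)
    context = "\n\n".join(context_parts)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is X?", ["first\nchunk", "second   chunk"], max_context_chars=100))
```

Placing the question after the context and ending with "Answer:" is one common layout for completion-style models; whether it helps with a given model is something to check empirically.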

In [39]:
from transformers import pipeline
import os

generator = pipeline('text-generation', model='gpt2')

def generate_response(query, retrieved_chunks):
    prompt = f"Question: {query}\n\nContext: " + " ".join(retrieved_chunks)
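    # Cap the continuation at 250 new tokens. max_new_tokens counts only generated tokens,
    # so the prompt need not be measured with the DistilBERT tokenizer, whose token
    # counts differ from GPT-2's.
    response = generator(prompt, truncation=True, max_new_tokens=250, num_return_sequences=1)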
    generated_response = response[0]['generated_text']

    # Saving the response
    file_path = './data/response.txt'
    os.makedirs(os.path.dirname(file_path), exist_ok=True)

    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(generated_response)
    
    return generated_response







In [38]:
# Running actions
response = generate_response(query, retrieved_chunks=retrieved_chunks)
print(response)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question: What is RAG in AI?

Context: orld. For example, Cyc failed to understand a story
about a person named Fred shaving in the morning (
). Its inference
engine detected an inconsistency in the story: it knew that people do not have
electrical parts, but because Fred was holding an electric razor, it believed the
entity “FredWhileShaving” contained electrical parts. It therefore asked whether
Fred was still a person while he was shaving.

Linde 1992

,

The diﬃculties faced by systems relying on hard-coded knowledge suggest
that ge and acquiring knowledge can be done via learning,
which has motivated the development of large-scale deep architectures. However,
there are diﬀerent kinds of knowledge. Some knowledge can be implicit, sub-
conscious, and diﬃcult to verbalize—such as how to walk, or how a dog looks
diﬀerent from a cat. Other knowledge can be explicit, declarative, and relatively
straightforward to put into words—every day commonsense knowledge, like “a cat
is a kind of a