## Building a RAG-enabled PDF Chat App with SuperDuperDB and Open Source LLMs from HuggingFace like DeciLM 7B

This Notbook outlines the development of a chat application capable of interacting with PDF documents, leveraging the integration of SuperDuperDB with large language models (LLMs). The chosen LLM for this prototype is DeciLM 7B Instruct, but the framework can be readily adapted to other Open Source LLMs from HuggingFace.

**Key Advantages:**

* **Modular Architecture:** SuperDuperDB seamlessly integrates with specialized Python libraries like PyMuPDF and Pandas, enabling efficient handling of distinct tasks like PDF parsing and text chunking. This modular approach offers superior performance compared to all-in-one solutions.
* **Open Source Flexibility:** Utilizing Open Source LLMs from HuggingFace empowers developers with greater customization and cost-effectiveness compared to proprietary API offerings.
* **Enhanced Scalability:** SuperDuperDB's vector indexing capabilities facilitate efficient retrieval of relevant information from large document collections, enabling the chat app to scale effectively.

**Core Workflow:**

1. **PDF Parsing and Chunking:** PyMuPDF and Pandas are utilized to extract text from the PDF document and segment it into manageable chunks suitable for LLM processing.
2. **Vector Indexing and Storage:** SuperDuperDB creates a vector representation of each text chunk and stores it efficiently for retrieval.
3. **LLM-powered Chat Interaction:** User queries are formulated as prompts, and SuperDuperDB retrieves relevant chunks from the indexed data. The chosen LLM processes these chunks and generates responses, enabling a conversational interaction with the PDF content.

**Benefits and Applications:**

This approach offers a robust framework for building chat applications capable of interacting with textual data stored in PDF format. Potential applications include:

* **Information Retrieval:** Chatbots can answer user queries directly from within PDFs, improving document accessibility and knowledge extraction.
* **Data Analysis:** LLMs can analyze large document collections through the chat interface, providing insights and summarizations.
* **Educational Tools:** Interactive learning experiences can be built by enabling students to ask questions and receive answers directly from relevant PDFs.

**Conclusion:**

This framework demonstrates the effectiveness of combining SuperDuperDB with Open Source LLMs to create powerful chat applications for interacting with textual data. By leveraging the modularity and flexibility of this approach, developers can build scalable and customizable solutions for diverse applications across various industries.

Let's begin 🚀

Install the libraries 🫡

In [None]:
!pip install huggingface_hub
!pip install transformers
!pip install accelerate
!pip install bitsandbytes
!pip install ninja
!pip install flash-attn
!pip install sentence_transformers
!pip install pymupdf
!pip install superduperdb

# Step 1: PDF Parsing and Chunking

## Let's process a PDF file first in the most pythonic way possible

Download the book

In [None]:
%%capture
%%bash
wget -O state_of_ai_2023.zip https://github.com/harpreetsahota204/langchain-zoomcamp/raw/main/State%20of%20AI%20Report%202023%20-%20ONLINE.pdf.zip
unzip state_of_ai_2023.zip

## Supported Documents: PDF XPS EPUB MOBI FB2 CBZ
Let's process a PDF file first in the most pythonic way possible. Use PyMuPDF library to parse any kind of documents in the most efficient way possible.

In [2]:
import fitz  # PyMuPDF
import json

def chunk_pdf_to_json(pdf_path, chunk_size=500):
    doc = fitz.open(pdf_path)

    # Create a list to store text chunks
    text_chunks = []

    for page_num in range(doc.page_count):
        page = doc[page_num]
        text = page.get_text()

        # Chunk the text into segments
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

        # Append each chunk to the list
        text_chunks.extend(chunks)

    # Close the PDF document
    doc.close()

    return text_chunks

def save_to_json(output_file, text_chunks):
    with open(output_file, 'w', encoding='utf-8') as json_file:
        json.dump(text_chunks, json_file, ensure_ascii=False)

def main():
    # Specify the PDF file path
    pdf_path = "State of AI Report 2023 - ONLINE.pdf"


    # Specify the output JSON file path
    output_file = "output.json"

    # Chunk the PDF into text segments
    text_chunks = chunk_pdf_to_json(pdf_path)

    # Save the text segments to a JSON file
    save_to_json(output_file, text_chunks)

    print(f"PDF content saved to {output_file}")

if __name__ == "__main__":
    main()


PDF content saved to output.json


Now the good ol' pandas. Pandas is unbeatable for fast data processing. 

In [3]:
import pandas as pd
import json

# Specify the path to your JSON file
json_file_path = "output.json"

# Read the JSON file into a Pandas DataFrame
with open(json_file_path, 'r', encoding='utf-8') as json_file:
    data = json.load(json_file)

df = pd.DataFrame(data)

df

# Now 'df' is a Pandas DataFrame containing the data from the JSON file
# You can perform various operations on the DataFrame as needed


Unnamed: 0,0
0,"State of AI Report\nOctober 12, 2023\nNathan B..."
1,About the authors\n Introduction | Research | ...
2,hD from Cambridge in cancer research. \nNathan...
3,State of AI Report 2023 team\n#stateofai | 3\n...
4,ssociate \nDirector \nat \nMilltown \nPartners...
...,...
379,About the authors\n Introduction | Research | ...
380,PhD from Cambridge in cancer research. \nNath...
381,State of AI Report 2023 team\n#stateofai | 162...
382,\nAssociate \nDirector \nat \nMilltown \nPartn...


To further process the PDF file and eliminate unnecessary content between texts, you can consider using text extraction techniques or regular expressions. 

In [4]:
import re
# Remove extra whitespaces using regex
df[0] = df[0].apply(lambda x: re.sub(r'\s+', ' ', x).strip())
df[0] = df[0].apply(lambda x: re.sub(r'[^\w\s]', '', x))

df

Unnamed: 0,0
0,State of AI Report October 12 2023 Nathan Bena...
1,About the authors Introduction Research Indu...
2,hD from Cambridge in cancer research Nathan Be...
3,State of AI Report 2023 team stateofai 3 Intr...
4,ssociate Director at Milltown Partners where h...
...,...
379,About the authors Introduction Research Indu...
380,PhD from Cambridge in cancer research Nathan B...
381,State of AI Report 2023 team stateofai 162 In...
382,Associate Director at Milltown Partners where ...


Now change the column name to `text_chunk` and create a `id` column

In [5]:
# Reset the index and convert it to a column named 'id'
df.reset_index(inplace=True)
df.rename(columns={'index': 'id', 0: 'text_chunk'}, inplace=True)

# Display the DataFrame with 'id' column
print(df)

      id                                         text_chunk
0      0  State of AI Report October 12 2023 Nathan Bena...
1      1  About the authors Introduction  Research  Indu...
2      2  hD from Cambridge in cancer research Nathan Be...
3      3  State of AI Report 2023 team stateofai  3 Intr...
4      4  ssociate Director at Milltown Partners where h...
..   ...                                                ...
379  379  About the authors Introduction  Research  Indu...
380  380  PhD from Cambridge in cancer research Nathan B...
381  381  State of AI Report 2023 team stateofai  162 In...
382  382  Associate Director at Milltown Partners where ...
383  383  State of AI Report October 12 2023 Nathan Bena...

[384 rows x 2 columns]


# Step 2: SuperDuperDB for indexing vectors, searching it and persistency 

SuperDuperDB actually wears many hats, this hat is one of them. You can even do model training and other stuffs as well. But let's use only one power of SuperDuperDB today. 

## Now, SuperDuperDB enters in the room

Now, introducing SuperDuperDB into the mix! You have the option to store the data in MongoDB or any other SQL database. Here, we demonstrate with Sqlite and DuckDB, offering the advantage of saving the database locally for future use on your filesystem.

But why a Database is needed for Your PDF Chat App?

You're absolutely right! While the chat application can function without a database for a limited number of PDFs, a database unlocks its true potential and offers many benefits. While a database-less approach might be feasible for small-scale use cases, integrating a database into your PDF chat app unlocks a world of benefits, enhancing efficiency, user experience, data security, and analytical capabilities. It's the key to building a truly robust and scalable solution for interacting with textual data in PDF format.

In [72]:
from superduperdb import superduper
from superduperdb.backends.mongodb import Collection
import os

# SuperDuperDB, now handles your MongoDB database
# It just super dupers your database
db = superduper("sqlite://book.db") # For SQLITE 
# db = superduper('duckdb://test.ddb') # For DuckDB
# (You can save it later after saving the pdf from your filesystem. But we recommend serious persistent database!)

[32m 2023-Dec-15 13:21:22.73[0m| [34m[1mDEBUG   [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.base.build[0m:[36m50  [0m | [34m[1mParsing data connection URI:sqlite://book.db[0m
[32m 2023-Dec-15 13:21:22.73[0m| [1mINFO    [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.base.build[0m:[36m137 [0m | [1mData Client is ready. <ibis.backends.sqlite.Backend object at 0x7f1f4c33cb80>[0m
[32m 2023-Dec-15 13:21:22.74[0m| [1mINFO    [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.base.datalayer[0m:[36m79  [0m | [1mBuilding Data Layer[0m


Now create a table

In [73]:
from superduperdb.backends.ibis.query import Table
from superduperdb.backends.ibis.field_types import dtype
from superduperdb import Schema

# Define the 'captions' table
book = Table(
    'book',
    primary_id='id',
    schema=Schema(
        'book-schema',
        fields={'id': dtype(str), 'text_chunk': dtype(str)},
    )
)

db.add(book)

([], Table())

# Persistency is key

Insert the book into the database. This is a one-time task, and you can continue adding as many books as needed. Let's begin with the previous book. It gives you persistence.

In [74]:
# Insert data from the 'images_df' DataFrame into the 'images' table
_ = db.execute(book.insert(df))

[32m 2023-Dec-15 13:21:31.43[0m| [34m[1mDEBUG   [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.base.datalayer[0m:[36m716 [0m | [34m[1mBuilding task workflow graph. Query:<superduperdb.backends.ibis.query.IbisQueryTable[
    [92m[1mbook[0m}
] object at 0x7f1f240d1480>[0m
[32m 2023-Dec-15 13:21:31.44[0m| [1mINFO    [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.backends.local.compute[0m:[36m32  [0m | [1mSubmitting job. function:<function callable_job at 0x7f201a217f40>[0m
[32m 2023-Dec-15 13:21:31.44[0m| [34m[1mDEBUG   [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.misc.download[0m:[36m337 [0m | [34m[1m{'cls': 'IbisQueryTable', 'dict': {'identifier': 'book', 'primary_id': 'id'}, 'module': 'superduperdb.backends.ibis.query'}[0m
[32m 2023-Dec-15 13:21:31.44[0m| [34m[1mDEBUG   [0m | [36mip-172-31-29-75[

Now you use `all-MiniLM-L6-v2` to create embedding for the `text_chunk` in the book.

In [75]:
from superduperdb import Model
import sentence_transformers
from superduperdb.ext.numpy import array

# Create a SuperDuperDB Model using Sentence Transformers
superduperdb_model = Model(
    identifier='all-MiniLM-L6-v2',
    object=sentence_transformers.SentenceTransformer('all-MiniLM-L6-v2'),
    encoder=array('float32', shape=(384,)),
    predict_method='encode',
    batch_predict=True,
)

[2023-12-15 13:21:37] sentence_transformers.SentenceTransformer INFO Load pretrained SentenceTransformer: all-MiniLM-L6-v2
[2023-12-15 13:21:37] sentence_transformers.SentenceTransformer INFO Use pytorch device: cuda


Now create a VectorIndex called `book-index`

In [76]:
from superduperdb import VectorIndex, Listener

# Add a VectorIndex
db.add(
    VectorIndex(
        'book-index',
        indexing_listener=Listener(
            model=superduperdb_model,
            key='text_chunk',
            select=book, # Table Name
        ),
        compatible_listener=Listener(
            model=superduperdb_model,
            key='text_chunk',
            active=False,
            select=None,
        )
    )
)

[32m 2023-Dec-15 13:21:41.01[0m| [34m[1mDEBUG   [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.base.datalayer[0m:[36m873 [0m | [34m[1mencoder/numpy.float32[384]/1 already exists - doing nothing[0m
[32m 2023-Dec-15 13:21:41.03[0m| [1mINFO    [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.components.model[0m:[36m221 [0m | [1mAdding model all-MiniLM-L6-v2 to db[0m
[32m 2023-Dec-15 13:21:41.03[0m| [34m[1mDEBUG   [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.base.datalayer[0m:[36m873 [0m | [34m[1mmodel/all-MiniLM-L6-v2/1 already exists - doing nothing[0m


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 768/768 [00:00<00:00, 184217.40it/s]
Batches: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 74.60it/s]


[32m 2023-Dec-15 13:21:41.66[0m| [34m[1mDEBUG   [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.base.datalayer[0m:[36m873 [0m | [34m[1mmodel/all-MiniLM-L6-v2/1 already exists - doing nothing[0m


([None],
 VectorIndex(identifier='book-index', indexing_listener=Listener(key='text_chunk', model=Model(identifier='all-MiniLM-L6-v2', object=<Artifact artifact=SentenceTransformer(
   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
   (2): Normalize()
 ) serializer=dill>, flatten=False, output_schema=None, encoder=Encoder(identifier='numpy.float32[384]', decoder=<Artifact artifact=<superduperdb.ext.numpy.encoder.DecodeArray object at 0x7f1f1665f1f0> serializer=dill>, encoder=<Artifact artifact=<superduperdb.ext.numpy.encoder.EncodeArray object at 0x7f1f4c33f340> serializer=dill>, shape=(384,), load_hybrid=True, version=1), preprocess=None, postprocess=None, collate_fn=None, metrics=(), predict_method='encode', model_to_device_method=None, b

In [77]:
db.show('vector_index')

['book-index']

Now test drive it. Let's do a vector search on the books. Here we searched for `What is new about FlashAttention?`

In [78]:
from superduperdb import Document
from IPython.display import *

# Execute a query to find text chunks
context = db.execute(
      book
        .like(Document({'text_chunk': 'What is new about FlashAttention?'}), vector_index='book-index', n=5)
        .limit(5)
)

context_str = ""
# Display a horizontal rule to separate results
display(Markdown('---'))

# Display each document's 'txt' field and separate them with a horizontal rule
for r in context:
    display(Markdown(r['text_chunk']))
    display(Markdown('---'))
# context[0]['text_chunk']

Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 181.58it/s]


[32m 2023-Dec-15 13:21:53.52[0m| [1mINFO    [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.base.datalayer[0m:[36m124 [0m | [1mloading of vectors of vector-index: 'book-index'[0m
[32m 2023-Dec-15 13:21:53.52[0m| [1mINFO    [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.base.datalayer[0m:[36m164 [0m | [1m<superduperdb.backends.ibis.query.IbisCompoundSelect[
    [92m[1mbook.join(_outputs/all-MiniLM-L6-v2/1.relabel({'output': "'_outputs.text_chunk.all-MiniLM-L6-v2.1'"}), _outputs/all-MiniLM-L6-v2/1.relabel({'output': "'_outputs.text_chunk.all-MiniLM-L6-v2.1'"}).input_id == book.id).filter(_outputs/all-MiniLM-L6-v2/1.relabel({'output': "'_outputs.text_chunk.all-MiniLM-L6-v2.1'"}).key == 'text_chunk')[0m}
] object at 0x7f1ffd19a0b0>[0m


  return getattr(parent, self.name)(*args, **kwargs), tables
Loading vectors into vector-table...: 1536it [00:00, 9510.85it/s]

[32m 2023-Dec-15 13:21:53.69[0m| [34m[1mDEBUG   [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.vector_search.in_memory[0m:[36m67  [0m | [34m[1m[ 0.1279827   0.2199108   0.1840778   0.13627723 -0.01266751  0.15201518
  0.2332509   0.20212926  0.13149157  0.18559319  0.20774594  0.09110226
  0.21026263  0.17289744  0.13998233  0.24578577  0.08389924  0.15822439
  0.18548933  0.15634294  0.1690236   0.15244472  0.08105554  0.14042866
  0.19324496  0.23143569  0.12598097  0.11050446  0.07055023  0.20252329
  0.09383817  0.06189565  0.17707013  0.15875745  0.12750573  0.13243927
  0.13937046  0.12853792  0.10664025  0.14330223  0.2526616   0.06014976
  0.03820092  0.17658284  0.1255298   0.19222465  0.16840631  0.1735734
  0.33879203  0.18535872  0.21007608  0.17317763  0.5015377   0.20856729
  0.24947053  0.18583965  0.17176494  0.21034792  0.1788768   0.29105696
  0.03449035  0.13759665  0.14320633  0.25413412  0.20329522  0.1957




---

anies have embraced a culture of opacity about their most cutting edge research stateofai  16 The GPT4 technical report puts the nail in the cofﬁn of SOTA LLM research Introduction  Research  Industry  Politics  Safety  Predictions   

---

er training on the users speciﬁc use case But thats hindered by a limited context length due to the resulting compute and memory bottleneck  Several innovations have been used to increase the context length of LLMs Some fundamentally make the memory footprint of attention smaller FlashAttention Others enable models to train on small contexts but run inference on larger ones ALiBi  this is called length extrapolation  at the price of minimal ﬁnetuning and removing positional e

---

stateofai 2023  FlashAttention introduces a signiﬁcant memory saving by making attention linear instead of quadratic in sequence length FlashAttention2 further improves computing the attention matrix by having fewer nonmatmul FLOPS better parallelism and better work partitioning The result is a 28x training speedup of GPTstyle models  Reducing the number of bits in the parameters reduces both the memory footprint and the latency of LLMs The case for 4bit precision kbit Inferenc

---

models in popular products most notably on Adobes Fireﬂy Photoroom or even Discord stateofai  93 Texttoimage models Competition intensiﬁes and integrations abound Introduction  Research  Industry  Politics  Safety  Predictions  Midjourneys revenue which had already reached 1M MRR in March 2022 is projected to reach 200M ARR in 2023 Its number of users grew from 2M to 148M YoY Notably Midjourney is integrated in Discord where users can generate images on a Discord s

---

greement with Google DeepMind Anthropic and OpenAI to gain early access to their most advanced frontier models to improve their understanding of risk  While popular with industry it is unclear if these approaches will survive Recently the UK Government dropped lighttouch from its vocabulary and has repositioned itself as the home of the AI safety debate  The Indian Ministry of Electronics and Information Technology has now said forthcoming legislation may indeed cover some forms of

---

# Step 3: DeciLM 7B & SuperDuperDB again to do LLM-powered Chat Interaction

## DeciLM-7B-instruct

Now it's inferencing time. We picked the DeciLM 7B Instruct for this job. As it is promising highest possible throughput right now. 

Here you can use any other model. It's just basic Huggingface stuffs! Find your model and do your thing!

In [14]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline

In [None]:
model_id = 'Deci/DeciLM-7B-instruct'

using_colab_T4_GPU = False # We used A10 GPU, if you have T4 make it true
if using_colab_T4_GPU:
  bnb_config = BitsAndBytesConfig(
      load_in_4bit = True,
      bnb_4bit_compute_dtype=torch.bfloat16
  )
  dtype_kwargs = {"quantization_config": bnb_config}
else:
  dtype_kwargs = {"torch_dtype": torch.bfloat16}


model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    **dtype_kwargs
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token # For DeciLM

Here is a helper function to generate prompt with DeciLM 7B Instruct

In [41]:
SYSTEM_PROMPT_TEMPLATE ="""
### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.
### User:
{instruction}
### Assistant:
"""

# Function to construct the prompt using the new system prompt template
def get_prompt_with_template(message: str) -> str:
    return SYSTEM_PROMPT_TEMPLATE.format(instruction=message)

# Function to handle the generation of the model's response using the constructed prompt
def generate_model_response(message: str) -> str:
    prompt = get_prompt_with_template(message)
    inputs = tokenizer(prompt, return_tensors='pt')
    if torch.cuda.is_available():  # Ensure input tensors are on the GPU if model is on GPU
        inputs = inputs.to('cuda')
    output = model.generate(**inputs,
                            max_new_tokens=3000,
                            num_beams=5,
                            no_repeat_ngram_size=4,
                            early_stopping=True
                            )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Function to extract the content after "### Response:"
def extract_response_content(full_response: str) -> str:
    response_start_index = full_response.find("### Assistant:")
    if response_start_index != -1:
        return full_response[response_start_index + len("### Assistant:"):].strip()
    else:
        return full_response

# Main function to get the model's response and extract the content after "### Response:"
def get_response_with_template(message: str) -> str:
    full_response = generate_model_response(message)
    return extract_response_content(full_response)

Now create a helper function to do `vector-search` and get the context.

In [None]:
# Helper function to get the context text

from superduperdb import Document

def get_context(query, number = 1):
    # Execute a query to find text chunks
    contexts = db.execute(
          book
            .like(Document({'text_chunk': 'query'}), vector_index='book-index', n=number)
            .limit(number)
    )

    context_str = []

    # Display each document's 'txt' field and separate them with a horizontal rule
    for context in contexts:
        context_str.append(context['text_chunk'])

    result = ' '.join(context_str)

    return result 

Now start chatting with your LLM based on your LLM!

In [80]:
query = "What is new about FlashAttention?"
context = get_context(query, 5)

prompt = f"Your task is to synthesize the query, which is delimited by triple backticks, and write a response that appropriately answers the query based on the retrieved context.\n### Query:\n```{query}```\n### Context:\n```{context}```\n### Response:\nBegin!"

# Sample usage
# user_message = f"{query}. Context: {context}"
response = get_response_with_template(prompt)
print(response)

Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 168.23it/s]
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[32m 2023-Dec-15 13:22:06.51[0m| [34m[1mDEBUG   [0m | [36mip-172-31-29-75[0m| [36ma7d1f8ff-42be-476b-9adc-74c4b826ac32[0m| [36msuperduperdb.vector_search.in_memory[0m:[36m67  [0m | [34m[1m[ 4.51766104e-02  3.04632280e-02  4.90104482e-02  6.72234818e-02
  2.07076222e-03  5.07198870e-02 -5.88827953e-03  7.12032989e-03
 -2.18982063e-02  5.78837246e-02 -3.89772989e-02  1.20863691e-03
  4.93805856e-05 -7.13142008e-02 -2.44551897e-03  1.21803656e-02
  7.72564113e-02  2.86947656e-02  5.74232340e-02  7.13339224e-02
  5.26450053e-02  2.86853388e-02 -3.29149067e-02  1.61617026e-02
 -3.23099345e-02 -1.29313581e-02  1.39991343e-01  8.04095156e-03
 -2.44408958e-02  2.88190488e-02  7.81827420e-03 -1.08232601e-02
 -4.90357503e-02 -1.92871504e-02  9.33369398e-02  8.08862671e-02
  9.14772004e-02  7.28338063e-02  3.91569398e-02  1.40876174e-02
 -7.79762911e-03 -2.20977813e-02  5.46424799e-02  4.43230122e-02
  2.14818865e-03  2.06977986e-02 -5.94393536e-03  5.32741621e-02
 -3.14332359e-02 

Result: 
Based on the provided context, the new aspect of FlashAttention is the integration of Chain of Thought (CoT) and Tree of Thought (ToT) prompting techniques. These techniques aim to improve the quality of prompts and enhance task performance by incorporating intermediate reasoning steps and representing thoughts as a tree structure, respectively. Additionally, FlashAttention leverages search algorithms to explore the tree structure and assigns probabilities to answer binary questions.

# Bring you own LLM from HuggingFace or anywhere! 
In step 3 you can bring any model you like from Huggingface ecosystem. SuperDuperDB is here to help to generate the context for you.

## You see the power of SuperDuperDB and how it blends well with the ecosystem.

You see SuperDuperDB is helping you to generate context for your LLM. Now you can use any other LLM. Just edit the f-string of prompt with the context produced by SuperDuperDB!

Now create your own solution and share it to us. Maybe create a database with 100 pdfs! 