This is a competition hosted on [Kaggle](https://www.kaggle.com/competitions/data-assistants-with-gemma) and published by Google.

## Introduction to LangChain

`LangChain` is a framework that enables the creation of complex applications by combining multiple components into a single, cohesive chain. It allows us to structure workflows by linking user inputs, prompts, and language model outputs together. **Chains** can be composed of various elements like prompt templates, LLMs, and other tools to build more advanced processes. By combining different types of chains, we can enhance the capabilities of the application for specific tasks. LangChain provides flexibility to customize and scale workflows based on the needs of the project.

Flow chart image from the `LangChain` [documentation](https://js.langchain.com/v0.1/docs/modules/chains/document/stuff/).
![](https://js.langchain.com/v0.1/assets/images/stuff-818da4c66ee17911bc8861c089316579.jpg)
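
The idea of linking a prompt template, a model, and post-processing into one chain can be sketched in plain Python. This is only an illustration of the concept, not LangChain's API: `make_prompt`, `fake_llm`, `strip_output`, and `chain` are hypothetical names, and `fake_llm` is a stand-in that merely upper-cases its input.

```python
from typing import Callable


def make_prompt(template: str) -> Callable[[dict], str]:
    """Return a step that fills a template from a dict of inputs."""
    return lambda inputs: template.format(**inputs)


def fake_llm(prompt: str) -> str:
    """Stand-in for a language model: upper-cases the prompt."""
    return prompt.upper()


def strip_output(text: str) -> str:
    """Stand-in for an output parser: trims surrounding whitespace."""
    return text.strip()


def chain(*steps: Callable) -> Callable:
    """Compose steps left to right; each output feeds the next step."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


qa_chain = chain(make_prompt("  Question: {question}  "), fake_llm, strip_output)
result = qa_chain({"question": "What is RAG?"})
print(result)
```

Each step only needs to accept the previous step's output, which is what makes it easy to swap one component (for example, the model) without touching the others.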

## Introduction to RAG
Please see [LangChain Document](https://python.langchain.com/docs/concepts/rag/) for more details.
1. Splitter: Splits text into smaller chunks for easier processing. [LangChain Document](https://python.langchain.com/docs/concepts/text_splitters/)
2. Embedding: Converts text into numerical vectors that capture meaning. [LangChain Document](https://python.langchain.com/docs/concepts/embedding_models/)
3. Storage: Stores and organizes vector data for efficient retrieval. [LangChain Document](https://python.langchain.com/docs/concepts/vectorstores/)
4. Retriever: Retrieves relevant data or documents based on queries. [LangChain Document](https://python.langchain.com/docs/concepts/retrievers/)

![detail_rag](https://hackmd.io/_uploads/BJfyJcYTJx.png)
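
The four steps above can be sketched end to end with the standard library alone. This is a toy under loud assumptions: the "embedding" is a bag-of-words count rather than a neural model, the "storage" is a Python list, and all function names and the sample text are hypothetical.

```python
import math
from collections import Counter


def split_sentences(text):
    """Step 1 (Splitter): cut the text into sentence-sized chunks."""
    return [s.strip() for s in text.split(".") if s.strip()]


def embed(text):
    """Step 2 (Embedding): toy bag-of-words vector, not a neural embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, store, k=1):
    """Step 4 (Retriever): return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


text = (
    "Chroma stores vectors on disk. "
    "Gemma is a language model from Google. "
    "Splitters cut long text into chunks."
)
chunks = split_sentences(text)
# Step 3 (Storage): keep each chunk next to its vector.
store = [(chunk, embed(chunk)) for chunk in chunks]
top = retrieve("which language model is from google", store, k=1)
print(top)
```

In the rest of this notebook the same roles are played by `RecursiveCharacterTextSplitter`, `HuggingFaceEmbeddings`, `Chroma`, and `vectordb.as_retriever()`.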

In [None]:
!pip install -U transformers bitsandbytes accelerate
# !pip install datasets --no-deps
!pip install datasets s3fs
!pip install langchain pypdf sentence-transformers
!pip install langchain-community
!pip install chromadb
# !pip install qdrant-client


In [None]:
from huggingface_hub import notebook_login
notebook_login()


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, AutoProcessor, pipeline
import torch
from IPython.display import Markdown
import pandas as pd
from datasets import Dataset

In [None]:
# Integrations are imported from `langchain_community`; the equivalent `langchain.*` paths are deprecated.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from langchain_community.vectorstores import Chroma
# from langchain_community.vectorstores import Qdrant
# from langchain.chains import LLMChain
# from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.combine_documents import create_stuff_documents_chain
# from langchain.chains import RetrievalQA
from langchain.chains import create_retrieval_chain

In [None]:
config = {
    "chunk_size": 500, # Size of each document chunk
    "chunk_overlap": 100, # Overlap size between consecutive chunks
    # "search_type": "similarity_score_threshold", # Type of search to use
    # "score_threshold": 0.1, # Minimum similarity score for a match
    # "top_k": 2 # Number of top results to return
}
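
To see how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-window splitter. It is a sketch only: `sliding_chunks` is a hypothetical helper, and the real `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, so its chunks will generally differ from these.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


demo_chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
for chunk in demo_chunks:
    print(chunk)
```

With a size of 10 and an overlap of 4, every chunk starts 6 characters after the previous one, so the last 4 characters of one chunk reappear at the start of the next. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.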

## Load PLM
We use the 2B instruction-tuned version of Google's [Gemma 1.0](https://huggingface.co/google/gemma-2b-it) as the pre-trained language model (PLM).

In [None]:
# model_id = 'google/gemma-2b'
model_id = 'google/gemma-2b-it'

# 4 Bit Config
bnb_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config_4bit, low_cpu_mem_usage=True, trust_remote_code=True)

# Loading Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "right"
print(f"Gemma 1.0 4Bit Model size: {model.get_memory_footprint()/1024./1024./1024.:,} GB")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Gemma 1.0 4Bit Model size: 1.8995556831359863 GB
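
The reported footprint can be sanity-checked with back-of-the-envelope arithmetic. The sketch below rests on assumptions that are not stated in this notebook: that Gemma 2B has about 2.51 billion parameters, that its 256,000 x 2,048 embedding table stays in 16-bit precision (bitsandbytes quantizes linear layers, not the embedding), and that every other weight takes 4 bits. It ignores quantization constants and normalization layers, so treat the result as a rough estimate.

```python
GIB = 1024 ** 3  # the notebook divides by 1024 three times, so its "GB" is really GiB


def estimate_gib(total_params, embedding_params, quant_bits=4, embed_bits=16):
    """Estimate memory as quantized weights plus an unquantized embedding table."""
    quantized_bytes = (total_params - embedding_params) * quant_bits / 8
    embedding_bytes = embedding_params * embed_bits / 8
    return (quantized_bytes + embedding_bytes) / GIB


total_params = 2_506_172_416         # assumed parameter count for Gemma 2B
embedding_params = 256_000 * 2_048   # assumed vocabulary size x hidden size
estimate = estimate_gib(total_params, embedding_params)
print(f"Estimated footprint: {estimate:.2f} GiB")
```

Under these assumptions the estimate comes out near 1.9 GiB, in line with the value printed by `model.get_memory_footprint()`, and it suggests that roughly half of the footprint is the embedding table rather than the quantized layers.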


In [None]:
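# Note: this assignment rebinds the name `pipeline`, shadowing the imported
# transformers `pipeline` function, so re-running this cell requires
# re-running the import cell first.
pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},  # the key is `torch_dtype`, not `torch.dtype`
    max_new_tokens=512
)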

Device set to use cuda:0


In [None]:
# Initialize the HuggingFacePipeline for the language model
gemma_llm = HuggingFacePipeline(
    pipeline=pipeline,
    model_kwargs={
        "temperature": 0.7,
        "max_new_tokens": 512,
        "add_special_tokens": True,
        "do_sample": True,
        "top_k": 10,
        "top_p": 0.95
    },
)



In [None]:
# test 1
question = "What is the difference between a variable and an object"

message = [
    {"role": "user", "content": question},
]

prompt = pipeline.tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)

outputs = pipeline(
    prompt,
    max_new_tokens=512,
    add_special_tokens=True,
    do_sample=True,
    temperature=0.7,
    top_k=10,
    top_p=0.95
)
Markdown(outputs[0]["generated_text"][len(prompt):])

Sure, here's the difference between a variable and an object:

**Variable:**

* A variable is a memory location that stores a single value.
* It is a symbolic name that refers to a specific memory location.
* A variable can hold different values during program execution.
* Variables are used to store and manipulate data.
* They are declared using the `=` operator.
* Examples: `age`, `name`, `count`.

**Object:**

* An object is a complex data structure that contains multiple variables and methods.
* It is an instance of a class.
* An object has its own set of data members and behaviors.
* Objects are created using the `new` keyword.
* Objects can interact with each other and store complex data structures.
* They are used to represent real-world entities or data sets.
* Examples: `person`, `product`, `graph`.

**Key Differences:**

| Feature | Variable | Object |
|---|---|---|
| Type | Data type | Object |
| Definition | Symbolic name | Class |
| Storage | Memory location | Class definition |
| Data ownership | Program memory | Class memory |
| Data sharing | Single variable | Multiple variables |
| Access | Using variable name | Using object name |
| Use cases | Storing and manipulating single values | Creating complex data structures and objects |

**Example:**

```python
# Variable
age = 30

# Object
person = {"name": "John", "age": 30, "city": "New York"}
```

In this example, `age` is a variable that stores a single value, while `person` is an object that contains multiple variables and methods.

**Summary:**

| Feature | Variable | Object |
|---|---|---|
| Type | Data type | Object |
| Definition | Symbolic name | Class |
| Data storage | Memory location | Class definition |
| Data ownership | Program memory | Class memory |
| Access | Using variable name | Using object name |
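
The test above relies on two details: `apply_chat_template` wraps the question in Gemma's turn markers, and the pipeline returns the prompt together with the completion, which is why the output is sliced with `[len(prompt):]`. The sketch below approximates that format by hand. `gemma_style_prompt` is a hypothetical helper, not the tokenizer's own template, and the "generated" text is made up.

```python
def gemma_style_prompt(messages, add_generation_prompt=True):
    """Approximate Gemma's chat format for a list of role/content messages."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma names the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


demo_prompt = gemma_style_prompt([{"role": "user", "content": "What is a variable?"}])
print(demo_prompt)

# Pretend pipeline output: the prompt followed by the model's reply.
demo_generated = demo_prompt + "A variable is a name bound to a value."
demo_answer = demo_generated[len(demo_prompt):]
print(demo_answer)
```

Slicing by `len(prompt)` works because the returned text starts with the exact prompt string. If the returned prefix ever differs from the prompt, the slice would cut in the wrong place.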

In [None]:
# test 2
question = "Write in detail about python?"

message = [
    {"role": "user", "content": question},
]

prompt = pipeline.tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)

outputs = pipeline(
    prompt,
    max_new_tokens=512,
    add_special_tokens=True,
    do_sample=True,
    temperature=0.7,
    top_k=10,
    top_p=0.95
)
Markdown(outputs[0]["generated_text"][len(prompt):])

**Python** is a high-level, general-purpose programming language. It is known for its versatility and ability to be used for a wide range of tasks, including:

* **Data analysis:** Python is a popular choice for data wrangling, cleaning, and analysis. It has powerful libraries like NumPy, Pandas, and Matplotlib for data manipulation and visualization.
* **Machine learning:** Python is a powerful tool for machine learning tasks, including classification, regression, and clustering. It has libraries like scikit-learn, TensorFlow, and PyTorch for building, training, and evaluating machine learning models.
* **Web development:** While Python is not a dedicated web development language, it can be used with frameworks like Django and Flask to create dynamic web applications.
* **Scripting:** Python is a versatile scripting language that can be used for automating tasks, generating reports, and more.
* **Data science:** Python is widely used in data science and statistics for data analysis, visualization, and machine learning.

**Key Features of Python:**

* **Dynamic typing:** Python is dynamically typed, meaning that you don't need to explicitly declare the data type of variables. This makes it easier to write and maintain code, especially for beginners.
* **Indentation:** Python uses indentation to define code blocks, making it clear to the interpreter where different code elements belong.
* **Modules:** Python has a vast collection of modules that extend the functionality of the language. This allows you to extend the capabilities of Python without modifying the core language itself.
* **Concurrency:** Python has built-in support for concurrency, allowing you to run multiple tasks simultaneously without blocking the main thread.
* **Regular expressions:** Python has powerful regular expression capabilities that make it easy to manipulate and search strings.

**Example Code:**

```python
# Print "Hello, world!"
print("Hello, world!")

# Create a list of numbers
numbers = [1, 2, 3, 4, 5]

# Print the first 3 elements of the list
print(numbers[:3])

# Use the len() function to get the length of a string
string = "Hello world!"
length = len(string)

# Print the length of the string
print(length)
```

**Use Cases of Python:**

* Data analysis and visualization
* Machine learning
* Web development
* Scripting
* Data science and statistics
* Automation
* Machine

## Load the External Dataset

In [None]:
!mkdir -p knowledge-base
!gdown '17hUr5nSw7NswFxtuhPL4CSXVe3nWZdWt' --output knowledge-base/Introduction_to_Machine_Learning_with_Python-min.pdf

Downloading...
From: https://drive.google.com/uc?id=17hUr5nSw7NswFxtuhPL4CSXVe3nWZdWt
To: /content/knowledge-base/Introduction_to_Machine_Learning_with_Python-min.pdf
100% 7.05M/7.05M [00:00<00:00, 14.2MB/s]


In [None]:
loader = PyPDFDirectoryLoader('/content/knowledge-base/')
docs = loader.load()

In [None]:
print("This book has a total of {} pages.".format(len(docs)))
print("The first page has {} characters.\n".format(len(docs[0].page_content)))
print("The first 500 characters of the 5th page are as follows:")
print(docs[4].page_content[:500])

This book has a total of 392 pages.
The first page has 113 characters.

The first 500 characters of the 5th page are as follows:
Table of Contents
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  vii
1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1
Why Machine Learning?                                                                                                   1
Problems Machine Learning Can Solve                         


## Splitter, Embedding, and Storage
### Vector Database:
A vector database stores and retrieves high-dimensional vectors generated by machine learning models, enabling efficient similarity searches. It allows for fast querying of embeddings for tasks like nearest neighbor search and semantic search.

- `Chroma`: A lightweight, easy-to-use vector database with a focus on integration with LangChain, providing features like metadata support and persistence.
- `Qdrant`: A highly scalable vector database designed for fast, real-time similarity search and offering advanced filtering capabilities.
- `FAISS`: Developed by Facebook AI, FAISS is highly optimized for large-scale similarity search, often used in research and production environments, supporting various indexing methods.
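
At its core, a vector database answers one question: which stored vectors are closest to the query vector? The sketch below does this by brute force. `TinyVectorStore` is a hypothetical class with made-up 2-dimensional vectors; real systems such as Chroma, Qdrant, and FAISS add indexing, persistence, and filtering on top of this idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory store that compares the query against every stored vector."""

    def __init__(self):
        self._items = []  # list of (vector, payload) pairs

    def add(self, vector, payload):
        self._items.append((vector, payload))

    def search(self, query, k=2):
        """Return the k payloads with the highest cosine similarity."""
        scored = [(payload, cosine_similarity(query, vector)) for vector, payload in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


tiny_store = TinyVectorStore()
tiny_store.add([1.0, 0.0], "doc about cats")
tiny_store.add([0.0, 1.0], "doc about cars")
tiny_store.add([0.9, 0.1], "doc about kittens")

hits = tiny_store.search([1.0, 0.0], k=2)
for payload, score in hits:
    print(f"{score:.3f}  {payload}")
```

Brute force costs one comparison per stored vector for every query, which is fine for the 1,885 chunks in this notebook but is the reason large collections use approximate indexes.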

### Embedding Model:
An embedding model converts text or other data into fixed-size numerical vectors, capturing semantic relationships between data points.

- `sentence-transformers/all-mpnet-base-v2`: A powerful model for generating high-quality sentence embeddings, suitable for various NLP tasks with a balance between speed and accuracy.
- `sentence-transformers/all-MiniLM-L6-v2`: A lightweight, efficient model for generating sentence embeddings, providing fast inference with slightly reduced accuracy compared to larger models like MPNet.

- [Sentence-BERT](https://arxiv.org/abs/1908.10084) is a modification of the BERT architecture designed to efficiently compute sentence embeddings by using siamese and triplet networks, making it highly effective for tasks such as semantic textual similarity and sentence clustering.
  ![Sentence-BERT](https://hackmd.io/_uploads/SkpZhP9a1x.png)
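
One way to turn a variable number of token vectors into a fixed-size sentence vector is mean pooling, the default pooling strategy described in the Sentence-BERT paper. The sketch below uses made-up 2-dimensional token vectors. A real model produces these vectors with a transformer, and the output dimension is 384 for `all-MiniLM-L6-v2`, as the shape printed later in this notebook shows.

```python
import math


def mean_pool(token_vectors):
    """Average token vectors into one fixed-size sentence vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(a, b):
    """Cosine similarity, the usual way to compare sentence embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical token vectors for a 2-token and a 3-token sentence.
two_tokens = [[1.0, 0.0], [0.0, 1.0]]
three_tokens = [[0.5, 0.5], [0.4, 0.6], [0.6, 0.4]]

emb_a = mean_pool(two_tokens)
emb_b = mean_pool(three_tokens)
print(len(emb_a), len(emb_b))          # same size despite different token counts
print(round(cosine(emb_a, emb_b), 3))  # similarity of the two sentence vectors
```

Because both sentences end up with vectors of the same length, they can be compared directly, regardless of how many tokens each contained.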

In [None]:
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cuda"}
)

In [None]:
# =========== Step 1 ===========
#            Splitter
# ==============================
text_splitter = RecursiveCharacterTextSplitter(chunk_size=config['chunk_size'], chunk_overlap=config['chunk_overlap'])
all_splits = text_splitter.split_documents(docs)

In [None]:
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embedding_splits = embedding_model.encode([split.page_content for split in all_splits])
print('(n_chunk, embedding_dim): ', embedding_splits.shape)

(n_chunk, embedding_dim):  (1885, 384)


In [None]:
# ========= Step 2 & 3 ==========
#       Embedding & Storage
# ===============================
vectordb = Chroma.from_documents(
    documents=all_splits,
    embedding=embeddings,
    persist_directory="all_documents")

In [None]:
# https://github.com/langchain-ai/langchain/discussions/22887
# Note: with Chroma, the returned score is a distance, so a lower value means a closer match.
results_with_scores = vectordb.similarity_search_with_score("What is the difference between a variable and an object?", k=2)
for doc, score in results_with_scores:
    print(f"Document: {doc}\nSimilarity Score: {score}\n")

Document: page_content='Numbers Can Encode Categoricals
In the example of the adult dataset, the categorical variables were encoded as strings.
On the one hand, that opens up the possibility of spelling errors, but on the other
hand, it clearly marks a variable as categorical. Often, whether for ease of storage or
because of the way the data is collected, categorical variables are encoded as integers.
For example, imagine the census data in the adult dataset was collected using a ques‐' metadata={'author': 'Andreas C. Müller and Sarah Guido', 'creationdate': '2016-09-21T13:04:39+00:00', 'creator': 'AH CSS Formatter V6.2 MR4 for Linux64 : 6.2.6.18551 (2014/09/24 15:00JST)', 'moddate': '2020-08-19T07:09:16+02:00', 'page': 231, 'page_label': '218', 'producer': '3-Heights(TM) PDF Optimization Shell 5.9.1.5 (http://www.pdf-tools.com)', 'source': '/content/knowledge-base/Introduction_to_Machine_Learning_with_Python-min.pdf', 'title': 'Introduction to Machine Learning with Python', 'total_pa

In [None]:
results_with_scores = vectordb.similarity_search_with_score("Write in detail about python", k=2)
for doc, score in results_with_scores:
    print(f"Document: {doc}\nSimilarity Score: {score}\n")

Document: page_content='more. This vast toolbox provides data scientists with a large array of general- and
special-purpose functionality. One of the main advantages of using Python is the abil‐
ity to interact directly with the code, using a terminal or other tools like the Jupyter
Notebook, which we’ll look at shortly. Machine learning and data analysis are funda‐
mentally iterative processes, in which the data drives the analysis. It is essential for' metadata={'author': 'Andreas C. Müller and Sarah Guido', 'creationdate': '2016-09-21T13:04:39+00:00', 'creator': 'AH CSS Formatter V6.2 MR4 for Linux64 : 6.2.6.18551 (2014/09/24 15:00JST)', 'moddate': '2020-08-19T07:09:16+02:00', 'page': 18, 'page_label': '5', 'producer': '3-Heights(TM) PDF Optimization Shell 5.9.1.5 (http://www.pdf-tools.com)', 'source': '/content/knowledge-base/Introduction_to_Machine_Learning_with_Python-min.pdf', 'title': 'Introduction to Machine Learning with Python', 'total_pages': 392, 'trapped': '/False'}
Simi

## QA Chain Setup with Retriever
This code sets up a custom QA chain that retrieves relevant documents and generates answers using a language model. It integrates prompt templates to handle different types of questions and formats responses accordingly.

### Chain Type
The chain type determines how LangChain passes multiple documents to the LLM and combines the results into a final answer.
- **Stuff**: Stuff chain inserts all documents into the prompt and passes it to the LLM for processing, making it ideal for small documents with minimal input. [Document](https://js.langchain.com/v0.1/docs/modules/chains/document/stuff/)

- **Refine**: Refine chain iteratively updates its response by processing one document at a time, making it suitable for tasks with multiple documents exceeding the model's context limit.

- **Map Reduce**: Map Reduce chain processes each document separately in the "Map" step and then combines the results in the "Reduce" step to generate a final output.

- **Map Re-rank**: Map Re-rank chain scores each document’s response based on certainty and returns the one with the highest score.
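
The four strategies differ mainly in how many model calls they make and what each call sees. The sketch below makes that visible with stand-in models. The prompts, `counting_llm`, and `toy_scored_llm` are hypothetical; LangChain's own chains use different prompts and interfaces.

```python
def stuff(docs, question, llm):
    """One call: every document goes into a single prompt."""
    context = "\n\n".join(docs)
    return llm(f"{context}\n\nQuestion: {question}")


def refine(docs, question, llm):
    """One call per document: each call revises the previous answer."""
    answer = llm(f"{docs[0]}\n\nQuestion: {question}")
    for doc in docs[1:]:
        answer = llm(f"Current answer: {answer}\nNew context: {doc}\nQuestion: {question}")
    return answer


def map_reduce(docs, question, llm):
    """One call per document (map), plus one call to merge the results (reduce)."""
    partial_answers = [llm(f"{doc}\n\nQuestion: {question}") for doc in docs]
    return llm("Combine these answers:\n" + "\n".join(partial_answers))


def map_rerank(docs, question, scored_llm):
    """One call per document; keep the answer with the highest score."""
    candidates = [scored_llm(f"{doc}\n\nQuestion: {question}") for doc in docs]
    best_answer, _ = max(candidates, key=lambda pair: pair[1])
    return best_answer


# Hypothetical stand-ins for a real model.
calls = []


def counting_llm(prompt):
    calls.append(prompt)
    return f"answer #{len(calls)}"


def toy_scored_llm(prompt):
    # Pretend that confidence grows with prompt length (illustration only).
    first_line = prompt.splitlines()[0]
    return f"answer from {first_line!r}", len(prompt)


sample_docs = ["doc A", "doc B", "doc C"]
call_counts = {}
for strategy in (stuff, refine, map_reduce):
    calls.clear()
    strategy(sample_docs, "What is X?", counting_llm)
    call_counts[strategy.__name__] = len(calls)

print(call_counts)
best = map_rerank(["short", "a much longer document"], "What is X?", toy_scored_llm)
print(best)
```

With three documents, stuffing needs a single call, refining needs three, and map-reduce needs four. That trade-off is why stuffing suits a few short chunks while the other strategies suit inputs that exceed the context window.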

### Search Type
Search type determines how documents are retrieved or ranked based on their relevance to the query. Please see [Document](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/vectorstore/) or [Github](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/base.py#L937) for more details.
- similarity (default)
- similarity_score_threshold
- mmr
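
The three search types can be compared on a toy example. This is a sketch under assumptions: the vectors are made up, similarity is plain cosine, and the function names are hypothetical. LangChain's threshold search works on relevance scores normalized by the vector store, so actual values will differ. MMR (maximal marginal relevance) trades relevance to the query against redundancy with results already chosen.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query, vectors, k):
    """similarity: indices of the k vectors closest to the query."""
    order = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return order[:k]


def above_threshold(query, vectors, threshold):
    """similarity_score_threshold: keep only sufficiently similar vectors."""
    ranked = top_k(query, vectors, len(vectors))
    return [i for i in ranked if cosine(query, vectors[i]) >= threshold]


def mmr(query, vectors, k, lambda_mult=0.5):
    """mmr: greedily pick items that are relevant but not redundant."""
    selected = []
    candidates = list(range(len(vectors)))

    def score(i):
        relevance = cosine(query, vectors[i])
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]  # items 0 and 1 are near-duplicates

print(top_k(query_vec, doc_vecs, 2))
print(above_threshold(query_vec, doc_vecs, 0.8))
print(mmr(query_vec, doc_vecs, 2))
```

Plain similarity returns the two near-duplicates, while MMR keeps the best match and then prefers the less similar but more diverse third vector. This can help when many chunks of a book repeat almost the same text.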


#### Related Documentation
- [create_retrieval_chain](https://python.langchain.com/api_reference/langchain/chains/langchain.chains.retrieval.create_retrieval_chain.html)
- [create_stuff_documents_chain](https://python.langchain.com/api_reference/langchain/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html)



In [None]:
# Define the prompt template for QA chain
# To enhance the default prompt, the following modifications were made.
prompt_template = """Use the following pieces of context to answer the question at the end. Please follow the following rules:
1. If you find the answer from the context, write the answer in a concise way and add the list of sources that are **directly** used to derive the answer. Exclude the sources that are irrelevant to the final answer.
2. If the context is irrelevant to the question, please feel free to confidently provide what you know. Say **Here is a general answer based on what I know (note: this is a generated response, not from the retrieved context):** and Insert your (model's) generated answer.

{context}

Question: {input}
Helpful Answer:"""

# Create the QA chain prompt using the above-defined prompt template
qa_chain_prompt = PromptTemplate.from_template(prompt_template)


# Define the document prompt template for combining document content with source
document_prompt = PromptTemplate(
    input_variables=["page_content", "source"],
    template="Context:\ncontent:{page_content}\nsource:{source}",
)

# Use the `create_stuff_documents_chain` to combine retrieved documents into one
combine_documents_chain = create_stuff_documents_chain(
    llm=gemma_llm,
    prompt=qa_chain_prompt,
    document_prompt=document_prompt
)

# ========== Step 4 ===========
#           Retrieve
# =============================
# Create a retriever
retriever = vectordb.as_retriever()
# retriever = vectordb.as_retriever(
#     search_type=config['search_type'],
#     search_kwargs={
#         "score_threshold": config['score_threshold'],
#         "k": config['top_k']
#         }
#     )

# Create the retrieval chain from the retriever and the combine_documents_chain.
# `create_retrieval_chain` is used because the `RetrievalQA` class is no longer recommended in recent versions of LangChain:
# https://python.langchain.com/docs/versions/migrating_chains/retrieval_qa/
rag_chain = create_retrieval_chain(retriever, combine_documents_chain)

#### Tug-of-War between an LLM's internal info and external info
![ChatGPT Image, April 9, 2025, 12:45:45 PM (1)](https://hackmd.io/_uploads/Hk0fnumAkx.png)
> Image created by ChatGPT

When external knowledge from document retrieval conflicts with a model's internal knowledge, large language models (LLMs) often adopt the incorrect external information, but they are less likely to do so when the retrieved content is clearly unrealistic. Please refer to the [paper](https://arxiv.org/abs/2404.10198) for more details.

In [None]:
# test 1
# Run the query through the retrieval chain and obtain the answer
res = rag_chain.invoke({"input": "What is the difference between a variable and an object?"})

# Print the final result, showing the answer to the query
print("Answer to 'What is the difference between a variable and an object?':", res['answer'])

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Answer to 'What is the difference between a variable and an object?': Use the following pieces of context to answer the question at the end. Please follow the following rules:
1. If you find the answer from the context, write the answer in a concise way and add the list of sources that are **directly** used to derive the answer. Exclude the sources that are irrelevant to the final answer.
2. If the context is irrelevant to the question, please feel free to confidently provide what you know. Say **Here is a general answer based on what I know (note: this is a generated response, not from the retrieved context):** and Insert your (model's) generated answer.

Context:
content:Numbers Can Encode Categoricals
In the example of the adult dataset, the categorical variables were encoded as strings.
On the one hand, that opens up the possibility of spelling errors, but on the other
hand, it clearly marks a variable as categorical. Often, whether for ease of storage or
because of the way the dat

In [None]:
# test 2
# Run the query through the retrieval chain and obtain the answer
res = rag_chain.invoke({"input": "Write in detail about python?"})

# Print the final result, showing the answer to the query
print("Answer to 'Write in detail about python?':", res['answer'])

Answer to 'Write in detail about python?': Use the following pieces of context to answer the question at the end. Please follow the following rules:
1. If you find the answer from the context, write the answer in a concise way and add the list of sources that are **directly** used to derive the answer. Exclude the sources that are irrelevant to the final answer.
2. If the context is irrelevant to the question, please feel free to confidently provide what you know. Say **Here is a general answer based on what I know (note: this is a generated response, not from the retrieved context):** and Insert your (model's) generated answer.

Context:
content:Why Python?
Python has become the lingua franca for many data science applications. It combines
the power of general-purpose programming languages with the ease of use of
domain-specific scripting languages like MATLAB or R. Python has libraries for data
loading, visualization, statistics, natural language processing, image processing, and
mor
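
In both runs above, `res['answer']` begins with the filled-in prompt, because the text-generation pipeline returns the prompt together with the completion. A minimal sketch for keeping only the completion follows, assuming the model's reply comes after the final `Helpful Answer:` marker of the template. `extract_answer` is a hypothetical helper and the sample string is made up.

```python
ANSWER_MARKER = "Helpful Answer:"


def extract_answer(generated: str, marker: str = ANSWER_MARKER) -> str:
    """Return the text after the last marker, or the whole text if it is absent."""
    _, found, tail = generated.rpartition(marker)
    return tail.strip() if found else generated.strip()


raw_answer = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "Question: What is RAG?\n"
    "Helpful Answer: Retrieval-augmented generation."
)
clean_answer = extract_answer(raw_answer)
print(clean_answer)
```

If the model itself repeats the marker inside its reply, splitting on the last occurrence would drop part of the answer, so this is a convenience for inspection rather than a robust parser.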

## References
- Paul Mooney and Ashley Chow. Google – AI Assistants for Data Tasks with Gemma. https://kaggle.com/competitions/data-assistants-with-gemma, 2024. Kaggle.
- https://www.kaggle.com/code/shiivvvaam/pygemma-finetuned-rag/notebook
- https://nakamasato.medium.com/enhancing-langchains-retrievalqa-for-real-source-links-53713c7d802a
- https://vijaykumarkartha.medium.com/beginners-guide-to-retrieval-chain-from-langchain-f307b1a20e77
- Full-Stack LLM Application Development (Vector Databases, Hugging Face, OpenAI, LangChain...): https://ithelp.ithome.com.tw/users/20120030/ironman/7039?page=1