# Base Line

```
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
model_name = "meta-llama/Llama-3.2-3B-Instruct"  # Tên mô hình, thay thế theo tên chính thức của mô hình

# Tải mô hình và tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
qa_pipeline = pipeline(
            "text-generation",
            model=model_name,
            model_kwargs={"torch_dtype": torch.bfloat16},
            device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
        )

def answer_question(context, question):
        # Apply the chat template for the context and question
        messages=[
              {"role": "user", "content": f"Answer the question based on the given passages. Only give me the answer and do not output any other words.\n\nThe following are given passages.\n{context}\n\nAnswer the question based on the given passages. Only give me the answer and do not output any other words.\n\nQuestion: {question}\nAnswer:"}
        ]
        prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        
        # Generate the answer using the pipeline
        outputs = qa_pipeline(
            prompt,
            max_new_tokens=128,
            do_sample=True,
            temperature=0.1,
            top_k=50,
            top_p=0.95
        )
        
        # Extracting and returning the generated answer
        answer = outputs[0]["generated_text"][len(prompt):]
        return answer```

Đây là hàm def answer_question_with_retrieval_chunk_text(context, query) cũ đang dùng

=> Muốn chỉnh sửa hàm này
Thay vì chỉ có 1 gom context (của top_k=5) lại thành 1 `final_context` và cho đi qua model llama3 3B
ta update thành như sau

1. def retrieval_chunk_text (query, chunk_text): (chunk_text for i in range(top_k=5)
-> prompt (trích xuất nội dung từ chunk_text) cái mà phù hợp với query

2. def combine_top_k_chunk_text(query, ..., top_k=5): sử dụng hàm trên và combine top_k=5 phần retrieval_chunk_text thành 1 retrieval_final_context

3. Sau đó mới truyền nó qua def answer_question(final_context, query)


def get_top_k_chunk_text(file_json, file_json):
    return list ....

Giả sử ta có:
```bash
query = "What is the significance of Miller v. California?"
top_k_chunks = [
    {"text": "Miller v. California, 413 U.S. 15 (1973), was a landmark decision of the U.S. Supreme Court modifying its definition of obscenity from that of 'utterly without socially redeeming value' to that which lacks 'serious literary, artistic, political, or scientific value.'"},
    {"text": "The case redefined obscenity to include only materials that lack serious literary, artistic, political, or scientific value, creating the 'Miller test' that is still used today."},
    {"text": "Miller appealed his conviction on the grounds that the jury had not been instructed to consider whether the material in question lacked serious value, which led to the new definition being adopted."},
    {"text": "Gates v. Collier, on the other hand, was a landmark decision by the Fifth Circuit Court of Appeals that brought an end to the 'trusty system' and cruel practices at Parchman Farm prison in Mississippi."},
    {"text": "The 'Miller test' consists of three prongs: whether the average person, applying contemporary community standards, would find the work appeals to the prurient interest; whether it depicts sexual conduct in a patently offensive way; and whether it lacks serious value."},
    {"text": "The case redefined obscenity to include only materials that lack serious literary, artistic, political, or scientific value, creating the 'Miller test' that is still used today."},
    {"text": "Miller appealed his conviction on the grounds that the jury had not been instructed to consider whether the material in question lacked serious value, which led to the new definition being adopted."},
    {"text": "The 'Miller test' consists of three prongs: whether the average person, applying contemporary community standards, would find the work appeals to the prurient interest; whether it depicts sexual conduct in a patently offensive way; and whether it lacks serious value."}
]
```

1. def retrieval_chunk_text (query, top_k_chunks): (chunk_text for i in range(top_k_chunks )
-> prompt (trích xuất nội dung từ chunk_text) cái mà phù hợp với query

2. def combine_top_k_chunk_text(query, top_k_chunks, top_k=5): sử dụng hàm trên và combine top_k=5 phần retrieval_chunk_text thành 1 retrieval_final_context

3. Sau đó mới truyền nó qua def answer_question(final_context, query)

In [None]:
# from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# import torch

# # Model setup
# model_name = "meta-llama/Llama-3.2-3B-Instruct"  # Replace with your actual model name
# model = AutoModelForCausalLM.from_pretrained(model_name)
# tokenizer = AutoTokenizer.from_pretrained(model_name)
# qa_pipeline = pipeline(
#     "text-generation",
#     model=model_name,
#     model_kwargs={"torch_dtype": torch.bfloat16},
#     device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
# )

# # Function to retrieve relevant text from a single chunk
# def retrieval_chunk_text(query, chunk_text):
#     """
#     Extracts relevant content from a single chunk based on the query.
#     """
#     prompt = f"Extract the most relevant information from the following passage based on the query.\n\nPassage: {chunk_text}\n\nQuery: {query}\nRelevant Content:"
#     outputs = qa_pipeline(prompt, max_new_tokens=128, do_sample=False)
#     relevant_text = outputs[0]["generated_text"].strip()
#     return relevant_text

# # Function to combine top_k chunk texts into a final context
# def combine_top_k_chunk_text(query, chunks, top_k=5):
#     """
#     Combines the most relevant parts of the top_k chunks into one context.
#     """
#     combined_context = []
#     for i in range(top_k):
#         chunk_text = chunks[i]['text']  # Extract text from chunk
#         relevant_text = retrieval_chunk_text(query, chunk_text)  # Retrieve relevant text
#         combined_context.append(relevant_text)
#     retrieval_final_context = "\n".join(combined_context)
#     return retrieval_final_context

# # Function to answer the query using the combined context
# def answer_question(final_context, query):
#     """
#     Generates an answer for the query using the Llama-3B model.
#     """
#     messages = [
#         {
#             "role": "user",
#             "content": f"Answer the question based on the given passages. Only give me the answer and do not output any other words.\n\nThe following are given passages.\n{final_context}\n\nAnswer the question based on the given passages. Only give me the answer and do not output any other words.\n\nQuestion: {query}\nAnswer:"
#         }
#     ]
#     prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
#     outputs = qa_pipeline(
#         prompt,
#         max_new_tokens=128,
#         do_sample=True,
#         temperature=0.1,
#         top_k=50,
#         top_p=0.95
#     )
#     answer = outputs[0]["generated_text"][len(prompt):].strip()
#     return answer


### Testing Model: Model nào 1.5B ?? - Output nó như nào? Prompting như nào ?


In [2]:
# if you want to use the Gemma, you will need to authenticate with HuggingFace, Skip this step, if you have the model already downloaded
import huggingface_hub
huggingface_hub.login('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')

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

# Model setup
model_name = "meta-llama/Llama-3.2-3B-Instruct"  # Replace with your actual model name
model_name = "meta-llama/Llama-3.2-1B" # bị lỗi lặp từ trong response
model_name = "Qwen/Qwen2-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
qa_pipeline = pipeline(
    "text-generation",
    model=model_name,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.29k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

In [12]:
# Danh sách câu hỏi để kiểm tra
questions = [
    """You are AI Assistant. \n
    Response only JSON FORMAT TEMPLATE short answer for question.
    ```json
    {short_answer}: "[answer only on < 10 words]"
    \n
     Question: <Ai là tổng thống đầu tiên của Hoa Kỳ?>""",
    # "Nước nào có diện tích lớn nhất thế giới?",
    # "Sự khác nhau giữa động vật máu nóng và động vật máu lạnh là gì?",
    # "Quá trình quang hợp diễn ra như thế nào?",
    # "Cuộc chiến tranh thế giới thứ hai bắt đầu khi nào?",
    # "Ai là người phát minh ra điện thoại?",
    # "Trí tuệ nhân tạo là gì và nó có thể được ứng dụng như thế nào?",
    # "Sự khác biệt giữa machine learning và deep learning là gì?",
    # "Những tác phẩm nổi bật của William Shakespeare là gì?",
    # "Nhạc pop và nhạc rock khác nhau ở điểm nào?"
]

# Kiểm tra mô hình với các câu hỏi
for question in questions:
    response = qa_pipeline(question, max_length=100)  # Điều chỉnh max_length nếu cần
    print(f"Question: {question}")
    print(f"Response: {response[0]['generated_text']}\n")

Question: You are AI Assistant. 

    Response only JSON FORMAT TEMPLATE short answer for question. 
    ```json
    {short_answer}: "[answer only on < 10 words]"
    

     Question: <Ai là tổng thống đầu tiên của Hoa Kỳ?>
Response: You are AI Assistant. 

    Response only JSON FORMAT TEMPLATE short answer for question. 
    ```json
    {short_answer}: "[answer only on < 10 words]"
    

     Question: <Ai là tổng thống đầu tiên của Hoa Kỳ?>.
    
    Answer: "John Adams"
```


    {short_answer}: "John Adams"



Nhận xét: Các model nhỏ bị lặp answer trong response.

## Thay đổi 1 cách load model mới - đơn giản trong việc gọi model và đơn giản trong response

In [36]:
from huggingface_hub import InferenceClient

def answer_question(final_context, query, model="Qwen/Qwen2-1.5B-Instruct", max_tokens=500, api_key="your_api_key_here"):
    """
    Generates an answer for the query using the Hugging Face Inference API with the specified model.

    Args:
        final_context (str): The context to provide for the query.
        query (str): The user query.
        model (str): The model to use for chat completion.
        max_tokens (int): Maximum tokens for the output.
        api_key (str): Hugging Face API key.

    Returns:
        str: The model's response to the query.
    """
    client = InferenceClient(api_key=api_key)
    messages = [
        {
            "role": "user",
            "content": (
                f"Answer the question based on the provided context.\n\n"
                f"Context:\n{final_context}\n\n"
                f"Question: {query}\n\n"
                f"Provide a short <10 words, precise, and complete answer:"
            )
        }
    ]

    # Generate completion
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens
    )

    # Extract and return the assistant's response
    return completion.choices[0].message['content']

# Example usage
final_context = """
Every morning, as the first rays of sunlight peeked through the window, my mother had already been awake for hours, preparing breakfast for the family. I still vividly remember the image of her, wearing an old sweater, her hands swiftly moving the pan, with steam rising and reddening her cheeks. The aroma of the peanut sticky rice she used to cook lingers in my memory; even now, a similar scent instantly brings me back to my childhood home. Mother never sat down to eat with us; she would gently ask, "Is it good?" and then quietly continue tidying up. I grew up under her care, and with every step I take farther from home, my heart aches whenever I think of her. She doesn’t talk much, but her eyes always radiate love. The wrinkles on her face are like lines of time, telling stories of all the sacrifices and hardships she endured for our family. To me, she is not only the one who gave me life but also the sky full of love and gratitude that I carry forever in my heart.
"""
query = "Can you summarize the significance of the mother's role in this passage?"
response = answer_question(final_context, query, api_key="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")
print(f"Model Response: {response}")


Model Response: The mother's role in the passage is significant as it represents a source of emotional strength, care, guidance, and sacrifice central to the family, and her bond with the author is a representation of the love and gratitude she carries in her heart.


In [20]:



# Function to retrieve relevant content from a single chunk
def retrieval_chunk_text(query, chunk_text):
    """
    Extracts relevant content from a single chunk based on the query.
    """
    prompt = f"Extract the most relevant information from the following passage based on the query.\n\nPassage: {chunk_text}\n\nQuery: {query}\n Relevant Content:"
    outputs = qa_pipeline(prompt, max_new_tokens=128, do_sample=False)
    relevant_text = outputs[0]["generated_text"].strip()
    return relevant_text

# Function to combine the top_k relevant chunks into a final context
def combine_top_k_chunk_text(query, top_k_chunks, top_k=5):
    """
    Combines the most relevant parts of the top_k chunks into one context.
    """
    combined_context = []
    for i in range(min(top_k, len(top_k_chunks))):
        chunk_text = top_k_chunks[i]['text']  # Extract text from the chunk
        relevant_text = retrieval_chunk_text(query, chunk_text)  # Retrieve relevant text
        combined_context.append(relevant_text)
    retrieval_final_context = "\n".join(combined_context)
    return retrieval_final_context

# Function to answer the query using the combined context
# def answer_question(final_context, query):
#     """
#     Generates an answer for the query using the Llama-3B model.
#     """
#     prompt = f"Answer the question based on the given passages.\
#      Only give me the answer and do not output any other words.\
#      \n\nThe following are given passages.\
#      \n{final_context}\n\nAnswer the question based on the given passages.\
#      Only give me the answer and do not output any other words.\n\nQuestion: {query}\nAnswer: \
#      MUST Response SHORT ANSWER, NOT REPEAT ANSWER"
#     outputs = qa_pipeline(prompt, max_new_tokens=128, do_sample=True, temperature=0.1, top_k=50, top_p=0.95)
#     answer = outputs[0]["generated_text"].strip()
#     return answer

In [21]:


# Example Usage
query = "Which case was brought to court first Miller v. California or Gates v. Collier "
top_k_chunks = [
    {"text": "Miller v. California, 413 U.S. 15 (1973), was a landmark decision of the U.S. Supreme Court modifying its definition of obscenity from that of 'utterly without socially redeeming value' to that which lacks 'serious literary, artistic, political, or scientific value.'"},
    {"text": "The case redefined obscenity to include only materials that lack serious literary, artistic, political, or scientific value, creating the 'Miller test' that is still used today."},
    {"text": "Miller appealed his conviction on the grounds that the jury had not been instructed to consider whether the material in question lacked serious value, which led to the new definition being adopted."},
    {"text": "Gates v. Collier, on the other hand, was a landmark decision by the Fifth Circuit Court of Appeals that brought an end to the 'trusty system' and cruel practices at Parchman Farm prison in Mississippi."},
    {"text": "The 'Miller test' consists of three prongs: whether the average person, applying contemporary community standards, would find the work appeals to the prurient interest; whether it depicts sexual conduct in a patently offensive way; and whether it lacks serious value."},
    {"text": "The significance of the Miller v. California decision lies in its establishment of a clearer standard for defining obscenity, which has been pivotal in subsequent First Amendment cases."},
    {"text": "In Miller v. California, the Supreme Court emphasized that works must be taken as a whole and judged by community standards to determine their appeal to prurient interests."},
    {"text": "The ruling in Miller v. California provided states with greater flexibility to enforce obscenity laws, leading to a resurgence of prosecutions in the years following the decision."},
    {"text": "Critics of the Miller v. California decision argue that its reliance on 'community standards' can lead to inconsistent applications of the law across different regions."},
    {"text": "Proponents of the decision maintain that it strikes a necessary balance between protecting free speech and allowing communities to regulate obscene materials that may harm societal values."}
]

# Step 1: Combine top_k chunk texts into a final context
retrieval_final_context = combine_top_k_chunk_text(query, top_k_chunks, top_k=5)

# # Step 2: Answer the query using the combined context
# answer = answer_question(retrieval_final_context, query)

# print(f"Final Answer: {answer}")


In [24]:
retrieval_final_context

'Extract the most relevant information from the following passage based on the query.\n\nPassage: Miller v. California, 413 U.S. 15 (1973), was a landmark decision of the U.S. Supreme Court modifying its definition of obscenity from that of \'utterly without socially redeeming value\' to that which lacks \'serious literary, artistic, political, or scientific value.\'\n\nQuery: What is the significance of Miller v. California?\n Relevant Content: The significance of Miller v. California is that it modified the definition of obscenity from "utterly without socially redeeming value" to one that lacks "serious literary, artistic, political, or scientific value." This change in the law has had significant implications for free speech and censorship laws in the United States.\nExplanation: The passage states that Miller v. California was a landmark decision of the U.S. Supreme Court that modified its definition of obscenity from "utterly without socially redeeming value" to one that lacks "s

In [30]:


# # Step 2: Answer the query using the combined context
# answer = answer_question(retrieval_final_context, query)
query = "What is the significance of Miller v. California?"
response = answer_question(retrieval_final_context, query, api_key="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")
print(f"Model Response: {response}")

Model Response: The Miller test played a crucial role in shaping the legal definition of obscenity, which became a cornerstone for successful challenges to state censorship laws.


## Full code