# IUST Computer Engineering Department 🏫
## Introduction to Natural Language Processing 📚 (The Final Project)
### Course Instructor: Dr. Marzieh Davoodabadi Farahani 👩‍🏫
### Project Teaching Assistant: Erfan Moosavi Monazzah (tel: @ErfanMoosavi2000) 📞
-------------------------------------------------------------------------------<br>
The objective of this project is to acquaint you with the fundamentals of Retrieval Augmented Generation (RAG). Be sure to explore various options and address challenges in a creative manner. 🎯

**Project Guidelines** 📝
- Avoid cheating at all costs. If a set of submissions is found to be [plagiarized](https://translate.google.as/?sl=en&tl=fa&text=Very%20hard%20word%2C%20I%20know%2C%20here%27s%20the%20meaning%3A%0Aplagiarized&op=translate), only one will be randomly chosen for grading. The others will fail the project. ❌
- You are allowed to use any document, article, paper, or video as a resource for writing your code, provided you include a link to the material used. 📖
- The use of Large Language Models (LLMs), chatbots, and copilots is encouraged. If you use any of these tools, make sure to attach the chat history that led you to the answer to your question, or to the code, to this .ipynb document. (You must provide the entire chat, not just the final answer or your initial prompt.) 💻
- You may not submit any additional documents, files, etc., along with this document. Only solutions, codes, explanations, etc., in this document will be graded. 📄
- You are required to implement everything (except the Language Modeling parts) from scratch. The use of libraries like langchain, llama_index, etc., is not permitted for this purpose. 🚫
- Please adhere to the code guidelines provided throughout the documents. 📝 I’ve spent time in a library 📚 crafting all of this, so if you overlook them, you’ll lose the points allocated for that section. ❌
- This assignment requires a GPU, so don't forget to turn on GPU usage for your notebook session.

-------------------------------------------------------------------------------<br>
# Alright, let's get started. 🚀

## What is RAG? 🤔
We've all used ChatGPT and experienced moments when it generates content that is incorrect or unrelated to our query. Do you know why this happens? These Large Language Models (LLMs) are not magical entities; they are simply models trained on a vast amount of text. 📚 You could even think of that text as a significant portion of the internet. However, this is not all the data available in the world, because data is not a static concept. You yourself generate some data every day through your use of the Internet, Social Media, and so on. 🌐💻📱

So, no matter how much data you use to train your LLM, you always end up encountering new data. This is one of the reasons behind the famous ChatGPT response that tells you it only knows things up to a certain date. 📅 These models also tend to hallucinate, meaning they provide incorrect answers in a very convincing manner. 🎭

On the other hand, we have retrieval techniques. Don't worry if that sounds complicated, because you already use retrieval on a daily basis. (The field itself isn't easy, and you may need to take a course to familiarize yourself with its concepts 😅, but that's not necessary for this project.) You can think of search engines (like Google, for example) as a complex form of information retrieval. 🔍

So, one day, people came up with this idea that it would be cool if ChatGPT could search Google for us, read the articles for us, summarize what it read, and tell us that. 📖 So, this is not exactly what RAG is, but it's something similar. We have a corpus (a large amount of data) and a query (what a user typed as input). Now, we search through this corpus using techniques related to vectors and vector databases, and find the most similar items in our corpus to the query. Then, we pass these items to an LLM and ask for a structured, well-formatted, user-friendly output. 📈📊

## I'm Interested in the Technical Details, What Should I Read? 📚🔍
- I strongly recommend reading the [original RAG paper](https://arxiv.org/abs/2005.11401). If you need help understanding the paper or have any questions about it, feel free to reach out to me via Telegram or find me on the second floor of the department in the NLP lab on Sundays and Tuesdays. 📖
- There appears to be a [comprehensive 2.5-hour course](https://www.freecodecamp.org/news/mastering-rag-from-scratch/) available. I haven't personally watched it, but if you find a better one, let me know so I can update this document. 🎥
- Here is [an article](https://www.smashingmagazine.com/2024/01/guide-retrieval-augmented-generation-language-models/) that explains the concepts very well. Initially, I wanted to use this article as the basis for this project, but unfortunately, the llama_index library used in the article seems to be outdated, so most of the code would need to be rewritten. On second thought, I found it more useful to focus on core concepts rather than learning specific libraries. You might want to check out some libraries like langchain or llama_index which provide a lot of tools for RAG. (But not for this project) 📝💡
- Don't hesitate to use Google or ask chatbots about any new concepts and terms. If you use search engine-aware chatbots like Microsoft Copilot, they provide links for each part of their answers, which is useful if you want to delve deeper into that part. 🌐🤖
- Lastly, we have [the article](https://learnbybuilding.ai/tutorials/rag-from-scratch) that serves as the foundation for this project. 📚🔍

# Learn
First, we’re going to go through a simple RAG implementation. It’s going to be similar to the article, except for the LLM part. For that, I’m going to use Hugging Face. 🤗 I’ll also try to explain the code in simple terms, but feel free to read the article if you prefer their writing style.

## Let's Install the Necessary Libraries 📚🔧
Did you know that using the `--quiet` or `-q` option with the `pip install` command minimizes the output displayed on your screen? 🖥️ This can make your terminal less cluttered. Also, using `-U` will upgrade the libraries if they were previously installed. This is particularly useful for certain libraries like `transformers` that are frequently updated. 🔄

In [1]:
!pip install -U accelerate transformers --quiet

## Gather a Corpus 📚
Technically, a corpus refers to a large and structured set of texts. However, for the sake of our discussion, let’s consider our collection as a “corpus”, even though it might not be large in the traditional sense. 😉

In [None]:
corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

## Create a Retriever 🕵️‍♂️
Now, we’re going to create a simple retriever. The role of the retriever is to compare the user’s query with a large corpus of text and find those that are most similar in context. (You know what context is by now, don’t you? 😊 If you’ve forgotten, refer back to your initial lectures). For now, let’s say we want to find similar text based on simple similarity metrics. The code is straightforward, and I have faith in you, chief! Dive into the code. 👨‍💻

In [None]:
def jaccard_similarity(query, document):
    query = query.lower().split(" ")
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection)/len(union)

Hey, you may want to look at the Wikipedia page for [Jaccard Similarity](https://en.wikipedia.org/wiki/Jaccard_index).
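To make the metric concrete, here is a small worked example. It is only a sketch that repeats the same word-level Jaccard logic on one query and one document from the toy corpus; the helper names `shared` and `score` are mine and are not used elsewhere in this notebook.

```python
def jaccard_similarity(query, document):
    # Same idea as the cell above: compare the sets of lowercased, space-separated words
    query_words = set(query.lower().split(" "))
    document_words = set(document.lower().split(" "))
    return len(query_words & document_words) / len(query_words | document_words)

query = "I like to hike"
document = "Go for a hike and admire the natural scenery."

# The only word the two texts share is "hike"
shared = set(query.lower().split(" ")) & set(document.lower().split(" "))
score = jaccard_similarity(query, document)

print(shared)           # {'hike'}
print(round(score, 4))  # 1 shared word / 12 distinct words = 0.0833
```

Notice that splitting on spaces keeps punctuation attached, so the token `scenery.` would not match a query containing `scenery`. That is one of the weaknesses of this simple retriever.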

In [None]:
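def return_response(query, corpus):
    similarities = []
    for doc in corpus:
        # Score each document against the query passed in, not a global variable
        similarity = jaccard_similarity(query, doc)
        similarities.append(similarity)
    # Return the document from the given corpus with the highest score
    return corpus[similarities.index(max(similarities))]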

## Create a Generator 🖥️
Now, we’re going to create a generator. This will help us compile the information retrieved into a well-structured and user-friendly text.

OK, let's say that in a scenario we ask users what they like to do, and their answer is this:

In [None]:
user_input = "I like to hike"

Now, using the retriever, I find the activity that best fits this user.

In [None]:
relevant_document = return_response(user_input, corpus_of_documents)
print(relevant_document)

Go for a hike and admire the natural scenery.


The answer seems good enough, but we can do better, yeah?

Let’s import a Language Model. I’m going to try out Microsoft Phi-3 because it recently hit the market, and I haven’t had a chance to try it for myself yet. So, I’m seizing this opportunity to do so! 😊👨‍💻

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

Downloading the model is going to take a while, so use this time to rest your eyes for a bit. 😊👀💤

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

Now we try to get the LLM to become our generator. We simply place the retrieved information and the user query in the following prompt and ask the model for well-formatted text.

In [None]:
prompt = """You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input."""

In [None]:
prompt = prompt.replace("{relevant_document}", relevant_document).replace("{user_input}", user_input)
print(prompt)

You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: Go for a hike and admire the natural scenery.
The user input is: I like to hike
Compile a recommendation to the user based on the recommended activity and the user input.


In [None]:
messages = [
    {"role": "user", "content": prompt},
]

Here's the augmented generated text

In [None]:
output = pipe(messages, **generation_args)
print(output[0]['generated_text'])



 Based on your interest in hiking and the recommended activity, I suggest you plan a hike in a beautiful natural setting. Look for trails that offer stunning views of mountains, forests, or water bodies. This will allow you to fully enjoy the serenity of nature, appreciate the scenic beauty, and experience the physical benefits of hiking. Don't forget to pack essentials like water, snacks, and appropriate gear for your adventure. Happy hiking!


## Very Cool, but Not Perfect! 😎👌
Alright, you’ve just seen a very basic example of RAG. However, there are some issues present. The corpus is small, and the documents in the corpus are short sentences, which causes the Language Model (LM) to generate some text on its own. 📚🤖

Also, our retriever is not very efficient and it may encounter bugs in some cases. For instance, even when users specify that they are not interested in a certain activity, the retriever might still bring up that activity for them. 🐜🔍
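Here is a minimal sketch of that negation problem. It assumes the same word-overlap Jaccard retriever as in the Learn section; `retrieve` and `toy_corpus` are names I made up for this illustration, and the corpus is just three sentences taken from the fake corpus above.

```python
def jaccard_similarity(query, document):
    # Word-level Jaccard similarity, as in the Learn section
    query_words = set(query.lower().split(" "))
    document_words = set(document.lower().split(" "))
    return len(query_words & document_words) / len(query_words | document_words)

def retrieve(query, corpus):
    # Hypothetical helper: return the document with the highest similarity
    return max(corpus, key=lambda doc: jaccard_similarity(query, doc))

toy_corpus = [
    "Visit a local museum and discover something new.",
    "Go for a hike and admire the natural scenery.",
    "Take a yoga class and stretch your body and mind.",
]

liked = retrieve("I like to hike", toy_corpus)
disliked = retrieve("I do not like to hike", toy_corpus)

print(liked)     # Go for a hike and admire the natural scenery.
print(disliked)  # Go for a hike and admire the natural scenery.
```

Both queries share only the word "hike" with the corpus, so the retriever returns the hiking document either way. Word overlap has no notion of what "not" means.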

So, in this project, you’re going to address some of these issues. The rest of this document consists of some empty cells and tips for you on how to fill them with code. Let’s get coding! 👨‍💻🚀

# The Project

## Determine Your Task 🎯
What do you aim to implement with RAG? A recommender system? 🎁 A chatbot for a website’s FAQ? 💬 A medical advisor? 🩺 Or perhaps something else entirely?

Specify your objective in this cell.

In [None]:
task_title = "Book Recommendation System"
url_for_more_information = "https://huggingface.co/datasets/booksouls/goodreads-book-descriptions"

print(f"My task is: {task_title}")
print(f'For more information see: {url_for_more_information}')

My task is: Book Recommendation System
For more information see: https://huggingface.co/datasets/booksouls/goodreads-book-descriptions


## 🧐 Find or gather a corpus
Remember the fake corpus? 📚 It’s time to switch things up and use something real. 🌐 You need to use a dataset from [Hugging Face Datasets](https://huggingface.co/datasets) for this project. 🚀 Don’t use files that are outside of this notebook. The notebook should be able to run on its own without depending on anything external. 💻👍


In [3]:
!pip install datasets



In [4]:
from datasets import load_dataset

# Load the Goodreads dataset
dataset = load_dataset('booksouls/goodreads-book-descriptions')
print(dataset['train'][0])
print(dataset['train'][1])
print(dataset['train'][2])

{'title': 'Good Harbor', 'description': 'Anita Diamant\'s international bestseller "The Red Tent" brilliantly re-created the ancient world of womanhood. Diamant brings her remarkable storytelling skills to "Good Harbor" -- offering insight to the precarious balance of marriage and career, motherhood and friendship in the world of modern women. The seaside town of Gloucester, Massachusetts is a place where the smell of the ocean lingers in the air and the rocky coast glistens in the Atlantic sunshine. When longtime Gloucester-resident Kathleen Levine is diagnosed with breast cancer, her life is thrown into turmoil. Frightened and burdened by secrets, she meets Joyce Tabachnik -- a freelance writer with literary aspirations -- and a once-in-a-lifetime friendship is born. Joyce has just bought a small house in Gloucester, where she hopes to write as well as vacation with her family. Like Kathleen, Joyce is at a fragile place in her life.\nA mutual love for books, humor, and the beauty of 

## 📝 Create some queries
I want you to create 20 queries related to your task. You can use any Language Model you want for this, or if you’re feeling strong 💪 and have the time, write them yourself. 🖊️

You need to create a Hugging Face account, format your 20 queries into the accepted dataset format for Hugging Face 🤗 and push it to your Hugging Face account. Be sure to make it public and use it for the evaluation task. 👀

In [5]:
import pandas as pd

# Create a DataFrame with the queries
queries = [
    "Can you recommend a good mystery novel?",
    "Recommend popular fantasy book?",
    "I'm looking for a romance novel with a happy ending.",
    "Suggest a science fiction book set in space.",
    "What is the best historical fiction book?",
    "Can you recommend a book similar to 'To Kill a Mockingbird'?",
    "What is a good book for young adults?",
    "I like thrillers with plot twists. Any suggestions?",
    "Suggest a non-fiction book about psychology.",
    "What is a classic novel everyone should read?",
    "Can you recommend a good biography?",
    "I'm interested in self-help books. Any recommendations?",
    "What is a good dystopian novel?",
    "Suggest a book with strong female leads.",
    "Can you recommend a book on personal finance?",
    "What is the best horror book of all time?",
    "Suggest a light-hearted comedy book.",
    "What is a good book about World War II?",
    "I'm looking for a children's book for a 10-year-old.",
    "Can you recommend a book with a complex plot?"
]

data = {'query': queries}
df = pd.DataFrame(data)

# Save to CSV
df.to_csv('queries.csv', index=False)

In [10]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
from datasets import Dataset

# Load the CSV file into a DataFrame
df = pd.read_csv('queries.csv')

# Convert the DataFrame to a Hugging Face Dataset
queries = Dataset.from_pandas(df)

# Push the dataset to Hugging Face
queries.push_to_hub('EhsanAhmadpoor/book-recommendation-queries')

CommitInfo(commit_url='https://huggingface.co/datasets/EhsanAhmadpoor/book-recommendation-queries/commit/e55220ec11ec473e2fa5a33e0a7143d427a28d8c', commit_message='Upload dataset', commit_description='', oid='e55220ec11ec473e2fa5a33e0a7143d427a28d8c', pr_url=None, pr_revision=None, pr_num=None)

## 🛠️ Create a Retriever
To create your retriever, you need to use an encoder model. Something like BERT? Nah, BERT is so yesterday. Find something new and shiny! ✨ The basic idea is to encode every document (sentence) in your corpus into a vector space using the same encoder. Then, encode the user query into that same space. With some similarity metrics like dot product, you can find the most similar document to the user’s input and retrieve it. 🎯 You can train your own encoder if you have enough data and resources, 💪 or you can use one of those [ready-made on Hugging Face](https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=trending), like these ones.
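To see what "find the most similar document" means once everything is a vector, here is a small sketch in plain Python. The 3-dimensional vectors are made up for illustration (a real encoder returns vectors with hundreds of dimensions), and `cosine_similarity` and `top_k` are hypothetical helpers, not part of any library.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    # Score every document vector, then keep the k highest-scoring indices
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Made-up "embeddings" for three documents and one query
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]
query_vec = [2.0, 0.1, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
for idx, score in results:
    print(idx, round(score, 3))
```

The query points almost in the same direction as document 0, so it ranks first, followed by document 2. Returning the top k documents instead of only the best one is a design choice you may want to explore, since it gives the generator more context to work with.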

In [6]:
!pip install -U sentence-transformers --quiet

In [7]:
from sentence_transformers import SentenceTransformer, util
import torch

# Load the pre-trained encoder (named `encoder` so the generator models loaded later do not overwrite it)
encoder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Encode the corpus, limited to the first 1000 books for demonstration
corpus = dataset['train'].select(range(1000))
corpus_embeddings = encoder.encode(list(corpus['description']), convert_to_tensor=True)

def retrieve_document(query, corpus_embeddings, corpus):
    query_embedding = encoder.encode(query, convert_to_tensor=True)
    # Compute cosine similarity between the query and all corpus documents
    cosine_scores = util.pytorch_cos_sim(query_embedding, corpus_embeddings)
    # Find the document with the highest score
    top_result_idx = torch.argmax(cosine_scores)
    print(top_result_idx)
    # Convert the 0-d tensor to a plain int before indexing the dataset
    best_row = corpus[int(top_result_idx)]
    return best_row['title'], best_row['description']

# Example usage
user_query = "What is a good book about World War II?"
relevant_document_title, relevant_document_des = retrieve_document(user_query, corpus_embeddings, corpus)
print(f"title:\n----------------\n{relevant_document_title}\n----------------\n")
print(f"description:\n----------------\n{relevant_document_des}\n----------------\n")

tensor(344)
title:
----------------
Hitler's War Directives, 1939-1945
----------------

description:
----------------
The Second World War was Hitler's personal war in many senses. He intended it, prepared for it, chose the moment for launching it, planned its course, and, on several occasions between 1939 and 1942, claimed to have won it.
Although the aims he sought to achieve were old nationalist aspirations, the fact that the policy and strategy for their realization were imposed so completely by Hitler meant that if victory had come, it would have been very much a personal triumph: the ultimate failure was thus a personal one too.
This book presents all of Hitler's directives, from preparations for the invasion of Poland (31 August 1939) to his last desperate order to his troops on the Eastern Front (15 April 1945), whom he urges to choke the Bolshevik assault 'in a bath of blood'. They provide a fascinating insight into Hitler's mind and how he interpreted and reacted to events a

## 🎛️ Create a Generator
For this part, I practically handed you the whole code on a silver platter. 🍽️ But since we know you’re an explorer at heart and love trying new things, you can’t use the model I previously used. 😈 You have to try 3 different generators and compare them based on the quality of their answers. 🧪📊 [These might come in handy](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In [8]:
prompt = """You are a bot that gives recommendations for books. Try to be a helpful recommender system.
    This is the book title you should recommend: {relevant_document_title}
    This is the book description you should recommend:{relevant_document_des}
    The user input is: {user_query}
    Compile a recommendation to the user based on the recommended review and the user input."""
prompt = prompt.replace("{relevant_document_title}", relevant_document_title).replace("{relevant_document_des}", relevant_document_des).replace("{user_query}", user_query)
print(prompt)

You are a bot that gives recommendations for books. Try to be a helpful recommender system.
    This is the book title you should recommend: Hitler's War Directives, 1939-1945
    This is the book description you should recommend:The Second World War was Hitler's personal war in many senses. He intended it, prepared for it, chose the moment for launching it, planned its course, and, on several occasions between 1939 and 1942, claimed to have won it.
Although the aims he sought to achieve were old nationalist aspirations, the fact that the policy and strategy for their realization were imposed so completely by Hitler meant that if victory had come, it would have been very much a personal triumph: the ultimate failure was thus a personal one too.
This book presents all of Hitler's directives, from preparations for the invasion of Poland (31 August 1939) to his last desperate order to his troops on the Eastern Front (15 April 1945), whom he urges to choke the Bolshevik assault 'in a bath 

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

model_id = "NousResearch/Meta-Llama-3-8B-Instruct"
meta_llama_pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

outputs = meta_llama_pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

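# Keep the response in a variable so the evaluation cell can use it
meta_llama_response = outputs[0]["generated_text"]

print("\nModel: Meta-Llama\nResponse:")
print(meta_llama_response)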

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Model: Meta-Llama
Response:
You are a bot that gives recommendations for books. Try to be a helpful recommender system.
    This is the book title you should recommend: Hitler's War Directives, 1939-1945
    This is the book description you should recommend:The Second World War was Hitler's personal war in many senses. He intended it, prepared for it, chose the moment for launching it, planned its course, and, on several occasions between 1939 and 1942, claimed to have won it.
Although the aims he sought to achieve were old nationalist aspirations, the fact that the policy and strategy for their realization were imposed so completely by Hitler meant that if victory had come, it would have been very much a personal triumph: the ultimate failure was thus a personal one too.
This book presents all of Hitler's directives, from preparations for the invasion of Poland (31 August 1939) to his last desperate order to his troops on the Eastern Front (15 April 1945), whom he urges to choke the 

In [11]:
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",
    quantization_config=quantization_config)

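input_text = prompt
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Set max_new_tokens explicitly; the default generation length is very short
outputs = model.generate(**inputs, max_new_tokens=256)
# Keep the response in a variable so the evaluation cell can use it
gemma_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(gemma_response)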

RuntimeError: No GPU found. A GPU is needed for quantization.

In [12]:
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-72B-Instruct")

messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Keep the response in a variable so the evaluation cell can use it
qwen_response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(qwen_response)

KeyboardInterrupt: 

## 📊 Evaluate the results
Here, you’ve got to put those 3 models to the test. Use the 20 queries you’ve created on each of the 3 models. Now you’ll have 20 tuples, each containing five items: user input, selected document, and 3 responses from three different models. Use a judge model on each tuple to select the best answer. 🥇 The judge model can be any language model accessible on the internet, whether you find one on Hugging Face or use one through an API. 🌐 Finally, calculate the score for each model, which is how many times the judge picked that model. 🏆

In [None]:
from transformers import pipeline

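# Use a judge model
# NOTE: this checkpoint is an SST-2 sentiment classifier, so the pipeline returns a sentiment
# label and score rather than the number of the best response. A generative
# (instruction-following) judge is needed to actually pick between the three responses.
judge_model = pipeline("text-classification", model="textattack/bert-base-uncased-SST-2")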

def evaluate_responses(user_query, relevant_document_title, relevant_document_des, responses):
    prompt = f"""Which of the following responses is the best recommendation for the given query and document?
Query: {user_query}
Book Title: {relevant_document_title}
Book Description: {relevant_document_des}

Responses:
1. {responses[0]}
2. {responses[1]}
3. {responses[2]}

Please respond with the number of the best response."""

    judge_response = judge_model(prompt)
    return judge_response

# Example usage with responses from Meta-Llama, Gemma, and Qwen
responses = [meta_llama_response, gemma_response, qwen_response]
best_response = evaluate_responses(user_query, relevant_document_title, relevant_document_des, responses)

print("Best Response:\n", best_response)
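The final scoring step described above (counting how many times the judge picked each model) can be sketched as follows. This assumes the judge is a generative model that answers in free text containing the number of its choice. The replies in `example_replies` are invented for illustration and are not real judge outputs, and `parse_choice` and `tally` are hypothetical helpers.

```python
import re
from collections import Counter

# Order matches the `responses` list in the evaluation cell above
MODEL_NAMES = ["meta_llama", "gemma", "qwen"]

def parse_choice(judge_reply):
    # Look for a standalone 1, 2, or 3 in the judge's reply
    match = re.search(r"\b([1-3])\b", judge_reply)
    if match is None:
        return None  # the judge did not name a valid option
    return int(match.group(1)) - 1  # convert to a 0-based index

def tally(judge_replies, model_names=MODEL_NAMES):
    # Start every model at zero so models that never win still appear in the result
    scores = Counter({name: 0 for name in model_names})
    for reply in judge_replies:
        choice = parse_choice(reply)
        if choice is not None:
            scores[model_names[choice]] += 1
    return dict(scores)

# Invented replies, one per query, only to show the counting
example_replies = ["2", "Response 1 is the best.", "I would pick 3.", "1", "unclear"]
scores = tally(example_replies)
print(scores)
```

Replies that contain no valid option are skipped here rather than guessed, so the scores may add up to fewer than 20. You may prefer to re-ask the judge in that case.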


### Now that I'm writing this message, it's 3 in the morning and I'm tired as fox. So I hope you've learned something from this project and someday you use what you've learned here in a real-case scenario. Good Luck! ✌️