# **Proofs of concept in RAG**
Compare the resource usage of several small LLMs from Hugging Face in an advanced RAG pipeline.

## Measure the resources used

In [1]:
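import functools
import time

import GPUtil
import psutil


def performance_metrics(function):
    """Print the RAM, GPU memory, and wall-clock time consumed by a call."""
    @functools.wraps(function)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        gpus = GPUtil.getGPUs()
        # Free memory before the call: RAM in GB, GPU in MB (first GPU only).
        init_ram = psutil.virtual_memory().available / (1024 ** 3)
        init_gpu = gpus[0].memoryFree if gpus else None
        start_time = time.perf_counter()

        out = function(*args, **kwargs)

        # Stop the clock before querying the GPU again, so the measurement
        # overhead is not counted as part of the call.
        elapsed = time.perf_counter() - start_time
        used_ram = init_ram - psutil.virtual_memory().available / (1024 ** 3)
        print(f'Used RAM: {round(used_ram, 3)} GB')
        if init_gpu is not None:
            used_gpu = init_gpu - GPUtil.getGPUs()[0].memoryFree
            print(f'Used GPU: {round(used_gpu, 3)} MB')
        print(f'Consumed time {elapsed} seconds')

        return out
    return wrapper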
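
The decorator above prints its measurements, which makes them hard to collect for a later comparison. A minimal, standard-library-only sketch of the same pattern that returns the elapsed time alongside the result is shown here. The names `timed` and `slow_sum` are hypothetical and not part of this notebook, and RAM and GPU tracking are left out to keep the example dependency-free.

```python
import functools
import time


def timed(function):
    """Return (result, elapsed_seconds) instead of printing the timing."""
    @functools.wraps(function)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock, suited to measuring intervals
        result = function(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timed
def slow_sum(n):
    """Toy workload standing in for an expensive call."""
    return sum(range(n))


result, elapsed = slow_sum(1_000_000)
print(f"result={result}, elapsed={elapsed:.4f} s")
```

Returning the numbers rather than printing them lets the caller store one record per model and build a summary afterwards.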

## Putting it all together

In [2]:
import os
# Hugging Face Tokenizers Warning: This warning is related to the parallelism feature in 
# Hugging Face’s tokenizers library. When a process that has already used parallelism
# gets forked, it can lead to deadlocks, which is why parallelism is disabled.

# Avoid using tokenizers before the fork if possible.
os.environ["TOKENIZERS_PARALLELISM"] = "false"


In [3]:
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.embeddings import resolve_embed_model
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate
from llama_index import VectorStoreIndex, StorageContext, ServiceContext
from llama_index import load_index_from_storage


@performance_metrics
def build_index(files, embed_model='local:BAAI/bge-large-en-v1.5',
                model='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
                model_kwargs={'trust_remote_code': True},
                generate_kwargs={"temperature": 0.7, "do_sample": True},
                persist_dir='LLM_TRAIN/sentence_index'):

    documents = SimpleDirectoryReader(input_files=files).load_data()

    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=3,
        window_metadata_key="window",
        include_prev_next_rel=True,
        original_text_metadata_key="original_text",
    )
    embed_model = resolve_embed_model(embed_model)
    llm = HuggingFaceLLM(
        model_name=model,
        tokenizer_name=model,
        query_wrapper_prompt=PromptTemplate(
            "<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"),
        context_window=2048,  # 4096
        max_new_tokens=256,  # 512
        model_kwargs=model_kwargs,
        generate_kwargs=generate_kwargs,
        device_map="auto",
    )

    sentence_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
        node_parser=node_parser,
    )

    if not os.path.exists(persist_dir):
        sentence_index = VectorStoreIndex.from_documents(
            documents, service_context=sentence_context
        )

        sentence_index.storage_context.persist(persist_dir=persist_dir)

    else:
        # print('Loading index from disk')
        sentence_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=persist_dir),
            service_context=sentence_context)

    return sentence_index


@performance_metrics
def query_index(index, question):

    # Replace each retrieved node's text with the sentence window stored in its metadata
    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")

    sentence_window_engine = index.as_query_engine(
        similarity_top_k=3,  # retrieve the 3 most similar nodes
        node_postprocessors=[postproc])

    result = sentence_window_engine.query(question)
    return {
        'query': question,
        'response': result.response,
        'documents': [node.metadata['file_name'] for node in result.source_nodes],
        'pages': [node.metadata['page_label'] for node in result.source_nodes]
    }
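
The idea behind `SentenceWindowNodeParser` and `MetadataReplacementPostProcessor` is that a single sentence is matched at retrieval time, while the LLM reads that sentence plus its neighbours. The sketch below illustrates the idea in plain Python. It is a simplified stand-in, not LlamaIndex's implementation: the regex sentence splitter is naive, and `sentence_windows` and `replace_with_window` are hypothetical helper names.

```python
import re


def sentence_windows(text, window_size=3):
    """Split text into sentences and attach a window of neighbours to each one."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append({
            "text": sentence,  # what gets embedded and matched
            "metadata": {
                "original_text": sentence,
                "window": " ".join(sentences[lo:hi]),  # what the LLM will read
            },
        })
    return nodes


def replace_with_window(node, target_metadata_key="window"):
    """Mimic the metadata replacement step: swap the node text for its window."""
    return {**node, "text": node["metadata"][target_metadata_key]}


nodes = sentence_windows("A. B. C. D. E.", window_size=1)
print(replace_with_window(nodes[2])["text"])  # prints "B. C. D."
```

With `window_size=3`, as in `build_index`, each matched sentence is expanded to up to seven sentences before it reaches the LLM.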

## **LLM:** TinyLlama/TinyLlama-1.1B-Chat-v1.0 [$^{\textbf{[source]}}$](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)


In [None]:
sentence_index = build_index(
    files=["LLM_TRAIN/eBook-How-to-Build-a-Career-in-AI.pdf"],
    embed_model="local:BAAI/bge-large-en-v1.5",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    persist_dir="LLM_TRAIN/sentence_index",
)

Used RAM: 2.276 GB
Used GPU: 5623.0 MB
Consumed time 7.663482666015625 seconds


In [None]:
dict_response = query_index(sentence_index,
      "What are the keys to building a career in AI?")
dict_response

Used RAM: 0.047 GB
Used GPU: 254.0 MB
Consumed time 3.1760852336883545 seconds


{'query': 'What are the keys to building a career in AI?',
 'response': 'The keys to building a career in AI are learning foundational technical skills, working on projects, and finding a job. These steps stack on top of each other and are part of a broader process of gaining experience, building a portfolio, and finding a job. Chapters with focused topics on learning foundational technical skills include learning about AI concepts and methodologies, building a portfolio, and creating impact. The key to building a career in AI is to focus on learning foundational skills, working on projects, and finding a job, all of which are supported by being part of a community.',
 'documents': ['eBook-How-to-Build-a-Career-in-AI.pdf',
  'eBook-How-to-Build-a-Career-in-AI.pdf',
  'eBook-How-to-Build-a-Career-in-AI.pdf'],
 'pages': ['9', '6', '35']}

## **LLM:** zephyr-7b-alpha [$^{\textbf{[source]}}$](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) with 4-bit quantization


In [None]:
from transformers import BitsAndBytesConfig
import torch

# huggingface api token
# hf_token = 'hf_token'

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


sentence_index = build_index(
    files=["LLM_TRAIN/eBook-How-to-Build-a-Career-in-AI.pdf"],
    embed_model="local:BAAI/bge-large-en-v1.5",
    model="HuggingFaceH4/zephyr-7b-alpha",
    model_kwargs={"quantization_config": quantization_config},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95, "do_sample": True},
    persist_dir="LLM_TRAIN/sentence_index",
)

Used RAM: 3.159 GB
Used GPU: 5991.0 MB
Consumed time 13.527836561203003 seconds
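
To see why 4-bit quantization matters for a 7B model, a rough estimate of the memory needed for the weights alone can be sketched as follows. The parameter count of 7 billion is an assumption read off the "7b" in the model name, and `weight_memory_gb` is a hypothetical helper. The estimate ignores quantization constants, activations, the KV cache, and the embedding model that shares the GPU, so it is not directly comparable to the "Used GPU" figure above.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory needed to hold the weights alone, in GB (1024**3 bytes)."""
    return n_params * bits_per_param / 8 / (1024 ** 3)


N_PARAMS = 7e9  # assumption: "7b" taken as roughly 7 billion parameters

for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(N_PARAMS, bits):.2f} GB")
```

Under these assumptions the weights drop from roughly 13 GB in float16 to about 3.3 GB in 4-bit.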


In [None]:
dict_response = query_index(sentence_index,
      "What are the keys to building a career in AI?")
dict_response

Used RAM: 0.084 GB
Used GPU: 294.0 MB
Consumed time 7.929395437240601 seconds


{'query': 'What are the keys to building a career in AI?',
 'response': 'According to the given context information, the keys to building a career in AI are:\n\n1. Learning foundational technical skills (Chapter 1)\n2. Working on meaningful projects to deepen skills, build a portfolio, and create impact (Chapter 2)\n3. Finding a job (Chapter 3)\n\nIn addition to these three key steps, here are additional things to think about as you plot your path to success:\n\n- Working in teams for large projects\n\nThese keys to building a career in AI are outlined in Chapter 10, which is covered in the eBook.',
 'documents': ['eBook-How-to-Build-a-Career-in-AI.pdf',
  'eBook-How-to-Build-a-Career-in-AI.pdf',
  'eBook-How-to-Build-a-Career-in-AI.pdf'],
 'pages': ['9', '6', '35']}

In [None]:
query_index(
    sentence_index, "What are steps to take when finding projects to build your experience?")

Used RAM: -0.001 GB
Used GPU: 4.0 MB
Consumed time 11.548146963119507 seconds


{'query': 'What are steps to take when finding projects to build your experience?',
 'response': '1. Identify areas that interest you: Look for industries and sectors that align with your interests and passions.\n2. Research and gather information: Once you have identified potential areas, research to understand the current trends, challenges, and opportunities.\n3. Network and collaborate: Connect with people who have experience in those sectors and work collaboratively to gain insights and build relationships.\n4. Identify meaningful projects: Consider the technical complexity and business impact of potential projects and determine which ones could serve as meaningful stepping stones.\n5. Avoid analysis paralysis: Don\'t spend too much time deciding which project to work on. Choose one that interests you and move forward with it.\n6. Gain experience: Work on projects that will help you gain experience and build your portfolio.\n7. Learn from others: Collaborate with others and learn 

## **LLM:** Phi-1_5, a 1.3B model by Microsoft Research [$^{\textbf{[source]}}$](https://huggingface.co/microsoft/phi-1_5)

In [5]:
# Phi-2 does not fit in memory
# {Used RAM: 5.811 GB
# Used GPU: 10429.0 MB
# Consumed time 197. seconds}

sentence_index = build_index(
    files=["LLM_TRAIN/eBook-How-to-Build-a-Career-in-AI.pdf"],
    embed_model='local:BAAI/bge-large-en-v1.5',
    model="microsoft/phi-1_5",
    persist_dir="LLM_TRAIN/sentence_index",
)

Used RAM: 4.068 GB
Used GPU: 6819.0 MB
Consumed time 9.75664210319519 seconds


In [7]:
dict_response = query_index(sentence_index,
      "What are the keys to building a career in AI?")
dict_response

Used RAM: 0.007 GB
Used GPU: 0.0 MB
Consumed time 5.750103712081909 seconds


{'query': 'What are the keys to building a career in AI?',
 'response': 'Context information is below.\n---------------------\npage_label: 35\nfile_path: LLM_TRAIN/eBook-How-to-Build-a-Career-in-AI.pdf\n\nPAGE 35Keys to Building a Career in AI CHAPTER 11\nThe path to career success in AI is more complex than what I can  cover in one short eBook. \n Hopefully the previous chapters will give you momentum to move forward. \n Here are additional things to think about as you plot your path to success: \nWhen we tackle large projects, we succeed better by \nworking in teams than individually.\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: What are the keys to building a career in AI?\nAnswer: </s>\n<|assistant|>\nContext information is below.\n---------------------\npage_label: 35\nfile_path: LLM_TRAIN/eBook-How-to-Build-a-Career-in-AI.pdf\n\nPAGE 35Keys to Building a Career in AI CHAPTER 12\nThe path to career success in AI is more c
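
The response above contains `</s>\n<|assistant|>\n` followed by the context again, which is the tail of the `query_wrapper_prompt` hard-coded in `build_index`. A minimal sketch of how that template expands is given here, using plain `str.format` as a stand-in for `PromptTemplate` (an assumption made to keep the example dependency-free).

```python
# Same template string as the query_wrapper_prompt in build_index.
QUERY_WRAPPER = "<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"

prompt = QUERY_WRAPPER.format(
    query_str="What are the keys to building a career in AI?"
)
print(prompt)
```

These tags follow the chat format used by TinyLlama-Chat and Zephyr, and they are applied to every model passed to `build_index`. Whether Phi-1_5 would answer better with a different wrapper was not tested here.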

## **LLM:** TinyDolphin-2.8-1.1b [$^{\textbf{[source]}}$](https://huggingface.co/cognitivecomputations/TinyDolphin-2.8-1.1b)

In [4]:
sentence_index = build_index(
    files=["LLM_TRAIN/eBook-How-to-Build-a-Career-in-AI.pdf"],
    model="cognitivecomputations/TinyDolphin-2.8-1.1b",
    embed_model='local:BAAI/bge-large-en-v1.5',
    persist_dir="LLM_TRAIN/sentence_index",
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Used RAM: 2.215 GB
Used GPU: 5649.0 MB
Consumed time 7.947089195251465 seconds


In [5]:
dict_response = query_index(sentence_index,
      "What are the keys to building a career in AI?")
dict_response

Used RAM: 0.025 GB
Used GPU: 256.0 MB
Consumed time 6.055798053741455 seconds


{'query': 'What are the keys to building a career in AI?',
 'response': "As a learning machine, I am eager to help you build a career in AI. I'll provide you with the \nfoundational skills you need, along with valuable projects and resources to help you \naccelerate your journey. \n\nIn more detail, here are some key steps to building a career in AI:\n\n1. **Learning Foundational Skills**: The first step in building a career in AI is learning the principles and techniques of building artificial intelligence. This includes concepts such as data science, machine learning, and advanced computing.\n\n2. **Working on Projects**: Building a career in AI is all about creating impactful solutions that solve real-world problems. This involves working on projects that challenge your skills and drive your career forward.\n\n3. **Finding a Job**: Once you have the skills and projects you've developed, you'll need to find a job. There are many job opportunities in the field of artificial intelligen

## **Overview**

| LLM                      | Used RAM (build) | Used GPU (build) | Build time (seconds) | Notes                                   |
|--------------------------|------------------|------------------|----------------------|-----------------------------------------|
| TinyLlama-1.1B-Chat-v1.0 | 2.276 GB         | 5623.0 MB        | 7.66348              | Appropriate answers                     |
| zephyr-7b-alpha (4-bit)  | 3.159 GB         | 5991.0 MB        | 13.5278              | Appropriate answers                     |
| Phi-2 (2.7B)             | 5.811 GB         | 10429.0 MB       | N/A                  | No memory left for predictions          |
| Phi-1_5 (1.3B)           | 4.068 GB         | 6819.0 MB        | 9.75664              | Low-quality answers; echoes the context |
| TinyDolphin-2.8-1.1b     | 2.215 GB         | 5649.0 MB        | 7.94708              | Appropriate answers                     |
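
A table like this can also be generated from the recorded measurements instead of being typed by hand. The sketch below is one way to do it, with the build-step figures copied from the runs in this notebook (times rounded to three decimals). The helper `to_markdown` and the column names are hypothetical.

```python
def to_markdown(rows, columns):
    """Render a list of dicts as a Markdown table with the given column order."""
    header = "| " + " | ".join(columns) + " |"
    rule = "|" + "|".join("---" for _ in columns) + "|"
    body = ["| " + " | ".join(str(row[col]) for col in columns) + " |" for row in rows]
    return "\n".join([header, rule] + body)


# Build-step measurements copied from the runs in this notebook.
results = [
    {"LLM": "TinyLlama-1.1B-Chat-v1.0", "RAM (GB)": 2.276, "GPU (MB)": 5623.0, "Time (s)": 7.663},
    {"LLM": "zephyr-7b-alpha (4-bit)", "RAM (GB)": 3.159, "GPU (MB)": 5991.0, "Time (s)": 13.528},
    {"LLM": "Phi-1_5", "RAM (GB)": 4.068, "GPU (MB)": 6819.0, "Time (s)": 9.757},
    {"LLM": "TinyDolphin-2.8-1.1b", "RAM (GB)": 2.215, "GPU (MB)": 5649.0, "Time (s)": 7.947},
]

columns = ["LLM", "RAM (GB)", "GPU (MB)", "Time (s)"]
ranked = sorted(results, key=lambda row: row["GPU (MB)"])  # lowest GPU usage first
table = to_markdown(ranked, columns)
print(table)
```

Sorting by GPU usage puts the two 1.1B models first, which matches the ordering visible in the table above.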
