Source: https://medium.com/@ronantech/cache-augmented-generation-cag-in-llms-a-step-by-step-tutorial-6ac35d415eec

Source: https://colab.research.google.com/drive/1-0eKIu6cGAZ47ROKQaF6EU-mHtvJBILV?usp=sharing

Source: https://arxiv.org/pdf/2412.15605v1

With **T4 GPU**

In [1]:
from IPython.display import Markdown
from datetime import datetime
import torch, transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.cache_utils import DynamicCache
import os

In [2]:
print(f"torch version: {torch.__version__}")
print(f"transformers version: {transformers.__version__}")

torch version: 2.5.1+cu121
transformers version: 4.47.1


In [12]:
def generate(model, input_ids: torch.Tensor, past_key_values, max_new_tokens: int = 512) -> torch.Tensor:
    device = model.model.embed_tokens.weight.device
    origin_len = input_ids.shape[-1]
    input_ids = input_ids.to(device)
    output_ids = input_ids.clone()
    next_token = input_ids

    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = model(
                input_ids=next_token,
                past_key_values=past_key_values,
                use_cache=True
            )
            logits = out.logits[:, -1, :]
            token = torch.argmax(logits, dim=-1, keepdim=True)
            output_ids = torch.cat([output_ids, token], dim=-1)
            past_key_values = out.past_key_values
            next_token = token.to(device)

            if model.config.eos_token_id is not None and token.item() == model.config.eos_token_id:
                break
    return output_ids[:, origin_len:]

In [4]:
torch.serialization.add_safe_globals([DynamicCache])
torch.serialization.add_safe_globals([set])

def get_kv_cache(model, tokenizer, prompt: str) -> DynamicCache:
    device = model.model.embed_tokens.weight.device
    input_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(device)
    cache = DynamicCache()

    with torch.no_grad():
        _ = model(
            input_ids=input_ids,
            past_key_values=cache,
            use_cache=True
        )
    return cache

def clean_up(cache: DynamicCache, origin_len: int):
    for i in range(len(cache.key_cache)):
        cache.key_cache[i] = cache.key_cache[i][:, :, :origin_len, :]
        cache.value_cache[i] = cache.value_cache[i][:, :, :origin_len, :]

In [5]:
from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN')

In [6]:
# model_name = "mistralai/Mistral-7B-Instruct-v0.1"
# model_name = "openai-community/gpt2"
model_name = 'meta-llama/Llama-3.2-1B-Instruct'
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HF_TOKEN, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
    trust_remote_code=True,
    token=HF_TOKEN
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
print(f"Loaded {model_name}.")

tokenizer_config.json:   0%|          | 0.00/54.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/877 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.47G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/189 [00:00<?, ?B/s]

Loaded meta-llama/Llama-3.2-1B-Instruct.


In [7]:
with open("/content/Licence to Think - Abstract.txt", "r", encoding="utf-8") as f:
    doc_text = f.read()

# # System Prompt - Option 1
# system_prompt = f"""
# <|system|>
# You are an assistant who provides concise factual answers.
# <|user|>
# Context:
# {doc_text}
# Question:
# """.strip()

# System Prompt - Option 2
instruction="You are an assistant who provides concise factual answers."
system_prompt = f"""<|begin_of_text|><|start_header_id|>system<end_header_id>{instruction}<|eot_id|><|start_header_id|>user<|end_header_id|>{doc_text}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

start_time=datetime.now()
ronan_cache = get_kv_cache(model, tokenizer, system_prompt)
end_time=datetime.now()
print(f"KV cache built in {end_time-start_time}")
origin_len = ronan_cache.key_cache[0].shape[-2]
print("KV cache built.")

KV cache built in 0:00:01.008072
KV cache built.


In [17]:
start_time=datetime.now()
user="Give complete details about author of this document. Share your response in structured format."
system="Always return a complete response"
clean_up(ronan_cache, origin_len)
formatted_chat=f"""<|begin_of_text|><|start_header_id|>system<end_header_id>{system}<|eot_id|><|start_header_id|>user<|end_header_id|>{user}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
input_ids_q1=tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False).input_ids.to(device)
gen_ids_q1=generate(model, input_ids_q1, ronan_cache, max_new_tokens=512)
answer1=tokenizer.decode(gen_ids_q1[0], skip_special_tokens=True)
end_time=datetime.now()
print(f"Generation Time: {end_time-start_time}\n")
# print("Q1:", question1)
# print("A1:", answer1)
Markdown(answer1.split("assistant")[0]) # .split("assistant")[0]

Generation Time: 0:00:11.287203





Here is the information about the author of the document:

**Author Name:** Dipankar Porey

**Title:** Licence to Think

**Email Address:** dipankarporey1996@gmail.com

**Phone Number:** +91-8972834354

**LinkedIn Profile:** LinkedIn Profile: linkedin.com/in/dipankarporey1996

**GitHub Profile:** GitHub Profile: github.com/dipankarporey1996

**IIT Guwahati:** IIT Guwahati is a premier institution for engineering and technology education in India. It is a public research university located in Guwahati, Assam, India.

In [14]:
start_time=datetime.now()
question1="summarize the document. Share your response in structured format."
instruction="Always return a complete response"
clean_up(ronan_cache, origin_len)
formatted_chat = f"""<|begin_of_text|><|start_header_id|>system<end_header_id>{instruction}<|eot_id|><|start_header_id|>user<|end_header_id|>{question1}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
input_ids_q1 = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False).input_ids.to(device)
gen_ids_q1 = generate(model, input_ids_q1, ronan_cache, max_new_tokens=512)
answer1 = tokenizer.decode(gen_ids_q1[0], skip_special_tokens=True)
end_time=datetime.now()
print(f"Generation Time: {end_time-start_time}\n")
# print("Q1:", question1)
# print("A1:", answer1)
Markdown(answer1.split("assistant")[0])

Generation Time: 0:00:10.549356





**Licence to Think: A Novel Approach to Conditional Generative Adversarial Networks (GANs)**

**Abstract**

The document discusses the challenges of training Generative Adversarial Networks (GANs) and proposes a novel approach to address these challenges. GANs are a class of models that can generate high-quality images, audio, and data without explicit guidance. However, training GANs can be difficult due to stagnation and instability. The authors introduce a new concept that enables autonomy in the model, allowing it to independently orchestrate its thought processes.

**Key Points**

1. **Challenges in GAN Training**: GANs face challenges such as stagnation, lack of diversity, and instability due to the intricacies of network design, goal formulation, and training methodologies.
2. **Backpropagation**: The authors question the underlying process of backpropagation, which is a fundamental concept in deep learning, and how it relates to the GAN framework.
3. **Autonomy in GANs**: The authors propose a novel approach to autonomy in GANs, which enables the model to independently identify patterns and assign responsibilities based on specific conditions or classes.

**Approach**

The authors introduce two approaches to achieve autonomy in GANs:

1. **Single Arbitrary Controller**: Introducing a single arbitrary controller into both the discriminator and generator.
2. **List of Arbitrary Controllers**: Introducing a list of arbitrary controllers into both the discriminator and generator.

**Importance of Conditions**

The authors highlight the importance of conditions in the context of Conditional Generative Adversarial Networks (CGANs). They argue that conditions can aid in capturing internal patterns within the generator and discriminator, leading to robust and independent learning patterns.

**Dynamic Conditions**

The authors propose a dynamic approach, where conditions are introduced as randomness and variation in external conditions, encouraging independence in learning patterns. This approach aims to deepen the relationship between the discriminator and generator, fostering robust and independent learning patterns.

**Conclusion**

The authors conclude that their approach can provide a novel solution to the challenges faced by GANs, enabling autonomy in the model and promoting robust and independent learning patterns.

In [15]:
start_time=datetime.now()
question1="Tell about the new approach. Share your response in structured format."
instruction="Always return a complete response"
clean_up(ronan_cache, origin_len)
formatted_chat=f"""<|begin_of_text|><|start_header_id|>system<end_header_id>{instruction}<|eot_id|><|start_header_id|>user<|end_header_id|>{question1}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
input_ids_q1=tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False).input_ids.to(device)
gen_ids_q1=generate(model, input_ids_q1, ronan_cache, max_new_tokens=1024)
answer1=tokenizer.decode(gen_ids_q1[0], skip_special_tokens=True)
end_time=datetime.now()
print(f"Generation Time: {end_time-start_time}\n")
# print("Q1:", question1)
# print("A1:", answer1)
Markdown(answer1.split("assistant")[0])

Generation Time: 0:00:21.793402





**New Approach: Conditional Generative Adversarial Networks (CGANs) with Autonomous Control**

**Introduction**

The traditional Generative Adversarial Networks (GANs) have garnered significant attention in recent years due to their ability to generate high-quality images, audio, and data. However, one of the significant challenges in training GANs is the lack of diversity in generated outputs, leading to stagnation and instability. To address this issue, a novel approach is proposed, which introduces an autonomous control mechanism into the GAN framework.

**Approach Overview**

The proposed approach, called Conditional Generative Adversarial Networks (CGANs), involves introducing a single arbitrary controller into both the discriminator and generator. This controller is designed to determine the appropriate responsibilities for distinct class labels, retain the memory of these specific duties, and govern the arbitrary conditions.

**Key Components**

1. **Controller**: A single arbitrary controller is introduced into both the discriminator and generator. This controller is responsible for determining the responsibilities for distinct class labels and governing the arbitrary conditions.
2. **Discriminator**: The discriminator is modified to accept the arbitrary controller as an additional input, which is used to determine the responsibilities for distinct class labels.
3. **Generator**: The generator is modified to retain the memory of the arbitrary controller and use it to determine the responsibilities for distinct class labels.
4. **Training Process**: The training process involves the following steps:
	* The discriminator is trained to predict the class labels for a given input image.
	* The arbitrary controller is used to determine the responsibilities for distinct class labels.
	* The generator is trained to generate images that meet the responsibilities determined by the controller.

**How it Works**

1. **Controller Determination**: The controller is used to determine the responsibilities for distinct class labels. For example, if the controller determines that the image belongs to class A, the generator is trained to generate images that meet the responsibilities for class A.
2. **Generator Retention**: The generator retains the memory of the arbitrary controller and uses it to determine the responsibilities for distinct class labels. This allows the generator to learn the responsibilities for each class label.
3. **Training Process**: The discriminator is trained to predict the class labels for a given input image, and the arbitrary controller is used to determine the responsibilities for distinct class labels. The generator is then trained to generate images that meet the responsibilities determined by the controller.

**Advantages**

1. **Improved Diversity**: The autonomous control mechanism introduces diversity in generated outputs, leading to improved diversity in the training data.
2. **Robustness**: The generator learns to think autonomously, leading to robust and independent learning patterns for both the discriminator and generator.
3. **Flexibility**: The arbitrary controller allows for flexibility in determining the responsibilities for distinct class labels, enabling the model to adapt to new data and tasks.

**Conclusion**

The proposed approach introduces an autonomous control mechanism into the GAN framework, enabling the model to think autonomously and generate images that meet specific responsibilities. This approach has the potential to improve the diversity and robustness of generated outputs, making it a promising direction for future research in GANs.

In [16]:
Markdown(answer1)



**New Approach: Conditional Generative Adversarial Networks (CGANs) with Autonomous Control**

**Introduction**

The traditional Generative Adversarial Networks (GANs) have garnered significant attention in recent years due to their ability to generate high-quality images, audio, and data. However, one of the significant challenges in training GANs is the lack of diversity in generated outputs, leading to stagnation and instability. To address this issue, a novel approach is proposed, which introduces an autonomous control mechanism into the GAN framework.

**Approach Overview**

The proposed approach, called Conditional Generative Adversarial Networks (CGANs), involves introducing a single arbitrary controller into both the discriminator and generator. This controller is designed to determine the appropriate responsibilities for distinct class labels, retain the memory of these specific duties, and govern the arbitrary conditions.

**Key Components**

1. **Controller**: A single arbitrary controller is introduced into both the discriminator and generator. This controller is responsible for determining the responsibilities for distinct class labels and governing the arbitrary conditions.
2. **Discriminator**: The discriminator is modified to accept the arbitrary controller as an additional input, which is used to determine the responsibilities for distinct class labels.
3. **Generator**: The generator is modified to retain the memory of the arbitrary controller and use it to determine the responsibilities for distinct class labels.
4. **Training Process**: The training process involves the following steps:
	* The discriminator is trained to predict the class labels for a given input image.
	* The arbitrary controller is used to determine the responsibilities for distinct class labels.
	* The generator is trained to generate images that meet the responsibilities determined by the controller.

**How it Works**

1. **Controller Determination**: The controller is used to determine the responsibilities for distinct class labels. For example, if the controller determines that the image belongs to class A, the generator is trained to generate images that meet the responsibilities for class A.
2. **Generator Retention**: The generator retains the memory of the arbitrary controller and uses it to determine the responsibilities for distinct class labels. This allows the generator to learn the responsibilities for each class label.
3. **Training Process**: The discriminator is trained to predict the class labels for a given input image, and the arbitrary controller is used to determine the responsibilities for distinct class labels. The generator is then trained to generate images that meet the responsibilities determined by the controller.

**Advantages**

1. **Improved Diversity**: The autonomous control mechanism introduces diversity in generated outputs, leading to improved diversity in the training data.
2. **Robustness**: The generator learns to think autonomously, leading to robust and independent learning patterns for both the discriminator and generator.
3. **Flexibility**: The arbitrary controller allows for flexibility in determining the responsibilities for distinct class labels, enabling the model to adapt to new data and tasks.

**Conclusion**

The proposed approach introduces an autonomous control mechanism into the GAN framework, enabling the model to think autonomously and generate images that meet specific responsibilities. This approach has the potential to improve the diversity and robustness of generated outputs, making it a promising direction for future research in GANs.assistant

**Key Points**

1. **Autonomous Control Mechanism**: A single arbitrary controller is introduced into both the discriminator and generator to determine the responsibilities for distinct class labels.
2. **Controller Determines Responsibilities**: The controller is used to determine the responsibilities for distinct class labels, allowing the generator to learn the responsibilities for each class label.
3. **Generator Retains Memory**: The generator retains the memory of the arbitrary controller and uses it to determine the responsibilities for distinct class labels.
4. **Training Process**: The discriminator is trained to predict class labels, and the arbitrary controller is used to determine the responsibilities for distinct class labels. The generator is then trained to generate images that meet the responsibilities determined by the controller.
5. **Advantages**: Improved diversity, robustness, and flexibility in generating images that meet specific responsibilities.
6. **Potential Applications**: Improving the diversity and robustness of generated outputs in GANs, making it a promising direction for future research.

**Technical Details**

1. **Controller Architecture**: The controller is designed to accept the arbitrary controller as an additional input and determine the responsibilities for distinct class labels.
2. **Discriminator Architecture**: The discriminator is modified to accept the arbitrary controller as an additional input and predict class labels for a given input image.
3. **Generator Architecture**: The generator is modified to retain the memory of the arbitrary controller and use it to determine the responsibilities for distinct class labels.
4. **Training Process**: The discriminator is trained to predict class labels, and the arbitrary controller is used to determine the responsibilities for distinct class labels. The generator is then trained to generate images that meet the responsibilities determined by the controller.assistant

**Technical Specifications**

1. **Controller Architecture**:
	* Input: Arbitrary controller
	* Output: Determined responsibilities for distinct class labels
2. **Discriminator Architecture**:
	* Input: Arbitrary