In [4]:
import os
os.getcwd()

'/home/preet/NLE'

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer, set_seed
import torch
import numpy as np
import random

In [3]:
# Set seeds for reproducibility (transformers' set_seed also seeds random, NumPy and PyTorch)
random_seed = 42
np_seed = 42
torch_seed = 42
transformers_seed = 42
random.seed(random_seed)
np.random.seed(np_seed)
torch.manual_seed(torch_seed)
set_seed(transformers_seed)
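
As a small illustration of why the seeds above matter, the sketch below uses only the standard library. The helper name `draw` is hypothetical and not part of the notebook. The same seed gives the same sequence of draws, and a different seed gives a different one. Sampling on a GPU may still vary between runs or hardware even with fixed seeds, so treat this as the idea rather than a guarantee.

```python
import random


def draw(seed, n=5):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.random() for _ in range(n)]


same_a = draw(42)
same_b = draw(42)
other = draw(43)

print(same_a == same_b)  # identical seed, identical sequence
print(same_a == other)   # different seed, different sequence
```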

In [4]:
# Load model with 8-bit quantization (the 4-bit options are left commented out)
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    # bnb_4bit_use_double_quant=True,
    # bnb_4bit_quant_type="nf4",
    # bnb_4bit_compute_dtype=torch.bfloat16,
    # llm_int8_enable_fp32_cpu_offload=True
)

In [5]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
device = "cuda:0" if torch.cuda.is_available() else "cpu"

In [6]:
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    quantization_config=quantization_config,
    device_map=device
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

#### Loading the model increased GPU memory usage from 42 MiB to 9115 MiB

In [7]:
print(model.get_memory_footprint()/1024**2)

8660.508056640625
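
A rough sanity check on that number: the sketch below estimates weight storage from a parameter count and a bit width. The count of about 8.03 billion parameters is an assumption for illustration, not something read from the model, and the helper `estimate_weight_mib` is hypothetical. The 8-bit estimate comes out somewhat below the reported footprint, which would be consistent with some layers being kept in higher precision, though that is a guess rather than a measurement.

```python
def estimate_weight_mib(num_params, bits_per_param):
    """Approximate weight storage in MiB for a given parameter count and bit width."""
    return num_params * bits_per_param / 8 / 1024**2


ASSUMED_PARAMS = 8.03e9  # assumption: approximate parameter count of an 8B Llama-style model

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {estimate_weight_mib(ASSUMED_PARAMS, bits):,.0f} MiB")
```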


In [8]:
# Choose the input prompt (alternatives are commented out)
# input_text = "Hey, are you conscious? Can you talk to me?"
input_text = "I have a 1- and a 2-liter jug. I want to measure exactly 3 liters."
# input_text = "I have a 7 litre bucket that is missing a bottom, and the top was welded and sealed shut. How much water can I hold in it?"

In [23]:
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

In [10]:
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

In [11]:
%time output_dict = model.generate(input_ids, max_new_tokens=100000, do_sample=True, pad_token_id=tokenizer.eos_token_id, streamer=streamer, return_dict_in_generate=True, output_scores=True)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


 Hmm, okay, so I know that with a 1-liter jug and a 2-liter jug, I can measure various amounts, but how do I get exactly 3 liters? Let me think through this step by step.

First, I should probably figure out the maximum amount I can measure with these jugs. Since the 2-liter jug is the larger one, that's a good starting point. If I fill the 2-liter jug completely, that gives me 2 liters. Then, if I have a 1-liter jug, I can pour from the 2-liter jug into the 1-liter jug until it's full. That would leave me with 1 liter in the 2-liter jug. So, now I have 1 liter in the 2-liter jug and 1 liter in the 1-liter jug.

But wait, that only gives me 2 liters total, not 3. Hmm, maybe I need to do this differently. What if I start by filling the 1-liter jug and then pour it into the 2-liter jug? That way, the 1-liter jug is empty, and the 2-liter jug has 1 liter. Then, if I fill the 1-liter jug again, pour it into the 2-liter jug, which now has 2 liters. Then, the 1-liter jug is full again. But t
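
The attention-mask warning printed at the start of this output can likely be avoided by passing the mask that the tokenizer already returns. The following is a minimal sketch, assuming a Hugging Face style `model` and `tokenizer` like the ones loaded earlier. The wrapper name `generate_with_attention_mask` is hypothetical.

```python
def generate_with_attention_mask(model, tokenizer, text, **generate_kwargs):
    """Tokenize `text` and pass both input_ids and attention_mask to generate()."""
    # Calling the tokenizer returns input_ids and attention_mask together.
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    return model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,
        **generate_kwargs,
    )
```

With the objects from this notebook, a call might look like `generate_with_attention_mask(model, tokenizer, input_text, max_new_tokens=512, do_sample=True)`, where the token limit is only an example value.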

In [12]:
final_dict = {"sample": output_dict}

In [14]:
del output_dict  # keep input_ids: the later generate() calls reuse it

In [15]:
import gc

In [21]:
gc.collect()

0

In [22]:
torch.cuda.empty_cache()
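
One possible reason this cleanup frees less memory than expected is that `final_dict` still holds a reference to the generation output, so `del` only removes a name. The sketch below shows the effect with plain Python objects. `Payload` and `store` are hypothetical stand-ins for the generation output and `final_dict`.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large generation output (hypothetical)."""


payload = Payload()
store = {"sample": payload}   # second reference, like final_dict
probe = weakref.ref(payload)  # lets us check whether the object still exists

del payload                   # removes the name only
gc.collect()
alive_after_del = probe() is not None

store.clear()                 # drops the last strong reference
gc.collect()
alive_after_clear = probe() is not None

print(alive_after_del, alive_after_clear)
```

If the stored scores are only needed for later analysis, moving them to the CPU before storing them is one option for releasing GPU memory.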

In [25]:
%time output_dict_temp = model.generate(input_ids, max_new_tokens=100000, do_sample=True, temperature=0.6, pad_token_id=tokenizer.eos_token_id, streamer=streamer, return_dict_in_generate=True, output_scores=True)

 How can I do that?

Okay, so I have these two jugs: one is 1 liter and the other is 2 liters. I need to figure out how to measure exactly 3 liters using them. Hmm, I'm not sure how to approach this because 3 liters isn't directly achievable with just 1 and 2-liter jugs. Let me think through this step by step.

First, let me visualize the jugs. I have a 1-liter jug and a 2-liter jug. The total capacity is 3 liters, but I don't have a 3-liter jug. So, I need to use the two I have to get exactly 3 liters somewhere. Maybe I can pour water between them to measure it out.

Let me start by filling the 2-liter jug completely. So, I pour water into the 2-liter jug until it's full. That should take 2 liters of water. Now, I have the 2-liter jug full and the 1-liter jug empty.

Next, I can pour water from the 2-liter jug into the 1-liter jug. Since the 1-liter jug can only hold 1 liter, I'll pour 1 liter from the 2-liter jug into it. Now, the 2-liter jug has 1 liter left, and the 1-liter jug is 

In [26]:
final_dict['temp:0.6'] = output_dict_temp
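
To make the effect of `temperature = 0.6` concrete, here is a minimal pure-Python sketch of temperature scaling. The logits are made-up values and `softmax_with_temperature` is a hypothetical helper, not the library's internal code. Dividing logits by a temperature below 1 sharpens the distribution, so the most likely token gains probability.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up example values
p_default = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.6)

print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_sharp])
```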

In [27]:
%time output_dict_temp_k = model.generate(input_ids, max_new_tokens=100000, do_sample=True, temperature=0.6, top_k=50, pad_token_id=tokenizer.eos_token_id, streamer=streamer, return_dict_in_generate=True, output_scores=True)

 How can I do that?

I have a 1-liter jug and a 2-liter jug. I want to measure exactly 3 liters. Hmm, how can I do that?

Alright, so I have these two jugs: one is 1 liter, and the other is 2 liters. I need to measure exactly 3 liters. Let me think about how to approach this.

First, I know that with just a 1-liter and a 2-liter jug, I can measure out certain amounts, but 3 liters isn't something I can directly get by just filling one or the other. Maybe I can use a process where I fill the larger jug, pour into the smaller one, and so on.

Let me start by visualizing the jugs. The 2-liter jug is bigger, so I can fill it up first. If I pour water from the 2-liter jug into the 1-liter jug, I'll have 1 liter in the smaller jug and 1 liter remaining in the 2-liter jug. That leaves me with 1 liter in both jugs, but that doesn't help me get to 3 liters.

Wait, maybe I should consider filling the smaller jug multiple times. So if I fill the 1-liter jug and pour it into the 2-liter jug, the 2

In [28]:
final_dict["temp:0.6,top_k:50"] = output_dict_temp_k  # this run used temperature 0.6 and top_k 50
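
The `top_k=50` argument restricts sampling to the highest-scoring tokens. The sketch below illustrates the idea on a made-up list of logits with `k=2`. `top_k_filter` and `softmax` are hypothetical helpers written for this example, not the library's implementation.

```python
import math


def top_k_filter(logits, k):
    """Keep the k largest logits and set the rest to -inf (ties at the cutoff are kept)."""
    if k <= 0:
        raise ValueError("k must be positive")
    k = min(k, len(logits))
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


def softmax(logits):
    """Convert logits to probabilities; -inf entries get probability 0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up example values
filtered = top_k_filter(logits, k=2)
probs = softmax(filtered)

print(filtered)
print([round(p, 3) for p in probs])
```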

In [29]:
import pickle

In [30]:
with open("final_dict.pkl", "wb") as f:
    pickle.dump(final_dict, f)
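
To read the results back later, the file can be opened in binary mode and passed to `pickle.load`. The round trip below is a minimal sketch that uses a placeholder dictionary and a temporary directory so it runs without the model. Loading the real `final_dict.pkl` would additionally need `torch` and `transformers` to be importable, and tensors saved from the GPU may need a CUDA device to be restored. Only unpickle files from a trusted source.

```python
import os
import pickle
import tempfile

# Placeholder data standing in for the real generation outputs.
results = {"sample": [1, 2, 3], "temp:0.6": [4, 5]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "final_dict.pkl")
    with open(path, "wb") as f:
        pickle.dump(results, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == results)
```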