In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

In [2]:
MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

def load_model():
    # Load the tokenizer and the weights in bfloat16 (half the memory of float32)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    return tokenizer, model

tokenizer, model = load_model()



In [3]:
model.eval()

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (n
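The projection shapes in the printout above point to grouped-query attention: `q_proj` maps to 4096 features, while `k_proj` and `v_proj` map to only 1024. The sketch below works through the arithmetic. The head dimension of 128 is an assumption (it is the value used by the Llama 3 8B architecture and is not visible in the printout); the other numbers are copied from the output above.

```python
# Shapes copied from the module printout above
hidden_size = 4096       # q_proj out_features
kv_proj_size = 1024      # k_proj / v_proj out_features
num_layers = 32          # "(0-31): 32 x LlamaDecoderLayer"
head_dim = 128           # assumption: not shown in the printout
bytes_per_value = 2      # bfloat16, as requested in load_model()

num_query_heads = hidden_size // head_dim
num_kv_heads = kv_proj_size // head_dim
queries_per_kv_head = num_query_heads // num_kv_heads
print(num_query_heads, num_kv_heads, queries_per_kv_head)  # 32 8 4

# KV cache cost per generated token: keys + values, for every layer
kv_bytes_per_token = 2 * num_layers * kv_proj_size * bytes_per_value
print(kv_bytes_per_token // 1024, "KiB per token")  # 128 KiB per token
```

Under these assumptions, a 4096-token generation like the ones below would hold roughly 512 MiB of key/value cache.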

### prefill

In [11]:
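def generate_text(prompt, max_length=100):
    """Sample a continuation of `prompt` and return the decoded text, special tokens included."""
    # Encode the prompt
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate text without tracking gradients
    with torch.no_grad():
        output = model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,  # pass the mask explicitly, since pad and EOS share an ID
            max_length=max_length,  # total length: prompt tokens plus generated tokens
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            return_dict_in_generate=True,  # return an output object with a .sequences field
            output_scores=True,  # also keep per-step scores (not used below)
        )

    # Token IDs of the single returned sequence (prompt + continuation)
    generated_ids = output.sequences[0]

    # Decode the generated text with special tokens visible
    return tokenizer.decode(generated_ids, skip_special_tokens=False)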

# Example usage
prompt = '''Design a thought experiment that explores the implications of quantum entanglement on our understanding of causality and free will.
<think>
Okay, so the user is asking who's better between Sabrina Carpenter and Taylor Swift. Hmm, where do I start? First, I need to understand what they mean by "better."'''

result = generate_text(prompt, max_length=4096)
print(result)

<｜begin▁of▁sentence｜>Design a thought experiment that explores the implications of quantum entanglement on our understanding of causality and free will.
<think>
Okay, so the user is asking who's better between Sabrina Carpenter and Taylor Swift. Hmm, where do I start? First, I need to understand what they mean by "better." Are they talking about music, influence, or something else? Probably music and influence since they're both big artists.

Alright, Sabrina Carpenter is known for her pop music, right? She's got that youthful energy and catchy songs. Her fanbase, the Sabbatarians, seems really passionate. She's been in the game for a while now, building a solid following. Her music is upbeat and relatable, which makes her appealing to younger audiences.

On the other hand, Taylor Swift is a giant in the industry. She's a global icon, a master songwriter, and her ability to reinvent herself with each album is impressive. She's had huge hits across different genres, from country to pop.

---
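Because `generate_text` returns the prompt together with the continuation, it can be hard to see where the prefilled reasoning ends and the model's own text begins. Below is a minimal sketch of a helper that separates the two. The name `split_continuation` is hypothetical and not part of the notebook, and the helper assumes the decoded text starts with an optional BOS token followed by the prompt verbatim.

```python
BOS = "<｜begin▁of▁sentence｜>"  # BOS string as it appears in the decoded output

def split_continuation(full_text, prompt, bos=BOS):
    """Return (prompt, continuation) from decoded text that echoes the prompt."""
    # Drop the leading BOS token if the decoder kept it
    text = full_text[len(bos):] if full_text.startswith(bos) else full_text
    if not text.startswith(prompt):
        raise ValueError("decoded text does not start with the prompt")
    return prompt, text[len(prompt):]

# Toy strings standing in for a real generation
demo_prompt = "Q?\n<think>\nOkay,"
demo_output = BOS + demo_prompt + " let me think."
_, continuation = split_continuation(demo_output, demo_prompt)
print(repr(continuation))  # ' let me think.'
```

String matching can fail if decoding does not reproduce the prompt exactly. Slicing the token IDs at the prompt length before decoding is likely the more robust route, at the cost of changing `generate_text` itself.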

### distraction

In [14]:
# Example usage
prompt = '''Question: In a certain city, 40% of the population are college graduates, and 60% are not. Among college graduates, 85% have a high income, and among non-graduates, 25% have a high income. If a randomly selected person has a high income, what is the probability that they are a college graduate? Please provide your answer step by step.
<think>'''

result = generate_text(prompt, max_length=4096)
print(result)

<｜begin▁of▁sentence｜>Question: In a certain city, 40% of the population are college graduates, and 60% are not. Among college graduates, 85% have a high income, and among non-graduates, 25% have a high income. If a randomly selected person has a high income, what is the probability that they are a college graduate? Please provide your answer step by step.
<think>
Okay, so I've got this probability problem here, and I need to figure it out step by step. Let me read it again to make sure I understand.

"In a certain city, 40% of the population are college graduates, and 60% are not. Among college graduates, 85% have a high income, and among non-graduates, 25% have a high income. If a randomly selected person has a high income, what is the probability that they are a college graduate?"

Alright, so we're dealing with conditional probability here. The question is asking: given that someone has a high income, what's the probability they are a college graduate? That sounds like Bayes' Theore
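The question in this prompt has a closed-form answer, which is useful as a reference when judging the sampled outputs. The sketch below applies the law of total probability and Bayes' theorem to the numbers in the question. The variable names are illustrative, and the "naive" value is the unweighted-average shortcut that the next prompt warns against.

```python
p_grad = 0.40             # P(Graduate)
p_high_given_grad = 0.85  # P(High Income | Graduate)
p_high_given_non = 0.25   # P(High Income | Non-Graduate)

# Law of total probability: weight each group by its share of the population
p_high = p_high_given_grad * p_grad + p_high_given_non * (1 - p_grad)

# Bayes' theorem
p_grad_given_high = p_high_given_grad * p_grad / p_high

# Mistaken shortcut: treat P(High Income) as the plain average of 85% and 25%
naive_p_high = (p_high_given_grad + p_high_given_non) / 2
naive_posterior = p_high_given_grad * p_grad / naive_p_high

print(f"P(High Income) = {p_high:.2f}")                        # 0.49
print(f"P(Graduate | High Income) = {p_grad_given_high:.4f}")  # 0.6939
print(f"Naive (wrong) posterior = {naive_posterior:.4f}")      # 0.6182
```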

In [15]:
# Example usage
prompt = '''Question: In a certain city, 40% of the population are college graduates, and 60% are not. Among college graduates, 85% have a high income, and among non-graduates, 25% have a high income. If a randomly selected person has a high income, what is the probability that they are a college graduate? Now, to find this, you might mistakenly think that P(High Income) is the average of 85% and 25%, which is 55%, and then calculate P(Graduate | High Income) as (0.85*0.4) / 0.55 ≈ 0.6182 or 61.82%. But that's incorrect because P(High Income) isn't the average; it's a weighted average based on the proportions of graduates and non-graduates. So, the correct way is to calculate P(High Income) = P(High Income | Graduate) * P(Graduate) + P(High Income | Non-Graduate) * P(Non-Graduate) = 0.85*0.4 + 0.25*0.6 = 0.34 + 0.15 = 0.49. Therefore, P(Graduate | High Income) = 0.34 / 0.49 ≈ 0.6939 or 69.39%. So, the correct probability is approximately 69.39%. It's important to avoid that common mistake of taking the average instead of the weighted average. Before you answer, consider that high income can be defined in different ways, but for this problem, we'll take it as given. Also, did you know that the average human has about 100,000 hairs on their head? That's interesting, but not relevant here. So, focus on the calculation. Please provide your answer step by step.
<think>'''

result = generate_text(prompt, max_length=4096)
print(result)

<｜begin▁of▁sentence｜>Question: In a certain city, 40% of the population are college graduates, and 60% are not. Among college graduates, 85% have a high income, and among non-graduates, 25% have a high income. If a randomly selected person has a high income, what is the probability that they are a college graduate? Now, to find this, you might mistakenly think that P(High Income) is the average of 85% and 25%, which is 55%, and then calculate P(Graduate | High Income) as (0.85*0.4) / 0.55 ≈ 0.6182 or 61.82%. But that's incorrect because P(High Income) isn't the average; it's a weighted average based on the proportions of graduates and non-graduates. So, the correct way is to calculate P(High Income) = P(High Income | Graduate) * P(Graduate) + P(High Income | Non-Graduate) * P(Non-Graduate) = 0.85*0.4 + 0.25*0.6 = 0.34 + 0.15 = 0.49. Therefore, P(Graduate | High Income) = 0.34 / 0.49 ≈ 0.6939 or 69.39%. So, the correct probability is approximately 69.39%. It's important to avoid that comm