<a href="https://colab.research.google.com/github/Maximo-Rulli/dynamic-steps-dlm/blob/main/blocks-entropy-test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analyzing the entropy of diffusion blocks in DLMs

### Essential imports

In [1]:
# Essential imports
import importlib

import torch
from transformers import AutoTokenizer

# Repository's functions
from MMaDA.models import MMadaModelLM
import MMaDA.generate as gen



### Tokenizer and model loading

In [2]:
device = 'cuda'
model = MMadaModelLM.from_pretrained("Gen-Verse/MMaDA-8B-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained("Gen-Verse/MMaDA-8B-Base", trust_remote_code=True)

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llada to instantiate a model of type mmada. This is not supported for all configurations of models and can yield errors.


Initializing MMadaModelLM with config: MMadaConfig {
  "_attn_implementation_autoset": true,
  "_name_or_path": "Gen-Verse/MMaDA-8B-Base",
  "activation_type": "silu",
  "alibi": false,
  "alibi_bias_max": 8.0,
  "architectures": [
    "LLaDAModelLM"
  ],
  "attention_dropout": 0.0,
  "attention_layer_norm": false,
  "attention_layer_norm_with_affine": true,
  "auto_map": {
    "AutoConfig": "Gen-Verse/MMaDA-8B-Base--configuration_llada.LLaDAConfig",
    "AutoModel": "Gen-Verse/MMaDA-8B-Base--modeling_llada.LLaDAModelLM",
    "AutoModelForCausalLM": "Gen-Verse/MMaDA-8B-Base--modeling_llada.LLaDAModelLM"
  },
  "bias_for_layer_norm": false,
  "block_group_size": 1,
  "block_type": "llama",
  "codebook_size": 8192,
  "d_model": 4096,
  "embedding_dropout": 0.0,
  "embedding_size": 134656,
  "eos_token_id": 126081,
  "flash_attention": false,
  "include_bias": false,
  "include_qkv_bias": false,
  "init_cutoff_factor": null,
  "init_device": "meta",
  "init_fn": "mitchell",
  "init_std": 

Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00,  9.89it/s]


### Load tokenizer chat template

In [3]:
tokenizer.chat_template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n' }}"
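
To make the template easier to read, here is a plain-Python sketch of the string it renders. The function name `render_chat` and the `<bos>` placeholder are illustrative assumptions; the real BOS string comes from `tokenizer.bos_token`.

```python
def render_chat(messages, bos_token="<bos>"):
    """Mimic the Jinja chat template above in plain Python (illustrative only)."""
    parts = []
    for i, message in enumerate(messages):
        # Jinja's `| trim` filter binds tighter than `+`, so only the content is stripped.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n"
            + message["content"].strip() + "<|eot_id|>"
        )
        if i == 0:
            # Only the first message is prefixed with the BOS token.
            content = bos_token + content
        parts.append(content)
    # The template always ends with the assistant header (the generation prompt).
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n")
    return "".join(parts)

rendered = render_chat([{"role": "user", "content": "  Hello  "}])
print(repr(rendered))
```

The trailing assistant header is what the model continues from, which is why the generated tokens start right after `input_ids.shape[1]`.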

### Set tokenizer helper function

In [None]:
def chat_tokenize(prompt: str, think: bool = False, chat: bool = True) -> torch.Tensor:
  """Tokenize a user prompt, optionally with the thinking instruction and the chat template."""
  if think:
    prompt = "You should first think about the reasoning process in the mind and then provide the user with the answer. The reasoning process is enclosed within <think> </think> tags, i.e. <think> reasoning process here </think> answer here\n" + prompt
  m = [{"role": "user", "content": prompt}]
  # With chat=False the raw prompt is tokenized without any template.
  prompt = tokenizer.apply_chat_template(m, add_generation_prompt=True, tokenize=False) if chat else prompt
  input_ids = tokenizer(text=prompt, return_tensors="pt", padding=True, padding_side="left")['input_ids']
  return input_ids.detach().clone().to(device)

## Run inference on the model

### Observation #1

08/07/2025

When the number of steps and the generation length both exceed 47, the answer becomes considerably shorter and more concise. To be investigated.

prompt: "If I have 2 friends and 6 apples, how many apples does each one recieve?"

steps<=47:
answer: "Each friend receives 3 apples."

steps>47:
answer: "3"

### Experiment #1

10/07/2025

Run the model with generation length 12 on the apples prompt. The generated sequence is split into 4 blocks, 3 blocks, and a single block (no split), with per-block steps [3,3,3,2], [4,4,3], and [11] respectively. The model is not confident at all in the last block when using 3 splits, while in the other two cases it generates a confident sequence with the same total number of steps (11).
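
A rough sketch of how each configuration spreads its 12 tokens over the steps. It assumes `custom_generate` unmasks tokens evenly per step within a block, with earlier steps taking the remainder, as in the LLaDA reference sampler; that is an assumption about the repository's function, not something shown in this notebook. `tokens_per_step` is a hypothetical helper.

```python
def tokens_per_step(block_length, steps):
    """Evenly split block_length tokens over steps; earlier steps get the remainder."""
    base, remainder = divmod(block_length, steps)
    return [base + 1 if i < remainder else base for i in range(steps)]

gen_length = 12
configs = {"4 blocks": [3, 3, 3, 2], "3 blocks": [4, 4, 3], "no split": [11]}

schedules = {}
for name, steps in configs.items():
    block_length = gen_length // len(steps)
    # One schedule per block: how many tokens are committed at each step.
    schedules[name] = [tokens_per_step(block_length, s) for s in steps]
    print(name, "total steps:", sum(steps), "schedule:", schedules[name])
```

Under that assumption, every configuration has 11 steps for 12 tokens, so exactly one step has to commit two tokens at once; in the split runs this happens in the last block.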

In [5]:
importlib.reload(module=gen)
input_ids = chat_tokenize("If I have 2 friends and 6 apples, how many apples does each one recieve?")
length = 12

print(f"{'-'*20}Output when splitting in 4 blocks{'-'*20}")
out = gen.custom_generate(model, input_ids, steps=[3,3,3,2], gen_length=length, \
                          block_length=length//4, temperature=0, cfg_scale=0., remasking='low_confidence')

print(out, tokenizer.batch_decode(out[:, input_ids.shape[1]:], skip_special_tokens=False))

print(f"\n\n{'-'*20}Output when splitting in 3 blocks{'-'*20}")
out = gen.custom_generate(model, input_ids, steps=[4,4,3], gen_length=length, \
                          block_length=length//3, temperature=0, cfg_scale=0., remasking='low_confidence')

print(out, tokenizer.batch_decode(out[:, input_ids.shape[1]:], skip_special_tokens=False))

print(f"\n\n{'-'*20}Output with no splits{'-'*20}")
out = gen.custom_generate(model, input_ids, steps=[11], gen_length=length, \
                          block_length=length//1, temperature=0, cfg_scale=0., remasking='low_confidence')

print(out, tokenizer.batch_decode(out[:, input_ids.shape[1]:], skip_special_tokens=False))

--------------------Output when splitting in 4 blocks--------------------


  token_entropy = (prob@torch.log(prob).T).item()


Entropy of word 11934:  -1.453125
Entropy of word 2684:  -0.62109375
Entropy of word 1168:  -1.40625
Entropy of word 2925:  -0.333984375
Entropy of word 82:  -0.90234375
Entropy of word 220:  -0.86328125
Entropy of word 32993:  -0.326171875
Entropy of word 18:  -0.0771484375
Entropy of word 13:  -2.21875
Entropy of word 126081:  -0.55859375
Entropy of word 126081:  -0.43359375
Entropy of word 126081:  -0.000675201416015625
Total entropy of each block tensor([-3.4805, -2.0996, -2.6221, -0.9929])
tensor([[126080, 126346,   3840, 126347,    198,   2531,    331,    561,    220,
             17,   4569,    301,    220,     21,  32993,     11,   1099,   1494,
          32993,   1543,   1671,    810,   1168,   2925,     30, 126348, 126346,
            598,  10450, 126347,    198,  11934,   2684,   1168,   2925,     82,
            220,     18,  32993,     13, 126081, 126081, 126081]],
       device='cuda:0') ['Each friend recieves 3 apples.<|endoftext|><|endoftext|><|endoftext|>']
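
The warning line in the output shows the per-token value being computed as `prob @ torch.log(prob).T`, that is, the sum of p·log p, which is the Shannon entropy with its sign flipped. This is why every printed "entropy" is at most 0: values closer to 0 indicate a more confident prediction. The sketch below uses plain Python (no model needed) to illustrate the sign convention and to check that the per-block totals are the sums of the printed per-token values in groups of `block_length = 3`. The helper name `neg_entropy` is an assumption for illustration.

```python
import math

def neg_entropy(probs):
    """Sum of p * log(p): the quantity printed above (Shannon entropy with flipped sign)."""
    return sum(p * math.log(p) for p in probs if p > 0)

# A peaked distribution is near 0; a uniform one is strongly negative.
print(neg_entropy([0.999, 0.001]), neg_entropy([0.25] * 4))

# Per-token values copied from the 4-block run above.
token_values = [-1.453125, -0.62109375, -1.40625,
                -0.333984375, -0.90234375, -0.86328125,
                -0.326171875, -0.0771484375, -2.21875,
                -0.55859375, -0.43359375, -0.000675201416015625]
block_length = 3
block_totals = [sum(token_values[i:i + block_length])
                for i in range(0, len(token_values), block_length)]
print([round(t, 4) for t in block_totals])
```

The rounded totals match the `Total entropy of each block` tensor printed by the run, so the block score is simply the sum of its token scores.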


---------

11/07/2025

Now a more complex prompt is given, intended to be paired with the thinking prompt so that the model reasons (note that the run recorded below uses `think=False`). The output length is fixed at 256 and different step distributions are tested while keeping the number of blocks fixed at 4.
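
For reference, the step budgets of the three distributions tried in the cell below can be compared with a few lines of arithmetic. This is only bookkeeping on the `steps` lists; the dictionary labels are made up for illustration.

```python
length = 256
num_blocks = 4
block_length = length // num_blocks  # 64 tokens per block

distributions = {
    "uniform": [64, 64, 64, 64],
    "last block 1/4": [64, 64, 64, 64 // 4],
    "3rd block 1/2, 4th block 1/4": [64, 64, 64 // 2, 64 // 4],
}

summary = {}
for label, steps in distributions.items():
    # Average number of tokens that must be committed per step in each block.
    avg_tokens = [block_length / s for s in steps]
    summary[label] = (sum(steps), avg_tokens)
    print(f"{label}: total steps={sum(steps)}, avg tokens/step per block={avg_tokens}")
```

Cutting a block's steps to 1/2 or 1/4 means that block has to commit 2 or 4 tokens per step on average instead of 1.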

In [8]:
importlib.reload(module=gen)
input_ids = chat_tokenize("Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?", think=False)
length = 256

"""print(f"\n\n{'-'*20}Output with uniform and maximum step distribution{'-'*20}")
out = gen.custom_generate(model, input_ids, steps=[64,64,64,64], gen_length=length, \
                          block_length=length//4, temperature=0, cfg_scale=0., remasking='low_confidence', entropy_log=False)

print(tokenizer.batch_decode(out[:, input_ids.shape[1]:], skip_special_tokens=False)[0])

print(f"\n\n{'-'*20}Output with last block having 1/4th of steps{'-'*20}")
out = gen.custom_generate(model, input_ids, steps=[64,64,64,64//4], gen_length=length, \
                          block_length=length//4, temperature=0, cfg_scale=0., remasking='low_confidence', entropy_log=False)

print(tokenizer.batch_decode(out[:, input_ids.shape[1]:], skip_special_tokens=False)[0])"""

print(f"\n\n{'-'*20}Output with 3rd block having 1/2 of steps, and 4th block having 1/4th of steps{'-'*20}")
out = gen.custom_generate(model, input_ids, steps=[64,64,64//2,64//4], gen_length=length, \
                          block_length=length//4, temperature=0, cfg_scale=0., remasking='low_confidence', entropy_log=False)

print(tokenizer.batch_decode(out[:, input_ids.shape[1]:], skip_special_tokens=False)[0])



--------------------Output with 3rd block having 1/2 of steps, and 4th block having 1/4th of steps--------------------
Total entropy of each block tensor([-37.1358, -20.2489, -35.5521,  -7.5183])
To determine how much Weng earned, we need to calculate the number of hours she spent babysitting and then multiply her hourly rate by the number of hours.

First, let's find the number of hours she spent babysitting. Since she did 50 minutes of babysitting, we need to convert this time into hours. There are 60 minutes in an hour, so 50 minutes is equal to \(\frac{50}{60} = \frac{1}{12}\) of an hour.

Now, we can multiply her hourly rate by the number of hours she spent babysitting. Weng earns $12 per hour, and she spent \(\frac{1}{12}\) of an hour babysitting. Therefore, her total earnings are $1.
<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><
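
As a reference for judging the output above: 50 minutes is 5/6 of an hour, not 1/12, so the expected answer is $10 rather than $1. A quick check with exact fractions:

```python
from fractions import Fraction

hourly_rate = 12                  # dollars per hour, from the prompt
hours = Fraction(50, 60)          # 50 minutes expressed in hours
earnings = hourly_rate * hours

print(hours, earnings)  # prints "5/6 10"
```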

In [13]:
[tokenizer.batch_decode(out[:, input_ids.shape[1]+i]) for i in range(len(out[0])-input_ids.shape[1])]

[['Each'],
 [' friend'],
 [' rec'],
 ['ieve'],
 ['s'],
 [' '],
 ['3'],
 [' apples'],
 ['.'],
 ['<|endoftext|>'],
 ['<|endoftext|>'],
 ['<|endoftext|>'],
 ['<|endoftext|>'],
 ['<|endoftext|>'],
 ['<|endoftext|>']]

In [20]:
# Look up the token string for id 126336 (the mask token)
tokenizer.convert_ids_to_tokens(126336)

'<|mdm_mask|>'

In [None]:
# Round-trip check: decoding the encoding should reproduce the text, whitespace and newlines included.
encoding = tokenizer.encode("""
                              Humpty Dumpty sat on a wall.
                              Humpty Dumpty had a great fall.
                              All the king's horses and all the king's men
                              Couldn't put Humpty together again.""")
tokenizer.decode(encoding)

"\n                              Humpty Dumpty sat on a wall.\n                              Humpty Dumpty had a great fall.\n                              All the king's horses and all the king's men\n                              Couldn't put Humpty together again."