# Demonstrating the Coconut Approach with the Janus Model

This notebook demonstrates how to use the Chain of Continuous Thought (Coconut) approach with the Janus model. The Coconut approach allows the model to reason in a continuous latent space rather than being restricted to token-by-token reasoning in language space.

## Setup and Imports

First, let's import the necessary libraries and load the Janus model.

In [1]:
from janus_wraper import *
import torch
from transformers import AutoModelForCausalLM, AutoConfig
from Janus.janus.models import MultiModalityCausalLM, VLChatProcessor
from Janus.janus.utils.io import load_pil_images

# Specify the path to the model
model_path = '/Users/nover/models/deepseek-ai/Janus-Pro-7B'
# model_path = "/projectnb/cs598/projects/cool_proj/model"

# Select the device: CUDA first, then Apple MPS, then CPU
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
else:
    device = 'cpu'
device = 'cpu'  # NOTE: unconditionally overrides the selection above; remove this line to use CUDA or MPS
chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)

gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True,
)
t_dtype = torch.bfloat16 if device=='cuda' or device=='mps' else torch.float16
gpt = gpt.to(t_dtype).to(device).eval()



Python version is above 3.10, patching the collections module.


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.50, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Some kwargs in processor config are unused and will not have any effect: ignore_id, mask_prompt, image_tag, num_imag

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [2]:

tokenizer = chat_processor.tokenizer
print("Valid tokens in the tokenizer:", tokenizer.convert_ids_to_tokens(range(len(tokenizer))))
print("Special tokens (non-printing):")
print(f"<unk>: {tokenizer.unk_token}, <bos>: {tokenizer.bos_token}, <eos>: {tokenizer.eos_token}")



Special tokens (non-printing):
<unk>: None, <bos>: <｜begin▁of▁sentence｜>, <eos>: <｜end▁of▁sentence｜>


## Understanding the Coconut Approach

Coconut (Chain of Continuous Thought) replaces part of the usual token-by-token decoding with reasoning steps carried out directly on hidden states. This approach:

1. Uses the last hidden state of the model as a representation of the reasoning state ("continuous thought")
2. Feeds this continuous thought directly back to the model as the next input embedding
3. Allows the model to perform multiple reasoning steps in the latent space before generating a final answer
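
The three steps above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the code inside `janus_wraper`: `TinyLM`, `generate_with_continuous_thoughts`, and the GRU backbone are hypothetical stand-ins chosen to keep the example small, and the model is untrained, so only the data flow is meaningful.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyLM(nn.Module):
    """Hypothetical toy decoder: takes input embeddings, returns hidden states and logits."""

    def __init__(self, vocab_size: int = 32, hidden: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab_size)

    def forward(self, inputs_embeds: torch.Tensor):
        hidden_states, _ = self.rnn(inputs_embeds)  # (batch, seq, hidden)
        return hidden_states, self.lm_head(hidden_states)


@torch.no_grad()
def generate_with_continuous_thoughts(model, input_ids, num_continuous_thoughts=3, max_new_tokens=4):
    embeds = model.embed(input_ids)  # (batch, seq, hidden)

    # Latent phase: append the last hidden state as the next input embedding
    for _ in range(num_continuous_thoughts):
        hidden_states, _ = model(embeds)
        thought = hidden_states[:, -1:, :]  # (batch, 1, hidden)
        embeds = torch.cat([embeds, thought], dim=1)

    # Language phase: ordinary greedy decoding, one token at a time
    new_tokens = []
    for _ in range(max_new_tokens):
        _, logits = model(embeds)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # (batch, 1)
        new_tokens.append(next_id)
        embeds = torch.cat([embeds, model.embed(next_id)], dim=1)

    return torch.cat(new_tokens, dim=1), embeds


model = TinyLM().eval()
prompt = torch.tensor([[1, 5, 7]])
tokens, embeds = generate_with_continuous_thoughts(model, prompt, num_continuous_thoughts=3, max_new_tokens=4)
print(tokens.shape)  # torch.Size([1, 4])
print(embeds.shape)  # torch.Size([1, 10, 16])
```

The prompt contributes 3 embeddings, the latent phase 3 more, and decoding 4, which gives the final sequence length of 10. For simplicity the sketch re-runs the full sequence at every step; a practical implementation would normally reuse a key/value cache.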

This approach has several advantages:
- The continuous thought can encode multiple alternative reasoning paths simultaneously
- It enables breadth-first search (BFS) reasoning patterns
- It can be more efficient, requiring fewer tokens for complex reasoning tasks

## Example 1: Text-Only Reasoning with Coconut

In [3]:
from janus_wraper import janus_pro_generate
from PIL import Image

# Define a simple arithmetic reasoning question
complex_question = "If a train travels at 60 miles per hour, how far will it travel in 2.5 hours?"

# The calls below also pass an image, although the question itself does not depend on it
image = Image.open("img.jpg")

# Generate response using standard approach
standard_response = janus_pro_generate(
    chat_processor,
    gpt,
    device=device,
    input_text=complex_question,
    input_images=[image],
    output_mode="text",
    use_coconut=False,
)

print("Standard Response:")
print(standard_response)

Standard Response:
To find out how far a train travels in 2.5 hours at a speed of 60 miles per hour, you can use the formula:

\[ \text{Distance} = \text{Speed} \times \text{Time} \]

Given:
- Speed = 60 miles per hour
- Time = 2.5 hours

\[ \text{Distance} = 60 \times 2.5 = 150 \text{ miles} \]

So, the train will travel 150 miles in 2.5 hours.


In [4]:
# Generate response using Coconut approach
coconut_response = janus_pro_generate(
    chat_processor,
    gpt,
    input_text=complex_question,
    input_images=[image],
    device=device,  # same device the model was moved to during setup
    output_mode="text",
    temperature=0.1,
    use_coconut=True,  # Enable continuous thought reasoning
    num_continuous_thoughts=3  # Use 3 continuous thought steps
)

print("Coconut Response:")
print(coconut_response)

RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1, 128] because the unspecified dimension size -1 can be any value and is ambiguous
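
The shape in this error is consistent with a tensor that has batch size 1, sequence length 0, and a head dimension of 128, which suggests (without confirming) that an empty sequence of input embeddings reached the attention layers. The snippet below reproduces the same kind of failure in isolation and shows a hypothetical guard, `require_nonempty_sequence`, that could be called before the forward pass. Neither is taken from `janus_wraper`, and the hidden size of 4096 is only illustrative.

```python
import torch


def require_nonempty_sequence(inputs_embeds: torch.Tensor) -> torch.Tensor:
    """Hypothetical guard: fail early, with a readable message, on an empty sequence."""
    if inputs_embeds.dim() != 3:
        raise ValueError(f"expected (batch, seq, hidden), got {tuple(inputs_embeds.shape)}")
    if inputs_embeds.shape[1] == 0:
        raise ValueError("inputs_embeds has sequence length 0; nothing to feed to the model")
    return inputs_embeds


empty = torch.empty(1, 0, 4096)  # batch=1, seq_len=0

# Splitting into heads is ambiguous when the tensor has no elements
try:
    empty.view(1, 0, -1, 128)
    reshape_error = None
except RuntimeError as err:
    reshape_error = str(err)
print(reshape_error)

# The guard reports the underlying problem before any reshape happens
try:
    require_nonempty_sequence(empty)
    guard_error = None
except ValueError as err:
    guard_error = str(err)
print(guard_error)
```

A check like this, placed where the wrapper builds the embeddings for each continuous thought step, would show whether the empty sequence comes from slicing the inputs or from the cache handling.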

## Example 2: Visual Reasoning with Coconut

Now let's try a visual reasoning task using an image.

In [None]:
# Load an example image
image_path = "img.jpg"  # Replace with your image path
image = Image.open(image_path)

# Display the image
image.show()

In [6]:
# Define a complex visual reasoning question
visual_question = "What objects are in this image and how are they arranged? Explain the spatial relationships."

# Generate response using standard approach
standard_visual_response = janus_pro_generate(
    chat_processor,
    gpt,
    input_text=visual_question,
    input_images=[image],  # same keyword and list form as the call in Example 1
    device=device,
    output_mode="text",
    temperature=0.1,
    use_coconut=False
)

print("Standard Visual Response:")
print(standard_visual_response)

NameError: name 'janus_pro_generate' is not defined

In [None]:
# Generate response using Coconut approach
coconut_visual_response = janus_pro_generate(
    chat_processor,
    gpt,
    input_text=visual_question,
    input_images=[image],  # same keyword and list form as the call in Example 1
    device=device,
    output_mode="text",
    temperature=0.1,
    use_coconut=True,
    num_continuous_thoughts=4  # Use 4 continuous thought steps for more complex visual reasoning
)

print("Coconut Visual Response:")
print(coconut_visual_response)

## Example 3: Logical Reasoning with Coconut

Let's try a logical reasoning problem that requires planning and backtracking.

In [None]:
# Define a logical reasoning problem
logical_problem = """
Every grimpus is a yimpus. Every worpus is a jelpus. Every zhorpus is a sterpus. 
Alex is a grimpus. Every lumpus is a yumpus. 
Question: Is Alex a gorpus or bompus?
"""

# Generate response using standard approach
standard_logical_response = janus_pro_generate(
    chat_processor,
    gpt,
    input_text=logical_problem,
    device=device,
    output_mode="text",
    temperature=0.1,
    use_coconut=False
)

print("Standard Logical Response:")
print(standard_logical_response)

In [None]:
# Generate response using Coconut approach
coconut_logical_response = janus_pro_generate(
    chat_processor,
    gpt,
    input_text=logical_problem,
    device=device,
    output_mode="text",
    temperature=0.1,
    use_coconut=True,
    num_continuous_thoughts=5  # Use more continuous thoughts for complex logical reasoning
)

print("Coconut Logical Response:")
print(coconut_logical_response)
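
The search this kind of problem calls for can be made explicit with a small graph traversal. The sketch below encodes the "Every X is a Y" facts from `logical_problem` as directed edges and runs a breadth-first search from Alex; `is_a` and `reachable_concepts` are illustrative names, not part of the notebook's wrapper. With only the facts listed in the prompt, neither "gorpus" nor "bompus" is reachable, so the problem as written leaves the question undetermined.

```python
from collections import deque

# "Every X is a Y" facts from the problem, stored as directed edges X -> Y
is_a = {
    "Alex": ["grimpus"],
    "grimpus": ["yimpus"],
    "worpus": ["jelpus"],
    "zhorpus": ["sterpus"],
    "lumpus": ["yumpus"],
}


def reachable_concepts(graph, start):
    """Breadth-first search: every concept that follows from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


derived = reachable_concepts(is_a, "Alex")
print(sorted(derived))  # ['grimpus', 'yimpus']
for candidate in ("gorpus", "bompus"):
    print(candidate, candidate in derived)  # False for both
```

A traversal like this gives a ground truth to compare the standard and Coconut responses against once more facts are added to the prompt.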

## Experimenting with Different Numbers of Continuous Thoughts

Let's see how the number of continuous thoughts affects the reasoning process.

In [None]:
# Define a complex math problem
math_problem = "If a rectangle has a length of 12 cm and a width of 8 cm, what is its area and perimeter?"

# Try with different numbers of continuous thoughts
for num_thoughts in [1, 2, 3, 5]:
    response = janus_pro_generate(
        chat_processor,
        gpt,
        input_text=math_problem,
        device=device,
        output_mode="text",
        temperature=0.1,
        use_coconut=True,
        num_continuous_thoughts=num_thoughts
    )
    
    print(f"\nResponse with {num_thoughts} continuous thoughts:")
    print(response)
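
To compare the runs it helps to have the reference answer on hand. The sketch below computes it and adds a crude, hypothetical check, `mentions_all`, that tests whether a response text contains both expected numbers. It is a heuristic for quick inspection, not a proper evaluation.

```python
import re

length_cm, width_cm = 12, 8
area_cm2 = length_cm * width_cm            # 12 * 8 = 96
perimeter_cm = 2 * (length_cm + width_cm)  # 2 * (12 + 8) = 40


def mentions_all(response: str, expected_numbers) -> bool:
    """Hypothetical check: True if every expected number appears as a standalone number in the text."""
    found = set(re.findall(r"\d+(?:\.\d+)?", response))
    return all(str(n) in found for n in expected_numbers)


print(area_cm2, perimeter_cm)  # 96 40
print(mentions_all("The area is 96 square cm and the perimeter is 40 cm.", [area_cm2, perimeter_cm]))  # True
print(mentions_all("The area is 960 square cm.", [area_cm2, perimeter_cm]))  # False
```

Inside the loop above, `mentions_all(response, [area_cm2, perimeter_cm])` could be printed next to each response to see at a glance which settings of `num_continuous_thoughts` still produce the right numbers.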

## Conclusion

The Coconut approach allows the Janus model to reason in a continuous latent space, which can lead to more effective reasoning for complex problems. By using continuous thoughts, the model can:

1. Encode multiple potential reasoning paths simultaneously
2. Perform breadth-first search-like reasoning
3. Potentially provide more accurate answers for problems that require planning and backtracking

This approach is most relevant for logical reasoning tasks and other problems that benefit from exploring multiple reasoning paths before committing to a final answer.