In [1]:
# %% [markdown]
# # Fine-tuning Janus-Pro-7B on Instruction–Response Data
#
# In this notebook we fine‑tune the Janus‑Pro‑7B model (a multi‑modality causal LM from deepseek‑ai) using two simple text-only examples:
#
# - **Instruction:** “How do I reset my password?”  
#   **Response:** “Go to settings and click 'Reset Password'.”
#
# - **Instruction:** “What are the platform rules?”  
#   **Response:** “Follow community guidelines and be respectful.”
#
# We format these as conversations using the VLChatProcessor SFT template, fine‑tune the model using a LoRA adapter via PEFT, save the adapter to disk, and then reload it for inference.
#
# > **Warning:** This notebook uses only two examples for demonstration. For a real use case you should use a larger dataset and carefully adjust training parameters.

# %% [code]
import os

# Set the allocator config before importing torch so it is in place when CUDA initializes.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
import numpy as np
from datasets import Dataset

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
)
from janus.models import MultiModalityCausalLM, VLChatProcessor
from peft import get_peft_model, LoraConfig



Python version is above 3.10, patching the collections module.


In [2]:
# Set the model path
model_path = "deepseek-ai/Janus-Pro-7B"

# Load the VLChatProcessor (which holds the conversation templates) and tokenizer
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

# Load the base model (as a MultiModalityCausalLM) and move to GPU in bfloat16
# (Note: For training, do NOT call eval() yet)
base_model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
base_model = base_model.to(torch.bfloat16).cuda()

# Enable gradient checkpointing on your base model (if supported)
# base_model.gradient_checkpointing_enable()


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Some kwargs in processor config are unused and will not have any effect: ignore_id, add_special_token, num_image_tok



In [3]:
# %% [markdown]
# ## Prepare the Training Data
#
# We define two instruction–response pairs and format them as conversations using the SFT template.
#
# The formatting function creates a conversation with a `<|User|>` message and a `<|Assistant|>` reply.

# %% [code]
# Define our tiny training set
train_data = [
    {
        "instruction": "How do I reset my password?",
        "response": "Go to settings and click 'Reset Password'."
    },
    {
        "instruction": "What are the platform rules?",
        "response": "Follow community guidelines and be respectful."
    }
]

def format_conversation(instruction: str, response: str) -> str:
    conversation = [
        {"role": "<|User|>", "content": instruction},
        {"role": "<|Assistant|>", "content": response},
    ]
    # Apply the SFT template from the VLChatProcessor.
    # This returns a formatted text prompt that the model was originally trained with.
    formatted = vl_chat_processor.apply_sft_template_for_multi_turn_prompts(
        conversations=conversation,
        sft_format=vl_chat_processor.sft_format,
        system_prompt="",
    )
    return formatted

# Format each training example
formatted_texts = [format_conversation(item["instruction"], item["response"]) for item in train_data]


In [4]:

# %% [markdown]
# ## Tokenize the Data
#
# We tokenize the formatted conversations and set the `labels` equal to the input IDs for causal language modeling.
#
# (In a more advanced setup you might choose to mask parts of the prompt.)

# %% [code]
def tokenize_function(text: str):
    # Adjust max_length as needed (here we use 128 tokens)
    tokenized = tokenizer(text, truncation=True, max_length=128)
    # For causal LM training, labels are the same as input_ids.
    tokenized["labels"] = tokenized["input_ids"].copy()
    return tokenized

# Tokenize each example
tokenized_data = [tokenize_function(txt) for txt in formatted_texts]

# Create a Hugging Face Dataset
dataset = Dataset.from_list(tokenized_data)

print("Number of training examples:", len(dataset))


Number of training examples: 2
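
The tokenization cell notes that a more advanced setup might mask parts of the prompt. A minimal sketch of that idea follows, assuming the prompt tokens form a prefix of each example and that the loss skips the label value -100 (the default `ignore_index` of PyTorch's cross-entropy loss). The helper name `mask_prompt_labels` and the token IDs are hypothetical and not part of the notebook.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def mask_prompt_labels(input_ids, prompt_len, ignore_index=IGNORE_INDEX):
    """Return labels that copy input_ids but hide the first prompt_len tokens."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must be between 0 and len(input_ids)")
    return [ignore_index] * prompt_len + list(input_ids[prompt_len:])


# Hypothetical token IDs: the first 4 belong to the prompt, the last 3 to the response.
example_ids = [101, 7, 8, 9, 42, 43, 2]
labels = mask_prompt_labels(example_ids, prompt_len=4)
print(labels)
```

With labels built this way, only the response tokens contribute to the loss. How to obtain `prompt_len` for the SFT template is left open here; one option might be to tokenize the conversation with an empty response, but token boundaries would need checking.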


In [5]:
# import torch
# import torch.nn as nn

# class JanusForCausalLMWrapper(nn.Module):
#     def __init__(self, model, tokenizer):
#         super().__init__()
#         self.model = model
#         self.tokenizer = tokenizer
#         # Use the embedding layer from model.language_model.
#         if hasattr(model, "language_model") and callable(getattr(model.language_model, "get_input_embeddings", None)):
#             self.embed = model.language_model.get_input_embeddings()
#         else:
#             raise NotImplementedError(
#                 "The provided model does not support get_input_embeddings via model.language_model."
#             )

#     @property
#     def config(self):
#         # Delegate the config attribute to the underlying model.
#         return self.model.config

#     def prepare_inputs_for_generation(self, input_ids, **kwargs):
#         """
#         Delegates to the underlying model's prepare_inputs_for_generation if available,
#         otherwise returns a fallback dict.
#         """
#         if hasattr(self.model, "prepare_inputs_for_generation"):
#             return self.model.prepare_inputs_for_generation(input_ids, **kwargs)
#         else:
#             # Fallback: simply return the input_ids in a dict.
#             return {"input_ids": input_ids, **kwargs}

#     def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
#         # If input_ids are provided, convert them to embeddings.
#         kwargs.pop("inputs_embeds", None)

#         if input_ids is not None:
#             inputs_embeds = self.embed(input_ids)
#         else:
#             inputs_embeds = kwargs.get("inputs_embeds")
#             if inputs_embeds is None:
#                 raise ValueError("You must provide either input_ids or inputs_embeds.")
#         # Forward the call to the underlying model using inputs_embeds.
#         return self.model(
#             inputs_embeds=inputs_embeds,
#             attention_mask=attention_mask,
#             labels=labels,
#             **kwargs
#         )


import torch
import torch.nn as nn

class JanusForCausalLMWrapper(nn.Module):
    def __init__(self, model, tokenizer):
        """
        Wraps the Janus-Pro-7B model to allow Trainer/PEFT to work with input_ids.
        Assumes that:
          - The embedding layer is at model.language_model.get_input_embeddings()
          - The actual transformer forward is in model.language_model.model
        """
        super().__init__()
        self.model = model
        self.tokenizer = tokenizer
        # Get the input embeddings from the language_model submodule.
        if (
            hasattr(model, "language_model")
            and hasattr(model.language_model, "get_input_embeddings")
            and callable(model.language_model.get_input_embeddings)
        ):
            self.embed = model.language_model.get_input_embeddings()
        else:
            raise NotImplementedError("The provided model does not have language_model.get_input_embeddings.")

    @property
    def config(self):
        # Expose the config (delegate to the underlying model if available, or to language_model)
        if hasattr(self.model, "config"):
            return self.model.config
        elif hasattr(self.model, "language_model") and hasattr(self.model.language_model, "config"):
            return self.model.language_model.config
        else:
            raise AttributeError("No config attribute found in the model.")

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # Delegate if possible; otherwise, return a basic dict.
        if hasattr(self.model, "prepare_inputs_for_generation"):
            return self.model.prepare_inputs_for_generation(input_ids, **kwargs)
        else:
            return {"input_ids": input_ids, **kwargs}

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        """
        Converts input_ids to embeddings via the inner embedding layer,
        then delegates the forward pass to the language model.
        """
        # Remove any "inputs_embeds" from kwargs to avoid duplication.
        kwargs.pop("inputs_embeds", None)
        if input_ids is not None:
            # Compute embeddings from input_ids.
            inputs_embeds = self.embed(input_ids)
        else:
            raise ValueError("input_ids must be provided")
        # Call the causal-LM model (language_model) rather than the bare transformer
        # (language_model.model): assuming language_model is a LlamaForCausalLM, only
        # the former accepts `labels` and returns the loss that Trainer needs.
        return self.model.language_model(
            inputs_embeds=inputs_embeds,
            attention_mask=attention_mask,
            labels=labels,
            **kwargs
        )


In [9]:
# Assuming base_model and tokenizer have been loaded.
wrapped_model = JanusForCausalLMWrapper(base_model, tokenizer)

from peft import get_peft_model, LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # Adjust as needed for your model
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(wrapped_model, lora_config)

print("Trainable parameters:")
model.print_trainable_parameters()

Trainable parameters:
trainable params: 3,932,160 || all params: 7,424,300,683 || trainable%: 0.0530
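
The trainable-parameter count above can be checked by hand. Each LoRA adapter adds two low-rank matrices, A with shape (r, d_in) and B with shape (d_out, r), so one adapted layer contributes r * (d_in + d_out) parameters. The sketch below assumes 30 decoder layers and 4096-wide `q_proj` and `v_proj` projections; these shapes are not printed in the notebook and are an assumption chosen because they are consistent with the reported figure.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed shapes (not printed in the notebook): 30 decoder layers, with
# q_proj and v_proj each mapping 4096 -> 4096.
hidden_size = 4096
num_layers = 30
r = 8

per_layer = 2 * lora_param_count(hidden_size, hidden_size, r)  # q_proj + v_proj
trainable = per_layer * num_layers
total = 7_424_300_683  # "all params" from the output above

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / total:.4f}")
```

Under these assumptions the arithmetic matches the 3,932,160 trainable parameters reported by `print_trainable_parameters()`.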


In [10]:

# # %% [markdown]
# # ## Configure PEFT (LoRA)
# #
# # We now set up a LoRA configuration. The target modules (e.g. `"q_proj"` and `"v_proj"`)
# # may need adjustment depending on your model’s architecture.
# #
# # In this example we use:
# #
# # - `r=8` and `lora_alpha=32` for low-rank adaptation  
# # - A dropout of `0.1`  
# # - No bias update (`bias="none"`)
# #
# # The task type is set to `"CAUSAL_LM"`.

# # %% [code]
# lora_config = LoraConfig(
#     r=8,
#     lora_alpha=32,
#     target_modules=["q_proj", "v_proj"],  # Adjust these as needed for Janus-Pro-7B
#     lora_dropout=0.1,
#     bias="none",
#     task_type="CAUSAL_LM",
# )

# # Wrap the base model with the PEFT adapter
# model = get_peft_model(base_model, lora_config)
# print("Trainable parameters:")
# model.print_trainable_parameters()


In [11]:
from transformers import TrainingArguments, Trainer

# Example DeepSpeed config. It is not used below because TrainingArguments sets deepspeed=None.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 4,  # accumulate gradients over multiple steps if needed
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {
        "enabled": True
    }
}

training_args = TrainingArguments(
    output_dir="./janus_peft",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    learning_rate=5e-5,
    logging_steps=1,
    save_steps=10,
    bf16=True,  # the model weights are bfloat16, so use bf16 rather than fp16 mixed precision
    report_to="none",
    deepspeed=None,  # Disable DeepSpeed integration
)


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)

trainer.train()


OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 23.54 GiB total capacity; 23.00 GiB already allocated; 1.81 MiB free; 23.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
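
A rough back-of-the-envelope estimate helps put the out-of-memory error in context. The sketch below uses only figures printed earlier in the notebook (parameter counts and GPU capacity) plus one assumption: that the optimizer keeps two fp32 states per trainable parameter, as AdamW does. It ignores activations, gradients, and the CUDA context, so it is a lower bound rather than a full accounting.

```python
GIB = 1024 ** 3

total_params = 7_424_300_683   # "all params" reported by print_trainable_parameters()
trainable_params = 3_932_160   # LoRA parameters reported by the same call
bytes_per_bf16 = 2
gpu_capacity_gib = 23.54       # from the OutOfMemoryError message

# Memory held by the model weights in bfloat16.
weights_gib = total_params * bytes_per_bf16 / GIB
# Assumption: two fp32 optimizer states per trainable parameter (AdamW-style).
optimizer_gib = trainable_params * 2 * 4 / GIB
headroom_gib = gpu_capacity_gib - weights_gib - optimizer_gib

print(f"weights: {weights_gib:.2f} GiB")
print(f"optimizer state (LoRA only): {optimizer_gib:.3f} GiB")
print(f"left for activations, gradients, CUDA context: {headroom_gib:.2f} GiB")
```

The weights alone take well over half of the card, so the remaining budget has to cover everything else, including any allocations still held from earlier cell runs. Re-enabling the commented-out `base_model.gradient_checkpointing_enable()` call in the loading cell is one candidate for reducing activation memory, if the model supports it.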

In [None]:

# %% [markdown]
# ## Save the Fine-tuned Adapter
#
# Once training is complete, we save the adapter (the fine-tuned parameters) to disk.

# %% [code]
adapter_save_path = "./janus_peft_finetuned"
model.save_pretrained(adapter_save_path)
print(f"Adapter saved to {adapter_save_path}")


In [None]:

# %% [markdown]
# ## Inference: Load and Use the Fine-tuned Model
#
# In this cell we reload the base Janus-Pro-7B model and then load the fine-tuned adapter.
# We define a simple text generation function that uses the fine-tuned model to produce a response.

# %% [code]
from peft import PeftModel

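# Reload the base model
base_model_loaded = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
base_model_loaded = base_model_loaded.to(torch.bfloat16).cuda()

# The adapter was trained on the wrapped model, so wrap the reloaded model the same way
# before loading it; otherwise the saved LoRA weight names would not match the module names.
wrapped_loaded = JanusForCausalLMWrapper(base_model_loaded, tokenizer)
model_loaded = PeftModel.from_pretrained(wrapped_loaded, adapter_save_path)
model_loaded.eval()

def generate_text(prompt: str, max_new_tokens: int = 100, temperature: float = 0.7) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {k: v.to(base_model_loaded.device) for k, v in inputs.items()}
    # LoRA layers are injected in place, so generating through the inner language model
    # (assumed to be a standard causal LM with .generate) uses the fine-tuned weights.
    with torch.no_grad():
        outputs = base_model_loaded.language_model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)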
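
`generate_text` samples with `do_sample=True` and `temperature=0.7`. As a minimal illustration of what the temperature does, the sketch below divides hypothetical next-token logits by the temperature before applying a softmax. It is a standalone toy, not the library's implementation, and the logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into sampling probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical next-token scores
for t in (1.0, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A temperature below 1 sharpens the distribution toward the highest-scoring token, which makes sampled responses less varied; values above 1 flatten it.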

# %% [markdown]
# ## Test the Fine-tuned Model
#
# Let’s test the fine-tuned model on one of our instructions.
#
# (The prompt should be formatted with the SFT template. Here we simply re-use our formatting function.)

# %% [code]
test_instruction = "How do I reset my password?"
test_formatted = format_conversation(test_instruction, "")  # No answer provided
print("Test prompt (formatted):")
print(test_formatted)
print("\nGenerated response:")
print(generate_text(test_formatted))