<a href="https://colab.research.google.com/github/weedge/doraemon-nb/blob/main/achatbot_deepseekVL2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# install

In [None]:
!cd /content && rm -rf achatbot && git clone --recursive https://github.com/ai-bot-pro/achatbot.git

In [6]:
%cd /content/achatbot

/content/achatbot


In [None]:
!bash scripts/pypi_achatbot.sh dev


In [8]:
!pip install -q "dist/achatbot-0.0.8.7-py3-none-any.whl[llm_transformers_manual_vision_deepseekvl2]"

In [9]:
!pip show transformers

Name: transformers
Version: 4.38.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft, sentence-transformers
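
The `pip show` output above reports `transformers` 4.38.2. As a minimal sketch, the same check can be done from Python so that a later cell fails early if the environment drifts. The helper names `installed_version` and `matches_pin` are hypothetical, and treating 4.38.2 as the required pin is an assumption based only on the output shown here.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package: str, expected: str) -> bool:
    """True only when `package` is installed at exactly `expected`."""
    return installed_version(package) == expected


# Expected version copied from the `pip show` output above (assumed pin).
print("transformers:", installed_version("transformers"),
      "| matches 4.38.2:", matches_pin("transformers", "4.38.2"))
```

If the package is missing, `installed_version` returns `None` rather than raising, so the check can run before any heavy imports.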


# download model


In [6]:
from google.colab import userdata
HF_TOKEN=userdata.get('HF_TOKEN')

In [None]:
!huggingface-cli login --token $HF_TOKEN --add-to-git-credential

In [8]:
!huggingface-cli download deepseek-ai/deepseek-vl2-tiny --quiet --local-dir /content/models/deepseek-ai/deepseek-vl2-tiny

/content/models/deepseek-ai/deepseek-vl2-tiny


In [None]:
!huggingface-cli download deepseek-ai/deepseek-vl2-small --quiet --local-dir /content/models/deepseek-ai/deepseek-vl2-small

In [None]:
!huggingface-cli download deepseek-ai/deepseek-vl2 --quiet --local-dir /content/models/deepseek-ai/deepseek-vl2

# demo

In [None]:
!mkdir -p /content/images/
!wget "https://github.com/deepseek-ai/DeepSeek-VL2/blob/main/images/visual_grounding_1.jpeg?raw=true" -O /content/images/visual_grounding.jpeg

In [4]:
import torch

def print_model_params(model: torch.nn.Module, extra_info=""):
    # print the number of parameters in the model
    model_million_params = sum(p.numel() for p in model.parameters()) / 1e6
    print(model)
    print(f"{extra_info} {model_million_params} M parameters")

In [5]:
import sys
import os
import torch
from transformers import AutoModelForCausalLM

#os.environ['XFORMERS_DISABLED'] = '1'  # Disable xFormers for this run
#os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments'

sys.path.insert(1, "/content/achatbot/deps/DeepSeekVL2")
from deepseek_vl2.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM
from deepseek_vl2.utils.io import load_pil_images


# specify the path to the model
model_path = "/content/models/deepseek-ai/deepseek-vl2-tiny"
vl_chat_processor: DeepseekVLV2Processor = DeepseekVLV2Processor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
print_model_params(vl_gpt)

Python version is above 3.10, patching the collections module.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Add pad token = ['<｜▁pad▁｜>'] to the tokenizer
<｜▁pad▁｜>:2
Add image token = ['<image>'] to the tokenizer
<image>:128815
Add grounding-related tokens = ['<|ref|>', '<|/ref|>', '<|det|>', '<|/det|>', '<|grounding|>'] to the tokenizer with input_ids
<|ref|>:128816
<|/ref|>:128817
<|det|>:128818
<|/det|>:128819
<|grounding|>:128820
Add chat tokens = ['<|User|>', '<|Assistant|>'] to the tokenizer with input_ids
<|User|>:128821
<|Assistant|>:128822

DeepseekVLV2ForCausalLM(
  (vision): VisionTransformer(
    (patch_embed): PatchEmbed(
      (proj): Conv2d(3, 1152, kernel_size=(14, 14), stride=(14, 14))
      (norm): Identity()
    )
    (pos_drop): Dropout(p=0.0, inplace=False)
    (patch_drop): Identity()
    (norm_pre): Identity()
    (blocks): Sequential(
      (0): Block(
        (norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=1152, out_features=3456, bias=True)
          (q_norm): Identity()
          (k_nor
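
The tokenizer log above registers `<image>` as the image token and `<|User|>` / `<|Assistant|>` as the chat tokens. The following is a minimal sketch of a helper that builds a single-turn conversation with those tokens and checks that the prompt has one `<image>` placeholder per image path. The name `build_conversation` is hypothetical, and treating a count mismatch as an error is an assumption about what the processor expects.

```python
from typing import Dict, List

# Token strings copied from the tokenizer log above.
IMAGE_TOKEN = "<image>"
USER_ROLE = "<|User|>"
ASSISTANT_ROLE = "<|Assistant|>"


def build_conversation(prompt: str, images: List[str]) -> List[Dict]:
    """Build a single-turn conversation with an empty assistant slot.

    Raises ValueError when the number of <image> placeholders in the prompt
    does not match the number of image paths.
    """
    n_placeholders = prompt.count(IMAGE_TOKEN)
    if n_placeholders != len(images):
        raise ValueError(
            f"prompt has {n_placeholders} {IMAGE_TOKEN} placeholder(s) "
            f"but {len(images)} image path(s) were given"
        )
    return [
        {"role": USER_ROLE, "content": prompt, "images": list(images)},
        {"role": ASSISTANT_ROLE, "content": ""},
    ]


conversation_sketch = build_conversation(
    "<image>\n<|ref|>The giraffe at the back.<|/ref|>.",
    ["/content/images/visual_grounding.jpeg"],
)
print(conversation_sketch)
```

The returned list has the same shape as the `conversation` variable used in the demo cell, so it could be passed to `load_pil_images` and the processor in the same way.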

In [6]:
## single image conversation example

conversation = [
    {
        "role": "<|User|>",
        "content": "<image>\n<|ref|>The giraffe at the back.<|/ref|>.",
        "images": ["/content/images/visual_grounding.jpeg"],
    },
    {"role": "<|Assistant|>", "content": ""},
]

## multiple images (or in-context learning) conversation example
## note: one <image> token per entry in "images", in the same order
# conversation = [
#     {
#         "role": "<|User|>",
#         "content": "<image>A dog wearing nothing in the foreground, "
#                    "<image>a dog wearing a santa hat, "
#                    "<image>a dog wearing a wizard outfit, and "
#                    "<image>what's the dog wearing?",
#         "images": [
#             "images/dog_a.png",
#             "images/dog_b.png",
#             "images/dog_c.png",
#             "images/dog_d.png",
#         ],
#     },
#     {"role": "<|Assistant|>", "content": ""}
# ]

# load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True,
    system_prompt=""
).to(vl_gpt.device, dtype=torch.bfloat16)
#print(prepare_inputs)
# run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

# run the model to get the response
outputs = vl_gpt.language.generate(
    #input_ids = prepare_inputs["input_ids"],
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
    #position_ids=torch.arange(inputs_embeds.shape[1], device=vl_gpt.device).unsqueeze(0)
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


<|User|>: <image>
<|ref|>The giraffe at the back.<|/ref|>.

<|Assistant|>: The giraffe at the back.[[580, 270, 999, 900]]
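
The answer above ends with a box in the form `[[x1, y1, x2, y2]]`. The following is a minimal sketch for pulling such boxes out of the decoded text and mapping them to pixel coordinates. It assumes the values are normalized to the range 0 to 999 along each axis, which is an assumption here and not something the output above confirms. The names `parse_boxes` and `to_pixels` are hypothetical.

```python
import re
from typing import List, Tuple

Box = Tuple[int, int, int, int]

# Matches one inner "[x1, y1, x2, y2]" group of four non-negative integers.
BOX_PATTERN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_boxes(answer: str) -> List[Box]:
    """Extract every four-integer box found in the decoded answer text."""
    return [tuple(int(v) for v in m.groups()) for m in BOX_PATTERN.finditer(answer)]


def to_pixels(box: Box, width: int, height: int, scale: int = 999) -> Box:
    """Map a box from the assumed 0..scale range to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        round(x1 / scale * width),
        round(y1 / scale * height),
        round(x2 / scale * width),
        round(y2 / scale * height),
    )


answer_text = "The giraffe at the back.[[580, 270, 999, 900]]"
for box in parse_boxes(answer_text):
    # 640x480 is a placeholder size; use the real image size in practice.
    print(box, "->", to_pixels(box, width=640, height=480))
```

In the notebook, the real width and height could be read from the first entry of `pil_images` before scaling.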


# test

In [18]:
!LLM_DEVICE=cuda LLM_MODEL_NAME_OR_PATH=/content/models/deepseek-ai/deepseek-vl2-tiny \
    python -m unittest test.core.llm.test_transformers_v_deepseek.TestTransformersVJanus.test_chat_completion_prompts

Python version is above 3.10, patching the collections module.
2025-02-06 14:30:18,071 - git.cmd - DEBUG - /usr/local/lib/python3.11/dist-packages/git/cmd.py:1253 - execute - Popen(['git', 'version'], cwd=/content/achatbot, stdin=None, shell=False, universal_newlines=False)
2025-02-06 14:30:18,073 - git.cmd - DEBUG - /usr/local/lib/python3.11/dist-packages/git/cmd.py:1253 - execute - Popen(['git', 'version'], cwd=/content/achatbot, stdin=None, shell=False, universal_newlines=False)
2025-02-06 14:30:18,116 - wandb.docker.auth - DEBUG - /usr/local/lib/python3.11/dist-packages/wandb/docker/auth.py:50 - find_config_file - Trying paths: ['/root/.docker/config.json', '/root/.dockercfg']
2025-02-06 14:30:18,117 - wandb.docker.auth - DEBUG - /usr/local/lib/python3.11/dist-packages/wandb/docker/auth.py:57 - find_config_file - No config file found
2025-02-06 14:30:19,005 - chat-bot - INFO - /content/achatbot/src/common/factory.py:69 - get_engine_by_tag - use llm_transformers_manual_vision_deepse