##### Copyright 2024 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# PaliGemma2 - Run with Transformers.js

Author: Sitam Meur

*   GitHub: [github.com/sitamgithub-MSIT](https://github.com/sitamgithub-MSIT/)
*   X: [@sitammeur](https://x.com/sitammeur)

Description: This notebook demonstrates how to run inference on the PaliGemma2 model using Node.js and [Transformers.js](https://huggingface.co/docs/transformers.js/index). Transformers.js lets you run Hugging Face transformer models directly in the browser or in Node.js, with a JavaScript API similar to the Python one. It supports NLP, computer vision, audio, and multimodal tasks using ONNX Runtime, and models trained in PyTorch, TensorFlow, or JAX can be converted to ONNX for use with it.

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/PaliGemma/[PaliGemma_2]Using_with_Transformersjs.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

## Setup

### Select the Colab runtime
To complete this tutorial, you'll need a Colab runtime with sufficient resources to run the PaliGemma2 model. In this case, a CPU runtime is enough:

1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.
2. Select **Change runtime type**.
3. Under **Hardware accelerator**, select **CPU**.

## Installation

Let's start by installing the dependencies.

In [None]:
# Install Node.js
!curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
!sudo apt-get install -y nodejs
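
Once Node.js is installed, it can help to confirm that the version is recent enough. The sketch below assumes that Transformers.js v3 needs Node.js 18 or later; check the package documentation for the actual requirement. The `nodeMajor` helper and the `MIN_MAJOR` constant are illustrative names, not part of any library.

```javascript
// Hypothetical helper: extract the major version from a string like "20.11.1".
function nodeMajor(version) {
  return Number.parseInt(version.split(".")[0], 10);
}

// Assumption: Node.js 18 is the minimum; adjust if the package docs say otherwise.
const MIN_MAJOR = 18;

const major = nodeMajor(process.versions.node);
if (major >= MIN_MAJOR) {
  console.log(`Node.js ${process.versions.node} looks recent enough.`);
} else {
  console.log(`Node.js ${process.versions.node} is older than ${MIN_MAJOR}; consider upgrading.`);
}
```

One way to try it is to save the snippet to a file (for example `check-node.js`, a name chosen here for illustration) and run it with `!node check-node.js`.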

## Create Node.js project

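Create a new Node.js project and install the Transformers.js package from [NPM](https://www.npmjs.com/package/@huggingface/transformers). The generated `package.json` is then overwritten to set `"type": "module"`, so that Node.js treats `index.js` as an ES module.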

In [None]:
# Create project directory
!mkdir paligemma2-node
%cd paligemma2-node

# Initialize NPM project
!npm init -y
!npm i @huggingface/transformers

In [None]:
%%writefile package.json

{
  "name": "paligemma2-node",
  "version": "1.0.0",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "description": "",
  "dependencies": {
    "@huggingface/transformers": "^3.1.2"
  }
}
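
The `"type": "module"` entry is the important part of this file: it tells Node.js to treat `.js` files as ES modules, which is what allows `index.js` to use `import` statements and top-level `await`. As a minimal sketch, the check below parses a `package.json` text and reports whether that entry is present. The `usesEsModules` helper and the two sample strings are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: report whether a package.json text opts in to ES modules.
function usesEsModules(packageJsonText) {
  const pkg = JSON.parse(packageJsonText);
  return pkg.type === "module";
}

// Trimmed-down stand-ins for a real package.json (illustrative data only).
const withModule = '{"name": "paligemma2-node", "type": "module"}';
const withoutModule = '{"name": "paligemma2-node"}';

console.log(usesEsModules(withModule)); // true
console.log(usesEsModules(withoutModule)); // false
```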

## Transformers.js Inference

Now, let's run inference on the PaliGemma2 model using Transformers.js. First, load the model and processor. Then prepare the inputs (a text prompt and an image), run generation, and decode the output to get an image caption. For reference, see the ONNX version of the model on the Hugging Face Hub [here](https://huggingface.co/onnx-community/paligemma2-3b-pt-224).
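
The text prompt is a short string that starts with the `<image>` placeholder, optionally followed by a task instruction. The sketch below builds such strings with a hypothetical `buildPrompt` helper. The task strings (`caption en`, `answer en ...`) are assumptions based on the prefix convention described for PaliGemma models, and a pretrained (`pt`) checkpoint may not respond to them as reliably as a fine-tuned one, so verify them against the model card before relying on them.

```javascript
// Hypothetical helper: build a PaliGemma-style prompt string.
// "<image>" is the placeholder used in this notebook; the task text after it is optional.
function buildPrompt(task = "") {
  const trimmed = task.trim();
  return trimmed.length > 0 ? `<image>${trimmed}` : "<image>";
}

// Assumed task strings, shown for illustration only.
console.log(buildPrompt()); // <image>
console.log(buildPrompt("caption en")); // <image>caption en
console.log(buildPrompt("answer en what animal is this?"));
```

A string produced this way could be passed to the processor in place of a hard-coded prompt.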

In [None]:
# Show the image from the URL
from PIL import Image
import requests

url = "https://jethac.github.io/assets/juice.jpg"
img = Image.open(requests.get(url, stream=True).raw) 
img

It's an image of a cat sitting on a bag. Now let's see what the model predicts.

In [None]:
%%writefile index.js

// Import the required modules
import {
  AutoProcessor,
  PaliGemmaForConditionalGeneration,
  load_image,
} from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/paligemma2-3b-pt-224"; // Change this to use a different PaliGemma model
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await PaliGemmaForConditionalGeneration.from_pretrained(
  model_id,
  {
    dtype: {
      embed_tokens: "q8", // or 'fp16'
      vision_encoder: "q8", // or 'q4', 'fp16'
      decoder_model_merged: "q4", // or 'q4f16'
    },
  }
);
console.log("Model and processor loaded successfully!");

// Prepare inputs
const url = "https://jethac.github.io/assets/juice.jpg";
const raw_image = await load_image(url);
const prompt = "<image>"; // Caption, by default
const inputs = await processor(raw_image, prompt);
console.log("Inputs prepared successfully!");

try {
  // Generate a response
  const response = await model.generate({
    ...inputs,
    max_new_tokens: 100, // Maximum number of tokens to generate
  });

  // Extract generated IDs from the response
  const generatedIds = response.slice(null, [inputs.input_ids.dims[1], null]);

  // Decode the generated IDs to get the answer
  const decodedAnswer = processor.batch_decode(generatedIds, {
    skip_special_tokens: true,
  });

  // Log the generated caption
  console.log("Generated caption:", decodedAnswer[0]);
} catch (error) {
  console.error("Error generating response:", error);
}
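
The `slice` call in `index.js` is there because the generated sequence starts with the prompt tokens, so everything before position `inputs.input_ids.dims[1]` is dropped before decoding. The sketch below mirrors that step with plain nested arrays instead of tensors. The `dropPromptTokens` helper and the token IDs are made up for illustration and are not part of Transformers.js.

```javascript
// Hypothetical helper: keep only the tokens generated after the prompt,
// mirroring the tensor slice in index.js with plain nested arrays.
function dropPromptTokens(sequences, promptLength) {
  return sequences.map((ids) => ids.slice(promptLength));
}

// Made-up token IDs: a batch of one sequence with a 3-token prompt.
const promptLength = 3;
const output = [[101, 102, 103, 7, 8, 9]];

console.log(dropPromptTokens(output, promptLength)); // [ [ 7, 8, 9 ] ]
```

Without this step, the decoded text would also contain whatever the prompt tokens decode to.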

## Run Application

In [None]:
# Run the node.js application
!node index.js

## Conclusion

Congratulations! You have successfully run inference on the PaliGemma2 model using Transformers.js in a Node.js environment. You can now integrate this into your own projects.