# Gemma 2

Gemma 2, referred to as Gemma2 throughout this notebook, is a family of open-weight, text-only large language models from Google, built from the same research and technology as the Gemini models.

## 🛠️ Supported Hardware

This notebook can run on a CPU or on a GPU.

✅ AMD Instinct™ Accelerators  
✅ AMD Radeon™ RX/PRO Graphics Cards  
⚠️ AMD EPYC™ Processors  
⚠️ AMD Ryzen™ (AI) Processors  

Suggested hardware: **AMD Instinct™ Accelerators**. This notebook can also run on a CPU, but inference on a CPU will be slow.

## ⚡ Recommended Software Environment

::::{tab-set}

:::{tab-item} Linux
- [Install Docker container](https://amdresearch.github.io/aup-ai-tutorials//env/env-gpu.html)
- [Install PyTorch](https://amdresearch.github.io/aup-ai-tutorials//env/env-cpu.html)
:::

::::

## 🎯 Goals

- Download a model from Hugging Face
- Run Gemma2 on an AMD platform
- Prompt the model

:::{seealso}
- [Gemma2](https://huggingface.co/google/gemma-2-9b)
- [Gemma 2: Improving Open Language Models at a Practical Size](https://arxiv.org/abs/2408.00118)
- [Gemma: Open Models Based on Gemini Research and Technology](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)
:::

## 🚀 Run Gemma2 on an AMD Platform

Import the necessary packages.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

To use Gemma2, you need to [request access](https://huggingface.co/google/gemma-2-9b) to the model and log in to Hugging Face from this notebook.

You will need to [authenticate](https://huggingface.co/docs/huggingface_hub/en/quick-start#authentication) with your [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens). Either pass it as the `token` argument to the `login` function, or enter it in the text box that appears. Make sure you uncheck `Add token as git credential`.

In [None]:
from huggingface_hub import login
login(token=None)
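
As an alternative to typing the token into the notebook, it can be read from an environment variable so that it never appears in a saved cell. The sketch below assumes the token was exported as `HF_TOKEN` before the notebook was started; `read_hf_token` is a hypothetical helper written for this example, not part of `huggingface_hub`.

```python
import os


def read_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored in an environment variable, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    return token or None


token = read_hf_token()
# In the notebook, pass the result to login(token=token).
# When token is None, login() falls back to its interactive prompt.
print("token found" if token else "no token in the environment")
```

Keeping the token outside the notebook also makes it harder to commit it to version control by accident.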

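Check whether a GPU is available for acceleration. On ROCm builds of PyTorch, AMD GPUs are exposed through the `torch.cuda` API, so a detected AMD GPU is reported as `cuda`.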

```{note}
Running the model on a GPU is strongly recommended. If your device is `cpu`, token generation will be slow.
```

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'{device=}')
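
To see which accelerator PyTorch actually picked up, the device name and build type can be printed as well. This is a minimal sketch: `describe_accelerator` is a hypothetical helper, and the import guard is only there so the snippet also runs in an environment without PyTorch.

```python
import importlib.util


def describe_accelerator():
    """Return a short description of the accelerator visible to PyTorch."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if not torch.cuda.is_available():
        return "No GPU visible to PyTorch; falling back to CPU"
    # torch.version.hip is a version string on ROCm builds and None otherwise
    backend = "ROCm/HIP" if torch.version.hip else "CUDA"
    return f"{torch.cuda.get_device_name(0)} ({backend} build)"


print(describe_accelerator())
```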

Download the tokenizer and the model weights, loading the weights in `bfloat16` onto the selected device.

In [None]:
model_id = "google/gemma-2-9b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device,  # place the weights on the device selected above
    torch_dtype=torch.bfloat16  # 2 bytes per parameter instead of 4 for float32
)
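
Loading in `bfloat16` roughly halves the memory needed for the weights compared with `float32`. The arithmetic can be sketched as follows, assuming a nominal 9 billion parameters (the exact count differs slightly) and ignoring activations and the key-value cache; `weight_memory_gib` is a hypothetical helper for illustration.

```python
# Bytes used to store one parameter in each data type
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory (GiB) needed to hold the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


nominal_params = 9e9  # assumption: nominal size suggested by the "9b" model name
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gib(nominal_params, dtype):.1f} GiB")
```

The estimate gives a lower bound on the device memory required; actual usage during generation is higher.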

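Ask Gemma2 to write a poem about ML. Note how the prompt is tokenized before generation and the model response is de-tokenized afterwards. Because `google/gemma-2-9b` is the pretrained base checkpoint rather than an instruction-tuned one, the output may read as a continuation of the prompt.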

In [None]:
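prompt_text = "Write me a poem about Machine Learning."
# The tokenizer returns a dict-like object holding `input_ids` and `attention_mask`
inputs = tokenizer(prompt_text, return_tensors="pt").to(device)

# `max_length` caps the total sequence length (prompt tokens + generated tokens)
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))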
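
Because `max_length` counts the prompt tokens as well as the generated ones, the room left for new tokens shrinks as the prompt grows. A minimal sketch of that arithmetic follows; `new_token_budget` is a hypothetical helper, not part of `transformers`, and the prompt length of 9 is an assumed value rather than a measured one.

```python
def new_token_budget(prompt_len: int, max_length: int) -> int:
    """Tokens that can still be generated when the cap covers prompt + new tokens."""
    return max(max_length - prompt_len, 0)


# Assumed prompt length for illustration; in the notebook it would be the
# number of columns of the tokenized `input_ids` tensor.
assumed_prompt_len = 9
print(new_token_budget(assumed_prompt_len, max_length=512))
```

If you prefer a limit that does not depend on the prompt length, `generate` also accepts `max_new_tokens`.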

You can also inspect the raw token IDs generated by the model:

In [None]:
outputs[0][:32]
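
To make the relationship between raw token IDs and text concrete, here is a toy round trip. It is only a sketch: the vocabulary and the `toy_encode` and `toy_decode` functions are hypothetical, and the real Gemma2 tokenizer works on subword pieces with a far larger vocabulary.

```python
# Hypothetical toy vocabulary, for illustration only
TOY_VOCAB = ["<bos>", "<eos>", "Write", "me", "a", "poem", "about", "Machine", "Learning", "."]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(TOY_VOCAB)}
SPECIAL_TOKENS = {"<bos>", "<eos>"}


def toy_encode(text):
    """Split on whitespace (treating '.' as its own token) and map tokens to IDs."""
    words = text.replace(".", " .").split()
    return [TOKEN_TO_ID["<bos>"]] + [TOKEN_TO_ID[word] for word in words]


def toy_decode(ids, skip_special_tokens=True):
    """Map IDs back to tokens and join them into text."""
    tokens = [TOY_VOCAB[i] for i in ids]
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
    return " ".join(tokens).replace(" .", ".")


ids = toy_encode("Write me a poem about Machine Learning.")
print(ids)              # raw token IDs, analogous to outputs[0]
print(toy_decode(ids))  # text recovered from the IDs
```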

----------
Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved.

SPDX-License-Identifier: MIT