<img src="https://storage.googleapis.com/dm-educational/assets/ai_foundations/GDM-Labs-banner-image-C6-white-bg.png">

# Lab: Hitting a Wall

<a href='https://colab.research.google.com/github/google-deepmind/ai-foundations/blob/master/course_7/gdm_lab_7_3_hitting_a_wall.ipynb' target='_parent'><img src='https://colab.research.google.com/assets/colab-badge.svg' alt='Open In Colab'/></a>

Investigate the hardware limitations of training large models and experience out-of-memory errors first-hand.

15 minutes

## Overview

Earlier in this course, you saw that larger models like Gemma-4B can produce higher-quality generations than smaller models like Gemma-1B. What would it take to fine-tune a large model like Gemma-4B yourself?

In the previous course, you successfully fine-tuned Gemma-1B using LoRA. In this lab, you will attempt to apply the same fine-tuning process to the larger Gemma-4B model. This exercise is designed to give you first-hand experience of the "memory wall": a critical hardware limitation that, as you will observe, motivates the efficiency techniques you are learning in this course.

### What you will learn

By the end of this lab, you will be able to:

* Describe why, even with LoRA, you may not be able to fine-tune larger models on limited hardware.

* Explain the motivation for using memory-saving techniques.

## Tasks


**In this lab, you will**:

* Attempt to fine-tune the base Gemma-4B model.

* Encounter and reflect on the resulting out-of-memory (OOM) error.

 **This lab needs to be run on a GPU. Choose a T4 GPU.** See the section "How to use Google Colaboratory (Colab)" below for instructions on how to do this.

 **You also need a Kaggle account** to download the weights of the Gemma 3 model. See the section "Set up a Kaggle account" for instructions on how to do this.


## How to use Google Colaboratory (Colab)

Google Colaboratory (also known as Google Colab) is a platform that allows you to run Python code in your browser. The code is written in **cells** that are executed on a remote server.

To run a cell, hover over a cell and click on the `run` button to its left. The run button is the circle with the triangle (▶). Alternatively, you can also click on a cell and use the keyboard combination Ctrl+Return (or ⌘+Return if you are using a Mac).

To try this out, run the following cell. This should print today's day of the week below it.

In [None]:
from datetime import datetime
print(f"Today is {datetime.today():%A}.")

Note that the **order in which you run the cells matters**. When you are working through a lab, make sure to always run all cells in order, otherwise the code might not work. If you take a break while working on a lab, Colab may disconnect you, and in that case you have to execute all cells again before continuing your work. To make this easier, you can select the cell you are currently working on and then choose _Runtime → Run before_ from the menu above (or use the keyboard combination Ctrl/⌘ + F8). This will re-execute all cells before the current one.

### Using Colab with a GPU

Follow these steps to run the activities in this lab on a GPU:

1.  In the top menu bar, click on **Runtime**.
2.  Select **Change runtime type** from the dropdown menu.
3.  In the pop-up window under **Hardware accelerator**, select **T4 GPU**.
4.  Click **Save**.

Your Colab session will now restart with GPU access.
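
To double-check that the runtime switch worked, you can list the NVIDIA GPUs the runtime can see. The cell below is a minimal sketch, not part of the course package: the helper name `gpu_summary` is hypothetical, and it assumes the `nvidia-smi` tool is on the PATH, as it typically is on Colab GPU runtimes. On a CPU-only runtime it returns `None`.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary() -> Optional[str]:
    """Returns the output of `nvidia-smi -L`, or None if no NVIDIA GPU is visible.

    Hypothetical helper for illustration only.
    """
    if shutil.which("nvidia-smi") is None:
        return None  # No NVIDIA driver tools: most likely a CPU-only runtime.
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None  # The tool exists but could not query any GPU.
    return result.stdout.strip() or None


print(gpu_summary() or "No GPU detected. Check Runtime > Change runtime type.")
```

On a T4 runtime, the printed line typically names a Tesla T4 GPU.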

Note that access to GPUs is limited and at times, you may not be able to run this lab on a GPU. All activities will still work but they will run slower and you will have to wait longer for some of the cells to finish running.

## Set up a Kaggle account



To run this notebook, you will have to sign up for [Kaggle](https://www.kaggle.com), a platform that hosts datasets and models for machine learning. You will also need to sign the agreement for using the Gemma 3 model. This is required so that you can download the weights of the Gemma model for fine-tuning.

### Step 1: Create your Kaggle account

* Go to the Kaggle website: https://www.kaggle.com

* Click the "Register" button in the top-right corner.

* You can sign up using your Google account (recommended for easy Colab integration) or by entering an email and password.

* Follow the on-screen prompts to complete your registration and verify your email.

### Step 2: Sign the Gemma 3 model agreement

* Make sure you are logged into your new Kaggle account.

* Go directly to the Gemma 3 model card page: https://www.kaggle.com/models/keras/gemma3/keras/

* You should see a "Request Access" button.

* Click the button, read through the license agreement, and if you are happy with the terms, click "Accept" to gain access to the model. You must do this before the API will let you download the model.

### Step 3: Generate your Kaggle API key

* From any Kaggle page, click on your profile picture or icon in the top-right corner.

* Select "Account" from the drop-down menu.

* Scroll down to the "API" section.

* Click the "Create New API Token" button.

* This will immediately download a file named `kaggle.json` to your computer. This file contains your username and your secret API key. Keep it safe.

### Step 4: Set your API key in Colab

* Click the "key" icon 🔑 in the left-hand sidebar.

* You will see the "Secrets" panel.

* Now, open the `kaggle.json` file you downloaded on your computer. It is a simple text file and will look like this:

   ```json
   {"username":"YOUR_KAGGLE_USERNAME","key":"YOUR_KAGGLE_API_KEY"}
   ```
* In the Colab Secrets panel, create two new secrets:

   1. Name: `KAGGLE_USERNAME`

      Value: Copy and paste `YOUR_KAGGLE_USERNAME` from your `kaggle.json` file.

   2. Name: `KAGGLE_KEY`

      Value: Copy and paste `YOUR_KAGGLE_API_KEY` from your `kaggle.json` file.

* For both secrets, make sure the "Notebook access" toggle is switched on.
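
The mapping from `kaggle.json` to the two secret names is purely mechanical. As an illustration only, the sketch below parses the file's contents and returns the name/value pairs you enter in the Secrets panel. The helper name `kaggle_secrets_from_json` is hypothetical and not part of the course package, and the example uses placeholder values: never paste your real API key into a notebook cell.

```python
import json


def kaggle_secrets_from_json(kaggle_json_text: str) -> dict[str, str]:
    """Maps the contents of kaggle.json to the Colab secret names used in this lab.

    Hypothetical helper for illustration only.
    """
    credentials = json.loads(kaggle_json_text)
    return {
        "KAGGLE_USERNAME": credentials["username"],
        "KAGGLE_KEY": credentials["key"],
    }


# Placeholder values only; do not put your real API key in a notebook cell.
example = '{"username":"YOUR_KAGGLE_USERNAME","key":"YOUR_KAGGLE_API_KEY"}'
for name, value in kaggle_secrets_from_json(example).items():
    print(f"{name} = {value}")
```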


## Imports

In this lab, you will use the Keras package (together with KerasNLP) to load a Gemma model and the Pandas package to load the dataset.

Run the following cell to import the required packages.



In [None]:
%%capture
# Install the custom package for this course.
!pip install "git+https://github.com/google-deepmind/ai-foundations.git@main"

import os  # For setting environment variables.

from google.colab import userdata # For using Colab secrets.

os.environ["KAGGLE_USERNAME"] = userdata.get("KAGGLE_USERNAME")
os.environ["KAGGLE_KEY"] = userdata.get("KAGGLE_KEY")

os.environ["KERAS_BACKEND"] = "jax"  # Set the Keras backend to JAX.
# Disable the XLA GPU command buffer pre-allocation to free up memory.
os.environ["XLA_FLAGS"] = "--xla_gpu_enable_command_buffer="
# Let JAX preallocate up to 90% of the GPU memory.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.9"

import keras # For defining and training models.
import keras_nlp # For loading the Keras implementation of Gemma.
import pandas as pd # For loading the dataset.
from textwrap import fill # For formatting long paragraphs.
from ai_foundations import formatting # For formatting the training data.

keras.utils.set_random_seed(812)  # For Keras layers.

## Fine-tune Gemma-4B

In the previous course, you learned to fine-tune Gemma-1B using LoRA to build a revision study flashcard generator. In this lab, you will apply the same fine-tuning techniques to Gemma-4B.

Run the following cell to:

* Define the `format_question` function to format model inputs.
* Prepare the Africa Galore dataset for fine-tuning.

In [None]:
def format_question(
    question: str,
    sot: str = "<start_of_turn>",
    eot: str = "<end_of_turn>"
) -> str:
    """
    Formats a question for prompting the model and adds special delimiters at
    the start and end of the question.

    Args:
      question: The question to be formatted.
      sot: The token to mark the start of a turn.
      eot: The token to mark the end of a turn.
      eot: The token to mark the end of a turn.

    Returns:
      Formatted string of the question.
    """

    formatted_q = f"{sot}user\n{question}{eot}\n"

    return formatted_q

# Load the question-answer dataset.
africa_galore_qa = pd.read_json(
    "https://storage.googleapis.com/dm-educational/assets/ai_foundations/africa_galore_qa_v2.json"
)

questions = []  # List of formatted questions.
answers = []  # List of formatted answers.

for _, row in africa_galore_qa.iterrows():
    # Run the format_qa function from the previous lab to format the question
    # and the answer.
    question, answer = formatting.format_qa(row)
    questions.append(question)
    answers.append(answer)

# Show the first set of input and output.
print(questions[0])
print(fill(answers[0], replace_whitespace=False))

# Prepare the data dictionary for fine-tuning Gemma.
data = {
    "prompts": questions,
    "responses": answers
}
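
To see exactly what `format_question` produces, here is a self-contained copy of it applied to a sample question. The question text is made up for illustration. Note that this only shows the user turn; the full question/answer formatting done by `formatting.format_qa` from the course package is not reproduced here.

```python
def format_question(
    question: str,
    sot: str = "<start_of_turn>",
    eot: str = "<end_of_turn>",
) -> str:
    """Wraps a question in Gemma's user-turn delimiters (copy of the lab's function)."""
    return f"{sot}user\n{question}{eot}\n"


formatted = format_question("What is jollof rice?")
print(repr(formatted))
# '<start_of_turn>user\nWhat is jollof rice?<end_of_turn>\n'
```

Printing with `repr` makes the newline characters visible: one separates the role name `user` from the question, and one ends the turn.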

## Activity 1: Load Gemma-4B

------
> 💻 **Your task**:
>
> Run the next cell to load the base Gemma-4B model in full 32-bit precision, where all parameters are represented as 32-bit floating-point numbers.
>
> Does the cell run successfully? Reflect on the output and why this might be happening.
>
------

In [None]:
# Load the Gemma3-4B Keras model.
model = keras_nlp.models.Gemma3CausalLM.from_preset(preset="gemma3_4b_text")
model.summary()

### What did you observe?

You should have encountered an error saying `RESOURCE_EXHAUSTED` as well as a message indicating how much memory the process attempted to allocate.

------
> **💭 Reflection:**
>
>Given this output, briefly reflect on these questions:
>
>1. What do you think the error means at a high level?
>
>2. Based on what you have read so far about GPU memory consumers, write down a hypothesis for why this happens.
>
------

### Out-of-memory errors

The `RESOURCE_EXHAUSTED` error is the technical name for an out-of-memory (OOM) error. It means that the GPU ran out of memory while trying to load the model.
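
A back-of-the-envelope calculation suggests why. The cell below is only an estimate under stated assumptions: it treats the model as having roughly 4 billion parameters (the exact count of the `gemma3_4b_text` preset differs somewhat), uses 4 bytes per parameter for 32-bit precision, and assumes about 15 GiB of usable memory on a Colab T4, of which the `XLA_PYTHON_CLIENT_MEM_FRACTION = "0.9"` setting from the imports cell lets JAX use 90%.

```python
GIB = 2**30  # Bytes per gibibyte.


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / GIB


# Assumptions (approximate, for illustration only):
NUM_PARAMS = 4e9      # "Roughly 4 billion" parameters for Gemma-4B.
BYTES_FP32 = 4        # A 32-bit float occupies 4 bytes.
T4_MEMORY_GIB = 15.0  # Usable memory commonly reported for a Colab T4.
MEM_FRACTION = 0.9    # XLA_PYTHON_CLIENT_MEM_FRACTION from the imports cell.

weights_gib = weight_memory_gib(NUM_PARAMS, BYTES_FP32)
budget_gib = T4_MEMORY_GIB * MEM_FRACTION

print(f"fp32 weights: {weights_gib:.1f} GiB, JAX budget: {budget_gib:.1f} GiB")
print("Fits?", weights_gib < budget_gib)
# fp32 weights: 14.9 GiB, JAX budget: 13.5 GiB
# Fits? False
```

This estimate covers the weights only. Once training starts, activations, gradients, and optimizer state need memory on top of that.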

This error highlights that for a model of this size, you cannot even load the model on a T4 GPU with standard settings. However, this issue is not insurmountable. It is the exact problem that modern efficiency techniques are designed to solve. In the next article, you will learn what takes up so much memory on the GPU. Later on, you will also observe how techniques like mixed precision can help you overcome this memory wall.
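
It may be tempting to think that LoRA sidesteps this problem, since it only trains small low-rank adapter matrices. The sketch below illustrates why it does not help at loading time. For one weight matrix of shape d_out × d_in, LoRA adds two matrices of shapes d_out × r and r × d_in, that is r × (d_in + d_out) trainable parameters, while the original d_out × d_in weights stay frozen in memory. The helper name `lora_param_counts`, the 2048 × 2048 shape, and the rank r = 8 are illustrative assumptions, not Gemma-4B's actual layer sizes or the settings used in the previous course.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Returns (frozen, trainable) parameter counts for one LoRA-adapted matrix.

    The frozen base matrix W has d_out * d_in entries. LoRA adds
    B (d_out x rank) and A (rank x d_in), i.e. rank * (d_in + d_out) entries.
    """
    frozen = d_out * d_in
    trainable = rank * (d_in + d_out)
    return frozen, trainable


# Illustrative shapes only (not Gemma-4B's actual layer sizes).
frozen, trainable = lora_param_counts(d_in=2048, d_out=2048, rank=8)
print(f"frozen: {frozen:,}  trainable: {trainable:,}  "
      f"ratio: {trainable / frozen:.2%}")
```

The trainable part is tiny, but the frozen base weights are untouched, so the memory needed to load the model in 32-bit precision is unchanged. LoRA reduces the memory for gradients and optimizer state, not for the weights themselves.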

In case you are wondering why it was possible to load the Gemma-4B model in the first lab of this course, note that that lab already used some memory efficiency techniques in the background.

## Summary

This lab provided a practical demonstration of the hardware limitations involved in training large models. You experienced what it is like to hit the "memory wall" again. You further observed that when attempting to fine-tune a large model, it can fail with an out-of-memory error even at the point of loading the parameter weights. In the next activity, you will learn more about the main components that consume GPU memory.