##### Copyright 2024 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Fine-Tuning Gemma for Retrieval-Augmented Generation with JORA

Scaling Large Language Models (LLMs) for retrieval-based tasks, particularly Retrieval-Augmented Generation (RAG), poses significant memory challenges, especially when fine-tuning on long prompt sequences.

[Gemma](https://ai.google.dev/gemma) is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources, such as a laptop, a desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.

Existing open-source libraries support full-model inference and fine-tuning across multiple GPUs but often fall short in efficiently distributing parameters required for retrieved context. To address this limitation, [JORA](https://github.com/aniquetahir/JORA) introduced a novel framework for Parameter-Efficient Fine-Tuning (PEFT) of Llama/Gemma models using distributed training, leveraging [JAX](https://jax.readthedocs.io/en/latest/). This framework uniquely utilizes JAX's just-in-time (JIT) compilation and tensor-sharding for efficient resource management, enabling accelerated fine-tuning with reduced memory requirements. This advancement significantly improves the scalability and feasibility of fine-tuning LLMs for complex RAG applications, even on systems with limited GPU resources.

JORA's reported experiments demonstrate more than **12x improvement in runtime** compared to [Hugging Face](https://huggingface.co/docs/transformers/en/main_classes/trainer)/[DeepSpeed](https://github.com/microsoft/DeepSpeed) implementations with four GPUs while consuming less than half the VRAM per GPU.

In this tutorial, you will learn the end-to-end process of fine-tuning a [Gemma](https://github.com/google/gemma) model using JORA and converting the trained model back to the [Hugging Face](https://huggingface.co/) format for inference.

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_2]Finetune_with_JORA.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>
<br><br>

[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/notebooks/welcome?src=https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_2]Finetune_with_JORA.ipynb)

## Setup


### Selecting the Runtime Environment

To start, you can choose either **Google Colab** or **Kaggle** as your platform. Select one, and proceed from there.

- #### **Google Colab** <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Google_Colaboratory_SVG_Logo.svg/1200px-Google_Colaboratory_SVG_Logo.svg.png" alt="Google Colab" width="30"/>

  1. Click **Open in Colab**.
  2. You'll need access to a [**Colab Pro/Pro+**](https://colab.research.google.com/signup) runtime with sufficient resources to run the Gemma model.
  3. In the menu, go to **Runtime** > **Change runtime type**.
  4. Ensure that the **GPU** is set to **A100**.

- #### **Kaggle** <img src="https://upload.wikimedia.org/wikipedia/commons/7/7c/Kaggle_logo.png" alt="Kaggle" width="40"/>

  1. Click **Open in Kaggle**.
  2. Click on **Session options** in the right sidebar.
  3. Under **Accelerator**, select **GPU T4 x2**.
     - Note: This instance comes with **15 GB x2** (15 GB for each T4 GPU) of VRAM and **30 GB** of RAM.
  4. Save the settings, and the notebook will restart with GPU support.

### Gemma setup

#### **Kaggle Models**

To complete this tutorial, you need to download the Gemma Flax models hosted on Kaggle for fine-tuning, so first complete the setup instructions at [Gemma setup](https://ai.google.dev/gemma/docs/setup). The Gemma setup instructions show you how to do the following:

* Get access to Gemma on kaggle.com.
* Select a Colab/Kaggle runtime with sufficient resources to run
  the Gemma model.
* Generate a Kaggle username and an API key, which you'll configure as Colab/Kaggle secrets later in this guide.

#### **Hugging Face Hub**

You'll also be logging in to Hugging Face Hub to download the exact Gemma model used while fine-tuning so that you can convert the Flax model to the Hugging Face format and run inference later. Let's get you set up with Gemma:

1. **Hugging Face Account:**  If you don't already have one, you can create a free Hugging Face account by clicking [here](https://huggingface.co/join).
2. **Gemma Model Access:** Head over to the [Gemma model page](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b) and accept the usage conditions.
3. **Colab/Kaggle with Gemma Power:**  For this tutorial, you'll need a Colab/Kaggle runtime with enough resources to handle the Gemma model. Choose an appropriate runtime when starting your Colab/Kaggle session.
4. **Hugging Face Token:**  Generate a Hugging Face access token (preferably with `write` permission) by clicking [here](https://huggingface.co/settings/tokens). This token will come in handy later.

After you've completed the Gemma setup, move on to the next section, where you'll set environment variables for your Colab or Kaggle environment.

### Configure Your Credentials

To access private models and datasets, you need to log in to the Hugging Face (HF) and Kaggle ecosystem.

- #### **Google Colab** <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Google_Colaboratory_SVG_Logo.svg/1200px-Google_Colaboratory_SVG_Logo.svg.png" alt="Google Colab" width="30"/>
  If you're using Colab, you can securely store your Hugging Face token (`HF_TOKEN`) using the Colab Secrets manager:
  1. Open your Google Colab notebook and click on the 🔑 Secrets tab in the left panel. <img src="https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg" alt="The Secrets tab is found on the left panel." width=50%>
  2. **Add Hugging Face Token**:
    - Create a new secret with the **name** `HF_TOKEN`.
    - Copy/paste your token key into the **Value** input box of `HF_TOKEN`.
    - **Toggle** the button on the left to allow notebook access to the secret.
  3. **Add Kaggle Token**:
    - Repeat the same steps for `KAGGLE_USERNAME` and `KAGGLE_KEY`.


- #### **Kaggle** <img src="https://upload.wikimedia.org/wikipedia/commons/7/7c/Kaggle_logo.png" alt="Kaggle" width="40"/>
  To securely use your Hugging Face token (`HF_TOKEN`) in this notebook, you'll need to add it as a secret in your Kaggle environment:  
  1. Open your Kaggle notebook and locate the **Add-ons** menu at the top of the notebook interface.
  2. Click on **Secrets** to manage your environment secrets.  
  <img src="https://i.imgur.com/vxrtJuM.png" alt="The Secrets option is found at the top." width=50%>
  3. **Add Hugging Face Token**:
      - Click on the **Add secret** button.
      - In the **Label** field, enter `HF_TOKEN`.  
      - In the **Value** field, paste your Hugging Face token.
      - Click **Save** to add the secret.
  4. **Add Kaggle Token**:
      - Repeat the same steps for `KAGGLE_USERNAME` and `KAGGLE_KEY`.

In [None]:
import os
import sys

if 'google.colab' in sys.modules:
    from google.colab import userdata
    # Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
    # vars as appropriate for your system.
    os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")
    os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
    os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
elif os.path.exists('/kaggle/working'):
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    os.environ['HF_TOKEN'] = user_secrets.get_secret("HF_TOKEN")
    os.environ["KAGGLE_USERNAME"] = user_secrets.get_secret('KAGGLE_USERNAME')
    os.environ["KAGGLE_KEY"] = user_secrets.get_secret('KAGGLE_KEY')
else:
    raise RuntimeError(
        "Unsupported runtime environment detected.\n"
        "This notebook currently supports execution on Google Colab or Kaggle.\n"
        "Please ensure you are running in one of these environments.\n"
        "If you are running locally or on a different platform, manually set the following environment variables:\n"
        " - HF_TOKEN\n"
        " - KAGGLE_USERNAME\n"
        " - KAGGLE_KEY\n\n"
        "You can set environment variables in your terminal or within your Python notebook before running any cells."
    )

# Disable progress bar to prevent verbose logging by kagglehub
os.environ["TQDM_DISABLE"] = "1"
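Before moving on, it can help to confirm that all three credentials actually made it into the environment. The following is a minimal sketch; the helper name `missing_credentials` is hypothetical and not part of Colab, Kaggle, or JORA.

```python
import os

# The three variables this notebook expects to find in the environment.
REQUIRED_VARS = ("HF_TOKEN", "KAGGLE_USERNAME", "KAGGLE_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All credentials are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching your real secrets.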

### Clone **JORA** and install dependencies

In [None]:
# Clone the JORA repository and install the requirements
!git clone https://github.com/aniquetahir/JORA.git
%cd JORA
!pip install -q -e .

# Install google-deepmind/gemma as it's a required dependency for JORA
!pip install -q git+https://github.com/google-deepmind/gemma.git

# Install the appropriate JAX version
JAX_VERSION = "0.4.33"
!pip install -U --pre -f https://storage.googleapis.com/jax-releases/jax_nightly_releases.html \
  jax==$JAX_VERSION jaxlib==$JAX_VERSION \
  jax-cuda12-plugin[with_cuda]==$JAX_VERSION jax-cuda12-pjrt==$JAX_VERSION

Cloning into 'JORA'...
remote: Enumerating objects: 299, done.
remote: Counting objects: 100% (299/299), done.
remote: Compressing objects: 100% (216/216), done.
remote: Total 299 (delta 151), reused 203 (delta 71), pack-reused 0 (from 0)
Receiving objects: 100% (299/299), 6.99 MiB | 17.66 MiB/s, done.
Resolving deltas: 100% (151/151), done.
/content/JORA
  Preparing metadata (setup.py) ... done

### Import the dependencies

In [None]:
# Patch JORA's initialisation.py file to be compatible with the latest JAX version

!sed -i "s/jax\.config\.update('jax_default_matmul_precision', *jax\.lax\.Precision\.HIGHEST)/jax.config.update('jax_default_matmul_precision', 'bfloat16')/" jora/lib/proc_init_utils/initialisation.py

In [None]:
import kagglehub
import jax
import jora
import pathlib
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import snapshot_download

## Download the Gemma Model

Now, you can download the Gemma model using `kagglehub`:

In [None]:
VARIANT = "gemma2-2b-it"
GEMMA_PATH = kagglehub.model_download(f'google/gemma-2/Flax/{VARIANT}')
print('GEMMA_PATH:', GEMMA_PATH)

Downloading 11 files:   0%|          | 0/11 [00:00<?, ?it/s]

Downloading from https://www.kaggle.com/api/v1/models/google/gemma-2/Flax/gemma2-2b-it/1/download/gemma2-2b-it/_METADATA...
Downloading from https://www.kaggle.com/api/v1/models/google/gemma-2/Flax/gemma2-2b-it/1/download/gemma2-2b-it/_CHECKPOINT_METADATA...
Downloading from https://www.kaggle.com/api/v1/models/google/gemma-2/Flax/gemma2-2b-it/1/download/gemma2-2b-it/ocdbt.process_0/d/bf69258061ae5f35eb7a5669fe6877d4...
Downloading from https://www.kaggle.com/api/v1/models/google/gemma-2/Flax/gemma2-2b-it/1/download/gemma2-2b-it/d/b5a4695f4be0a2f41ec1e25616ebd7e7...
Downloading from https://www.kaggle.com/api/v1/models/google/gemma-2/Flax/gemma2-2b-it/1/download/gemma2-2b-it/ocdbt.process_0/d/834bb4bf1e3854eb09f6208c95c071b2...
Downloading from https://www.kaggle.com/api/v1/models/google/gemma-2/Flax/gemma2-2b-it/1/download/gemma2-2b-it/descriptor/descriptor.pbtxt...
Downloading from https://www.kaggle.com/api/v1/models/google/gemma-2/Flax/gemma2-2b-it/1/download/gemma2-2b-it/ocdbt.pro

In [None]:
# Note: JORA currently supports only Gemma and Gemma 1.1 models out of the box.
# Register `gemma2-2b-it` so that JORA can discover the Gemma 2 model.

# `set.add` and `dict.update` mutate in place and return None, so their
# results must not be assigned back to a variable.
jora.lib.gemma.gemma_config.GEMMA_VERSIONS.add('gemma2-2b-it')
print(jora.lib.gemma.gemma_config.GEMMA_VERSIONS)

# Reuse the 2B config for the Gemma 2 2B instruction-tuned variant
jora.lib.gemma.common.model_config_mapping.update({
    'gemma2-2b-it': jora.lib.gemma.gemma_config.GemmaConfig2B
})
print(jora.lib.gemma.common.model_config_mapping)

{'7b-it', '2b-it', 'gemma2-2b-it', '7b', '2b'}
{'2b': GemmaConfig(n_heads=8, n_kv=1), '2b-it': GemmaConfig(n_heads=8, n_kv=1), '7b': GemmaConfig(n_heads=16, n_kv=16), '7b-it': GemmaConfig(n_heads=16, n_kv=16), '1.1-2b-it': GemmaConfig(n_heads=8, n_kv=1), '1.1-7b-it': GemmaConfig(n_heads=16, n_kv=16), 'gemma2-2b-it': GemmaConfig(n_heads=8, n_kv=1)}


**Note:** By default, `kagglehub` stores the model in the `~/.cache/kagglehub` directory.

Verify that JAX recognizes the GPU devices:

In [None]:
print(jax.devices())

[CudaDevice(id=0)]


## Configure JORA and Prepare the Dataset

Here, you'll configure the Gemma model and also the training process for **LoRA** fine-tuning.

To fine-tune Gemma, you will use the **Alpaca** dataset. Make sure the dataset file `alpaca_data_cleaned.json` is in the appropriate directory. You can download it from [here](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data_cleaned.json) or use the one that's bundled in the repository. For demonstration purposes, let's use the bundled one.

**Credits:** [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json)

The `generate_alpaca_dataset_gemma` function generates the dataset from an Alpaca-format JSON file. This helps with instruction-format training, since dataset processing, tokenization, and batching are handled by the library. Alternatively, torch `Dataset` and `DataLoader` can be used for custom datasets.
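To make the Alpaca format concrete, here is a small sketch of what such a JSON file contains and how one record can be turned into a single training string. The records use the standard Alpaca fields (`instruction`, `input`, `output`); the shortened `PROMPT` template and the `to_training_text` helper are illustrative assumptions and do not reproduce JORA's internal preprocessing.

```python
import json

# Two made-up records in the Alpaca JSON layout.
raw = """[
  {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
  {"instruction": "Name a primary color.", "input": "", "output": "Red"}
]"""

# Shortened, hypothetical prompt template for illustration only.
PROMPT = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def to_training_text(record):
    """Join the formatted prompt and the target answer into one string."""
    prompt = PROMPT.format(instruction=record["instruction"], input=record["input"])
    return prompt + record["output"]


records = json.loads(raw)
for record in records:
    print(to_training_text(record))
    print("-" * 20)
```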


In [None]:
# Configure the model and training parameters
config = jora.ParagemmaConfig(
    # Feel free to tweak these parameters
    N_EPOCHS=1,
    LORA_R=8,
    # Note: The `LORA_DROPOUT` parameter is currently not configurable.
    # https://github.com/aniquetahir/JORA?tab=readme-ov-file#contributing
    LORA_ALPHA=16,
    LR=1e-5,
    BATCH_SIZE=2,
    N_ACCUMULATION_STEPS=8,
    GEMMA_MODEL_PATH=GEMMA_PATH,
    MAX_SEQ_LEN=512,
    MODEL_VERSION=VARIANT
)

# Path to the Alpaca dataset
dataset_path = 'jora/alpaca_data_cleaned.json'

# Generate the dataset from a 20% split of the data for prototyping.
# When running on Kaggle, set `split_percentage` to 0.005 to use a smaller
# subset for a quicker demonstration.
dataset = jora.generate_alpaca_dataset_gemma(
    dataset_path, 'train', config,
    split_percentage=0.2,
    alpaca_mix=0.3
)

Processing data...


The `ParagemmaConfig` class is used to set up the configuration for training while `generate_alpaca_dataset_gemma` processes the dataset, handles tokenization, and prepares it for training.

In [None]:
config

ParagemmaConfig(GEMMA_MODEL_PATH='/root/.cache/kagglehub/models/google/gemma-2/Flax/gemma2-2b-it/1', MODEL_VERSION='gemma2-2b-it', NUM_SHARDS=None, LORA_R=8, LORA_ALPHA=16, LORA_DROPOUT=0.05, LR=1e-05, BATCH_SIZE=2, N_ACCUMULATION_STEPS=8, MAX_SEQ_LEN=512, N_EPOCHS=1, SEED=420, CACHE_SIZE=30)
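The values of `BATCH_SIZE` and `N_ACCUMULATION_STEPS` together determine how many examples contribute to each weight update. The sketch below works through that arithmetic, assuming standard gradient accumulation (one optimizer step per `N_ACCUMULATION_STEPS` micro-batches); the example count of 10,000 is a hypothetical figure, not the size of the Alpaca split used here.

```python
import math


def optimizer_steps(n_examples, batch_size, n_accumulation_steps, n_epochs=1):
    """Return (effective batch size, total optimizer steps)."""
    effective_batch = batch_size * n_accumulation_steps
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    return effective_batch, steps_per_epoch * n_epochs


# BATCH_SIZE=2 and N_ACCUMULATION_STEPS=8 as in the config above;
# n_examples is a made-up number for illustration.
effective_batch, steps = optimizer_steps(
    n_examples=10_000, batch_size=2, n_accumulation_steps=8
)
print(effective_batch, steps)  # prints: 16 625
```

Raising `N_ACCUMULATION_STEPS` increases the effective batch size without increasing the per-step memory needed for activations, which is why it is a common knob on memory-constrained GPUs.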

## Fine-tune Gemma with **JORA**

Now, you can proceed to fine-tuning the model using the `train_lora_gemma` function which initiates the fine-tuning process using LoRA (Low-Rank Adaptation). The checkpoints will be saved in the folder specified by `checkpoint_path`.

In [None]:
# Directory where the trained LoRA checkpoints will be saved
checkpoint_path = 'checkpoints'
jora.train_lora_gemma(config, dataset, checkpoint_path)

Successfully loaded and sharded model parameters!


Output()

**Note**: Fine-tuning on the entire dataset can be time-consuming and may exceed available GPU quotas on **Kaggle** or consume significant compute units on **Google Colab**. Using a smaller split helps in managing resource usage and staying within platform-imposed limits.
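Part of what keeps LoRA fine-tuning affordable is that only two small low-rank matrices are trained per adapted weight matrix. The following sketch compares the parameter counts for a single layer; the 2048 x 2048 layer shape is a hypothetical example and is not taken from the Gemma configuration.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full matrix parameters, LoRA adapter parameters)."""
    full = d_in * d_out            # dense weight W of shape (d_out, d_in)
    lora = rank * (d_in + d_out)   # A of shape (rank, d_in) plus B of shape (d_out, rank)
    return full, lora


# rank=8 matches LORA_R in the config; the layer shape is illustrative only.
full, lora = lora_param_counts(d_in=2048, d_out=2048, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```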

## Convert the model to the **Hugging Face Format**

After fine-tuning, you need to convert the trained model to the Hugging Face format for compatibility with the Hugging Face ecosystem so that you can easily run inference later.

**Usage:**

```python
lorize_huggingface(HUGGINGFACE_PATH, JAX_PATH, SAVE_PATH, gemma=True)
```

- **HUGGINGFACE_PATH**: Path to the Hugging Face Gemma model (the base model before fine-tuning).
- **JAX_PATH**: Path to the LoRA merged parameters (the trained LoRA weights).
- **SAVE_PATH**: Path to save the fine-tuned Hugging Face Gemma model.
- **gemma**: Flag indicating you're working with a Gemma model.

First, specify the paths:

In [None]:
# Specify the repository
repo_id = "google/gemma-2-2b-it"
local_dir = 'pretrained'

snapshot_download(
    repo_id=repo_id,
    local_dir=local_dir,
    revision="main",
    ignore_patterns=['*.gguf']
)

HUGGINGFACE_PATH = local_dir
JAX_PATH = 'checkpoints/jax_lora_final.pickle'
SAVE_PATH = 'gemma-ft'

Fetching 11 files:   0%|          | 0/11 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

README.md:   0%|          | 0.00/29.1k [00:00<?, ?B/s]

.gitattributes:   0%|          | 0.00/1.57k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/838 [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/241M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/47.0k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]
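Since the conversion depends on both the downloaded base model and the saved LoRA checkpoint, a quick check that both exist can save a failed run. This is a minimal sketch; `check_conversion_inputs` is a hypothetical helper, not a JORA function, and the paths simply mirror the ones defined above.

```python
from pathlib import Path


def check_conversion_inputs(hf_dir, lora_file):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if not Path(hf_dir).is_dir():
        problems.append(f"base model directory not found: {hf_dir}")
    if not Path(lora_file).is_file():
        problems.append(f"LoRA checkpoint not found: {lora_file}")
    return problems


for problem in check_conversion_inputs("pretrained", "checkpoints/jax_lora_final.pickle"):
    print("WARNING:", problem)
```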

Then, run the converter:

In [None]:
from jora.hf.__main__ import lorize_huggingface

lorize_huggingface(HUGGINGFACE_PATH, JAX_PATH, SAVE_PATH, gemma=True)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

model loaded
model saved to gemma-ft


- The `jora.hf` module converts the JAX-trained model back to the Hugging Face format.
- It merges the LoRA weights with the original model parameters.
- The converted model is saved in the specified `SAVE_PATH`.
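Conceptually, merging LoRA weights means adding the scaled low-rank product to each adapted base matrix: `W_merged = W + (alpha / r) * B @ A`. The sketch below shows that formula on tiny hand-written matrices in pure Python. It illustrates the standard LoRA merge only and is not JORA's actual conversion code; all names in it are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0], [0.0, 1.0]]   # base weight, shape (d_out=2, d_in=2)
lora_a = [[1.0, 2.0]]          # A, shape (rank=1, d_in=2)
lora_b = [[0.5], [0.0]]        # B, shape (d_out=2, rank=1)

merged = merge_lora(w, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # prints: [[2.0, 2.0], [0.0, 1.0]]
```

After merging, the adapter matrices are no longer needed at inference time, so the saved model has the same shape and cost as the base model.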

## Load the Model and Generate Text

Finally, you can load the converted model using Hugging Face's Transformers library.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(HUGGINGFACE_PATH)
model = AutoModelForCausalLM.from_pretrained(SAVE_PATH, device_map="auto")

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]



Here, the tokenizer and the model are loaded, and `device_map="auto"` places the model on the appropriate device automatically. Next, generate text with the model using the Alpaca prompt format:

In [None]:
# Define the Alpaca prompt template
alpaca_prompt = """\
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

# Generate a response for an instruction and optional input, then print it
def generate_response(instruction, input_text="", max_new_tokens=384):
    prompt = alpaca_prompt.format(instruction, input_text)
    # Use the model's own device, since it was placed with device_map="auto"
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(text)
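Because the decoded text contains the prompt as well as the answer, you may want only the part after the response marker. The following is a minimal sketch of that post-processing step; `extract_response` is a hypothetical helper, and it assumes the `### Response:` marker survives decoding unchanged.

```python
RESPONSE_MARKER = "### Response:\n"


def extract_response(generated_text):
    """Return the text after the first response marker, or all text if absent."""
    _, marker, tail = generated_text.partition(RESPONSE_MARKER)
    return tail.strip() if marker else generated_text.strip()


sample = "### Instruction:\nSay hi.\n\n### Input:\n\n\n### Response:\nHello!\n"
print(extract_response(sample))  # prints: Hello!
```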

In [None]:
generate_response(
    instruction="Identify 3 common mistakes in the following sentence. Suggest changes.",
    input_text="She seems to believe that the real key to sucsess is working smart and hard."
)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Identify 3 common mistakes in the following sentence. Suggest changes.

### Input:
She seems to believe that the real key to sucsess is working smart and hard.

### Response:
1. "sucsess" should be "success"
2. "seems to believe" is a weak phrase.
3. "working smart and hard" is a cliché.


In [None]:
generate_response(
    instruction="Make a prediction about what will happen in the next paragraph.",
    input_text="Mary had been living in the small town for many years and had never seen anything like what was coming.",
)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Make a prediction about what will happen in the next paragraph.

### Input:
Mary had been living in the small town for many years and had never seen anything like what was coming.

### Response:
She will be surprised by the event.


In [None]:
generate_response(
    instruction="Identify a suitable <verb> in the following sentence.",
    input_text="The cat <verb> in the garden.",
)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Identify a suitable <verb> in the following sentence.

### Input:
The cat <verb> in the garden.

### Response:
played


In [None]:
generate_response(
    instruction="Explain why the quote is appropriate or not for a yoga class.",
    input_text="Don't quit. Suffer now and live the rest of your life as a champion.",
)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Explain why the quote is appropriate or not for a yoga class.

### Input:
Don't quit. Suffer now and live the rest of your life as a champion.

### Response:
This quote is not appropriate for a yoga class because it promotes a competitive mindset and ignores the importance of self-compassion and acceptance.


## Push the model to your Hugging Face Hub


Optionally, Hugging Face allows you to easily store trained models on its Hub.

In [None]:
# Note: The token needs to have "write" permission
#       You can check it here:
#       https://huggingface.co/settings/tokens
# Uncomment and run this if you wish to publish the model to Hugging Face Hub
# model.push_to_hub("my-gemma-finetuned-model")

In this tutorial, you have learnt how to fine-tune a Gemma model using JORA and convert it to the Hugging Face model format for inference. By leveraging JAX's JIT compilation and tensor-sharding capabilities, you can achieve efficient resource management, enabling accelerated fine-tuning with reduced memory requirements.