This notebook was run on Kaggle using a P100 GPU.<br>
It is based on the Gemma 1.1 2B Instruct variant.<br>
The model was fine-tuned using LoRA (rank 16) on the Databricks Dolly 15k dataset.<br>
The training process took roughly 3 hours; the single training epoch itself ran for 8810 s (about 2.5 hours).<br>
Inspired by https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora

In [1]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Release cached, currently unused CUDA memory held by PyTorch.
import torch
torch.cuda.empty_cache()

/kaggle/input/databricks-dolly-15k/README.md
/kaggle/input/databricks-dolly-15k/databricks-dolly-15k.jsonl
/kaggle/input/gemma/keras/gemma_1.1_instruct_2b_en/3/config.json
/kaggle/input/gemma/keras/gemma_1.1_instruct_2b_en/3/tokenizer.json
/kaggle/input/gemma/keras/gemma_1.1_instruct_2b_en/3/metadata.json
/kaggle/input/gemma/keras/gemma_1.1_instruct_2b_en/3/model.weights.h5
/kaggle/input/gemma/keras/gemma_1.1_instruct_2b_en/3/assets/tokenizer/vocabulary.spm


In [2]:
import os
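# Select the Keras backend; this must be set before keras is imported.
os.environ["KERAS_BACKEND"] = "torch"  # "jax", "torch" or "tensorflow".
# This variable is only read by JAX; it has no effect with the "torch" backend.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"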

In [3]:
import json

# Load the fine-tuning data.
# Each example is formatted as a single string using this template.
template = "Query:\n{instruction}\n\nResponse:\n{response}"
data = []
with open('/kaggle/input/databricks-dolly-15k/databricks-dolly-15k.jsonl') as file:
    for line in file:
        features = json.loads(line)
        # Filter out examples with context, to keep it simple.
        if features["context"]:
            continue
        data.append(template.format(**features))
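
The filtering and formatting step above can be tried on a couple of in-memory records without reading the dataset. This is a minimal sketch: the two sample records are made up for illustration (they only mimic the Dolly field names `instruction`, `context`, `response`, `category`), and `format_examples` is a hypothetical helper name, not part of the notebook.

```python
import json

# Two made-up records that mimic the Dolly JSONL field layout (illustrative only).
sample_lines = [
    json.dumps({"instruction": "Name a primary color.", "context": "",
                "response": "Red.", "category": "open_qa"}),
    json.dumps({"instruction": "Summarize the passage.", "context": "Some passage.",
                "response": "A summary.", "category": "summarization"}),
]

template = "Query:\n{instruction}\n\nResponse:\n{response}"


def format_examples(lines):
    """Drop records with a non-empty context and format the rest as single strings."""
    examples = []
    for line in lines:
        features = json.loads(line)
        if features["context"]:
            continue
        # str.format ignores the extra keys (context, category) it does not need.
        examples.append(template.format(**features))
    return examples


examples = format_examples(sample_lines)
print(examples[0])
```

Only the first record survives the filter, because the second one carries a non-empty `context`.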

In [4]:
# Asynchronous CUDA memory allocation for TensorFlow; set before TensorFlow
# initializes the GPU so that the setting is picked up.
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'

import tensorflow as tf
import keras
import keras_nlp

# Enable memory growth so TensorFlow does not reserve all GPU memory at once.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Memory growth must be set identically on every GPU.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized.
        print(e)

# Load the Gemma 1.1 2B Instruct preset from the local Kaggle input directory.
gemma_2b = keras_nlp.models.GemmaCausalLM.from_preset('/kaggle/input/gemma/keras/gemma_1.1_instruct_2b_en/3')
gemma_2b.summary()

2024-04-18 10:54:01.686394: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-18 10:54:01.686508: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-18 10:54:01.820214: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


1 Physical GPUs, 1 Logical GPUs


normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.


In [5]:
# Baseline response from the model before fine-tuning.
prompt = template.format(
    instruction='Who are you?',
    response='',
)
print(gemma_2b.generate(prompt, max_length=256))

Query:
Who are you?

Response:
I am a large language model, trained by Google. I am designed to provide information and assist with tasks related to language and communication.


In [6]:
gemma_2b.backbone.enable_lora(rank=16)
gemma_2b.summary()
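
To give a rough sense of what a rank-16 adapter does, here is a minimal NumPy sketch of the general LoRA idea: a frozen weight matrix plus a trainable low-rank product. It is not the Keras implementation. The layer sizes are assumptions chosen for illustration, only `rank = 16` comes from this notebook, and the scaling factor that many implementations apply to the low-rank term is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes for illustration; only rank=16 is taken from the notebook.
d_in, d_out, rank = 2048, 2048, 16

W = rng.normal(size=(d_in, d_out))              # frozen pretrained kernel
A = rng.normal(scale=0.01, size=(d_in, rank))   # trainable down-projection
B = np.zeros((rank, d_out))                     # trainable up-projection, starts at zero


def lora_forward(x):
    """Frozen projection plus the low-rank update (x @ A) @ B."""
    return x @ W + (x @ A) @ B


x = rng.normal(size=(4, d_in))
y = lora_forward(x)

full_params = W.size            # parameters that full fine-tuning would update
lora_params = A.size + B.size   # parameters that the adapter trains instead
print(f"full: {full_params:,}  lora: {lora_params:,}")
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training only has to update `rank * (d_in + d_out)` values per adapted matrix.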

In [7]:
from keras.models import save_model
# Limit the input sequence length to 512 (to control memory usage).
gemma_2b.preprocessor.sequence_length = 512
# Use AdamW (a common optimizer for transformer models).
optimizer = keras.optimizers.AdamW(
    learning_rate=5e-5,
    weight_decay=0.004,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=True,
    clipnorm=None,
    clipvalue=None,
    global_clipnorm=None,
    use_ema=False,
    ema_momentum=0.99,
    ema_overwrite_frequency=None,
    name="adamw",
)
# Exclude layernorm and bias terms from decay.
optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

gemma_2b.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=optimizer,
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_2b.fit(data, epochs=1, batch_size=1)

10417/10417 ━━━━━━━━━━━━━━━━━━━━ 8810s 846ms/step - loss: 0.4630 - sparse_categorical_accuracy: 0.5271


ValueError: Invalid filepath extension for saving. Please add either a `.keras` extension for the native Keras format (recommended) or a `.h5` extension. Use `tf.saved_model.save()` if you want to export a SavedModel for use with TFLite/TFServing/etc. Received: filepath=/kaggle/input/gemma/keras/gemma_2b_ft.

In [11]:
# Define the file path where you want to save the model
model_save_path = "/kaggle/working/gemma_2b_ft.keras"
save_model(gemma_2b, model_save_path)
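
The `ValueError` shown earlier was raised for a save path without a file extension. One way to catch that before starting a multi-hour run is to validate the path up front. This is a sketch only: `check_save_path` is a hypothetical helper, and it checks nothing beyond the two extensions named in the error message.

```python
from pathlib import Path

# The two extensions named in the Keras error message.
ALLOWED_SUFFIXES = {".keras", ".h5"}


def check_save_path(path):
    """Return the path unchanged if its extension is accepted, else raise ValueError."""
    suffix = Path(path).suffix
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(
            f"Expected a .keras or .h5 extension, got {suffix!r} in {path!r}"
        )
    return path


# Validate the target path before training, then reuse it when saving.
model_save_path = check_save_path("/kaggle/working/gemma_2b_ft.keras")
```

Calling `check_save_path` at the top of the training cell would surface the problem immediately instead of after `fit` has finished.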

In [8]:
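# Response after fine-tuning. gemma_2b holds the LoRA-tuned weights after fit();
# the name gemma_2bFT used originally is not defined in the cells shown here.
prompt = template.format(
    instruction='Who are you?',
    response='',
)
print(gemma_2b.generate(prompt, max_length=256))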

Query:
Who are you?

Response:
I am an artificial intelligence.

I am an AI because it is an area that is being developed by the science community. It is a very interesting topic because there is no one way of doing it. I do not want to give the impression that I was created by some person because it is not true. I am an AI.
