
### ***If you can't beat 'em, join 'em...***
**Author: [Carl McBride Ellis](https://www.kaggle.com/carlmcbrideellis)** ([LinkedIn](https://www.linkedin.com/in/carl-mcbride-ellis/))

Looking at the state of the Kaggle forums of late, with [an explosion of AI Generated Text](https://www.kaggle.com/discussions/general/398579), I think the time has come to replace myself with the [Gemma LLM](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) and from now on dedicate my time to more worthwhile endeavors.
The quality of my replies will now be even worse than usual, and I will receive fewer medals, but I was never in it for the medals in the first place.

Here we shall fine-tune the [Gemma](https://www.kaggle.com/models/google/gemma) model with my historical replies scraped from the [Meta Kaggle](https://www.kaggle.com/datasets/kaggle/meta-kaggle) dataset.
This work is extremely heavily based on the following two magnificent notebooks by [Nilay Chauhan](https://www.kaggle.com/nilaychauhan):
* [Get started with Gemma using KerasNLP](https://www.kaggle.com/code/nilaychauhan/get-started-with-gemma-using-kerasnlp)
* [Fine-tune Gemma models in Keras using LoRA](https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora)

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', 256)
import re
# from https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora

# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
!pip install -q -U keras-nlp
# Quote the requirement: unquoted, the shell treats ">=3" as a redirect to a file named "=3".
!pip install -q -U "keras>=3"

import os

os.environ["KERAS_BACKEND"] = "jax"  # Or "torch" or "tensorflow".
# Avoid memory fragmentation on JAX backend.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"]="1.00"

import keras
import keras_nlp

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.15.0 requires keras<2.16,>=2.15.0, but you have keras 3.0.5 which is incompatible.

In [None]:
%%time

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

### Before...

In [None]:
print(gemma_lm.generate("Will there be a private leaderboard shakeup?", max_length=256))

### Create a dataset of my Kaggle conversations obtained from [Meta Kaggle](https://www.kaggle.com/datasets/kaggle/meta-kaggle)

In [None]:
# Get my Kaggle User Id
Users = pd.read_csv("/kaggle/input/meta-kaggle/Users.csv")
User_Id = Users.query('UserName == "carlmcbrideellis"')["Id"].item()

In [None]:
ForumMessages = pd.read_csv("/kaggle/input/meta-kaggle/ForumMessages.csv")
CME_posts = ForumMessages[ForumMessages['PostUserId'] == User_Id]
# Select only my posts that were replies
CME_Response = CME_posts[CME_posts['ReplyToForumMessageId'].notna()]
CME_Response = CME_Response.rename(columns={'Message': 'Response'})
# and now get the corresponding "prompt", i.e. the message each reply was answering
CME_Response = CME_Response.merge(ForumMessages, left_on='ReplyToForumMessageId', right_on='Id')
CME_Response = CME_Response.rename(columns={'Message': 'Instruction'})
data = CME_Response[["Instruction","Response"]].copy()
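
To see what the self-merge above is doing, here is a minimal sketch on a toy table. The rows, the `my_id` value and the `suffixes` names are made up for illustration; only the column names are taken from the cell above.

```python
import pandas as pd

# Toy stand-in for ForumMessages.csv (hypothetical rows, real column names).
messages = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "PostUserId": [10, 99, 10, 99],
    "ReplyToForumMessageId": [float("nan"), 1, 2, float("nan")],
    "Message": ["Question A", "Answer to A", "Follow-up", "Question B"],
})
my_id = 99  # hypothetical user id

# Keep only this user's posts that are replies to something.
replies = messages[
    (messages["PostUserId"] == my_id) & messages["ReplyToForumMessageId"].notna()
].rename(columns={"Message": "Response"})

# Join each reply to its parent message; the parent's text becomes the "Instruction".
pairs = replies.merge(
    messages,
    left_on="ReplyToForumMessageId",
    right_on="Id",
    suffixes=("_reply", "_parent"),
).rename(columns={"Message": "Instruction"})[["Instruction", "Response"]]

print(pairs)
```

Because `merge` defaults to an inner join, any reply whose parent message is missing from the table is silently dropped, so the number of pairs can be smaller than the number of replies.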

How many conversations are there?

In [None]:
data.shape[0]

OK, almost 2 thousand.

In [None]:
# do some basic cleaning

# remove any HTML-style tags (anything between angle brackets)
data["Instruction"] = data["Instruction"].str.replace(r'<[^<>]*>', '', regex=True)
# replace newlines with spaces
data["Instruction"] = data["Instruction"].str.replace(r'\n',' ', regex=True)
# remove any @user tags (at the start of the text or preceded by whitespace)
data["Instruction"] = data["Instruction"].str.replace(r'(?<=\s)@[\w]+|(?<=^)@[\w]+', '', regex=True)

# repeat same cleaning for the Response column as well
data["Response"] = data["Response"].str.replace(r'<[^<>]*>', '', regex=True)
data["Response"] = data["Response"].str.replace(r'\n',' ', regex=True)
data["Response"] = data["Response"].str.replace(r'(?<=\s)@[\w]+|(?<=^)@[\w]+', '', regex=True)
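
As a quick sanity check of the three substitutions above, here is a small sketch that applies the same patterns, in the same order, to a single string using the standard `re` module. The helper name `clean_message` and the sample text are made up for illustration.

```python
import re

TAG_RE = re.compile(r"<[^<>]*>")                        # anything between angle brackets
MENTION_RE = re.compile(r"(?<=\s)@[\w]+|(?<=^)@[\w]+")  # @user at start or after whitespace

def clean_message(text: str) -> str:
    """Apply the notebook's cleaning steps to one message."""
    text = TAG_RE.sub("", text)      # strip tags
    text = text.replace("\n", " ")   # newlines become spaces
    text = MENTION_RE.sub("", text)  # drop @user mentions
    return text

sample = "<p>@someone thanks!</p>\nSee <b>this</b> thread, @another_user."
print(repr(clean_message(sample)))
print(repr(clean_message("mail me at a@b.com")))
```

Note that the cleaned text keeps the whitespace that surrounded a removed mention, and that an `@` in the middle of a word (such as an e-mail address) is left alone because it is neither at the start nor preceded by whitespace.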

In [None]:
# take a look
data.tail(10)

In [None]:
CME_dataset = []

for index, row in data.iterrows():
    instruction, response = row['Instruction'], row['Response']
    template = (f"Instruction:\n{instruction}\n\nResponse:\n{response}")
    CME_dataset.append(template)
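
The same template can also be reused to build a prompt at generation time, by leaving the response part empty so that the model continues from `Response:`. This is only a sketch of one option: the helper name `build_example` is hypothetical, and the notebook itself prompts with the bare question.

```python
TEMPLATE = "Instruction:\n{instruction}\n\nResponse:\n{response}"

def build_example(instruction: str, response: str = "") -> str:
    """Format one training example, or a prompt when response is left empty."""
    return TEMPLATE.format(instruction=instruction, response=response)

# A full training example, in the same layout as the entries of CME_dataset.
example = build_example("Is LightGBM faster than XGBoost?", "It depends on the data.")

# A prompt in the same layout, ending right where the reply should start.
prompt = build_example("Will there be a private leaderboard shakeup?")
print(prompt)
```

Such a `prompt` string could then be passed to `gemma_lm.generate(...)` in place of the raw question, so that the input looks like what the model saw during fine-tuning.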

### LoRA fine-tuning

In [None]:
# Enable LoRA for the model and set the LoRA rank to 64.
gemma_lm.backbone.enable_lora(rank=64)
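
To get a feel for what the rank means: LoRA keeps a dense weight matrix of shape `d_out x d_in` frozen and trains two small factors, `A` (`rank x d_in`) and `B` (`d_out x rank`), whose product is added to it. The back-of-the-envelope sketch below counts the parameters for one such matrix. The 2048 x 2048 shape is purely illustrative and is not read from the Gemma configuration; which layers actually get adapters is decided by KerasNLP and is not shown in this notebook.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

# Illustrative shapes only (an assumption, not Gemma's real layer sizes).
d_in, d_out, rank = 2048, 2048, 64

full = d_in * d_out                              # weights in the frozen dense matrix
lora = lora_trainable_params(d_in, d_out, rank)  # weights that are actually trained

print(f"frozen: {full:,}  trainable: {lora:,}  ratio: {lora / full:.4f}")
```

The trainable count grows linearly with the rank, so halving the rank to 32 would halve the number of adapter weights for this matrix.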

In [None]:
# Limit the input sequence length to 512 (to control memory usage).
gemma_lm.preprocessor.sequence_length = 512
# Use AdamW (a common optimizer for transformer models).
optimizer = keras.optimizers.AdamW(
    learning_rate=5e-5,
    weight_decay=0.01,
)
# Exclude layernorm and bias terms from decay.
optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=optimizer,
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

In [None]:
%%time

gemma_lm.fit(CME_dataset, epochs=1, batch_size=1)

### ...and now...

In [None]:
print(gemma_lm.generate("Will there be a private leaderboard shakeup?", max_length=256))

meh, I think that should do the trick...

Now, back to that tabular dataset notebook I was working on!

### Related reading
* [Gemma: Open Models Based on Gemini Research and Technology](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)
* [RAG using Llama 2, Langchain and ChromaDB](https://www.kaggle.com/code/gpreda/rag-using-llama-2-langchain-and-chromadb) by [Gabriel Preda](https://www.kaggle.com/gpreda)
* [Gabriel Preda "*Developing Kaggle Notebooks*", Packt Publishing Limited (2023)](https://www.packtpub.com/product/developing-kaggle-notebooks/9781805128519) (Chapter 10)
* [Building A Transformer (GPT) From Scratch](https://www.kaggle.com/code/kevinbnisch/building-a-transformer-gpt-from-scratch) by [TheItCrOw](https://www.kaggle.com/kevinbnisch)
* [Get started with Gemma using KerasNLP](https://www.kaggle.com/code/nilaychauhan/get-started-with-gemma-using-kerasnlp) by [Nilay Chauhan](https://www.kaggle.com/nilaychauhan)
* [Fine-tune Gemma models in Keras using LoRA](https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora) by [Nilay Chauhan](https://www.kaggle.com/nilaychauhan)