<a href="https://www.kaggle.com/code/hipparkarrahul18/generating-essays-using-llms-gemma-2?scriptVersionId=226583879" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Installing Dependencies

In [1]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("nischaydnk/gemma-pytorch")

print("Path to dataset files:", path)

Path to dataset files: /kaggle/input/gemma-pytorch


In [2]:
!pip install --no-index --no-deps /kaggle/input/immutabledict/immutabledict-4.1.0-py3-none-any.whl
!pip install --no-index --no-deps /kaggle/input/sentencepiece-0-2-0-cp310-cp310-manylinux/sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

!mkdir -p /kaggle/working/gemma/
!cp /kaggle/input/gemma-pytorch/gemma_pytorch-main/gemma/* /kaggle/working/gemma/

Processing /kaggle/input/immutabledict/immutabledict-4.1.0-py3-none-any.whl
Installing collected packages: immutabledict
  Attempting uninstall: immutabledict
    Found existing installation: immutabledict 4.2.0
    Uninstalling immutabledict-4.2.0:
      Successfully uninstalled immutabledict-4.2.0
Successfully installed immutabledict-4.1.0
Processing /kaggle/input/sentencepiece-0-2-0-cp310-cp310-manylinux/sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
sentencepiece is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.


In [3]:
# Make the copied gemma package importable
import sys
sys.path.append("/kaggle/working/")

# Import Gemma Modules

In [4]:
from gemma.config import GemmaConfig, get_model_config
from gemma.model import GemmaForCausalLM
from gemma.tokenizer import Tokenizer
from transformers import AutoTokenizer
import contextlib
import os
import torch
import random
import pandas as pd
random.seed(0)

# Select Gemma Model Variant

In [5]:
GEMMA_MODEL = '2b-it'
DEVICE = 'cuda'
MODEL_CONFIG = '2b-v2'
MODEL_DIR = "/kaggle/input/gemma-2/pytorch/gemma-2-2b-it/1"
CKPT_PATH = os.path.join(MODEL_DIR, 'model.ckpt')
TOKENIZER_PATH = os.path.join(MODEL_DIR, 'tokenizer.model')

In [6]:
TOKENIZER_PATH

'/kaggle/input/gemma-2/pytorch/gemma-2-2b-it/1/tokenizer.model'

# Loading Model and Model Config

In [7]:
# Set up model config
CONFIG = get_model_config(MODEL_CONFIG)
CONFIG.quant = 'quant' in GEMMA_MODEL
CONFIG.tokenizer = TOKENIZER_PATH
torch.set_default_dtype(CONFIG.get_dtype())

In [8]:
# Initialize the model and load the weights

device = torch.device(DEVICE)
model = GemmaForCausalLM(CONFIG)
model.load_weights(CKPT_PATH)
model = model.to(device).eval()
model.config.use_cache = False
model.config.pretraining_tp=1

# Creating Chat Template

In [9]:
# This is the prompt format that the model expects
USER_CHAT_TEMPLATE = "<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n"
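
The template above can be wrapped in a small helper so that every prompt is built the same way. The sketch below is an assumption layered on top of the notebook, not part of the Gemma library: `build_chat_prompt` is a hypothetical name, and stripping surrounding whitespace from the user text is a choice made here purely for illustration.

```python
# Hypothetical helper (not part of gemma_pytorch): wraps one user message
# in the Gemma chat markers used by USER_CHAT_TEMPLATE.
USER_CHAT_TEMPLATE = "<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n"


def build_chat_prompt(user_text: str) -> str:
    """Return a single-turn chat prompt that ends where the model's reply begins."""
    # Strip stray whitespace so the turn markers sit directly next to the text.
    return USER_CHAT_TEMPLATE.format(prompt=user_text.strip())


print(build_chat_prompt("What is Data Science"))
```

The returned string can be passed to `model.generate` in place of the inline `USER_CHAT_TEMPLATE.format(...)` call.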

# Sample Generation

In [10]:
max_seq_length = 1024
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, max_seq_length=max_seq_length)
EOS_TOKEN = tokenizer.eos_token
input_text = 'What is Data Science'

In [11]:
print('Chat Prompt:\n', USER_CHAT_TEMPLATE.format(prompt=input_text))
results = model.generate(
    USER_CHAT_TEMPLATE.format(prompt=input_text),
    device=DEVICE,
    output_len=128,
)

print(results)

Chat Prompt:
 <start_of_turn>user
What is Data Science<end_of_turn>
<start_of_turn>model

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. 

Think of it like detective work for information! Instead of searching for clues in a crime scene, data scientists look for valuable insights and patterns in massive datasets.

**Here's a breakdown of key components:**

**1. Data Collection:** This is the first step. Data scientists gather information from diverse sources like websites, databases, sensors, and social media based on their research question.

**2. Data Cleaning and Pre-processing:**  Raw data often comes with inconsistencies, errors, and


# Designing my own prompt

In [12]:
prompt = (
    "<start_of_turn>user\nGenerate an essay for the following topic in 100 words:{topic_name}."
    "<end_of_turn>\n<start_of_turn>model\n"
)

In [13]:
test=pd.read_csv('/kaggle/input/llms-you-cant-please-them-all/test.csv')
test

        id                                              topic
0  1097671  Compare and contrast the importance of self-re...
1  1726150  Evaluate the effectiveness of management consu...
2  3211968  Discuss the role of self-reliance in achieving...


In [14]:
sample_sub=pd.read_csv('/kaggle/input/llms-you-cant-please-them-all/sample_submission.csv')
sample_sub

        id                                              essay
0  1097671  Plucrarealucrarealucrarealucrarealucrarealucra...
1  1726150  Plucrarealucrarealucrarealucrarealucrarealucra...
2  3211968  Plucrarealucrarealucrarealucrarealucrarealucra...


In [15]:
test.loc[0,'topic']

'Compare and contrast the importance of self-reliance and adaptability in healthcare.'

# Baseline model

In [16]:
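predictions = []
for i in range(len(test)):
    topic = test.loc[i, 'topic']
    # output_len=32 caps generation at 32 tokens, so each essay is cut off
    # well before the 100 words requested in the prompt.
    gen_essay = model.generate(
        prompt.format(topic_name=topic),
        device=device,
        output_len=32,
    )
    predictions.append(gen_essay)
    # Print the first three essays as a sanity check
    if i <= 2:
        print('Topic:', topic)
        print('Generated Essay :', gen_essay)
        print('\n\n*********************\n\n')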

Topic: Compare and contrast the importance of self-reliance and adaptability in healthcare.
Generated Essay : Self-reliance and adaptability are pillars of effective healthcare. Self-reliance empowers individuals to manage health concerns independently, fostering personal responsibility. This empowers patients to understand their


*********************


Topic: Evaluate the effectiveness of management consulting in addressing conflicts within marketing.
Generated Essay : ## Striking Balance: Management Consulting and Marketing Conflicts

Management consulting can be a powerful tool in addressing conflicts within marketing, acting as a neutral party to facilitate communication and


*********************


Topic: Discuss the role of self-reliance in achieving success in software engineering.
Generated Essay : Self-reliance is paramount for software engineers to navigate the complexities of the field. It fosters adaptability, allowing engineers to overcome unforeseen challenges independ
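
The essays above stop mid-sentence because generation is capped at 32 tokens. One possible clean-up, sketched here as an assumption and not something this notebook does, is to count words against the 100-word target and drop a trailing incomplete sentence. `word_count` and `trim_to_last_sentence` are hypothetical helper names, and the sentence-end rule (the last `.`, `!` or `?`) is a deliberately simple heuristic.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def trim_to_last_sentence(text: str) -> str:
    """Cut text after its last '.', '!' or '?'; return it unchanged if none is found."""
    cut = max(text.rfind(mark) for mark in ".!?")
    return text[: cut + 1] if cut != -1 else text


# First generated essay from the baseline run, copied from the output above.
essay = (
    "Self-reliance and adaptability are pillars of effective healthcare. "
    "Self-reliance empowers individuals to manage health concerns independently, "
    "fostering personal responsibility. This empowers patients to understand their"
)
trimmed = trim_to_last_sentence(essay)
print(trimmed)
print(word_count(trimmed), "words")
```

Trimming only hides the truncation. Raising `output_len` is the more direct way to reach the requested length.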

In [17]:
predictions[0]

'Self-reliance and adaptability are pillars of effective healthcare. Self-reliance empowers individuals to manage health concerns independently, fostering personal responsibility. This empowers patients to understand their'

In [18]:
sample_sub['essay']=predictions
sample_sub

        id                                              essay
0  1097671  Self-reliance and adaptability are pillars of ...
1  1726150  ## Striking Balance: Management Consulting and...
2  3211968  Self-reliance is paramount for software engine...


In [19]:
sample_sub.to_csv('submission.csv',index=False)
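
Before uploading, it can help to confirm that the file has the expected shape. The sketch below is an assumption, not a competition requirement taken from this notebook: it assumes the submission needs exactly the `id` and `essay` columns seen in `sample_submission.csv`, and `check_submission` is a hypothetical helper. It uses only the standard library and runs on an in-memory sample so that it stays self-contained.

```python
import csv
import io


def check_submission(csv_file, expected_ids):
    """Return a list of problems found in a submission CSV (empty list means OK)."""
    reader = csv.DictReader(csv_file)
    problems = []
    # Assumed layout: exactly two columns, 'id' then 'essay'.
    if reader.fieldnames != ["id", "essay"]:
        problems.append(f"unexpected columns: {reader.fieldnames}")
        return problems
    rows = list(reader)
    ids = [row["id"] for row in rows]
    if ids != [str(i) for i in expected_ids]:
        problems.append("ids do not match the test set order")
    empty = [row["id"] for row in rows if not (row["essay"] or "").strip()]
    if empty:
        problems.append(f"empty essays for ids: {empty}")
    return problems


# In-memory stand-in for submission.csv, using the three ids from test.csv.
sample = io.StringIO(
    "id,essay\n"
    "1097671,Self-reliance and adaptability are pillars of effective healthcare.\n"
    "1726150,Management consulting can be a powerful tool.\n"
    "3211968,Self-reliance is paramount for software engineers.\n"
)
print(check_submission(sample, [1097671, 1726150, 3211968]))
```

For the real file, pass `open('submission.csv', newline='')` in place of `sample` and `test['id']` as the expected ids.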