### Installation

In [1]:
%%capture
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

!pip install pip3-autoremove
!pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu124
!pip install unsloth
!pip install --upgrade transformers==4.52.4

### Unsloth

In [2]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

fourbit_models = [
    # 4bit dynamic quants for superior accuracy and low memory use
    "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-27b-it-unsloth-bnb-4bit",

    # Other popular models!
    "unsloth/Llama-3.1-8B",
    "unsloth/Llama-3.2-3B",
    "unsloth/Llama-3.3-70B",
    "unsloth/mistral-7b-instruct-v0.3",
    "unsloth/Phi-4",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.




🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.6.12: Fast Gemma3 patching. Transformers: 4.52.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gemma3 won't work! Using float32.



Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.



We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth: Making `base_model.model.model.vision_tower.vision_model` require gradients
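The size of the LoRA update can be estimated by hand: each adapted linear layer of shape `d_in x d_out` gains two low-rank matrices with `r * (d_in + d_out)` parameters in total. The sketch below applies that formula with `r = 16`. The layer shapes and layer counts are assumptions based on the public Gemma 3 4B configuration (a text decoder plus a vision encoder); they are not printed anywhere in this notebook, so treat the result as an estimate rather than an authoritative count.

```python
# Rough count of LoRA trainable parameters.
# ASSUMPTION: the shapes and layer counts below are assumed from the public
# Gemma 3 4B configuration; this notebook does not print them.
R = 16  # LoRA rank, as passed to get_peft_model above

# (d_in, d_out) for each targeted projection in one text decoder layer
TEXT_LAYER_SHAPES = {
    "q_proj": (2560, 2048),
    "k_proj": (2560, 1024),
    "v_proj": (2560, 1024),
    "o_proj": (2048, 2560),
    "gate_proj": (2560, 10240),
    "up_proj": (2560, 10240),
    "down_proj": (10240, 2560),
}
NUM_TEXT_LAYERS = 34

# Inside the vision encoder only q/k/v match the target_modules names
VISION_LAYER_SHAPES = {
    "q_proj": (1152, 1152),
    "k_proj": (1152, 1152),
    "v_proj": (1152, 1152),
}
NUM_VISION_LAYERS = 27


def lora_params(shapes, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted layer."""
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())


total = (
    NUM_TEXT_LAYERS * lora_params(TEXT_LAYER_SHAPES, R)
    + NUM_VISION_LAYERS * lora_params(VISION_LAYER_SHAPES, R)
)
print(f"Estimated trainable LoRA parameters: {total:,}")
```

Under these assumptions the estimate comes to 32,788,480, the same trainable-parameter count the trainer reports in its log later in this notebook, which is well under 1% of the model.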


<a name="Data"></a>
### Data Prep
We now use the [neo4j/text2cypher-2024v1](https://huggingface.co/datasets/neo4j/text2cypher-2024v1) dataset, which pairs natural-language questions and graph schemas with reference Cypher queries (39,554 training and 4,833 test examples). You can replace this code section with your own data prep.

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Alpaca.ipynb)

For text completions like novel writing, try this [notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_(7B)-Text_Completion.ipynb).

In [4]:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma-3",
)

In [5]:
from datasets import load_dataset
dataset = load_dataset("neo4j/text2cypher-2024v1", split = "train")


In [6]:
# This converts the dataset to the standard role/content
# conversation format when it can identify the role and
# conversation fields
from unsloth.chat_templates import standardize_data_formats
dataset = standardize_data_formats(dataset)

In [7]:
dataset[100]

{'question': 'Find nodes that are at the end of a path starting at Author where affiliation is unspecified and traversing through Keyword with name tree (optimality criteria: minimum mean-squared error)\n\nalternative keyword suggestions:\n- multiscale superpopulation models\n- independent innovations trees\n- water-!',
 'schema': 'Graph schema: Relevant node labels and their properties (with datatypes) are:\nAuthor {affiliation: STRING}\nKeyword {name: STRING}',
 'cypher': "MATCH (a:Author{affiliation:'unspecified'})-[*]->(d:Keyword{name:'tree (optimality criteria: minimum mean-squared error)  alternative keyword suggestions: - multiscale superpopulation models - independent innovations trees - water-'})-[*]->(n) RETURN n",
 'data_source': 'neo4jLabs_functional_cypher',
 'instance_id': 'instance_id_6791',
 'database_reference_alias': None}

In [8]:
# Convert each example to the conversation format
# expected by the chat template
def to_conversation_format(example):
    return {
        "conversations": [
            {
                "role": "user",
                "content": (
                    "Given the context, generate a Cypher query for the following question\n"
                    f"context:{example['schema']}\n"
                    f"question:{example['question']}"
                )
            },
            {
                "role": "assistant",
                "content": example['cypher']
            }
        ]
    }

In [9]:
dataset = dataset.map(to_conversation_format)

Map:   0%|          | 0/39554 [00:00<?, ? examples/s]

In [10]:
# The conversations column now has the correct format
dataset[100]

{'question': 'Find nodes that are at the end of a path starting at Author where affiliation is unspecified and traversing through Keyword with name tree (optimality criteria: minimum mean-squared error)\n\nalternative keyword suggestions:\n- multiscale superpopulation models\n- independent innovations trees\n- water-!',
 'schema': 'Graph schema: Relevant node labels and their properties (with datatypes) are:\nAuthor {affiliation: STRING}\nKeyword {name: STRING}',
 'cypher': "MATCH (a:Author{affiliation:'unspecified'})-[*]->(d:Keyword{name:'tree (optimality criteria: minimum mean-squared error)  alternative keyword suggestions: - multiscale superpopulation models - independent innovations trees - water-'})-[*]->(n) RETURN n",
 'data_source': 'neo4jLabs_functional_cypher',
 'instance_id': 'instance_id_6791',
 'database_reference_alias': None,
 'conversations': [{'content': 'Given the context, generate a Cypher query for the following question\ncontext:Graph schema: Relevant node labels a

In [11]:
# Apply the Gemma chat template to each conversation.
# <bos> is stripped here because the tokenizer adds it again later.
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False).removeprefix('<bos>')
        for convo in convos
    ]
    return { "text" : texts, }

dataset = dataset.map(formatting_prompts_func, batched = True)

Map:   0%|          | 0/39554 [00:00<?, ? examples/s]
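To make the effect of the chat template concrete, here is a minimal sketch of the turn layout it produces. `render_gemma_turns` is a hypothetical helper written for illustration only; it is a simplified stand-in that ignores `<bos>`, system prompts, and image content, and it is not the Jinja template the tokenizer actually runs.

```python
def render_gemma_turns(messages, add_generation_prompt=False):
    """Simplified stand-in for the Gemma chat template (text turns only)."""
    # The template names the assistant role "model"
    role_names = {"user": "user", "assistant": "model"}
    parts = []
    for message in messages:
        role = role_names[message["role"]]
        parts.append(f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "question:How many companies are there?"},
    {"role": "assistant", "content": "MATCH (c:Company) RETURN count(c)"},
]

# Training text contains both turns; an inference prompt stops at an open model turn
training_text = render_gemma_turns(conversation)
inference_prompt = render_gemma_turns(conversation[:1], add_generation_prompt=True)
print(training_text)
print(inference_prompt)
```

The distinction matters later: training examples end with the assistant's closed turn, while inference prompts should end with an open `<start_of_turn>model` turn so the model writes the query itself.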

In [12]:
dataset[100]["text"]

"<start_of_turn>user\nGiven the context, generate a Cypher query for the following question\ncontext:Graph schema: Relevant node labels and their properties (with datatypes) are:\nAuthor {affiliation: STRING}\nKeyword {name: STRING}\nquestion:Find nodes that are at the end of a path starting at Author where affiliation is unspecified and traversing through Keyword with name tree (optimality criteria: minimum mean-squared error)\n\nalternative keyword suggestions:\n- multiscale superpopulation models\n- independent innovations trees\n- water-!<end_of_turn>\n<start_of_turn>model\nMATCH (a:Author{affiliation:'unspecified'})-[*]->(d:Keyword{name:'tree (optimality criteria: minimum mean-squared error)  alternative keyword suggestions: - multiscale superpopulation models - independent innovations trees - water-'})-[*]->(n) RETURN n<end_of_turn>\n"

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up; for a full run, remove `max_steps` and set `num_train_epochs=1`. We also support TRL's `DPOTrainer`!

In [13]:
from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Can make training 5x faster for short sequences.
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=5,  # Ignored while max_steps is set below.
        max_steps=60,  # Remove this for a full run over num_train_epochs.
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",  # Use this for WandB etc
    ),
)

Unsloth: Switching to float32 training since model cannot work with float16


Unsloth: Tokenizing ["text"] (num_proc=4):   0%|          | 0/39554 [00:00<?, ? examples/s]
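A quick back-of-the-envelope calculation shows how little of the dataset a 60-step run covers. The numbers are the ones used in the trainer configuration above and the training split size reported when the dataset was loaded; the steps-per-epoch figure is approximate, since the trainer's own rounding may differ by a step.

```python
import math

num_examples = 39_554            # training split size reported by the dataset loader
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 60

# Examples consumed per optimizer step
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch_size * max_steps
# Approximate number of optimizer steps needed to see every example once
steps_per_epoch = math.ceil(num_examples / effective_batch_size)

print(f"Effective batch size: {effective_batch_size}")
print(f"Examples seen in {max_steps} steps: {examples_seen} "
      f"({examples_seen / num_examples:.1%} of one epoch)")
print(f"Approximate steps for one full epoch: {steps_per_epoch}")
```

With these settings the run sees 480 examples, roughly 1.2% of one epoch, so the evaluation scores later in the notebook reflect a very short fine-tune.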

### Apply Unsloth's `train_on_responses_only`

In [14]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<start_of_turn>user\n",
    response_part = "<start_of_turn>model\n",
)

Map (num_proc=4):   0%|          | 0/39554 [00:00<?, ? examples/s]
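Conceptually, training on responses only means copying `input_ids` into `labels` and replacing every prompt position with `-100`, so that only the model's turn contributes to the loss. The sketch below illustrates the idea on toy token IDs. It is not Unsloth's implementation: it handles a single-turn example only, and `mask_prompt_tokens`, `find_subsequence`, and the token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_tokens(input_ids, response_marker_ids):
    """Copy input_ids to labels, masking everything up to and including the marker."""
    marker_start = find_subsequence(input_ids, response_marker_ids)
    if marker_start == -1:
        # No response found: nothing in this example contributes to the loss
        return [IGNORE_INDEX] * len(input_ids)
    response_start = marker_start + len(response_marker_ids)
    return [IGNORE_INDEX] * response_start + list(input_ids[response_start:])


# Hypothetical token IDs: 2 = <bos>, [20, 21] = "<start_of_turn>model\n"
input_ids = [2, 10, 11, 50, 51, 20, 21, 30, 31, 32, 99]
labels = mask_prompt_tokens(input_ids, response_marker_ids=[20, 21])
print(labels)
```

The next two cells show the same effect on a real example: the decoded `input_ids` contain the full prompt, while the decoded `labels` contain only the Cypher answer.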

In [15]:
tokenizer.decode(trainer.train_dataset[100]["input_ids"])

"<bos><start_of_turn>user\nGiven the context, generate a Cypher query for the following question\ncontext:Graph schema: Relevant node labels and their properties (with datatypes) are:\nAuthor {affiliation: STRING}\nKeyword {name: STRING}\nquestion:Find nodes that are at the end of a path starting at Author where affiliation is unspecified and traversing through Keyword with name tree (optimality criteria: minimum mean-squared error)\n\nalternative keyword suggestions:\n- multiscale superpopulation models\n- independent innovations trees\n- water-!<end_of_turn>\n<start_of_turn>model\nMATCH (a:Author{affiliation:'unspecified'})-[*]->(d:Keyword{name:'tree (optimality criteria: minimum mean-squared error)  alternative keyword suggestions: - multiscale superpopulation models - independent innovations trees - water-'})-[*]->(n) RETURN n<end_of_turn>\n"

In [16]:
# Decode only the labels that contribute to the loss (masked tokens are shown as spaces)
tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100]["labels"]]).replace(tokenizer.pad_token, " ")

"                                                                                                                   MATCH (a:Author{affiliation:'unspecified'})-[*]->(d:Keyword{name:'tree (optimality criteria: minimum mean-squared error)  alternative keyword suggestions: - multiscale superpopulation models - independent innovations trees - water-'})-[*]->(n) RETURN n<end_of_turn>\n"

In [18]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 39,554 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 32,788,480 of 4,000,000,000 (0.82% trained)


Step,Training Loss
1,2.6237
2,1.8966
3,1.8595
4,2.1358
5,2.0109
6,2.7135
7,2.4205
8,2.288
9,2.1321
10,2.0259


In [19]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
12.492 GB of memory reserved.


### Inference with the trained model

In [7]:
# Load the test split for inference
from datasets import load_dataset
dataset = load_dataset("neo4j/text2cypher-2024v1", split = "test")

# This converts the dataset to the standard role/content
# conversation format when it can identify the role and
# conversation fields
from unsloth.chat_templates import standardize_data_formats
dataset = standardize_data_formats(dataset)

# Convert each example to the conversation format expected by the
# chat template (user turn only; the model generates the answer)
def to_conversation_format(example):
    return {
        "conversations": [
            {
                "role": "user",
                "content": (
                    "Given the context, generate a Cypher query for the following question\n"
                    f"context:{example['schema']}\n"
                    f"question:{example['question']}"
                )
            },
        ]
    }



In [8]:
# map only the user question 
dataset = dataset.map(to_conversation_format)

Map:   0%|          | 0/4833 [00:00<?, ? examples/s]

In [9]:
# Render each conversation as prompt text with the Gemma chat template.
# <bos> is stripped here because the tokenizer adds it again later.
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False).removeprefix('<bos>')
        for convo in convos
    ]
    return { "text" : texts, }

dataset = dataset.map(formatting_prompts_func, batched = True)

Map:   0%|          | 0/4833 [00:00<?, ? examples/s]

In [10]:
dataset[10]['text']

"<start_of_turn>user\nGiven the context, generate a Cypher query for the following question\ncontext:Graph schema: Relevant node labels and their properties (with datatypes) are:\nArticle {abstract: STRING}\nDOI {}\n\nRelevant relationships are:\n{'start': Article, 'type': HAS_DOI, 'end': DOI }\nquestion:Search for the abstract values from 20 Article that are linked to DOI via HAS_DOI and return abstract along with the respective DOI counts!<end_of_turn>\n"

In [11]:
type(dataset[10]['text'])

str

In [17]:
# Render the prompts again, this time keeping <bos> and appending the generation prompt
text_dataset = tokenizer.apply_chat_template(
    dataset['conversations'],
    add_generation_prompt = True, # Must add for generation
)

In [19]:
text_dataset[:3]

['<bos><start_of_turn>user\nGiven the context, generate a Cypher query for the following question\ncontext:Node properties:\n- **Product**\n  - `productName`: STRING Example: "Chai"\n  - `quantityPerUnit`: STRING Example: "10 boxes x 20 bags"\n  - `unitsOnOrder`: INTEGER Min: 0, Max: 100\n  - `supplierID`: STRING Example: "1"\n  - `productID`: STRING Example: "1"\n  - `discontinued`: BOOLEAN \n  - `categoryID`: STRING Available options: [\'1\', \'2\', \'7\', \'6\', \'8\', \'4\', \'3\', \'5\']\n  - `reorderLevel`: INTEGER Min: 0, Max: 30\n  - `unitsInStock`: INTEGER Min: 0, Max: 125\n  - `unitPrice`: FLOAT Min: 2.5, Max: 263.5\n- **Category**\n  - `picture`: STRING Available options: [\'0x151C2F00020000000D000E0014002100FFFFFFFF4269746D\']\n  - `categoryID`: STRING Available options: [\'1\', \'2\', \'3\', \'4\', \'5\', \'6\', \'7\', \'8\']\n  - `description`: STRING Available options: [\'Soft drinks, coffees, teas, beers, and ales\', \'Sweet and savory sauces, relishes, spreads, and se\

In [16]:
text = dataset[1]['text']
print(text)

<start_of_turn>user
Given the context, generate a Cypher query for the following question
context:{"ASSIGNED_TO": {"count": 27, "properties": {}, "type": "relationship"}, "Machine": {"count": 9, "labels": [], "properties": {"Making_Year": {"unique": false, "indexed": false, "type": "INTEGER", "existence": false}, "Machine_ID": {"unique": false, "indexed": false, "type": "INTEGER", "existence": false}, "value_points": {"unique": false, "indexed": false, "type": "FLOAT", "existence": false}, "Class": {"unique": false, "indexed": false, "type": "STRING", "existence": false}, "quality_rank": {"unique": false, "indexed": false, "type": "INTEGER", "existence": false}, "Team": {"unique": false, "indexed": false, "type": "STRING", "existence": false}, "Machine_series": {"unique": false, "indexed": false, "type": "STRING", "existence": false}}, "type": "node", "relationships": {"ASSIGNED_TO": {"count": 27, "direction": "in", "labels": ["RepairAssignment"], "properties": {}}}}, "RepairAssignment

In [20]:
type(text_dataset[:3])

list

In [24]:
from unsloth.chat_templates import get_chat_template

# Prepare tokenizer with gemma-3 chat template
tokenizer = get_chat_template(tokenizer, chat_template="gemma-3")

# Apply template
# text = tokenizer.apply_chat_template(
#     messages,
#     add_generation_prompt=True,
# )

# Generate response
outputs = model.generate(
    **tokenizer(text_dataset[3], return_tensors="pt").to("cuda"),
    max_new_tokens=256,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
)

# print(outputs)
outputs = tokenizer.batch_decode(outputs)
print(outputs)

['<bos><start_of_turn>user\nGiven the context, generate a Cypher query for the following question\ncontext:{"GasStation": {"count": 11, "labels": [], "properties": {"Vice_Manager_Name": {"unique": false, "indexed": false, "type": "STRING", "existence": false}, "Station_ID": {"unique": false, "indexed": false, "type": "INTEGER", "existence": false}, "Manager_Name": {"unique": false, "indexed": false, "type": "STRING", "existence": false}, "Location": {"unique": false, "indexed": false, "type": "STRING", "existence": false}, "Open_Year": {"unique": false, "indexed": false, "type": "INTEGER", "existence": false}, "Representative_Name": {"unique": false, "indexed": false, "type": "STRING", "existence": false}}, "type": "node", "relationships": {"OWNS": {"count": 6, "direction": "in", "labels": ["Company"], "properties": {"Rank_of_the_Year": {"indexed": false, "type": "INTEGER", "existence": false, "array": false}}}}}, "OWNS": {"count": 6, "properties": {"Rank_of_the_Year": {"indexed": fals

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!

In [30]:
# inference 
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(text_dataset[6], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
output = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

<bos><start_of_turn>user
Given the context, generate a Cypher query for the following question
context:Relevant node labels and their properties (with datatypes) are:
Article {title: STRING}
Journal {name: STRING}

Relevant relationships are:
{'start': Article, 'type': PUBLISHED_IN, 'end': Journal }


Relevant relationship properties (with datatypes) are:
PUBLISHED_IN {pages: STRING}
question:Calculate the average name for Journal that is linked to Article via PUBLISHED_IN where pages is 1-31 and has title date before December 31, 2020!<end_of_turn>
<start_of_turn>model
```cypher
MATCH (a:Article)-[:PUBLISHED_IN]->(j:Journal)
WHERE a.title date < "2021-01-01" AND j.name IS NOT NULL
RETURN avg(j.name) AS average_journal_name
```<end_of_turn>



### Run inference on the test set and save the results to CSV

In [64]:
from unsloth import FastLanguageModel
import pandas as pd

FastLanguageModel.for_inference(model)

generated_outputs = []
inference_outputs = []

for text in text_dataset[:128]:
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to("cuda")

    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=1.0,
        top_p=0.95,
        top_k=64
    )

    # Decode generated output
    generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=False)

    # Extract between tags
    start_tag = "<start_of_turn>model"
    end_tag = "<end_of_turn>"

    start_index = generated_text.find(start_tag)
    end_index = generated_text.find(end_tag, start_index)

    if start_index != -1 and end_index != -1:
        extracted = generated_text[start_index + len(start_tag):end_index].strip()
    else:
        extracted = generated_text.strip()  # fallback

    generated_outputs.append(generated_text)
    inference_outputs.append(extracted)

# # Optional: Save to DataFrame
# df = pd.DataFrame({
#     "input": text_dataset[:128],
#     "generated_output": generated_outputs,
#     "inference": inference_outputs
# })

# # Save to file
# df.to_csv("inference_results.csv", index=False)
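The tag-based extraction in the loop above can be pulled out into a small function, which makes its behavior easy to check without a GPU. This is a sketch that mirrors the same `find`-based logic; `extract_model_turn` and the sample strings are hypothetical names and data used only for illustration.

```python
START_TAG = "<start_of_turn>model"
END_TAG = "<end_of_turn>"


def extract_model_turn(generated_text):
    """Return the text between the model-turn tag and the next end-of-turn tag."""
    start_index = generated_text.find(START_TAG)
    end_index = generated_text.find(END_TAG, start_index) if start_index != -1 else -1
    if start_index == -1 or end_index == -1:
        # Fall back to the whole string, e.g. when generation hit max_new_tokens
        return generated_text.strip()
    return generated_text[start_index + len(START_TAG):end_index].strip()


sample = (
    "<bos><start_of_turn>user\nquestion:Count the companies<end_of_turn>\n"
    "<start_of_turn>model\nMATCH (c:Company) RETURN count(c)<end_of_turn>"
)
print(extract_model_turn(sample))

# A generation that was cut off before <end_of_turn> triggers the fallback
print(extract_model_turn("<start_of_turn>model\nMATCH (n) RETURN n"))
```

Note that the fallback returns the full decoded text, prompt and tags included, so any generation that is truncated before `<end_of_turn>` will carry that extra text into the saved results.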

In [48]:
print(inference_outputs[0])
print(len(inference_outputs))

```cypher
MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)
WITH s, COLLECT(p.unitPrice) AS prices
ORDER BY AVG(prices) DESC
LIMIT 5
RETURN s.companyName, AVG(prices) AS averageUnitPrice
```
128


In [41]:
print(dataset.column_names)

['question', 'schema', 'cypher', 'data_source', 'instance_id', 'database_reference_alias', 'conversations', 'text']


In [49]:
# Optional: Save to DataFrame
df = pd.DataFrame({
    "reference": dataset['cypher'][:128],
    "inference": inference_outputs
})

# Save to file
df.to_csv("inference_results.csv", index=False)

In [50]:
# Load the CSV file
df = pd.read_csv("/kaggle/working/inference_results.csv")
print(df.head())

                                           reference  \
0  MATCH (s:Supplier)-[:SUPPLIES]->(p:Product) WI...   
1  MATCH (t:Technician) WHERE NOT EXISTS ((:Repai...   
2  MATCH (n:Topic) WHERE NOT n.label STARTS WITH ...   
3                  MATCH (c:Company) RETURN count(c)   
4  MATCH (a:Article)-[:MENTIONS]->(o:Organization...   

                                           inference  
0  ```cypher\nMATCH (s:Supplier)-[:SUPPLIES]->(p:...  
1  ```cypher\nMATCH (t:Technician)\nWHERE NOT t.A...  
2  ```cypher\nMATCH (t:Topic)\nWHERE NOT t.label ...  
3  ```cypher\nMATCH (c:Company)\nRETURN count(c)\...  
4  ```cypher\nMATCH (o:Organization {name: "New E...  


In [51]:
! pip install evaluate



In [53]:
! pip install rouge_score



In [54]:
import evaluate

# Initialize the BLEU and ROUGE metrics
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# Get the columns as lists
references = df["reference"].tolist()
predictions = df["inference"].tolist()

In [55]:
bleu_result = bleu.compute(predictions=predictions, references=[[ref] for ref in references])
print("BLEU Score:", bleu_result["bleu"])

BLEU Score: 0.30396810910793887


In [56]:
rouge_result = rouge.compute(predictions=predictions, references=references)
print("ROUGE Scores:")
for k, v in rouge_result.items():
    print(f"{k}: {v:.4f}")

ROUGE Scores:
rouge1: 0.6505
rouge2: 0.4419
rougeL: 0.5867
rougeLsum: 0.5862


### Clean the inference output (newlines and Markdown `cypher` code fences)

In [59]:
# Clean the 'inference' column
# Remove the Markdown code fences the model wraps around its queries
df["inference"] = df["inference"].str.replace("```cypher", "", regex=False)
df["inference"] = df["inference"].str.replace("```", "", regex=False)
# Remove any literal backslash-n sequences, then collapse all whitespace
# (including real newlines) into single spaces and trim both ends
df["inference"] = df["inference"].str.replace("\\n", " ", regex=False)
df["inference"] = df["inference"].str.replace(r"\s+", " ", regex=True).str.strip()

# Preview the cleaned output
print(df.head())

                                           reference  \
0  MATCH (s:Supplier)-[:SUPPLIES]->(p:Product) WI...   
1  MATCH (t:Technician) WHERE NOT EXISTS ((:Repai...   
2  MATCH (n:Topic) WHERE NOT n.label STARTS WITH ...   
3                  MATCH (c:Company) RETURN count(c)   
4  MATCH (a:Article)-[:MENTIONS]->(o:Organization...   

                                           inference  
0  MATCH (s:Supplier)-[:SUPPLIES]->(p:Product) WI...  
1  MATCH (t:Technician) WHERE NOT t.ASSIGNED_TO R...  
2  MATCH (t:Topic) WHERE NOT t.label STARTS WITH ...  
3                  MATCH (c:Company) RETURN count(c)  
4  MATCH (o:Organization {name: "New Energy Group...  
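The same cleanup can be written as a plain-string function, which is convenient if you want to clean each output as it is generated instead of post-processing a DataFrame. This is a minimal sketch using only the standard `re` module; `clean_cypher` is a hypothetical helper, and it assumes the only wrapping to remove is the Markdown fence seen in the outputs above.

```python
import re

# Matches an opening fence with the "cypher" language tag or a bare closing fence
FENCE_PATTERN = re.compile(r"```(?:cypher)?")


def clean_cypher(text):
    """Strip Markdown code fences and collapse whitespace into single spaces."""
    without_fences = FENCE_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_fences).strip()


raw = "```cypher\nMATCH (c:Company)\nRETURN count(c)\n```"
print(clean_cypher(raw))
```

Applied to the fenced output for the company-count example, this yields the same single-line query shown in the cleaned preview above.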


In [60]:
# Get the columns as lists
references = df["reference"].tolist()
predictions = df["inference"].tolist()

In [61]:
bleu_result = bleu.compute(predictions=predictions, references=[[ref] for ref in references])
print("BLEU Score:", bleu_result["bleu"])

BLEU Score: 0.33668088266058416


In [62]:
rouge_result = rouge.compute(predictions=predictions, references=references)
print("ROUGE Scores:")
for k, v in rouge_result.items():
    print(f"{k}: {v:.4f}")

ROUGE Scores:
rouge1: 0.6683
rouge2: 0.4549
rougeL: 0.6029
rougeLsum: 0.6019
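One way to see why stripping the fences raises the scores is to compute a token-overlap score by hand. The sketch below uses a simplified unigram F1 over whitespace-separated tokens. It is only in the spirit of ROUGE-1: the real `rouge_score` package tokenizes differently, so the numbers are illustrative, and `unigram_f1` is a hypothetical helper.

```python
from collections import Counter


def unigram_f1(prediction, reference):
    """Whitespace-token unigram overlap F1 (a simplified ROUGE-1-style score)."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "MATCH (c:Company) RETURN count(c)"
fenced = "```cypher\nMATCH (c:Company)\nRETURN count(c)\n```"
cleaned = "MATCH (c:Company) RETURN count(c)"

print(f"fenced:  {unigram_f1(fenced, reference):.2f}")
print(f"cleaned: {unigram_f1(cleaned, reference):.2f}")
```

The fence tokens count as extra predicted tokens that match nothing in the reference, which lowers precision. Keep in mind that BLEU and ROUGE measure surface overlap only; two queries can return the same result while sharing few tokens, and a high-overlap query can still be invalid Cypher.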


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, use either Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [44]:
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

['lora_model/processor_config.json']

In [None]:
# Replace "..." with your Hugging Face write token
model.push_to_hub("Hansiduyapa/cypher_query_generator", token = "...")
tokenizer.push_to_hub("Hansiduyapa/cypher_query_generator", token = "...")


Saved model to https://huggingface.co/Hansiduyapa/cypher_query_generator



The LoRA adapters we just saved can now be loaded back onto the base model for inference.

### Load the LoRA adapters to the Gemma model

Load the Gemma adapters from Hugging Face and use them for inference.

In [6]:
from peft import PeftModel
lora_model = PeftModel.from_pretrained(model, "Hansiduyapa/cypher_query_generator")

In [62]:
FastLanguageModel.for_inference(lora_model) # Enable native 2x faster inference
inputs = tokenizer(dataset[13]['text'], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
# Generate with the adapter-wrapped model that was prepared for inference above
_ = lora_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

<bos><start_of_turn>user
Given the context, generate a Cypher query for the following question
context:Node properties:
User {betweenness: FLOAT, location: STRING, followers: INTEGER, following: INTEGER, profile_image_url: STRING, screen_name: STRING, name: STRING, url: STRING, statuses: INTEGER}
Me {profile_image_url: STRING, betweenness: FLOAT, following: INTEGER, url: STRING, location: STRING, followers: INTEGER, screen_name: STRING, name: STRING}
Tweet {created_at: DATE_TIME, id: INTEGER, id_str: STRING, text: STRING, favorites: INTEGER, import_method: STRING}
Hashtag {name: STRING}
Link {url: STRING}
Source {name: STRING}
Relationship properties:
SIMILAR_TO {score: FLOAT}
The relationships:
(:User)-[:FOLLOWS]->(:User)
(:User)-[:FOLLOWS]->(:Me)
(:User)-[:POSTS]->(:Tweet)
(:User)-[:INTERACTS_WITH]->(:User)
(:User)-[:SIMILAR_TO]->(:User)
(:User)-[:SIMILAR_TO]->(:Me)
(:Me)-[:FOLLOWS]->(:User)
(:Me)-[:POSTS]->(:Tweet)
(:Me)-[:INTERACTS_WITH]->(:User)
(:Me)-[:RT_MENTIONS]->(:User)
(:Me)