# Fine-tuning Llama for Datetime Extraction

I'm going to fine-tune a Llama model to convert natural language time expressions into structured datetime formats.


In [1]:
!uv pip install transformers unsloth trl


In [2]:
import torch
import json
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth.chat_templates import get_chat_template

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Loading the Base Model

I'm loading the Llama 3.2 3B Instruct model with 4-bit quantization to save memory:

In [3]:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)



==((====))==  Unsloth 2025.8.1: Fast Llama patching. Transformers: 4.54.0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


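To see why 4-bit loading matters on a 15 GB T4, here is a rough back-of-the-envelope sketch of the memory the weights alone would need. The parameter count is copied from the training banner further down in this notebook (it includes the LoRA adapter weights), and `weight_gib` is a hypothetical helper written only for this illustration. The estimate ignores activations, optimizer state, and quantization metadata, so treat it as a lower bound rather than a measurement.

```python
# Total parameter count as reported by the Unsloth training banner in this notebook.
TOTAL_PARAMS = 3_237_063_680


def weight_gib(num_params: int, bits_per_param: float) -> float:
    """Return the storage needed for num_params weights, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


fp16_gib = weight_gib(TOTAL_PARAMS, 16)  # every weight stored in 16 bits
int4_gib = weight_gib(TOTAL_PARAMS, 4)   # every weight stored in 4 bits (idealized)

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f" 4-bit weights: {int4_gib:.2f} GiB")
```

The downloaded checkpoint above is 2.35 GB, which is larger than the idealized 4-bit figure. That is probably because some tensors, such as the embeddings, stay in higher precision and the quantized layers carry extra scaling constants.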

## Setting up LoRA Adapter

I'm configuring the model with LoRA (Low-Rank Adaptation) for efficient fine-tuning and setting up the chat template:

In [4]:

# r is the LoRA rank: a higher rank adds trainable capacity but costs more memory.
# r=16 is a common middle ground for small fine-tuning datasets.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")


Unsloth 2025.8.1 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
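
As a sanity check on what `r=16` buys, the number of trainable LoRA parameters can be worked out by hand. For a linear layer with `in_features` inputs and `out_features` outputs, LoRA adds two matrices with `r * in_features` and `out_features * r` entries. The sketch below assumes the published Llama 3.2 3B dimensions (hidden size 3072, MLP size 8192, 8 key/value heads of dimension 128); those numbers come from the model's config, not from this notebook, and `lora_params` is a hypothetical helper.

```python
# Assumed Llama 3.2 3B dimensions (taken from the published model config).
HIDDEN = 3072
INTERMEDIATE = 8192
KV_DIM = 8 * 128   # 8 key/value heads x head dimension 128
NUM_LAYERS = 28    # matches the "patched 28 layers" message above
RANK = 16

# (in_features, out_features) for each projection listed in target_modules.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters")
```

Under these assumptions the total is 24,313,856, the same figure the Unsloth banner reports as trainable parameters when training starts.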


## Data Loading and Processing

I'm creating functions to load datetime conversion data from a JSONL file and format it for training:

In [8]:
def load_datetime_data(file_path):
    """Load datetime conversion data from a JSONL file (one JSON object per line)"""
    conversations = []

    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                try:
                    data = json.loads(line)

                    # Create conversation format
                    conversation = [
                        {
                            "role": "user",
                            "content": f"Convert the following natural language time expression to datetime format: '{data['input']}'"
                        },
                        {
                            "role": "assistant",
                            "content": json.dumps(data['output'], indent=2)
                        }
                    ]
                    conversations.append(conversation)

                except json.JSONDecodeError:
                    print(f"Skipping invalid JSON line: {line}")
                    continue

    return conversations

def create_training_dataset(conversations):
    """Convert conversations to training format"""
    formatted_texts = []

    for conversation in conversations:
        # Apply chat template to each conversation
        formatted_text = tokenizer.apply_chat_template(
            conversation,
            tokenize=False,
            add_generation_prompt=False
        )
        formatted_texts.append(formatted_text)

    # Create dataset
    dataset = Dataset.from_dict({"text": formatted_texts})
    return dataset

# Load your datetime data (replace with your actual file path)
print("Loading datetime conversion data...")
conversations = load_datetime_data("/content/time_parsing_dataset.jsonl")
print(f"Loaded {len(conversations)} training examples")

# Create training dataset
dataset = create_training_dataset(conversations)
print("Dataset created successfully")
print(f"Sample training text:\n{dataset[0]['text'][:500]}...")


Loading datetime conversion data...
Loaded 200 training examples
Dataset created successfully
Sample training text:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Convert the following natural language time expression to datetime format: 'this morning at 7-10'<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{
  "start": "2025-08-05T07:00:00",
  "end": "2025-08-05T10:00:00"
}<|eot_id|>...
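
The loader above only skips lines that are not valid JSON, so a record with a missing key or a malformed timestamp would still slip through or raise later. Below is a minimal sketch of a stricter per-line check. It assumes every record has a string `input` and an `output` object with ISO 8601 `start` and `end` fields, which is inferred from the single sample printed above; the real dataset may contain other shapes. `validate_record` is a hypothetical helper, not part of the original notebook.

```python
import json
from datetime import datetime


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check the fields the loader relies on."""
    record = json.loads(line)
    if not isinstance(record.get("input"), str):
        raise ValueError("missing or non-string 'input'")
    output = record.get("output")
    if not isinstance(output, dict) or "start" not in output or "end" not in output:
        raise ValueError("'output' must be an object with 'start' and 'end'")
    # fromisoformat raises ValueError on malformed timestamps.
    start = datetime.fromisoformat(output["start"])
    end = datetime.fromisoformat(output["end"])
    if start > end:
        raise ValueError("'start' is later than 'end'")
    return record


# Record reconstructed from the sample training text printed above.
sample = (
    '{"input": "this morning at 7-10", '
    '"output": {"start": "2025-08-05T07:00:00", "end": "2025-08-05T10:00:00"}}'
)
record = validate_record(sample)
print(record["input"], "->", record["output"])
```

The same check could be pointed at the fine-tuned model's responses, since they are expected to follow the same JSON shape.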


## Setting up the Trainer

I'm configuring the SFTTrainer with an effective batch size of 8 (2 examples per device, 4 gradient-accumulation steps), 10 warmup steps, and 100 training steps in total:

In [9]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=100,  # about 4 passes over the 200 examples at an effective batch size of 8
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        output_dir="datetime_model_outputs",
        save_steps=50,
        dataloader_drop_last=False,
    ),
)

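
The relationship between `per_device_train_batch_size`, `gradient_accumulation_steps`, and `max_steps` is easy to lose track of, so here is the arithmetic as a short sketch. The dataset size of 200 comes from the data-loading cell and the single GPU from the Unsloth banner; the variable names are only for illustration.

```python
import math

per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1        # single Tesla T4, per the Unsloth banner
max_steps = 100
num_examples = 200  # reported by the data-loading cell

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
# Optimizer steps needed to see the whole dataset once.
steps_per_epoch = math.ceil(num_examples / effective_batch)
# Number of passes over the dataset implied by max_steps.
epochs = max_steps / steps_per_epoch

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"epochs:               {epochs:g}")
```

This gives an effective batch size of 8 and 4 epochs, which agrees with the figures Unsloth prints when training starts.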

## Training the Model

Now I'm running the training loop and then saving the fine-tuned model and tokenizer:

In [10]:
print("Starting training...")
trainer.train()

# Save the fine-tuned model
print("Saving fine-tuned model...")
model.save_pretrained("datetime_finetuned_model")
tokenizer.save_pretrained("datetime_finetuned_model")

Starting training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 200 | Num Epochs = 4 | Total steps = 100
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 24,313,856 of 3,237,063,680 (0.75% trained)




| Step | Training Loss |
|------|---------------|
| 10   | 2.6954        |
| 20   | 0.7411        |
| 30   | 0.4682        |
| 40   | 0.4039        |
| 50   | 0.3036        |
| 60   | 0.2149        |
| 70   | 0.1918        |
| 80   | 0.1738        |
| 90   | 0.1598        |
| 100  | 0.159         |


Saving fine-tuned model...


('datetime_finetuned_model/tokenizer_config.json',
 'datetime_finetuned_model/special_tokens_map.json',
 'datetime_finetuned_model/chat_template.jinja',
 'datetime_finetuned_model/tokenizer.json')

## Additional Testing

I'm reloading the saved model and running it on a handful of sample time expressions to spot-check its output:

In [11]:
# Load model for inference
print("Loading model for inference...")
inference_model, inference_tokenizer = FastLanguageModel.from_pretrained(
    model_name="./datetime_finetuned_model",
    max_seq_length=2048,
    load_in_4bit=True
)

# Test the fine-tuned model
test_prompts = [
    "tomorrow at 3 PM",
    "next Friday evening",
    "in 2 hours",
    "last Monday morning",
    "Christmas Day 2025"
]

print("\n" + "="*50)
print("Testing fine-tuned model:")
print("="*50)

for prompt in test_prompts:
    formatted_prompt = inference_tokenizer.apply_chat_template(
        [{
            "role": "user",
            "content": f"Convert the following natural language time expression to datetime format: '{prompt}'"
        }],
        tokenize=False,
        add_generation_prompt=True
    )

    model_inputs = inference_tokenizer(formatted_prompt, return_tensors="pt").to("cuda")

    with torch.no_grad():
        generated_ids = inference_model.generate(
            **model_inputs,
            max_new_tokens=256,
            temperature=0.1,  # Lower temperature for more consistent output
            do_sample=True,
            pad_token_id=inference_tokenizer.pad_token_id,
            eos_token_id=inference_tokenizer.eos_token_id,
        )

    # Extract only the generated part (remove input prompt)
    generated_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
    response = inference_tokenizer.decode(generated_ids, skip_special_tokens=True)

    print(f"\nInput: {prompt}")
    print(f"Output: {response.strip()}")
    print("-" * 30)

print("\nFine-tuning completed successfully!")

Loading model for inference...
==((====))==  Unsloth 2025.8.1: Fast Llama patching. Transformers: 4.54.0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

Testing fine-tuned model:

Input: tomorrow at 3 PM
Output: {
  "start": "2025-08-06T15:00:00",
  "end": "2025-08-06T15:00:00"
}
------------------------------

Input: next Friday evening
Output: {
  "start": "2025-08-15T18:00:00",
  "end": "2025-08-15T21:00:00"
}
------------------------------

Input: in 2 hours
Output: {
  "start": "2025-08-05T14:00:00",
  "end": "2025-08-05T14:00:00"
}
------------------------------

Input: last Monday morning
Output: {
  "start": "2025-08-11T06:00:00",
  "end": "2025-08-