To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a></a> Join Discord if you need help + ‚≠ê <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ‚≠ê
</div>

To install Unsloth your local device, follow [our guide](https://docs.unsloth.ai/get-started/install-and-update). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


Long-Context GRPO for reinforcement learning ‚Äî train stably at massive sequence lengths. Fine-tune models with up to 7x more context length efficiently. [Read Blog](https://unsloth.ai/docs/new/grpo-long-context)

3√ó faster training with optimized sequence packing ‚Äî higher throughput with no quality loss.[Read Blog](https://unsloth.ai/docs/new/3x-faster-training-packing)

500k context-length fine-tuning ‚Äî push long-context models further with memory-efficient training. [Read Blog](https://unsloth.ai/docs/new/500k-context-length-fine-tuning)

Introducing FP8 precision training for faster RL inference. [Read Blog](https://docs.unsloth.ai/new/fp8-reinforcement-learning).

Unsloth's [Docker image](https://hub.docker.com/r/unsloth/unsloth) is here! Start training with no setup & environment issues. [Read our Guide](https://docs.unsloth.ai/new/how-to-train-llms-with-unsloth-and-docker).

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [1]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9]{1,}\.[0-9]{1,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.33.post1" if v=="2.9" else "0.0.32.post2" if v=="2.8" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2

### Unsloth

In [2]:
from unsloth import FastVisionModel # FastLanguageModel for LLMs
import torch

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit", # Llama 3.2 vision support
    "unsloth/Llama-3.2-11B-Vision-bnb-4bit",
    "unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit", # Can fit in a 80GB card!
    "unsloth/Llama-3.2-90B-Vision-bnb-4bit",

    "unsloth/Pixtral-12B-2409-bnb-4bit",              # Pixtral fits in 16GB!
    "unsloth/Pixtral-12B-Base-2409-bnb-4bit",         # Pixtral base model

    "unsloth/Qwen2-VL-2B-Instruct-bnb-4bit",          # Qwen2 VL support
    "unsloth/Qwen2-VL-7B-Instruct-bnb-4bit",
    "unsloth/Qwen2-VL-72B-Instruct-bnb-4bit",

    "unsloth/llava-v1.6-mistral-7b-hf-bnb-4bit",      # Any Llava variant works!
    "unsloth/llava-1.5-7b-hf-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastVisionModel.from_pretrained(
    #"unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit",
    "unsloth/Qwen2.5-VL-7B-Instruct", # The official unsloth specific model
    load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA.
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
)

ü¶• Unsloth: Will patch your computer to enable 2x faster free finetuning.
ü¶• Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2026.1.4: Fast Qwen2_5_Vl patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/6.90G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/237 [00:00<?, ?B/s]

preprocessor_config.json:   0%|          | 0.00/791 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/605 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]

video_preprocessor_config.json:   0%|          | 0.00/935 [00:00<?, ?B/s]

chat_template.json: 0.00B [00:00, ?B/s]

We now add LoRA adapters for parameter efficient finetuning - this allows us to only efficiently train 1% of all parameters.

**[NEW]** We also support finetuning ONLY the vision part of the model, or ONLY the language part. Or you can select both! You can also select to finetune the attention or the MLP layers!

In [3]:
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = True, # False if not finetuning vision layers
    finetune_language_layers   = True, # False if not finetuning language layers
    finetune_attention_modules = True, # False if not finetuning attention layers
    finetune_mlp_modules       = True, # False if not finetuning MLP layers

    r = 16,           # The larger, the higher the accuracy, but might overfit
    lora_alpha = 16,  # Recommended alpha == r at least
    lora_dropout = 0,
    bias = "none",
    random_state = 3407, # Changer √ßa m√™me si √ßa change rien
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
    # target_modules = "all-linear", # Optional now! Can specify a list if needed
)

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
import json
import os

base_path = '/content/drive/MyDrive/EVAHAN/train_data/'
dataset_files = ['Dataset_A.json', 'Dataset_B.json', 'Dataset_C.json']
                # ['Dataset_A.json', 'Dataset_C.json']
                # ['Dataset_B.json']
combined_dataset = []

for filename in dataset_files:
    filepath = os.path.join(base_path, filename)
    with open(filepath, 'r') as f:
        data = json.load(f)
        combined_dataset.extend(data)

print(f"Successfully loaded and combined {len(dataset_files)} datasets.")
print(f"Total items in combined dataset: {len(combined_dataset)}")

Successfully loaded and combined 3 datasets.
Total items in combined dataset: 15000


In [22]:
from sklearn.model_selection import train_test_split

# --- NEW: Select a 5% subset of the combined_dataset for faster processing ---
subset_size = 0.05 # 5% of the total data
subset_data, _ = train_test_split(combined_dataset, test_size=1 - subset_size, random_state=42)

print(f"Using a {subset_size*100}% subset of the data: {len(subset_data)} samples")

# Split into training (75%) and temporary (25%) datasets from the subset
train_data, temp_data = train_test_split(subset_data, test_size=0.25, random_state=42)

# Split temporary (25%) into testing (15% of total) and validation (10% of total)
# (0.15 / 0.25 = 0.6 for test_size, since temp_data is 25% of the subset data)
test_data, val_data = train_test_split(temp_data, test_size=0.4, random_state=42) # 0.4 of 25% is 10% of total

print(f"Training set size: {len(train_data)}")
print(f"Testing set size: {len(test_data)}")
print(f"Validation set size: {len(val_data)}")

Using a 5.0% subset of the data: 750 samples
Training set size: 562
Testing set size: 112
Validation set size: 76


In [7]:
import PIL.Image
print("PIL.Image imported.")

PIL.Image imported.


In [23]:
from tqdm import tqdm # h√©h√©
import json # Import json for creating the assistant's JSON response
import os
import cv2 # NEW: Import OpenCV for image processing
import numpy as np # NEW: Import numpy for image processing

instruction = """Analyze the provided image of ancient Chinese bamboo slips or other materials.

**Task Guidelines:**
1. **Transcription:** Transcribe the characters into standard Traditional Chinese (Unicode). Do not modernize the grammar.
2. **Legibility:** - If a character is an archaic variant, use the standard Traditional Chinese equivalent.
    - If a character is completely illegible due to damage, use '‚ñ°'.
3. **Output:** Return ONLY the JSON object below.

```json
{
  "transcription": "TEXT_HERE",
  "notes": "Brief notes on damage/layout"
}
"""

# 1. **Reading Order:** If there is text, the text is generally written vertically (top to bottom) and arranged in columns from right to left. Respect this strict reading order.

base_path = '/content/drive/MyDrive/EVAHAN/train_data/' # Ensure base_path is accessible

# Original convert_to_conversation function (commented out)
# def convert_to_conversation(sample):
#     conversation = [
#         { "role": "user",
#           "content" : [
#             {"type" : "text",  "text"  : instruction},
#             {"type" : "image", "image" : sample["image"]} ]
#         },
#         { "role" : "assistant",
#           "content" : [
#             {"type" : "text",  "text"  : sample["text"]} ]
#         },
#     ]
#     return { "messages" : conversation }
# pass

# Adapted convert_to_conversation function
def convert_to_conversation_new(sample):
    if "text" not in sample:
        # Skip samples that do not have a 'text' key
        return None

    image_path = os.path.join(base_path, sample["image_path"])
    # PEUT-√äTRE MODIFIER √áA ?
    try:
    #    # Original:
      image = PIL.Image.open(image_path).convert("RGB")
    #    # NEW: Load image using OpenCV
    #    image_cv2 = cv2.imread(image_path)
    #    if image_cv2 is None:
    #        print(f"Error: Could not load image {image_path}")
    #        return None

    #    # Convert to grayscale
    #    gray_image = cv2.cvtColor(image_cv2, cv2.COLOR_BGR2GRAY)

        # Apply adaptive thresholding (binarization) to remove shadows
        # ADAPTIVE_THRESH_GAUSSIAN_C: uses a gaussian weighted sum of neighborhood values
        # THRESH_BINARY: the type of thresholding applied
        # 255: max value to use with THRESH_BINARY
        # 11: block size (size of neighborhood to calculate threshold for)
        # 2: constant subtracted from the mean or weighted mean
    #    binarized_image = cv2.adaptiveThreshold(gray_image, 255,
    #                                             cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    #                                             cv2.THRESH_BINARY, 11, 2)

        # Convert OpenCV image (numpy array) back to PIL Image
     #   image = PIL.Image.fromarray(binarized_image).convert("RGB") # Ensure RGB for model input

    except Exception as e:
        print(f"Error processing image {image_path}: {e}")
        return None # Skip samples with problematic images

    # Construct the assistant's response as a JSON string based on the instruction format
    assistant_response_dict = {
        "transcription": sample["text"],
        "notes": "" # Assuming no 'notes' provided in the raw dataset, default to empty string
    }
    assistant_response_json_string = json.dumps(assistant_response_dict, ensure_ascii=False) # ensure_ascii=False to preserve Chinese characters

    conversation = [
        { "role": "user",
          "content" : [
            {"type" : "text",  "text"  : instruction},
            {"type" : "image", "image" : image} ]
        },
        { "role" : "assistant",
          "content" : [
            {"type" : "text",  "text"  : assistant_response_json_string} ]
        },
    ]
    return { "messages" : conversation }

# Original application of convert_to_conversation (commented out)
# converted_dataset = [convert_to_conversation(sample) for sample in dataset]

# Apply the new function to the training and validation datasets
# Filter out None values in case of image loading errors or missing 'text' key
converted_train_dataset = [convert_to_conversation_new(sample) for sample in tqdm(train_data) if convert_to_conversation_new(sample) is not None]
converted_val_dataset = [convert_to_conversation_new(sample) for sample in tqdm(val_data) if convert_to_conversation_new(sample) is not None]

print(f"Converted training dataset size: {len(converted_train_dataset)}")
print(f"Converted validation dataset size: {len(converted_val_dataset)}")

100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 562/562 [02:37<00:00,  3.56it/s]
100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 76/76 [00:16<00:00,  4.66it/s]

Converted training dataset size: 376
Converted validation dataset size: 41





In [24]:
train_data[0]

{'image_path': 'Dataset_B/b_2575.jpg',
 'regions': [{'label': 'seal',
   'text': '',
   'points': [[211, 12], [408, 12], [408, 207], [211, 207]]},
  {'label': 'book_edge',
   'text': '',
   'points': [[2, 15], [43, 15], [43, 785], [2, 785]]}]}

In [25]:
converted_train_dataset[0]

{'messages': [{'role': 'user',
   'content': [{'type': 'text',
     'text': 'Analyze the provided image of ancient Chinese bamboo slips or other materials.\n\n**Task Guidelines:**\n1. **Transcription:** Transcribe the characters into standard Traditional Chinese (Unicode). Do not modernize the grammar.\n2. **Legibility:** - If a character is an archaic variant, use the standard Traditional Chinese equivalent.\n    - If a character is completely illegible due to damage, use \'‚ñ°\'.\n3. **Output:** Return ONLY the JSON object below.\n\n```json\n{\n  "transcription": "TEXT_HERE",\n  "notes": "Brief notes on damage/layout"\n}\n'},
    {'type': 'image',
     'image': <PIL.Image.Image image mode=RGB size=1414x75>}]},
  {'role': 'assistant',
   'content': [{'type': 'text',
     'text': '{"transcription": "Á™àÁ™àËíºËãî‰∏ÄË∑ØÈááÈááÈªÑËä±ÂÖ©Èñã‰∏ç‰Ωø‰πóËªíÈ∂¥ÈÅéÂè™ÈÄöÈÄÅ", "notes": ""}'}]}]}

In [27]:
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

FastVisionModel.for_training(model) # Enable for training!

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer), # Must use!
    train_dataset = converted_train_dataset, # Updated to use the new training dataset
    eval_dataset = converted_val_dataset,    # Added for evaluation
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # Set num_train_epochs to 1 and max_steps to -1 to train for one full epoch.
        # Adjust these values based on desired training duration and dataset size.
        max_steps = -1,                    # Set to -1 to train for num_train_epochs
        num_train_epochs = 1,              # Train for 1 full epoch
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.001,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",     # For Weights and Biases

        # You MUST put the below items for vision finetuning:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_length = 2048,
    ),
)

Unsloth: Model does not have a default image size - using 512


In [31]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
7.484 GB of memory reserved.


In [28]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 376 | Num Epochs = 1 | Total steps = 47
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 51,521,536 of 8,343,688,192 (0.62% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,4.1459
2,4.1641
3,4.0701
4,3.6878
5,3.2252
6,2.8665
7,2.4637
8,2.1551
9,1.9196
10,1.6814


In [32]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

526.1084 seconds used for training.
8.77 minutes used for training.
Peak reserved memory = 7.484 GB.
Peak reserved memory for training = 0.0 GB.
Peak reserved memory % of max memory = 50.77 %.
Peak reserved memory for training % of max memory = 0.0 %.


In [29]:
FastVisionModel.for_inference(model) # Enable for inference!

# Comment out original image and instruction
# image = dataset[2]["image"]
# instruction = "Write the LaTeX representation for this image."

# Select an example from the test_data split
import os
import PIL.Image

test_example = test_data[0] # Get the first example from the test_data split

# Load the image from the test example using its path and the base_path
image_path = os.path.join(base_path, test_example["image_path"])
image = PIL.Image.open(image_path).convert("RGB")

# Use the globally defined instruction for the model
instruction = instruction

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction}
    ]}
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False,
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

{"transcription": "ÂÖºÂê´Ëê¨ÊÆäÈùúÁ∑£‰∏çÂêåÁöÜÂÖ∂Âæ∑", "notes": ""}<|im_end|>


In [33]:
FastVisionModel.for_inference(model) # Ensure model is in inference mode

import json
from tqdm import tqdm
import os
import PIL.Image

print("Comparing model inference with ground truth for the first 20 test examples:")

# Iterate through the first 20 examples of the test_data
for i, test_example in tqdm(enumerate(test_data[:20]), total=20, desc="Performing inference"): # Changed to `test_data[:20]`
    # a. Load the image using its image_path
    image_path = os.path.join(base_path, test_example["image_path"])
    try:
        image = PIL.Image.open(image_path).convert("RGB")
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        continue

    # b. Construct the messages list for the tokenizer
    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": instruction}
        ]}
    ]

    # c. Apply the chat template to messages
    input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)

    # d. Prepare inputs for the model
    inputs = tokenizer(
        image,
        input_text,
        add_special_tokens = False,
        return_tensors = "pt",
    ).to("cuda")

    # e. Generate the model's output
    outputs = model.generate(**inputs, max_new_tokens = 128,
                       use_cache = True, temperature = 1.5, min_p = 0.1)

    # f. Decode the generated tokens to text
    # Skip the prompt and only decode the generated part
    generated_text_tokens = outputs[0][len(inputs["input_ids"][0]):]
    model_output_raw = tokenizer.decode(generated_text_tokens, skip_special_tokens=True)

    # g. Parse the model's text output as a JSON string
    # The model output may include <|im_end|> token, which needs to be removed before JSON parsing
    model_output_cleaned = model_output_raw.replace("<|im_end|>", "").strip()
    predicted_transcription = "N/A (parsing error)"
    try:
        parsed_output = json.loads(model_output_cleaned)
        predicted_transcription = parsed_output.get("transcription", "N/A (transcription key missing)")
    except json.JSONDecodeError as e:
        print(f"JSON decoding error for output: {model_output_cleaned} - {e}")
        predicted_transcription = f"JSON Error: {model_output_cleaned}"

    # h. Extract the ground truth transcription
    ground_truth_transcription = test_example.get("text", "N/A (ground truth missing)")

    # i. Print the comparison
    print(f"\n--- Example {i+1} ---")
    print(f"Model Output:     {predicted_transcription}")
    print(f"Ground Truth:     {ground_truth_transcription}")

Comparing model inference with ground truth for the first 20 test examples:


Performing inference:   5%|‚ñå         | 1/20 [00:05<01:37,  5.13s/it]


--- Example 1 ---
Model Output:     È°õÂê´Ëê¨ÊÆäÈùúË∫Å‰∏çÂêåÁöÜÂÖ∂Âæ≥
Ground Truth:     Ë∂£ËàçËê¨ÊÆäÈùúË∫Å‰∏çÂêåÁï∂ÂÖ∂Ê¨£


Performing inference:  10%|‚ñà         | 2/20 [00:09<01:25,  4.74s/it]


--- Example 2 ---
Model Output:     ÁôæÈáåÂª∂Ê¥•Ê≠≥Ë¢´ÁÅΩÈáéÁî∞ÁßãÈªçÂçäËï©Ëé±Âπ∏Â≠ò
Ground Truth:     ÁôæÈáåÂª∂Ê¥•Ê≠≤Ë¢´ÁÅΩÈáéÁî∞ÁßãÈªçÂçäËíøËêäÂπ∏Â≠ò


Performing inference:  15%|‚ñà‚ñå        | 3/20 [00:12<01:06,  3.91s/it]


--- Example 3 ---
Model Output:     Ë£Ω
Ground Truth:     N/A (ground truth missing)


Performing inference:  20%|‚ñà‚ñà        | 4/20 [00:16<01:03,  3.97s/it]


--- Example 4 ---
Model Output:     ÂøóÂÇ≥Áï•Êö®ÊªáÊ∫™ÊùéÂÖ¨Âªü
Ground Truth:     ÂøóÂÇ≥Áï•Êö®ÊªÑÊ∫üÊùéÂÖ¨ÁÇ∫


Performing inference:  25%|‚ñà‚ñà‚ñå       | 5/20 [00:20<01:00,  4.05s/it]


--- Example 5 ---
Model Output:     Êù±ÊòåÈÅì‰∏≠Ê¨°ÈüªÁ≠îÂ§©Âπ≥Âæ©Ëè¥‰∫åÈ¶ñ
Ground Truth:     Êù±ÊòåÈÅì‰∏≠Ê¨°ÈüªËçÖÂ§©Âπ≥Âæ©§≤Ö‰∫åÈ¶ñ


Performing inference:  30%|‚ñà‚ñà‚ñà       | 6/20 [00:24<00:56,  4.00s/it]


--- Example 6 ---
Model Output:     Êñπ‰æøÁÑ°ÊâÄÂæóÁÇ∫Êñπ‰æøÂÖ∑ÂõûÂçÑÊô∫
Ground Truth:     Êñπ‰æøÁÑ°ÊâÄÂæóÁÇ∫Êñπ‰æøÈÄ•Âêë‰∏ÄÂàáÊô∫Êô∫


Performing inference:  35%|‚ñà‚ñà‚ñà‚ñå      | 7/20 [00:28<00:49,  3.83s/it]


--- Example 7 ---
Model Output:     ÁæΩÊùñ
Ground Truth:     N/A (ground truth missing)


Performing inference:  40%|‚ñà‚ñà‚ñà‚ñà      | 8/20 [00:32<00:47,  3.97s/it]


--- Example 8 ---
Model Output:     ÂÖ¨Ë¶ã‰πòËôúÈõñÁÑ°ÁÑ∂ÊêçËá≥ÊªøËÇÜ
Ground Truth:     ‰ª•Ëâ≤†ÅÖÊÄßÁ©∫ËàáÂΩºËã¶ÈõÜÊªÖÈÅìËÅñË´¶ÁÑ°


Performing inference:  45%|‚ñà‚ñà‚ñà‚ñà‚ñå     | 9/20 [00:38<00:49,  4.54s/it]


--- Example 9 ---
Model Output:     ÊãæÂÖ´ÈåêÂ§ßÂÖ±‰æõÂΩ°Â∑ù‰øÆÊñπ‰æøÊó†Áîü
Ground Truth:     Êç®ÂçÅÂÖ´‰Ωõ‰∏çÂÖ±Ê≥ïÁÑ°‰∫åÁÇ∫Êñπ‰æøÊó†Áîü


Performing inference:  50%|‚ñà‚ñà‚ñà‚ñà‚ñà     | 10/20 [00:43<00:46,  4.67s/it]


--- Example 10 ---
Model Output:     Á´ãÂëΩÂæóÁîüÈï∑ÊûØÊûùÊ¨≤Ë°ÜËßÄÂ¶ÇÊñØ‰πü
Ground Truth:     Á´ã‰ª§ÂæóÁîüÈï∑ÊïÖÊ≠§Ëà¨†∞•Ê≥¢ÁæÖËúúÂ§öÊñº


Performing inference:  55%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå    | 11/20 [00:49<00:45,  5.06s/it]


--- Example 11 ---
Model Output:     Âù§ÂÖ≠ÂÖ≠Â∑ΩÈõ¢ÂÖå‰∏â‰∫§Âπ∂Ê≠£ÂõõÂÖ≠‰∫î‰∫åËá≥Ê≠£Ë•øËá≥Ë•øÂçóË•øÂçóËá≥‰∏úÂåóËá≥‰∏ú
Ground Truth:     N/A (ground truth missing)


Performing inference:  60%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà    | 12/20 [00:52<00:36,  4.56s/it]


--- Example 12 ---
Model Output:     ÁñëÊòØÊòüËèØÊÆä‰∏ã‰πùÈúÑ
Ground Truth:     ÁñëÊòØÁæ£Âßù‰∏ã‰πùÈúÑ


Performing inference:  65%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå   | 13/20 [00:57<00:32,  4.67s/it]


--- Example 13 ---
Model Output:     ÈÅìÁÑ°ÈáãÁüúÊ®ΩÈ£≤ÊïÖÊàí‰ΩÜÂØ∂È¨òË´∏Ëà¨Ëã•
Ground Truth:     ÈÅç†ÅÖÁÇ∫Â∞äÁÇ∫Â∞éÊïÖÊàë‰ΩÜÂª£Á®±ËÆÉËà¨†∞•


Performing inference:  70%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà   | 14/20 [01:01<00:26,  4.43s/it]


--- Example 14 ---
Model Output:     Êñπ‰æøÂõûÊóßÁü•Êô∫ÊÖßËÄÖÂÖºÊ≠£Á≠â
Ground Truth:     Êñπ‰æøÂªªÂêë‰∏ÄÂàáÊô∫Êô∫‰øÆÁøíÁÑ°‰∏äÊ≠£Á≠â


Performing inference:  75%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå  | 15/20 [01:05<00:21,  4.32s/it]


--- Example 15 ---
Model Output:     Áü•Â•≥‰∏îÂñúÂãøÊÄùÊ∞¥Êõ≤‰∏≠Ê≠åËÅ≤
Ground Truth:     Êô∫Êô∫ÂÆâ‰ΩèÁúüÂ¶Ç‰πÉËá≥‰∏çÊÄùË≠∞ÁïåÊÖ∂Âñú


Performing inference:  80%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  | 16/20 [01:10<00:17,  4.44s/it]


--- Example 16 ---
Model Output:     ÂÖâÂ§©Ê•µÂÖâÂáÄÂ§©ÊñΩËØØÂèØÂæóÁî∞ÈùíÁÖßËã•
Ground Truth:     ÂÖâÂ§©Ê•µÂÖâÊ∑®Â§©ÊñΩË®≠ÂèØÂæóÁî±Ê≠§Ëà¨†∞•


Performing inference:  85%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå | 17/20 [01:13<00:12,  4.15s/it]


--- Example 17 ---
Model Output:     Ê∑±Â∑û‰∫ïÊ≥âÁü≥
Ground Truth:     N/A (ground truth missing)


Performing inference:  90%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà | 18/20 [01:17<00:08,  4.00s/it]


--- Example 18 ---
Model Output:     Âú∞Â£áÊ≠£‰ΩçË±Ü
Ground Truth:     N/A (ground truth missing)


Performing inference:  95%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå| 19/20 [01:21<00:04,  4.20s/it]


--- Example 19 ---
Model Output:     Â§ßÈÅìÂ§ßÂπ∏Â§ßÂæ∑ÂçÅÂπ¥ÁÑ°ÂÖ•ÈùûÊ≥ïÊÇ£Èõ£
Ground Truth:     Â§ßÂ§ß‰∏ìÂ§ßÂçÅÂÖ´‰Ωõ‰∏çÂÖ±Ê≥ïÂñÑ


Performing inference: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 20/20 [01:29<00:00,  4.46s/it]


--- Example 20 ---
Model Output:     Êô®Áú†ÁîòÁù°‰πÉÂáùÂÖ∂ÁºòÊÅ¨Êó¶ÂÖ∂Èú≤Êú™Á¶ªÂ±±Ê∫™Êú™ÁÖßÊó•
Ground Truth:     Á•ÄÊó£Â≥©‰∏îÁøº‰πÉË≠∞Êí§ÂÖ∂ÂÉè‰∏¶Á•ÄÂõõÂêõ‰∫é‰∏≠ÈªúÈÇ™ÁøºÊ≠£ÂÖ∏ÂàëÊò≠ÁÑâÂ∑•





In [40]:
import os

# Ensure the current directory is where we want the files
!wget -O task_a_c_eva.py https://raw.githubusercontent.com/GoThereGit/EvaHan/refs/heads/main/task_a_c_eva.py
!wget -O task_b_eva.py https://raw.githubusercontent.com/GoThereGit/EvaHan/refs/heads/main/task_b_eva.py

print("Downloaded task_a_c_eva.py and task_b_eva.py")

--2026-02-02 23:24:40--  https://raw.githubusercontent.com/GoThereGit/EvaHan/refs/heads/main/task_a_c_eva.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14094 (14K) [text/plain]
Saving to: ‚Äòtask_a_c_eva.py‚Äô


2026-02-02 23:24:40 (23.4 MB/s) - ‚Äòtask_a_c_eva.py‚Äô saved [14094/14094]

--2026-02-02 23:24:40--  https://raw.githubusercontent.com/GoThereGit/EvaHan/refs/heads/main/task_b_eva.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20275 (20K) [text/plain]
Saving to: ‚Äòtask_b_eva.py‚Äô


2026-02-02 23:24:41 (24.9 MB/

In [55]:
FastVisionModel.for_inference(model) # Ensure model is in inference mode

import json
from tqdm import tqdm
import os
import PIL.Image
import sys

# Ensure the current directory is in sys.path for module discovery
if "/content/" not in sys.path:
    sys.path.append("/content/")

# Force reload of the modules if they were previously loaded
# This helps ensure the latest version of the script is used after download.
if 'task_a_c_eva' in sys.modules:
    del sys.modules['task_a_c_eva']
if 'task_b_eva' in sys.modules:
    del sys.modules['task_b_eva']

# Import the evaluation functions
from task_a_c_eva import calculate_char_metrics # Corrected import
from task_b_eva import LayoutEvaluator

print("Comparing model inference with ground truth and evaluating for the first 20 test examples:")

sum = 0

# Iterate through the first 20 examples of the test_data
for i, test_example in tqdm(enumerate(test_data[:20]), total=20, desc="Performing inference and evaluation"):
    # a. Load the image using its image_path
    image_path = os.path.join(base_path, test_example["image_path"])
    try:
        image = PIL.Image.open(image_path).convert("RGB")
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        continue

    # b. Construct the messages list for the tokenizer
    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": instruction}
        ]}
    ]

    # c. Apply the chat template to messages
    input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)

    # d. Prepare inputs for the model
    inputs = tokenizer(
        image,
        input_text,
        add_special_tokens = False,
        return_tensors = "pt",
    ).to("cuda")

    # e. Generate the model's output
    outputs = model.generate(**inputs, max_new_tokens = 128,
                       use_cache = True, temperature = 1.5, min_p = 0.1)

    # f. Decode the generated tokens to text
    generated_text_tokens = outputs[0][len(inputs["input_ids"][0]):]
    model_output_raw = tokenizer.decode(generated_text_tokens, skip_special_tokens=True)

    # g. Parse the model's text output as a JSON string
    model_output_cleaned = model_output_raw.replace("<|im_end|>", "").strip()
    predicted_transcription = "N/A (parsing error)"
    try:
        # Replace literal newlines with escaped newlines for valid JSON parsing
        model_output_for_json = model_output_cleaned.replace('\n', '\\n')
        parsed_output = json.loads(model_output_for_json)
        predicted_transcription = parsed_output.get("transcription", "N/A (transcription key missing)")
    except json.JSONDecodeError as e:
        print(f"JSON decoding error for output: {model_output_cleaned} - {e}")
        predicted_transcription = f"JSON Error: {model_output_cleaned}"

    # h. Extract the ground truth transcription
    # Check if 'text' key exists, as Dataset B does not have it for transcription evaluation
    ground_truth_transcription = test_example.get("text", "N/A (ground truth missing)")

    # i. Print the comparison
    print(f"\n--- Example {i+1} ---")
    print(f"Model Output:     {predicted_transcription}")
    print(f"Ground Truth:     {ground_truth_transcription}")

    # Conditional evaluation based on dataset
    dataset_name = image_path.split('/')[-2]
    if "Dataset_B" in dataset_name:
        # For Dataset B, the ground truth is for layout detection (regions), not text transcription.
        # The model outputs text transcription, which cannot be directly evaluated with task_b_eva.py (layout evaluator)
        # or against a non-existent 'text' field for Dataset B.
        #print("Evaluation (Dataset B): Not applicable. Model outputs transcription, but ground truth is for layout detection.")
        if ground_truth_transcription != "N/A (ground truth missing)":
            metrics_ac = LayoutEvaluator.evaluate(ground_truth_transcription, predicted_transcription) # Call the correct function
            score_ac = metrics_ac['label_stats']
            print(f"Evaluation (Dataset A/C): Comprehensive Score = {score_ac}")
        else:
            print("Evaluation (Dataset A/C): Skipping due to missing ground truth.")


    else:
        # For Dataset A and C, use calculate_char_metrics and extract comprehensive_score
        if ground_truth_transcription != "N/A (ground truth missing)":
            metrics_ac = calculate_char_metrics(ground_truth_transcription, predicted_transcription) # Call the correct function
            score_ac = metrics_ac.get("comprehensive_score", "N/A (score missing)") # Extract the score
            print(f"Evaluation (Dataset A/C): Comprehensive Score = {score_ac}")
        else:
            print("Evaluation (Dataset A/C): Skipping due to missing ground truth.")


Comparing model inference with ground truth and evaluating for the first 20 test examples:


Performing inference and evaluation:   5%|‚ñå         | 1/20 [00:06<01:59,  6.31s/it]


--- Example 1 ---
Model Output:     ÁõòÁø´Ê∞¥‰∏ÄÂü§ËèúÂçÅ‰∏ÄÁ®Æ‰∫åÂçÅÁ®ÆËó•ËçâÂçÅ‰∏ÄËµ∑Áõ∏Ë£ú‰∫å
Ground Truth:     Â∏•Ëê¨‰∫∫‰∏ÄËªçÊïôÂ£´‰∏âËê¨‰∫∫ÂçÅ‰∫îÈÑïËÄ≥Ëã•‰∫åÂçÅ‰∏ÄÈÑïÂÆúÊúâÂõõËªç‰∏Ä
Evaluation (Dataset A/C): Comprehensive Score = 0.1712


Performing inference and evaluation:  10%|‚ñà         | 2/20 [00:10<01:34,  5.24s/it]


--- Example 2 ---
Model Output:     ÁÑ°ÊÜ´Ë≤†‰πüÈ¢®ÊúÉÊó•ÊµÅÈÜ™ÂíåÊó•Êï£
Ground Truth:     ÁÑ°ÈõÖÈ†å‰πüÈ¢®ÊúÉÊó•ÊµÅÈÜáÂíåÊó•Êï£
Evaluation (Dataset A/C): Comprehensive Score = 0.75


Performing inference and evaluation:  15%|‚ñà‚ñå        | 3/20 [00:15<01:21,  4.78s/it]


--- Example 3 ---
Model Output:     Ë©òÊ®ÇÁπÅÈëøÊó•ÂÖ∂ÂàªË°ÄÁõ∏‰ºêÂÖ±Áõ∏
Ground Truth:     Êï£Á©∫ÁÑ°ËÆäÁï∞Á©∫Â§≤ÊÄßÁ©∫Ëá™Áõ∏Á©∫ÂÖ±Áõ∏
Evaluation (Dataset A/C): Comprehensive Score = 0.2192


Performing inference and evaluation:  20%|‚ñà‚ñà        | 4/20 [00:19<01:11,  4.46s/it]


--- Example 4 ---
Model Output:     ÁÜæÈï∑ÂçÅÂ§ßÊç®ÂçÅÂÖ´‰Ωõ‰∏çÂÖ±Ê≥ïÊåÅÁÑ°
Ground Truth:     ÊÇ≤Â§ßÂñúÂ§ßÊç®ÂçÅÂÖ´‰Ωõ‰∏çÂÖ±Ê≥ï‰∏ñÂ∞ä‰∫ë
Evaluation (Dataset A/C): Comprehensive Score = 0.5778


Performing inference and evaluation:  25%|‚ñà‚ñà‚ñå       | 5/20 [00:22<01:00,  4.06s/it]


--- Example 5 ---
Model Output:     ‰ø°Âπ°
Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation:  30%|‚ñà‚ñà‚ñà       | 6/20 [00:25<00:52,  3.72s/it]


--- Example 6 ---
Model Output:     ‰∏ÉË®ÄÁµ∂Âè•
Ground Truth:     ‰∏ÉË®ÄÁµ∂Âè•
Evaluation (Dataset A/C): Comprehensive Score = 1.0


Performing inference and evaluation:  35%|‚ñà‚ñà‚ñà‚ñå      | 7/20 [00:31<00:58,  4.50s/it]


--- Example 7 ---
Model Output:     Á©çÂºè Ââç‰ΩúÂÖ©Á™óÂïìÈñâ Âπ≥È†Ç ÂõõÁõ¥ ‰∏ã‰ΩúÂπ≥Â∫ïËá∫Â∫ß
Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation:  40%|‚ñà‚ñà‚ñà‚ñà      | 8/20 [00:35<00:52,  4.39s/it]


--- Example 8 ---
Model Output:     Áø∞ËãëÊñ∞Êõ∏Á∫åÈõÜÂç∑ÂõõÂçÅ
Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation:  45%|‚ñà‚ñà‚ñà‚ñà‚ñå     | 9/20 [00:39<00:45,  4.13s/it]


--- Example 9 ---
Model Output:     ÁöáÂ∏ùË°åÂÜ†‰∏â
Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation:  50%|‚ñà‚ñà‚ñà‚ñà‚ñà     | 10/20 [00:43<00:40,  4.10s/it]


--- Example 10 ---
Model Output:     ‰πüÁßÅÂõ∫Âêæ‰∏çÂæóËÄåÁü•‰πüÊúâË¨ÇË∂≥‰∏ã‰ª•ÂÖ¨‰ªä
Ground Truth:     ‰πüÁßÅÂõ∫Âêæ‰∏çÂæóËÄåÁü•‰πüÊúâË¨ÇË∂≥‰∏ã‰ª•ÂÖ¨
Evaluation (Dataset A/C): Comprehensive Score = 0.9445


Performing inference and evaluation:  55%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå    | 11/20 [00:51<00:47,  5.33s/it]


--- Example 11 ---
Model Output:     Áõ∏ËàáÁæ£Ë≥äÈï∑Â§úÈÅäÊà≤ÊñºÂ≥¥‰∏ä‰∏âÁîüÁÑ°ÈÅøËÉ°‰∏ÄÊ≠ªÈùû
Ground Truth:     ÊòîÂæóÁÑ°ÊÖ®ÁÑ∂Â§™ÊÅØÊµÅÊ∂ï‰πéÁÑ∂Ê≠§Â§©‰∏ã‰πÉÂ∏ùÁõ∏ÂÇ≥Ëá≥‰ªä‰πãÂ§©‰∏ã‰πü‰ªä
Evaluation (Dataset A/C): Comprehensive Score = 0.0


Performing inference and evaluation:  60%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà    | 12/20 [00:59<00:48,  6.07s/it]


--- Example 12 ---
Model Output:     Ëë≠Â∑ûÁñÜÂüüÂúñ

Êù±ÂåóÂÆâÁ•ûÊ∞¥Áïå
Áïå‰∏ÄÁôæÂçÅÈáå

Âçó
Êù±‰∫§Ê≠∏Á∏£Áïå‰∫åÈáå
Ë•ø‰∫§Á•ûÊ∞¥Á∏£Áïå
‰∏ÄÁôæÂÖ≠ÂçÅÈáå

Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation:  65%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå   | 13/20 [01:03<00:38,  5.46s/it]


--- Example 13 ---
Model Output:     Á∂†ËëâÈ¢®ËºïÈõ®ÂÉùÊÑÅ‰∫∫
Ground Truth:     Ë†ªÈ¢®ËúëÈõ®ÂÄçÊÑÅ‰∫∫
Evaluation (Dataset A/C): Comprehensive Score = 0.4743


Performing inference and evaluation:  70%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà   | 14/20 [01:53<01:53, 18.99s/it]


--- Example 14 ---
Model Output:     Âí´Â∞∫Êõ∏Âô®Âêå
Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation:  75%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå  | 15/20 [02:00<01:16, 15.31s/it]

JSON decoding error for output: {"transcription": "Â¶ÇÁáÄ<ÊïôÈΩãÂà∂ÈõÜÊØî‰∏äÁåõÂÖ∏<Á∂ì‰∏âÂè£Êá∏Âè£Âè£‚ñ°Âè£‚ñ°",
 "notes": ""} - Expecting property name enclosed in double quotes: line 1 column 43 (char 42)

--- Example 15 ---
Model Output:     JSON Error: {"transcription": "Â¶ÇÁáÄ<ÊïôÈΩãÂà∂ÈõÜÊØî‰∏äÁåõÂÖ∏<Á∂ì‰∏âÂè£Êá∏Âè£Âè£‚ñ°Âè£‚ñ°",
 "notes": ""}
Ground Truth:     Ëá™Ëã•ËÄÄ‰∫∫Êï¨ÊÖïÈõ≤ÈõÜÂÆàÂÄÖÂâçÂ∏≠‰∫∫ÊúâËá™‚ñ°ÈÄÆ‚ñ°‚ñ°Ë¶ã‰πã‚ñ°‚ñ°‚ñ°‚ñ°
Evaluation (Dataset A/C): Comprehensive Score = -0.7718


Performing inference and evaluation:  80%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  | 16/20 [02:03<00:47, 11.81s/it]


--- Example 16 ---
Model Output:     Âê≥Â†°Á∏£ÁñÜÂüüÂúñ
Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation:  85%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå | 17/20 [02:12<00:32, 10.92s/it]


--- Example 17 ---
Model Output:     ÊôãÂøóÂ∫èÂçÅ‰∫åÊ¨°Â∫¶Êï∏ÂèäÂ∑ûÈÉ°ÁñÜÊ¨°‰∫ëÁè≠Âõ∫‰ª•ÂçÅ‰∫åÊ¨°ÈÖçÂçÅ‰∫åÈáéÂèàÈ≠èÈô≥ÂçìÊõ¥Ë®ÄÈÉ°ÂúãÊâÄÂÖ•ÂÆøÂ∫¶ÂÖ∂Ë®ÄÁ´ÜË©≥‰ªäÈôÑÊ¨°‰πã
Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation:  90%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà | 18/20 [02:16<00:17,  8.69s/it]


--- Example 18 ---
Model Output:     Á∏ëÁáüËÉåÂàÄ
Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation:  95%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå| 19/20 [02:26<00:09,  9.26s/it]


--- Example 19 ---
Model Output:     ËïÉÁéãË¶ãÊù±ÂÆ´ÂúñÊù±ÂÆÆÈñÄÂ§ñÊ†°Â∞âË≠¶Âü∑ÂºïÁè≠ÊñáÂÆò‰æçÁ´ãÂºïÁè≠ÊâøÂÇ≥ËïÉÂúãÂæûÂÆò‰æçÁõ¥ÈñòÁ¶ÆÁîüÂÉïÁ´ãÂæÖÂÆøÈ£üÁè≠‰ΩêÊâøÂÇ≥ÂÆàË°õÁî≤Â£´ÂÖµ‰ªó
Ground Truth:     N/A (ground truth missing)
Evaluation (Dataset A/C): Skipping due to missing ground truth.


Performing inference and evaluation: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 20/20 [02:30<00:00,  7.54s/it]


--- Example 20 ---
Model Output:     Êõ∏ÂØ´Âª£‰ª§Ê∑ªÂ∏ÉÊòØÂñÑÁî∑Â≠êÂñÑÂ•≥‰∫∫Á≠â
Ground Truth:     Êõ∏ÂØ´Âª£‰ªäÊ∑ªÂ∏ÉÊòØÂñÑÁî∑Â≠êÂñÑÂ¶•Á≠â
Evaluation (Dataset A/C): Comprehensive Score = 0.7862





In [49]:
model.save_pretrained("lora_model_QWEN_EVAHAN")  # Local saving
tokenizer.save_pretrained("lora_model_QWEN_EVAHAN")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

[]