# Serbian Legal Named Entity Recognition (NER) Pipeline - XLM-R-BERTić 5-Fold Cross-Validation

This notebook implements 5-fold cross-validation for the Serbian Legal NER pipeline using the XLM-R-BERTić model (`classla/xlm-r-bertic`).
XLM-R-BERTić starts from multilingual XLM-RoBERTa and continues pretraining on Bosnian, Croatian, Montenegrin, and Serbian text, which is intended to improve performance on Serbian.

## Key Features
- **5-Fold Cross-Validation**: Robust evaluation across different data splits
- **XLM-R-BERTić Architecture**: Multilingual model with Serbian specialization
- **Sliding Window Tokenization**: Handles long sequences without truncation
- **Comprehensive Metrics**: Precision, recall, F1-score, and accuracy tracking
- **Statistical Analysis**: Mean and standard deviation across folds
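
The sliding-window idea listed above can be sketched in a few lines. This is a minimal illustration, not the implementation in `shared.dataset`: the function name `sliding_windows` is hypothetical, and it assumes `stride` means the number of tokens shared by consecutive windows, which may differ from how `tokenize_and_align_labels_with_sliding_window` interprets it.

```python
def sliding_windows(tokens, max_length, stride):
    """Split tokens into windows of at most max_length items.

    Assumption: `stride` is the overlap (in tokens) between consecutive windows.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    step = max_length - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break  # the current window already reaches the end
        start += step
    return windows


tokens = [f"t{i}" for i in range(10)]
windows = sliding_windows(tokens, max_length=4, stride=2)
for window in windows:
    print(window)
```

Every token lands in at least one window, so long judgments are not truncated. Tokens in the overlap are seen twice, which matters when window-level predictions are later merged back into a document.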

## XLM-R-BERTić Advantages
- **Multilingual Foundation**: Built on XLM-RoBERTa's multilingual pretraining
- **Serbian Specialization**: Additionally pretrained on Bosnian, Croatian, Montenegrin, and Serbian text
- **Better Generalization**: May generalize better to diverse Serbian legal texts than a multilingual-only model
- **Cross-lingual Transfer**: Retains knowledge learned from other languages during multilingual pretraining

## Entity Types
- **COURT**: Court institutions
- **DECISION_DATE**: Dates of legal decisions
- **CASE_NUMBER**: Case identifiers
- **CRIMINAL_ACT**: Criminal acts/charges
- **PROSECUTOR**: Prosecutor entities
- **DEFENDANT**: Defendant entities
- **JUDGE**: Judge names
- **REGISTRAR**: Court registrar
- **SANCTION**: Sanctions/penalties
- **SANCTION_TYPE**: Type of sanction
- **SANCTION_VALUE**: Value/duration of sanction
- **PROVISION**: Legal provisions
- **PROCEDURE_COSTS**: Legal procedure costs
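
As a rough illustration of how these entity types map to a BIO tag set, the sketch below builds one `O` label plus a `B-`/`I-` pair per type. The names `ENTITY_TYPES_SKETCH` and `build_bio_labels` are hypothetical; the actual `ENTITY_TYPES` and `BIO_LABELS` come from `shared.config` and may be ordered or defined differently.

```python
# Assumption: mirrors the list above; the real values live in shared.config.
ENTITY_TYPES_SKETCH = [
    "COURT", "DECISION_DATE", "CASE_NUMBER", "CRIMINAL_ACT", "PROSECUTOR",
    "DEFENDANT", "JUDGE", "REGISTRAR", "SANCTION", "SANCTION_TYPE",
    "SANCTION_VALUE", "PROVISION", "PROCEDURE_COSTS",
]


def build_bio_labels(entity_types):
    """Return ['O', 'B-X', 'I-X', ...] for the given entity types."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")  # first token of an entity
        labels.append(f"I-{entity_type}")  # continuation tokens
    return labels


bio_labels = build_bio_labels(ENTITY_TYPES_SKETCH)
label_to_id = {label: idx for idx, label in enumerate(bio_labels)}
id_to_label = {idx: label for label, idx in label_to_id.items()}

print(len(bio_labels))
```

With 13 entity types this scheme yields 27 labels, which is the size the token-classification head would need under these assumptions.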

In [None]:
# Mount Google Drive (for Colab)
try:
    from google.colab import drive
    drive.mount('/content/drive')
    USE_COLAB = True
except ImportError:
    USE_COLAB = False
    print("Running locally")

## 1. Environment Setup and Dependencies

In [None]:
# Install required packages
!pip install transformers torch datasets tokenizers scikit-learn seqeval pandas numpy matplotlib seaborn tqdm

In [None]:
# Import shared modules
import sys
import os

# Add the shared modules to path
if USE_COLAB:
    sys.path.append('/content/drive/MyDrive/NER_Master/ner/')
else:
    sys.path.append('../shared')

import importlib
import shared
import shared.model_utils
import shared.data_processing
import shared.dataset
import shared.evaluation
import shared.config
importlib.reload(shared.config)
importlib.reload(shared.data_processing)
importlib.reload(shared.dataset)
importlib.reload(shared.model_utils)
importlib.reload(shared.evaluation)
importlib.reload(shared)

# Import from shared modules
from shared import (
    # Configuration
    ENTITY_TYPES, BIO_LABELS, DEFAULT_TRAINING_ARGS,
    get_default_model_config, get_paths, setup_environment,

    # Data processing
    LabelStudioToBIOConverter, load_labelstudio_data,
    analyze_labelstudio_data, validate_bio_examples,

    # Dataset
    NERDataset, split_dataset, tokenize_and_align_labels_with_sliding_window,
    print_sequence_analysis, create_huggingface_datasets,

    # Model utilities
    load_model_and_tokenizer, create_training_arguments, create_trainer,
    detailed_evaluation, save_model_info, setup_device_and_seed,
    PerClassMetricsCallback,

    # Evaluation
    generate_evaluation_report, plot_training_history, plot_entity_distribution,
    # Comprehensive tracking
    analyze_entity_distribution_per_fold,
    generate_detailed_classification_report,
    # Aggregate functions across all folds
    create_aggregate_report_across_folds
)

# Standard imports
import warnings
warnings.filterwarnings('ignore')
import numpy as np
from sklearn.model_selection import KFold
import torch
from transformers import DataCollatorForTokenClassification, AutoTokenizer

# Setup device and random seed
device = setup_device_and_seed(42)

## 2. Configuration and Environment Setup

In [None]:
# Setup environment and paths
env_setup = setup_environment(use_local=not USE_COLAB, create_dirs=True)
paths = env_setup['paths']

# Model configuration - XLM-R-BERTić
MODEL_NAME = "classla/xlm-r-bertic"
model_config = get_default_model_config()

# Output directory
OUTPUT_DIR = f"{paths['models_dir']}/xlm_r_bertic_5fold_cv"
os.makedirs(OUTPUT_DIR, exist_ok=True)

print("🔧 Configuration:")
print(f"  Model: {MODEL_NAME}")
print(f"  Architecture: XLM-R-BERTić")
print(f"  Output directory: {OUTPUT_DIR}")
print(f"  Entity types: {len(ENTITY_TYPES)}")
print(f"  BIO labels: {len(BIO_LABELS)}")

## 3. Data Loading and Analysis

In [None]:
# Load LabelStudio data
labelstudio_data = load_labelstudio_data(paths['labelstudio_json'])

# Analyze the data
if labelstudio_data:
    analysis = analyze_labelstudio_data(labelstudio_data)
else:
    # Raise instead of calling exit(), which can terminate the notebook kernel
    raise RuntimeError("❌ No data loaded. Please check your paths.")

## 4. Data Preprocessing and BIO Conversion

In [None]:
# Convert LabelStudio data to BIO format
converter = LabelStudioToBIOConverter(
    judgments_dir=paths['judgments_dir'],
    labelstudio_files_dir=paths.get('labelstudio_files_dir')
)

bio_examples = converter.convert_to_bio(labelstudio_data)
print(f"✅ Converted {len(bio_examples)} examples to BIO format")

# Validate BIO examples
valid_examples, stats = validate_bio_examples(bio_examples)
print(f"📊 Validation complete: {stats['valid_examples']} valid examples")

## 5. Dataset Preparation

In [None]:
# Create NER dataset
ner_dataset = NERDataset(valid_examples)
prepared_examples = ner_dataset.prepare_for_training()

print(f"📊 Dataset statistics:")
print(f"  Number of unique labels: {ner_dataset.get_num_labels()}")
print(f"  Prepared examples: {len(prepared_examples)}")

# Get label statistics
label_stats = ner_dataset.get_label_statistics()
print(f"  Total tokens: {label_stats['total_tokens']}")
print(f"  Entity types found: {len(label_stats['entity_counts'])}")

## 6. K-Fold Cross-Validation Setup

In [None]:
# Set up 5-fold cross-validation
N_FOLDS = 5
kfold = KFold(n_splits=N_FOLDS, shuffle=True, random_state=42)

# Convert to numpy array for easier indexing
examples_array = np.array(prepared_examples, dtype=object)

print(f"Setting up {N_FOLDS}-fold cross-validation")
print(f"Total examples: {len(prepared_examples)}")
print(f"Examples per fold (approx): {len(prepared_examples) // N_FOLDS}")

# Load tokenizer (will be used across all folds)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
print(f"\nLoaded tokenizer for {MODEL_NAME}")
print(f"Tokenizer vocab size: {tokenizer.vocab_size}")

# Store results from all folds
fold_results = []
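
The cell above splits at the level of prepared examples, before any sliding-window tokenization, so windows from one document cannot end up in both the training and validation side of a fold. The sketch below illustrates the partitioning property that k-fold relies on: every example is used for validation exactly once. It is a standard-library approximation with a hypothetical helper `kfold_indices`; its shuffle order will not match what `sklearn.model_selection.KFold` produces for the same seed.

```python
import random


def kfold_indices(n_examples, n_folds, seed):
    """Yield (train_idx, val_idx) pairs that partition range(n_examples)."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)

    # The first (n_examples % n_folds) folds receive one extra example.
    base, extra = divmod(n_examples, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    for held_out, val_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train_idx, val_idx


splits = list(kfold_indices(n_examples=23, n_folds=5, seed=42))
for fold_num, (train_idx, val_idx) in enumerate(splits, 1):
    print(f"Fold {fold_num}: train={len(train_idx)}, val={len(val_idx)}")
```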

## 7. K-Fold Cross-Validation Helper Functions

In [None]:
# ============================================================================
# XLM-R-BERTić K-FOLD CROSS-VALIDATION HELPER FUNCTIONS
# ============================================================================

def prepare_fold_data(train_examples, val_examples, tokenizer, ner_dataset):
    """
    Prepare training and validation datasets for a specific fold.
    
    Args:
        train_examples: Training examples for this fold
        val_examples: Validation examples for this fold
        tokenizer: Tokenizer instance
        ner_dataset: NER dataset instance
    
    Returns:
        tuple: (train_dataset, val_dataset, data_collator)
    """
    # Tokenize datasets with sliding window
    train_tokenized = tokenize_and_align_labels_with_sliding_window(
        train_examples, tokenizer, ner_dataset.label_to_id,
        max_length=model_config['max_length'], stride=model_config['stride']
    )
    
    val_tokenized = tokenize_and_align_labels_with_sliding_window(
        val_examples, tokenizer, ner_dataset.label_to_id,
        max_length=model_config['max_length'], stride=model_config['stride']
    )
    
    # Create HuggingFace datasets
    train_dataset, val_dataset, _ = create_huggingface_datasets(
        train_tokenized, val_tokenized, val_tokenized  # Using val as placeholder for test
    )
    
    # Data collator
    data_collator = DataCollatorForTokenClassification(
        tokenizer=tokenizer,
        padding=True,
        return_tensors="pt"
    )
    
    return train_dataset, val_dataset, data_collator

print("✅ XLM-R-BERTić data preparation function defined successfully!")

In [None]:
def create_xlm_r_model_and_trainer(fold_num, train_dataset, val_dataset, data_collator, tokenizer, ner_dataset, device):
    """
    Create XLM-R-BERTić model and trainer for a specific fold.
    
    Args:
        fold_num: Current fold number
        train_dataset: Training dataset for this fold
        val_dataset: Validation dataset for this fold
        data_collator: Data collator
        tokenizer: Tokenizer instance
        ner_dataset: NER dataset instance
        device: Device to use (cuda/cpu)
    
    Returns:
        tuple: (model, trainer, metrics_callback, fold_output_dir)
    """
    # Create fold-specific output directory (os is imported at the top of the notebook)
    fold_output_dir = f"{OUTPUT_DIR}/fold_{fold_num}"
    os.makedirs(fold_output_dir, exist_ok=True)
    
    # Load fresh XLM-R-BERTić model for this fold
    model, _ = load_model_and_tokenizer(
        MODEL_NAME,
        ner_dataset.get_num_labels(),
        ner_dataset.id_to_label,
        ner_dataset.label_to_id
    )
    
    # Move model to device
    model.to(device)
    
    # Create training arguments for this fold
    training_args = create_training_arguments(
        output_dir=fold_output_dir,
        num_epochs=model_config['num_epochs'],
        batch_size=model_config['batch_size'],
        learning_rate=model_config['learning_rate'],
        warmup_steps=500,
        weight_decay=0.01,
        logging_steps=50,
        eval_steps=100,
        save_steps=500,
        save_total_limit=2,
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        greater_is_better=True,
        evaluation_strategy="steps",
        save_strategy="steps",
        report_to="none",  # Disable wandb for cleaner output
        run_name=f"xlm_r_bertic_fold_{fold_num}"
    )
    
    # Create metrics callback for comprehensive tracking
    metrics_callback = PerClassMetricsCallback(id_to_label=ner_dataset.id_to_label)
    
    # Create trainer with metrics callback
    trainer = create_trainer(
        model=model,
        training_args=training_args,
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        data_collator=data_collator,
        tokenizer=tokenizer,
        id_to_label=ner_dataset.id_to_label,
        early_stopping_patience=3,
        additional_callbacks=[metrics_callback]
    )
    
    print(f"XLM-R-BERTić Trainer initialized for fold {fold_num} with comprehensive metrics tracking")
    return model, trainer, metrics_callback, fold_output_dir

print("✅ XLM-R-BERTić model and trainer creation function defined successfully!")

In [None]:
def train_and_evaluate_xlm_r_fold(fold_num, trainer, val_dataset, ner_dataset, fold_output_dir):
    """
    Train and evaluate an XLM-R-BERTić model for a specific fold.
    
    Args:
        fold_num: Current fold number
        trainer: Trainer instance
        val_dataset: Validation dataset for this fold
        ner_dataset: NER dataset instance
        fold_output_dir: Output directory for this fold
    
    Returns:
        dict: Fold results including comprehensive metrics
    """
    print(f"\n🏋️  Training XLM-R-BERTić fold {fold_num}...")
    
    # Train the model
    trainer.train()
    
    print(f"💾 Saving XLM-R-BERTić model for fold {fold_num}...")
    trainer.save_model()
    
    # Evaluate on validation set
    print(f"📊 Evaluating XLM-R-BERTić fold {fold_num}...")
    eval_results = detailed_evaluation(
        trainer, val_dataset, f"XLM-R-BERTić Fold {fold_num} Validation", ner_dataset.id_to_label
    )
    
    # Get predictions for confusion matrix and detailed analysis
    true_labels = eval_results['true_labels']
    pred_labels = eval_results['true_predictions']
    
    # Flatten for confusion matrix
    from sklearn.metrics import confusion_matrix
    flat_true = [label for seq in true_labels for label in seq]
    flat_pred = [label for seq in pred_labels for label in seq]
    all_labels = sorted(set(flat_true + flat_pred))
    cm = confusion_matrix(flat_true, flat_pred, labels=all_labels)
    
    # Generate classification report for this fold
    per_class_metrics = generate_detailed_classification_report(
        true_labels, pred_labels, fold_output_dir, fold_num, "Validation"
    )
    
    # Extract metrics
    fold_result = {
        'fold': fold_num,
        'precision': eval_results['precision'],
        'recall': eval_results['recall'],
        'f1': eval_results['f1'],
        'accuracy': eval_results['accuracy'],
        'per_class_metrics': per_class_metrics,
        'confusion_matrix': cm,
        'labels': all_labels
    }
    
    print(f"\nXLM-R-BERTić Fold {fold_num} completed successfully!")
    return fold_result

print("✅ XLM-R-BERTić training and evaluation function defined successfully!")
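
The fold function above builds its confusion matrix from flattened, token-level labels. NER quality is often judged at the entity level instead, where a prediction counts only if both the type and the full span match. Whether `detailed_evaluation` reports entity-level scores is not visible in this notebook, so the sketch below only illustrates the difference; `extract_entities` is a hypothetical helper, and treating a stray `I-` tag as the start of a new entity is a lenient choice made here for simplicity.

```python
def extract_entities(tags):
    """Return (entity_type, start, end_exclusive) spans from a BIO tag sequence."""
    spans = []
    current_type, start = None, None
    # A trailing "O" sentinel flushes an entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if current_type is not None and (tag == "O" or starts_new):
            spans.append((current_type, start, i))
            current_type = None
        if starts_new:
            current_type, start = tag[2:], i
    return spans


true_tags = ["B-COURT", "I-COURT", "O", "B-JUDGE", "I-JUDGE", "I-JUDGE"]
pred_tags = ["B-COURT", "I-COURT", "O", "B-JUDGE", "I-JUDGE", "O"]

true_spans = set(extract_entities(true_tags))
pred_spans = set(extract_entities(pred_tags))
correct = true_spans & pred_spans

token_accuracy = sum(t == p for t, p in zip(true_tags, pred_tags)) / len(true_tags)
entity_precision = len(correct) / len(pred_spans)
entity_recall = len(correct) / len(true_spans)

print(f"token accuracy:   {token_accuracy:.2f}")
print(f"entity precision: {entity_precision:.2f}")
print(f"entity recall:    {entity_recall:.2f}")
```

In this toy example a single wrong token costs one of the two entities, so token accuracy stays high while entity-level precision and recall drop to one half.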

## 8. K-Fold Cross-Validation Training Loop

In [None]:
# Check device availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Main K-Fold Cross-Validation Loop for XLM-R-BERTić
print(f"\n{'='*80}")
print(f"STARTING {N_FOLDS}-FOLD CROSS-VALIDATION - XLM-R-BERTić")
print(f"{'='*80}")
print(f"Total examples: {len(examples_array)}")
print(f"Model: {MODEL_NAME}")
print(f"Device: {device}")

# Execute K-Fold training
for fold_num, (train_idx, val_idx) in enumerate(kfold.split(examples_array), 1):
    print(f"\n{'='*80}")
    print(f"XLM-R-BERTić FOLD {fold_num}/{N_FOLDS}")
    print(f"{'='*80}")
    print(f"Train indices: {len(train_idx)}, Val indices: {len(val_idx)}")
    
    # Get fold data
    train_examples = examples_array[train_idx].tolist()
    val_examples = examples_array[val_idx].tolist()
    
    print(f"Training examples: {len(train_examples)}")
    print(f"Validation examples: {len(val_examples)}")
    
    # Analyze entity distribution for this fold
    print(f"\n📊 Analyzing entity distribution for fold {fold_num}...")
    train_dist = analyze_entity_distribution_per_fold(train_examples, f"Fold {fold_num} - Training")
    val_dist = analyze_entity_distribution_per_fold(val_examples, f"Fold {fold_num} - Validation")
    
    # Prepare data for this fold
    print(f"\n🔤 Preparing data for XLM-R-BERTić fold {fold_num}...")
    train_dataset, val_dataset, data_collator = prepare_fold_data(
        train_examples, val_examples, tokenizer, ner_dataset
    )
    
    print(f"📦 XLM-R-BERTić Fold {fold_num} datasets:")
    print(f"  Training: {len(train_dataset)} examples")
    print(f"  Validation: {len(val_dataset)} examples")
    
    # Create XLM-R-BERTić model and trainer for this fold
    print(f"\n🤖 Creating XLM-R-BERTić model and trainer for fold {fold_num}...")
    model, trainer, metrics_callback, fold_output_dir = create_xlm_r_model_and_trainer(
        fold_num, train_dataset, val_dataset, data_collator, tokenizer, ner_dataset, device
    )
    
    # Train and evaluate this fold
    fold_result = train_and_evaluate_xlm_r_fold(
        fold_num, trainer, val_dataset, ner_dataset, fold_output_dir
    )
    
    # Store comprehensive data for aggregation
    fold_result['distributions'] = {'train': train_dist, 'val': val_dist}
    fold_result['training_history'] = metrics_callback.get_training_history()
    
    fold_results.append(fold_result)
    
    # Clean up to free memory
    del model, trainer, metrics_callback, train_dataset, val_dataset
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    print(f"\n✅ XLM-R-BERTić Fold {fold_num} completed!")
    print(f"   Precision: {fold_result['precision']:.4f}")
    print(f"   Recall: {fold_result['recall']:.4f}")
    print(f"   F1-Score: {fold_result['f1']:.4f}")
    print(f"   Accuracy: {fold_result['accuracy']:.4f}")

print(f"\n{'='*80}")
print(f"XLM-R-BERTić K-FOLD CROSS-VALIDATION COMPLETED!")
print(f"{'='*80}")

## 9. Aggregate Results Across Folds

In [None]:
# ============================================================================
# AGGREGATE RESULTS ACROSS FOLDS WITH COMPREHENSIVE VISUALIZATIONS
# ============================================================================

# Create comprehensive aggregate report with all visualizations
print(f"\n{'='*80}")
print(f"GENERATING COMPREHENSIVE AGGREGATE REPORT FOR XLM-R-BERTić")
print(f"{'='*80}")

aggregate_report = create_aggregate_report_across_folds(
    fold_results=fold_results,
    model_name=f"XLM-R-BERTić ({MODEL_NAME})",
    display=True
)

# Extract summary metrics
precisions = [result['precision'] for result in fold_results]
recalls = [result['recall'] for result in fold_results]
f1_scores = [result['f1'] for result in fold_results]
accuracies = [result['accuracy'] for result in fold_results]

# Print summary
print(f"\n{'='*80}")
print(f"XLM-R-BERTić FINAL RESULTS ACROSS {N_FOLDS} FOLDS")
print(f"{'='*80}")
print(f"Precision: {np.mean(precisions):.4f} ± {np.std(precisions):.4f}")
print(f"Recall:    {np.mean(recalls):.4f} ± {np.std(recalls):.4f}")
print(f"F1-Score:  {np.mean(f1_scores):.4f} ± {np.std(f1_scores):.4f}")
print(f"Accuracy:  {np.mean(accuracies):.4f} ± {np.std(accuracies):.4f}")
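
The summary above calls `np.std` with its default `ddof=0`, which is the population standard deviation. With only five folds, the sample standard deviation (`ddof=1` in NumPy) is noticeably larger, so it is worth stating which one a reported "±" refers to. The sketch below shows the gap using the standard library; the scores are made-up placeholders chosen for illustration and are not results from this notebook.

```python
import statistics

# Placeholder values for illustration only, NOT measured results.
illustrative_f1 = [0.90, 0.92, 0.88, 0.91, 0.89]

mean_f1 = statistics.fmean(illustrative_f1)
population_std = statistics.pstdev(illustrative_f1)  # divides by n, like np.std default
sample_std = statistics.stdev(illustrative_f1)       # divides by n - 1, like ddof=1

print(f"mean:           {mean_f1:.4f}")
print(f"population std: {population_std:.4f}")
print(f"sample std:     {sample_std:.4f}")
```

For five values the sample estimate is larger by a factor of sqrt(5/4), roughly 12%, regardless of the scores themselves.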

## 10. Visualization of K-Fold Results

In [None]:
# ============================================================================
# VISUALIZATION OF K-FOLD RESULTS
# ============================================================================

import matplotlib.pyplot as plt

# Create visualization of fold results
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle(f'{N_FOLDS}-Fold Cross-Validation Results - XLM-R-BERTić Model', fontsize=16, fontweight='bold')

fold_numbers = list(range(1, N_FOLDS + 1))

# Plot precision
ax1.plot(fold_numbers, precisions, marker='o', linewidth=2, markersize=8, color='#2E86AB')
ax1.axhline(y=np.mean(precisions), color='r', linestyle='--', label=f'Mean: {np.mean(precisions):.4f}')
ax1.set_xlabel('Fold Number', fontsize=12)
ax1.set_ylabel('Precision', fontsize=12)
ax1.set_title('Precision Across Folds', fontsize=14, fontweight='bold')
ax1.grid(True, alpha=0.3)
ax1.legend()
ax1.set_xticks(fold_numbers)

# Plot recall
ax2.plot(fold_numbers, recalls, marker='s', linewidth=2, markersize=8, color='#A23B72')
ax2.axhline(y=np.mean(recalls), color='r', linestyle='--', label=f'Mean: {np.mean(recalls):.4f}')
ax2.set_xlabel('Fold Number', fontsize=12)
ax2.set_ylabel('Recall', fontsize=12)
ax2.set_title('Recall Across Folds', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.legend()
ax2.set_xticks(fold_numbers)

# Plot F1-score
ax3.plot(fold_numbers, f1_scores, marker='^', linewidth=2, markersize=8, color='#F18F01')
ax3.axhline(y=np.mean(f1_scores), color='r', linestyle='--', label=f'Mean: {np.mean(f1_scores):.4f}')
ax3.set_xlabel('Fold Number', fontsize=12)
ax3.set_ylabel('F1-Score', fontsize=12)
ax3.set_title('F1-Score Across Folds', fontsize=14, fontweight='bold')
ax3.grid(True, alpha=0.3)
ax3.legend()
ax3.set_xticks(fold_numbers)

# Plot accuracy
ax4.plot(fold_numbers, accuracies, marker='D', linewidth=2, markersize=8, color='#6A994E')
ax4.axhline(y=np.mean(accuracies), color='r', linestyle='--', label=f'Mean: {np.mean(accuracies):.4f}')
ax4.set_xlabel('Fold Number', fontsize=12)
ax4.set_ylabel('Accuracy', fontsize=12)
ax4.set_title('Accuracy Across Folds', fontsize=14, fontweight='bold')
ax4.grid(True, alpha=0.3)
ax4.legend()
ax4.set_xticks(fold_numbers)

plt.tight_layout()
plt.savefig(f"{OUTPUT_DIR}/xlm_r_bertic_5fold_cv_results.png", dpi=300, bbox_inches='tight')
plt.show()

print(f"\n✅ Visualization saved to: {OUTPUT_DIR}/xlm_r_bertic_5fold_cv_results.png")

## 11. Save Results to File

In [None]:
# ============================================================================
# SAVE RESULTS TO FILE
# ============================================================================

import json
import pandas as pd
from datetime import datetime

# Create results summary
results_summary = {
    'experiment_info': {
        'model': MODEL_NAME,
        'architecture': 'XLM-R-BERTić',
        'n_folds': N_FOLDS,
        'timestamp': datetime.now().isoformat(),
        'random_seed': 42
    },
    'overall_metrics': {
        'precision_mean': float(np.mean(precisions)),
        'precision_std': float(np.std(precisions)),
        'recall_mean': float(np.mean(recalls)),
        'recall_std': float(np.std(recalls)),
        'f1_mean': float(np.mean(f1_scores)),
        'f1_std': float(np.std(f1_scores)),
        'accuracy_mean': float(np.mean(accuracies)),
        'accuracy_std': float(np.std(accuracies))
    },
    'fold_results': [
        {
            'fold': result['fold'],
            'precision': float(result['precision']),
            'recall': float(result['recall']),
            'f1': float(result['f1']),
            'accuracy': float(result['accuracy'])
        }
        for result in fold_results
    ]
}

# Save results to JSON
results_file = f"{OUTPUT_DIR}/5fold_cv_results.json"
with open(results_file, 'w', encoding='utf-8') as f:
    json.dump(results_summary, f, indent=2, ensure_ascii=False)

print(f"✅ Results saved to: {results_file}")

# Create CSV for easy analysis
df_results = pd.DataFrame([
    {
        'Fold': result['fold'],
        'Precision': result['precision'],
        'Recall': result['recall'],
        'F1-Score': result['f1'],
        'Accuracy': result['accuracy']
    }
    for result in fold_results
])

# Add summary row
summary_row = {
    'Fold': 'Mean ± Std',
    'Precision': f"{np.mean(precisions):.4f} ± {np.std(precisions):.4f}",
    'Recall': f"{np.mean(recalls):.4f} ± {np.std(recalls):.4f}",
    'F1-Score': f"{np.mean(f1_scores):.4f} ± {np.std(f1_scores):.4f}",
    'Accuracy': f"{np.mean(accuracies):.4f} ± {np.std(accuracies):.4f}"
}

df_results = pd.concat([df_results, pd.DataFrame([summary_row])], ignore_index=True)

csv_file = f"{OUTPUT_DIR}/5fold_cv_results.csv"
df_results.to_csv(csv_file, index=False)
print(f"✅ Results CSV saved to: {csv_file}")

# Display final summary table
print(f"\n📊 FINAL RESULTS TABLE:")
print(df_results.to_string(index=False))

## 12. Conclusion

This notebook implemented 5-fold cross-validation for the Serbian Legal NER pipeline using the XLM-R-BERTić model.

### Key Achievements:
- ✅ **Multilingual Model**: XLM-R-BERTić combines multilingual capabilities with Serbian specialization
- ✅ **Robust Evaluation**: 5-fold cross-validation provides reliable performance estimates
- ✅ **Comprehensive Metrics**: Precision, recall, F1-score, and accuracy tracked across all folds
- ✅ **Statistical Analysis**: Mean and standard deviation calculated for all metrics
- ✅ **Visualization**: Clear charts showing performance across folds
- ✅ **Results Persistence**: JSON and CSV files saved for further analysis

### XLM-R-BERTić Advantages:
- **Multilingual Foundation**: Benefits from XLM-RoBERTa's multilingual pretraining
- **Serbian Specialization**: Additionally pretrained on Bosnian, Croatian, Montenegrin, and Serbian text
- **Better Generalization**: May generalize better to diverse Serbian legal texts than a multilingual-only model
- **Cross-lingual Transfer**: Can leverage knowledge from multiple languages

### Next Steps:
1. **Compare with Other Models**: Analyze performance differences against the base BERT, BERT-CRF, and class-weighted variants
2. **Error Analysis**: Examine misclassified entities to identify improvement opportunities
3. **Hyperparameter Tuning**: Optimize learning rate, batch size, and other parameters
4. **Ensemble Methods**: Combine predictions from multiple folds for better performance
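
One simple way to approach the ensemble idea in step 4 is token-level majority voting over the label sequences that the fold models predict for the same unseen text. The sketch below is an assumption-laden illustration, not part of the pipeline: `majority_vote` is a hypothetical helper, ties go to the label seen first (the earliest model), and voting per token can yield invalid BIO sequences such as an `I-` tag without a preceding `B-`, which would need a repair step.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine label sequences from several models by per-token majority vote.

    predictions_per_model: list of equal-length label lists, one per model,
    all aligned to the same tokens.
    """
    lengths = {len(preds) for preds in predictions_per_model}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of tokens")

    voted = []
    for labels_at_token in zip(*predictions_per_model):
        # most_common keeps first-seen order among equal counts,
        # so ties are resolved in favor of the earliest model.
        voted.append(Counter(labels_at_token).most_common(1)[0][0])
    return voted


fold_predictions = [
    ["B-COURT", "I-COURT", "O"],
    ["B-COURT", "O", "O"],
    ["B-COURT", "I-COURT", "B-JUDGE"],
]
ensemble = majority_vote(fold_predictions)
print(ensemble)
```

Such an ensemble applies to new documents only. On the cross-validation folds themselves, each validation example was seen during training by four of the five models, so voting there would overstate performance.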

The 5-fold cross-validation framework successfully evaluated XLM-R-BERTić for Serbian Legal NER!