# Method 1: Prompt Engineering for NER Extraction

This notebook demonstrates named entity recognition (NER) using prompt engineering with a pre-trained LLM, with no fine-tuning step.

## Overview
- **Approach**: Use carefully crafted prompts with pre-trained models
- **Model**: Meta-Llama-3.1-8B-Instruct by default, or GPT-4o-mini via `use_openai=True`
- **Advantages**:
  - No training required
  - Fast to implement
  - Baseline for comparison
- **Disadvantages**:
  - May miss domain-specific entity patterns that the prompt does not describe
  - Output tends to be less consistent on long texts
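
To make the approach concrete, here is a minimal sketch of what prompt-based extraction could look like: build an instruction prompt, then parse the model's reply defensively. It is not the implementation in `src.prompt_engineering`, which this notebook does not show. The entity types, the prompt wording, and the helper names `build_prompt` and `parse_response` are all assumptions made for illustration.

```python
import json
import re

# Hypothetical entity types; the notebook reads the real list from config.entity_types.
ENTITY_TYPES = ["PERSON", "ORGANIZATION", "LOCATION"]


def build_prompt(text, entity_types):
    """Build an instruction prompt that asks for one JSON object keyed by entity type."""
    schema = ", ".join(f'"{t}": [...]' for t in entity_types)
    return (
        "Extract the named entities from the text below.\n"
        f"Answer with a single JSON object of the form {{{schema}}}.\n"
        "Use an empty list for types that do not occur.\n\n"
        f"Text: {text}\n"
        "JSON:"
    )


def parse_response(response, entity_types):
    """Parse the outermost JSON object in a reply; fall back to empty lists on failure."""
    empty = {t: [] for t in entity_types}
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    result = {}
    for t in entity_types:
        values = data.get(t)
        # Ignore anything that is not a list, and drop empty entries.
        result[t] = [str(v) for v in values if v] if isinstance(values, list) else []
    return result
```

The defensive parsing matters because instruction-tuned models often wrap the JSON in extra prose. Returning empty lists on failure keeps a single malformed reply from aborting a whole evaluation run.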

## 1. Setup and Imports

In [None]:
import json
import sys
from pathlib import Path

# Add the parent directory to the import path so the `src` package resolves
sys.path.append('..')

from src.config import NERConfig, PROCESSED_DATA_DIR, RESULTS_DIR
from src.data_loader import NERDataLoader
from src.prompt_engineering import PromptNERExtractor
from src.evaluation import NEREvaluator
from src.benchmark import NERBenchmark

## 2. Load Configuration

In [None]:
# Initialize configuration
config = NERConfig(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    temperature=0.1,  # Low temperature for more deterministic output
    max_length=2048
)

print("Configuration:")
print(f"  Model: {config.model_name}")
print(f"  Entity types: {config.entity_types}")
print(f"  Temperature: {config.temperature}")

## 3. Load Dataset

In [None]:
# Load processed dataset
val_dataset = NERDataLoader.load_json_dataset(PROCESSED_DATA_DIR / "validation.json")
test_dataset = NERDataLoader.load_json_dataset(PROCESSED_DATA_DIR / "test.json")

print(f"Validation set size: {len(val_dataset)}")
print(f"Test set size: {len(test_dataset)}")

# Show example
print("\nFirst validation sample:")
print(f"Text: {val_dataset[0]['text'][:200]}...")
print(f"Entities: {val_dataset[0]['entities']}")

## 4. Initialize Prompt-based Extractor

In [None]:
# Initialize extractor
# Set use_openai=True to use GPT-4o-mini (requires OPENAI_API_KEY in .env)
extractor = PromptNERExtractor(config=config, use_openai=False)

print("Extractor initialized successfully!")

## 5. Test on Sample Examples

In [None]:
# Test on a few examples
num_examples = 3

for i, sample in enumerate(val_dataset[:num_examples]):
    print(f"\n{'='*80}")
    print(f"Example {i+1}")
    print(f"{'='*80}")
    
    text = sample['text']
    ground_truth = sample['entities']
    
    print(f"\nText: {text[:300]}...\n")
    
    # Extract entities
    predicted = extractor.extract_entities(text)
    
    print("Ground Truth:")
    print(json.dumps(ground_truth, indent=2, ensure_ascii=False))
    
    print("\nPredicted:")
    print(json.dumps(predicted, indent=2, ensure_ascii=False))
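
When comparing the two JSON dumps by eye gets tedious, a small diff helper can list what was missed and what was spurious for each entity type. This is a sketch under one assumption: that both `predicted` and the ground truth are dicts mapping an entity type to a list of strings. The notebook does not confirm that shape, and `diff_entities` is a hypothetical helper, not part of `src`.

```python
def diff_entities(predicted, gold):
    """Report missed and spurious entities per type.

    Assumes both arguments map entity type -> list of entity strings.
    """
    report = {}
    for entity_type in sorted(set(predicted) | set(gold)):
        pred_set = set(predicted.get(entity_type, []))
        gold_set = set(gold.get(entity_type, []))
        report[entity_type] = {
            "missed": sorted(gold_set - pred_set),    # in ground truth, not predicted
            "spurious": sorted(pred_set - gold_set),  # predicted, not in ground truth
        }
    return report
```

Inside the loop above, `diff_entities(predicted, ground_truth)` could be printed alongside the two dumps.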

## 6. Evaluate on Validation Set

In [None]:
# Run evaluation on validation set
print("Running evaluation on validation set...")
predictions, ground_truth = extractor.evaluate_on_dataset(val_dataset)

# Evaluate
evaluator = NEREvaluator(entity_types=config.entity_types)
results = evaluator.evaluate_all(predictions, ground_truth)

# Print results
evaluator.print_results(results)

# Save results
results_path = RESULTS_DIR / "prompt_engineering_validation.json"
evaluator.save_results(results, results_path)
print(f"Results saved to {results_path}")
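
For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute exact-match accuracy and macro-averaged F1 over entity sets. It is not the logic of `NEREvaluator`, whose internals are not shown here, and it scores exact string matches only, whereas the evaluator also reports partial-match metrics. The dict-of-lists data shape and all function names are assumptions.

```python
def f1_from_counts(tp, n_pred, n_gold):
    """F1 from true positives, number of predicted entities and number of gold entities."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(predictions, ground_truth, entity_types):
    """Unweighted mean of per-type F1, pooling counts over all samples."""
    scores = []
    for entity_type in entity_types:
        tp = n_pred = n_gold = 0
        for pred, gold in zip(predictions, ground_truth):
            p = set(pred.get(entity_type, []))
            g = set(gold.get(entity_type, []))
            tp += len(p & g)
            n_pred += len(p)
            n_gold += len(g)
        scores.append(f1_from_counts(tp, n_pred, n_gold))
    return sum(scores) / len(scores) if scores else 0.0


def exact_match_accuracy(predictions, ground_truth, entity_types):
    """Share of samples whose predicted sets equal the gold sets for every type."""
    if not predictions:
        return 0.0
    hits = sum(
        all(set(p.get(t, [])) == set(g.get(t, [])) for t in entity_types)
        for p, g in zip(predictions, ground_truth)
    )
    return hits / len(predictions)
```

Macro averaging gives every entity type equal weight, so a rare type with poor recall lowers the score as much as a frequent one.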

## 7. Run Benchmark on Test Set

In [None]:
# Run benchmark on test set
benchmark = NERBenchmark(config=config)
test_results = benchmark.run_benchmark(
    method_name="Prompt Engineering",
    extractor=extractor,
    test_dataset=test_dataset,
    verbose=True
)

# Save benchmark results
benchmark.save_results(RESULTS_DIR / "prompt_engineering")
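
The benchmark reports throughput as samples per second. How `NERBenchmark` measures it is not shown in this notebook. The sketch below is one plausible way to take such a measurement, using a hypothetical `measure_throughput` helper around any extraction callable.

```python
import time


def measure_throughput(extract_fn, texts):
    """Run extract_fn over texts and return (outputs, samples per second)."""
    start = time.perf_counter()
    outputs = [extract_fn(text) for text in texts]
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very fast callables.
    rate = len(texts) / elapsed if elapsed > 0 else float("inf")
    return outputs, rate
```

With the objects defined above, a call might look like `measure_throughput(extractor.extract_entities, [s['text'] for s in test_dataset])`. For local models, the first call often includes warm-up cost, so timing a second pass tends to give a steadier figure.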

## 8. Analysis and Insights

In [None]:
print("\nKey Insights:")
print(f"  - Exact Match Accuracy: {test_results['exact_match_accuracy']:.2%}")
print(f"  - Macro F1 (partial match): {test_results['partial_match_metrics']['macro_avg']['f1']:.2%}")
print(f"  - Inference Speed: {test_results['samples_per_second']:.2f} samples/second")

print("\nStrengths:")
print("  - Quick to implement (no training required)")
print("  - Provides a baseline for comparison with other methods")
print("  - Works well with clear, well-formed text")

print("\nWeaknesses:")
print("  - May miss domain-specific entities")
print("  - Inconsistent with complex or noisy text")
print("  - Sensitive to prompt variations")
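
Sensitivity to prompt variations can be quantified rather than only asserted. One option, sketched here under the assumption that each prediction is a dict mapping entity type to a list of strings, is to run two prompt variants over the same samples and average the Jaccard overlap of their outputs. The helpers `jaccard` and `prompt_agreement` are illustrative and not part of `src`.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty outputs agree completely
    return len(a & b) / len(a | b)


def prompt_agreement(run_a, run_b, entity_types):
    """Mean per-type Jaccard overlap between two runs over the same samples."""
    scores = [
        jaccard(a.get(t, []), b.get(t, []))
        for a, b in zip(run_a, run_b)
        for t in entity_types
    ]
    return sum(scores) / len(scores) if scores else 1.0
```

A value near 1.0 suggests the two prompts extract largely the same entities. Lower values point to samples worth inspecting by hand.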

## 9. Save Predictions for Analysis

In [None]:
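# Save predictions
# Note: `predictions` holds the validation-set outputs from Section 6,
# not the test-set outputs produced by the benchmark run.
predictions_path = RESULTS_DIR / "prompt_engineering" / "predictions.json"
predictions_path.parent.mkdir(parents=True, exist_ok=True)  # ensure the output directory exists
benchmark.save_predictions(
    method_name="Prompt Engineering",
    predictions=predictions,
    output_path=predictions_path
)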

print("\nExperiment complete!")
print(f"Results saved to {RESULTS_DIR / 'prompt_engineering'}")