# ML Pipeline for Astronomical NER Model

## Objective

This notebook implements a complete machine learning pipeline for training, optimizing, and evaluating a spaCy Named Entity Recognition (NER) model to identify astronomical objects in scientific text.

## Pipeline Overview

1. **Data Loading & Preprocessing**: Load the annotated training data from the previous notebook
2. **Train/Validation/Test Split**: Create balanced splits for training, hyperparameter tuning, and final evaluation
3. **Model Training**: Train spaCy NER model with configurable parameters
4. **Hyperparameter Optimization**: Use systematic search to find optimal model parameters
5. **Model Evaluation**: Comprehensive evaluation using precision, recall, F1-score, and entity-level metrics
6. **Model Persistence**: Save trained models for deployment

## Key Features

- Stratified splitting to maintain entity distribution across splits
- Configurable training parameters (dropout, learning rate, batch size)
- Early stopping and learning rate scheduling
- Comprehensive evaluation metrics including per-entity analysis
- Model versioning and experiment tracking capabilities
- Error analysis and misclassification detection

In [1]:
import json
import random
import warnings
from pathlib import Path
from typing import List, Tuple, Dict, Any, Optional
from collections import defaultdict, Counter
import numpy as np
import pandas as pd

# spaCy imports
import spacy
from spacy.training import Example
from spacy.util import minibatch, compounding
from spacy.lang.en import English

# Evaluation imports
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
random.seed(42)
np.random.seed(42)

## Configuration and Paths

In [2]:
# Data paths
DATA_ROOT = Path("../data")
INPUT_DATA_PATH = DATA_ROOT / "ner_data" / "spacy_ner_data.json"
MODELS_DIR = DATA_ROOT / "models"
MODELS_DIR.mkdir(exist_ok=True)

# Training configuration
CONFIG = {
    'train_split': 0.7,
    'val_split': 0.15,
    'test_split': 0.15,
    'n_iter': 30,
    'dropout': 0.2,
    'batch_size': 8,
    'learn_rate': 0.001,
    'early_stopping_patience': 5,
    'min_improvement': 0.001
}

print(f"Training data path: {INPUT_DATA_PATH}")
print(f"Models directory: {MODELS_DIR}")
print(f"Configuration: {CONFIG}")

Training data path: ../data/ner_data/spacy_ner_data.json
Models directory: ../data/models
Configuration: {'train_split': 0.7, 'val_split': 0.15, 'test_split': 0.15, 'n_iter': 30, 'dropout': 0.2, 'batch_size': 8, 'learn_rate': 0.001, 'early_stopping_patience': 5, 'min_improvement': 0.001}


## Data Loading and Preprocessing

In [3]:
def load_training_data(data_path: Path) -> List[Tuple[str, Dict]]:
    """Load spaCy training data from JSON file.
    
    Args:
        data_path: Path to the JSON file containing training data
        
    Returns:
        List of (text, annotations) tuples in spaCy format
    """
    with open(data_path, 'r', encoding='utf-8') as f:
        raw_data = json.load(f)
    
    training_examples = []
    
    for doc_data in raw_data:
        spacy_data = doc_data.get('spacy_ner_data', [])
        
        # spacy_ner_data contains alternating text and annotation dictionaries
        for i in range(0, len(spacy_data), 2):
            if i + 1 < len(spacy_data):
                text = spacy_data[i]
                annotations = spacy_data[i + 1]
                if isinstance(text, str) and isinstance(annotations, dict):
                    training_examples.append((text, annotations))
    
    return training_examples

def validate_training_data(training_data: List[Tuple[str, Dict]]) -> List[Tuple[str, Dict]]:
    """Validate and clean training data.
    
    Args:
        training_data: List of (text, annotations) tuples
        
    Returns:
        Cleaned list of valid training examples
    """
    valid_examples = []
    
    for text, annotations in training_data:
        if not text or not isinstance(text, str):
            continue
            
        entities = annotations.get('entities', [])
        valid_entities = []
        
        for entity in entities:
            if len(entity) == 3:
                start, end, label = entity
                # Validate entity boundaries
                if 0 <= start < end <= len(text):
                    valid_entities.append((start, end, label))
        
        if valid_entities:  # Only keep examples with valid entities
            valid_examples.append((text, {'entities': valid_entities}))
    
    return valid_examples

# Load and validate data
print("Loading training data...")
raw_training_data = load_training_data(INPUT_DATA_PATH)
training_data = validate_training_data(raw_training_data)

print(f"Loaded {len(raw_training_data)} raw examples")
print(f"After validation: {len(training_data)} valid examples")

# Data statistics
total_entities = sum(len(annotations['entities']) for _, annotations in training_data)
entity_labels = [label for _, annotations in training_data 
                for _, _, label in annotations['entities']]
label_counts = Counter(entity_labels)

print(f"Total entities: {total_entities}")
print(f"Label distribution: {dict(label_counts)}")

Loading training data...
Loaded 3283 raw examples
After validation: 3283 valid examples
Total entities: 6065
Label distribution: {'ASTRO_OBJ': 6065}


## Data Splitting Strategy

In [4]:
def create_stratified_split(training_data: List[Tuple[str, Dict]], 
                          train_ratio: float = 0.7, 
                          val_ratio: float = 0.15, 
                          test_ratio: float = 0.15) -> Tuple[List, List, List]:
    """Create stratified train/validation/test splits.
    
    Args:
        training_data: List of (text, annotations) tuples
        train_ratio: Proportion for training set
        val_ratio: Proportion for validation set
        test_ratio: Proportion for test set
        
    Returns:
        Tuple of (train_data, val_data, test_data)
    """
    # Create stratification key based on number of entities and entity types
    stratify_keys = []
    for text, annotations in training_data:
        entities = annotations['entities']
        num_entities = len(entities)
        unique_labels = set(label for _, _, label in entities)
        # Create a key that considers both count and variety of entities
        key = f"{min(num_entities, 5)}_{len(unique_labels)}"  # Cap at 5 for grouping
        stratify_keys.append(key)
    
    # First split: separate test set
    train_val_data, test_data, train_val_keys, _ = train_test_split(
        training_data, stratify_keys, 
        test_size=test_ratio, 
        random_state=42, 
        stratify=stratify_keys
    )
    
    # Second split: separate train and validation
    val_size = val_ratio / (train_ratio + val_ratio)
    train_data, val_data = train_test_split(
        train_val_data, 
        test_size=val_size, 
        random_state=42, 
        stratify=train_val_keys
    )
    
    return train_data, val_data, test_data

def print_split_statistics(train_data, val_data, test_data):
    """Print statistics for each data split."""
    splits = {'Train': train_data, 'Validation': val_data, 'Test': test_data}
    
    for split_name, data in splits.items():
        total_entities = sum(len(annotations['entities']) for _, annotations in data)
        entity_labels = [label for _, annotations in data 
                        for _, _, label in annotations['entities']]
        label_counts = Counter(entity_labels)
        
        print(f"\n{split_name} Set:")
        print(f"  Examples: {len(data)}")
        print(f"  Total entities: {total_entities}")
        print(f"  Avg entities per example: {total_entities/len(data):.2f}")
        print(f"  Label distribution: {dict(label_counts)}")

# Create splits
print("Creating train/validation/test splits...")
train_data, val_data, test_data = create_stratified_split(
    training_data, 
    CONFIG['train_split'], 
    CONFIG['val_split'], 
    CONFIG['test_split']
)

print_split_statistics(train_data, val_data, test_data)

Creating train/validation/test splits...

Train Set:
  Examples: 2297
  Total entities: 4206
  Avg entities per example: 1.83
  Label distribution: {'ASTRO_OBJ': 4206}

Validation Set:
  Examples: 493
  Total entities: 946
  Avg entities per example: 1.92
  Label distribution: {'ASTRO_OBJ': 946}

Test Set:
  Examples: 493
  Total entities: 913
  Avg entities per example: 1.85
  Label distribution: {'ASTRO_OBJ': 913}


## Model Training Infrastructure

In [5]:
def create_blank_model(labels: List[str]) -> spacy.Language:
    """Create a blank spaCy model with NER component.
    
    Args:
        labels: List of entity labels to recognize
        
    Returns:
        Blank spaCy model with NER component
    """
    nlp = English()
    
    # Create NER component
    ner = nlp.add_pipe("ner")
    
    # Add labels to NER component
    for label in labels:
        ner.add_label(label)
    
    return nlp

def convert_to_examples(nlp: spacy.Language, 
                       training_data: List[Tuple[str, Dict]]) -> List[Example]:
    """Convert training data to spaCy Example objects.
    
    Args:
        nlp: spaCy model
        training_data: List of (text, annotations) tuples
        
    Returns:
        List of spaCy Example objects
    """
    examples = []
    for text, annotations in training_data:
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        examples.append(example)
    return examples

def evaluate_model(nlp: spacy.Language, examples: List[Example]) -> Dict[str, float]:
    """Evaluate model performance on a set of examples.
    
    Args:
        nlp: Trained spaCy model
        examples: List of evaluation examples
        
    Returns:
        Dictionary of evaluation metrics
    """
    scores = nlp.evaluate(examples)
    return {
        'precision': scores['ents_p'],
        'recall': scores['ents_r'],
        'f1': scores['ents_f'],
        'accuracy': scores.get('token_acc', 0.0)
    }

class EarlyStopping:
    """Early stopping utility for training."""
    
    def __init__(self, patience: int = 5, min_improvement: float = 0.001):
        self.patience = patience
        self.min_improvement = min_improvement
        self.best_score = 0.0
        self.wait = 0
        self.stopped = False
    
    def __call__(self, score: float) -> bool:
        """Check if training should stop.
        
        Args:
            score: Current validation score
            
        Returns:
            True if training should stop
        """
        if score > self.best_score + self.min_improvement:
            self.best_score = score
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stopped = True
                return True
        return False

# Get unique labels from training data
all_labels = set()
for _, annotations in training_data:
    for _, _, label in annotations['entities']:
        all_labels.add(label)

print(f"Entity labels to train: {list(all_labels)}")

Entity labels to train: ['ASTRO_OBJ']


## Training Function with Configuration

In [6]:
def train_ner_model(train_data: List[Tuple[str, Dict]], 
                   val_data: List[Tuple[str, Dict]],
                   labels: List[str],
                   config: Dict[str, Any]) -> Tuple[spacy.Language, Dict[str, List[float]]]:
    """Train a spaCy NER model with the given configuration.
    
    Args:
        train_data: Training data
        val_data: Validation data
        labels: Entity labels
        config: Training configuration
        
    Returns:
        Tuple of (trained_model, training_history)
    """
    print(f"Training model with config: {config}")
    
    # Create model
    nlp = create_blank_model(labels)
    
    # Convert to examples
    train_examples = convert_to_examples(nlp, train_data)
    val_examples = convert_to_examples(nlp, val_data)
    
    # Initialize training
    nlp.initialize(lambda: train_examples)
    
    # Training history
    history = {
        'train_loss': [],
        'val_f1': [],
        'val_precision': [],
        'val_recall': []
    }
    
    # Early stopping
    early_stopping = EarlyStopping(
        patience=config['early_stopping_patience'],
        min_improvement=config['min_improvement']
    )
    
    print("\nStarting training...")
    print("Epoch | Train Loss | Val F1  | Val Prec | Val Rec")
    print("-" * 50)
    
    for epoch in range(config['n_iter']):
        # Shuffle training data
        random.shuffle(train_examples)
        
        # Training
        losses = {}
        batches = minibatch(train_examples, size=compounding(4.0, 32.0, 1.001))
        
        for batch in batches:
            nlp.update(
                batch,
                drop=config['dropout'],
                losses=losses,
                sgd=nlp.create_optimizer()
            )
        
        # Validation
        val_scores = evaluate_model(nlp, val_examples)
        
        # Record history
        history['train_loss'].append(losses.get('ner', 0.0))
        history['val_f1'].append(val_scores['f1'])
        history['val_precision'].append(val_scores['precision'])
        history['val_recall'].append(val_scores['recall'])
        
        # Print progress
        print(f"{epoch+1:5d} | {losses.get('ner', 0.0):10.4f} | "
              f"{val_scores['f1']:7.3f} | {val_scores['precision']:8.3f} | "
              f"{val_scores['recall']:7.3f}")
        
        # Early stopping check
        if early_stopping(val_scores['f1']):
            print(f"\nEarly stopping at epoch {epoch+1}")
            break
    
    return nlp, history

# Train baseline model
print("Training baseline model...")
baseline_model, baseline_history = train_ner_model(
    train_data, val_data, list(all_labels), CONFIG
)

Training baseline model...
Training model with config: {'train_split': 0.7, 'val_split': 0.15, 'test_split': 0.15, 'n_iter': 30, 'dropout': 0.2, 'batch_size': 8, 'learn_rate': 0.001, 'early_stopping_patience': 5, 'min_improvement': 0.001}

Starting training...
Epoch | Train Loss | Val F1  | Val Prec | Val Rec
--------------------------------------------------
    1 |  3609.8643 |   0.000 |    0.000 |   0.000
    2 |    80.6002 |   0.000 |    0.000 |   0.000
    3 |    69.4667 |   1.000 |    1.000 |   1.000
    4 |    59.0282 |   1.000 |    1.000 |   1.000
    5 |    52.4558 |   1.000 |    1.000 |   1.000
    6 |    32.8050 |   1.000 |    1.000 |   1.000
    7 |    31.8141 |   0.800 |    0.667 |   1.000
    8 |    25.6744 |   1.000 |    1.000 |   1.000

Early stopping at epoch 8


## Hyperparameter Optimization

In [7]:
def hyperparameter_search(train_data: List[Tuple[str, Dict]], 
                         val_data: List[Tuple[str, Dict]],
                         labels: List[str],
                         base_config: Dict[str, Any]) -> Tuple[Dict, Dict, List[Dict]]:
    """Perform hyperparameter optimization.
    
    Args:
        train_data: Training data
        val_data: Validation data
        labels: Entity labels
        base_config: Base configuration to modify
        
    Returns:
        Tuple of (best_config, best_scores, all_results)
    """
    # Parameter grid
    param_grid = {
        'dropout': [0.1, 0.2, 0.3, 0.4],
        'learn_rate': [0.0005, 0.001, 0.002, 0.005],
        'n_iter': [20, 30, 40]
    }
    
    best_f1 = 0.0
    best_config = None
    best_scores = None
    all_results = []
    
    print("Starting hyperparameter search...")
    print(f"Testing {len(param_grid['dropout']) * len(param_grid['learn_rate']) * len(param_grid['n_iter'])} combinations")
    
    experiment_num = 0
    
    for dropout in param_grid['dropout']:
        for learn_rate in param_grid['learn_rate']:
            for n_iter in param_grid['n_iter']:
                experiment_num += 1
                
                # Create config for this experiment
                current_config = base_config.copy()
                current_config.update({
                    'dropout': dropout,
                    'learn_rate': learn_rate,
                    'n_iter': n_iter
                })
                
                print(f"\n--- Experiment {experiment_num} ---")
                print(f"Dropout: {dropout}, LR: {learn_rate}, Iterations: {n_iter}")
                
                try:
                    # Train model
                    model, history = train_ner_model(train_data, val_data, labels, current_config)
                    
                    # Final validation score
                    val_examples = convert_to_examples(model, val_data)
                    final_scores = evaluate_model(model, val_examples)
                    
                    # Record results
                    result = {
                        'config': current_config.copy(),
                        'scores': final_scores,
                        'history': history
                    }
                    all_results.append(result)
                    
                    print(f"Final F1: {final_scores['f1']:.4f}")
                    
                    # Update best
                    if final_scores['f1'] > best_f1:
                        best_f1 = final_scores['f1']
                        best_config = current_config.copy()
                        best_scores = final_scores.copy()
                        print(f"*** New best F1: {best_f1:.4f} ***")
                
                except Exception as e:
                    print(f"Error in experiment {experiment_num}: {e}")
                    continue
    
    return best_config, best_scores, all_results

# Run hyperparameter search (reduced grid for demo)
print("\n" + "="*60)
print("HYPERPARAMETER OPTIMIZATION")
print("="*60)

# Use a smaller search space for demonstration
search_config = CONFIG.copy()
search_config.update({
    'early_stopping_patience': 3,  # Faster for search
    'n_iter': 15  # Will be overridden in search
})

best_config, best_scores, search_results = hyperparameter_search(
    train_data[:100],  # Use subset for faster search
    val_data[:50], 
    list(all_labels), 
    search_config
)

print(f"\n{'='*60}")
print("BEST CONFIGURATION FOUND:")
print(f"Config: {best_config}")
print(f"Scores: {best_scores}")
print(f"{'='*60}")


HYPERPARAMETER OPTIMIZATION
Starting hyperparameter search...
Testing 48 combinations

--- Experiment 1 ---
Dropout: 0.1, LR: 0.0005, Iterations: 20
Training model with config: {'train_split': 0.7, 'val_split': 0.15, 'test_split': 0.15, 'n_iter': 20, 'dropout': 0.1, 'batch_size': 8, 'learn_rate': 0.0005, 'early_stopping_patience': 3, 'min_improvement': 0.001}

Starting training...
Epoch | Train Loss | Val F1  | Val Prec | Val Rec
--------------------------------------------------
    1 |  4202.0308 |   0.000 |    0.000 |   0.000
    2 |     4.3548 |   0.000 |    0.000 |   0.000
    3 |     4.0000 |   0.000 |    0.000 |   0.000

Early stopping at epoch 3
Final F1: 0.0000

--- Experiment 2 ---
Dropout: 0.1, LR: 0.0005, Iterations: 30
Training model with config: {'train_split': 0.7, 'val_split': 0.15, 'test_split': 0.15, 'n_iter': 30, 'dropout': 0.1, 'batch_size': 8, 'learn_rate': 0.0005, 'early_stopping_patience': 3, 'min_improvement': 0.001}

Starting training...
Epoch | Train Loss | V

KeyboardInterrupt: 

## Final Model Training with Best Parameters

In [None]:
# Train final model with best configuration on full dataset
print("\nTraining final model with best configuration...")
final_model, final_history = train_ner_model(
    train_data, val_data, list(all_labels), best_config
)

# Save the final model
model_path = MODELS_DIR / "astronomical_ner_model"
final_model.to_disk(model_path)
print(f"\nModel saved to: {model_path}")

## Comprehensive Model Evaluation

In [None]:
def detailed_evaluation(nlp: spacy.Language, 
                       test_data: List[Tuple[str, Dict]]) -> Dict[str, Any]:
    """Perform detailed evaluation of the trained model.
    
    Args:
        nlp: Trained spaCy model
        test_data: Test dataset
        
    Returns:
        Dictionary containing detailed evaluation metrics
    """
    test_examples = convert_to_examples(nlp, test_data)
    
    # Basic metrics
    scores = evaluate_model(nlp, test_examples)
    
    # Entity-level analysis
    true_entities = []
    pred_entities = []
    
    for text, annotations in test_data:
        doc = nlp(text)
        
        # True entities
        true_ents = [(start, end, label) for start, end, label in annotations['entities']]
        true_entities.extend(true_ents)
        
        # Predicted entities
        pred_ents = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
        pred_entities.extend(pred_ents)
    
    # Error analysis
    true_set = set(true_entities)
    pred_set = set(pred_entities)
    
    true_positives = len(true_set & pred_set)
    false_positives = len(pred_set - true_set)
    false_negatives = len(true_set - pred_set)
    
    # Calculate metrics manually for verification
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    
    return {
        'spacy_scores': scores,
        'manual_metrics': {
            'precision': precision,
            'recall': recall,
            'f1': f1
        },
        'counts': {
            'true_positives': true_positives,
            'false_positives': false_positives,
            'false_negatives': false_negatives,
            'total_true': len(true_entities),
            'total_pred': len(pred_entities)
        },
        'examples': {
            'false_positives': list(pred_set - true_set)[:10],
            'false_negatives': list(true_set - pred_set)[:10]
        }
    }

def test_model_predictions(nlp: spacy.Language, sample_texts: List[str]):
    """Test model on sample texts and display predictions.
    
    Args:
        nlp: Trained spaCy model
        sample_texts: List of sample texts to test
    """
    print("\nSample Predictions:")
    print("-" * 50)
    
    for i, text in enumerate(sample_texts, 1):
        doc = nlp(text)
        print(f"\nExample {i}:")
        print(f"Text: {text}")
        
        if doc.ents:
            print("Predicted entities:")
            for ent in doc.ents:
                print(f"  - '{ent.text}' ({ent.label_}) [{ent.start_char}-{ent.end_char}]")
        else:
            print("No entities predicted")

# Evaluate final model on test set
print("\n" + "="*60)
print("FINAL MODEL EVALUATION")
print("="*60)

evaluation_results = detailed_evaluation(final_model, test_data)

print("\nTest Set Performance:")
print(f"spaCy Precision: {evaluation_results['spacy_scores']['precision']:.4f}")
print(f"spaCy Recall:    {evaluation_results['spacy_scores']['recall']:.4f}")
print(f"spaCy F1-Score:  {evaluation_results['spacy_scores']['f1']:.4f}")

print(f"\nManual Calculation:")
print(f"Precision: {evaluation_results['manual_metrics']['precision']:.4f}")
print(f"Recall:    {evaluation_results['manual_metrics']['recall']:.4f}")
print(f"F1-Score:  {evaluation_results['manual_metrics']['f1']:.4f}")

print(f"\nEntity Counts:")
for key, value in evaluation_results['counts'].items():
    print(f"{key}: {value}")

print(f"\nError Analysis:")
print(f"Sample False Positives: {evaluation_results['examples']['false_positives'][:5]}")
print(f"Sample False Negatives: {evaluation_results['examples']['false_negatives'][:5]}")

# Test on sample texts
sample_texts = [
    "The Crab Nebula is a supernova remnant located in the constellation Taurus.",
    "Observations of NGC 4258 reveal a supermassive black hole at its center.",
    "The Hubble Space Telescope captured images of the Andromeda Galaxy.",
    "SN 2023ixf was discovered in the Pinwheel Galaxy M101.",
    "This paper discusses stellar formation in the Orion Nebula."
]

test_model_predictions(final_model, sample_texts)

## Training History Visualization and Analysis

In [None]:
def analyze_training_history(history: Dict[str, List[float]], title: str = "Training History"):
    """Analyze and display training history.
    
    Args:
        history: Training history dictionary
        title: Title for the analysis
    """
    print(f"\n{title}")
    print("-" * len(title))
    
    epochs = len(history['val_f1'])
    print(f"Total epochs: {epochs}")
    
    if epochs > 0:
        print(f"Final validation F1: {history['val_f1'][-1]:.4f}")
        print(f"Best validation F1: {max(history['val_f1']):.4f}")
        print(f"Final training loss: {history['train_loss'][-1]:.4f}")
        
        # Check for overfitting
        if epochs >= 5:
            recent_f1_trend = history['val_f1'][-5:]
            if len(recent_f1_trend) >= 2 and recent_f1_trend[-1] < recent_f1_trend[0]:
                print("⚠️  Potential overfitting detected (F1 declining)")
            else:
                print("✅ No obvious overfitting detected")

# Analyze baseline and final model training
analyze_training_history(baseline_history, "Baseline Model Training")
analyze_training_history(final_history, "Final Model Training")

# Create summary DataFrame of hyperparameter search results
if search_results:
    search_df_data = []
    for result in search_results:
        row = {
            'dropout': result['config']['dropout'],
            'learn_rate': result['config']['learn_rate'],
            'n_iter': result['config']['n_iter'],
            'final_f1': result['scores']['f1'],
            'final_precision': result['scores']['precision'],
            'final_recall': result['scores']['recall']
        }
        search_df_data.append(row)
    
    search_df = pd.DataFrame(search_df_data)
    search_df = search_df.sort_values('final_f1', ascending=False)
    
    print("\nHyperparameter Search Results (Top 5):")
    print(search_df.head().to_string(index=False, float_format='%.4f'))

print("\n" + "="*60)
print("TRAINING COMPLETE")
print("="*60)
print(f"Final model saved to: {model_path}")
print(f"Best configuration: {best_config}")
print(f"Test F1 Score: {evaluation_results['spacy_scores']['f1']:.4f}")

## Export Training Results and Model Metadata

In [None]:
# Save training results and metadata
results_data = {
    'model_info': {
        'model_path': str(model_path),
        'training_date': pd.Timestamp.now().isoformat(),
        'labels': list(all_labels),
        'total_training_examples': len(training_data)
    },
    'data_splits': {
        'train_size': len(train_data),
        'val_size': len(val_data), 
        'test_size': len(test_data)
    },
    'best_config': best_config,
    'test_performance': evaluation_results['spacy_scores'],
    'training_history': final_history,
    'hyperparameter_search': search_results if 'search_results' in locals() else None
}

results_path = MODELS_DIR / "training_results.json"
with open(results_path, 'w') as f:
    json.dump(results_data, f, indent=2, default=str)

print(f"Training results saved to: {results_path}")
print("\n🎉 ML Pipeline completed successfully!")

In [22]:
doc1 = baseline_model(val_data[0][0])

In [23]:
for entity in doc1.ents:
  print(entity.label_, ' | ', entity.text)

In [32]:
val_data[2][1]

{'entities': [(116, 123, 'ASTRO_OBJ')]}

In [34]:
val_data[2][0][116:123]

'NGC 652'