## Stage 2: Data Preparation Pipeline & Feature Extraction

Milestone: Implementation of the oracle, feature extraction, and obtaining training samples for a Keras model.

In this section, we run the complete data processing pipeline that transforms the raw CoNLL-U training data into numerical vectors that can be fed to the neural network. The process has four steps:

1. **Data Loading & Filtering:** We load the training dataset (`en_partut-ud-train.conllu`) and filter out non-projective trees, because the Arc-Eager algorithm can only produce projective dependency structures.
2. **Oracle Execution (Obtaining Samples):** We run the oracle on every valid sentence. The oracle simulates the parsing process using the gold-standard tree and produces the correct sequence of states (input) and transitions (output/target).
3. **Feature Extraction:** We convert each complex `State` object into a fixed-length list of features using the `state_to_feats` function. With `nstack_feats=2` and `nbuffer_feats=2`, this extracts the word forms and UPOS tags of the top two stack items and the first two buffer items, padding with `<PAD>` when fewer are available.
4. **Numerical Conversion (Vectorization):** Neural networks require numerical input. We build vocabularies (dictionaries mapping strings to unique integer IDs) for words, tags, actions, and dependency labels. Finally, we convert all text features into NumPy arrays (`X_train`, `y_act`, `y_dep`) ready for Keras.
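
The projectivity filter in step 1 relies on `remove_non_projective_trees`, whose implementation is not shown in this notebook. As a rough illustration of what such a filter checks, the sketch below tests whether any two arcs of a tree cross. The function name `is_projective` and the list-of-heads input format are assumptions made for this example; they are not the `ConlluReader` API.

```python
def is_projective(heads):
    """Return True if no two dependency arcs cross.

    heads[i] is the head id of token i + 1 (token ids are 1-based);
    a head id of 0 denotes the artificial ROOT node.
    This input format is an assumption made for the sketch.
    """
    # Represent every arc by its (left, right) endpoints
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Arcs cross when the second starts strictly inside the first
            # and ends strictly outside it
            if l1 < l2 < r1 < r2:
                return False
    return True


# "He eats fish": He <- eats, ROOT -> eats, eats -> fish
print(is_projective([2, 0, 2]))
# A tree with crossing arcs 1-3 and 2-4
print(is_projective([3, 4, 0, 3]))
```

The quadratic pairwise check is enough for treebank-sized sentences; a sentence for which it returns `False` would be dropped before the oracle runs.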

In [1]:
import numpy as np
from conllu_reader import ConlluReader
from algorithm import ArcEager
import pickle

# --- 1. LOAD DATA (Use the TRAIN file, not test) ---
print("--- STEP 1: Data Loading ---")
reader = ConlluReader()
# Ensure the filename matches your specific training file path
train_sentences = reader.read_conllu_file("en_partut-ud-train_clean.conllu") 

# Filter out non-projective trees as Arc-Eager cannot handle them
train_sentences = reader.remove_non_projective_trees(train_sentences)
print(f" Loaded {len(train_sentences)} valid projective sentences for training.\n")

# --- 2. OBTAIN RAW SAMPLES (Oracle Execution) ---
print("--- STEP 2: Generating Samples with the Oracle ---")
arc_eager = ArcEager()
raw_samples = []

for sent in train_sentences:
    try:
        # The oracle returns a list of Sample objects (State + Transition) for this sentence
        samples = arc_eager.oracle(sent)
        raw_samples.extend(samples)
    except AssertionError:
        # If the oracle fails to reconstruct the exact gold tree, skip the sentence
        continue

print(f"Total samples (game states) generated: {len(raw_samples)}")

# VISUALIZATION: Let's see what a raw sample looks like
if raw_samples:
    print(f"Example of Raw Sample (Index 0):")
    print(f"   State: {raw_samples[0].state}")
    print(f"   Correct Action: {raw_samples[0].transition}\n")

# --- 3. FEATURE EXTRACTION (From State to List of Strings) ---
# We need to extract features from the stack and buffer
print("--- STEP 3: Feature Extraction (Translation to Text) ---")
X_raw = [] # Stores lists of words/tags (Input features)
Y_raw = [] # Stores actions and dependencies (Outputs)

for sample in raw_samples:
    # Extract features (words and UPOS tags) using the implemented function
    # nbuffer_feats=2 and nstack_feats=2 is the suggested configuration
    features = sample.state_to_feats(nbuffer_feats=2, nstack_feats=2)
    X_raw.append(features)
    
    # Save the action (transition) and the dependency label
    action_name = sample.transition.action
    dep_label = sample.transition.dependency
    Y_raw.append((action_name, dep_label))

# VISUALIZATION: What do the lists contain now?
print(f" Example of Input (X_raw[0]): {X_raw[0]}")
print(f"   (This is what the network 'sees': words and tags)")
print(f"Example of Output (Y_raw[0]): {Y_raw[0]}")
print(f"   (This is what the network must predict: Action and Label)\n")

# --- 4. PREPARATION FOR KERAS (Vocabularies and Numerical Conversion) ---
# Neural networks require numerical input
print("--- STEP 4: Numerical Conversion (For Keras) ---")

# 4.1 Create Dictionaries (Text -> Number Maps)
words_vocab = {'<PAD>': 0, '<UNK>': 1}
upos_vocab = {'<PAD>': 0, '<UNK>': 1}
actions_vocab = {}  # E.g., 'SHIFT': 0, 'LEFT-ARC': 1...
deprels_vocab = {None: 0} # E.g., 'nsubj': 1, 'det': 2...

# Fill vocabularies by iterating through all collected data
for features in X_raw:
    # Assuming features structure: [W_s2, W_s1, W_b1, W_b2, P_s2, P_s1, P_b1, P_b2]
    # The first half are words, the second half are UPOS tags
    num_words = len(features) // 2 
    
    words = features[:num_words]
    upos = features[num_words:]
    
    for w in words:
        if w not in words_vocab:
            words_vocab[w] = len(words_vocab)
    for u in upos:
        if u not in upos_vocab:
            upos_vocab[u] = len(upos_vocab)

for act, dep in Y_raw:
    if act not in actions_vocab:
        actions_vocab[act] = len(actions_vocab)
    if dep not in deprels_vocab:
        deprels_vocab[dep] = len(deprels_vocab)

print(f"Vocabulary Sizes:")
print(f"   Unique words: {len(words_vocab)}")
print(f"   Unique UPOS tags: {len(upos_vocab)}")
print(f"   Possible actions: {len(actions_vocab)} {actions_vocab}")
print(f"   Dependency relations: {len(deprels_vocab)}\n")

# 4.2 Convert everything to Numbers (Matrices for Keras)
# X_train will have shape (Num_Samples, Num_Features)
X_train_numerical = []
Y_train_actions = []
Y_train_deprels = []

for i in range(len(X_raw)):
    # Convert INPUT (Features)
    features = X_raw[i]
    num_vec = []
    
    # Convert words to IDs
    num_words = len(features) // 2
    for w in features[:num_words]:
        num_vec.append(words_vocab.get(w, words_vocab['<UNK>']))
    # Convert UPOS tags to IDs
    for u in features[num_words:]:
        num_vec.append(upos_vocab.get(u, upos_vocab['<UNK>']))
    
    X_train_numerical.append(num_vec)
    
    # Convert OUTPUT (Targets)
    act, dep = Y_raw[i]
    Y_train_actions.append(actions_vocab[act])
    # Use 0 if the dependency is None (e.g., for SHIFT or REDUCE)
    Y_train_deprels.append(deprels_vocab.get(dep, 0)) 

# Convert to Numpy arrays (The actual input format Keras expects)
X_train = np.array(X_train_numerical)
y_act = np.array(Y_train_actions)
y_dep = np.array(Y_train_deprels)

print("DATA")
print(f"Final numerical example (X_train[0]): {X_train[0]}")
print(f"   (Notice how words are now IDs)")
# Find the action name corresponding to the ID for display purposes
id_to_action = {idx: name for name, idx in actions_vocab.items()}
act_name = id_to_action[int(y_act[0])]
print(f"Target Action (y_act[0]): {y_act[0]} -> Corresponds to '{act_name}'")

np.savez("training_data.npz", X=X_train, y_act=y_act, y_dep=y_dep)
with open("vocabs.pkl", "wb") as f:
    pickle.dump((words_vocab, upos_vocab, actions_vocab, deprels_vocab), f)
print("Data saved to 'training_data.npz' and 'vocabs.pkl'")

--- STEP 1: Data Loading ---
 Loaded 1748 valid projective sentences for training.

--- STEP 2: Generating Samples with the Oracle ---
Total samples (game states) generated: 81182
Example of Raw Sample (Index 0):
   State: Stack (size=1): (0, ROOT, ROOT_UPOS)
Buffer (size=13): (1, Distribution, NOUN) | (2, of, ADP) | (3, this, DET) | (4, license, NOUN) | (5, does, AUX) | (6, not, PART) | (7, create, VERB) | (8, an, DET) | (9, attorney, NOUN) | (10, -, PUNCT) | (11, client, NOUN) | (12, relationship, NOUN) | (13, ., PUNCT)
Arcs (size=0): set()

   Correct Action: SHIFT

--- STEP 3: Feature Extraction (Translation to Text) ---
 Example of Input (X_raw[0]): ['<PAD>', 'ROOT', 'Distribution', 'of', '<PAD>', 'ROOT_UPOS', 'NOUN', 'ADP']
   (This is what the network 'sees': words and tags)
Example of Output (Y_raw[0]): ('SHIFT', None)
   (This is what the network must predict: Action and Label)

--- STEP 4: Numerical Conversion (For Keras) ---
Vocabulary Sizes:
   Unique words: 6872
   Unique 
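
The `<PAD>` and `<UNK>` entries matter mostly when encoding sentences that were not used to build the vocabularies, such as validation or test data. The sketch below shows that fallback on toy vocabularies. The helper names `build_vocab` and `encode` are hypothetical and only mirror the logic of step 4; they are not part of the notebook.

```python
PAD, UNK = "<PAD>", "<UNK>"


def build_vocab(items, specials=(PAD, UNK)):
    """Map each distinct item to an integer ID, reserving 0 and 1 for specials."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for item in items:
        # setdefault only assigns a new ID when the item is unseen
        vocab.setdefault(item, len(vocab))
    return vocab


def encode(features, words_vocab, upos_vocab):
    """Encode [words..., tags...] as IDs, falling back to <UNK> for unseen items."""
    half = len(features) // 2
    word_ids = [words_vocab.get(w, words_vocab[UNK]) for w in features[:half]]
    tag_ids = [upos_vocab.get(t, upos_vocab[UNK]) for t in features[half:]]
    return word_ids + tag_ids


# Toy vocabularies built from a handful of training tokens
words_vocab = build_vocab(["ROOT", "Distribution", "of"])
upos_vocab = build_vocab(["ROOT_UPOS", "NOUN", "ADP"])

seen = ["<PAD>", "ROOT", "Distribution", "of", "<PAD>", "ROOT_UPOS", "NOUN", "ADP"]
unseen = ["ROOT", "license", "of", "<PAD>", "ROOT_UPOS", "NOUN", "ADP", "<PAD>"]

print(encode(seen, words_vocab, upos_vocab))    # [0, 2, 3, 4, 0, 2, 3, 4]
print(encode(unseen, words_vocab, upos_vocab))  # [2, 1, 4, 0, 2, 3, 4, 0]
```

In the second call the word `license` is not in the toy vocabulary, so it is mapped to ID 1 (`<UNK>`) instead of raising a `KeyError`.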

## Verifying that the oracle is working correctly


In [4]:
print("\n--- VERIFICATION: Oracle vs Gold Standard + Neural Network Input ---")

# 1. Select an example sentence
example_sent = train_sentences[0]
print(f"Sentence: {[t.form for t in example_sent]}")

# 2. Get the oracle transitions
try:
    oracle_samples = arc_eager.oracle(example_sent)
except AssertionError as e:
    print(f"The oracle failed on this sentence: {e}")
else:
    # 3. Simulate the parse step by step, starting from the initial state
    config = arc_eager.create_initial_state(example_sent)
    
    print(f"{'Step':<4} | {'Stack':<25} | {'Buffer':<25} | {'Gold Action':<15} | {'Neural Network Input (Features)'}")
    print("-" * 120)

    for i, sample in enumerate(oracle_samples):
        # Prepare a printable view of the current state
        stack_str = str([t.form for t in config.S])
        buffer_str = str([t.form for t in config.B[:2]]) + "..."  # Only the first 2 buffer items
        
        # Action taken
        action_str = str(sample.transition)
        
        # --- THIS IS THE NEURAL NETWORK INPUT ---
        # state_to_feats (a method of the Sample class) extracts
        # the words and tags from the stack and the buffer.
        nn_input = sample.state_to_feats(nbuffer_feats=2, nstack_feats=2)
        
        # Print the row
        # stack_str[-25:] keeps only the last 25 characters so the column fits
        print(f"{i:<4} | {stack_str[-25:]:<25} | {buffer_str:<25} | {action_str:<15} | {nn_input}")

        # Advance the simulation by applying the transition
        arc_eager.apply_transition(config, sample.transition)

    # 4. Final comparison
    print("-" * 120)
    print("Arc comparison (dependencies):")
    
    # gold_arcs is a method already defined in algorithm.py
    gold_arcs = arc_eager.gold_arcs(example_sent)
    generated_arcs = config.A  # The arcs produced by our simulation
    
    print(f"Total gold arcs: {len(gold_arcs)}")
    print(f"Total generated arcs: {len(generated_arcs)}")
    
    if gold_arcs == generated_arcs:
        print("\n✅ SUCCESS! The oracle reconstructed the tree perfectly.")
        print("The inputs shown above are correct for training the network.")
    else:
        print("\n❌ ERROR: The trees do not match.")
        print("Missing arcs:", gold_arcs - generated_arcs)
        print("Extra arcs:", generated_arcs - gold_arcs)


--- VERIFICATION: Oracle vs Gold Standard + Neural Network Input ---
Sentence: ['ROOT', 'Distribution', 'of', 'this', 'license', 'does', 'not', 'create', 'an', 'attorney', '-', 'client', 'relationship', '.']
Step | Stack                     | Buffer                    | Gold Action     | Neural Network Input (Features)
------------------------------------------------------------------------------------------------------------------------
0    | ['ROOT']                  | ['Distribution', 'of']... | SHIFT           | ['<PAD>', 'ROOT', 'Distribution', 'of', '<PAD>', 'ROOT_UPOS', 'NOUN', 'ADP']
1    | ['ROOT', 'Distribution']  | ['of', 'this']...         | SHIFT           | ['ROOT', 'Distribution', 'of', 'this', 'ROOT_UPOS', 'NOUN', 'ADP', 'DET']
2    | T', 'Distribution', 'of'] | ['this', 'license']...    | SHIFT           | ['Distribution', 'of', 'this', 'license', 'NOUN', 'ADP', 'DET', 'NOUN']
3    | tribution', 'of', 'this'] | ['license', 'does']...    | LEFT-ARC-det    | ['of', 'th
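
To make the trace above easier to follow, here is a minimal, self-contained sketch of a static arc-eager oracle working on token ids only. It is not the `ArcEager.oracle` implementation from `algorithm.py` (which is not shown here and may differ, for example in how it handles labels or the final state); the function name `arc_eager_oracle` and the list-of-heads input are assumptions for illustration.

```python
def arc_eager_oracle(heads):
    """Unlabelled static oracle for a projective tree.

    heads[i] is the gold head of token i; heads[0] is a placeholder
    (None) for the artificial ROOT with id 0.
    Returns the transition sequence and the set of (head, dependent) arcs.
    """
    n = len(heads) - 1
    stack, buffer = [0], list(range(1, n + 1))
    arcs, has_head, transitions = set(), set(), []

    while buffer:
        s, b = stack[-1], buffer[0]
        if s != 0 and s not in has_head and heads[s] == b:
            # LEFT-ARC: the buffer front becomes the head of the stack top
            arcs.add((b, s))
            has_head.add(s)
            stack.pop()
            transitions.append("LEFT-ARC")
        elif heads[b] == s:
            # RIGHT-ARC: the stack top becomes the head of the buffer front
            arcs.add((s, b))
            has_head.add(b)
            stack.append(buffer.pop(0))
            transitions.append("RIGHT-ARC")
        elif s in has_head and any(heads[b] == k or heads[k] == b for k in stack[:-1]):
            # REDUCE: the stack top is attached and b still needs an arc
            # to or from something deeper in the stack
            stack.pop()
            transitions.append("REDUCE")
        else:
            stack.append(buffer.pop(0))
            transitions.append("SHIFT")
    return transitions, arcs


# "She saw him today": She <- saw, ROOT -> saw, saw -> him, saw -> today
gold_heads = [None, 2, 0, 2, 2]
transitions, arcs = arc_eager_oracle(gold_heads)
print(transitions)
print(arcs == {(h, d) for d, h in enumerate(gold_heads) if d > 0})
```

As in the notebook's trace, the first transition is `SHIFT`, and the final comparison checks that the arcs built by the simulated transitions equal the gold arcs.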

In [5]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import pickle

# --- 1. LOAD PREPARED DATA ---
print("--- STEP 1: Loading Training Data and Vocabularies ---")

try:
    data = np.load("training_data.npz")
    X_train_full = data['X']      # Shape: (Num_Samples, 8) -> 4 words + 4 tags
    y_train_act_full = data['y_act']   # Shape: (Num_Samples,) -> Action IDs
    y_train_dep_full = data['y_dep']   # Shape: (Num_Samples,) -> Dependency IDs
except FileNotFoundError:
    # Abort here: the cells below cannot run without this file
    raise SystemExit("Error: 'training_data.npz' not found.")

# Load vocabularies
try:
    with open("vocabs.pkl", "rb") as f:
        words_vocab, upos_vocab, actions_vocab, deprels_vocab = pickle.load(f)
except FileNotFoundError:
    # Abort here: the cells below cannot run without the vocabularies
    raise SystemExit("Error: 'vocabs.pkl' not found.")

# --- Data Splitting ---
split_idx = int(len(X_train_full) * 0.9)

# Inputs
X_train, X_val = X_train_full[:split_idx], X_train_full[split_idx:]

# Outputs (We need TWO sets of targets now)
y_train_act, y_val_act = y_train_act_full[:split_idx], y_train_act_full[split_idx:]
y_train_dep, y_val_dep = y_train_dep_full[:split_idx], y_train_dep_full[split_idx:]

print(f"Training samples: {len(X_train)}")
print(f"Validation samples: {len(X_val)}")

# --- 2. SEPARATE INPUTS (Words vs Tags) ---
# X contains [Word_S2, Word_S1, Word_B1, Word_B2, Tag_S2, Tag_S1, Tag_B1, Tag_B2]
num_features_total = X_train.shape[1]
num_word_feats = num_features_total // 2 

X_train_words = X_train[:, :num_word_feats]
X_train_tags  = X_train[:, num_word_feats:]

X_val_words = X_val[:, :num_word_feats]
X_val_tags  = X_val[:, num_word_feats:]

--- STEP 1: Loading Training Data and Vocabularies ---
Training samples: 73063
Validation samples: 8119
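
The split and the word/tag separation are plain NumPy slicing. The toy example below reproduces both steps on a small made-up matrix so the resulting shapes are easy to check; the array contents are arbitrary placeholders, not real feature IDs.

```python
import numpy as np

# Toy feature matrix: 10 samples, 8 columns (4 word IDs followed by 4 tag IDs)
X = np.arange(80).reshape(10, 8)

# Positional 90/10 split: the validation set is the last 10% of the rows
split_idx = int(len(X) * 0.9)
X_tr, X_va = X[:split_idx], X[split_idx:]

# Column split: first half are word IDs, second half are tag IDs
half = X.shape[1] // 2
X_tr_words, X_tr_tags = X_tr[:, :half], X_tr[:, half:]

print(X_tr.shape, X_va.shape)              # (9, 8) (1, 8)
print(X_tr_words.shape, X_tr_tags.shape)   # (9, 4) (9, 4)
```

With the 81182 samples reported earlier, `int(81182 * 0.9)` gives 73063 training rows and leaves 8119 for validation, which matches the counts printed above. Because the slice is positional, the validation set is simply the last 10% of oracle samples in corpus order.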


In [6]:
# --- 3. DEFINE HYPERPARAMETERS ---
WORD_EMBED_DIM = 32
POS_EMBED_DIM = 10
HIDDEN_UNITS = 100

NUM_WORDS = len(words_vocab) + 1
NUM_TAGS = len(upos_vocab) + 1
NUM_ACTIONS = len(actions_vocab)  # Output 1 size (e.g., 4: SHIFT, REDUCE, LA, RA)
NUM_DEPRELS = len(deprels_vocab)  # Output 2 size (e.g., 44 dependency labels)

print(f"Output 1 (Actions): {NUM_ACTIONS} classes")
print(f"Output 2 (Labels): {NUM_DEPRELS} classes")


# --- 4. BUILD THE MODEL (Multi-Output) ---
print("--- STEP 2: Building Multi-Output Neural Network ---")

# A. Input Layers
input_words = layers.Input(shape=(num_word_feats,), name="input_words")
input_tags  = layers.Input(shape=(num_word_feats,), name="input_tags")

# B. Embedding Layers
embed_words = layers.Embedding(input_dim=NUM_WORDS, output_dim=WORD_EMBED_DIM, name="embed_words")(input_words)
embed_tags  = layers.Embedding(input_dim=NUM_TAGS, output_dim=POS_EMBED_DIM, name="embed_tags")(input_tags)

# C. Flatten & Concatenate
flat_words = layers.Flatten()(embed_words)
flat_tags  = layers.Flatten()(embed_tags)
merged = layers.Concatenate(name="concat_features")([flat_words, flat_tags])

# D. Shared Hidden Layers
# This layer learns features relevant for BOTH tasks (action and label prediction)
hidden = layers.Dense(HIDDEN_UNITS, activation='relu', name="hidden_shared")(merged)
hidden = layers.Dropout(0.2)(hidden)

# E. Output Layers (The Two Heads)
# Head 1: Predicts the transition action (SHIFT, REDUCE, etc.)
output_action = layers.Dense(NUM_ACTIONS, activation='softmax', name="action_output")(hidden)

# Head 2: Predicts the dependency label (nsubj, det, etc.)
output_label = layers.Dense(NUM_DEPRELS, activation='softmax', name="label_output")(hidden)

# Create Model with 2 inputs and 2 outputs
model = models.Model(
    inputs=[input_words, input_tags], 
    outputs=[output_action, output_label], 
    name="ArcEager_MultiOutput_Parser"
)

model.summary()

Output 1 (Actions): 4 classes
Output 2 (Labels): 44 classes
--- STEP 2: Building Multi-Output Neural Network ---


2025-11-25 23:19:37.202687: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
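
To get a feel for the size of this architecture, the trainable parameters can be counted by hand: an embedding layer holds `input_dim * output_dim` weights, and a dense layer holds `(inputs + 1) * units` weights including biases. The sketch below does this arithmetic in plain Python. The helper `parser_param_counts` is hypothetical, and `num_tags=20` is a placeholder because the tag vocabulary size is cut off in the output of the data preparation cell.

```python
def parser_param_counts(num_words, num_tags, num_actions, num_deprels,
                        n_feats=4, word_dim=32, pos_dim=10, hidden=100):
    """Return (concat_dim, per-layer parameter counts) for the two-headed MLP."""
    # Flatten turns (n_feats, dim) into n_feats * dim; Concatenate adds both
    concat_dim = n_feats * word_dim + n_feats * pos_dim
    counts = {
        "embed_words": num_words * word_dim,
        "embed_tags": num_tags * pos_dim,
        "hidden_shared": (concat_dim + 1) * hidden,      # weights + biases
        "action_output": (hidden + 1) * num_actions,
        "label_output": (hidden + 1) * num_deprels,
    }
    return concat_dim, counts


# 6872 words + 1 (as in NUM_WORDS), 4 actions and 44 labels come from the
# printed outputs; num_tags=20 is an assumed placeholder value
concat_dim, counts = parser_param_counts(
    num_words=6873, num_tags=20, num_actions=4, num_deprels=44
)
print(concat_dim)             # 168
print(counts)
print(sum(counts.values()))
```

Under these assumptions the word embedding table dominates the parameter budget, while the two output heads are comparatively tiny.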


In [8]:
# --- 5. COMPILE AND TRAIN ---
print("--- STEP 3: Training the Model ---")

model.compile(
    optimizer='adam',
    # We define a loss function for EACH output layer (by name or order)
    loss={
        "action_output": "sparse_categorical_crossentropy",
        "label_output": "sparse_categorical_crossentropy"
    },
    # We calculate accuracy for each output separately
    metrics={
        "action_output": ["accuracy"],
        "label_output": ["accuracy"]
    },
    # Optional: Weigh the losses. Maybe action is more critical than label?
    # loss_weights={"action_output": 1.0, "label_output": 1.0} 
)

# Train the model
# Note: 'y' is now a LIST of targets [actions, labels] corresponding to the outputs
history = model.fit(
    x=[X_train_words, X_train_tags],
    y=[y_train_act, y_train_dep],  
    epochs=10,
    batch_size=32,
    validation_data=([X_val_words, X_val_tags], [y_val_act, y_val_dep]),
    verbose=1
)

# --- 6. SAVE THE MODEL ---
print("--- STEP 4: Saving Model ---")
model.save("parser_model_multi.keras")
print("Model saved to 'parser_model_multi.keras'")

--- STEP 3: Training the Model ---
Epoch 1/10
2284/2284 ━━━━━━━━━━━━━━━━━━━━ 10s 4ms/step - action_output_accuracy: 0.8223 - action_output_loss: 0.4640 - label_output_accuracy: 0.7633 - label_output_loss: 0.8593 - loss: 1.3231 - val_action_output_accuracy: 0.8671 - val_action_output_loss: 0.3514 - val_label_output_accuracy: 0.8316 - val_label_output_loss: 0.4959 - val_loss: 0.8480
Epoch 2/10
2284/2284 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step - action_output_accuracy: 0.8958 - action_output_loss: 0.2802 - label_output_accuracy: 0.8608 - label_output_loss: 0.4267 - loss: 0.7069 - val_action_output_accuracy: 0.8632 - val_action_output_loss: 0.3559 - val_label_output_accuracy: 0.8436 - val_label_output_loss: 0.4461 - val_loss: 0.8027
Epoch 3/10
2284/2284 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step - action_output_accuracy: 0.9245 - action_output_loss: 0.2079 - label_output_accuracy: 0.8957 - label_output_los
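
The `loss` column in these logs combines the two heads: for a multi-output Keras model the total loss is the weighted sum of the per-output losses, with weights of 1.0 unless `loss_weights` is set. For example, the epoch 2 training loss of 0.7069 equals 0.2802 + 0.4267. The NumPy sketch below recomputes sparse categorical cross-entropy by hand on made-up probabilities; the function name and the toy numbers are illustrative only.

```python
import numpy as np


def sparse_categorical_crossentropy(probs, targets):
    """Mean negative log-probability assigned to the integer target classes."""
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets)
    # Pick, for each row, the probability of its target class
    picked = probs[np.arange(len(targets)), targets]
    return float(-np.log(picked).mean())


# Made-up softmax outputs for 2 samples: 4 actions and 3 labels
action_probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
label_probs = [[0.5, 0.3, 0.2], [0.1, 0.8, 0.1]]

action_loss = sparse_categorical_crossentropy(action_probs, [0, 2])
label_loss = sparse_categorical_crossentropy(label_probs, [0, 1])

# Equal weights correspond to leaving loss_weights unset
total_loss = 1.0 * action_loss + 1.0 * label_loss
print(action_loss, label_loss, total_loss)
```

Raising the weight of one term would make the shared hidden layer favor that task, which is the trade-off hinted at by the commented-out `loss_weights` line in the compile call.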

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.callbacks import EarlyStopping

def build_and_train_parser(
    # Input Data
    X_train_words, X_train_tags, y_train_act, y_train_dep,
    X_val_words, X_val_tags, y_val_act, y_val_dep,
    # Fixed Dimensions (Vocabularies)
    num_words, num_tags, num_actions, num_deprels,
    # Hyperparameters (Variables)
    word_embed_dim=32,
    pos_embed_dim=10,
    hidden_units=100,
    learning_rate=0.001,
    dropout_rate=0.2,
    batch_size=32,
    epochs=20,
    model_name="Parser_Model"
):
    """
    Builds, compiles, and trains a multi-output neural network for dependency parsing.
    """
    
    print(f"\n{'='*60}")
    print(f"TRAINING MODEL: {model_name}")
    print(f"Params: WordEmb={word_embed_dim}, PosEmb={pos_embed_dim}, Hidden={hidden_units}, LR={learning_rate}, Drop={dropout_rate}, Batch={batch_size}")
    print(f"{'='*60}\n")

    # --- 1. Architecture ---
    
    # Input Layers
    # Shape is determined by the number of features selected (e.g., 2 stack + 2 buffer = 4)
    input_words = layers.Input(shape=(X_train_words.shape[1],), name="input_words")
    input_tags  = layers.Input(shape=(X_train_tags.shape[1],), name="input_tags")

    # Embedding Layers
    # Transforms integer IDs into dense vectors
    embed_words = layers.Embedding(input_dim=num_words, output_dim=word_embed_dim, name="embed_words")(input_words)
    embed_tags  = layers.Embedding(input_dim=num_tags, output_dim=pos_embed_dim, name="embed_tags")(input_tags)

    # Flattening
    # Converts (batch, seq_len, emb_dim) to (batch, seq_len * emb_dim)
    flat_words = layers.Flatten(name="flatten_words")(embed_words)
    flat_tags  = layers.Flatten(name="flatten_tags")(embed_tags)

    # Concatenation
    # Merges word and tag features into a single vector
    merged = layers.Concatenate(name="concat_features")([flat_words, flat_tags])

    # Shared Hidden Layer
    # Learns representation useful for both tasks
    hidden = layers.Dense(hidden_units, activation='relu', name="hidden_shared")(merged)
    
    # Dropout for regularization
    if dropout_rate > 0:
        hidden = layers.Dropout(dropout_rate, name="dropout")(hidden)

    # Output Layers (Two Heads)
    # 1. Predicts the transition action (SHIFT, REDUCE, LEFT-ARC, RIGHT-ARC)
    output_action = layers.Dense(num_actions, activation='softmax', name="action_output")(hidden)
    # 2. Predicts the dependency label (nsubj, det, root, etc.)
    output_label = layers.Dense(num_deprels, activation='softmax', name="label_output")(hidden)

    # Create the Model
    model = models.Model(
        inputs=[input_words, input_tags], 
        outputs=[output_action, output_label], 
        name=model_name
    )

    # Print Model Summary (Architecture and Parameters)
    model.summary()

    # --- 2. Compilation ---
    model.compile(
        optimizer=optimizers.Adam(learning_rate=learning_rate),
        loss={
            "action_output": "sparse_categorical_crossentropy",
            "label_output": "sparse_categorical_crossentropy"
        },
        metrics={
            "action_output": ["accuracy"],
            "label_output": ["accuracy"]
        }
    )

    # --- 3. Callbacks ---
    # Early stopping configuration:
    # Monitor: 'val_action_output_accuracy' (Accuracy of the action prediction on validation set)
    # Mode: 'max' (because we want accuracy to increase)
    early_stopping = EarlyStopping(
        monitor='val_action_output_accuracy', 
        mode='max',
        patience=3,
        restore_best_weights=True,
        verbose=1
    )

    # --- 4. Training ---
    print("\nStarting Training...")
    history = model.fit(
        x=[X_train_words, X_train_tags],
        y=[y_train_act, y_train_dep],
        epochs=epochs,
        batch_size=batch_size,
        validation_data=([X_val_words, X_val_tags], [y_val_act, y_val_dep]),
        callbacks=[early_stopping],
        verbose=1 # Ensures the epoch logs (Epoch 1/10...) are printed
    )
    
    print(f"--- Training Finished for {model_name} ---")
    return model, history

# --- HYPERPARAMETER GRID DEFINITION ---
# Expanded grid to test various configurations
hyperparameter_grid = [
    {
        "word_embed_dim": 32, "pos_embed_dim": 10, "hidden_units": 100, 
        "learning_rate": 0.001, "batch_size": 32, "dropout_rate": 0.2,
        "model_name": "Base_Model"
    },
    {
        "word_embed_dim": 64, "pos_embed_dim": 20, "hidden_units": 200, 
        "learning_rate": 0.001, "batch_size": 64, "dropout_rate": 0.3,
        "model_name": "Large_Embeddings_HigherDrop"
    },
    {
        "word_embed_dim": 32, "pos_embed_dim": 10, "hidden_units": 100, 
        "learning_rate": 0.0005, "batch_size": 32, "dropout_rate": 0.2,
        "model_name": "Base_SlowLR" # Slower learning rate for stability
    },
    {
        "word_embed_dim": 16, "pos_embed_dim": 5, "hidden_units": 50, 
        "learning_rate": 0.001, "batch_size": 128, "dropout_rate": 0.1,
        "model_name": "Small_Fast_Model" # Smaller model, larger batch size
    },
    {
        "word_embed_dim": 64, "pos_embed_dim": 10, "hidden_units": 150, 
        "learning_rate": 0.001, "batch_size": 32, "dropout_rate": 0.4,
        "model_name": "High_Dropout_Regularization" # Heavy regularization
    },
    {
        "word_embed_dim": 32, "pos_embed_dim": 10, "hidden_units": 300, 
        "learning_rate": 0.001, "batch_size": 64, "dropout_rate": 0.2,
        "model_name": "Wide_Hidden_Layer" # Large hidden layer capacity
    }
]

# --- EXECUTION LOOP ---

all_histories = {}
best_val_accuracy = 0.0
best_model = None
best_model_name = ""

print(f"Starting Hyperparameter Search over {len(hyperparameter_grid)} models...")

for params in hyperparameter_grid:
    
    # Call the function with the current parameters
    # Assumes X_train_words, NUM_WORDS, etc., are already defined in the notebook context
    model, history = build_and_train_parser(
        X_train_words, X_train_tags, y_train_act, y_train_dep,
        X_val_words, X_val_tags, y_val_act, y_val_dep,
        NUM_WORDS, NUM_TAGS, NUM_ACTIONS, NUM_DEPRELS,
        word_embed_dim=params["word_embed_dim"],
        pos_embed_dim=params["pos_embed_dim"],
        hidden_units=params["hidden_units"],
        learning_rate=params["learning_rate"],
        dropout_rate=params["dropout_rate"],
        batch_size=params["batch_size"],
        epochs=15, # Set max epochs (EarlyStopping will likely cut this shorter)
        model_name=params["model_name"]
    )
    
    # Store history
    all_histories[params["model_name"]] = history.history
    
    # Evaluate performance
    # We check the best validation accuracy for the action output achieved during training
    best_epoch_acc = max(history.history['val_action_output_accuracy'])
    print(f"Result {params['model_name']}: Best Validation Action Accuracy = {best_epoch_acc:.4f}")
    
    # Track the global best model
    if best_epoch_acc > best_val_accuracy:
        print(f" >> New Best Model Found! (Previous best: {best_val_accuracy:.4f})")
        best_val_accuracy = best_epoch_acc
        best_model = model
        best_model_name = params["model_name"]

print(f"\n{'='*60}")
print(f"SEARCH COMPLETE")
print(f"Best Model: '{best_model_name}' with Action Accuracy: {best_val_accuracy:.4f}")
print(f"{'='*60}\n")

# --- SAVE BEST MODEL ---
if best_model:
    save_filename = f"{best_model_name}_best.keras"
    print(f"Saving best model to: {save_filename}")
    best_model.save(save_filename)

2025-11-25 23:19:03.379476: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-11-25 23:19:03.379799: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-11-25 23:19:03.420462: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-11-25 23:19:04.427824: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off,
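
The `EarlyStopping` callback used in `build_and_train_parser` (with `mode='max'` and `patience=3`) can be pictured as a simple counter: it resets whenever the monitored metric improves and stops training once it reaches the patience value. The sketch below simulates that rule in plain Python; `early_stopping_epoch` is a hypothetical helper and the accuracy values are made up.

```python
def early_stopping_epoch(val_scores, patience=3):
    """Simulate patience-based early stopping on a metric to maximize.

    Returns (stopped_epoch, best_epoch) with 1-based epochs;
    stopped_epoch is None if training would run to completion.
    """
    best_score, best_epoch, wait = float("-inf"), 0, 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            # Improvement: remember this epoch and reset the counter
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Made-up validation accuracies that peak at epoch 2
scores = [0.867, 0.869, 0.868, 0.866, 0.865]
print(early_stopping_epoch(scores, patience=3))  # (5, 2)
```

This is the same pattern as an "Epoch 5: early stopping" message paired with "best epoch: 2": three epochs without improvement after the peak, then the weights of the best epoch are restored when `restore_best_weights=True`.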



In [7]:
# --- HYPERPARAMETER GRID DEFINITION ---
# Add or remove dictionaries to try other configurations
hyperparameter_grid = [
    {
        "word_embed_dim": 32, "pos_embed_dim": 10, "hidden_units": 100, 
        "learning_rate": 0.001, "batch_size": 32, "dropout_rate": 0.2,
        "model_name": "Base_Model"
    },
    {
        "word_embed_dim": 64, "pos_embed_dim": 20, "hidden_units": 200, 
        "learning_rate": 0.001, "batch_size": 64, "dropout_rate": 0.3,
        "model_name": "Large_Model_HigherDrop"
    },
    {
        "word_embed_dim": 32, "pos_embed_dim": 10, "hidden_units": 100, 
        "learning_rate": 0.0005, "batch_size": 32, "dropout_rate": 0.2,
        "model_name": "Base_SlowLR" # Slower learning rate
    },
    {
        "word_embed_dim": 16, "pos_embed_dim": 5, "hidden_units": 50, 
        "learning_rate": 0.001, "batch_size": 128, "dropout_rate": 0.1,
        "model_name": "Small_Fast_Model"
    }
]

# --- VARIABLES FOR STORING RESULTS ---
all_histories = {}
best_val_loss = float('inf') # We want to minimize the loss
best_model = None
best_model_name = ""

# --- TRAINING LOOP ---
print("--- STARTING HYPERPARAMETER SEARCH ---")

for params in hyperparameter_grid:
    
    # Call the function with the current parameters
    model, history = build_and_train_parser(
        X_train_words, X_train_tags, y_train_act, y_train_dep,
        X_val_words, X_val_tags, y_val_act, y_val_dep,
        NUM_WORDS, NUM_TAGS, NUM_ACTIONS, NUM_DEPRELS,
        word_embed_dim=params["word_embed_dim"],
        pos_embed_dim=params["pos_embed_dim"],
        hidden_units=params["hidden_units"],
        learning_rate=params["learning_rate"],
        dropout_rate=params["dropout_rate"],
        batch_size=params["batch_size"],
        epochs=20, # A reasonable maximum; EarlyStopping will usually stop earlier
        model_name=params["model_name"]
    )
    
    # Store the history
    all_histories[params["model_name"]] = history.history
    
    # Get this model's best validation loss
    # 'val_loss' is the sum of the action and label losses
    final_val_loss = min(history.history['val_loss'])
    final_act_acc = max(history.history['val_action_output_accuracy'])
    
    print(f"Result {params['model_name']}: Val Loss={final_val_loss:.4f}, Best Action Acc={final_act_acc:.4f}")
    
    # Check whether this is the best model so far
    if final_val_loss < best_val_loss:
        print(f" >> New Best Model Found! (Previous best loss: {best_val_loss:.4f})")
        best_val_loss = final_val_loss
        best_model = model
        best_model_name = params["model_name"]

print(f"\n--- SEARCH COMPLETE ---")
print(f"Best Model: '{best_model_name}' with Val Loss: {best_val_loss:.4f}")

# --- SAVE THE BEST MODEL ---
if best_model:
    save_filename = f"{best_model_name}_best.keras"
    print(f"Saving best model to: {save_filename}")
    best_model.save(save_filename)

--- STARTING HYPERPARAMETER SEARCH ---

--- Building Model: Base_Model ---
Params: WordEmb=32, PosEmb=10, Hidden=100, LR=0.001, Drop=0.2
Epoch 5: early stopping
Restoring model weights from the end of the best epoch: 2.
--- Training Finished for Base_Model ---
Result Base_Model: Val Loss=0.7925, Best Action Acc=0.8691
 >> New Best Model Found! (Previous best loss: inf)

--- Building Model: Large_Model_HigherDrop ---
Params: WordEmb=64, PosEmb=20, Hidden=200, LR=0.001, Drop=0.3
Epoch 5: early stopping
Restoring model weights from the end of the best epoch: 2.
--- Training Finished for Large_Model_HigherDrop ---
Result Large_Model_HigherDrop: Val Loss=0.7610, Best Action Acc=0.8696
 >> New Best Model Found! (Previous best loss: 0.7925)

--- Building Model: Base_SlowLR ---
Params: WordEmb=32, PosEmb=10, Hidden=100, LR=0.0005, Drop=0.2
Epoch 6: early stopping
Restoring model weights from the end of the best epoch: 3.
--- Training Finished for Base_SlowLR ---
Result Base_SlowLR: Val Loss=0.
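
The two search cells rank models by different criteria: one keeps the model with the highest `val_action_output_accuracy`, the other the model with the lowest `val_loss`. These criteria do not have to agree. The sketch below shows a small selection helper over Keras-style history dictionaries; `select_best`, the model names, and all numbers are made up for illustration.

```python
def select_best(histories, metric="val_loss", mode="min"):
    """Pick the model whose best epoch is best on the given metric."""
    pick = min if mode == "min" else max
    # Best value each model reached in any epoch
    scores = {name: pick(hist[metric]) for name, hist in histories.items()}
    best_name = pick(scores, key=scores.get)
    return best_name, scores[best_name]


# Hypothetical per-epoch histories for two models
histories = {
    "model_a": {
        "val_loss": [0.85, 0.79, 0.81],
        "val_action_output_accuracy": [0.860, 0.871, 0.869],
    },
    "model_b": {
        "val_loss": [0.83, 0.76, 0.78],
        "val_action_output_accuracy": [0.858, 0.868, 0.866],
    },
}

print(select_best(histories, "val_loss", "min"))
# ('model_b', 0.76)
print(select_best(histories, "val_action_output_accuracy", "max"))
# ('model_a', 0.871)
```

In this toy case the loss-based ranking prefers `model_b` while the accuracy-based ranking prefers `model_a`, so it helps to decide up front which metric defines "best" and to monitor the same metric in early stopping.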