Conversation
Signed-off-by: adil-a <adil.asif2000@hotmail.com>
…assages, derive from dataloader

Remove redundant top-level `train_n_passages` and `eval_negative_size` from YAML configs and recipe `__init__`. The recipe now derives `train_n_passages` and `val_n_passages` directly from the dataloader dataset config (`n_passages`), making the dataset config the single source of truth. Also removes the dead `train_n_passages` param from `CrossEncoderCollator` and the unused `temperature` from the cross-encoder config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
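A hypothetical before/after of that config change; everything except `train_n_passages`, `eval_negative_size`, and `n_passages` is an assumed key layout, not the PR's actual YAML:

```yaml
# Before: duplicated at the top level and in the dataset config.
train_n_passages: 4
eval_negative_size: 3
dataloader:
  dataset:
    n_passages: 4

# After: the dataset config is the single source of truth;
# the recipe reads n_passages from dataloader.dataset directly.
dataloader:
  dataset:
    n_passages: 4
```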
Thanks for all the useful comments @rnyak. I overlooked these YAML details initially. I've now fixed them. Please take a look when possible.
…ection

Drop custom `COVERAGE_ARGS` (`--data-file`, `--source`, `--parallel-mode`) that prevented coverage data from being collected. Matches the pattern used by all other multi-GPU functional tests (`L2_DCP`, `L2_HF_PEFT`, etc.).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>
/ok to test 3bb2f62
/claude review
```python
if num_labels is not None:
    batch_dict["labels"] = torch.zeros(num_labels, dtype=torch.long)
```
Potential bug: `num_labels` is read from `features[0].get("num_labels")`, which reflects the number of unique queries in a single dataset transform call (typically 1 when `set_transform` processes individual rows). But the labels tensor needs a size equal to the number of query groups in the entire collated batch, i.e. `len(features) // n_passages`.

If a DataLoader batch has 8 flat (query, doc) pairs from 4 queries with `n_passages=2`, `num_labels` here would be 1 (from a single row's transform), producing `labels = torch.zeros(1)` instead of `torch.zeros(4)`. This would cause a shape mismatch at `F.cross_entropy(logits.view(-1, n_passages), labels)` in the training recipe.

This might be masked by how HuggingFace `set_transform` + `__getitems__` batches data in your environment, but it's fragile. Consider computing the labels from the actual batch size:

```python
n_queries = len(features) // n_passages  # where n_passages comes from config
labels = torch.zeros(n_queries, dtype=torch.long)
```
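A quick repro of the mismatch described above, using the numbers from the comment (8 flat pairs, `n_passages=2`); this is a standalone sketch, not the collator's actual code:

```python
import torch
import torch.nn.functional as F

n_passages = 2
logits = torch.randn(8)                        # 8 flat (query, doc) scores
scores = logits.view(-1, n_passages)           # (4, 2): 4 query groups
bad_labels = torch.zeros(1, dtype=torch.long)  # what the current collator produces
good_labels = torch.zeros(scores.size(0), dtype=torch.long)  # what the loss needs

F.cross_entropy(scores, good_labels)  # works: input (4, 2) vs target (4,)
# F.cross_entropy(scores, bad_labels) raises: input batch 4 != target batch 1
```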
Summary
Adds cross-encoder (reranker) training support alongside the existing bi-encoder pipeline. Cross-encoders jointly attend to query-document pairs, producing relevance scores that are significantly more accurate than bi-encoder similarity — a critical capability for retrieval reranking.
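For context, joint query-document scoring with a generic HF cross-encoder looks like the sketch below; it uses a public `transformers` checkpoint, not this PR's model classes:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Any public reranker works for illustration; this is not the PR's model class.
name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

pairs = [
    ("what is FSDP?", "FSDP shards model parameters across GPUs."),
    ("what is FSDP?", "Paris is the capital of France."),
]
batch = tok([q for q, _ in pairs], [d for _, d in pairs],
            padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)  # one joint relevance score per pair
```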
Cross-Encoder Model & Training Recipe
- `NeMoAutoModelCrossEncoder`: new auto-model class with full infrastructure support (FSDP2, PEFT, kernel patching)
- `CrossEncoderModel`: wraps any `ForSequenceClassification` backbone; routes through `LlamaBidirectionalForSequenceClassification` for Llama models and falls back to HF `AutoModelForSequenceClassification` for all others
- `TrainCrossEncoderRecipe`: extends `TrainBiEncoderRecipe` with cross-entropy loss on reshaped logits (sketched after this list), training accuracy tracking, and validation with accuracy@1 and MRR metrics
- `accuracy()` / `batch_mrr()`: pure-function metric utilities for training and validation
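A minimal sketch of the reshaped-logits loss named above, assuming scores arrive as one flat logit per (query, passage) pair with the positive passage first in each group; variable names are illustrative, not the recipe's actual code:

```python
import torch
import torch.nn.functional as F

n_passages = 4                        # passages scored per query (1 positive + 3 negatives)
logits = torch.randn(8)               # flat scores: 2 queries x 4 passages
scores = logits.view(-1, n_passages)  # (2, 4): one row of candidate scores per query
labels = torch.zeros(scores.size(0), dtype=torch.long)  # positive always sits at index 0
loss = F.cross_entropy(scores, labels)
```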
Data Pipeline
- `CrossEncoderCollator`: concatenates query and passage via a configurable prompt template, then tokenizes and pads; compatible with `NeMoAutoTokenizer`
- `flatten_bi_encoder_to_cross_encoder()`: transforms grouped bi-encoder data to the flattened cross-encoder format (see the sketch after this list); the same data files work for both model types via the `model_type` config switch
- `make_retrieval_dataset(model_type="cross_encoder")`: unified dataset factory supporting both bi-encoder and cross-encoder formats
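A hypothetical illustration of the grouped-to-flat transform; the field names (`query`, `pos_doc`, `neg_doc`) are assumptions, not the PR's actual schema:

```python
def flatten_example(row: dict, n_passages: int = 2) -> dict:
    # Grouped bi-encoder row: one query with its positive and negative passages.
    # Flattened cross-encoder output: parallel lists of (query, passage) pairs,
    # positive first, so label index 0 is always the relevant passage.
    passages = [row["pos_doc"]] + row["neg_doc"][: n_passages - 1]
    return {
        "query": [row["query"]] * len(passages),
        "passage": passages,
    }

flat = flatten_example(
    {"query": "what is FSDP?",
     "pos_doc": "FSDP shards parameters across GPUs.",
     "neg_doc": ["Paris is the capital of France."]}
)
```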
Architecture Simplification
- Replaces the dual-encoder (`lm_q`/`lm_p`) design with `share_encoder`: a single backbone is simpler and aligns with modern embedding models
- `EncoderStateDictAdapter` (was `BiencoderStateDictAdapter`): symmetric `model.` prefix strip/add (sketched after this list)
- The `AutoModel`/`AutoModelForSequenceClassification` fallback means users are not limited to custom bidirectional models
- Renames: `Biencoder` -> `BiEncoder`, `train_n_passages` -> `n_passages`
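A minimal sketch of the symmetric prefix handling, assuming the adapter simply remaps checkpoint keys; function names here are illustrative, not the adapter's real interface:

```python
def strip_prefix(state_dict: dict, prefix: str = "model.") -> dict:
    # Checkpoint -> HF backbone: drop the wrapper's "model." key prefix.
    return {k.removeprefix(prefix): v for k, v in state_dict.items()}

def add_prefix(state_dict: dict, prefix: str = "model.") -> dict:
    # HF backbone -> checkpoint: restore the prefix so loading is symmetric.
    return {prefix + k: v for k, v in state_dict.items()}
```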
CI & Testing
- Unit tests for `accuracy()`, `batch_mrr()`, `CrossEncoderCollator` (output shapes, labels, padding), `flatten_bi_encoder_to_cross_encoder()` (value validation), and the liger/SDPA retry for the cross-encoder
- `L2_Retrieval` CI job added to `cicd-main.yml`

Example
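The original example was not captured in this extract; below is a hypothetical config sketch assembled from names mentioned in this PR (`NeMoAutoModelCrossEncoder`, `model_type`, `n_passages`), so the exact keys and layout are assumptions:

```yaml
# Hypothetical cross-encoder training config; key paths are illustrative.
model:
  _target_: nemo_automodel.NeMoAutoModelCrossEncoder.from_pretrained
  pretrained_model_name_or_path: meta-llama/Llama-3.2-1B

dataloader:
  dataset:
    model_type: cross_encoder  # same data files as bi-encoder, flattened on the fly
    n_passages: 4              # single source of truth; the recipe derives its copy from here
```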
Test plan
- `ruff check` clean on `nemo_automodel/`
- `L2_Retrieval` job passes