SPARKNLP-1000: Fix No Operation named [init_all_tables] for GPT2 #14177

DevinTDHa · 2024-02-17T14:09:18Z

Description

This PR fixes `java.lang.IllegalArgumentException: No Operation named [init_all_tables] in the Graph`, which is thrown when the model needs to be deserialized (e.g. after being broadcast to worker nodes). Deserialization is skipped when the model is already loaded, so the issue appears only on the worker nodes and not on the driver.

GPT2 does not contain tables and so does not require this command. Moreover, this is a legacy feature of TF 1.x models.
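
To illustrate the failure mode described above, here is a minimal toy sketch in Python. It is not the Spark NLP implementation: `ToyGraph`, `ToyModel`, the `init_all_tables` flag, and the operation names `input_ids` and `logits` are all hypothetical stand-ins, and a plain `ValueError` stands in for the Java exception. It only shows why a lazily restored session can fail on a freshly deserialized copy (the worker case) while the already loaded copy (the driver case) never hits the missing operation.

```python
class ToyGraph:
    """Hypothetical stand-in for a model graph: just a set of operation names."""

    def __init__(self, op_names):
        self.op_names = set(op_names)

    def run(self, op_name):
        # Mirrors the reported failure: running an operation the graph lacks is an error.
        if op_name not in self.op_names:
            raise ValueError(f"No Operation named [{op_name}] in the Graph")
        return f"ran {op_name}"


class ToyModel:
    """Hypothetical model wrapper that restores its session lazily."""

    def __init__(self, graph, init_all_tables):
        self.graph = graph
        self.init_all_tables = init_all_tables
        self.session_ready = False

    def get_session(self):
        # Already loaded (the driver case): restoration is skipped entirely.
        if self.session_ready:
            return "session"
        # Freshly deserialized (the worker case): optionally initialize lookup tables.
        if self.init_all_tables:
            self.graph.run("init_all_tables")
        self.session_ready = True
        return "session"


# A graph without any table operations, as the description says of GPT2.
gpt2_like_graph = ToyGraph(["input_ids", "logits"])

# Driver: the session already exists, so the missing operation is never touched.
driver_model = ToyModel(gpt2_like_graph, init_all_tables=True)
driver_model.session_ready = True
driver_result = driver_model.get_session()

# Worker before the fix: restoring the session tries to run the missing operation.
worker_before = ToyModel(gpt2_like_graph, init_all_tables=True)
try:
    worker_before.get_session()
    worker_error = None
except ValueError as err:
    worker_error = str(err)

# Worker after the fix: table initialization is disabled for this model.
worker_after = ToyModel(gpt2_like_graph, init_all_tables=False)
worker_result = worker_after.get_session()
```

In this sketch the fix amounts to passing `init_all_tables=False` for a graph that has no table operations, which matches the intent stated in the description but not necessarily the exact code change in the PR.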

Motivation and Context

Resolves #14110.

How Has This Been Tested?

Tests for GPT2 are passing on a cluster.

Types of changes

Bug fix (non-breaking change which fixes an issue)
Code improvements with no or little impact
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING page.
I have added tests to cover my changes.
All new and existing tests passed.

Fixes `java.lang.IllegalArgumentException: No Operation named [init_all_tables] in the Graph` when the model needs to be deserialized. The deserialization is skipped when the model is already loaded (so the error will only appear on the worker nodes and not the driver). GPT2 does not contain tables and so does not require this command.

coveralls · 2024-02-17T14:37:06Z

Pull Request Test Coverage Report for Build 7942207273

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.02%) to 62.721%

Totals
Change from base Build 7907865156:	0.02%
Covered Lines:	8966
Relevant Lines:	14295

💛 - Coveralls

…date * fixed all sbt warnings * remove file system url prefix (#14132) * SPARKNLP-942: MPNet Classifiers (#14147) * SPARKNLP-942: MPNetForSequenceClassification * SPARKNLP-942: MPNetForQuestionAnswering * SPARKNLP-942: MPNet Classifiers Documentation * Restore RobertaforQA bugfix * adding import notebook + changing default model + adding onnx support (#14158) * Sparknlp 876: Introducing LLAMA2 (#14148) * introducing LLAMA2 * Added option to read model from model path to onnx wrapper * Added option to read model from model path to onnx wrapper * updated text description * LLAMA2 python API * added method to save onnx_data * added position ids * - updated Generate.scala to accept onnx tensors - added beam search support for LLAMA2 * updated max input length * updated python default params changed test to slow test * fixed serialization bug * Doc sim rank as retriever (#14149) * Added retrieval interface to the doc sim rank approach * Added Python interface as retriever in doc sim ranker --------- Co-authored-by: Stefano Lori <s.lori@izicap.com> * 812 implement de berta for zero shot classification annotator (#14151) * adding code * adding notebook for import --------- Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr> * Add notebook for fine tuning sbert (#14152) * [SPARKNLP-986] Fixing optional input col validations (#14153) * [SPARKNLP-984] Fixing Deberta notebooks URIs (#14154) * SparkNLP 933: Introducing M2M100 : multilingual translation model (#14155) * introducing LLAMA2 * Added option to read model from model path to onnx wrapper * Added option to read model from model path to onnx wrapper * updated text description * LLAMA2 python API * added method to save onnx_data * added position ids * - updated Generate.scala to accept onnx tensors - added beam search support for LLAMA2 * updated max input length * updated python default params changed test to slow test * fixed serialization bug * Added Scala code for M2M100 * Documentation for scala code * 
Python API for M2M100 * added more tests for scala * added tests for python * added pretrained * rewording * fixed serialization bug * fixed serialization bug --------- Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr> * SPARKNLP-985: Add flexible naming for onnx_data (#14165) Some annotators might have different naming schemes for their files. Added a parameter to control this. * Add LLAMA2Transformer and M2M100Transformer to annotator * Add LLAMA2Transformer and M2M100Transformer to ResourceDownloader * bump version to 5.3.0 [skip test] * SPARKNLP-999: Fix remote model loading for some onnx models * used filesystem to check for the onnx_data file (#14169) * [SPARKNLP-940] Adding changes to correctly copy cluster index storage… (#14167) * [SPARKNLP-940] Adding changes to correctly copy cluster index storage when defined * [SPARKNLP-940] Moving local mode control to its right place * [SPARKNLP-940] Refactoring sentToCLuster method * [SPARKNLP-988] Updating EntityRuler documentation (#14168) * [SPARKNLP-940] Adding changes to support storage temp directory (cluster_tmp_dir) * SPARKNLP-1000: Disable init_all_tables for GPT2 (#14177) Fixes `java.lang.IllegalArgumentException: No Operation named [init_all_tables] in the Graph` when model needs to be deserialized. The deserialization is skipped when the modelis already loaded (so it will only appear on the worker nodes and not the driver) GPT2 does not contain tables and so does not require this command. * fixes python documentation (#14172) * revert MarianTransformer.scala * revert HasBatchedAnnotate.scala * revert Preprocessor.scala * Revert ViTClassifier.scala * disable hard exception * Replace hard exception with soft logs (#14179) This reverts commit eb91fde. 
* move the example from root to examples/ [skip test] * Cleanup some code [skip test] * Update onnxruntime to 1.17.0 [skip test] * Fix M2M100 default model's name [skip test] * Update docs [run doc] * Update Scala and Python APIs --------- Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com> Co-authored-by: Jiamao Zheng <jiamaozheng@users.noreply.github.com> Co-authored-by: Devin Ha <33089471+DevinTDHa@users.noreply.github.com> Co-authored-by: Prabod Rathnayaka <prabod@rathnayaka.me> Co-authored-by: Stefano Lori <wolliq@users.noreply.github.com> Co-authored-by: Stefano Lori <s.lori@izicap.com> Co-authored-by: Danilo Burbano <37355249+danilojsl@users.noreply.github.com> Co-authored-by: Devin Ha <t.ha@tu-berlin.de> Co-authored-by: Danilo Burbano <danilo@johnsnowlabs.com> Co-authored-by: github-actions <action@github.com>

DevinTDHa added the bug-fix and DON'T MERGE (Do not merge this PR) labels Feb 17, 2024

DevinTDHa requested a review from maziyarpanahi February 17, 2024 14:09

DevinTDHa self-assigned this Feb 17, 2024

maziyarpanahi approved these changes Feb 19, 2024

maziyarpanahi merged commit b148e79 into JohnSnowLabs:release/530-release-candidate Feb 19, 2024
6 checks passed

maziyarpanahi mentioned this pull request Feb 19, 2024

release/530-release-candidate #14164

Merged
