SPARKNLP-785 Fix WordEmbeddingsModel Bug with LightPipeline #13715

danilojsl · 2023-03-28T15:48:33Z

Description

When WordEmbeddingsModel was used in a LightPipeline with setEnableInMemoryStorage set to true, it raised a None.get error.
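The error named in the description is presumably Scala's `None.get` failure, which is thrown when `.get` is called on an empty `Option`. The sketch below illustrates that failure class in isolation. It is a toy under stated assumptions, not Spark NLP code: `InMemoryLookupSketch`, `Store`, `lookupUnsafe`, and `lookupSafe` are hypothetical names, and the actual fix in this PR may look quite different.

```scala
// Toy illustration of the None.get failure class; NOT Spark NLP internals.
object InMemoryLookupSketch {

  // Hypothetical store: None models "in-memory storage was never initialized".
  final case class Store(vectors: Option[Map[String, Array[Float]]])

  // Unsafe: Option.get on None throws java.util.NoSuchElementException ("None.get").
  def lookupUnsafe(store: Store, word: String): Option[Array[Float]] =
    store.vectors.get.get(word)

  // Safer: an uninitialized store simply yields no vector instead of throwing.
  def lookupSafe(store: Store, word: String): Option[Array[Float]] =
    store.vectors.flatMap(_.get(word))
}
```

With an uninitialized store, `lookupUnsafe` throws while `lookupSafe` returns `None`, which is the general shape of guarding against this kind of error.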

Motivation and Context

Make WordEmbeddingsModel usable in a LightPipeline.

How Has This Been Tested?

Screenshots (if appropriate):

Google Colab Notebook

Types of changes

Bug fix (non-breaking change which fixes an issue)
Code improvements with no or little impact
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING page.
I have added tests to cover my changes.
All new and existing tests passed.

* SPARKNLP-782 Removes deprecated parameter enablePatternRegex (#13664)
* SPARKNLP-748: Custom Entity Name for Date2Chunk (#13680)
  - added parameter "entityName" to change metadata name
* SPARKNLP-784 Fix loading WordEmbeddingsModel bug when cache_folder is from S3 (#13707)
* SPARKNLP-605: ConvNextForImageClassification (#13713)
* SPARKNLP-605: ConvNextForImageClassification
  - Added ConvNextForImageClassification with new tests
  - Refactored image Preprocessor and added new config
  - Implemented filters with resample property for ImageResizeUtils.resizeBufferedImage (with minor performance gain)
  - Minor improvements for ViT and Swin
* SPARKNLP-605: Docs
* SPARKNLP-605: Lazy values for test
* SPARKNLP-785 Fix WordEmbeddingsModel bug with LightPipeline (#13715)
* [skip test] SPARKNLP-783: Python 3.6 deprecated in Spark 3.2 (#13724)
* SPARKNLP-763 Implementing ZeroShot Text Classification for BERT and DistilBERT based on NLI (#13727)
* SPARKNLP-763 Fix a typo
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 add unfinished traits
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 Create a new BertForZeroShotClassification annotator
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 Create a new HasCandidateLabelsProperties
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 Implement predict sequence with NLI, new tokenize from strings, and new tag ZeroShot
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 Clean up the code
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 Add BertForZeroShotClassification to annotator [skip test]
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 Add BertForZeroShotClassification to ResourceDownloader [skip test]
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 Implement BertForZeroShotClassification in Python [skip test]
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 Add unit tests for BertForZeroShotClassification
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* change default model to bert_base_cased_zero_shot_classifier_xnli
* SPARKNLP-763 Fix Scaladoc and Pydoc
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-763 Fix Update unit test in Scala
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

---------
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* Sparknlp 534 Introducing BART Transformer for text-to-text generation tasks like translation and summarization (#13731)
* WIP: Added Bart transformer scala files
* WIP: Added BART tokenizer test and BART is locally working
* WIP: Added BART tokenizer test and BART is locally working
* WIP: Added Beam Hypothesis and Beam Scorer implementations
* WIP: Added Logit Processors
* WIP: Added Beam Search implementation
* WIP: Completed Beam Search implementation WIP: Added Generate method for text generation
* WIP: fixed a bug in Beam search algorithm WIP: Generate method for text generation
* WIP: changed BartTransformer methods to include beam size and added description
* WIP: changed BartTransformer test methods
* WIP: fixed errors in BeamSearch
* WIP: Updated to use separate encoder decoder model
* WIP: Changed model to handle the int64 version of the model weights
* WIP: Added python API implementation
* Pass session and encoder state as a parameter Clean up unnecessary code
* Update TopK Logit Warper Logic
* Code clean up
* Update Tests
* Update documentation
* Update documentation and python tests
* Update python tests
* SPARKNLP-534 move BartTokenizer to the Bart backend
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-534 Fix the copyright year
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-534 Add BartTransformer to annotator and ResourceDownloader
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-534 Fix BartTransformer in annotator
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

---------
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* Bump version to 4.4.0
* Update doc style and fix unit test [skip test]
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-605: Fix parameter eval for vit tests
* Update default model name (#13744)
* SPARKNLP-796 Creating a new `nerHasNoSchema` param (#13745)
* Adding missing CPUvsGPUbenchmark page
* SPARKNLP-796 Creating a new `nerHasNoSchema` param
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

---------
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* Change default model for BART to distilbart-xsum-12-6
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* Change default model for BART to distilbart_xsum_12_6
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* Replace nlp with sparknlp.org website
* Update INT64 to INT32 (#13748)
* Fix the wrong column in unit test [skip test]
  Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-805: Documentation for release/440 (#13743)
* Fixed memory leak
* Added Bart Notebook
* Add new features and update docs[run doc]
* Update install.md
* Update CHANGELOG [run doc]
* Update Scala and Python APIs
* release spark-nlp 4.4.0 on Conda [skip test]

---------
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Co-authored-by: Danilo Burbano <37355249+danilojsl@users.noreply.github.com>
Co-authored-by: Devin Ha <33089471+DevinTDHa@users.noreply.github.com>
Co-authored-by: Prabod Rathnayaka <prabod@rathnayaka.me>
Co-authored-by: Devin Ha <t.ha@tu-berlin.de>
Co-authored-by: github-actions <action@github.com>

SPARKNLP-785 Fix WordEmbeddingsModel bug with LightPipeline

9e9695d

danilojsl added the bug-fix and DON'T MERGE labels Mar 28, 2023

danilojsl requested a review from maziyarpanahi March 28, 2023 17:49

maziyarpanahi changed the base branch from master to release/440-release-candidate April 6, 2023 18:03

maziyarpanahi approved these changes Apr 6, 2023

maziyarpanahi merged commit b6b8cc6 into release/440-release-candidate Apr 6, 2023