SPARKNLP-732 Unify all externally supported file systems and cloud access #13919

danilojsl · 2023-08-08T19:49:39Z

Description

This change redesigns the architecture for integrating with S3 and GCP.
Additionally, it adds Azure support for:

Defining cache_pretrained directory
Training NER logs
Importing HF models in Spark NLP
Loading TF Graphs for NER-DL

The goal is to have universal methods so that all cloud storage providers (S3, GCP, and Azure) can be handled from the same classes.
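As a rough illustration of the "same classes" idea, here is a minimal Python sketch that routes a storage URI to one common interface. Every name in it (`CloudStorage`, `InMemoryStorage`, `SCHEME_TO_PROVIDER`, `storage_for`) is hypothetical and not taken from this PR, the scheme-to-provider mapping is an assumption, and the backend is an in-memory stand-in so the sketch runs without any cloud SDK or credentials.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class CloudStorage(ABC):
    """Hypothetical common interface; not the classes introduced by this PR."""

    @abstractmethod
    def upload(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def download(self, key: str) -> bytes:
        ...


class InMemoryStorage(CloudStorage):
    """Stand-in backend that keeps objects in a dict instead of calling a cloud SDK."""

    def __init__(self, provider: str, container: str) -> None:
        self.provider = provider
        self.container = container
        self._objects = {}

    def upload(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def download(self, key: str) -> bytes:
        return self._objects[key]


# Assumed mapping from URI scheme to provider; the real project may use other schemes.
SCHEME_TO_PROVIDER = {
    "s3": "S3",
    "s3a": "S3",
    "gs": "GCP",
    "abfs": "Azure",
    "abfss": "Azure",
    "wasbs": "Azure",
}


def storage_for(uri: str) -> CloudStorage:
    """Pick a backend from the URI scheme so callers never branch on the provider."""
    parsed = urlparse(uri)
    provider = SCHEME_TO_PROVIDER.get(parsed.scheme)
    if provider is None:
        raise ValueError(f"Unsupported storage scheme: {parsed.scheme!r}")
    return InMemoryStorage(provider, parsed.netloc)
```

With this shape, code that writes a NER log or reads a cached model would call `storage_for(path).upload(...)` or `.download(...)` and stay unchanged when a new provider is added to the mapping.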

Bug fix (non-breaking change which fixes an issue)
Code improvements with no or little impact
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

danilojsl added 8 commits August 8, 2023 14:36

SPARKNLP-732 Refactoring to unify Cloud access for cache_pretrained property

b4f92e1

SPARKNLP-732 Adding support to export NER log file for GCP Storage

66704a3

SPARKNLP-732 Refactoring loadSavedModel to import HF models from Cloud storages

3ef590f

SPARKNLP-732 Adding support for GCP when importing HF models

81fe075

SPARKNLP-732 Moving Credentials component to aws package

a5052e1

SPARKNLP-732 Fixing HDFS log and NER Graph load issues

e06e4e0

SPARKNLP-732 Adding Azure Dependencies

d49314c

SPARKNLP-732 Adding Azure support for all cloud operations

bab8924

danilojsl requested a review from maziyarpanahi August 8, 2023 19:49

danilojsl added the DON'T MERGE, enhancement, and new-feature labels Aug 8, 2023

SPARKNLP-732 Adding documentation for Azure, GCP and S3 support

0e6905a

maziyarpanahi approved these changes Aug 24, 2023

maziyarpanahi changed the base branch from master to release/510-release-candidate August 24, 2023 07:48

maziyarpanahi merged commit 09bae6c into release/510-release-candidate Aug 24, 2023
4 of 6 checks passed

maziyarpanahi mentioned this pull request Aug 24, 2023

release/510-release-candidate #13932

Merged