
Conversation

@ahmedlone127
Contributor

Added Push to Hub for Models and Pipelines

Description

Parameters

Name                          Type
name (required)               string
task (required)               string
sparkVersion (required)       string
sparknlpVersion (required)    string
language (required)           string
license                       string ["Open Source", "Licensed"]
tags                          array of strings
supported                     boolean
title (required)              string
dependencies                  string
description (required)        string
predictedEntities             string
howToUse                      string
liveDemo                      string
runInColab                    string
pythonCode (required)         string
scalaCode                     string
nluCode                       string
results                       string
dataSource                    string
includedModels                string
benchmarking                  string

Example Usage

from python.sparknlp.upload_to_hub import PushToHub

# GitToken is assumed to be defined beforehand (a GitHub access token)
sample_upload = {
    "name": "analyze_sentiment_ml",
    "task": "Sentiment Analysis",
    "title": "Analyze Sentiment Machine Learning",
    "sparkVersion": "3.0",
    "sparknlpVersion": "Spark NLP 4.0.0",
    "language": "en",
    "license": "Open Source",
    "description": """The analyze_sentiment_ml is a pretrained pipeline that we can use to process text with a simple pipeline that performs basic processing steps and predicts sentiment.
        It performs most of the common text processing tasks on your dataframe.""",
    "pythonCode": '''from sparknlp.pretrained import PretrainedPipeline
pipeline = PretrainedPipeline("analyze_sentiment_ml", "en")

result = pipeline.annotate("""I love johnsnowlabs!""")''',
    "model_zip_path": "pos_ud_bokmaal_nb_3.4.0_3.0_1641902661339.zip",
}

PushToHub.upload_to_modelshub_and_fill_form_API(sample_upload, GitToken)

@maziyarpanahi
Contributor

Thanks @ahmedlone127 for this, let's enrich this and add some restrictions:

  • Let's not expose the raw sample_upload; instead, ask the user only for the required/mandatory info (as a dict) and fill in the rest internally, like the Spark and Spark NLP versions and the license (it must be Open Source, since nobody except JSL members is allowed to upload licensed models/pipelines), etc.
  • Maybe we can zip/archive internally so the user just shares the path to a saved model; if it's already a .zip we skip this step, if not we create the .zip ourselves (see the sketch after this list).
  • Maybe we can also check whether the model/pipeline has metadata-00000 (the same test Models Hub does) to be sure it's an Apache Spark saved model and avoid getting an error in return.
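A minimal sketch of what that internal zipping step could look like (ensure_zipped is a hypothetical name, not part of the PR):

import shutil

def ensure_zipped(model_path: str) -> str:
    """Return a .zip for the given saved-model path, archiving the folder if needed."""
    if model_path.endswith(".zip"):
        return model_path  # already an archive, nothing to do
    # shutil.make_archive appends the .zip extension to the base name itself
    return shutil.make_archive(model_path, "zip", model_path)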

@maziyarpanahi maziyarpanahi changed the base branch from master to release/402-release-candidage July 18, 2022 12:30
@maziyarpanahi maziyarpanahi changed the base branch from release/402-release-candidage to master July 18, 2022 12:32
@ahmedlone127
Contributor Author

Hey @maziyarpanahi, for the first part, how about I add a function called create_docs that takes the required fields as parameters, fills some of them in internally, and has optional arguments such as benchmarking and scalaCode so the user can still add extra info if they want? At the end it calls the original function and uploads to the hub.

@maziyarpanahi
Contributor

  • That's a great idea. Let's have a create_docs to fill in everything required; the output would be a dictionary that can be fed into upload_to_hub (which let's rename to push_to_hub()).
  • However, if someone calls push_to_hub() without the full dictionary and with only a simple dict (name, lang, path to model), we should still allow it to be uploaded. (Those fields are required, but the rest can be done in the PR; see the sketch below.)
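A rough sketch of how create_docs could fill in the defaults (the signature and generated values here are illustrative assumptions, not the PR's final API):

def create_docs(name, language, model_zip_path, **optional):
    """Build the full metadata dictionary that push_to_hub() consumes."""
    docs = {
        "name": name,
        "language": language,
        "model_zip_path": model_zip_path,
        # everything the user didn't supply is filled in internally
        "license": "Open Source",  # only JSL members may upload licensed models
        "supported": False,
        "title": name.replace("_", " ").title(),  # generated default title
    }
    docs.update(optional)  # extras such as benchmarking or scalaCode
    return docs

push_to_hub() would then accept either this full dictionary or the minimal one and fill in the gaps itself.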

@ahmedlone127
Contributor Author

If someone calls push_to_hub directly with a simple dict like that, we can for sure add the other required fields and leave the ones we can't generate empty. But I think we should also make task and pythonCode required: we can generate the title and description, but without those two the model won't make much sense (or be usable).

@maziyarpanahi
Contributor

> If someone calls push_to_hub directly with a simple dict like that, we can for sure add the other required fields and leave the ones we can't generate empty. But I think we should also make task and pythonCode required: we can generate the title and description, but without those two the model won't make much sense (or be usable).

That makes sense; we can have the PyDoc show the minimum required fields and then make those mandatory.
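For example, the PyDoc could spell out the minimum required fields directly (a hypothetical docstring, not the merged one):

def push_to_hub(docs, GIT_TOKEN):
    """Upload a model or pipeline to Models Hub.

    Minimum required keys in `docs`:
        name (str): model/pipeline name, e.g. "analyze_sentiment_ml"
        language (str): language code, e.g. "en"
        model_zip_path (str): path to the saved model (folder or .zip)
        task (str): task name, e.g. "Sentiment Analysis"
        pythonCode (str): snippet showing how to load and use the model

    Remaining fields are generated or defaulted internally.
    """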

@ahmedlone127
Contributor Author

ahmedlone127 commented Jul 20, 2022

Hey, @maziyarpanahi, do we still support Spark version 2? I am asking because for sparknlpVersion we can simply import the library and call sparknlp.version(), but for sparkVersion we would have to start a Spark session, which we could avoid if we keep it at 3 by default. And we would keep the supported field set to False, right?

@maziyarpanahi
Contributor

Hi,

No; by default and until further notice, the Spark version is 3.0 for models/pipelines.
For the Spark NLP version, if it's empty we can take it from the current installation, or else users can specify something else, I guess.
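In code, that defaulting could look roughly like this (fill_versions is a hypothetical helper name):

import sparknlp

def fill_versions(docs: dict) -> dict:
    """Apply the version defaults discussed above."""
    # Spark version stays pinned to 3.0 for models/pipelines until further notice
    docs.setdefault("sparkVersion", "3.0")
    # Spark NLP version comes from the installed library unless the user set one
    docs.setdefault("sparknlpVersion", sparknlp.version())
    return docs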

@ahmedlone127
Contributor Author

That sounds good. I have also added the zip function we talked about to zip folders. I was thinking about what would be a good way to add the last part:
[screenshot of a saved model's metadata folder]
I think we should check for this file in the metadata folder of the given input, and if it exists we assume it's an Apache Spark model.
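Something along these lines, perhaps (looks_like_spark_model is a hypothetical name; it assumes Spark's usual layout of part files inside the metadata folder):

import os

def looks_like_spark_model(model_path: str) -> bool:
    """Cheap sanity check that a folder holds an Apache Spark saved model."""
    metadata_dir = os.path.join(model_path, "metadata")
    return os.path.isdir(metadata_dir) and any(
        name.startswith("part-") for name in os.listdir(metadata_dir)
    )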

@maziyarpanahi
Contributor

> That sounds good. I have also added the zip function we talked about to zip folders. I was thinking about what would be a good way to add the last part: [screenshot] I think we should check for this file in the metadata folder of the given input, and if it exists we assume it's an Apache Spark model.

I will loop in @pabla, who has more insight. @pabla, so we basically want to have a first simple check to avoid Models Hub returning an error when it comes to the format of the saved model in Spark/Spark NLP.

@ahmedlone127
Contributor Author

Hey @maziyarpanahi, I made the changes we discussed. Please review them and let me know if they look good :)

@maziyarpanahi
Contributor

> Hey @maziyarpanahi, I made the changes we discussed. Please review them and let me know if they look good :)

Thanks for this, I have pushed some changes. Can we have a small unit test for this? Obviously you can tag it as slow, but we can simply load a NerDLModel.pretrained(), save it, and use a sample code to upload it to see if it works. (You can leave GIT_TOKEN empty and we will add it manually when we do manual tests.)

@ahmedlone127
Contributor Author

Hey @maziyarpanahi, I made a test:

import unittest

import sparknlp
from sparknlp.annotator import *
from sparknlp.base import *
from sparknlp.common import *
from pyspark.ml import Pipeline
from upload_models_to_hub import PushToHub


class PushToHubTestSpec(unittest.TestCase):

    def test_load_and_upload(self):
        """Loads a pretrained Spark NLP model, saves it, and uploads it to Models Hub."""
        spark = sparknlp.start()
        ner_model = NerDLModel.pretrained("ner_aspect_based_sentiment") \
            .setInputCols(["document", "token", "embeddings"]) \
            .setOutputCol("ner")
        nlp_pipeline = Pipeline(stages=[ner_model])
        # fit on an empty DataFrame just to obtain a saveable PipelineModel
        model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
        model.write().overwrite().save("test")
        PushToHub.push_to_hub(
            "test_model_hub_upload",
            "en",
            "test",
            "Summarization",
            'restaurant_pipeline = PretrainedPipeline("nerdl_restaurant_100d_pipeline", lang="en")',
            GIT_TOKEN="",  # left empty on purpose; added manually for real runs
        )


if __name__ == "__main__":
    unittest.main()

@maziyarpanahi maziyarpanahi changed the base branch from master to release/410-release-candidate August 8, 2022 06:30
@maziyarpanahi maziyarpanahi merged commit 7c029b0 into release/410-release-candidate Aug 22, 2022
@KshitizGIT KshitizGIT deleted the Adding-Push-To-Hub branch March 2, 2023 09:53