Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARKNLP-835: ProtectedParam and ProtectedFeature #13797

Conversation

DevinTDHa
Copy link
Member

@DevinTDHa DevinTDHa commented May 11, 2023

Description

This PR enables Params and Features to be protected. Once set, they can't be changed again. This is useful for specific pretrained models, where some parameters should not be changed (as it could interfere with the functionality of the pretrained model).

Protected Features

Protected Features were already introduced in the PR #13777 but are finalized in this PR. An warning will be printed when trying to set a protected feature and it will not be set.

Protected Params

Protected Params are introduced with the trait HasProtectedParams:

https://github.com/DevinTDHa/spark-nlp/blob/0fdab1d0c6eb32125558b77f5d7e33adcbcd5346/src/main/scala/com/johnsnowlabs/nlp/HasProtectedParams.scala

It introduces an implicit conversion from a Param to a ProtectedParam. This can be done, by calling setProtected() on a regular Spark parameter. For Example:

  val maxSentenceLength =
    new IntParam(this, "maxSentenceLength", "Max sentence length to process").setProtected()

The conversion enables additional functionality while setting the value for the parameter. For the set function, it will check, whether or not the (now) ProtectedParam is protected. If so, it will only allow for the parameter to be set once:

def set[T](param: ProtectedParam[T], value: T): this.type = {
if (param.isProtected && get(param).isDefined)
println(
s"Warning: The parameter ${param.name} is protected and can only be set once." +
" For a pretrained model, this was done during the initialization process." +
" If you are trying to train your own model, please check the documentation.")
else
set(param.toParam, value)
this
}

Note: Default values do not count as set and can therefore be overridden

Comment:
@maziyarpanahi I would like to hear your opinion on this! Do you think this is ok to protect regular Spark parameters?
I wrote this trait, in hopes that we can reuse it often and don't have to change much of the existing code. An example of what kind of code changes this would require, can be seen in the class BertSentenceEmbeddings

Motivation and Context

Some parameters should not be changed after initialization. This PR addresses this missing feature.

How Has This Been Tested?

Tests have been added. The behavior can be seen here:

it should "set protected params only once" taggedAs FastTest in {
model.setProtectedParam("first")
assert(model.getProtectedParam == "first")
val output = captureOutput {
model.setProtectedParam("second")
}
assert(output.contains("is protected and can only be set once"))
assert(model.getProtectedParam == "first")
}

TODO:

  • Some tests are failing, ProtectedParams will need to override some additional functions.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@DevinTDHa DevinTDHa added enhancement new-feature Introducing a new feature DON'T MERGE Do not merge this PR labels May 11, 2023
@DevinTDHa DevinTDHa self-assigned this May 11, 2023
@DevinTDHa DevinTDHa changed the title SPARKNLP-835: ProtectedParam and ProtectedFeature [WIP] SPARKNLP-835: ProtectedParam and ProtectedFeature May 11, 2023
@coveralls
Copy link

coveralls commented May 16, 2023

Pull Request Test Coverage Report for Build 4992350528

  • 17 of 24 (70.83%) changed or added relevant lines in 6 files are covered.
  • 52 unchanged lines in 34 files lost coverage.
  • Overall coverage decreased (-0.01%) to 65.831%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/main/scala/com/johnsnowlabs/nlp/annotators/audio/Wav2Vec2ForCTC.scala 0 1 0.0%
src/main/scala/com/johnsnowlabs/nlp/annotators/coref/SpanBertCorefModel.scala 0 1 0.0%
src/main/scala/com/johnsnowlabs/nlp/annotators/cv/ViTForImageClassification.scala 0 1 0.0%
src/main/scala/com/johnsnowlabs/nlp/annotators/ld/dl/LanguageDetectorDL.scala 2 4 50.0%
src/main/scala/com/johnsnowlabs/nlp/serialization/Feature.scala 6 8 75.0%
Files with Coverage Reduction New Missed Lines %
src/main/scala/com/johnsnowlabs/nlp/annotators/DateMatcher.scala 1 97.2%
src/main/scala/com/johnsnowlabs/nlp/annotators/DateMatcherUtils.scala 1 87.5%
src/main/scala/com/johnsnowlabs/nlp/annotators/er/EntityRulerModel.scala 1 90.4%
src/main/scala/com/johnsnowlabs/nlp/annotators/GraphExtraction.scala 1 78.1%
src/main/scala/com/johnsnowlabs/nlp/annotators/ld/dl/LanguageDetectorDL.scala 1 66.27%
src/main/scala/com/johnsnowlabs/nlp/annotators/ner/crf/NerCrfModel.scala 1 69.23%
src/main/scala/com/johnsnowlabs/nlp/annotators/ner/dl/NerDLApproach.scala 1 81.03%
src/main/scala/com/johnsnowlabs/nlp/annotators/ner/dl/NerDLModel.scala 1 56.96%
src/main/scala/com/johnsnowlabs/nlp/annotators/NormalizerModel.scala 1 71.43%
src/main/scala/com/johnsnowlabs/nlp/annotators/parser/dep/GreedyTransition/DependencyMaker.scala 1 80.45%
Totals Coverage Status
Change from base Build 4945586143: -0.01%
Covered Lines: 8643
Relevant Lines: 13129

💛 - Coveralls

@DevinTDHa DevinTDHa changed the title [WIP] SPARKNLP-835: ProtectedParam and ProtectedFeature SPARKNLP-835: ProtectedParam and ProtectedFeature May 16, 2023
@DevinTDHa DevinTDHa marked this pull request as ready for review May 16, 2023 14:11
@maziyarpanahi maziyarpanahi changed the base branch from master to release/443-release-candidate May 24, 2023 11:52
@maziyarpanahi maziyarpanahi merged commit e586d56 into JohnSnowLabs:release/443-release-candidate May 25, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DON'T MERGE Do not merge this PR enhancement new-feature Introducing a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants