Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrubbing learners (Meta issue) #2613

Closed
Ivanidzo4ka opened this issue Feb 19, 2019 · 3 comments
Closed

Scrubbing learners (Meta issue) #2613

Ivanidzo4ka opened this issue Feb 19, 2019 · 3 comments
Labels
API Issues pertaining the friendly API
Milestone

Comments

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 19, 2019

We need to clean our API for learners.
This includes following things:

  1. No protected fields/members/method in public classes. Only private protected.

  2. Learner class is sealed.

  3. The trainer name types should follow the names used in the contexts (see The trainer name types should follow the names used in the contexts #2172)

  4. ModelPameters for this Learner is also clean. (sealed, public documentation, no public constructor)

  5. Options cleaning:

  • They should be named Options, and all their base classes (except LearnerInputBase*)
  • Option should have meaning and proper way it initialize it self. We shouldn't accept int if in reality we use enum (FastTree EarlyStoppingMetric is an int but only accepts specific values #2521) or accept array of ints as a string separated by comma public string CustomGains = "0,3,7,15,31";
  • No short names.
  • Standard names like:
    /// <param name="labelColumnName">The name of the label column.</param>
    /// <param name="featureColumnName">The name of the feature column.</param>
    NumberOfIterations, NumberOfThreads, LearningRate,L2Regularization, L1Regularization
    no (MaxIterations,NumIterations, NumThreads, L2Weight, L1Weight)
  1. If we communicate with user (exceptions, channel messages), don't use DataKind values. No R4, U4, and so on, it should be float, uint, etc.

ignore for now:

  1. Everything public has proper documentation.

  2. We have samples of how use base call to trainer (no options) and call with options.

  3. We have baseline tests for learner.

https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/general-naming-conventions#using-abbreviations-and-acronyms

`

@rogancarr
Copy link
Contributor

  1. There should be consistency within Tasks (e.g. BinaryClassification, Regression) on how labels are specified (e.g. R4, bool, KeyType).

For instance, many learners accept R4 values and helpfully convert them to bool or KeyType, while some learners only accept KeyType. (see #2628 for a multiclass example). This inconsistency is confusing from a user perspective because you can't hot-swap Trainers in a pipeline.

@wschin
Copy link
Member

wschin commented Mar 1, 2019

We also have rowGroupColumnName to replace groupIdColumn and exampleWeightColumnName for weightColumn.

@rogancarr
Copy link
Contributor

The classnames themselves come in many different styles and they are surfaced through options:

For example, SDCA and FastTree have very different class names for binary classification:

var sdca = mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(
    new SdcaBinaryTrainer.Options { NumberOfThreads = 1 });

var fastTree = mlContext.BinaryClassification.Trainers.FastTree(
    new FastTreeBinaryClassificationTrainer.Options { NumberOfThreads = 1 });

These class names should have a standard grammar.

@shauheen shauheen added this to the 0319 milestone Mar 15, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

No branches or pull requests

4 participants