Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash Transform API that takes in advanced options. #4443

Open
wants to merge 3 commits into
base: master
from

Conversation

@codemzs
Copy link
Member

codemzs commented Nov 5, 2019

fixes #4422

Needed by a 1P customer.

@codemzs codemzs requested a review from eerhardt Nov 5, 2019
@codemzs codemzs requested a review from dotnet/mlnet-core as a code owner Nov 5, 2019
@codecov

This comment has been minimized.

Copy link

codecov bot commented Nov 5, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@7b2478c). Click here to learn what that means.
The diff coverage is 91.66%.

@@            Coverage Diff            @@
##             master    #4443   +/-   ##
=========================================
  Coverage          ?   74.71%           
=========================================
  Files             ?      906           
  Lines             ?   159289           
  Branches          ?    17145           
=========================================
  Hits              ?   119011           
  Misses            ?    35476           
  Partials          ?     4802
Flag Coverage Δ
#Debug 74.71% <91.66%> (?)
#production 70.07% <82.35%> (?)
#test 90.11% <100%> (?)
Impacted Files Coverage Δ
test/Microsoft.ML.Tests/Transformers/HashTests.cs 100% <100%> (ø)
...icrosoft.ML.TestFramework/DataPipe/TestDataPipe.cs 91.27% <100%> (ø)
src/Microsoft.ML.Transforms/OneHotHashEncoding.cs 84.92% <100%> (ø)
...ML.Data/Transforms/ConversionsExtensionsCatalog.cs 70.87% <100%> (ø)
...soft.ML.Data/DataLoadSave/DataOperationsCatalog.cs 78.45% <100%> (ø)
...L.Transforms/Text/WordHashBagProducingTransform.cs 61.08% <50%> (ø)
src/Microsoft.ML.Data/Transforms/Hashing.cs 92.34% <81.81%> (ø)
/// [!code-csharp[Hash](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Conversion/HashWithOptions.cs)]
/// ]]></format>
/// </example>
public static HashingEstimator Hash(this TransformsCatalog.ConversionTransforms catalog, params HashingEstimator.Options[] options)

This comment has been minimized.

Copy link
@yaeldekel

yaeldekel Nov 6, 2019

Member

options [](start = 131, length = 7)

Wouldn't it be simpler to add the missing options Seed and UseOrderedHashing as parameters to the API above?
From looking over the transforms, I didn't see any that expose a ColumnOptions/Options class (I may have missed some though). For estimators that are trainable sometimes we expose an API for defining multiple columns using an array of InputOutputColumnPair, but the other options are usually common for all the columns.
Was there a specific ask for defining multiple columns with different options in a single estimator? (the hashing estimator is trainable in case of maximumNumberOfInverts > 0, so there is an advantage to exposing an API for multiple columns, I'm just not sure whether we need to support defining different options for each column).

This comment has been minimized.

Copy link
@codemzs

codemzs Nov 6, 2019

Author Member

We follow the pattern as for each component there is a convenience API and an advanced API that takes in all the options.

This comment has been minimized.

Copy link
@codemzs

codemzs Nov 6, 2019

Author Member

@eerhardt what are your thoughts?

This comment has been minimized.

Copy link
@eerhardt

eerhardt Nov 6, 2019

Member

See the discussion on #1798.

This comment has been minimized.

Copy link
@eerhardt

eerhardt Nov 6, 2019

Member

Also see the discussion on #2884. Renaming ColumnOptions to Options and leaving it as-is may not be the best choice here. See Artidoro, Senja, and Tom's comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.