Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Learning with counts (Dracula) transformer #4514

Open
wants to merge 4 commits into
base: master
from

Conversation

@yaeldekel
Copy link
Member

yaeldekel commented Dec 2, 2019

Fixes #4016.

@yaeldekel yaeldekel requested a review from dotnet/mlnet-core as a code owner Dec 2, 2019
@@ -188,9 +187,9 @@ private static VersionInfo GetVersionInfo()
IDataView input,
string name,
string source = null,
bool join = Defaults.Join,
bool join = Defaults.Combine,

This comment has been minimized.

Copy link
@eerhardt

eerhardt Dec 2, 2019

Member

bool join => bool combine. And update the xml docs #Resolved

}
}

public static class Dracula

This comment has been minimized.

Copy link
@eerhardt

eerhardt Dec 2, 2019

Member

No need for this to be public, it only has an internal method. #Resolved

}
}

public static class CountTable

This comment has been minimized.

Copy link
@eerhardt

eerhardt Dec 2, 2019

Member

should be internal #Resolved

/// <summary>
/// Signature for CountTableBuilder.
/// </summary>
public delegate void SignatureCountTableBuilder();

This comment has been minimized.

Copy link
@eerhardt

eerhardt Dec 2, 2019

Member

Does this need to be public? #Resolved

@codemzs

This comment has been minimized.

Copy link
Member

codemzs commented Dec 2, 2019

@yaeldekel Shall we give it a better name as opposed to calling it "Dracula"? Seems a little informal to me.

private static VersionInfo GetVersionInfo()
{
return new VersionInfo(
modelSignature: "CM CT",

This comment has been minimized.

Copy link
@codemzs

codemzs Dec 2, 2019

Member

"CM CT", [](start = 32, length = 11)

what kind of model signature is this? Why is there a tab in between? #Resolved


namespace Microsoft.ML.Transforms
{
public class DraculaEstimator : IEstimator<DraculaTransformer>

This comment has been minimized.

Copy link
@justinormont

justinormont Dec 2, 2019

Member

Moving @codemzs' comment to a thread:

@yaeldekel Shall we give it a better name as opposed to calling it "Dracula"? Seems a little informal to me.

This comment has been minimized.

Copy link
@justinormont

justinormont Dec 2, 2019

Member

If Dracula is not used, CountTargetEncoder might be suitable.

Our old blog post lists DRACULA as "Distributed Robust Algorithm for Count-based Learning": https://blogs.technet.microsoft.com/machinelearning/2015/02/17/big-learning-made-easy-with-counts/

As an aside, there's also public slides:
https://www.slideshare.net/SessionsEvents/misha-bilenko-principal-researcher-microsoft

This comment has been minimized.

Copy link
@yaeldekel

yaeldekel Dec 3, 2019

Author Member

Good suggestion. Changing the name to CountTargetEncoder, unless someone has a different idea?

@codecov

This comment has been minimized.

Copy link

codecov bot commented Dec 3, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@e63fa8f). Click here to learn what that means.
The diff coverage is 93.77%.

@@            Coverage Diff            @@
##             master    #4514   +/-   ##
=========================================
  Coverage          ?   75.32%           
=========================================
  Files             ?      917           
  Lines             ?   161927           
  Branches          ?    17453           
=========================================
  Hits              ?   121965           
  Misses            ?    35102           
  Partials          ?     4860
Flag Coverage Δ
#Debug 75.32% <93.77%> (?)
#production 70.81% <93.11%> (?)
#test 90.31% <100%> (?)
Impacted Files Coverage Δ
...st/Microsoft.ML.Tests/Transformers/DraculaTests.cs 100% <100%> (ø)
...crosoft.ML.Core.Tests/UnitTests/TestEntryPoints.cs 98.37% <100%> (ø)
...icrosoft.ML.TestFramework/DataPipe/TestDataPipe.cs 91.62% <100%> (ø)
src/Microsoft.ML.EntryPoints/TrainTestSplit.cs 93.02% <100%> (ø)
...rc/Microsoft.ML.Transforms/HashJoiningTransform.cs 88.39% <50%> (ø)
.../Microsoft.ML.Transforms/Dracula/DictCountTable.cs 87.28% <87.28%> (ø)
...Microsoft.ML.Transforms/Dracula/MultiCountTable.cs 90.51% <90.51%> (ø)
...oft.ML.Transforms/Dracula/CountTableTransformer.cs 92.5% <92.5%> (ø)
...crosoft.ML.Transforms/Dracula/CountTableBuilder.cs 93.75% <93.75%> (ø)
...rc/Microsoft.ML.Transforms/Dracula/CMCountTable.cs 95.45% <95.45%> (ø)
... and 3 more
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.