Metadata fixes for the ValueMappingEstimator #2098

Merged
merged 6 commits into from Jan 15, 2019

3 participants
@singlis
Member

singlis commented Jan 9, 2019

The ValueMappingEstimator had two issues when the Values are used as KeyTypes:

  1. The output schema for the Estimator did not contain the KeyType
    information in the metadata.
  2. The metadata used for the reverse lookup contained incorrect values.

This change sets the correct metadata on the output schema and uses the
value data for the reverse lookup. A test was added that confirms the
fix by appending a KeyToValueMappingEstimator to a ValueMappingEstimator
to perform the reverse lookup.

Fixes #2086
Fixes #2083
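To make the two issues concrete, here is a minimal C# sketch of what "values as key types" means for a value mapping: each distinct value receives a key index starting from 1 (0 meaning missing), and the distinct values themselves form the reverse-lookup table that a KeyToValue step relies on. This is an illustration under stated assumptions, not the ML.NET implementation: the class and method names are hypothetical, strings stand in for ReadOnlyMemory<char>, and assigning indices in first-occurrence order is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sketch of the behavior described above; NOT the ML.NET
// ValueMappingEstimator implementation. First-occurrence index order is an assumption.
public static class ValueToKeySketch
{
    // Maps each key to the 1-based key index of its value (0 is reserved for "missing"),
    // and returns the distinct values so that reverseLookup[index - 1] recovers the value.
    public static (Dictionary<string, uint> KeyToIndex, List<string> ReverseLookup) Build(
        IReadOnlyList<string> keys, IReadOnlyList<string> values)
    {
        if (keys.Count != values.Count)
            throw new ArgumentException("keys and values must have the same length");

        var valueToIndex = new Dictionary<string, uint>();
        var reverseLookup = new List<string>();
        var keyToIndex = new Dictionary<string, uint>();

        for (int i = 0; i < keys.Count; i++)
        {
            string value = values[i];
            // A duplicated value reuses the index assigned when it was first seen.
            if (!valueToIndex.TryGetValue(value, out uint index))
            {
                reverseLookup.Add(value);
                index = (uint)reverseLookup.Count;
                valueToIndex.Add(value, index);
            }
            keyToIndex[keys[i]] = index;
        }
        return (keyToIndex, reverseLookup);
    }

    // Reverse lookup from a key index back to the value text; null for the missing key 0.
    public static string? Lookup(IReadOnlyList<string> reverseLookup, uint index) =>
        index == 0 ? null : reverseLookup[(int)index - 1];
}
```

With the keys and values used in this PR's test (foo, bar, test, wahoo mapped to foo1, foo2, foo1, foo3), "foo" and "test" share index 1 because foo1 is duplicated, and looking up the index assigned to "bar" gives back "foo2".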

@singlis singlis requested review from Ivanidzo4ka , wschin and sfilipi Jan 9, 2019

Scott Inglis
- Fixing runs for release builds. Code that needed to run was wrapped in a Host.Assert and would not execute in release builds. This is changed to a Host.Check.
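The commit above describes a classic pitfall: when the work itself sits inside a debug-only assertion, a release build compiles the whole call away. A minimal C# sketch, assuming (as the commit message implies) that the assert helper is debug-only while the check helper always runs; DebugOnlyAssert, Check and the AddUnique* methods are hypothetical names, not ML.NET's Host API. The debug-only behavior comes from C#'s [Conditional("DEBUG")] attribute, which removes calls, including evaluation of their arguments, when DEBUG is not defined.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helpers illustrating the Assert/Check distinction; not ML.NET's Host API.
public static class AssertVsCheck
{
    // When DEBUG is not defined at the call site (typical Release builds), the compiler
    // removes every call to this method, including evaluation of its argument.
    [Conditional("DEBUG")]
    public static void DebugOnlyAssert(bool condition)
    {
        if (!condition)
            throw new InvalidOperationException("Assertion failed.");
    }

    // Always compiled in, so its argument is always evaluated.
    public static void Check(bool condition)
    {
        if (!condition)
            throw new InvalidOperationException("Check failed.");
    }

    // Buggy pattern: in a Release build, set.Add never runs and the set stays empty.
    public static void AddUniqueDebugOnly(HashSet<string> set, string item)
    {
        DebugOnlyAssert(set.Add(item));
    }

    // Fixed pattern: the work happens in every build configuration.
    public static void AddUniqueChecked(HashSet<string> set, string item)
    {
        Check(set.Add(item));
    }
}
```

Another way to avoid the trap is to perform the work as its own statement (for example `bool added = set.Add(item);`) and pass only the result to the assertion.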
@@ -191,18 +192,15 @@ private static ValueGetter<VBuffer<ReadOnlyMemory<char>>> GetKeyValueGetter<TKey
// set of values. This is used for generating the metadata of
// the column.
HashSet<TValue> valueSet = new HashSet<TValue>();
HashSet<TKey> keySet = new HashSet<TKey>();
for (int i = 0; i < values.Count(); ++i)
{
var v = values.ElementAt(i);


@Ivanidzo4ka

Ivanidzo4ka Jan 11, 2019

Member

Can you replace it with a foreach loop?
I find the idea of fetching the i-th element out of an IEnumerable on every iteration terrifying. #Closed


@singlis

singlis Jan 11, 2019

Member

Absolutely.
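The reviewer's concern can be shown with a small C# sketch. Enumerable.Count() and Enumerable.ElementAt(i) can take shortcuts for sources that implement ICollection<T> or IList<T>, but for a lazily evaluated IEnumerable<T> each call starts a fresh enumeration, so an index-based loop re-walks the sequence on every iteration while foreach walks it once. The generator and counter below are hypothetical instrumentation, added only to make the difference observable.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical instrumentation for illustration only.
public static class EnumerationCost
{
    // Counts how many elements the generator has produced across all enumerations.
    public static int ElementsProduced;

    // A lazily evaluated sequence; it is not an IList<T>, so LINQ cannot index into it.
    public static IEnumerable<int> Generate(int n)
    {
        for (int i = 0; i < n; i++)
        {
            ElementsProduced++;
            yield return i;
        }
    }

    // Pattern flagged in the review: Count() and ElementAt(i) each restart the enumeration.
    public static HashSet<int> CollectIndexed(IEnumerable<int> values)
    {
        var set = new HashSet<int>();
        for (int i = 0; i < values.Count(); ++i)
        {
            set.Add(values.ElementAt(i));
        }
        return set;
    }

    // Suggested replacement: a single pass over the sequence.
    public static HashSet<int> CollectOnce(IEnumerable<int> values)
    {
        var set = new HashSet<int>();
        foreach (var v in values)
        {
            set.Add(v);
        }
        return set;
    }
}
```

For 100 generated elements, the foreach version produces each element exactly once; the indexed version produces far more, because both the loop condition and ElementAt restart the enumeration.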




var result = t.Transform(dataView);
var cursor = result.GetRowCursor((col) => true);
var getterD = cursor.GetGetter<ReadOnlyMemory<char>>(6);


@Ivanidzo4ka

Ivanidzo4ka Jan 11, 2019

Member

On `6`:

Magic number; can you use result.Schema["DOutput"].Index instead? #Closed
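Why a hard-coded column index is fragile can be sketched with a hypothetical, simplified schema (a plain ordered list of names, not the ML.NET Schema type). Assuming a layout in which the output column follows the input and mapped columns, its position shifts when columns are added or removed, as happened later in this review when two of the mapped columns were dropped, whereas a lookup by name keeps working.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified schema: an ordered list of column names (not the ML.NET Schema type).
public sealed class SimpleSchema
{
    private readonly List<string> _names;

    public SimpleSchema(IEnumerable<string> names)
    {
        _names = new List<string>(names);
    }

    // Resolves a column's position from its name instead of relying on a hard-coded index.
    public int IndexOf(string name)
    {
        int index = _names.IndexOf(name);
        if (index < 0)
            throw new ArgumentException($"Column '{name}' not found.", nameof(name));
        return index;
    }
}
```

In this hypothetical layout, "DOutput" sits at index 6 with columns A through F present and at index 4 once E and F are removed; code that asks for it by name is unaffected.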

var data = new[] { new TestClass() { A = "bar", B = "test", C = "notfound" } };
var dataView = ComponentCreation.CreateDataView(Env, data);

IEnumerable<ReadOnlyMemory<char>> keys = new List<ReadOnlyMemory<char>>() { "foo".AsMemory(), "bar".AsMemory(), "test".AsMemory(), "wahoo".AsMemory() };


@Ivanidzo4ka

Ivanidzo4ka Jan 11, 2019

Member

On `IEnumerable<ReadOnlyMemory<char>>`:

What's wrong with var? #Closed


// The expected values will contain the generated key type values starting from 1.
ReadOnlyMemory<char> dValue = default;
getterD(ref dValue);


@Ivanidzo4ka

Ivanidzo4ka Jan 11, 2019

Member

On `dValue`:

Can we validate that it is "bar"? #Closed
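Validating the value needs a content comparison: ReadOnlyMemory<char>.Equals compares the underlying buffer, start offset and length, not the characters, so two memories over different strings with the same text are not equal. A minimal sketch of a helper (the name TextEquals is hypothetical) that compares the characters through spans:

```csharp
using System;

public static class MemoryText
{
    // Compares the characters viewed by a ReadOnlyMemory<char> with an expected string.
    public static bool TextEquals(ReadOnlyMemory<char> actual, string expected)
    {
        return actual.Span.SequenceEqual(expected.AsSpan());
    }
}
```

Calling dValue.ToString() and comparing strings also works, at the cost of an allocation.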

@Ivanidzo4ka
Member

Ivanidzo4ka left a comment

:shipit:

@@ -100,13 +100,14 @@ public override SchemaShape GetOutputSchema(SchemaShape inputSchema)
var isKey = Transformer.ValueColumnType.IsKey;
var columnType = (isKey) ? PrimitiveType.FromKind(DataKind.U4) :
Transformer.ValueColumnType;
var metadata = SchemaShape.Create(Transformer.ValueColumnMetadata.Schema);


@wschin

wschin Jan 11, 2019

Member
Suggested change
- var metadata = SchemaShape.Create(Transformer.ValueColumnMetadata.Schema);
+ var metadataShape = SchemaShape.Create(Transformer.ValueColumnMetadata.Schema);

Maybe? #Resolved


@singlis

singlis Jan 14, 2019

Member

Sure, I will update.



foreach (var (Input, Output) in _columns)
{
if (!inputSchema.TryFindColumn(Input, out var originalColumn))
throw Host.ExceptSchemaMismatch(nameof(inputSchema), "input", Input);

// Get the type from TOutputType


@wschin

wschin Jan 11, 2019

Member

What is TOutputType here? Is it not Output? #Resolved


@singlis

singlis Jan 14, 2019

Member

That comment is outdated. The generic arguments used to be TInputType and TOutputType, but I changed them to TKeyType and TValueType. I will update the comment.



- var col = new SchemaShape.Column(Output, vectorKind, columnType, isKey, originalColumn.Metadata);
+ var col = new SchemaShape.Column(Output, vectorKind, columnType, isKey, metadata);


@wschin

wschin Jan 11, 2019

Member
Suggested change
- var col = new SchemaShape.Column(Output, vectorKind, columnType, isKey, metadata);
+ var col = new SchemaShape.Column(OutputColumnName, vectorKind, columnType, isKey, metadata);
#WontFix


@singlis

singlis Jan 14, 2019

Member

Although "Name" more precisely describes what the object is, I think it works against thinking of these as columns, since we are talking about an input/output (or source/name) pair. I would prefer to keep Output/Input. BTW, this will also go through another revision with #2064.
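The change under discussion describes each output column using the value side of the mapping (Transformer.ValueColumnMetadata) instead of copying originalColumn.Metadata from the input column. A simplified sketch of that rule, using hypothetical ColumnInfo and OutputSchemaSketch types rather than ML.NET's SchemaShape, with item types written as plain strings; the "KeyValues" metadata kind in the usage is an assumption for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a column description; not ML.NET's SchemaShape.Column.
public sealed class ColumnInfo
{
    public ColumnInfo(string name, string itemType, bool isKey, IReadOnlyList<string> metadataKinds)
    {
        Name = name;
        ItemType = itemType;
        IsKey = isKey;
        MetadataKinds = metadataKinds;
    }

    public string Name { get; }
    public string ItemType { get; }
    public bool IsKey { get; }
    public IReadOnlyList<string> MetadataKinds { get; }
}

public static class OutputSchemaSketch
{
    // Describes one output column of a value mapping. The output's type and metadata
    // come from the mapping's value side, not from the input column being mapped.
    public static ColumnInfo DescribeOutput(string outputName, string valueItemType,
                                            bool valuesAreKeys, IReadOnlyList<string> valueMetadataKinds)
    {
        // Key-typed outputs are stored as U4 indices, mirroring the DataKind.U4 choice in the diff.
        string itemType = valuesAreKeys ? "U4" : valueItemType;
        return new ColumnInfo(outputName, itemType, valuesAreKeys, valueMetadataKinds);
    }
}
```

With valuesAreKeys set, the output is described as a U4 key column carrying the value-side metadata, regardless of what metadata the input column had.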



var values = new List<ReadOnlyMemory<char>>() { "foo1".AsMemory(), "foo2".AsMemory(), "foo1".AsMemory(), "foo3".AsMemory() };

var estimator = new ValueMappingEstimator<ReadOnlyMemory<char>, ReadOnlyMemory<char>>(Env, keys, values, true, new[] { ("A", "D"), ("B", "E"), ("C", "F") })
.Append(new KeyToValueMappingEstimator(Env, ("D","DOutput")));


@wschin

wschin Jan 11, 2019

Member
Suggested change
- .Append(new KeyToValueMappingEstimator(Env, ("D","DOutput")));
+ .Append(new KeyToValueMappingEstimator(Env, ("D", "DOutput")));

Please format all files touched. #Resolved


@singlis

singlis Jan 14, 2019

Member

Thanks, they are both formatted now.



// Generate the list of strings for the key-type values. foo1 is intentionally duplicated to test that the same index value is returned for it.
var values = new List<ReadOnlyMemory<char>>() { "foo1".AsMemory(), "foo2".AsMemory(), "foo1".AsMemory(), "foo3".AsMemory() };

var estimator = new ValueMappingEstimator<ReadOnlyMemory<char>, ReadOnlyMemory<char>>(Env, keys, values, true, new[] { ("A", "D"), ("B", "E"), ("C", "F") })


@wschin

wschin Jan 11, 2019

Member

Why do you need three columns? Only "D" is used below. #Resolved


@singlis

singlis Jan 14, 2019

Member

True - that is a result of copying code from another test. I have removed the other two columns.



@singlis singlis self-assigned this Jan 14, 2019

Scott Inglis added some commits Jan 14, 2019

@wschin

wschin approved these changes Jan 15, 2019

@singlis singlis merged commit ef638f4 into dotnet:master Jan 15, 2019

2 checks passed

MachineLearning-CI #20190114.30 succeeded
license/cla All CLA requirements met.