New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds string support for ValueMappingEstimator #2115

Open
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@singlis
Copy link
Member

singlis commented Jan 10, 2019

This adds support for strings to be used for keys and values for the ValueMappingEstimator (as opposed to ReadOnlyMemory).

This also converts all tests to use string types as well as adds an
additional test for string vector values.

Fixes #2114

Scott Inglis
This adds support for strings to be used for keys and values for the
ValueMappingEstimator (as opposed to ReadOnlyMemory<char>).
This also converts all tests to use string types as well as adds an
additional test for string vector values.

Fixes #2114

@singlis singlis requested review from codemzs , Ivanidzo4ka , wschin , yaeldekel and sfilipi Jan 10, 2019

dataViewBuilder.AddColumn(valueColumnName, valueType, values.ToArray());
AddColumnWrapper(dataViewBuilder, keyColumnName, keyType, keys.ToArray());
AddColumnWrapper(dataViewBuilder, valueColumnName, valueType, values.ToArray());
//dataViewBuilder.AddColumn(valueColumnName, valueType, values.ToArray());

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Jan 11, 2019

Member

//dataViewBuilder.AddColumn(valueColumnName, valueType, values.ToArray()); [](start = 12, length = 74)

can be removed. #Resolved

@@ -165,8 +182,9 @@ private static ValueGetter<VBuffer<ReadOnlyMemory<char>>> GetKeyValueGetter<TKey
var keyType = GetPrimitiveType(typeof(TKey), out bool isKeyVectorType);
var valueType = GetPrimitiveType(typeof(TValue), out bool isValueVectorType);
var dataViewBuilder = new ArrayDataViewBuilder(env);
dataViewBuilder.AddColumn(keyColumnName, keyType, keys.ToArray());

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Jan 11, 2019

Member

keyType [](start = 53, length = 7)

in old code if you pass TKey as string, keyType for it would be ReadOnlyMemory, why you need what wrapping around? #Pending

This comment has been minimized.

@singlis

singlis Jan 11, 2019

Member

Unfortunately the old code did not work if TKey was of string type.

So here is what happens:
ArrayDataViewBuilder.AddColumn will fail if T is of type string. The reason for this is for two reasons:

  1. The PrimtiveType.RawType is a ReadonlyMemory which does not match type of string
  2. The string values are not converted to ReadOnlyMemory.

ArrayDataViewBuilder.AddColumn does have an implementation that takes in a string, but that is not called when you are calling it from a function that is using a generic argument. So I added this wrapping that will check if the type is string, if so, convert to ReadOnlyMemoryChar and pass that to the builder.


In reply to: 246977957 [](ancestors = 246977957)

internal static void AddColumnWrapper<T>(ArrayDataViewBuilder builder, string columnName, PrimitiveType primitiveType, T[] values)
{
if (typeof(T) == typeof(string))
builder.AddColumn(columnName, values.Select(x=>x.ToString()).ToArray());

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Jan 11, 2019

Member

values.Select(x=>x.ToString() [](start = 46, length = 29)

T[] values
typeof(T) == typeof(string)

why you need to call ToString() on string?

This comment has been minimized.

@singlis

singlis Jan 11, 2019

Member

Because I need to convert to a memory stream with AsMemory, which needs to operate on a string type. .ToString ensures that whatever type T is, it will be done on a string.


In reply to: 246978062 [](ancestors = 246978062)

This comment has been minimized.

@sfilipi

sfilipi Jan 12, 2019

Member

but this condition will get hit only if typeof(T) == typeof(string); so you know for sure values is a string..


In reply to: 247232387 [](ancestors = 247232387,246978062)

This comment has been minimized.

@singlis

singlis Jan 14, 2019

Member

the compiler doesnt know that since T can be of any type. I can cast to string -- would that be more preferable?


In reply to: 247304740 [](ancestors = 247304740,247232387,246978062)

@@ -153,6 +153,23 @@ private static ValueGetter<VBuffer<ReadOnlyMemory<char>>> GetKeyValueGetter<TKey
};
}

internal static void AddColumnWrapper<T>(ArrayDataViewBuilder builder, string columnName, PrimitiveType primitiveType, T[] values)
{
if (typeof(T) == typeof(string))

This comment has been minimized.

@wschin

wschin Jan 11, 2019

Member

typeof(string [](start = 29, length = 13)

Do we want to support ReadOnlyMemory<char>? It's used everywhere in ML.NET so having a direct support could be nice. #Resolved

This comment has been minimized.

@singlis

singlis Jan 11, 2019

Member

Yes - ValueMappingEstimator will continue to work with ReadOnlyMemory with these changes.


In reply to: 247216613 [](ancestors = 247216613)

{
if (typeof(T) == typeof(string))

builder.AddColumn(columnName, primitiveType, values.Select(x => x.Select( y => y.ToString().AsMemory() ).ToArray()).ToArray());

This comment has been minimized.

@wschin

wschin Jan 11, 2019

Member

x => x.Select( y => y.ToString().AsMemory() ).ToArray()) [](start = 75, length = 56)

Please try avoiding recursive structure in one line. #Resolved

This comment has been minimized.

@singlis

singlis Jan 11, 2019

Member

Its now expanded into a few more lines...


In reply to: 247217068 [](ancestors = 247217068)

@@ -153,6 +153,23 @@ private static ValueGetter<VBuffer<ReadOnlyMemory<char>>> GetKeyValueGetter<TKey
};
}

internal static void AddColumnWrapper<T>(ArrayDataViewBuilder builder, string columnName, PrimitiveType primitiveType, T[] values)
{

This comment has been minimized.

@wschin

wschin Jan 11, 2019

Member

Doc string please. #Resolved

}

internal static void AddColumnWrapper<T>(ArrayDataViewBuilder builder, string columnName, PrimitiveType primitiveType, T[][] values)
{

This comment has been minimized.

@wschin

wschin Jan 11, 2019

Member

Doc string! #Resolved

Scott Inglis
@wschin

wschin approved these changes Jan 11, 2019

Copy link
Member

wschin left a comment

LGTM.

@singlis singlis self-assigned this Jan 14, 2019

@singlis

This comment has been minimized.

Copy link
Member

singlis commented Jan 18, 2019

@Ivanidzo4ka and @sfilipi the code has been updated and there are a few open issues that I have responded to. Can you take a look when you have a chance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment