New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Say good-bye to transpose ISchema #2111

Merged
merged 9 commits into from Jan 18, 2019

Conversation

Projects
None yet
5 participants
@wschin
Copy link
Member

wschin commented Jan 10, 2019

We remove ITransposeSchema and move GetSlotType to TransposeSlotTypes field in ITransposeDataView. Several derived classes' ISchema members are removed because they are not used at all.

Party of #1501 continues.

@wschin wschin self-assigned this Jan 10, 2019

@wschin wschin requested review from eerhardt and TomFinley Jan 10, 2019

@wschin wschin force-pushed the wschin:no-transpose-ischema branch from 85aa0a3 to 88b0ec0 Jan 10, 2019

@@ -44,15 +43,15 @@ internal interface ITransposeDataView : IDataView
/// The transpose schema returns the schema information of the view we have transposed.
/// </summary>
[BestFriend]
internal interface ITransposeSchema : ISchema
internal interface ITransposeSlotTypeHolder

This comment has been minimized.

@singlis

singlis Jan 11, 2019

Member

ITransposeSlotTypeHolder [](start = 23, length = 24)

Can this interface be removed and instead add a GetSlotType to ITransposeDataView? OneToOneTransformBase already derives from ITransposeDataView so it can contain the implementation for this function.

This comment has been minimized.

@TomFinley

TomFinley Jan 14, 2019

Contributor

Putting this method GetSlotType directly on the ITransposeDataView itself, and destroying the former ITransposeSchema seems a good plan to me. For the purpose of a public API, we may need to think more carefully about this, but we'll have to think very carefully about that anyway.


In reply to: 246979206 [](ancestors = 246979206)

This comment has been minimized.

@wschin

wschin Jan 17, 2019

Member

Please kindly review the new look of ITransposeDataView. We basically replace GetSlotType in the old ITransposeSchema with VectorType[] TransposeSlotTypes in the new ITransposeDataView.


In reply to: 247593325 [](ancestors = 247593325,246979206)

@TomFinley

This comment has been minimized.

Copy link
Contributor

TomFinley commented Jan 14, 2019

public abstract class SlotCursor : IDisposable

Not your change, but just something I note... Seems like so much else of the transpose data view is internal, I have to imagine that if we [BestFriend] internal this nothing bad would happen. Is it not so? If so, we should probably do so. #Closed


Refers to: src/Microsoft.ML.Data/Data/SlotCursor.cs:13 in 64fe4d1. [](commit_id = 64fe4d1, deletion_comment = False)

@wschin wschin force-pushed the wschin:no-transpose-ischema branch from 01c0598 to 9eb180f Jan 17, 2019

@wschin

This comment has been minimized.

Copy link
Member

wschin commented Jan 17, 2019

public abstract class SlotCursor : IDisposable

Just tried making it a best friend. Complier doesn't like my changes because

    [BestFriend]
    internal interface ITransposeDataView : IDataView
    {
        /// <summary>
        /// Presents a cursor over the slots of a transposable column, or throws if the column
        /// is not transposable.
        /// </summary>
        SlotCursor GetSlotCursor(int col);

        /// <summary>
        /// <see cref="TransposeSlotTypes"/>[i] specifies the type of all values at the i-th column of <see cref="IDataView"/>.
        /// For example, if <see cref="IDataView.Schema"/>[i] is a scalar float column, then <see cref="TransposeSlotTypes"/>[i]
        /// may return a <see cref="VectorType"/> whose <see cref="VectorType.ItemType"/> field is <see cref="NumberType.R4"/>.
        /// If the i-th column can't be iterated column-wisely, <see cref="TransposeSlotTypes"/>[i] may be <see langword="null"/>.
        /// </summary>
        VectorType[] TransposeSlotTypes { get; }
    }

can return SlotCursor. If we make SlotCursor a best friend, it will be less accessible than GetSlotCursor in implementations of ITransposeDataView. If we add internal to GetSlotCursor, it will no longer be an implementation of ITransposeDataView's GetSlotCursor.


In reply to: 454099677 [](ancestors = 454099677)


Refers to: src/Microsoft.ML.Data/Data/SlotCursor.cs:13 in 57f863b. [](commit_id = 57f863b, deletion_comment = False)

@eerhardt

This comment has been minimized.

Copy link
Member

eerhardt commented Jan 17, 2019

@wschin - a way to make a public class implement an internal interface and not make the methods public, is to explicitly implement the interface.
https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/interfaces/explicit-interface-implementation
#Resolved

@wschin

This comment has been minimized.

Copy link
Member

wschin commented Jan 17, 2019

@eerhardt, I have used that pattern on TransposeSlotTypes in ITransposeDataView. Are you saying I should do the same to make SlotCursor not externally-accessible? Note that SlotCursor is a class. #Resolved

@wschin wschin referenced this pull request Jan 17, 2019

Merged

[RIP] ISchema #1846

16 of 16 tasks complete
Assert.Null(trans.TransposeSchema.GetSlotType(0));
Assert.Null(trans.TransposeSchema.GetSlotType(1));
Assert.Null(trans.TransposeSchema.GetSlotType(2));
Assert.Null((trans as ITransposeDataView).TransposeSlotTypes[0]);

This comment has been minimized.

@eerhardt

eerhardt Jan 17, 2019

Member

Can you just do the cast (trans as ITransposeDataView) 1 time instead of 3 times in a row?

This comment has been minimized.

@wschin

wschin Jan 17, 2019

Member

We have only one (ITransposeDataView)trans now.


In reply to: 248704226 [](ancestors = 248704226)

@eerhardt

This comment has been minimized.

Copy link
Member

eerhardt commented Jan 17, 2019

Are you saying I should do the same to make SlotCursor not externally-accessible?

Yes. If we want to make SlotCursor internal, you can use explicitly implemented interfaces so all the classes that implement ITransposeDataView can be public, but don't need to expose a public SlotCursor GetSlotCursor(int col) method. So if you make SlotCursor internal, and then explicitly implement ITransposeDataView.GetSlotCursor on all the implementations, you should be able to compile successfully.

/// </summary>
VectorType GetSlotType(int col);
VectorType[] TransposeSlotTypes { get; }

This comment has been minimized.

@TomFinley

TomFinley Jan 17, 2019

Contributor

VectorType[] [](start = 8, length = 12)

I would prefer the old method, if possible. However, if you insist on replacing it with a collection, I'd prefer that it be something not mutable. Like, say, a read only list, or an System.Collections.Immutable array, or something like this. Having what is essentially part of the schema be a mutable structure is not an acceptable state even for something internal. #Closed

This comment has been minimized.

@wschin

wschin Jan 17, 2019

Member

Your wish is granted.


In reply to: 248777208 [](ancestors = 248777208)

This comment has been minimized.

@TomFinley

TomFinley Jan 17, 2019

Contributor

Cool, I have two left I think!


In reply to: 248815747 [](ancestors = 248815747,248777208)

@TomFinley

This comment has been minimized.

Copy link
Contributor

TomFinley commented Jan 17, 2019

public abstract class SlotCursor : IDisposable

Remember explicit interface implementations.


In reply to: 455007710 [](ancestors = 455007710,454099677)


Refers to: src/Microsoft.ML.Data/Data/SlotCursor.cs:13 in d693ad9. [](commit_id = d693ad9, deletion_comment = False)

@@ -582,10 +586,10 @@ public RowCursor[] GetRowCursorSet(Func<int, bool> predicate, int n, Random rand
return View.GetRowCursorSet(predicate, n, rand);
}

public SlotCursor GetSlotCursor(int col)
SlotCursor ITransposeDataView.GetSlotCursor(int col)

This comment has been minimized.

@TomFinley

TomFinley Jan 17, 2019

Contributor

SlotCursor ITransposeDataView.GetSlotCursor [](start = 8, length = 43)

Ah, I see you figured out the explicit interface implementations after all. :) Good work. #Closed

This comment has been minimized.

@wschin

wschin Jan 17, 2019

Member

Thanks to your and Eric's comments.


In reply to: 248778138 [](ancestors = 248778138)

/// <see cref="TransposeSlotTypes"/>[i] specifies the type of all values at the i-th column of <see cref="IDataView"/>.
/// For example, if <see cref="IDataView.Schema"/>[i] is a scalar float column, then <see cref="TransposeSlotTypes"/>[i]
/// may return a <see cref="VectorType"/> whose <see cref="VectorType.ItemType"/> field is <see cref="NumberType.R4"/>.
/// If the i-th column can't be iterated column-wisely, <see cref="TransposeSlotTypes"/>[i] may be <see langword="null"/>.

This comment has been minimized.

@TomFinley

TomFinley Jan 17, 2019

Contributor

cref="TransposeSlotTypes"/>[i] may be [](start = 69, length = 60)

Also apparently, from what I see of the usage, even the array itself might be null. The more I read of the new code the more I think we got it right the first time with a method, not exposing some collection that we have to test whether it is null or not before we even think about querying... yeah. #Closed

This comment has been minimized.

@wschin

wschin Jan 17, 2019

Member

Just replaced this with a function, GetSlotType(int col). Many thanks.


In reply to: 248780062 [](ancestors = 248780062)

Assert.Null(trans.TransposeSchema.GetSlotType(0));
Assert.Null(trans.TransposeSchema.GetSlotType(1));
Assert.Null(trans.TransposeSchema.GetSlotType(2));
var itdv = trans as ITransposeDataView;

This comment has been minimized.

@TomFinley

TomFinley Jan 17, 2019

Contributor

trans as ITransposeDataView [](start = 27, length = 27)

When we are certain that something implements an interface, please just use a () style cast. I notice you are not testing it, so, you are presumably sure that it does have this.

Look at it this way... If you happen to be wrong with (), you will get a cast error. If you happen to be wrong with as, you will get a null exception... which could also happen because apparently it is legal for .TransposeSlotTypes to be null (to judge from other code), and you did not test for that. So it is more direct. #Closed

This comment has been minimized.

@wschin

wschin Jan 17, 2019

Member

Ok. I also checked some other places by finding references to ITransposeDataView in VS.


In reply to: 248781623 [](ancestors = 248781623)

@@ -268,7 +268,8 @@ private bool UseTranspose(bool? useTranspose, RoleMappedData data)

if (useTranspose.HasValue)
return useTranspose.Value;
return data.Data is ITransposeDataView td && td.TransposeSchema.GetSlotType(data.Schema.Feature.Value.Index) != null;
return data.Data is ITransposeDataView td && td.TransposeSlotTypes != null
&& td.TransposeSlotTypes[data.Schema.Feature.Value.Index] != null;

This comment has been minimized.

@TomFinley

TomFinley Jan 17, 2019

Contributor

So here we see another reason -- every time I see this new array used, I see that it has made the usage more complex, not less. #Closed

This comment has been minimized.

@wschin

wschin Jan 17, 2019

Member

You're right. Will change that array back to method in the next iteration.


In reply to: 248788605 [](ancestors = 248788605)

@wschin

This comment has been minimized.

Copy link
Member

wschin commented Jan 17, 2019

public abstract class SlotCursor : IDisposable

No problem!


In reply to: 455268001 [](ancestors = 455268001,455007710,454099677)


Refers to: src/Microsoft.ML.Data/Data/SlotCursor.cs:13 in d693ad9. [](commit_id = d693ad9, deletion_comment = False)

@wschin

This comment has been minimized.

Copy link
Member

wschin commented Jan 17, 2019

Thanks to your guidance. SlotCursor is invisible from outside now.


In reply to: 455205205 [](ancestors = 455205205)

@TomFinley
Copy link
Contributor

TomFinley left a comment

Looking good thanks @wschin .

// inspect the schema. We also want to ensure that the useful property that
// a cursor and view's schemas are the same, is preserved, which allows us
// to use the cursors from the schema view if convenient to do so.
public Schema Schema => _schemaEntry.GetView().Schema;

This comment has been minimized.

@eerhardt

eerhardt Jan 17, 2019

Member

(minor) how often is this called? Usually having a property call a method every time sort of "smells" because properties are supposed to be fast. And calling a method may be slow. Would we want to cache this?

This comment has been minimized.

@wschin

wschin Jan 17, 2019

Member

It should be very minor because most of our time is spent on training something.

@@ -845,7 +793,7 @@ private void Init(int col)
Ch.Assert(0 <= col && col < Schema.Count);
Ch.Assert(_colToActivesIndex[col] >= 0);
var type = Schema[col].Type;
Ch.Assert(((ITransposeDataView)_parent).TransposeSchema.GetSlotType(col).GetValueCount() == _parent._header.RowCount);
Ch.Assert((type as VectorType).GetValueCount() == _parent._header.RowCount);

This comment has been minimized.

@eerhardt

eerhardt Jan 17, 2019

Member

GetValueCount() will throw if you call it on null, so it would be more appropriate to cast (VectorType)type here.

This comment has been minimized.

@wschin

wschin Jan 17, 2019

Member

Fixed. Thanks.

@eerhardt
Copy link
Member

eerhardt left a comment

:shipit:

@wschin wschin force-pushed the wschin:no-transpose-ischema branch from 64fe4d1 to d709c22 Jan 17, 2019

@@ -83,7 +83,7 @@ public void LogEventProcessesMessages()
env.Log += (sender, e) => messages.Add(e.Message);

// create a dummy text reader to trigger log messages
env.Data.CreateTextReader(
env.Data.CreateTextLoader(

This comment has been minimized.

@eerhardt

eerhardt Jan 17, 2019

Member

We should get this change in soon since the build is currently broken. If we can't check this in right away, let's get a separate PR out with just fixing the build.

/cc @najeeb-kazmi

@wschin wschin force-pushed the wschin:no-transpose-ischema branch from 4d65404 to 6418e3c Jan 17, 2019

@wschin wschin merged commit bafd40c into dotnet:master Jan 18, 2019

2 checks passed

MachineLearning-CI #20190117.34 succeeded
Details
license/cla All CLA requirements met.
Details

@wschin wschin deleted the wschin:no-transpose-ischema branch Jan 18, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment