-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Support back compat for ngram hash #1988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| using (FileStream fs = File.OpenRead(dropModelPath)) | ||
| { | ||
| var result = ModelFileUtils.LoadTransforms(Env, data.AsDynamic, fs); | ||
| var foundColumnFeature = result.Schema.TryGetColumnIndex("Features", out int featureIdx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
foundColumnFeature [](start = 20, length = 18)
Were you intending to assert this somewhere? Also, could you please not use the ISchema derived functions, but the newer Schema ones? We're going to be getting rid of the ISchema ones sooner or later. In this case, preferred would be GetColumnOrNull.
| public Column? GetColumnOrNull(string name) |
TomFinley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Ivanidzo4ka !
| } | ||
|
|
||
| [Fact] | ||
| void TestNgramCompatColumns() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the new test! Should we be testing the specific output values too? (or is this somewhere I'm not seeing it?)
|
Not to nickle-and-dime, but is 273KB needed to add to the repo? Existing repo zips to 18.3MB. Seems a simple ngram-hash model should be tiny. Looks like space in your model is from an ngram dictionary & normalizer. Perhaps set Command: (now 18KB) |
| void TestNgramCompatColumns() | ||
| { | ||
| string dropModelPath = GetDataPath("backcompat/ngram.zip"); | ||
| string sentimentDataPath = GetDataPath("wikipedia-detox-250-line-data.tsv"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"wikipedia-detox-250-line-data.tsv" [](start = 51, length = 35)
should we stick to TestDatasets.Sentiment.trainFilename
| loaderAssemblyName: typeof(NgramHashingTransformer).Assembly.FullName); | ||
| } | ||
|
|
||
| private const int VersionTransformer = 0x00010003; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VersionTransformer [](start = 26, length = 18)
it would be more intuitite, i think, if called: CurrentVersion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trouble with naming it current version is that it is not a descriptive name, and it could change in the future. it is better to describe what changed then. The reason is, if we think about how these are used, they are used in the conditional checks during loading to do this or that. So, if we had some model format change in such a way that we added some information "Foo," then we should name that VersionFoo or something, so that the inevitable conditional test, if (header.Version >= VersionFoo) or what have you, makes sense. That would not if it was named CurrentVersion. It would also lead to bugs, since people might say, "hey, the current version changed, I'll change this." But it is absolutely essential that they do not, since we are using that field for a very specific test. So it is absolutely essential that it not be named something like CurrentVersion.
sfilipi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
fixes #1982