-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Support back compat for ngram hash #1988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,6 +7,7 @@ | |
| using Microsoft.ML; | ||
| using Microsoft.ML.Data; | ||
| using Microsoft.ML.Data.IO; | ||
| using Microsoft.ML.Model; | ||
| using Microsoft.ML.RunTests; | ||
| using Microsoft.ML.StaticPipe; | ||
| using Microsoft.ML.Tools; | ||
|
|
@@ -172,7 +173,7 @@ public void StopWordsRemoverFromFactory() | |
| string sentimentDataPath = GetDataPath("wikipedia-detox-250-line-data.tsv"); | ||
| var data = TextLoader.Create(ML, new TextLoader.Arguments() | ||
| { | ||
| Column = new [] | ||
| Column = new[] | ||
| { | ||
| new TextLoader.Column("Text", DataKind.TX, 1) | ||
| } | ||
|
|
@@ -212,7 +213,7 @@ public void WordBagWorkout() | |
| .Read(sentimentDataPath); | ||
|
|
||
| var est = new WordBagEstimator(Env, "text", "bag_of_words"). | ||
| Append(new WordHashBagEstimator(Env, "text", "bag_of_wordshash", invertHash:-1)); | ||
| Append(new WordHashBagEstimator(Env, "text", "bag_of_wordshash", invertHash: -1)); | ||
|
|
||
| // The following call fails because of the following issue | ||
| // https://github.com/dotnet/machinelearning/issues/969 | ||
|
|
@@ -269,6 +270,23 @@ public void NgramWorkout() | |
| Done(); | ||
| } | ||
|
|
||
| [Fact] | ||
| void TestNgramCompatColumns() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the new test! Should we be testing the specific output values too? (or is this somewhere I'm not seeing it?) |
||
| { | ||
| string dropModelPath = GetDataPath("backcompat/ngram.zip"); | ||
| string sentimentDataPath = GetDataPath("wikipedia-detox-250-line-data.tsv"); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
should we stick to TestDatasets.Sentiment.trainFilename |
||
| var data = TextLoader.CreateReader(ML, ctx => ( | ||
| Sentiment: ctx.LoadBool(0), | ||
| SentimentText: ctx.LoadText(1)), hasHeader: true) | ||
| .Read(sentimentDataPath); | ||
| using (FileStream fs = File.OpenRead(dropModelPath)) | ||
| { | ||
| var result = ModelFileUtils.LoadTransforms(Env, data.AsDynamic, fs); | ||
| var featureColumn = result.Schema.GetColumnOrNull("Features"); | ||
| Assert.NotNull(featureColumn); | ||
| } | ||
| } | ||
|
|
||
| [Fact] | ||
| public void LdaWorkout() | ||
| { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be more intuitite, i think, if called: CurrentVersion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trouble with naming it current version is that it is not a descriptive name, and it could change in the future. it is better to describe what changed then. The reason is, if we think about how these are used, they are used in the conditional checks during loading to do this or that. So, if we had some model format change in such a way that we added some information "Foo," then we should name that
VersionFooor something, so that the inevitable conditional test,if (header.Version >= VersionFoo)or what have you, makes sense. That would not if it was namedCurrentVersion. It would also lead to bugs, since people might say, "hey, the current version changed, I'll change this." But it is absolutely essential that they do not, since we are using that field for a very specific test. So it is absolutely essential that it not be named something likeCurrentVersion.