@Ivanidzo4ka
Contributor

fixes #1982

using (FileStream fs = File.OpenRead(dropModelPath))
{
var result = ModelFileUtils.LoadTransforms(Env, data.AsDynamic, fs);
var foundColumnFeature = result.Schema.TryGetColumnIndex("Features", out int featureIdx);
Contributor

foundColumnFeature

Were you intending to assert this somewhere? Also, could you please use the newer Schema functions rather than the ISchema-derived ones? We're going to be getting rid of the ISchema ones sooner or later. In this case, the preferred one would be GetColumnOrNull.

public Column? GetColumnOrNull(string name)
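
A minimal sketch of the pattern being asked for: look the column up through a nullable-returning method and fail loudly when it is missing, rather than storing a bool that is never checked. `MiniSchema`, `Column`, and `SchemaChecks` here are hypothetical stand-ins written only for illustration; they mirror the quoted signature but are not the ML.NET types.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types for illustration only; the real Schema class lives in ML.NET.
public readonly struct Column
{
    public Column(string name, int index) { Name = name; Index = index; }
    public string Name { get; }
    public int Index { get; }
}

public sealed class MiniSchema
{
    private readonly Dictionary<string, Column> _columns = new Dictionary<string, Column>();

    // Columns are indexed in the order their names are given.
    public MiniSchema(params string[] names)
    {
        for (int i = 0; i < names.Length; i++)
            _columns[names[i]] = new Column(names[i], i);
    }

    // Same shape as the signature quoted above: null when the column is absent.
    public Column? GetColumnOrNull(string name)
        => _columns.TryGetValue(name, out var col) ? col : (Column?)null;
}

public static class SchemaChecks
{
    // Throws instead of silently discarding the lookup result.
    public static int RequireColumnIndex(MiniSchema schema, string name)
    {
        Column? col = schema.GetColumnOrNull(name);
        if (!col.HasValue)
            throw new InvalidOperationException($"Column '{name}' not found.");
        return col.Value.Index;
    }
}
```

In a test, the explicit throw would typically be replaced by an assertion on `HasValue` from the test framework in use.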

Contributor

@TomFinley TomFinley left a comment

Thank you @Ivanidzo4ka !

}

[Fact]
void TestNgramCompatColumns()
Contributor

Thanks for the new test! Should we be testing the specific output values too? (Or is that done somewhere I'm not seeing?)

@justinormont
Contributor

Not to nickel-and-dime, but do we need to add 273KB to the repo? The existing repo zips to 18.3MB. It seems a simple ngram-hash model should be tiny.

It looks like the space in your model comes from an ngram dictionary and a normalizer. Perhaps set bits=10 and max=100 to shrink it.

Command: (now 18KB)
maml.exe Train loader=TextLoader{col=Sentiment:R4:0 col=SentimentText:TX:1 header=+} data=D:\src\fork-machinelearning\test\data\wikipedia-detox-250-line-data.tsv out=d:\tlc\1.model.zip xf=CopyColumns{col=Label:Sentiment} xf=WordBagTransform{col=Column:SentimentText tok=WordTokenizeTransform weighting=TfIdf max=100} xf=CategoricalTransform{col=Column kind=Key} xf=NgramHashTransform{col=Features1:Column all=- rehash=+ bits=10} xf=NgramTransform{col=Features2:Column max=100} xf=Concat{col=Features:Features1,Features2}

void TestNgramCompatColumns()
{
string dropModelPath = GetDataPath("backcompat/ngram.zip");
string sentimentDataPath = GetDataPath("wikipedia-detox-250-line-data.tsv");
Member

"wikipedia-detox-250-line-data.tsv"

Should we stick to TestDatasets.Sentiment.trainFilename?

loaderAssemblyName: typeof(NgramHashingTransformer).Assembly.FullName);
}

private const int VersionTransformer = 0x00010003;
Member

VersionTransformer

It would be more intuitive, I think, if it were called CurrentVersion.

Contributor

The trouble with naming it CurrentVersion is that the name is not descriptive, and the current version could change in the future. It is better to describe what changed at that version. The reason is that, if we think about how these constants are used, they appear in conditional checks during loading, to do this or that. So, if the model format changed in such a way that we added some information "Foo," then we should name the constant VersionFoo or something similar, so that the inevitable conditional test, if (header.Version >= VersionFoo) or what have you, makes sense. It would not make sense if the constant were named CurrentVersion. That name would also lead to bugs, since people might say, "Hey, the current version changed, I'll change this." But it is absolutely essential that they do not, since we are using that field for a very specific test. So it is absolutely essential that it not be named something like CurrentVersion.
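
To make the argument concrete, here is a rough sketch of a version-gated loader, assuming a made-up binary layout in which a "Foo" field was added at some version. Every name in it (`VersionedLoader`, `VersionInitial`, `VersionFoo`, `Model`) is hypothetical and chosen only for illustration; it is not the ML.NET model format.

```csharp
using System;
using System.IO;

// Hypothetical model-format sketch; none of these names come from ML.NET.
public static class VersionedLoader
{
    // Each constant is named after what that version introduced.
    public const int VersionInitial = 0x00010001;
    public const int VersionFoo = 0x00010002; // first version that writes the Foo field

    public sealed class Model
    {
        public int Version;
        public int Length;
        public int Foo; // stays 0 when the file predates VersionFoo
    }

    // Writes the Foo field only for versions that define it.
    public static void Save(BinaryWriter writer, int version, int length, int foo)
    {
        writer.Write(version);
        writer.Write(length);
        if (version >= VersionFoo)
            writer.Write(foo);
    }

    // The check reads naturally: "does this file contain Foo?"
    public static Model Load(BinaryReader reader)
    {
        var model = new Model();
        model.Version = reader.ReadInt32();
        model.Length = reader.ReadInt32();
        model.Foo = model.Version >= VersionFoo ? reader.ReadInt32() : 0;
        return model;
    }
}
```

If a later format change bumps the writer's version again, `VersionFoo` keeps its value and its meaning, and the new change gets its own constant. A constant named `CurrentVersion` would invite someone to bump it, which would silently change which files the loader treats as containing Foo.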

Member

@sfilipi sfilipi left a comment

:shipit:

@TomFinley TomFinley merged commit a9f3b4c into dotnet:master Jan 2, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Development

Successfully merging this pull request may close these issues.

NGramHashingTransformer cannot read old models

4 participants