Description
I have a dataset in a text file with 20 columns: the first column is the class name (string), and the other 19 columns are features (floats).
Here are the first lines of the file:
Class A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 A17 A18 A19
CS 61.00000 0.16855 0.00000 1.77778 3.00000 0.25375 0.07984 0.00169 0.02250 0.01535 0.07984 0.01027 0.27415 6.00000 4.00000 0.37649 3552.00000 0 26.00000
CS 316.00000 0.14823 15.00000 1.77778 10.00000 0.02352 0.00440 0.20407 0.00357 0.00914 0.03585 0.14171 0.01674 21.00000 4.00000 0.14961 4235.00000 0 17.00000
CS 176.00000 0.00000 20.00000 1.77778 3.00000 0.01850 0.19659 0.00469 0.03895 0.00000 0.19659 0.59670 0.19659 10.00000 5.00000 0.23767 3850.00000 0 24.00000
CS 133.00000 0.00000 4.00000 1.33333 3.00000 0.00049 0.01214 0.22827 0.18777 0.18778 0.12627 0.00915 0.18777 11.00000 7.00000 0.32619 1880.00000 0 16.00000
CS 140.00000 0.00000 14.00000 1.33333 1.00000 0.01787 0.02860 0.48472 0.02860 0.59853 0.02860 1.06538 0.02860 9.00000 7.00000 0.02860 1876.00000 0 142.00000
The full file is data.txt.
Let's run AutoML:
mlnet auto-train --task multiclass-classification --dataset "data.txt" --has-header --label-column-name Class --max-exploration-time 10
As a result, AutoML generates a ModelInput.cs file that starts like this:
public class ModelInput
{
    [ColumnName("Class"), LoadColumn(0)]
    public string Class { get; set; }
    [ColumnName("A1"), LoadColumn(1)]
    public string A1 { get; set; }
    [ColumnName("A2"), LoadColumn(2)]
    public string A2 { get; set; }
    [ColumnName("A3"), LoadColumn(3)]
    public string A3 { get; set; }
All columns are recognized as string instead of float 😢
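For comparison, here is a hand-written sketch of the schema one would expect for this file, with the feature columns typed as float. This is not CLI output: the class name `ExpectedModelInput` is hypothetical, and only the first three feature columns are shown (A4 through A19 would follow the same pattern).

```csharp
using Microsoft.ML.Data;

// Hand-written sketch of the expected schema (an assumption, not generated code).
// Only Class and A1..A3 are shown; the remaining features would be declared the same way.
public class ExpectedModelInput
{
    // Label column: the class name stays a string.
    [ColumnName("Class"), LoadColumn(0)]
    public string Class { get; set; }

    // Feature columns: numeric values loaded as float (System.Single).
    [ColumnName("A1"), LoadColumn(1)]
    public float A1 { get; set; }

    [ColumnName("A2"), LoadColumn(2)]
    public float A2 { get; set; }

    [ColumnName("A3"), LoadColumn(3)]
    public float A3 { get; set; }
}
```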
As a result, the data pipeline is also incorrect (OneHotEncoding was applied to numeric columns):
var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Class", "Class")
    .Append(mlContext.Transforms.Categorical.OneHotEncoding(new[]
    {
        new InputOutputColumnPair("A3", "A3"), new InputOutputColumnPair("A4", "A4"),
        new InputOutputColumnPair("A5", "A5"), new InputOutputColumnPair("A14", "A14"),
        new InputOutputColumnPair("A15", "A15"), new InputOutputColumnPair("A18", "A18")
    }))
    .Append(mlContext.Transforms.Categorical.OneHotHashEncoding(new[]
    {
        new InputOutputColumnPair("A1", "A1"), new InputOutputColumnPair("A2", "A2"),
        new InputOutputColumnPair("A6", "A6"), new InputOutputColumnPair("A17", "A17"),
        new InputOutputColumnPair("A19", "A19")
    }))
    .Append(mlContext.Transforms.Concatenate("Features",
        new[] {"A3", "A4", "A5", "A14", "A15", "A18", "A1", "A2", "A6", "A17", "A19"}))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
    .AppendCacheCheckpoint(mlContext);
Why are all columns recognized as strings in this case?
Why was OneHotHashEncoding used for some columns instead of OneHotEncoding?
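To make the expectation concrete, here is a minimal sketch of the kind of pipeline one might expect if A1 through A19 were loaded as float: the features are concatenated directly and normalized, with no one-hot encoding. The `ExpectedPipeline` class and its members are hypothetical names used for illustration, the sketch assumes all 19 feature columns are numeric, and it is not what the CLI generates.

```csharp
using System.Linq;
using Microsoft.ML;

// Hypothetical helper (not generated by the CLI): builds a data-processing
// pipeline under the assumption that A1..A19 are float columns.
public static class ExpectedPipeline
{
    // Feature column names "A1" .. "A19", matching the header of data.txt.
    public static readonly string[] FeatureColumns =
        Enumerable.Range(1, 19).Select(i => $"A{i}").ToArray();

    public static IEstimator<ITransformer> Build(MLContext mlContext)
    {
        // Map the string label to a key, then concatenate the numeric features
        // as-is and normalize them; no categorical encoding is applied.
        return mlContext.Transforms.Conversion.MapValueToKey("Class", "Class")
            .Append(mlContext.Transforms.Concatenate("Features", FeatureColumns))
            .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
            .AppendCacheCheckpoint(mlContext);
    }
}
```

A trainer would still need to be appended to the result of `ExpectedPipeline.Build(mlContext)`, just as with the generated `dataProcessPipeline`.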