New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StochasticDualCoordinateAscent not work For Multiclass after migrate to 0.10.0 #2486

Open
DevLob-zz opened this Issue Feb 9, 2019 · 19 comments

Comments

Projects
7 participants
@DevLob-zz
Copy link

DevLob-zz commented Feb 9, 2019

System information

  • OS version/distro: W10
  • .NET Version (eg., dotnet --info): 4.6.1

Issue

  • What did you do? Migrated my code from 0.9.0 to 0.10.0
  • What happened? StochasticDualCoordinateAscent Algorithm was working fine for multi class and binary trainer before move to 0.10 after updating its working only for binary and freeze on
    trainingPipeline.Fit(trainingDataView); for Multiclass take a while
  • What did you expect?
    it should work fine

Source code / logs

`var mlContext = new MLContext(seed: 1);

#region "STEP 1: Common data loading configuration"

IDataView trainingDataView = GetNormalDataSet(mlContext, allFeatures, mLFeatures);

if (trainingDataView.GetRowCount() == 0)
{

    return;
}

textFeatures = GetTextFeatures(normalFeatures);

numericFeatures = GetNumericFeatures(normalFeatures).ToArray();

#endregion

#region "STEP 2: Common data process configuration with pipeline data transformations"

// STEP 2: Common data process configuration with pipeline data transformations

var textFeaturesProcessPipeline = mlContext.Transforms.Text.FeaturizeText(DefaultColumnNames.Features, textFeatures);

var numericFeaturesProcessPipeline = mlContext.Transforms.Concatenate(DefaultColumnNames.Features, numericFeatures);

var dataProcessPipeline = numericFeaturesProcessPipeline.Append(textFeaturesProcessPipeline).AppendCacheCheckpoint(mlContext);

#endregion

#region "STEP 3: Set the training algorithm, then create and configure the modelBuilder"

ITransformer trainedModel = null;

//"StochasticDualCoordinateAscent"

var trainingPipeline = mlContext.Transforms.Conversion.MapValueToKey(DefaultColumnNames.Label)
.Append(dataProcessPipeline)
.Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(labelColumn: DefaultColumnNames.Label, featureColumn: DefaultColumnNames.Features))
.Append(mlContext.Transforms.Conversion.MapKeyToValue(DefaultColumnNames.PredictedLabel));

#region STEP 4: Train the model fitting to the DataSet

//Take a while and no responce when call fit method

trainedModel = trainingPipeline.Fit(trainingDataView);

#endregion`

you can see some screen for values and when change from
"StochasticDualCoordinateAscent" to "Naive Bayes" Working fine

what wrong on my code
image
image
image

also those my Data Structure Classes
`[Serializable]
public class NormalTagsModelFeatures
{
//[Column(ordinal: "0", name: "Label")] public string Label;
[LoadColumn(0)]
public string Label;
[LoadColumn(1)]
public float fontSize;
[LoadColumn(2)]
public float isBold;
[LoadColumn(3)]
public float isItalic;
[LoadColumn(4)]
public float isUnderLine;
[LoadColumn(5)]
public float containsDot;
[LoadColumn(6)]
public float containsQuestionMark;
[LoadColumn(7)]
public string fontColor;
[LoadColumn(8)]
public float isAllCaps;
[LoadColumn(9)]
public string tagText;
[LoadColumn(10)]
public string firstWord;
[LoadColumn(11)]
public string FontName;
[LoadColumn(12)]
public float verticalText;
[LoadColumn(13)]
public float trdLeft;
[LoadColumn(14)]
public float trdRight;
[LoadColumn(15)]
public float trdTop;
[LoadColumn(16)]
public float trdBottom;
[LoadColumn(17)]
public float pageNo;

}

public class NormalTagsPrediction
{
    [ColumnName("PredictedLabel")]
    public string PredictedLabel;

    [ColumnName("Score")]
    public float[] Score { get; set; }

}`

Please paste or attach the code or logs or traces that would be helpful to diagnose the issue you are reporting.

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 9, 2019

Hint My dataView Contain 250K rows
Also worked fine within Naive Bayes and LogisticRegression

@rogancarr rogancarr added the bug label Feb 9, 2019

@rogancarr

This comment has been minimized.

Copy link
Contributor

rogancarr commented Feb 9, 2019

@DevLob-zz Thanks for reporting this! I'll take a look at this!

Marking as need info until I repro it.

@rogancarr rogancarr self-assigned this Feb 9, 2019

@rogancarr rogancarr added need info and removed bug labels Feb 9, 2019

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 10, 2019

is there is info can i provide you with it , thank you for interest

best regards

@rogancarr

This comment has been minimized.

Copy link
Contributor

rogancarr commented Feb 10, 2019

Hi @DevLob-zz ,

I've run a multiclass classification sample, GitHubIssueClassification in 0.9 and 0.10, and they get very similar times, so I don't think the problem is related to the SDCA trainer or it's API. Plus we haven't done any work on SDCA between the releases.

One thing that did change between 0.9 and 0.10 is the order of arguments to transforms. Do you do any featurization in GetNormalDataSet(mlContext, allFeatures, mLFeatures)? Is it possible that your column names have changed between the releases, and the learners are building features on different data?

@rogancarr rogancarr removed the need info label Feb 10, 2019

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 10, 2019

image
image

No the code never changed between 0.9 and 0.10 and still work fine for Naive Bayes and Logistic Regression for Multi class
SDCA Still work fine for Binary Classification but seem to deep into infinite loop wile Multi class

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 10, 2019

image
the code stop and no response in Fit Method in this line however if code going to other case same code just change algorithm working well
image

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 10, 2019

image

any help would be appreciated

@Ivanidzo4ka

This comment has been minimized.

Copy link
Member

Ivanidzo4ka commented Feb 11, 2019

mlContext has Log event.

 mlContext.Log += MlContext_Log;
private static void MlContext_Log(object sender, LoggingEventArgs e)
{
    Console.WriteLine(e.Message);
}

If we can look on this messages, it can be helpful to diagnose at what state it get stuck at least.
Do you get any success if you setup NumThreads in SDCA to 1?

We had this change #2152 which can be reason why it worked in 0.9 but stopped in 0.10, but as @rogancarr mentioned it works for us on some datasets, so your case can be quite tricky. It's hard to fix problem without ability to reproduce it.

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 11, 2019

mlContext has Log event.

 mlContext.Log += MlContext_Log;
private static void MlContext_Log(object sender, LoggingEventArgs e)
{
    Console.WriteLine(e.Message);
}

If we can look on this messages, it can be helpful to diagnose at what state it get stuck at least.
Do you get any success if you setup NumThreads in SDCA to 1?

We had this change #2152 which can be reason why it worked in 0.9 but stopped in 0.10, but as @rogancarr mentioned it works for us on some datasets, so your case can be quite tricky. It's hard to fix problem without ability to reproduce it.
this what the logger stop at
mlerror

Do you get any success if you setup NumThreads in SDCA to 1?
how can i set this value do you mean this
var mlContext = new MLContext(seed: 1,conc:1);
if yes i did and same
It's hard to fix problem without ability to reproduce it.
yes sure i will try to debug through The ML Source
and keep you posted

best regards

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 11, 2019

after integrate my code with the source i can see that the code stick here
image

and he stick here as he get numThreads = 2 so if i can set it to 1 i think he will never stick again
NumofCycle = 750000

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 11, 2019

i replace the trainer line with
var trainner = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(new SdcaMultiClassTrainer.Options
{ FeatureColumn = DefaultColumnNames.Features ,
LabelColumn = DefaultColumnNames.Label,
NumThreads = 1
});

and same issue

this is my program .cs code i usesd

using System.Collections.Generic;
using Microsoft.Data.DataView;
using Microsoft.ML.Core.Data;
using Microsoft.ML.Data;
using Microsoft.ML.Samples.Dynamic;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples
{
    internal static class Program
    {
        static void Main(string[] args)
        {
            List<NormalTagsModelFeatures> noramalTagsTrainingData = new List<NormalTagsModelFeatures>();
            noramalTagsTrainingData.Add(
                        new NormalTagsModelFeatures()
                        {
                            //Label = TagListDictionary.GetTagId(label),
                            Label = "Tag Test",
                            pageNo =  -1,
                            fontSize =  -1,
                            isBold =  -1,
                            isItalic =  -1,
                            isUnderLine =  -1,
                            containsDot =  -1,
                            containsQuestionMark =  -1,
                            isAllCaps =  -1,
                            fontColor =  "Ss",
                            FontName =  "Ss",
                            tagText =  "dd  dd",
                            firstWord = "dd",
                            trdBottom =  -1,
                            trdLeft =  -1,
                            trdRight = -1,
                            trdTop = -1,
                            verticalText = -1,
                        }
                    );
            List<string> textFeatures = new List<string>() { "firstWord", "tagText", "FontName", "fontColor" };
            string[] numericFeatures = new string[] { "pageNo", "fontSize", "isBold", "isItalic", "isUnderLine",
                "containsDot", "containsQuestionMark","isAllCaps","trdBottom","trdLeft","trdRight","trdTop","verticalText" } ;
            var mlContext = new MLContext(seed: 1);
            IDataView trainingDataView =mlContext.Data.ReadFromEnumerable(noramalTagsTrainingData);
            var textFeaturesProcessPipeline = mlContext.Transforms.Text.FeaturizeText(DefaultColumnNames.Features, textFeatures,new Transforms.Text.TextFeaturizingEstimator.Options());
            var numericFeaturesProcessPipeline = mlContext.Transforms.Concatenate(DefaultColumnNames.Features, numericFeatures);
            var dataProcessPipeline = numericFeaturesProcessPipeline.Append(textFeaturesProcessPipeline).AppendCacheCheckpoint(mlContext);
            var trainner = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(new SdcaMultiClassTrainer.Options
            {
                FeatureColumn = DefaultColumnNames.Features,
                LabelColumn = DefaultColumnNames.Label,
                NumThreads = 1
            });
            trainner.AppendCacheCheckpoint(mlContext);
            var trainingPipeline = mlContext.Transforms.Conversion.MapValueToKey(DefaultColumnNames.Label)
           .Append(dataProcessPipeline)
           .Append(trainner)
           .Append(mlContext.Transforms.Conversion.MapKeyToValue(DefaultColumnNames.PredictedLabel));
            ITransformer trainedModel = trainingPipeline.Fit(trainingDataView);
            //TakeRows.Example();
        }
    }

    public class NormalTagsModelFeatures
    {
        //[Column(ordinal: "0", name: "Label")] public string Label;
        [LoadColumn(0)]
        public string Label;
        [LoadColumn(1)]
        public float fontSize;
        [LoadColumn(2)]
        public float isBold;
        [LoadColumn(3)]
        public float isItalic;
        [LoadColumn(4)]
        public float isUnderLine;
        [LoadColumn(5)]
        public float containsDot;
        [LoadColumn(6)]
        public float containsQuestionMark;
        [LoadColumn(7)]
        public string fontColor;
        [LoadColumn(8)]
        public float isAllCaps;
        [LoadColumn(9)]
        public string tagText;
        [LoadColumn(10)]
        public string firstWord;
        [LoadColumn(11)]
        public string FontName;
        [LoadColumn(12)]
        public float verticalText;
        [LoadColumn(13)]
        public float trdLeft;
        [LoadColumn(14)]
        public float trdRight;
        [LoadColumn(15)]
        public float trdTop;
        [LoadColumn(16)]
        public float trdBottom;
        [LoadColumn(17)]
        public float pageNo;

    }

    public class NormalTagsPrediction
    {
        [Column(ordinal: "0", name: "PredictedLabel")]
        public string Label;


        [ColumnName("Score")]
        public float[] Score { get; set; }

    }
}
@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 11, 2019

maxIterations = 1,500,000

what is the best value for maxIterations can i set and not affect the accuracy

image

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 11, 2019

var trainner = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(new SdcaMultiClassTrainer.Options
{
FeatureColumn = DefaultColumnNames.Features,
LabelColumn = DefaultColumnNames.Label,
NumThreads = 1,
MaxIterations= 1000
});

so this solve my issue but

which MaxIterations will be recommenced based on your

@shauheen shauheen added this to To do in v0.11 Feb 11, 2019

@shauheen shauheen added the question label Feb 11, 2019

@wschin

This comment has been minimized.

Copy link
Member

wschin commented Feb 14, 2019

Number of iterations is data-dependent. You can try 10, 20, 40, 80, 160, ..., 640 to find the value leading to the best test accuracy. In addition, a small regularization coefficient may lead to overfitting so you need to terminate the training very early (e.g., just 1, 2, 4, 8, 16 iterations are enough).

@Ivanidzo4ka

This comment has been minimized.

Copy link
Member

Ivanidzo4ka commented Feb 14, 2019

It doesn't justify our auto selecting code to put 1 500 000 iterations over dataset.

@rogancarr rogancarr removed their assignment Feb 14, 2019

@DevLob-zz DevLob-zz closed this Feb 15, 2019

v0.11 automation moved this from To do to Done Feb 15, 2019

@DevLob-zz DevLob-zz reopened this Feb 15, 2019

@DevLob-zz DevLob-zz closed this Feb 15, 2019

@DevLob-zz DevLob-zz reopened this Feb 15, 2019

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 15, 2019

image

the code seem to be wait while here and never out from while when testing on 500K rows

i wait on this iteration more than one hour and never move to next

is we can add timeout or something that guarantee that he will out from while Loop

@eerhardt

This comment has been minimized.

Copy link
Member

eerhardt commented Feb 15, 2019

@rogancarr @Ivanidzo4ka @TomFinley - the above deadlock looks to be the same as #1095.

@shauheen shauheen moved this from Done to To do in v0.11 Feb 15, 2019

@artidoro

This comment has been minimized.

Copy link
Contributor

artidoro commented Feb 15, 2019

Hello @DevLob-zz thank you for sharing all this information.
I have a few questions for you:

  1. Does it just take a long time but eventually finish training, or is it stuck training?
  2. Is the issue showing only on v0.10 or did you have the same issue on v0.9? Are you building off of master? Or using the nuget for v0.10?
  3. Is this reproducible 100% of the time?

The above questions can be very helpful in identifying what's wrong.

Also, I am trying to reproduce this issue locally to fix it, and if you are willing to share the dataset I can take a closer look, no problem of course otherwise.

@DevLob-zz

This comment has been minimized.

Copy link
Author

DevLob-zz commented Feb 16, 2019

Hello @DevLob-zz thank you for sharing all this information.
I have a few questions for you:

  1. Does it just take a long time but eventually finish training, or is it stuck training?
    it never finish when it stuck
  2. Is the issue showing only on v0.10 or did you have the same issue on v0.9? Are you building off of master? Or using the nuget for v0.10?
    for 0.9 everything is OK but for 0.10 i issued this
    i try by nuget and the master this what show me this dead lock
  3. Is this reproducible 100% of the time?
    No

The above questions can be very helpful in identifying what's wrong.

Also, I am trying to reproduce this issue locally to fix it, and if you are willing to share the dataset I can take a closer look, no problem of course otherwise.

here is a program.cs that i used to generate more than 400K and try to generate the model

Hint : it work fine with small data set

Program.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment