Member

@wschin wschin commented Dec 20, 2018

Follows the suggestion in #1424 to load inf/nan values from a trained LightGBM model.

@wschin wschin self-assigned this Dec 20, 2018
).ToArray();

return str.Split(delimiter)
.Select(s => double.TryParse(s.Replace("inf", "∞"), out double rslt) ? rslt :
Member

.Replace and .Contains are case sensitive - should these be insensitive?
I know the old code did not do this - but do we need to check for empty string? Or is it assumed that check has been done before this call?

Member Author

@wschin wschin Dec 20, 2018

Select could filter out empty cases, I believe. Those strings are LightGBM format, so let's leave them as they are.

Member

Why is this replacing inf with ∞ rather than with the culture-insensitive string (which is Infinity)?

Member Author

What's the difference?


In reply to: 243438498 [](ancestors = 243438498)

Member

∞ will only parse successfully for cultures where NumberFormatInfo.PositiveInfinitySymbol == "∞"; Infinity is the invariant-culture symbol and is stable/won't change. You should probably be doing: double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out double result) for any input where you aren't prepared to support culture-specific formats.
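
As a rough illustration of the invariant-culture suggestion above, here is a minimal sketch. The class and method names (InvariantParseSketch, TryParseInvariant) are made up for this example and do not come from the PR. The call used is the four-argument overload double.TryParse(string, NumberStyles, IFormatProvider, out double).

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only; not code from this PR.
public static class InvariantParseSketch
{
    // Parses a token with the invariant culture, so the result does not
    // depend on the culture of the thread that happens to run the parse.
    public static bool TryParseInvariant(string token, out double value)
    {
        return double.TryParse(
            token,
            NumberStyles.Float,            // sign, decimal point, exponent, surrounding whitespace
            CultureInfo.InvariantCulture,  // "." as decimal separator, "Infinity"/"NaN" symbols
            out value);
    }
}
```

With the invariant culture, "1.5" and "Infinity" should both parse no matter what the machine's regional settings are, and a token such as "abc" makes the method return false instead of throwing.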

Member

Also, for this particular case, it is probably better to just have the following (or something very similar, depending on the exact inputs expected):

string sTrim = s.Trim();
if (sTrim.Equals("inf", StringComparison.OrdinalIgnoreCase))
{
    return double.PositiveInfinity;
}
else if (sTrim.Equals("-inf", StringComparison.OrdinalIgnoreCase))
{
    return double.NegativeInfinity;
}

double.Parse and double.TryParse require that the entire string match, so doing a string.Replace is relatively pointless and just incurs additional allocations/overhead.

Member

@singlis singlis left a comment

Left some minor feedback...

Ivanidzo4ka previously approved these changes Dec 20, 2018
@Ivanidzo4ka Ivanidzo4ka dismissed their stale review December 20, 2018 22:09

revoking review


return str.Split(delimiter)
.Select(s => double.TryParse(s.Replace("inf", "∞"), out double rslt) ? rslt :
(s.Contains("nan") ? double.NaN : throw new Exception($"Cannot parse as double: {s}")))
Contributor

throw new Exception($"Cannot parse as double: {s}"))

Please never throw an unmarked exception in your code.

Contributor

I would probably either skip it completely or just return the default value for double, as we had before.


In reply to: 243433489 [](ancestors = 243433489)

Member Author

Question: do we want to fully mimic LightGBM behavior? If yes, we need this change. If no, we can just close this PR.


In reply to: 243433959 [](ancestors = 243433959,243433489)

Contributor

I'm not against your PR; I'm against throwing an unmarked exception.
Sorry for not expressing my thoughts clearly.

"I would probably either skip it completely or just return default value for double as we had before"

By that I don't mean your whole new code; I mean the case where we can't parse a double.
We have two options for that case:

  • Throw a MARKED exception, like throw env.Exception(message)
  • Use the default value for double.

Member

If it is expected that this throws, you should probably explicitly check the strings for inf/-inf/nan (#1934 (comment)) and fall back to just calling double.Parse (which will throw the appropriate format exception).
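
Put together, the explicit checks plus the double.Parse fallback described above could look roughly like the sketch below. This is only an illustration, not the code that was merged: the class and method names (LightGbmTokenSketch, ParseToken, ParseLine) are hypothetical, and it assumes the tokens are exactly inf, -inf, or nan apart from case and surrounding whitespace.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, for illustration only; not the merged implementation.
public static class LightGbmTokenSketch
{
    public static double ParseToken(string token)
    {
        string trimmed = token.Trim();

        // Handle the special tokens explicitly, ignoring case.
        if (trimmed.Equals("inf", StringComparison.OrdinalIgnoreCase))
            return double.PositiveInfinity;
        if (trimmed.Equals("-inf", StringComparison.OrdinalIgnoreCase))
            return double.NegativeInfinity;
        if (trimmed.Equals("nan", StringComparison.OrdinalIgnoreCase))
            return double.NaN;

        // Everything else goes to double.Parse, which throws
        // FormatException when the token is not a valid number.
        return double.Parse(trimmed, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    // Splits a delimited line and parses every token.
    public static double[] ParseLine(string line, char delimiter)
    {
        return line.Split(delimiter).Select(s => ParseToken(s)).ToArray();
    }
}
```

One consequence of this shape is that an empty token (for example, from two adjacent delimiters) also ends up in double.Parse and throws, so the empty-string question raised earlier in the thread would need a separate decision.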

Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

🕐

@Ivanidzo4ka Ivanidzo4ka dismissed their stale review December 20, 2018 22:57

revoking review

Ivanidzo4ka previously approved these changes Jan 2, 2019
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

:shipit:

@Ivanidzo4ka Ivanidzo4ka dismissed their stale review January 2, 2019 21:04

revoking review

Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

:shipit:

@wschin wschin merged commit c00911c into dotnet:master Jan 2, 2019
@wschin wschin deleted the fix-lgbm-nan branch January 2, 2019 21:13
{
var trimmed = token.Trim();

if (trimmed.Equals("inf", StringComparison.OrdinalIgnoreCase))

Does this really work? The strings output by LightGBM are +nan.0, +inf.0, and -inf.0. How are you handling the trailing .0 bit?
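
The thread does not answer this question. If the tokens really do look like +nan.0, +inf.0, and -inf.0 as described above (not verified here), one hypothetical way to accept both the plain and the decorated spellings is sketched below. The class and method names (DecoratedTokenSketch, ParseDecorated) are invented for this example and are not part of the merged change.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only; the token shapes are an
// assumption taken from the comment above, not from LightGBM documentation.
public static class DecoratedTokenSketch
{
    public static double ParseDecorated(string token)
    {
        string trimmed = token.Trim();

        // Peel off an optional leading sign before comparing the body.
        bool negative = trimmed.StartsWith("-", StringComparison.Ordinal);
        bool signed = negative || trimmed.StartsWith("+", StringComparison.Ordinal);
        string body = signed ? trimmed.Substring(1) : trimmed;

        // Accept the special tokens with or without the trailing ".0".
        if (body.Equals("inf", StringComparison.OrdinalIgnoreCase) ||
            body.Equals("inf.0", StringComparison.OrdinalIgnoreCase))
            return negative ? double.NegativeInfinity : double.PositiveInfinity;

        if (body.Equals("nan", StringComparison.OrdinalIgnoreCase) ||
            body.Equals("nan.0", StringComparison.OrdinalIgnoreCase))
            return double.NaN;

        // Ordinary numbers keep their sign and go through double.Parse,
        // which throws FormatException on malformed input.
        return double.Parse(trimmed, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

Whether this extra handling is needed depends on what the model files actually contain, so checking a real LightGBM model dump would be the first step.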

@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022

5 participants