-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Loads LightGBM inf/nan properly #1934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| ).ToArray(); | ||
|
|
||
| return str.Split(delimiter) | ||
| .Select(s => double.TryParse(s.Replace("inf", "∞"), out double rslt) ? rslt : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.Replace and .Contains are case sensitive - should these be insensitive?
I know the old code did not do this - but do we need to check for empty string? Or is it assumed that check has been done before this call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Select could filter out empty cases, I believe. Those strings are LightGBM format, so let's leave them as they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this replacing inf with ∞ rather than with the culture-insensitive string (which is Infinity)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
∞ will only parse successfully for cultures where NumberFormatInfo.PositiveInfinitySymbol == "∞"; the latter is the invariant case and is stable/won't change. You should probably be doing: double.TryParse(value, out double result, CultureInfo.InvariantCulture) for anything that you aren't prepared to support culture specific inputs on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for this particular case, it is probably better to just have the following (or something very similar, depending on the exact inputs expected):
string sTrim = s.Trim();
if (sTrim.Equals("inf", StringComparison.OrdinalIgnoreCase))
{
return double.PositiveInfinity;
}
else if (sTrim.Equals("-inf", StringComparison.OrdinalIgnoreCase))
{
return double.NegativeInfinity;
}double.Parse and double.TryParse require that the entire string match, so doing a string.Replace is relatively pointless and just incurs additional allocations/overhead
singlis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor feedback...
|
|
||
| return str.Split(delimiter) | ||
| .Select(s => double.TryParse(s.Replace("inf", "∞"), out double rslt) ? rslt : | ||
| (s.Contains("nan") ? double.NaN : throw new Exception($"Cannot parse as double: {s}"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw new Exception($"Cannot parse as double: {s}")) [](start = 70, length = 52)
please, never throw unmarked exception in your code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably either skip it completely or just return default value for double as we had before.
In reply to: 243433489 [](ancestors = 243433489)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: do we want to fully mimic LightGBM behavior? If yes, we need this change. No, we can just close this PR.
In reply to: 243433959 [](ancestors = 243433959,243433489)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not against your PR, I'm against throwing unmarked exception.
Sorry for my inability to express my thoughts in proper understandable way.
I would probably either skip it completely or just return default value for double as we had before
I don't mean your whole new code, I mean case where we can't parse double.
We have two options for that case:
- Throw MARKED exception like
throw env.Exception(message) - Use default value for double.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is expected that this throws, you should probably explicitly check the strings for inf/-inf/nan (#1934 (comment)) and fallback to just calling double.Parse (which will throw the appropriate format exception)
Ivanidzo4ka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
Ivanidzo4ka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
Ivanidzo4ka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
| { | ||
| var trimmed = token.Trim(); | ||
|
|
||
| if (trimmed.Equals("inf", StringComparison.OrdinalIgnoreCase)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really work? The strings output by LightGBM are +nan.0, +inf.0, and -inf.0. How are you handling the trailing .0 bit?
Follow a suggestion mentioned in #1424 to load inf/nan from trained LightGBM model.