Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expression estimator/transformer #4548

Merged
merged 9 commits into from
Dec 26, 2019
Merged

Expression estimator/transformer #4548

merged 9 commits into from
Dec 26, 2019

Conversation

yaeldekel
Copy link

Fixes #4015 .

@yaeldekel yaeldekel requested a review from a team as a code owner December 8, 2019 12:34
@codecov
Copy link

codecov bot commented Dec 8, 2019

Codecov Report

Merging #4548 into master will increase coverage by 0.35%.
The diff coverage is 83.08%.

@@            Coverage Diff             @@
##           master    #4548      +/-   ##
==========================================
+ Coverage   75.12%   75.48%   +0.35%     
==========================================
  Files         913      934      +21     
  Lines      160855   167656    +6801     
  Branches    17303    18129     +826     
==========================================
+ Hits       120848   126550    +5702     
- Misses      35165    36067     +902     
- Partials     4842     5039     +197
Flag Coverage Δ
#Debug 75.48% <83.08%> (+0.35%) ⬆️
#production 71.12% <82.55%> (+0.61%) ⬆️
#test 90.3% <89.35%> (ø) ⬆️
Impacted Files Coverage Δ
...L.Tests/Transformers/ExpressionTransformerTests.cs 100% <ø> (ø)
...icrosoft.ML.TestFramework/DataPipe/TestDataPipe.cs 91.97% <100%> (+0.7%) ⬆️
src/Microsoft.ML.Transforms/Expression/Error.cs 100% <100%> (ø)
src/Microsoft.ML.Transforms/ExpressionCatalog.cs 100% <100%> (ø)
src/Microsoft.ML.Transforms/Expression/Printer.cs 32.03% <32.03%> (ø)
src/Microsoft.ML.Transforms/Expression/TokKind.cs 32.25% <32.25%> (ø)
src/Microsoft.ML.Transforms/Expression/Exec.cs 57.14% <57.14%> (ø)
...Microsoft.ML.Transforms/Expression/LexCharUtils.cs 58.87% <58.87%> (ø)
src/Microsoft.ML.Transforms/Expression/Lexer.cs 61.58% <61.58%> (ø)
src/Microsoft.ML.Transforms/Expression/Tokens.cs 68.49% <68.49%> (ø)
... and 40 more


// A pipeline that applies various expressions to the input columns.
var pipeline = mlContext.Transforms.Expression("Expr1", "(x,y)=>log(y)+x", "Float", "FloatVector")
.Append(mlContext.Transforms.Expression("Expr2", "(b,s,i)=>b ? len(s) : i", "Boolean", "StringVector", "Int"))
Copy link
Contributor

@justinormont justinormont Dec 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When first reading this I thought (incorrectly) we had to note the data types for inputs/outputs. Can we use non-datatypes (eg. HasSiblings) or longer names (eg. BooleanExampleColumn)?

Suggested change
.Append(mlContext.Transforms.Expression("Expr2", "(b,s,i)=>b ? len(s) : i", "Boolean", "StringVector", "Int"))
.Append(mlContext.Transforms.Expression("Expr2", "(b,s,i)=>b ? len(s) : i", "HasSiblings", "PreferredGreetings", "Age"))
``` #Resolved

Copy link
Contributor

@justinormont justinormont Dec 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing
#resolved #Resolved

if (a == 0)
{
if (b == 0)
return 1;
Copy link
Contributor

@justinormont justinormont Dec 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems inline with other implementations.
Had to look up: https://en.wikipedia.org/wiki/Zero_to_the_power_of_zero#IEEE_floating-point_standard #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is documented in the md file (line 57). Do you think it needs to be made clearer?


In reply to: 360956933 [](ancestors = 360956933)

Copy link
Contributor

@justinormont justinormont Dec 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, looks good to me.
Mainly was noting so others don't have to look up the normal output for 0^0. #Resolved

}
catch (OverflowException)
{
throw Contracts.Except("Overflow");
Copy link
Contributor

@justinormont justinormont Dec 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we be better with saturation arithmetic instead of throwing on overflow? Might be more of a PM-ish question of: is it better for a ML package to throw on semi-bad input or should we just do our best? I'm worried that this will throw in production when a model receives an unexpectedly large input (ML data is dirty).

Suggested change
throw Contracts.Except("Overflow");
res = (neg ? Int64.MinValue : Int64.MaxValue);

// If someone is peeking at ich, they should have peeked everything up to ich.
Contracts.Assert(0 < dich && dich <= _ichLim - _ichNext + 1);

int ich = dich + _ichNext - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are our variables names in German?

  • "ich" => "I"
  • "dich" => "you"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to fix. Just an interesting note.

From,
All,
Where,
Convolve,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the Net# tokens being kept for internal compatibility? Otherwise we could remove the non-used tokens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert.Equal(7, transformed.Schema["Expr6"].Type.GetValueCount());
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the riveting 406 page PR! Truly epic.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static R4 Exp(R4 a)
{
return (R4)Math.Exp(a);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How difficult would it be to output an ONNX graph instead of (or in addition to) building up .NET?

Background:
The AutoML team is looking to support basic math functionality like: LN(x), x-y, and EXP(x). They could use the expression transform, though they need ONNX support. I'm hoping to avoid creating three new transforms for these operations by using the expression transform. This will avoid an endlessly growling list of tiny, single function, math transforms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue tracked: #4615


## Operators

The operators of the expression language are listed in the following table, in precendence order. Unless otherwise noted,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention how the expression handles vector columns. Hopefully we still operate on them :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#Resolved


| **​Name** | ​ **Meaning** | ​ **Comments** |
| --- | --- | --- |
| ​bool | ​convert to BL | The operand must be text or boolean. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BL [](start = 22, length = 2)

we don't have these types in our public API I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.
/// This column's data type will be the same as that of the input column.</param>
/// <param name="expression">The expression to apply to <paramref name="inputColumnNames"/> to create the column <paramref name="outputColumnName"/>.</param>
/// <param name="inputColumnNames">The names of the input columns.</param>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// The names of the input columns. [](start = 7, length = 75)

please add link to the sample

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@codemzs codemzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codemzs codemzs merged commit 6ae3a3f into dotnet:master Dec 26, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an expression transformer
4 participants