Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TimeSeriesImputer featurizer added #4623

Merged
merged 4 commits into from Jan 14, 2020
Merged

Conversation

@michaelgsharp
Copy link
Member

michaelgsharp commented Jan 3, 2020

This change adds in the TimeSeriesImputer into the new TimeSeriesImputer project. It is the final of a series of PR's that will go in. The TimeSeriesImputer is implemented in native code, so this is mostly just a wrapper around that with the appropriate entrypoints for NimbusML as well.

The TimeSeriesImputer imputes rows and columns based on the time series given. This is the first transformer that imputes rows in ML.NET.

@michaelgsharp michaelgsharp requested a review from dotnet/mlnet-core as a code owner Jan 3, 2020
@michaelgsharp michaelgsharp self-assigned this Jan 3, 2020

// This transformer adds columns
[Argument(ArgumentType.MultipleUnique, HelpText = "Columns to filter", Name = "FilterColumns", ShortName = "filters", SortOrder = 2)]
public string[] FilterColumns;

This comment has been minimized.

Copy link
@justinormont

justinormont Jan 3, 2020

Member

What's the use case for filtering? Are more columns produced besides IsImputed? Does the filtering operate only on the newly produced columns, or can I remove any pre-existing column?

This comment has been minimized.

Copy link
@michaelgsharp

michaelgsharp Jan 6, 2020

Author Member

If a column is filtered, whenever a row is imputed the default value for that row is used to fill in the value. Using the FilterMode, you can either use this list as an include or exclude list.

This comment has been minimized.

Copy link
@justinormont

justinormont Jan 6, 2020

Member

If a column is filtered, whenever a row is imputed the default value for that row is used to fill in the value.

I'm not sure I'm parsing this correctly, and I think I'm missing something obvious. If a column is filtered, why do we fill in a value and then drop the column? Are there other columns produced besides IsImputed? What other columns would a user use the filtering to keep/remove besides IsImpted?

This comment has been minimized.

Copy link
@michaelgsharp

michaelgsharp Jan 8, 2020

Author Member

We aren't dropping any columns in this process. If a column is excluded from being imputed, the default value is provided when a new row is generated. The biggest use case for this is dealing with columns that aren't supported for imputation. You can exclude that column from being imputed, but still have the default value filled in automatically.

This comment has been minimized.

Copy link
@justinormont

justinormont Jan 8, 2020

Member

We may want to rename it from FilterColumns to perhaps ColumnsToImpute or ColumnsToFillDefault.

The naming suggestion is to disambiguate from the existing concept in ML.NET of filtering, which removes rows of data from the IDataView (info).

Would another way of handling the non-supported column types be to automatically fill w/ default when needed? Seems we could auto-handle this use case. Assuming there aren't other use cases, this would remove the need of the FilterColumns field.

This comment has been minimized.

Copy link
@michaelgsharp

michaelgsharp Jan 8, 2020

Author Member

I think renaming is a good idea. Its currently setup to where you can either list the columns to fill with default or columns to impute based on the FilterMode paramenter. So the name needs to somehow show that it can do both based on that parameter.

We could automatically fill in default values when needed. I didn't do that as this could potentially mask errors if you wanted to convert something prior and forgot. If you don't think thats a big deal though and think it would be better then I am not opposed to switching to that.

This comment has been minimized.

Copy link
@justinormont

justinormont Jan 9, 2020

Member

Auto-filling w/ default for unsupported types is likely good from the data science side. Then we note in the docs "TimeSeriesImputer supports these data types (...) and will fill with the default value for columns of unsupported data types".

@EricWrightAtWork -- since you know more about the needs of time-series forecasting, any thoughts? Is auto-filling unsupported datatypes with their default values reasonable?

@codecov

This comment has been minimized.

Copy link

codecov bot commented Jan 9, 2020

Codecov Report

❗️ No coverage uploaded for pull request base (master@712c3ec). Click here to learn what that means.
The diff coverage is 84.91%.

@@            Coverage Diff            @@
##             master    #4623   +/-   ##
=========================================
  Coverage          ?   75.84%           
=========================================
  Files             ?      947           
  Lines             ?   172170           
  Branches          ?    18576           
=========================================
  Hits              ?   130584           
  Misses            ?    36413           
  Partials          ?     5173
Flag Coverage Δ
#Debug 75.84% <84.91%> (?)
#production 71.43% <79.35%> (?)
#test 90.7% <100%> (?)
Impacted Files Coverage Δ
...ft.ML.Tests/Transformers/TimeSeriesImputerTests.cs 100% <100%> (ø)
...rosoft.ML.Featurizers/TimeSeriesImputerDataView.cs 74.03% <74.03%> (ø)
src/Microsoft.ML.Featurizers/TimeSeriesImputer.cs 87.08% <87.08%> (ø)
Copy link
Member

harishsk left a comment

Please add the additional tests before merging.

@michaelgsharp michaelgsharp force-pushed the michaelgsharp:time-series-imputer branch from c1eac1e to 7e4eb9c Jan 13, 2020
@codemzs codemzs merged commit c4e4263 into dotnet:master Jan 14, 2020
17 of 19 checks passed
17 of 19 checks passed
MachineLearning-CI Build #20200113.9 had test failures
Details
MachineLearning-CI (Windows_x64_NetCoreApp21 Release_Build) Windows_x64_NetCoreApp21 Release_Build failed
Details
MachineLearning-CI (Centos_x64_NetCoreApp30 Debug_Build) Centos_x64_NetCoreApp30 Debug_Build succeeded
Details
MachineLearning-CI (Centos_x64_NetCoreApp30 Release_Build) Centos_x64_NetCoreApp30 Release_Build succeeded
Details
MachineLearning-CI (MacOS_x64_NetCoreApp21 Debug_Build) MacOS_x64_NetCoreApp21 Debug_Build succeeded
Details
MachineLearning-CI (MacOS_x64_NetCoreApp21 Release_Build) MacOS_x64_NetCoreApp21 Release_Build succeeded
Details
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Debug_Build) Ubuntu_x64_NetCoreApp21 Debug_Build succeeded
Details
MachineLearning-CI (Ubuntu_x64_NetCoreApp21 Release_Build) Ubuntu_x64_NetCoreApp21 Release_Build succeeded
Details
MachineLearning-CI (Windows_x64_NetCoreApp21 Debug_Build) Windows_x64_NetCoreApp21 Debug_Build succeeded
Details
MachineLearning-CI (Windows_x64_NetCoreApp30 Debug_Build) Windows_x64_NetCoreApp30 Debug_Build succeeded
Details
MachineLearning-CI (Windows_x64_NetCoreApp30 Release_Build) Windows_x64_NetCoreApp30 Release_Build succeeded
Details
MachineLearning-CI (Windows_x64_NetFx461 Debug_Build) Windows_x64_NetFx461 Debug_Build succeeded
Details
MachineLearning-CI (Windows_x64_NetFx461 Release_Build) Windows_x64_NetFx461 Release_Build succeeded
Details
MachineLearning-CI (Windows_x86_NetCoreApp21 Debug_Build) Windows_x86_NetCoreApp21 Debug_Build succeeded
Details
MachineLearning-CI (Windows_x86_NetCoreApp21 Release_Build) Windows_x86_NetCoreApp21 Release_Build succeeded
Details
MachineLearning-CodeCoverage Build #20200113.9 had test failures
Details
MachineLearning-CodeCoverage (Windows_x64 Build_Debug) Windows_x64 Build_Debug succeeded
Details
WIP Ready for review
Details
license/cla All CLA requirements met.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.