Support for loading data from SQL server #107

nemec · 2018-05-10T04:03:35Z

Somewhat related to #96 but more specific, it would be nice to have a loader to stream data out of a SQL Server table or view and into this framework for training.

helloguo · 2018-05-10T14:28:37Z

It seems SQL server is on the road map https://github.com/dotnet/machinelearning/blob/master/ROADMAP.md

Additional Data source support
Data from SQL Databases, such as SQL Server
Data located on the cloud

forki · 2018-05-17T09:54:42Z

you want to bond the ML lib to a specific server? really?

veikkoeeva · 2018-05-17T10:30:22Z

@forki IDataView might work with arbitrary types, looking at #96 (comment). I'm not sure yet how these cursors work, but naturally having a provider for every library and source type is untenable. There might be support for ingesting data from sources like IEnumerable<T>, which would support any source quite easily. At least coming?

I'm not sure if this is the same for everyone, but for me it's a bit puzzling to see there are specific loaders for some sources, like files with CSV data, which gives the impression the idea is to bond data sources (as in providers and format of the data) quite closely to the library. Then there is discussion about these specific loaders and in-memory loaders. This suggests the idea indeed is to couple source loading and format handling together and push to a specifc pipeline. For instance, it looks to me the CSV loader input/output doesn't work in all locales and it's not easy to work around that. I also might find more performant and robust ways to load the data in any case.

nemec · 2018-05-17T21:39:27Z

@forki I am still having trouble understanding how the internals of this library works, but I don't believe it is controversial to suggest that there are more efficient/performant ways of training on data from a database than plugging in EF Core, transforming to IEnumerable (if the feature lands), then transforming again into an IDataView - or worse, exporting GBs/TBs of data to CSV just to load them back into this library.

If this fits better as an independent provider plugged in via NuGet, so be it. It would be nice to have at least one example of a SQL loader that OSS devs could use as a template to develop their own providers for SQLite, Mongo, etc. I would not expect this library to bundle every provider under the sun or carry a lot of potentially unused dependencies (e.g. EF Core).

forki · 2018-05-18T04:35:59Z

I did not say EF ;-) - I heard some of the authors of this package worked on EF before, but please please don't bind it to EF, or sql server or any other specific tech. Or course you want to get data into the learning system without going over to csv. But the data is usually already flattened anyway, so no need for a complex OR mapping tool - especially since the target data structure is Immutable tensor

Zruty0 · 2018-10-10T23:49:37Z

Just bumping this.

Since SQL data source is such a common ask, we will implement a version of the SQL-backed data view (and a SQL-backed IDataReader). From what I understand, we'll probably use the primitives in System.Data for this, and, therefore, it'll be available as a separate NuGet (so that ML.NET core doesn't depend on System.Data.

Then again, nothing prevents anyone from implementing IDataView with any backing storage: NoSQL databases, magnetic tape readers or anything :)

veikkoeeva · 2018-10-11T04:53:04Z

Referencing #1130 as is abouth the same issue. @Zruty0, just a note that there are inconsistencies between what various ADO.NET providers support from System.Data. asynchrony etc. and you might want to consider some more advanced things like streaming (not always properly supported). The linked issue has a link to one implementation that solves most of the problems if you want to see concrete code (other libraries have the same).

Ivanidzo4ka · 2019-02-06T22:59:38Z

@singlis I think you looked into loading data from Sql Server, what was result of your investigation?

singlis · 2019-03-29T00:13:03Z

Hello, sorry for the late reply.

We currently integrate with IEnumerable and there is no binding to any specific database technology.

machinelearning/src/Microsoft.ML.Data/DataLoadSave/DataOperationsCatalog.cs

Line 77 in 617f0f6

    
           public IDataView LoadFromEnumerable<TRow>(IEnumerable<TRow> data, SchemaDefinition schemaDefinition = null)

There is work in progress sample in the machine-learning samples. Note the sample does use EF:
dotnet/machinelearning-samples#295

I have not done any investigation yet into how this works with large datasets. This is something that we want to get an idea for what EF does -- but the end result will be fine tuning EF because at the end of the day ML.NET is working from an IEnumerable. How the data is pulled from the database and cached is a behavior for Entity Framework.

singlis · 2019-03-29T00:14:15Z

I hope this helps -- since SQL support is on the road map, I am closing this issue. If there is more to discuss, please re-open.

jwood803 · 2019-03-29T00:17:02Z

Hopefully this isn't in bad taste, but I have a written a small post on getting data in Azure SQL and then using that to create a model. Just in case an example without EF is wanted.

Basically, what @singlis mentioned, that the LoadFromEnumerable is the magic that is needed for SQL data.

justinormont added the enhancement New feature or request label May 10, 2018

singlis mentioned this issue Nov 28, 2018

Database example dotnet/machinelearning-samples#143

Closed

singlis closed this as completed Mar 29, 2019

ghost locked as resolved and limited conversation to collaborators Mar 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for loading data from SQL server #107

Support for loading data from SQL server #107

nemec commented May 10, 2018

helloguo commented May 10, 2018

forki commented May 17, 2018

veikkoeeva commented May 17, 2018 •

edited

Loading

nemec commented May 17, 2018

forki commented May 18, 2018

Zruty0 commented Oct 10, 2018

veikkoeeva commented Oct 11, 2018

Ivanidzo4ka commented Feb 6, 2019

singlis commented Mar 29, 2019

singlis commented Mar 29, 2019 •

edited

Loading

jwood803 commented Mar 29, 2019

Support for loading data from SQL server #107

Support for loading data from SQL server #107

Comments

nemec commented May 10, 2018

helloguo commented May 10, 2018

forki commented May 17, 2018

veikkoeeva commented May 17, 2018 • edited Loading

nemec commented May 17, 2018

forki commented May 18, 2018

Zruty0 commented Oct 10, 2018

veikkoeeva commented Oct 11, 2018

Ivanidzo4ka commented Feb 6, 2019

singlis commented Mar 29, 2019

singlis commented Mar 29, 2019 • edited Loading

jwood803 commented Mar 29, 2019

veikkoeeva commented May 17, 2018 •

edited

Loading

singlis commented Mar 29, 2019 •

edited

Loading