Support for loading data from SQL server #107
Comments
It seems SQL server is on the road map https://github.com/dotnet/machinelearning/blob/master/ROADMAP.md
You want to bind the ML lib to a specific server? Really?
@forki I'm not sure if this is the same for everyone, but for me it's a bit puzzling to see there are specific loaders for some sources, like files with CSV data, which gives the impression that the idea is to bind data sources (as in providers and format of the data) quite closely to the library. Then there is discussion about these specific loaders and in-memory loaders. This suggests the idea is indeed to couple source loading and format handling together and push them into a specific pipeline. For instance, it looks to me like the CSV loader input/output doesn't work in all locales, and it's not easy to work around that. I might also find more performant and robust ways to load the data in any case.
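To make the locale concern above concrete, here is a minimal sketch in Python (not ML.NET code). It assumes the problem in question is the common one where a locale writes decimals with a comma and therefore separates fields with a semicolon; the comment does not say which locale issue was hit, so treat that as an assumption. The sample data and the `parse_decimal_comma` helper are hypothetical.

```python
import csv
import io

# Hypothetical CSV as written under a decimal-comma locale:
# fields are separated by ';' and decimals use ','.
raw = "feature;label\n1,5;0\n2,25;1\n"

def parse_decimal_comma(text):
    """Parse a number written with a decimal comma.

    Simplified on purpose: it does not handle thousands separators.
    """
    return float(text.replace(",", "."))

# Locale-aware reading: explicit delimiter plus explicit number parsing.
reader = csv.DictReader(io.StringIO(raw), delimiter=";")
parsed = [(parse_decimal_comma(r["feature"]), int(r["label"])) for r in reader]

# What a loader that assumes ',' as the field separator sees instead:
# the decimal comma splits the first value in two.
naive_split = raw.splitlines()[1].split(",")
```

The point of the sketch is that the delimiter and the number format have to be configurable together; a loader that hardcodes either one misreads such files.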
@forki I am still having trouble understanding how the internals of this library work, but I don't believe it is controversial to suggest that there are more efficient/performant ways of training on data from a database than plugging in EF Core, transforming to If this fits better as an independent provider plugged in via NuGet, so be it. It would be nice to have at least one example of a SQL loader that OSS devs could use as a template to develop their own providers for SQLite, Mongo, etc. I would not expect this library to bundle every provider under the sun or carry a lot of potentially unused dependencies (e.g. EF Core).
I did not say EF ;-) - I heard some of the authors of this package worked on EF before, but please, please don't bind it to EF, or SQL Server, or any other specific tech. Of course you want to get data into the learning system without going through CSV. But the data is usually already flattened anyway, so there is no need for a complex O/R mapping tool, especially since the target data structure is Immutable tensor
Just bumping this. Since SQL data source is such a common ask, we will implement a version of the SQL-backed data view (and a SQL-backed Then again, nothing prevents anyone from implementing
Referencing #1130, as it is about the same issue. @Zruty0, just a note that there are inconsistencies between what various ADO.NET providers support from
@singlis I think you looked into loading data from SQL Server. What was the result of your investigation?
Hello, sorry for the late reply. We currently integrate with IEnumerable and there is no binding to any specific database technology.
There is a work-in-progress sample in the machine-learning samples. Note the sample does use EF: I have not yet investigated how this works with large datasets. We want to get an idea of what EF does here, but the end result will be fine-tuning EF, because at the end of the day ML.NET is working from an IEnumerable. How the data is pulled from the database and cached is Entity Framework behavior.
I hope this helps -- since SQL support is on the road map, I am closing this issue. If there is more to discuss, please re-open.
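The idea in the comment above, that the trainer only consumes a lazy sequence and stays unaware of the database behind it, can be sketched in a language-neutral way. The Python below is not ML.NET or EF code: a generator stands in for `IEnumerable`, the standard-library `sqlite3` module stands in for SQL Server, and `stream_rows`, the `samples` table, and the batch size are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Iterator, Tuple

def stream_rows(conn: sqlite3.Connection, query: str,
                batch_size: int = 2) -> Iterator[Tuple]:
    """Yield rows lazily so the whole table is never held in memory."""
    cursor = conn.cursor()
    cursor.execute(query)
    try:
        while True:
            # Fetch a bounded batch from the cursor at a time.
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            yield from batch
    finally:
        cursor.close()

# In-memory database as a stand-in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (feature REAL, label INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [(0.5, 0), (1.5, 1), (2.5, 1)])
conn.commit()

rows = stream_rows(conn, "SELECT feature, label FROM samples ORDER BY feature")
first = next(rows)       # only the first batch has been fetched so far
remaining = list(rows)   # a consumer drains the rest on demand
```

Because the consumer sees only an iterator, swapping the data source means swapping the connection and query; the training side does not change. How rows are buffered or cached remains a property of the data-access layer, which matches the point made about Entity Framework.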
Hopefully this isn't in bad taste, but I have written a small post on getting data in Azure SQL and then using that to create a model. Just in case an example without EF is wanted. Basically, what @singlis mentioned, that the
Somewhat related to #96 but more specific, it would be nice to have a loader to stream data out of a SQL Server table or view and into this framework for training.