
Add regression example #19

Closed
aslotte opened this issue Jun 2, 2020 · 14 comments
Labels: example (Example solution), up for grabs

aslotte (Owner) commented Jun 2, 2020

We should add an example solution on how this SDK can be used. It should probably go hand-in-hand with a page for documentation as well.

aslotte changed the title from "Add example solution" to "Add an example solution showcasing how the tool can be used" on Jun 3, 2020
@sammysemantics

Do you want the example to reference the NuGet package, or could it reference the project's libraries directly?

Are you using the ReviewPredictor as the Azure example?

aslotte (Owner, Author) commented Jun 3, 2020

I think the best example would be if it referenced the NuGet package.

We could certainly use the ReviewPredictor as an Azure example once we have an alpha version published to nuget.org, but if you are up for creating another example, feel free to go ahead :)

@sammysemantics

If we were to use NuGet package:

Do we need to create a separate NuGet package for each additional storage-related project (e.g. MLOps.NET.Azure, MLOps.NET.SQLite, etc.)?

Do we need to recompile the examples whenever we make changes to the NuGet project? If so, that may call for a GitHub Action, for example.

aslotte (Owner, Author) commented Jun 4, 2020

I believe we will need a separate package for each new storage-related project.

Great question. I don't think the example repo needs to stay in constant sync with our NuGet package updates; since we may have breaking changes, we probably want to update the examples manually.

With that said, it may make sense to hold off on this issue for another week or so, until the SDK API is a bit more solidified and we have uploaded an alpha version to nuget.org.

What we could do in the meantime is update the README file to include an example.

@sammysemantics

I have been working through a possible example. I am doing it with SQLite because I don't have an Azure account yet. I have also been viewing your recent streams (super useful) to catch up with everything, and I have been reading through the contributing guidelines. I am working through forking and pulling new updates from the source MLOps.NET repository into my fork. It is not as straightforward in GitHub as I expected, so I am planning to have a branch with my example on my fork soon, with all the new updates from this project.

As for my example, the sample dataset is borrowed from the ML.NET samples' regression taxi-fare example. I am finding that the feature-engineering pipeline is important to the final model's metrics, and I want to be able to see which data columns and corresponding transforms were used in a training run, so I can compare the runs in each experiment and see which pipeline produced the best accuracy, for example. Is there a way to serialize and save the ITransformer/pipeline used in training to storage along with a run on a specific experiment?
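(For what it's worth, ML.NET can already serialize a fitted `ITransformer` pipeline to disk via `MLContext.Model.Save`, so a run could in principle store the whole pipeline as an artifact. A minimal sketch assuming a taxi-fare-style schema; the file names, columns, and per-run naming scheme below are illustrative, not MLOps.NET API:)

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 1);

// Load the taxi-fare training data (path and schema are illustrative).
IDataView data = mlContext.Data.LoadFromTextFile<TaxiTrip>(
    "taxi-fare-train.csv", hasHeader: true, separatorChar: ',');

// A simple feature-engineering + trainer pipeline.
var pipeline = mlContext.Transforms
    .Concatenate("Features", nameof(TaxiTrip.PassengerCount), nameof(TaxiTrip.TripDistance))
    .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: nameof(TaxiTrip.FareAmount)));

ITransformer model = pipeline.Fit(data);

// Serialize the entire fitted pipeline (transforms + trainer) to a zip archive,
// named after the run so each experiment run keeps its own artifact.
var runId = Guid.NewGuid();
mlContext.Model.Save(model, data.Schema, $"model-{runId}.zip");

// Later, reload it to inspect or score against new data.
ITransformer reloaded = mlContext.Model.Load($"model-{runId}.zip", out DataViewSchema schema);

public class TaxiTrip
{
    [LoadColumn(0)] public float PassengerCount { get; set; }
    [LoadColumn(1)] public float TripDistance { get; set; }
    [LoadColumn(2)] public float FareAmount { get; set; }
}
```

The saved zip contains both the data transforms and the trained model, so reloading it recovers the full pipeline used in that run.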

Recent postings of issues #49 and #48 brought me to this question. I am also seeking confirmation that I am on the right track.

Also, did we report or resolve the bug that was suggested in the SQLite library?

dcostea (Collaborator) commented Jun 6, 2020

> Also, did we report or resolve the bug that was suggested in the SQLite library?

I'm going to fix it today.

aslotte (Owner, Author) commented Jun 6, 2020

@sammysemantics great question and insight. I actually don't think we need to store the ITransformers; what I believe we would do in a real-world setup is have a different feature branch for each variation of the model training. I do see your point, though, that it may not be practical to create a new feature branch for every small tweak we want to run, so serializing the entire pipeline may be something we want to add as a feature in the future, but it may be overkill right now :)

sammysemantics commented Jun 7, 2020

https://github.com/sammysemantics/MLOps.NET/tree/Issue19_AddingSQLiteExampleProject

Here is my branch to start an example for MLOps.NET and MLOps.NET.SQLite. I can't build yet because I believe I am hitting the bug in #51, where I get a System.InvalidOperationException when creating an experiment near the beginning. The error also says: "The storage provider has not been properly set up. Please call Rutix, Daniel and usertyuu should you have any questions".

This will be my first pull request. I may not have committed as often as I should have during the process, per what is suggested in the contributing.md page, but I am learning. There is much more work needed, but any advice is welcome.

I created another folder called example at the top level to store the projects, and I created a solution folder to separate the examples from the actual source code in Visual Studio. I don't know if that is the best approach; just let me know.

Much of this example is inspired by the ReviewPredictor Azure example from @aslotte and the code that is automatically published by the ML.NET Model Builder.

I am also working on issue #25, on logging the evaluation metrics. I am sure there is much room for improvement; I just can't test it yet because of the bug.

aslotte (Owner, Author) commented Jun 7, 2020

@sammysemantics first, thank you for looking into this and for your willingness to contribute to the repo! I think the issue you are running into is that we don't have support for the model repository for SQLite yet, but that is being worked on as we speak by @dcostea in #55. As soon as we merge it, you should be able to rebase from master and things should work better.

Regarding the structure, I think naming it Examples is great; that way it would be easy for everyone to find. Regarding issue #25, I may have forgotten to close that. I added a generic method yesterday, `LogMetricsAsync(Guid runId, T metrics)`, that logs all metrics of type double automatically. I haven't had a chance to look at the regression metrics to see if this would work for them too. Any input on that would be appreciated.
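(For reference on the regression side: ML.NET's `RegressionMetrics` exposes `MeanAbsoluteError`, `MeanSquaredError`, `RootMeanSquaredError`, `LossFunction`, and `RSquared`, all of type `double`, so a reflection-based generic logger should pick them up. A rough sketch of the idea; this illustrates the technique, not the actual MLOps.NET implementation:)

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class MetricsLogger
{
    // Log every public double-typed property on a metrics object, e.g.
    // ML.NET's RegressionMetrics or BinaryClassificationMetrics.
    public static void LogMetrics<T>(Guid runId, T metrics)
    {
        var doubleProperties = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.PropertyType == typeof(double));

        foreach (var property in doubleProperties)
        {
            var value = (double)property.GetValue(metrics);
            // In the real SDK this would be persisted to the configured storage.
            Console.WriteLine($"run {runId}: {property.Name} = {value}");
        }
    }
}
```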

Let me know if you need any help with creating the PR or to bounce some ideas.

aslotte (Owner, Author) commented Jun 7, 2020

@sammysemantics have a look at #58 as well; it may be a good first step to associate a run with e.g. a comment or a git commit hash. I would welcome your input.

aslotte added the label example (Example solution) on Jun 11, 2020
aslotte changed the title from "Add an example solution showcasing how the tool can be used" to "Add regression example" on Jun 11, 2020
dcostea (Collaborator) commented Jun 16, 2020

@sammysemantics, I added a binary classification example to the examples project today.
I can take care of this regression task (the effort to adapt the binary classification example into a regression example is minimal) if you haven't started work on it, and of course if you don't mind.

@sammysemantics

Sure, go ahead. I have a lot of catching up to do in real life. I've been attending the streams, so I'm sure I can contribute later. Thanks!

dcostea (Collaborator) commented Jun 17, 2020

> Sure, go ahead. I have a lot of catching up to do in real life. I've been attending the streams, so I'm sure I can contribute later. Thanks!

Ok. Take care.

dcostea self-assigned this on Jun 17, 2020
dcostea added a commit to dcostea/MLOps.NET that referenced this issue Jun 17, 2020
aslotte added a commit that referenced this issue Jun 19, 2020
aslotte (Owner, Author) commented Jun 23, 2020

This issue has been resolved.
