What needs to be defined here #1
Thanks @Evizero!
I'd really prefer to keep all the abstracts in LearnBase so that someone can define a model loss without needing MLModels... It's a funny distinction, I agree. It might be best to think of MLModels purely as implementations of abstractions that exist in LearnBase. Thoughts?
I am agnostic to this, but it would result in a few more special function definitions in LearnBase.
Where are those methods used, though? Can we put the abstract types and core methods in LearnBase, and define these other methods closer to where they are used? I'm ok with adding a bunch of methods like this to LearnBase, by the way, and maybe that's the best option. The fallback methods would be defined for the abstract types, but all the concrete implementations would be in MLModels (and other future packages).
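As a sketch of the split being proposed here (all names are hypothetical, not the actual LearnBase/MLModels API): the abstract type and a fallback method live in the base package, while concrete types override that fallback downstream.

```julia
# Hypothetical sketch of the proposed split; `Loss`, `isconvex`, and
# `L2DistLoss` are illustrative names, not the real API.

# --- "LearnBase" side: abstract type plus a fallback method ---
abstract type Loss end
isconvex(::Loss) = false          # conservative fallback for any loss

# --- "MLModels" side: concrete implementation overrides the fallback ---
struct L2DistLoss <: Loss end
isconvex(::L2DistLoss) = true
```

Any package that defines a new `Loss` subtype automatically inherits the fallback and only overrides what it needs.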
well I guess something like
At the same place where
Gotcha... then I think they all belong as placeholder defs in LearnBase.
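One way to read "placeholder defs" in Julia (a sketch under the assumption that this is what is meant, with made-up names): LearnBase can declare empty generic functions, so every downstream package extends the same function object instead of creating its own.

```julia
# Empty generic functions as placeholders ("placeholder defs");
# `value` and `deriv` are hypothetical names, not the real API.
function value end
function deriv end

# A downstream package (e.g. "MLModels") then adds the concrete methods:
struct SquaredLoss end
value(::SquaredLoss, ŷ, y) = (ŷ - y)^2
deriv(::SquaredLoss, ŷ, y) = 2(ŷ - y)
```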
What do you all think about a series of functions like:
More appropriate for
Good point, actually, forgot about those. I think LearnBase should define such functions (at least their existence). (do we want to avoid a
Clashes with StatsBase are going to be thorny. Maybe we should get @andreasnoack and others to weigh in here. Is there a reason we can't have LearnBase import some of these functions from StatsBase? Do we want to be completely independent?
It's tricky. In a perfect world, we would stop using different names for the same concepts in statistics and machine learning. However, it will be time-consuming to sort out which concepts are the same. Furthermore, which of the terminologies should we choose? I'd vote for statistics terminology, since it is the older of the two and happens to be the one I'm familiar with, but machine learning is more popular right now, so more contributors will probably be familiar with machine learning terminology. Maybe a way to ask is how annoying you think it would be to depend on
We have had discussions like this many times now (and they are still worth having). I still don't have a good personal answer to this. Until now I have tried to follow the mantra of StatsBase, with pretty decent success, and I am also okay with the statistics naming conventions. However, here is what I have learned while coding: some conventions will just straight-up not work for us.

My point being that regardless of whether we choose to depend on StatsBase or not, we will surely have to break conventions established there (i.e. signature conventions for functions, etc.). The question that remains is whether that would do more harm than good. A fresh start might give us more flexibility and generate less confusion all around. That said, I am also not a strong opponent of using StatsBase.

Appendix:
@andreasnoack I wish you had more time to join us when we were discussing at the hackathon! Thanks again for organizing... it was a really great event. I think, now that we've settled on abstractions that seem to cover many different perspectives and disciplines, it's valuable to do another deep dive into StatsBase and maybe Loss? We need to really get into the weeds and pinpoint which abstractions are appropriate for everyone involved, and the specific cases which break the abstractions. My gut feeling is that, as @Evizero just said, the StatsBase abstractions were limiting for use cases outside of statistics, and that the LearnBase abstractions may be more general and may encompass the needs of the JuliaStats community. If my gut is right (i.e. it's not a good idea for LearnBase to depend on StatsBase), then I propose keeping LearnBase and StatsBase independent in the short term, and adding a package to "bind together" the abstractions, converting one abstraction into another where feasible. That might seem like more work, but it will allow us to continue to use tools made for either side. Thoughts?
My philosophy is to push forward separately and then revisit this once the packages below

Edit: Actually, could we start by having
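A toy illustration of what sharing generics between the two ecosystems would buy (using stand-in modules, since nothing below is the actual StatsBase or LearnBase code): if LearnBase imports a function from StatsBase, both ecosystems extend the very same generic function rather than clashing.

```julia
# Stand-in modules; all names are hypothetical.
module FakeStatsBase
    function fit! end              # the shared generic lives here
end

module FakeLearnBase
    import ..FakeStatsBase: fit!   # reuse, don't shadow, the generic
    struct Learner end
    fit!(::Learner) = :fitted      # new method on the *same* function
end
```

Because both modules refer to one function object, `FakeStatsBase.fit! === FakeLearnBase.fit!` holds, so no name clash arises for users who load both.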
I guess we are in consensus then. Let us move forward with LearnBase independently for now and explore, use case by use case, what it is we really need. If we go that path, I vote to avoid the names chosen by StatsBase, to prevent confusion.
I explored all loss-related packages I could find when I started on KSVM / LearnBase (now MLModels), none of which satisfied my particular need/goal. I think/hope we should be in pretty good shape with the loss-related stuff in MLModels as a starting point, in terms of both flexibility and performance.
MLDataUtils does currently depend on StatsBase and it looks like it will remain that way for a while at least
I transferred as little code to this package as I think is absolutely needed.
Let the discussion on what is missing / should be changed / should be added begin.
To start off: I chose to only define the base class `Loss` here in LearnBase and will define `ModelLoss` and `ParameterLoss` in MLModels instead. The motivation is that, as it turns out, if one programs something that falls into the `ModelLoss` / `ParameterLoss` framework, one probably needs to import MLModels anyway. For example, there are a lot of property functions there, such as `isnemitski`, that are useful or in some cases even needed to implement an algorithm properly (at least in some cases with SVMs).
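To make the role of such property functions concrete, here is a hedged sketch (the name `isnemitski` comes from the comment above, but this trait-style implementation is illustrative, not the actual MLModels code): an algorithm can demand a property from whatever loss it is handed.

```julia
# Illustrative only; `HingeLoss` and the trait methods are made up here.
abstract type Loss end
struct HingeLoss <: Loss end       # example concrete loss

isnemitski(::Loss) = false         # conservative fallback
isnemitski(::HingeLoss) = true     # this loss opts in

# An SVM-style algorithm can then guard on the property up front:
function check_loss(loss::Loss)
    isnemitski(loss) || throw(ArgumentError("requires a Nemitski loss"))
    return loss
end
```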