This repository has been archived by the owner on May 20, 2022. It is now read-only.

Request from a layperson #18

Open
sbromberger opened this issue Jan 3, 2018 · 14 comments

Comments

@sbromberger

sbromberger commented Jan 3, 2018

Hi ML gurus,

Forgive me if this is too forward, but I'd like to share an observation and make a request.

One thing that I think Julia needs is a robust, easy-to-use machine learning toolkit focused on end users who are doing data analysis but aren't necessarily ML experts. I'm one of these people. For example, I need to run classification and logistic regression on a set of data to see whether a hypothesis I have is valid. I've taken some ML and stats courses but am nowhere near an expert in these fields, and frankly, the existing Julia packages seem to be geared towards ML researchers, not users. I can't really find what I need in the multiple packages within the JuliaML organization.

As a result, I'm having to use sklearn to do my data analysis, which makes me sad as a Julia evangelist. Sklearn has everything I need in a toolkit, though: it's easy to understand and to use, offers some customization (but not so much that I'm tearing my hair out trying to understand the options), and gives me reasonably fast results that my colleagues seem to understand.

My request is this: could there be some effort made to come up with something like sklearn in native Julia? I make this request knowing full well that I can't contribute much other than ideas and feedback, but it sure would make things easier for those of us who just need to do some data analysis without knowing about or using the most cutting-edge ML algorithms, and want to do it in Julia.

@datnamer

datnamer commented Jan 3, 2018

@sbromberger
Author

@datnamer yes, I've been toying with the idea of using it, but 1) it's a wrapper, so I still need the Python infrastructure, and 2) it doesn't seem to be actively developed anymore.

@datnamer

datnamer commented Jan 3, 2018

@sbromberger Yeah. Do you use sklearn in Python or through PyCall? With the new dot overloading, the latter might be getting easier.

@sbromberger
Author

@datnamer - I'm using it in Python (which is the part that makes me sad). It's hard for me to justify using a Julia package that wraps software that my colleagues use natively. A pure-Julia implementation of the tools would be easier to rationalize.

@datnamer

datnamer commented Jan 3, 2018

I hear that.

I think an ML toolkit is what this is supposed to be the start of: https://github.com/JuliaML/Learn.jl

Lots of the pieces are already here, and hopefully a big dev boost around 1.0 will make it happen.

But I'll let the experts weigh in.

@sbromberger
Author

> Lots of the pieces are already here,

Yes - I don't mean to disparage the existing corpus of work or imply that the field is barren. I'm just not quite sure where to go to find the functionality I need, and it's a bit of a hindrance right now that there's not a single one-stop shop for common ML-like tasks.

@denizyuret

denizyuret commented Jan 3, 2018 via email

@sbromberger
Author

@denizyuret - I did look at Knet and its documentation - it's very impressive, but I couldn't find simple examples of the algorithms I use (DBSCAN and k-means for clustering, for example, along with some random forests for classification). I gather it's more focused on neural networks, and that's a bit too heavy for what I'm trying to do.

@denizyuret

denizyuret commented Jan 3, 2018 via email

@ChrisRackauckas

A one-stop shop is very important IMO. With it, there's a certain branding and trust that can be built. It's hard to trust a lot of these little learning pieces, especially since many of them are built by authors I don't know with small test sets and are rarely updated. But a metapackage's branding gives it the trust of a common governance that will test, maintain, and fix issues in a way you wouldn't expect from a random repository. It gives you something to point to as "the package in Julia which does everything ML you need"; right now it's hard to say what that would be. IMO it's important for the growth of the ecosystem that something like SciKitLearn comes up, even if it's mostly an API wrap and fancy docs over other packages.

@Evizero
Member

Evizero commented Jan 3, 2018

I think the simple truth is that this end-user desire has long been known. It's just that no one so far has been interested or able to write it.

@smldis

smldis commented Jan 3, 2018

Consider a Plots.jl-like package with all sorts of machine learning backends, wrapping not only the Python ecosystem but also R and Julia (and dynamically choosing among the backends' algorithms). Would that solve the problem of justifying a Julia package that wraps software in your environment?

@oxinabox
Member

oxinabox commented Jan 4, 2018

I've been linking this post around the internet a lot lately: http://white.ucc.asn.au/2017/12/18/7-Binary-Classifier-Libraries-in-Julia.html
so I suspect you might have seen it, @sbromberger.
It shows 7 of the binary classification libraries in Julia right now,
and it fits them to a common API: fit!, predict, with observations in the last index.
This would be of interest to you in two ways: first, it is actually what you are interested in doing, and it shows the packages for it.
More importantly, though, it shows how they all differ a bit in different ways (and that is bad).

I don't think we need a single package as monolithic as ScikitLearn.
I'd rather have clustering live in Clustering.jl,
and dimensionality reduction live in MultivariateStats.jl,
etc.,
with these packages just so happening to work as if they were made together with each other and with other packages.
And they will do so, as long as people stick to sane Julian conventions (observations in the last dimension).
At that point making a metapackage is easy, and also maybe not required (but maybe it is).
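
A rough sketch of the shared-convention idea, written in Python purely for illustration (the Julia packages discussed here use `fit!` and `predict`). Everything in it is hypothetical: the `Classifier` protocol, the two toy models, and the `accuracy` helper are assumptions made up for this example and are not taken from the linked post or from any JuliaML package. The only point it tries to show is that two unrelated models can be swapped freely once they agree on method names and on keeping observations in the last index.

```python
from __future__ import annotations

from collections import Counter
from typing import Protocol, Sequence

# Hypothetical layout for this sketch: features x observations,
# i.e. each inner sequence is one feature and each column is one observation.
Matrix = Sequence[Sequence[float]]


def columns(X: Matrix) -> list[tuple[float, ...]]:
    """Return one tuple per observation (observations are the last index)."""
    return list(zip(*X))


class Classifier(Protocol):
    """The shared convention: any object with these two methods fits in."""

    def fit(self, X: Matrix, y: Sequence[int]) -> Classifier: ...
    def predict(self, X: Matrix) -> list[int]: ...


class MajorityClass:
    """Baseline model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in columns(X)]


class NearestCentroid:
    """Predicts the label whose mean training point is closest."""

    def fit(self, X, y):
        groups = {}
        for obs, label in zip(columns(X), y):
            groups.setdefault(label, []).append(obs)
        # Average each feature over the observations of one label.
        self.centroids_ = {
            label: tuple(sum(vals) / len(pts) for vals in zip(*pts))
            for label, pts in groups.items()
        }
        return self

    def predict(self, X):
        def sqdist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda lab: sqdist(obs, self.centroids_[lab]))
            for obs in columns(X)
        ]


def accuracy(model: Classifier, X: Matrix, y: Sequence[int]) -> float:
    """Works with any model that follows the convention above."""
    predictions = model.predict(X)
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Two features (rows), five observations (columns).
X = [[0.0, 0.2, 0.1, 1.0, 0.9],
     [0.1, 0.0, 0.2, 1.1, 1.0]]
y = [0, 0, 0, 1, 1]

for model in (MajorityClass(), NearestCentroid()):
    print(type(model).__name__, accuracy(model.fit(X, y), X, y))
```

Because `accuracy` only relies on `fit` and `predict`, a metapackage in this style would mostly be documentation plus a list of models that already follow the convention.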

@sbromberger
Author

Thanks, @oxinabox - that's very helpful. I guess I'm not advocating a single package, but rather a single place that will tell me what packages I need.

One thing I've found supremely helpful is the graphic that sklearn has (http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html) - this allows me to select a reasonable tool for the job, and I know that the functionality exists in the package. Were JuliaML to do something similar, I'd appreciate knowing what package to use as well.
