initial implementation of sklearn interface #72

randxie · 2018-02-13T07:49:51Z

Please see the discussion in #68. Currently, the sklearn interface converts NumPy arrays to libsvm format internally, using temporary files. This can be improved once in-memory conversion is ready.

aksnzhy · 2018-02-13T18:25:24Z

build.sh

@@ -9,9 +9,9 @@ make

 cd python-package
 if command -v python2; then
-    sudo python2 setup.py install


Removing "sudo" may cause "Permission denied" errors on some systems.

Hi @aksnzhy, thanks for the comment. Regarding sudo privileges, could we add sudo to the bash script instead? Hard-coding it here assumes too much: the user may not have sudo privileges (e.g. on a VM), or may be using a conda environment that does not require sudo.

You are right. We can move this to bash.
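
One possible way to avoid hard-coding sudo, sketched here as an illustration rather than as what this PR does: check whether the active interpreter's site-packages directory is writable and only then decide whether elevation is needed. The helper name `install_needs_elevation` is hypothetical.

```python
import os
import sys
import sysconfig


def install_needs_elevation():
    """Return True if the active interpreter's site-packages is not writable."""
    probe = sysconfig.get_paths()["purelib"]
    # The directory may not exist yet, so walk up to the nearest existing parent.
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:
            break
        probe = parent
    return not os.access(probe, os.W_OK)


if __name__ == "__main__":
    # Print the install command that would be appropriate for this environment.
    prefix = "sudo " if install_needs_elevation() else ""
    print(prefix + sys.executable + " setup.py install")
```

In a conda or virtualenv environment owned by the user, the check would typically return False, so no password prompt is needed.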

aksnzhy · 2018-02-13T18:26:06Z

build.sh

 fi

 if command -v python3; then
-    sudo python3 setup.py install


aksnzhy · 2018-02-13T18:28:22Z

demo/sklearn/example_FM_iris.py

+y = (iris_data['target']==2)
+
+# initialize and fit model
+mdl = FMModel(task='binary', init=0.1, epoch=1, k=1, lr=0.01, reg_lambda=0.02)


Maybe we can set 'k' to 4, since the speed for k = 4 and k = 1 is the same.

We can move this example to xlearn/demo/classification/scikit_learn_demo/

aksnzhy · 2018-02-13T18:30:15Z

python-package/install-python.sh

@@ -1,4 +1,4 @@
 #!/bin/bash
 # This script is for installization of xlearn python package
 # You may need to type your password here
-sudo python setup.py install


aksnzhy · 2018-02-13T18:35:49Z

python-package/xlearn/sklearn.py

+    def __init__(self, model_type='fm', task='binary', metric='auc',
+                        lr=0.2, k =4, reg_lambda=0.1, init=0.1, fold=1, epoch=5,
+                        opt='sgd', nthread=4, alpha=1, beta=1, lambda_1=1, lambda_2=1,
+                        **kwargs):


How can users set the other hyper-parameters, such as early-stop, norm, or lock-free learning?

I will expose early-stop, norm and lock-free learning in the fit method, as these parameters are only related to training.
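
The split described above, with model hyper-parameters in the constructor and training-only switches in `fit`, could look roughly like the sketch below. `SketchModel` is a hypothetical stand-in, not the class from this PR. Only `is_lock_free` appears in the PR's demo code; `is_early_stop` and `is_instance_norm` are assumed names for illustration.

```python
class SketchModel(object):
    """Hypothetical estimator illustrating the constructor/fit split."""

    def __init__(self, lr=0.2, k=4, reg_lambda=0.1, epoch=5):
        # Model hyper-parameters live on the instance, scikit-learn style.
        self.lr = lr
        self.k = k
        self.reg_lambda = reg_lambda
        self.epoch = epoch
        self.train_options_ = None

    def fit(self, X, y, is_early_stop=True, is_instance_norm=True,
            is_lock_free=True):
        # Training-only switches are passed per call instead of being
        # constructor hyper-parameters. No real training happens here.
        self.train_options_ = {
            "is_early_stop": is_early_stop,
            "is_instance_norm": is_instance_norm,
            "is_lock_free": is_lock_free,
        }
        return self


model = SketchModel(k=4).fit([[0.0, 1.0]], [1], is_lock_free=False)
```

Keeping the constructor limited to model hyper-parameters means tools that clone estimators from their constructor arguments do not need to know about training-time switches.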

aksnzhy · 2018-02-15T00:37:51Z

python-package/xlearn/sklearn.py

+                                      **kwargs)
+
+    def __delete__(self, instance):
+        super(FMModel, self).__delete(instance)


Could you please add the other model (LR and FFM)?

Sure, I can add LR and FFM this week.

aksnzhy · 2018-02-15T00:38:59Z

python-package/xlearn/sklearn.py

+            y = np.zeros(X.shape[0], dtype=np.int8)
+
+        try:
+            dump_svmlight_file(X, y, filepath)


We are writing an in-memory interface for the DMatrix class.

That would be great. I will keep the internal conversion for now. Once the DMatrix class is ready, I can update this accordingly. Is there an expected timeline for the DMatrix class?
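
For readers unfamiliar with the temporary-file conversion discussed here, the following is a dependency-free sketch of the idea. It is a simplified stand-in for `dump_svmlight_file`, not the PR's code. The helper name `rows_to_libsvm_file` and the choice of 1-based feature indices are assumptions of this sketch.

```python
import os
import tempfile


def rows_to_libsvm_file(X, y):
    """Write dense rows to a temporary libsvm-format file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".libsvm")
    with os.fdopen(fd, "w") as handle:
        for label, row in zip(y, X):
            # Each line is "<label> <index>:<value> ..."; zero entries are omitted.
            features = ["%d:%g" % (i + 1, v) for i, v in enumerate(row) if v != 0]
            handle.write(" ".join([str(label)] + features) + "\n")
    return path


path = rows_to_libsvm_file([[0.5, 0.0, 2.0], [0.0, 1.0, 0.0]], [1, 0])
try:
    with open(path) as handle:
        lines = handle.read().splitlines()
finally:
    os.remove(path)  # the caller is responsible for cleaning up the file
```

The round trip through disk is what an in-memory interface would remove: the data would be handed over directly instead of being serialized and parsed again.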

randxie · 2018-02-20T06:41:02Z

@aksnzhy I have added FFMModel and LRModel. In addition, users can now pass a string to specify the location of the training/validation/testing files in the fit and predict methods.

aksnzhy · 2018-02-21T00:40:37Z

demo/classification/scikit_learn_demo/example_LR_iris.py

+mdlLR.fit(X_train, y_train, eval_set=[X_val, y_val], is_lock_free=False)
+
+# generate predictions
+y_pred = mdlLR.predict(X_val)


Could you please add an FM demo?

Sure, an FMModel demo has been added.

aksnzhy · 2018-02-21T00:45:03Z

@randxie It looks great! Thanks! I will check for formatting issues during code review. After that, we can merge this version into master.

aksnzhy · 2018-03-02T06:40:16Z

@randxie When I run ./install-python.sh, I get the following error:

byte-compiling build/bdist.macosx-10.12-x86_64/egg/xlearn/sklearn.py to sklearn.pyc
File "build/bdist.macosx-10.12-x86_64/egg/xlearn/sklearn.py", line 146
def fit(self, *args, fields=None,
^
SyntaxError: invalid syntax

randxie · 2018-03-02T09:27:03Z

@aksnzhy It looks like Python 2 does not support named keyword parameters after *args (keyword-only arguments). I will try to modify it over the weekend so that it is compatible with Python 2.
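
A common way to get the same call signature on both Python 2 and Python 3 is to accept `**kwargs` and pop the named options out of it. The sketch below illustrates that pattern. It is not necessarily the change made in this PR, and `Py2CompatibleSketch` is a hypothetical class.

```python
class Py2CompatibleSketch(object):
    def fit(self, *args, **kwargs):
        # Python 3 only:  def fit(self, *args, fields=None): ...
        # Portable form: pull the option out of **kwargs with a default.
        fields = kwargs.pop("fields", None)
        if kwargs:
            # Reject unknown options, as a keyword-only signature would.
            raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
        return args, fields


result = Py2CompatibleSketch().fit(1, 2, fields=[0, 1])
```

The explicit check for leftover keys keeps typos in option names from being silently ignored.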

randxie · 2018-03-03T16:17:28Z

@aksnzhy I have updated the sklearn interface to be compatible with Python 2 and tested the installation in a conda Python 2.7 environment. Please give it a try. It should work now.

aksnzhy · 2018-03-04T03:06:30Z

@randxie When I run python example_FM_wine.py, I get the following error:

Traceback (most recent call last):
File "example_FM_wine.py", line 2, in <module>
from sklearn.datasets import load_wine
ImportError: cannot import name load_wine

Is this error caused by my scikit-learn version?

aksnzhy · 2018-03-04T03:07:37Z

Anyway, I will merge this version.

randxie · 2018-03-04T03:43:39Z

It is potentially due to the scikit-learn version. FYI, mine is 0.19.1.
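
If the cause is indeed a scikit-learn release that does not ship `load_wine`, a demo script could guard the import and fail with a clearer message. This is a minimal sketch under that assumption. `load_demo_data` is a hypothetical helper, not part of the PR.

```python
try:
    from sklearn.datasets import load_wine
except ImportError:
    # Raised when scikit-learn is missing or does not provide load_wine.
    load_wine = None


def load_demo_data():
    """Return (X, y) from the wine dataset, or raise a descriptive error."""
    if load_wine is None:
        raise RuntimeError(
            "load_wine is unavailable; try upgrading scikit-learn")
    data = load_wine()
    return data.data, data.target
```

Printing `sklearn.__version__` on both machines would confirm whether the versions actually differ.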

aksnzhy · 2018-03-04T05:09:03Z

@randxie All right. I have merged the code. Could you please provide a scikit_learn_api.rst in the doc directory? Then I can move it to our online documentation. Thanks!

randxie · 2018-03-04T16:22:11Z

Certainly, I could spend some time next week to provide the documentation.

initial implementation of sklearn interface

628b500

aksnzhy reviewed Feb 13, 2018

build.sh

 fi

 if command -v python3; then
-    sudo python3 setup.py install

same

aksnzhy reviewed Feb 15, 2018

add LRModel and FFModel interface, update scikit-learn example

a9c832f

aksnzhy reviewed Feb 21, 2018


add FMModel example and update parameter description

bd66279

update sklearn fit method for backward compatibility to Python2

72f6787

aksnzhy merged commit d07e393 into aksnzhy:master Mar 4, 2018

jameslamb mentioned this pull request May 2, 2019

A request: R issues #253

Open
