Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial implementation of sklearn interface #72

Merged
merged 4 commits into from Mar 4, 2018
Merged

initial implementation of sklearn interface #72

merged 4 commits into from Mar 4, 2018

Conversation

randxie
Copy link
Contributor

@randxie randxie commented Feb 13, 2018

Please see discussion in #68. Currently, the sklearn interface converts numpy array to libsvm format internally with the use of temporary files. Further improvement could be done once the in-memory conversion is ready.

@@ -9,9 +9,9 @@ make

cd python-package
if command -v python2; then
sudo python2 setup.py install
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove "sudo" may cause "Permission denied" on some systems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aksnzhy, Thanks for the comment. For the sudo priviledge, could we add sudo to the bash script instead? This assumption might be too strong if the user does not have sudo priviledge (e.g. on VM) or they are using conda environment that do not require sudo.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We can move this to bash.

fi

if command -v python3; then
sudo python3 setup.py install
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

y = (iris_data['target']==2)

# initialize and fit model
mdl = FMModel(task='binary', init=0.1, epoch=1, k=1, lr=0.01, reg_lambda=0.02)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can set 'k' to 4? Because the speed for k = 4 and k = 1 are the same.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move this example to xlearn/demo/classification/scikit_learn_demo/

@@ -1,4 +1,4 @@
#!/bin/bash
# This script is for installization of xlearn python package
# You may need to type your password here
sudo python setup.py install
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

def __init__(self, model_type='fm', task='binary', metric='auc',
lr=0.2, k =4, reg_lambda=0.1, init=0.1, fold=1, epoch=5,
opt='sgd', nthread=4, alpha=1, beta=1, lambda_1=1, lambda_2=1,
**kwargs):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to set the other hyper-parameters like early-stop, norm, or lock-free learning ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will expose early-stop, norm and lock-free learning in the fit method, as these parameters are only related to training.

**kwargs)

def __delete__(self, instance):
super(FMModel, self).__delete(instance)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the other model (LR and FFM)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add LR and FFM in this week

y = np.zeros(X.shape[0], dtype=np.int8)

try:
dump_svmlight_file(X, y, filepath)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are writing the in-memory interface for DMatrix class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great. I will keep the internal conversion for now. Once DMatrix class is ready, I can update that accordingly. Is there any expected timeline for the DMatrix class?

@randxie
Copy link
Contributor Author

randxie commented Feb 20, 2018

@aksnzhy I have added FFMModel and LRModel. In addition, users now can use string to specify training/validation/testing files location in fit and predict method.

mdlLR.fit(X_train, y_train, eval_set=[X_val, y_val], is_lock_free=False)

# generate predictions
y_pred = mdlLR.predict(X_val)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add an FM demo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, a FMModel demo has been added

@aksnzhy
Copy link
Owner

aksnzhy commented Feb 21, 2018

@randxie It looks great! Thanks! I will check some format issue in code review. After that, we can merge this version to master first.

@aksnzhy
Copy link
Owner

aksnzhy commented Mar 2, 2018

@randxie when I use ./install-python.sh, it shows me the error:

byte-compiling build/bdist.macosx-10.12-x86_64/egg/xlearn/sklearn.py to sklearn.pyc
File "build/bdist.macosx-10.12-x86_64/egg/xlearn/sklearn.py", line 146
def fit(self, *args, fields=None,
^
SyntaxError: invalid syntax

@randxie
Copy link
Contributor Author

randxie commented Mar 2, 2018

@aksnzhy Looks like Python2 does not support *args followed with name-value pairs. I will try to modify it so that it is compatible with Python 2 over the weekend.

@randxie
Copy link
Contributor Author

randxie commented Mar 3, 2018

@aksnzhy I have updated the sklearn interface to be compatible with Python2 and test the installation in conda Python 2.7 environment. Please have a try. It should work now.

@aksnzhy
Copy link
Owner

aksnzhy commented Mar 4, 2018

@randxie When I run python example_FM_wine.py, I get the following error:

Traceback (most recent call last):
File "example_FM_wine.py", line 2, in
from sklearn.datasets import load_wine
ImportError: cannot import name load_wine

Is this the error version of my scikit-learn?

@aksnzhy
Copy link
Owner

aksnzhy commented Mar 4, 2018

Any way, I will merge this version.

@aksnzhy aksnzhy merged commit d07e393 into aksnzhy:master Mar 4, 2018
@randxie
Copy link
Contributor Author

randxie commented Mar 4, 2018

It is potentially due to the scikit-learn version. FYI, mine is 0.19.1.

@aksnzhy
Copy link
Owner

aksnzhy commented Mar 4, 2018

@randxie All right. I have merged the code. Could you please provide a scikit_learn_api.rst in doc directory. And then I can move it to our online documentation. Thanks!

@randxie
Copy link
Contributor Author

randxie commented Mar 4, 2018

Certainly, I could spend some time next week to provide the documentation.

@jameslamb jameslamb mentioned this pull request May 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants