New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial implementation of sklearn interface #72
Conversation
@@ -9,9 +9,9 @@ make | |||
|
|||
cd python-package | |||
if command -v python2; then | |||
sudo python2 setup.py install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove "sudo" may cause "Permission denied" on some systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aksnzhy, Thanks for the comment. For the sudo priviledge, could we add sudo to the bash script instead? This assumption might be too strong if the user does not have sudo priviledge (e.g. on VM) or they are using conda environment that do not require sudo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. We can move this to bash.
fi | ||
|
||
if command -v python3; then | ||
sudo python3 setup.py install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
demo/sklearn/example_FM_iris.py
Outdated
y = (iris_data['target']==2) | ||
|
||
# initialize and fit model | ||
mdl = FMModel(task='binary', init=0.1, epoch=1, k=1, lr=0.01, reg_lambda=0.02) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can set 'k' to 4? Because the speed for k = 4 and k = 1 are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move this example to xlearn/demo/classification/scikit_learn_demo/
@@ -1,4 +1,4 @@ | |||
#!/bin/bash | |||
# This script is for installization of xlearn python package | |||
# You may need to type your password here | |||
sudo python setup.py install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
python-package/xlearn/sklearn.py
Outdated
def __init__(self, model_type='fm', task='binary', metric='auc', | ||
lr=0.2, k =4, reg_lambda=0.1, init=0.1, fold=1, epoch=5, | ||
opt='sgd', nthread=4, alpha=1, beta=1, lambda_1=1, lambda_2=1, | ||
**kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to set the other hyper-parameters like early-stop, norm, or lock-free learning ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will expose early-stop, norm and lock-free learning in the fit method, as these parameters are only related to training.
python-package/xlearn/sklearn.py
Outdated
**kwargs) | ||
|
||
def __delete__(self, instance): | ||
super(FMModel, self).__delete(instance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add the other model (LR and FFM)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add LR and FFM in this week
python-package/xlearn/sklearn.py
Outdated
y = np.zeros(X.shape[0], dtype=np.int8) | ||
|
||
try: | ||
dump_svmlight_file(X, y, filepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are writing the in-memory interface for DMatrix class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great. I will keep the internal conversion for now. Once DMatrix class is ready, I can update that accordingly. Is there any expected timeline for the DMatrix class?
@aksnzhy I have added FFMModel and LRModel. In addition, users now can use string to specify training/validation/testing files location in fit and predict method. |
mdlLR.fit(X_train, y_train, eval_set=[X_val, y_val], is_lock_free=False) | ||
|
||
# generate predictions | ||
y_pred = mdlLR.predict(X_val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add an FM demo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, a FMModel demo has been added
@randxie It looks great! Thanks! I will check some format issue in code review. After that, we can merge this version to master first. |
@randxie when I use ./install-python.sh, it shows me the error: byte-compiling build/bdist.macosx-10.12-x86_64/egg/xlearn/sklearn.py to sklearn.pyc |
@aksnzhy Looks like Python2 does not support *args followed with name-value pairs. I will try to modify it so that it is compatible with Python 2 over the weekend. |
@aksnzhy I have updated the sklearn interface to be compatible with Python2 and test the installation in conda Python 2.7 environment. Please have a try. It should work now. |
@randxie When I run python example_FM_wine.py, I get the following error: Traceback (most recent call last): Is this the error version of my scikit-learn? |
Any way, I will merge this version. |
It is potentially due to the scikit-learn version. FYI, mine is 0.19.1. |
@randxie All right. I have merged the code. Could you please provide a scikit_learn_api.rst in doc directory. And then I can move it to our online documentation. Thanks! |
Certainly, I could spend some time next week to provide the documentation. |
Please see discussion in #68. Currently, the sklearn interface converts numpy array to libsvm format internally with the use of temporary files. Further improvement could be done once the in-memory conversion is ready.