[HIVEMALL-101] Separate optimizer implementation #79
Conversation
```java
}

@Override
protected final void checkTargetValue(final float target) throws UDFArgumentException {
```
@maropu This is a regressor which simply predicts real values. Why did you create this method? Are only values in [0, 1] allowed...?
@myui Ah, it makes sense since originally the generic regressor used `LossFunctions.logisticLoss(target, predicted)`. Thanks!
@takuti Yep, that's why logistic loss is not selectable for now.
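To make the point concrete, here is a minimal sketch of the two kinds of target checks, with made-up class and method names (not the actual Hivemall implementation): logistic loss treats the target as a probability and therefore needs a [0, 1] range check, while a plain regression loss only needs the target to be a finite real value.

```java
// Sketch only (made-up names, not the actual Hivemall code) of the two target checks.
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;

final class TargetCheckSketch {

    // Logistic loss interprets the target as a probability, so [0, 1] is required.
    static void checkLogisticTarget(final float target) throws UDFArgumentException {
        if (target < 0.f || target > 1.f) {
            throw new UDFArgumentException(
                "target must be in [0.0, 1.0] for logistic loss, but got " + target);
        }
    }

    // A plain regression loss (e.g. squared loss) accepts any finite real value.
    static void checkRegressionTarget(final float target) throws UDFArgumentException {
        if (Float.isNaN(target) || Float.isInfinite(target)) {
            throw new UDFArgumentException("target must be a finite value, but got " + target);
        }
    }

    private TargetCheckSketch() {}
}
```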
@takuti It would be better to have this kind of documentation. BTW, refer to [1, 2] for how Spark/scikit-learn incorporate regularized updates. FYI.
Changes Unknown when pulling c57d09e on takuti:HIVEMALL-101 into ** on apache:master**.
^^^ Since the generic regressor does not accept classification losses (e.g. logloss), just like sklearn, I keep removing
I listed TODOs in the top comment. If there is anything else I need to take care of, please let me know.
@takuti Functional tests to confirm accuracy of
I supported the `-mini_batch` option. The idea is just accumulating gradients over M samples and updating with their mean value. For SGD, it's clearly equivalent to what RegressionBaseUDTF does. However, I'm a little bit afraid whether I can do the same thing for Adagrad, Adam, Adadelta and AdagradRDA. (Currently, doing the same thing for Adagrad, Adam and Adadelta is allowed; by contrast, AdagradRDA + `-mini_batch` is not.) BTW, practically, I observed that the naive Adagrad + `-mini_batch` combination works reasonably, as shown in the results below.
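To illustrate the accumulate-and-average idea above, here is a self-contained sketch with made-up names (not the Hivemall implementation): gradients are summed over M samples and one update is applied with their mean. For plain SGD this is equivalent to the usual averaged update; for AdaGrad/Adam/Adadelta the optimizer state then only sees one mean gradient per M samples, which is exactly the concern raised here.

```java
// Sketch of mini-batch gradient accumulation (illustrative names, plain SGD only).
import java.util.HashMap;
import java.util.Map;

final class MiniBatchSgdSketch {
    private final float eta;            // learning rate
    private final int miniBatchSize;    // M

    private final Map<String, Float> weights = new HashMap<>();
    private final Map<String, Float> accumulated = new HashMap<>();
    private int sampleCount = 0;

    MiniBatchSgdSketch(float eta, int miniBatchSize) {
        this.eta = eta;
        this.miniBatchSize = miniBatchSize;
    }

    // Called once per training example with the per-feature gradients.
    void train(Map<String, Float> gradients) {
        for (Map.Entry<String, Float> e : gradients.entrySet()) {
            accumulated.merge(e.getKey(), e.getValue(), Float::sum);
        }
        if (++sampleCount >= miniBatchSize) {
            flush();
        }
    }

    // Apply one update using the mean gradient over the accumulated samples.
    private void flush() {
        for (Map.Entry<String, Float> e : accumulated.entrySet()) {
            float meanGrad = e.getValue() / sampleCount;
            weights.merge(e.getKey(), -eta * meanGrad, Float::sum);
        }
        accumulated.clear();
        sampleCount = 0;
    }
}
```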
I tested the generic classifier and regressor on EMR using the a9a data.

Classifier

| | online | mini-batch |
|---|---|---|
| logress | 0.8414716540753026 | 0.848965051286776 |
| train_classifier | 0.8414716540753026 | 0.848965051286776 |
Regression
Solved the a9a label prediction as a regression problem.
Note: since the non-generic Adagrad was designed for logistic loss (i.e. classification), we cannot compare it with the generic regressor under exactly the same conditions.
train_adagrad_regr (internally uses logistic loss)
```sql
drop table if exists adagrad_model;
create table adagrad_model as
select
  feature,
  avg(weight) as weight
from (
  select
    train_adagrad_regr(features, label) as (feature, weight)
  from
    train_x3
) t
group by feature;
```
```sql
WITH test_exploded as (
  select
    rowid,
    label,
    extract_feature(feature) as feature,
    extract_weight(feature) as value
  from
    test LATERAL VIEW explode(add_bias(features)) t AS feature
),
predict as (
  select
    t.rowid,
    sigmoid(sum(m.weight * t.value)) as prob
  from
    test_exploded t
    LEFT OUTER JOIN adagrad_model m ON (t.feature = m.feature)
  group by
    t.rowid
),
submit as (
  select
    t.label as actual,
    pd.prob as probability
  from
    test t
    JOIN predict pd ON (t.rowid = pd.rowid)
)
select rmse(probability, actual) from submit;
```
train_regression

The training call above is replaced with:

```sql
train_regression(features, label, '-loss squaredloss -opt AdaGrad -reg no') as (feature, weight)
-- train_regression(features, label, '-loss squaredloss -opt AdaGrad -reg no -mini_batch 10') as (feature, weight)
```
| RMSE | online | mini-batch |
|---|---|---|
| train_adagrad_regr (logistic loss) | 0.3254586866367811 | -- |
| train_regression (squared loss) | 0.3356422627079689 | 0.3348889704327727 |
As I mentioned in the last comment, I'm not fully sure whether the `-mini_batch` option works correctly for Adagrad. Fortunately, this example showed that the option slightly improved the prediction accuracy in terms of RMSE.
@takuti I guess there are no mix-server-related issues in this PR. I will review for that, though.
@myui Almost done, basically. Could you review when you get a chance? One thing I'd like to discuss here is that the duplicated code in the generic classifier and regressor could be consolidated, for example into a common base class. If that sounds good to @myui, I will do so. Of course it's not mandatory, so keeping the current duplicated code is no problem.
@takuti It's preferred to have an abstract class. Please create it.
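As a rough illustration of the abstract-class refactoring being discussed, here is a sketch with made-up class names (not the actual Hivemall classes): the shared training/optimizer plumbing sits in a common base class, and the classifier and regressor subclasses only define the target check and the loss.

```java
// Illustrative sketch of the shared-base-class idea; not the actual Hivemall code.
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;

abstract class GeneralLearnerSketch {
    // Shared plumbing (feature parsing, option handling, optimizer updates) would live here.
    final void trainOneSample(final float target, final float predicted) throws UDFArgumentException {
        checkTargetValue(target);
        final float lossGrad = lossGradient(target, predicted);
        // ... hand lossGrad to the optimizer to update the per-feature weights ...
    }

    // Only the task-specific parts differ between the classifier and the regressor.
    protected abstract void checkTargetValue(float target) throws UDFArgumentException;
    protected abstract float lossGradient(float target, float predicted);
}

// Classifier: binary targets, margin-based (hinge) loss in this sketch.
final class GeneralClassifierSketch extends GeneralLearnerSketch {
    @Override
    protected void checkTargetValue(final float target) throws UDFArgumentException {
        if (target != -1.f && target != 1.f) {
            throw new UDFArgumentException("expected -1 or 1 but got " + target);
        }
    }

    @Override
    protected float lossGradient(final float target, final float predicted) {
        return (target * predicted < 1.f) ? -target : 0.f; // derivative of hinge loss w.r.t. the score
    }
}

// Regressor: any finite real target, squared loss in this sketch.
final class GeneralRegressorSketch extends GeneralLearnerSketch {
    @Override
    protected void checkTargetValue(final float target) throws UDFArgumentException {
        if (Float.isNaN(target) || Float.isInfinite(target)) {
            throw new UDFArgumentException("target must be finite but got " + target);
        }
    }

    @Override
    protected float lossGradient(final float target, final float predicted) {
        return predicted - target; // derivative of 0.5 * (predicted - target)^2
    }
}
```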
@myui Finished~
for regression and classification, respectively. + updated the order of loss functions.
`loss_function` is not a part of Optimizer
They can be useful even for classification. Scikit-learn does the same: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/stochastic_gradient.py#L570-L581
- Use appropriate (i.e. strongly correlated) data
- Target value has to be float OIs
* except for AdagradRDA
* update unit tests accordingly
and update Regularizer implementation to integrate L1/L2 with ElasticNet
It's more useful for the future `-iter` support
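Regarding the "integrate L1/L2 with ElasticNet" change mentioned above, here is a small sketch of the general idea, with made-up names and parameters (not the actual Hivemall Regularizer API): ElasticNet blends the L1 and L2 penalties with a mixing ratio, so pure L1 and pure L2 fall out as the two endpoints.

```java
// Sketch: ElasticNet subgradient as a blend of L1 and L2 penalties (illustrative names).
final class ElasticNetSketch {
    private final float lambda;  // overall regularization strength
    private final float l1Ratio; // 1.0 -> pure L1, 0.0 -> pure L2

    ElasticNetSketch(float lambda, float l1Ratio) {
        this.lambda = lambda;
        this.l1Ratio = l1Ratio;
    }

    // Returns the regularization term to add to the loss gradient for one weight.
    float regularize(float weight) {
        float l1 = Math.signum(weight); // subgradient of |w|
        float l2 = weight;              // gradient of 0.5 * w^2
        return lambda * (l1Ratio * l1 + (1.f - l1Ratio) * l2);
    }
}
```

With `l1Ratio = 1.0` this reduces to plain L1 and with `l1Ratio = 0.0` to plain L2, which is the sense in which the two can be folded into a single ElasticNet implementation.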
What changes were proposed in this pull request?
Finalize #14
What type of PR is it?
Improvement, Feature
What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-101
How was this patch tested?
Todo:
- `-loss logloss` with the current `logress()` UDTF
- `-mini_batch` option in a similar way to what RegressionBaseUDTF does; accumulate gradients over M samples, and update for mean value
- Save samples to external files as the other UDTFs do (see LDA/pLSA UDTFs) and add an `-iter` option. This should be the other issue, [HIVEMALL-108], for `-iter` support