Add rmsle. #4541

Merged
merged 2 commits into dmlc:master from trivialfis:rmsle on Jun 10, 2019

@trivialfis
Member

commented Jun 7, 2019

close #4402 .

On an unrelated topic, why is the default metric for logistic regression rmse?

@trivialfis trivialfis requested review from RAMitchell and CodingCat Jun 7, 2019

@RAMitchell

Member

commented Jun 7, 2019

Do you have any references for this loss function?

@trivialfis

Member Author

commented Jun 7, 2019

@RAMitchell No. After some searching, I saw some usages on the Kaggle site and in blog posts; the intuition behind it is to reduce the impact of large noisy differences, like y = [1, 2, 10000], y_p = [1, 2, 3]. I derived the gradient and hessian myself. The general form of this loss is:

J(y_p, y) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} [\log(y_{p,i} + 1) - \log(y_i + 1)]^2 }
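
For concreteness, a minimal numpy sketch of the metric exactly as written above (the rmsle helper name is only illustrative, not the XGBoost implementation):

import numpy as np

def rmsle(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    # Root mean squared log error; assumes every entry is > -1 so log(x + 1) is defined.
    diff = np.log1p(y_pred) - np.log1p(y_true)  # np.log1p(x) == log(x + 1)
    return float(np.sqrt(np.mean(np.square(diff))))
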
@RAMitchell

Member

commented Jun 9, 2019

Had a quick look at this today. As you say, I can't find any academic references for this, only implementations in software libraries. Looks interesting though.

My attempt at taking derivatives looks very different to what you have, also see here. Can you show your working?

Given that there are no references for this, it is a good opportunity to write a blog post for xgboost.ai explaining applications of a custom function. I would do something like generate a synthetic dataset from a Gaussian, add a few outliers from a different source and then show that this loss function is more robust to large outliers.

@trivialfis

Member Author

commented Jun 9, 2019

@RAMitchell

My attempt at taking derivatives looks very different to what you have, also see here

Thanks! My mistake, I ignored the outer root and square, leading to something like a mean log error... :( Will fix it soon.

and then show that this loss function is more robust to large outliers

Let me do some experiments later.

@RAMitchell

Member

commented Jun 9, 2019

The loss function does not necessarily need to include the square root (although it will need the square). This changes the 'steepness' of the function but will not affect convergence, as it preserves the relative magnitude of gradients between training instances. Consider RMSE and the squared error objective: minimising squared error also has the effect of minimising the RMSE metric.
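
As an aside, since the square root is monotonically increasing (and 1/N is a positive constant), dropping them does not change which model minimises the objective; in the notation above:

\arg\min_{y_p} \sqrt{ \frac{1}{N} \sum_{i=1}^{N} [\log(y_{p,i} + 1) - \log(y_i + 1)]^2 } = \arg\min_{y_p} \sum_{i=1}^{N} [\log(y_{p,i} + 1) - \log(y_i + 1)]^2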

@trivialfis

Member Author

commented Jun 9, 2019

@RAMitchell Thanks, that's what I thought; see the latest commit, where I just did squared log error. I'm doing some experiments and am confused that some predictions coming into the metric are less than -1, while all my labels are greater than 0.

@RAMitchell

Member

commented Jun 9, 2019

I would just work through a very simple example with one training instance then see how it converges compared to your calculations on paper. You can also prepend your loss function with a factor of 1/2 like we do with squared error to get rid of the 2 in the derivative.
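
For reference, with the 1/2 factor the per-instance squared log error and its derivatives with respect to the prediction y_p come out as follows (a sketch only, assuming y_p > -1; not necessarily the exact form used in the final commit):

J(y_p, y) = \frac{1}{2} [\log(y_p + 1) - \log(y + 1)]^2

\frac{\partial J}{\partial y_p} = \frac{\log(y_p + 1) - \log(y + 1)}{y_p + 1}

\frac{\partial^2 J}{\partial y_p^2} = \frac{1 - \log(y_p + 1) + \log(y + 1)}{(y_p + 1)^2}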

@trivialfis

Member Author

commented Jun 9, 2019

Okay, rmsle cannot be used with squarederror, which might generate negative predictions, although I'm not sure why yet. The good news is the new objective looks promising. :) Normal squared error overfits. @RAMitchell Here is a quick run:

Squared Error
[0]	dtrain-rmse:391.351	dtest-rmse:340.637
[1]	dtrain-rmse:363.34	dtest-rmse:342.916
[2]	dtrain-rmse:338.147	dtest-rmse:343.795
[3]	dtrain-rmse:316.266	dtest-rmse:345.421
[4]	dtrain-rmse:293.192	dtest-rmse:350.667
[5]	dtrain-rmse:275.919	dtest-rmse:351.936
[6]	dtrain-rmse:259.895	dtest-rmse:353.104
[7]	dtrain-rmse:241.846	dtest-rmse:354.231
[8]	dtrain-rmse:223.262	dtest-rmse:360.915
[9]	dtrain-rmse:213.259	dtest-rmse:361.501
[10]	dtrain-rmse:200.845	dtest-rmse:369.748
[11]	dtrain-rmse:188.686	dtest-rmse:369.981
[12]	dtrain-rmse:178.102	dtest-rmse:370.835
[13]	dtrain-rmse:166.397	dtest-rmse:377.675
[14]	dtrain-rmse:157.123	dtest-rmse:378.786
[15]	dtrain-rmse:150.385	dtest-rmse:384.103
[16]	dtrain-rmse:147.095	dtest-rmse:384.186
[17]	dtrain-rmse:140.522	dtest-rmse:384.017
[18]	dtrain-rmse:130.82	dtest-rmse:386.613
[19]	dtrain-rmse:122.93	dtest-rmse:387.239
Finished Squared Error in: 0.5658330917358398 

Squared Log Error
[0]	dtrain-rmsle:1.20707	dtest-rmsle:1.13542
[1]	dtrain-rmsle:1.08544	dtest-rmsle:1.00958
[2]	dtrain-rmsle:0.977126	dtest-rmsle:0.896455
[3]	dtrain-rmsle:0.882881	dtest-rmsle:0.798379
[4]	dtrain-rmsle:0.803323	dtest-rmsle:0.715883
[5]	dtrain-rmsle:0.738639	dtest-rmsle:0.649231
[6]	dtrain-rmsle:0.688928	dtest-rmsle:0.596975
[7]	dtrain-rmsle:0.651433	dtest-rmsle:0.559778
[8]	dtrain-rmsle:0.624716	dtest-rmsle:0.534039
[9]	dtrain-rmsle:0.606279	dtest-rmsle:0.51714
[10]	dtrain-rmsle:0.592096	dtest-rmsle:0.506786
[11]	dtrain-rmsle:0.581805	dtest-rmsle:0.501063
[12]	dtrain-rmsle:0.574321	dtest-rmsle:0.498102
[13]	dtrain-rmsle:0.569913	dtest-rmsle:0.495869
[14]	dtrain-rmsle:0.566171	dtest-rmsle:0.494843
[15]	dtrain-rmsle:0.561991	dtest-rmsle:0.49469
[16]	dtrain-rmsle:0.557604	dtest-rmsle:0.494806
[17]	dtrain-rmsle:0.554808	dtest-rmsle:0.495154
[18]	dtrain-rmsle:0.551964	dtest-rmsle:0.495313
[19]	dtrain-rmsle:0.549387	dtest-rmsle:0.495596
Finished Squared Log Error in: 0.2771596908569336

import numpy as np
import xgboost as xgb
from typing import Tuple
from time import time

kRows = 10000
kCols = 100

kOutlier = 10000
kNumberOfOutliers = 64

kRatio = 0.7


def generate_data() -> Tuple[xgb.DMatrix, xgb.DMatrix]:
    x = np.random.randn(kRows, kCols)
    y = np.random.randn(kRows)
    y += np.abs(np.min(y))

    # Create outliers
    for i in range(0, kNumberOfOutliers):
        ind = np.random.randint(0, len(y)-1)
        y[ind] += np.random.randint(0, kOutlier)

    train_portion = int(kRows * kRatio)

    train_x: np.ndarray = x[: train_portion]
    train_y: np.ndarray = y[: train_portion]
    dtrain = xgb.DMatrix(train_x, label=train_y)

    test_x = x[train_portion:]
    test_y = y[train_portion:]
    dtest = xgb.DMatrix(test_x, label=test_y)
    return dtrain, dtest


def run(dtrain: xgb.DMatrix, dtest: xgb.DMatrix) -> None:
    print('Squared Error')
    squared_error = {
        'objective': 'reg:squarederror',
        'eval_metric': 'rmse',
        'tree_method': 'gpu_hist',
    }
    start = time()
    xgb.train(squared_error,
              dtrain=dtrain,
              num_boost_round=20,
              evals=[(dtrain, 'dtrain'), (dtest, 'dtest')])
    print('Finished Squared Error in:', time() - start, '\n')

    print('Squared Log Error')
    squared_log_error = {
        'objective': 'reg:squaredlogerror',
        'eval_metric': 'rmsle',
        'tree_method': 'gpu_hist',
    }
    start = time()
    xgb.train(squared_log_error,
              dtrain=dtrain,
              num_boost_round=20,
              evals=[(dtrain, 'dtrain'), (dtest, 'dtest')])
    print('Finished Squared Log Error in:', time() - start)


if __name__ == '__main__':
    dtrain, dtest = generate_data()
    run(dtrain, dtest)
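
Note: the script above assumes a GPU-enabled XGBoost build because of tree_method: 'gpu_hist'; on a CPU-only build, substituting the 'hist' tree method should reproduce the same comparison.
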
@trivialfis

Member Author

commented Jun 9, 2019

@RAMitchell

I would just work through a very simple example with one training instance then see how it converges compared to your calculations on paper.

Thanks for the suggestion. Let me figure out why the squared error objective can generate negative predictions.

You can also prepend your loss function with a factor of 1/2 like we do with squared error to get rid of the 2 in the derivative.

That brings some convenience. Will do.

@trivialfis trivialfis force-pushed the trivialfis:rmsle branch from e25c0eb to f6b7e01 Jun 10, 2019

@codecov-io


commented Jun 10, 2019

Codecov Report

Merging #4541 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #4541   +/-   ##
=======================================
  Coverage   79.42%   79.42%           
=======================================
  Files          10       10           
  Lines        1735     1735           
=======================================
  Hits         1378     1378           
  Misses        357      357


@trivialfis

Member Author

commented Jun 10, 2019

ping @RAMitchell @CodingCat

it is a good opportunity to write a blog post for xgboost.ai explaining applications of a custom function

Let me do it in another PR.

@trivialfis trivialfis merged commit 2f1319f into dmlc:master Jun 10, 2019

12 checks passed

Jenkins Linux: Build Stage built successfully
Jenkins Linux: Formatting Check Stage built successfully
Jenkins Linux: Get sources Stage built successfully
Jenkins Linux: Test Stage built successfully
Jenkins Win64: Build Stage built successfully
Jenkins Win64: Get sources Stage built successfully
Jenkins Win64: Test Stage built successfully
codecov/patch Coverage not affected when comparing 9683fd4...9aa7078
codecov/project 79.42% remains the same compared to 9683fd4
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/jenkins/pr-merge This commit looks good
continuous-integration/travis-ci/pr The Travis CI build passed

@trivialfis trivialfis deleted the trivialfis:rmsle branch Jun 10, 2019

@trivialfis

Member Author

commented Jun 10, 2019

Thanks!
