# dmlc/xgboost

Merged
merged 2 commits into dmlc:master from trivialfis:rmsle on Jun 10, 2019
+93 −10

## Conversation

4 participants
Member

### trivialfis commented Jun 7, 2019 • edited

Closes #4402. On an unrelated topic, why is the default metric for logistic regression rmse?

Member

### RAMitchell commented Jun 7, 2019

 Do you have any references for this loss function?
Member Author

### trivialfis commented Jun 7, 2019

@RAMitchell No. After some searching, I saw some usages on the Kaggle site and in blog posts. The intuition behind it is to reduce the impact of a few large noisy differences, e.g. y = [1, 2, 10000], y_p = [1, 2, 3]. I derived the gradient and hessian myself. The general form of this loss is:

$$J(y_p, y) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \bigl[\log(y_{p,i} + 1) - \log(y_i + 1)\bigr]^2}$$
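As a quick numeric illustration of that intuition (an editor-added example, not part of this PR), comparing plain RMSE with RMSLE on the vectors above shows how a single large outlier dominates one but barely moves the other:

```python
import numpy as np

# The example vectors from the comment above: one large noisy target value.
y = np.array([1.0, 2.0, 10000.0])
y_p = np.array([1.0, 2.0, 3.0])

rmse = np.sqrt(np.mean((y_p - y) ** 2))
rmsle = np.sqrt(np.mean((np.log1p(y_p) - np.log1p(y)) ** 2))

print(f'RMSE:  {rmse:.1f}')   # roughly 5.8e3, dominated by the single outlier
print(f'RMSLE: {rmsle:.2f}')  # roughly 4.5, the outlier contributes far less
```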
Member

### RAMitchell commented Jun 9, 2019

 Had a quick look at this today. As you say, I can't find any academic references to this, only implementations in software libraries. Looks interesting though. My attempt at taking derivatives looks very different from what you have; also see here. Can you show your working? Given that there are no references for this, it is a good opportunity to write a blog post for xgboost.ai explaining applications of a custom function. I would do something like generate a synthetic dataset from a Gaussian, add a few outliers from a different source, and then show that this loss function is more robust to large outliers.
Member Author

### trivialfis commented Jun 9, 2019

@RAMitchell

> My attempt at taking derivatives looks very different to what you have, also see here

Thanks! My mistake, I ignored the outer root and square, leading to something like a mean log error... :( Will fix it soon.

> and then show that this loss function is more robust to large outliers

Let me do some experiments later.
Member

### RAMitchell commented Jun 9, 2019 • edited

 The loss function does not necessarily need to include the square root (although it will need the square). This changes the 'steepness' of the function but will not affect convergence, as it preserves the relative magnitude of gradients between training instances. Consider RMSE and the squared error objective: minimising squared error also has the effect of minimising the RMSE metric.
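In symbols (an editor-added restatement of the point above, not new material from the thread): since the square root is monotonically increasing, a model that minimises the summed squared log error also minimises the RMSLE metric, so the boosting objective can drop the root:

```latex
\operatorname*{arg\,min}_{f}\;
\sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl[\log(f(x_i) + 1) - \log(y_i + 1)\bigr]^2}
\;=\;
\operatorname*{arg\,min}_{f}\;
\sum_{i=1}^{N}\bigl[\log(f(x_i) + 1) - \log(y_i + 1)\bigr]^2
```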
Member Author

### trivialfis commented Jun 9, 2019 • edited

@RAMitchell Thanks, that's what I thought; see the latest commit, I just did squared log error. I'm doing some experiments and got confused that some of the predictions coming into the metric are less than -1, while all my labels are greater than 0.
Member

### RAMitchell commented Jun 9, 2019

 I would just work through a very simple example with one training instance then see how it converges compared to your calculations on paper. You can also prepend your loss function with a factor of 1/2 like we do with squared error to get rid of the 2 in the derivative.
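For reference, a worked sketch of those derivatives (editor's own calculation under the 1/2-scaling convention suggested above, not quoted from the patch), writing p for the prediction and y for the label:

```latex
% Per-instance squared log error with the 1/2 factor; p = prediction, y = label.
\mathcal{L}(p, y) = \tfrac{1}{2}\bigl[\log(p + 1) - \log(y + 1)\bigr]^2

% Gradient with respect to p (the 1/2 cancels the 2 from the chain rule).
\frac{\partial \mathcal{L}}{\partial p} = \frac{\log(p + 1) - \log(y + 1)}{p + 1}

% Hessian with respect to p (quotient rule applied to the gradient).
\frac{\partial^2 \mathcal{L}}{\partial p^2} = \frac{1 - \log(p + 1) + \log(y + 1)}{(p + 1)^2}
```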
Member Author

### trivialfis commented Jun 9, 2019 • edited

 Okay, rmsle cannot be used with the squarederror objective, which might generate negative predictions, although I'm not sure why yet. The good news is the new objective looks promising. :) Normal squared error overfits. @RAMitchell Here is a quick run:

```
Squared Error
[0]	dtrain-rmse:391.351	dtest-rmse:340.637
[1]	dtrain-rmse:363.34	dtest-rmse:342.916
[2]	dtrain-rmse:338.147	dtest-rmse:343.795
[3]	dtrain-rmse:316.266	dtest-rmse:345.421
[4]	dtrain-rmse:293.192	dtest-rmse:350.667
[5]	dtrain-rmse:275.919	dtest-rmse:351.936
[6]	dtrain-rmse:259.895	dtest-rmse:353.104
[7]	dtrain-rmse:241.846	dtest-rmse:354.231
[8]	dtrain-rmse:223.262	dtest-rmse:360.915
[9]	dtrain-rmse:213.259	dtest-rmse:361.501
[10]	dtrain-rmse:200.845	dtest-rmse:369.748
[11]	dtrain-rmse:188.686	dtest-rmse:369.981
[12]	dtrain-rmse:178.102	dtest-rmse:370.835
[13]	dtrain-rmse:166.397	dtest-rmse:377.675
[14]	dtrain-rmse:157.123	dtest-rmse:378.786
[15]	dtrain-rmse:150.385	dtest-rmse:384.103
[16]	dtrain-rmse:147.095	dtest-rmse:384.186
[17]	dtrain-rmse:140.522	dtest-rmse:384.017
[18]	dtrain-rmse:130.82	dtest-rmse:386.613
[19]	dtrain-rmse:122.93	dtest-rmse:387.239
Finished Squared Error in: 0.5658330917358398

Squared Log Error
[0]	dtrain-rmsle:1.20707	dtest-rmsle:1.13542
[1]	dtrain-rmsle:1.08544	dtest-rmsle:1.00958
[2]	dtrain-rmsle:0.977126	dtest-rmsle:0.896455
[3]	dtrain-rmsle:0.882881	dtest-rmsle:0.798379
[4]	dtrain-rmsle:0.803323	dtest-rmsle:0.715883
[5]	dtrain-rmsle:0.738639	dtest-rmsle:0.649231
[6]	dtrain-rmsle:0.688928	dtest-rmsle:0.596975
[7]	dtrain-rmsle:0.651433	dtest-rmsle:0.559778
[8]	dtrain-rmsle:0.624716	dtest-rmsle:0.534039
[9]	dtrain-rmsle:0.606279	dtest-rmsle:0.51714
[10]	dtrain-rmsle:0.592096	dtest-rmsle:0.506786
[11]	dtrain-rmsle:0.581805	dtest-rmsle:0.501063
[12]	dtrain-rmsle:0.574321	dtest-rmsle:0.498102
[13]	dtrain-rmsle:0.569913	dtest-rmsle:0.495869
[14]	dtrain-rmsle:0.566171	dtest-rmsle:0.494843
[15]	dtrain-rmsle:0.561991	dtest-rmsle:0.49469
[16]	dtrain-rmsle:0.557604	dtest-rmsle:0.494806
[17]	dtrain-rmsle:0.554808	dtest-rmsle:0.495154
[18]	dtrain-rmsle:0.551964	dtest-rmsle:0.495313
[19]	dtrain-rmsle:0.549387	dtest-rmsle:0.495596
Finished Squared Log Error in: 0.2771596908569336
```

```python
import numpy as np
import xgboost as xgb
from typing import Tuple
from time import time

kRows = 10000
kCols = 100
kOutlier = 10000
kNumberOfOutliers = 64
kRatio = 0.7


def generate_data() -> Tuple[xgb.DMatrix, xgb.DMatrix]:
    x = np.random.randn(kRows, kCols)
    y = np.random.randn(kRows)
    y += np.abs(np.min(y))

    # Create outliers
    for i in range(0, kNumberOfOutliers):
        ind = np.random.randint(0, len(y) - 1)
        y[ind] += np.random.randint(0, kOutlier)

    train_portion = int(kRows * kRatio)
    train_x: np.ndarray = x[:train_portion]
    train_y: np.ndarray = y[:train_portion]
    dtrain = xgb.DMatrix(train_x, label=train_y)

    test_x = x[train_portion:]
    test_y = y[train_portion:]
    dtest = xgb.DMatrix(test_x, label=test_y)
    return dtrain, dtest


def run(dtrain: xgb.DMatrix, dtest: xgb.DMatrix) -> None:
    print('Squared Error')
    squared_error = {
        'objective': 'reg:squarederror',
        'eval_metric': 'rmse',
        'tree_method': 'gpu_hist',
    }
    start = time()
    xgb.train(squared_error, dtrain=dtrain, num_boost_round=20,
              evals=[(dtrain, 'dtrain'), (dtest, 'dtest')])
    print('Finished Squared Error in:', time() - start, '\n')

    print('Squared Log Error')
    squared_log_error = {
        'objective': 'reg:squaredlogerror',
        'eval_metric': 'rmsle',
        'tree_method': 'gpu_hist',
    }
    start = time()
    xgb.train(squared_log_error, dtrain=dtrain, num_boost_round=20,
              evals=[(dtrain, 'dtrain'), (dtest, 'dtest')])
    print('Finished Squared Log Error in:', time() - start)


if __name__ == '__main__':
    dtrain, dtest = generate_data()
    run(dtrain, dtest)
```
Member Author

### trivialfis commented Jun 9, 2019 • edited

@RAMitchell

> I would just work through a very simple example with one training instance then see how it converges compared to your calculations on paper.

Thanks for the suggestion. Let me figure out why the squared error objective can generate negative predictions.

> You can also prepend your loss function with a factor of 1/2 like we do with squared error to get rid of the 2 in the derivative.

That brings some convenience. Will do.
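For completeness, here is a minimal editor-added sketch (not code from this PR, which adds the loss as a built-in reg:squaredlogerror) of what the same 1/2-scaled loss would look like as a user-supplied custom objective passed through the `obj` argument of `xgb.train`. The gradient and hessian follow the derivatives sketched above; the clipping constants are illustrative guards, not values taken from the patch:

```python
import numpy as np
import xgboost as xgb


def squared_log_error_obj(preds: np.ndarray, dtrain: xgb.DMatrix):
    """Custom objective: 1/2 * (log(p + 1) - log(y + 1))^2 per instance."""
    labels = dtrain.get_label()
    # Guard against predictions <= -1, where log1p is undefined (illustrative).
    preds = np.maximum(preds, -1 + 1e-6)
    grad = (np.log1p(preds) - np.log1p(labels)) / (preds + 1)
    hess = (1 - np.log1p(preds) + np.log1p(labels)) / (preds + 1) ** 2
    # Keep the hessian positive so leaf weights stay well defined (illustrative).
    hess = np.maximum(hess, 1e-6)
    return grad, hess


# Usage sketch: pass the callable via the `obj` argument, e.g.
# xgb.train({'tree_method': 'hist'}, dtrain, num_boost_round=20,
#           obj=squared_log_error_obj)
```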
 Add rmsle metric and reg:squaredlogerror objective. (f6b7e01)

### trivialfis force-pushed the trivialfis:rmsle branch from e25c0eb to f6b7e01 on Jun 10, 2019

 Fix windows compilation. (9aa7078)

# Codecov Report

Merging #4541 into master will not change coverage.
The diff coverage is n/a.

```
@@           Coverage Diff           @@
##           master    #4541   +/-   ##
=======================================
  Coverage   79.42%   79.42%
=======================================
  Files          10       10
  Lines        1735     1735
=======================================
  Hits         1378     1378
  Misses        357      357
```

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member Author

### trivialfis commented Jun 10, 2019

> it is a good opportunity to write a blog post for xgboost.ai explaining applications of a custom function

Let me do it in another PR.

### trivialfis merged commit 2f1319f into dmlc:master on Jun 10, 2019

#### 12 checks passed

- Jenkins Linux: Build (Stage built successfully)
- Jenkins Linux: Formatting Check (Stage built successfully)
- Jenkins Linux: Get sources (Stage built successfully)
- Jenkins Linux: Test (Stage built successfully)
- Jenkins Win64: Build (Stage built successfully)
- Jenkins Win64: Get sources (Stage built successfully)
- Jenkins Win64: Test (Stage built successfully)
- codecov/patch (Coverage not affected when comparing 9683fd4...9aa7078)
- codecov/project (79.42% remains the same compared to 9683fd4)
- continuous-integration/appveyor/pr (AppVeyor build succeeded)
- continuous-integration/jenkins/pr-merge (This commit looks good)
- continuous-integration/travis-ci/pr (The Travis CI build passed)

Member Author

### trivialfis commented Jun 10, 2019

 Thanks!