Regression Error at boundaries, is normalization on output required? #158
mchinen changed the title from "Regression Error at boundaries, is normalization required?" to "Regression Error at boundaries, is normalization on output required?" on Dec 18, 2019.
It seems you haven't done proper parameter selection:
./gridregression.py ~/Downloads/mysvmtrainfile.txt
...
[local] -1 -5 -8 0.55566 (best c=16.0, g=1.0, p=0.25, mse=0.294086)
16.0 1.0 0.25 0.294086

libsvm-3.24$ ./svm-train -s 3 -c 16 -g 1 -p 0.25 ~/Downloads/mysvmtrainfile.txt
.*.*
optimization finished, #iter = 1778
nu = 0.509791
obj = -979.425784, rho = -2.770594
nSV = 238, nBSV = 161

libsvm-3.24$ ./svm-predict ~/Downloads/mysvmtrainfile.txt mysvmtrainfile.txt.model o
Mean squared error = 0.208275 (regression)
Squared correlation coefficient = 0.786998 (regression)

A cross-validation r^2 of about 0.78 isn't too bad.

libsvm-3.24$ wc -l o
376 o
libsvm-3.24$ grep -e "4\." o | wc -l
85
libsvm-3.24$ cut -f 1 -d ' ' ~/Downloads/mysvmtrainfile.txt | grep -e "4\." | wc -l
100
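The grep/cut counting above compares how many training labels fall in the 4-to-5 range against how many predictions do. The same sanity check can be reproduced in a few lines of Python. This is an illustrative sketch, not part of libsvm: the sample data is made up, and a numeric range test stands in for the `grep "4\."` pattern.

```python
def labels_from_svm_file(lines):
    # In libsvm format, the first whitespace-separated token on each
    # line is the target value; the rest are index:value features.
    return [float(line.split(None, 1)[0]) for line in lines if line.strip()]

def count_in_range(values, lo, hi):
    # Count values v with lo <= v < hi.
    return sum(lo <= v < hi for v in values)

# Inline stand-ins for the training file and the prediction file `o`
# from the transcript (the data here is illustrative, not the real files).
train_lines = ["4.2 1:0.5 2:0.1", "3.1 1:0.2 2:0.9", "4.8 1:0.7 2:0.3"]
predictions = [3.9, 3.0, 4.1]

print(count_in_range(labels_from_svm_file(train_lines), 4.0, 5.0))  # 2
print(count_in_range(predictions, 4.0, 5.0))                        # 1
```

If noticeably fewer predictions than labels land in the top band, the model is shrinking the extremes, which is exactly what the transcript's 85-vs-100 comparison shows.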
Thanks so much, that does seem to be the issue. I hadn't realized the importance of searching the parameters before reading your PDF, and had used our last model's parameters. I modified grid.py to do a search and found better parameters, which were wildly different. I found I also needed to tune the nu parameter. However, I see my problem was confounded by another issue, which I also resolved.
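The grid.py modification mentioned above isn't shown in the thread, but the shape of such a search is simple. Below is a hedged pure-Python sketch of a (C, gamma, nu) grid loop for `-s 4`; `cv_mse` is a hypothetical stand-in for running `svm-train -s 4 -v 5 -c {c} -g {g} -n {nu}` and parsing the reported cross-validation MSE, replaced here with a synthetic score so the snippet is self-contained.

```python
import itertools
import math

def cv_mse(c, g, nu):
    # Stand-in for invoking svm-train with cross-validation and parsing
    # its MSE output. This synthetic bowl-shaped score has its minimum
    # at c=16, g=1, nu=0.6, purely so the loop has something to find.
    return ((math.log2(c) - 4) ** 2
            + math.log2(g) ** 2
            + (nu - 0.6) ** 2)

# Log2-spaced grids for C and gamma, a linear grid for nu (nu must lie
# in (0, 1]); the exact ranges here are illustrative choices.
c_grid = [2.0 ** e for e in range(-2, 8)]
g_grid = [2.0 ** e for e in range(-8, 2)]
nu_grid = [0.2, 0.4, 0.6, 0.8]

best = min(itertools.product(c_grid, g_grid, nu_grid),
           key=lambda t: cv_mse(*t))
print(best)  # the (c, g, nu) triple with the lowest stand-in score
```

In a real run, `cv_mse` would shell out to svm-train once per grid point, so coarse grids first and a refined grid around the best point is the usual way to keep the cost manageable.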
When training with
svm-train -s 4 -t 2 -n .6 -c .4 <myfile>
I find that the predictions are heavily compressed. For example, myfile has labels in the 1 to 5 range, with a significant portion in 4 to 5, but the highest predicted value on the train set is below 4.0. There also seem to be fewer predictions in the 1.0 to 2.0 range.
I've played with NU_SVR and EPSILON_SVR and the other parameters and haven't found a good solution. Here is my train file. Even when normalizing the labels to 0-1 I get the same behavior, where the highest predicted value is .72.
First, I'd like to know if I'm doing something incorrectly. Next, if the model is correct, why are the predictions so compressed? I would like them to be closer to the boundaries of the training labels. I understand that we would expect some compression towards the mean in regression, but this seems like more than I would expect. Should I normalize the predicted output to match the input label distribution?
Unnormalized: mysvmtrainfile.txt (https://github.com/cjlin1/libsvm/files/3980481/mysvmtrainfile.txt)
Normalized: normsvmtrain.txt (https://github.com/cjlin1/libsvm/files/3980504/normsvmtrain.txt)
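On the normalization question: a purely linear rescaling of the labels cannot change the relative compression, and the numbers above bear that out. If the 1-to-5 labels are mapped linearly onto 0-1, a top prediction of 0.72 corresponds to 1 + 0.72 * 4 = 3.88 on the original scale, i.e. still below 4.0, matching the unnormalized run. A small helper for scaling labels and mapping predictions back (svm-scale's -y option can do the forward scaling; `scale_labels` below is an illustrative pure-Python equivalent that also returns the inverse map, not a libsvm utility):

```python
def scale_labels(y, lo=-1.0, hi=1.0):
    """Linearly map labels onto [lo, hi]; return scaled labels and the
    inverse map for converting predictions back to the original scale."""
    y_min, y_max = min(y), max(y)
    span = y_max - y_min

    def inv(s):
        return y_min + (s - lo) * span / (hi - lo)

    scaled = [lo + (v - y_min) * (hi - lo) / span for v in y]
    return scaled, inv

# Illustrative labels in the 1-to-5 range from the post.
labels = [1.0, 2.5, 4.0, 5.0]
scaled, unscale = scale_labels(labels)
print(scaled)        # [-1.0, -0.25, 0.5, 1.0]
print(unscale(0.5))  # 4.0
```

The inverse map is the right way to report predictions on the original scale; stretching the predictions further than that to force them to span the full label range is a separate post-hoc calibration choice, not something the linear rescale gives you.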