Regression Error at boundaries, is normalization on output required? #158
mchinen changed the title from "Regression Error at boundaries, is normalization required?" to "Regression Error at boundaries, is normalization on output required?" on Dec 18, 2019.
It seems you haven't done proper parameter selection:
./gridregression.py ~/Downloads/mysvmtrainfile.txt
...
[local] -1 -5 -8 0.55566 (best c=16.0, g=1.0, p=0.25, mse=0.294086)
16.0 1.0 0.25 0.294086

libsvm-3.24$ ./svm-train -s 3 -c 16 -g 1 -p 0.25 ~/Downloads/mysvmtrainfile.txt
.*.*
optimization finished, #iter = 1778
nu = 0.509791
obj = -979.425784, rho = -2.770594
nSV = 238, nBSV = 161

libsvm-3.24$ ./svm-predict ~/Downloads/mysvmtrainfile.txt mysvmtrainfile.txt.model o
Mean squared error = 0.208275 (regression)
Squared correlation coefficient = 0.786998 (regression)

A cross-validation r^2 of about 0.78 isn't too bad.

libsvm-3.24$ wc -l o
376 o
libsvm-3.24$ grep -e "4\." o | wc -l
85
libsvm-3.24$ cut -f 1 -d ' ' ~/Downloads/mysvmtrainfile.txt | grep -e "4\." | wc -l
100
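The grep/cut counting above compares how many training labels fall in the 4-to-5 range against how many predictions do. The same sanity check can be reproduced in a few lines of Python. This is an illustrative sketch, not part of libsvm: the sample data is made up, and a numeric range test stands in for the `grep "4\."` pattern.

```python
def labels_from_svm_file(lines):
    # In libsvm format, the first whitespace-separated token on each
    # line is the target value; the rest are index:value features.
    return [float(line.split(None, 1)[0]) for line in lines if line.strip()]

def count_in_range(values, lo, hi):
    # Count values v with lo <= v < hi.
    return sum(lo <= v < hi for v in values)

# Inline stand-ins for the training file and the prediction file `o`
# from the transcript (the data here is illustrative, not the real files).
train_lines = ["4.2 1:0.5 2:0.1", "3.1 1:0.2 2:0.9", "4.8 1:0.7 2:0.3"]
predictions = [3.9, 3.0, 4.1]

print(count_in_range(labels_from_svm_file(train_lines), 4.0, 5.0))  # 2
print(count_in_range(predictions, 4.0, 5.0))                        # 1
```

If noticeably fewer predictions than labels land in the top band, the model is shrinking the extremes, which is exactly what the transcript's 85-vs-100 comparison shows.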
Thanks so much, that does seem to be the issue. I hadn't realized the importance of searching the parameters before reading your PDF, and had used our last model's parameters. I modified grid.py to do a search and found better parameters, which were wildly different. I found I also needed to tune the nu parameter. However, I see my problem was confounded by another issue, which I also resolved.
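The grid.py modification mentioned above isn't shown in the thread, but the shape of such a search is simple. Below is a hedged pure-Python sketch of a (C, gamma, nu) grid loop for `-s 4`; `cv_mse` is a hypothetical stand-in for running `svm-train -s 4 -v 5 -c {c} -g {g} -n {nu}` and parsing the reported cross-validation MSE, replaced here with a synthetic score so the snippet is self-contained.

```python
import itertools
import math

def cv_mse(c, g, nu):
    # Stand-in for invoking svm-train with cross-validation and parsing
    # its MSE output. This synthetic bowl-shaped score has its minimum
    # at c=16, g=1, nu=0.6, purely so the loop has something to find.
    return ((math.log2(c) - 4) ** 2
            + math.log2(g) ** 2
            + (nu - 0.6) ** 2)

# Log2-spaced grids for C and gamma, a linear grid for nu (nu must lie
# in (0, 1]); the exact ranges here are illustrative choices.
c_grid = [2.0 ** e for e in range(-2, 8)]
g_grid = [2.0 ** e for e in range(-8, 2)]
nu_grid = [0.2, 0.4, 0.6, 0.8]

best = min(itertools.product(c_grid, g_grid, nu_grid),
           key=lambda t: cv_mse(*t))
print(best)  # the (c, g, nu) triple with the lowest stand-in score
```

In a real run, `cv_mse` would shell out to svm-train once per grid point, so coarse grids first and a refined grid around the best point is the usual way to keep the cost manageable.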
When training with
svm-train -s 4 -t 2 -n .6 -c .4 <myfile>
I find that the predictions are heavily compressed. For example, myfile has labels in the 1 to 5 range, with a significant portion in 4 to 5, but the highest predicted value on the train set is below 4.0. There also seem to be fewer predictions in the 1.0 to 2.0 range.
I've played with NU_SVR and EPSILON_SVR and the other parameters and haven't found a good solution. Here is my train file. Even when normalizing the labels to 0-1 I get the same behavior, where the highest predicted value is .72.
First, I'd like to know if I'm doing something incorrectly. Next, if the model is correct, why are the predictions so compressed? I would like them to be closer to the boundaries of the training labels. I understand that we would expect some compression towards the mean in regression, but this seems like more than I would expect. Should I normalize the predicted output to match the input label distribution?
Unnormalized: mysvmtrainfile.txt (https://github.com/cjlin1/libsvm/files/3980481/mysvmtrainfile.txt)
Normalized: normsvmtrain.txt (https://github.com/cjlin1/libsvm/files/3980504/normsvmtrain.txt)
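On the normalization question: a purely linear rescaling of the labels cannot change the relative compression, and the numbers above bear that out. If the 1-to-5 labels are mapped linearly onto 0-1, a top prediction of 0.72 corresponds to 1 + 0.72 * 4 = 3.88 on the original scale, i.e. still below 4.0, matching the unnormalized run. A small helper for scaling labels and mapping predictions back (svm-scale's -y option can do the forward scaling; `scale_labels` below is an illustrative pure-Python equivalent that also returns the inverse map, not a libsvm utility):

```python
def scale_labels(y, lo=-1.0, hi=1.0):
    """Linearly map labels onto [lo, hi]; return scaled labels and the
    inverse map for converting predictions back to the original scale."""
    y_min, y_max = min(y), max(y)
    span = y_max - y_min

    def inv(s):
        return y_min + (s - lo) * span / (hi - lo)

    scaled = [lo + (v - y_min) * (hi - lo) / span for v in y]
    return scaled, inv

# Illustrative labels in the 1-to-5 range from the post.
labels = [1.0, 2.5, 4.0, 5.0]
scaled, unscale = scale_labels(labels)
print(scaled)        # [-1.0, -0.25, 0.5, 1.0]
print(unscale(0.5))  # 4.0
```

The inverse map is the right way to report predictions on the original scale; stretching the predictions further than that to force them to span the full label range is a separate post-hoc calibration choice, not something the linear rescale gives you.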