
Standard errors too small when performing weighted fit #103

Closed

fincardona opened this issue Jan 21, 2019 · 10 comments

@fincardona

fincardona commented Jan 21, 2019

Hello,

I want to fit some data with this model:
@. model(z, p) = p[1] / cos(deg2rad(z)) + p[2]

When z = collect(0:5:50) and p = [A, B] (where A = 1.408363211310102e-6 and
B = 1.4034083334666746e-5) I get my theoretical curve.

The data I want to fit are y. I also know the standard deviation of y; let's call it delta_y.

When I fit my data with:
model_fit = LsqFit.curve_fit(model, z, y, p0) I obtain a reasonable standard error.
When I instead weight the fit with the inverse of the variance:
model_fit = LsqFit.curve_fit(model, z, y, 1 ./ delta_y.^2, p0) I get an absurdly small standard error, about 13 orders of magnitude smaller than in the previous case.

How is this possible?

I have:
y = [1.54378e-5, 1.54341e-5, 1.55381e-5, 1.54877e-5, 1.55381e-5, 1.56225e-5, 1.56568e-5, 1.56822e-5, 1.58282e-5, 1.60016e-5, 1.62545e-5] ;

delta_y = [1.0376e-7, 1.05538e-7, 1.82407e-7, 2.07246e-7, 1.65643e-7, 1.76996e-7, 1.75132e-7, 2.05185e-7, 1.28095e-7, 1.07386e-7, 1.9007e-7] ;

p0 = [0., 0.] ;

To compute the standard error I do:
LsqFit.standard_error(model_fit).

Thanks

Edit from @pkofod:

using LsqFit
@. model(z, p) = p[1] / cos(deg2rad(z)) + p[2]
z = collect(0:5:50)
A = 1.408363211310102e-6
B = 1.4034083334666746e-5
p = [A, B]

y = [1.54378e-5, 1.54341e-5, 1.55381e-5, 1.54877e-5, 1.55381e-5, 1.56225e-5, 1.56568e-5, 1.56822e-5, 1.58282e-5, 1.60016e-5, 1.62545e-5]
delta_y = [1.0376e-7, 1.05538e-7, 1.82407e-7, 2.07246e-7, 1.65643e-7, 1.76996e-7, 1.75132e-7, 2.05185e-7, 1.28095e-7, 1.07386e-7, 1.9007e-7] 
p0 = [0., 0.]

model_fit = LsqFit.curve_fit(model, z, y, p0)
model_fit_weighted = LsqFit.curve_fit(model, z, y, 1 ./delta_y.^2, p0) 

LsqFit.stderror(model_fit)
LsqFit.stderror(model_fit_weighted)
fincardona changed the title from "Standard errors too big when performing weighted fit" to "Standard errors too small when performing weighted fit" on Jan 21, 2019
@pkofod
Member

pkofod commented Jan 21, 2019

Have you tried master? There have been a lot of fixes wrt these things, but there hasn't been a new release in some time.

@fincardona
Author

Yes I did

@pkofod
Member

pkofod commented Mar 19, 2019

> Yes I did

This is what I get on the latest release:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.1.0 (2019-01-21)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using LsqFit

julia> @. model(z, p) = p[1] / cos(deg2rad(z)) + p[2]
model (generic function with 1 method)

julia> z = collect(0:5:50)
11-element Array{Int64,1}:
  0
  5
 10
 15
 20
 25
 30
 35
 40
 45
 50

julia> A = 1.408363211310102e-6
1.408363211310102e-6

julia> B = 1.4034083334666746e-5
1.4034083334666746e-5

julia> p = [A, B]
2-element Array{Float64,1}:
 1.408363211310102e-6 
 1.4034083334666746e-5

julia> y = [1.54378e-5, 1.54341e-5, 1.55381e-5, 1.54877e-5, 1.55381e-5, 1.56225e-5, 1.56568e-5, 1.56822e-5, 1.58282e-5, 1.60016e-5, 1.62545e-5]
11-element Array{Float64,1}:
 1.54378e-5
 1.54341e-5
 1.55381e-5
 1.54877e-5
 1.55381e-5
 1.56225e-5
 1.56568e-5
 1.56822e-5
 1.58282e-5
 1.60016e-5
 1.62545e-5

julia> delta_y = [1.0376e-7, 1.05538e-7, 1.82407e-7, 2.07246e-7, 1.65643e-7, 1.76996e-7, 1.75132e-7, 2.05185e-7, 1.28095e-7, 1.07386e-7, 1.9007e-7]
11-element Array{Float64,1}:
 1.0376e-7 
 1.05538e-7
 1.82407e-7
 2.07246e-7
 1.65643e-7
 1.76996e-7
 1.75132e-7
 2.05185e-7
 1.28095e-7
 1.07386e-7
 1.9007e-7 

julia> p0 = [0., 0.]
2-element Array{Float64,1}:
 0.0
 0.0

julia> model_fit = LsqFit.curve_fit(model, z, y, p0)
LsqFit.LsqFitResult{Array{Float64,1},Array{Float64,1},Array{Float64,2},Array{Float64,1}}([1.36546e-6, 1.40822e-5], [9.85793e-9, 1.87738e-8, -6.93777e-8, 8.12605e-9, -2.81006e-9, -3.36839e-8, 2.09492e-9, 6.69162e-8, 3.64786e-8, 1.16489e-8, -4.80249e-8], [1.0 1.0; 1.00382 1.0; … ; 1.41421 1.0; 1.55572 1.0], true, Float64[])

julia> model_fit_weighted = LsqFit.curve_fit(model, z, y, 1 ./delta_y.^2, p0)
LsqFit.LsqFitResult{Array{Float64,1},Array{Float64,1},Array{Float64,2},Array{Float64,1}}([1.36088e-6, 1.40818e-5], [0.0466688, 0.130197, -0.408229, 0.0142291, -0.0490184, -0.221321, -0.0207224, 0.296754, 0.234703, 0.0441051, -0.292448], [9.63763e6 9.63763e6; 9.51145e6 9.47526e6; … ; 1.31694e7 9.3122e6; 8.185e6 5.26122e6], true, [9.28838e13, 8.97806e13, 3.0055e13, 2.32824e13, 3.64463e13, 3.19208e13, 3.26039e13, 2.37525e13, 6.09447e13, 8.67171e13, 2.76804e13])

julia> LsqFit.stderror(model_fit)
2-element Array{Float64,1}:
 6.899145801254534e-8
 8.165418175339927e-8

julia> LsqFit.stderror(model_fit_weighted)
2-element Array{Float64,1}:
 2.403839002660062e-7
 2.829232824215933e-7

@pkofod
Member

pkofod commented Mar 19, 2019

Please reopen if you disagree that this is fixed.

pkofod closed this as completed on Mar 19, 2019
@jebej
Contributor

jebej commented Jul 15, 2019

If a vector of 1s is passed as weights, the errors do not match the unweighted fit. I assume this is because of the missing MSE scaling when estimating the covariance matrix in the else case here:

LsqFit.jl/src/curve_fit.jl

Lines 195 to 204 in 89c5b20

if isempty(fit.wt)
    r = fit.resid
    # compute the covariance matrix from the QR decomposition
    Q, R = qr(J)
    Rinv = inv(R)
    covar = Rinv*Rinv'*mse(fit)
else
    covar = inv(J'*J)
end
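
For illustration, the scaling in question would amount to changing the else branch to something like this (a sketch only, not a tested patch; fit, J and mse are as in the snippet above):

if isempty(fit.wt)
    r = fit.resid
    # unweighted case: covariance from the QR decomposition, scaled by the MSE
    Q, R = qr(J)
    Rinv = inv(R)
    covar = Rinv*Rinv'*mse(fit)
else
    # weighted case: same formula as before, but with the MSE factor added
    covar = inv(J'*J)*mse(fit)
end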

@jebej
Contributor

jebej commented Jul 15, 2019

I am not an expert, however, and while adding the scaling makes the two cases match up, I am not sure that it is the correct thing to do.

@pkofod
Member

pkofod commented Jul 17, 2019

Would you mind posting a simple example?

@jebej
Contributor

jebej commented Jul 23, 2019

using LsqFit

seq_lengths = [1,2,3,4,6,9,13,18,26,38,55,78,113,162,234,336,483,695,1000];

mean_Pg = [0.9965055343611369,0.9936278616188986,0.991402684125329,0.988779893392475,0.9839052359078795,
0.97561643370953,0.967542364407281,0.9591518093474497,0.9158611491602895,0.9107968163163118,
0.8780392546173316,0.8034082322818229,0.7858116656065092,0.7211658805816595,0.6547494271420033,
0.5966066491668779,0.5391289873890097,0.5149058476594188,0.5032240784925216];

exp_decay = (m,p) -> p[2].*p[1].^m .+ p[3];

# no weights (i.e. equal weights)
fitres1 = curve_fit(exp_decay,seq_lengths,mean_Pg,[1,0.5,0.5]);
@show stderror(fitres1)

# equal weights
fitres2 = curve_fit(exp_decay,seq_lengths,mean_Pg,one.(mean_Pg),[1,0.5,0.5]);
@show stderror(fitres2)

which gives

stderror(fitres1) = [0.00021317, 0.00714513, 0.00699326]
stderror(fitres2) = [0.0221648, 0.74293, 0.72714]
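
If the missing factor is the whole story, the two results should differ only by the square root of the estimated residual variance. A quick check (a sketch reusing fitres1 and fitres2 from above; the resid field name follows the source snippet quoted earlier):

dof = length(mean_Pg) - 3                  # observations minus number of fitted parameters
msehat = sum(abs2, fitres1.resid) / dof    # estimated residual variance (MSE)
@show stderror(fitres2) .* sqrt(msehat)    # should roughly reproduce stderror(fitres1)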

@pkofod
Member

pkofod commented Aug 27, 2019

I never got around to this, but I think this is to be expected and is what is stated in the manual. When you pass in your own weights, you bypass that mechanism: essentially, you are assuming variance one on all the residuals.
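
To spell out the two conventions being contrasted here (a sketch; W stands for a weight matrix such as Diagonal(w), and none of these names are LsqFit internals):

using LinearAlgebra

# Weights taken as exact inverse variances w_i = 1/σ_i²: the weighted residuals
# then have unit variance, so no further scaling of the covariance is needed.
covar_exact(J, W) = inv(J' * W * J)

# Weights taken as merely relative: the common residual variance is unknown and
# has to be estimated from the weighted residuals, hence the extra MSE factor.
function covar_relative(J, W, resid)
    n, p = size(J)
    inv(J' * W * J) * sum(abs2, resid) / (n - p)
end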

@jebej
Contributor

jebej commented Aug 27, 2019

I'm not sure that makes sense; shouldn't the case where you are not passing weights correspond to uniform weights?
