Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle case of negative model_ss for IV2SLS #101

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jsr-p
Copy link
Contributor

@jsr-p jsr-p commented Feb 8, 2024

When using the package I got an ValueError: math domain error when creating a table for IV2SLS models.
The reason for this is that we take the sqrt when computing the resid_std_error and in my case, the model_ss was negative.
Yes, the residual sum of squares ( model_ss) can be negative in the context of IV estimation; see this great answer.
This PR creates a function that handles this case.
However, I am not sure whether setting it to nan is the best solution, although at least this signals to the user that the model_ss is not interpretable; it should be dropped altogether for IV models.

@toobaz
Copy link
Collaborator

toobaz commented Feb 8, 2024

Hi @jsr-p , thanks for reporting this!

I am not sure nan is what we want: indeed dropping it altogether might be more appropriate.
I see in the Stata page linked by the stats.stackexchange post: "Whether a negative R2 should be reported or simply suppressed is a matter of taste."

I have no preference, do you?

@jsr-p
Copy link
Contributor Author

jsr-p commented Feb 8, 2024

When I tried to exclude it another error got raised later on while it assumes that resid_std_error is available for all objects.
We could insert an if statement at that point.
However, I do like that people get enlightened indirectly when noticing an nan for their R2 :D

I do not have any preferences 😄

@toobaz
Copy link
Collaborator

toobaz commented Feb 8, 2024

When I tried to exclude it another error got raised later on while it assumes that resid_std_error is available for all objects.

I think (but I should double check) that to omit it you should just set it to None.

However, I do like that people get enlightened indirectly when noticing an nan for their R2 :D

Well, if the goal is (some form of) enlightening, I would go for the negative one. Except I should understand how to compute it.

@jsr-p
Copy link
Contributor Author

jsr-p commented Feb 8, 2024

Hmm, thinking more about it, and actually opening up the good ol' c&t book + the documentation of linearmodels I have come to the conclusion that we should take the square root of the s2 attribute for all models coming from linearmodels.
The s2 attribute is the estimated variance of the residuals. The square root of this is the estimated standard deviation of the residuals i.e. the residual standard error.
if debiased=true then the s2 will have the degrees of freedom correction n / (n - k) (which is false by default :o).
The resulting resid_std_err value is also the one used to compute the standard errors shown in the regression output.
I.e. this is the actual $\hat{\sigma}$ that is interesting when interpreting the regression output.
This also holds for the panel data models from linearmodels.

The problem in the IV case can be seen here in the source code. The model_ss are not the residual sum of squares that are used to compute the standard errors.

I.e. the solution is actually simpler :=)

Update resid_std_error linearmodels

Update resid_std_error linearmodels
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants