add _fit_auto_regression functions #139
mesmer/calibrate_mesmer/train_gv.py
Outdated
params_gv["AR_order_sel"] = AR_order_sel
params_gv["AR_std_innovs"] = 0

res = list()
Rename to `result`? `res` might stand for residuals.
Yep, although if the function is small enough it will be clear from the context.
Codecov Report
@@            Coverage Diff             @@
##           master     #139      +/-   ##
==========================================
+ Coverage   79.43%   79.59%   +0.15%
==========================================
  Files          29       30       +1
  Lines        1405     1416      +11
==========================================
+ Hits         1116     1127      +11
  Misses        289      289
)

# TODO: are the names appropriate?
data_vars = {"intercept": intercept, "coeffs": coeffs, "standard_deviation": std}
TODO: attach the order
I'm always in favour of full names, so `coefficients` rather than `coeffs`, but I don't mind too much.
Beautiful
Standard deviation of the residuals.
"""

from statsmodels.tsa.ar_model import AutoReg
Why import here rather than at the top level, wrapped in a try/except? Is that an xarray pattern?
I did this for the linear regression because the sklearn class has the same name as ours (`LinearRegression`), so I just followed the same pattern.
Hmm, this is indeed tricky. Can we split it into two steps: calculate the coeffs for each scenario separately, then average over them as a second step? That might help us to have the same internal patterns, even if the interfaces do different stuff.
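The two-step split suggested here could be sketched as follows. This is an illustration under assumptions, not code from the PR: the names `fit_per_scenario` and `average_over_scenarios` are hypothetical, `fit_func` stands in for any per-run fitting routine that returns `(intercept, coeffs, std)`, each scenario is assumed to be a 2D array of shape `(n_runs, n_time)`, and taking a plain arithmetic mean of every parameter (including the standard deviation) is an assumption.

```python
import numpy as np


def fit_per_scenario(gv, fit_func, lags):
    """Step 1 (hypothetical): fit each run, then average over the runs of a scenario.

    gv maps scenario name -> array of shape (n_runs, n_time).
    fit_func(run, lags) must return (intercept, coeffs, std).
    """
    per_scen = {}
    for scen, data in gv.items():
        fits = [fit_func(run, lags) for run in data]
        intercepts, coeffs, stds = zip(*fits)
        per_scen[scen] = (
            float(np.mean(intercepts)),
            np.mean(coeffs, axis=0),  # mean per lag, shape (lags,)
            float(np.mean(stds)),
        )
    return per_scen


def average_over_scenarios(per_scen):
    """Step 2 (hypothetical): average the per-scenario parameters."""
    intercepts, coeffs, stds = zip(*per_scen.values())
    return float(np.mean(intercepts)), np.mean(coeffs, axis=0), float(np.mean(stds))
```

Because step 1 returns the per-scenario parameters before any cross-scenario averaging, the same fitting pattern could be reused by code that never averages over scenarios.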
AR_int_tmp = 0
AR_coefs_tmp = np.zeros(AR_order_sel)
AR_std_innovs_tmp = 0
data = gv[scen]
If we split out a function for this inner loop, we might be able to make the pattern look more like linear regression does
Although I guess it's almost as small as it can be already. The inner function would have to be called something like `_fit_auto_regression_with_mean_over_runs`, I guess.
(Side note: what does 'run' mean here? Is that an ensemble member?)
I think this is definitely a good idea. However, I am still not sure what our "outer" data structure should be (I have a prototype of a `DataList` but I am not entirely convinced).
`run` is an ensemble member. I just followed Lea's terminology. But I agree we should standardize this stuff...
Thanks for the feedback. I am still very unsure what this whole thing should look like in the end, but I think these changes make sense regardless... So I would probably merge this more or less as is, try to refactor the next chunk, and hope this helps me see some patterns... I have a slight preference for shorter names (as long as their meaning is clear, which is subjective, so maybe I should just use the long ones anyway :-P).
isort . && black . && flake8
CHANGELOG.rst
Adds `_fit_auto_regression_xr` and `_fit_auto_regression_np` - thin wrappers around `statsmodels.tsa.ar_model.AutoReg`.
In contrast to the linear regression I have:
`DataArray` - I think that simplifies the code and looks pretty neat
I am not sure we can follow exactly the same pattern as for the `LinearRegression` class because the coeffs are averaged over the scens. It would be nice to have the same pattern for both but I am not sure which will have to give...
@yquilcaille @znicholls