
add _fit_auto_regression functions #139

Merged
merged 8 commits into from Apr 7, 2022

Conversation

@mathause (Member) commented Mar 31, 2022

  • Tests added
  • Passes isort . && black . && flake8
  • Fully documented, including CHANGELOG.rst

Adds _fit_auto_regression_xr and _fit_auto_regression_np - thin wrappers around statsmodels.tsa.ar_model.AutoReg.

In contrast to the linear regression I have:

  • reworked the internal code by creating a dummy DataArray - I think that simplifies the code and looks pretty neat
  • not written any 'numerical' tests - it only checks the shape etc. of the results

I am not sure we can follow exactly the same pattern as for the LinearRegression class because the coeffs are averaged over the scens. It would be nice to have the same pattern for both but I am not sure which will have to give...

@yquilcaille @znicholls

params_gv["AR_order_sel"] = AR_order_sel
params_gv["AR_std_innovs"] = 0

res = list()
mathause (Member, Author):

Rename to result? res might stand for residuals.

Collaborator:

Yep, although if the function is small enough it will be clear from the context.

Resolved review threads (now outdated) on:

  • mesmer/core/auto_regression.py
  • mesmer/calibrate_mesmer/train_gv.py
  • tests/integration/test_auto_regression.py (8 threads)
@codecov-commenter commented Apr 1, 2022

Codecov Report

Merging #139 (7d286f8) into master (71a9303) will increase coverage by 0.15%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #139      +/-   ##
==========================================
+ Coverage   79.43%   79.59%   +0.15%     
==========================================
  Files          29       30       +1     
  Lines        1405     1416      +11     
==========================================
+ Hits         1116     1127      +11     
  Misses        289      289              
Flag Coverage Δ
unittests 79.59% <100.00%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
mesmer/calibrate_mesmer/train_gv.py 84.72% <100.00%> (-0.62%) ⬇️
mesmer/calibrate_mesmer/train_lv.py 81.66% <100.00%> (-0.60%) ⬇️
mesmer/core/auto_regression.py 100.00% <100.00%> (ø)
mesmer/core/utils.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

)

# TODO: are the names appropriate?
data_vars = {"intercept": intercept, "coeffs": coeffs, "standard_deviation": std}
mathause (Member, Author):

TODO: attach the order
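One way the order could be attached (a sketch with made-up stand-in values; whether it belongs in `attrs` or as a scalar data variable is an open design choice):

```python
import numpy as np
import xarray as xr

# hypothetical stand-in values for the fit results
lags = 1
intercept = xr.DataArray(0.1)
coeffs = xr.DataArray(np.array([0.5]), dims="lags")
std = xr.DataArray(0.2)

data_vars = {"intercept": intercept, "coeffs": coeffs, "standard_deviation": std}

# attach the AR order so it travels together with the fitted parameters
result = xr.Dataset(data_vars, attrs={"order": lags})
```

Storing it in `attrs` keeps the variables clean, at the cost of attributes being dropped by some xarray operations unless `keep_attrs` is set.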

Collaborator:

I'm always in favour of full names so coefficients rather than coeffs but I don't mind too much

@znicholls (Collaborator) left a comment:

Beautiful

Standard deviation of the residuals.
"""

from statsmodels.tsa.ar_model import AutoReg
Collaborator:

Why here and not wrapped in try except in top level? Is that an xarray pattern?

mathause (Member, Author):

I did this for the linear regression because the sklearn class has the same name as ours (LinearRegression) so I just followed the same pattern.
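The pattern in question, illustrated with a stdlib name so the example is self-contained: importing inside the function keeps the external name out of the module namespace, so it cannot clash with a same-named local class:

```python
class Counter:
    # our own class that happens to share its name with collections.Counter,
    # analogous to mesmer's LinearRegression vs sklearn's LinearRegression
    def count(self, items):
        # local import: collections.Counter never enters the module
        # namespace, so the two names cannot shadow each other
        from collections import Counter

        return Counter(items)
```

A top-level `import` under an alias (e.g. `from sklearn.linear_model import LinearRegression as _SklearnLR`) would avoid the clash too; the local import additionally defers loading the dependency until it is actually used.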

@znicholls (Collaborator) commented:

> I am not sure we can follow exactly the same pattern as for the LinearRegression class because the coeffs are averaged over the scens. It would be nice to have the same pattern for both but I am not sure which will have to give

Hmm this is indeed tricky. Can we split into two steps: calculate coeffs for each scenario separately, then average over as a second? That might help us to have the same internal patterns, even if the interfaces do different stuff

AR_int_tmp = 0
AR_coefs_tmp = np.zeros(AR_order_sel)
AR_std_innovs_tmp = 0
data = gv[scen]
Collaborator:

If we split out a function for this inner loop, we might be able to make the pattern look more like linear regression does

Collaborator:

Although I guess it's almost as small as it can be already, the inner function would have to be called something like _fit_auto_regression_with_mean_over_runs I guess.

(Side note, what does 'run' mean here? Is that ensemble member?)

mathause (Member, Author):

I think this is definitely a good idea. However, I am still not sure what our "outer" data structure should be (I have a prototype of a DataList but I am not entirely convinced).

run is an ensemble member. I just followed Lea's terminology. But I agree we should standardize this stuff...

@mathause (Member, Author) commented Apr 3, 2022

Thanks for the feedback. I am still very unsure how this whole thing should look in the end, but I think these changes make sense regardless... So I would probably merge this more or less as is, try to refactor the next chunk, and hope that helps me to see some patterns...

I have a slight preference towards shorter names (as long as their meaning is clear, which is subjective so maybe I should just use the long ones anyway :-P).

@mathause mathause merged commit 91a2868 into MESMER-group:master Apr 7, 2022
@mathause mathause deleted the add_fit_auto_regression branch April 7, 2022 11:04
3 participants