Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converted dat file to json file to remove XGBoost versioning issues. #304

Merged
merged 15 commits into from Dec 20, 2021

Conversation

kperrynrel
Copy link
Collaborator

@kperrynrel kperrynrel commented Dec 1, 2021

  • Code changes are covered by tests
  • Code changes have been evaluated for compatibility/integration with TrendAnalysis
  • New functions added to __init__.py
  • API.rst is up to date, along with other sphinx docs pages
  • Example notebooks are rerun and differences in results scrutinized
  • Updated changelog

@kperrynrel kperrynrel linked an issue Dec 1, 2021 that may be closed by this pull request
@kandersolar
Copy link
Member

Cool! We should also change xgboost >= 1.3.3, <1.5.0 back to xgboost >= 1.3.3 in setup.py and make sure the CI still passes with the latest version of xgboost. And I guess make a new v2.1.2 changelog file with a note about removing the version restriction.

This PR is currently going to master (not development); just pointing that out in case it doesn't match @mdeceglie's release plan.

@kperrynrel
Copy link
Collaborator Author

image
So I've got the model loading but I keep getting a weird MacOS error for the requirements.txt that I've never seen before-- @kanderso-nrel , @mdeceglie have you seen this before? Is there a way to get libomp into the MacOS environment?

@kandersolar
Copy link
Member

@mdeceglie ran into that a while ago, check the thread in the rdtools teams channel from Aug 27. Seems like he fixed it by installing from conda; maybe we should do the same for the OS X tests.

Those tests were passing before though, so maybe for some reason it's caused by the new versions you're specifying in setup.py and requirements.txt? Are those changes necessary for some other reason or can we revert them?

@kandersolar
Copy link
Member

Oh, it's the requirements job that fails, not the pytest one, and specifically it's the import rdtools step:

- name: Import RdTools
run: |
python -c "import rdtools" # note: this imports from pwd, not site-packages

We could just remove that step -- it seemed like a good idea to include, but that step isn't actually necessary for the job to fulfill its purpose, so I'd say let's just remove it if it's causing problems. @kperrynrel, want to delete those workflow lines and see what happens here?

We might have to do something with conda if we pursue #303, but we can cross that bridge when we come to it.

@kperrynrel
Copy link
Collaborator Author

@kanderso-nrel removed the rdtools import and it's all passing. Let me know what you guys think!

setup.py Outdated
@@ -45,7 +45,7 @@
'h5py >= 2.7.1',
'plotly>=4.0.0',
'joblib >= 0.16.0',
'xgboost >= 1.3.3, <1.5.0',
'xgboost >= 1.5.1',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for updating the minimum? I thought the motivation for switching to JSON was so that we wouldn't have to worry about keeping track of xgboost versions. 1.3.3 seems to work for me locally with the JSON model, so maybe we don't actually need to change the version to 1.5.1 here and in the two requirements files? So long as it's not a headache for us it's nice to not be overly restrictive on dependency version ranges, so keeping 1.3.3 would be preferable if it doesn't break anything.

@kperrynrel
Copy link
Collaborator Author

@kanderso-nrel I just updated it in a previous push when I was trying to debug the failing rdtools import. I can drop back down again, it'll work

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @kperrynrel! I'll defer to @mdeceglie for what he wants to do about v2.1.2, changelog file, etc

@mdeceglie mdeceglie changed the base branch from master to v2.1.2 December 8, 2021 23:56
@kandersolar
Copy link
Member

Oh, can we remove joblib as an explicit dependency in setup.py now that we aren't using joblib.load anymore? It gets pulled in away because scikit-learn needs it, but if we're not using it directly then there's no need for us to track the dependency ourselves.

@mdeceglie
Copy link
Collaborator

As an integration test, I installed this in a fresh environment and ran TrendAnalysis_example_pvdaq4.ipynb and I am getting different results in the "Modifying and inspecting the filters" section. Can others reproduce this? This varies from the existing result
Screen Shot 2021-12-09 at 11 23 37 AM

@kperrynrel
Copy link
Collaborator Author

@kanderso-nrel successfully removed joblib in new commit

@kperrynrel
Copy link
Collaborator Author

@mdeceglie I did redownload the package in the new environment and was able to reproduce your results:
image
I had to regenerate the model as a JSON, and that involved retraining it with the same values. I didn't set a seed so the model would vary somewhat even with the same parameters + training data. I can retrain it again and see how much variance we get

@kperrynrel
Copy link
Collaborator Author

Hey all, I just wanted to keep things simple, so I resaved the old model in the new JSON format and replicated the original results in the jupyter notebook:
image

Copy link
Collaborator

@mdeceglie mdeceglie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @kperrynrel!

@mdeceglie mdeceglie merged commit 36e5730 into NREL:v2.1.2 Dec 20, 2021
@mdeceglie mdeceglie mentioned this pull request Dec 20, 2021
6 tasks
mdeceglie added a commit that referenced this pull request Dec 22, 2021
* update citation (#300)

* update sphinx citation

* less specific

* update readme with new citation

* capitalize Zenodo

* further capitalization

* Add DOI links to supporting refs

* DOI fix

* even more doi fixing

* Converted dat file to json file to remove XGBoost versioning issues. (#304)

* Converted dat file to json file.

* Updated the xgboost versions to no latest version restriction.

* Updated the model loading for xgb to handle multiple XGB versions.

* debugging mac os XGB error??

* bumped xgb to lower version to handle mac OS

* update versioning.

* more xgboost version updates...

* updated the workflows script to remove rdtools pip install.

* dropped xgboost version down

* updated the XGB model pull with os.path.join()

* fixed pep8 issues.

* Removed the joblib requirement per @kanderso's request.

* Regenerated the old model and saved as a json.

* run notebook for cell numbering

Co-authored-by: Perry <kperry@nrel.gov>
Co-authored-by: Michael Deceglie <Michael.Deceglie@nrel.gov>

* Update changelog

* double backticks in 2.1.1 log

* add cell 2 back in

* adjust changelog

* add PR to a changelog entry

Co-authored-by: Kirsten Perry <70228568+kperrynrel@users.noreply.github.com>
Co-authored-by: Perry <kperry@nrel.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clipping filter error with xgboost>=1.5.0
3 participants