-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix dataflow submission #1691
fix dataflow submission #1691
Conversation
The dependencies in `workflows/dataflow/setup.py` were incompatible with constraints.txt. Dataflow now supports running via a docker image, which will be easier to ensure a consistent environment. * switch to using docker for dataflow jobs * ensure dependencies are compatible with the versions installed in the dataflow base image These dependencies mostly match our stack, but did force some updates to libraries in the google stack like tensorflow and the google-cloud-* packages. Resolves #1678
* regression data needed updating (probably bc of new numpy version) * fix newly enforced behavior that np.pad cannot pad int data with np.nan
Looks like we need to
|
@spencerkclark The regression tests are failing because some of the coarsening routines want to pad an array of ints with I've been tracking down a lot of errors with this PR, so I would definitely would appreciate your help fixing that. Feel free to push directly here if you have any quick fixes. |
This ensures that the dummy coordinates we construct during the vertical remapping process have dtype np.float32 from the start. Our coarsening procedure of these coordinates requires they be of a floating point type.
for some reason these tests fail when run via the `coverage` utility. We don't look at coverage stats much, so I took the expedient option and removed it for this small number of tests.
The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All new deps are ok.
@@ -18,93 +18,110 @@ astor==0.8.1 | |||
astunparse==1.6.3 | |||
async-generator==1.10 | |||
async-timeout==3.0.1 | |||
attrs==19.3.0 | |||
avro-python3==1.9.2.1 | |||
atomicwrites==1.4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MIT
babel==2.8.0 | ||
backcall==0.2.0 | ||
backoff==1.10.0 | ||
base58==2.1.0 | ||
beautifulsoup4==4.10.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mit
bleach==3.3.0 | ||
blinker==1.4 | ||
bokeh==2.2.3 | ||
bs4==0.0.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a real package. is related to beautifulsoup4 above. https://pypi.org/project/bs4/
cffi==1.14.4 | ||
cfgv==3.2.0 | ||
cftime==1.2.1 | ||
chardet==3.0.4 | ||
charset-normalizer==2.0.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MIT
colorcet==2.0.6 | ||
commonmark==0.9.1 | ||
conda-lock==0.4.1 | ||
configparser==5.0.2 | ||
crcmod==1.7 | ||
cryptography==36.0.1 | ||
cycler==0.10.0 | ||
cython==0.29.27 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apache
testpath==0.4.4 | ||
tf-estimator-nightly==2.8.0.dev2021122109 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apache 2
testpath==0.4.4 | ||
tf-estimator-nightly==2.8.0.dev2021122109 | ||
threadpoolctl==3.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bsd 3
typing-inspect==0.6.0 | ||
typing-extensions==4.0.1 | ||
typing-inspect==0.7.1 | ||
typing-utils==0.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apache 2
tzlocal==2.1 | ||
urllib3==1.25.10 | ||
uritemplate==4.1.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bsd-3 or apache 2
xarray==0.19.0 | ||
xgcm==0.6.1 | ||
xmltodict==0.12.0 | ||
xpartition==0.2.0 | ||
yarl==1.6.3 | ||
yq==2.11.0 | ||
zarr==2.7.0 | ||
zipp==3.7.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Nice that we can use a docker image for dataflow now.
@@ -76,13 +76,10 @@ def test_MultiDatasetMapper_keys(multi_dataset_mapper, expected_keys): | |||
|
|||
|
|||
def test_MultiDatasetMapper_value(multi_dataset_mapper, datasets): | |||
single_time = datasets[0][TIME_NAME].isel({TIME_NAME: 0}).item() | |||
time_key = pd.to_datetime(single_time).strftime(TIME_FMT) | |||
single_time = pd.to_datetime(datasets[0][TIME_NAME].isel({TIME_NAME: 0}).item()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, the fact that calling item()
on a np.datetime64
array returns an integer rather than a np.datetime64
scalar is weird, but a known issue (pydata/xarray#3256). This is definitely the right thing to do (I probably should have written things this way to begin with).
I think with this change we could probably still use sel
down below, but the meaning of this test is still the same with isel
, so there's no reason to change it. Interesting that it worked before, but now no longer works (I guess pandas used to automatically cast integer indexers to a DatetimeIndex to np.datetime64
values, and now it no longer does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it was able to look things up with the int
. Definitely outside of my expertise.
master is failing because the dependencies in
workflows/dataflow/setup.py
were incompatible withconstraints.txt. Dataflow now supports running via a docker image, which will be
easier to ensure a consistent environment.
base image
These dependencies mostly match our stack, but did force some updates to
libraries in the google stack like tensorflow and the google-cloud-* packages.
Resolves #1678