New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Client side processing #338
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick initial review comments
openeo/rest/localconnection.py
Outdated
from openeo.rest.datacube import DataCube | ||
from openeo.internal.jupyter import VisualDict, VisualList | ||
|
||
class LocalConnection(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be discussed: should rest.Connection
and this LocalConnection
have common parent?
…eneo-python-client into client_side_proc
@soxofaan after the last commits this is the current syntax: import openeo
local_conn = openeo.local.connection.LocalConnection("./results/")
local_conn.list_collections() I would still like to simplify it a bit similarly to the Edit: does it make sense to have two files with the same name? |
Specific requirements added here b3067cd. Now it can be installed using |
|
||
from openeo.metadata import CollectionMetadata | ||
from openeo.internal.graph_building import PGNode | ||
from openeo.rest.datacube import DataCube |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to think about: maybe openeo.rest.DataCube
is tied too tightly with the "REST" kind of cubes, and it won't play nice with "local" execution. Maybe we need a local DataCube variant. Or maybe the "rest" DataCube can be decoupled enough from REST aspects.
No, I think ultimately the local processing could also be covered from the central entry point If you want to make it a bit more compact you could have
with
That doesn't look too bad to me
I think that's fine. It will just be an implementation detail and the end user would/should not be confronted with that. |
@soxofaan I updated the sample notebook including a sample workflow of local processing using the recently released repos https://github.com/Open-EO/openeo-processes-dask and https://github.com/Open-EO/openeo-pg-parser-networkx after Christmas we should have a meeting discussing the next steps to integrate it better. |
@soxofaan from our last meeting:
|
squashed PR #338 (by clausmichele) fixed merge confluct and did black/darker cleanups
squashed PR #338 (by clausmichele) fixed merge conflict and did black/darker cleanups
I also see that there is (still/again) a merge conflict on |
I also noticed this is dependency-wise quite a heavy feature.
Install of the
That's a very heavy addition (not only in number of packages, but also in terms of volume of compiled wheels to download). It would be good if this could be trimmed down. |
Should fix merge conflict
Solved! Edit: I don't understand why GitHub states that there is still a merge conflict on |
@soxofaan how did you get the list of all the dependencies required by installing the extra edit: found a package called Anyway, the large dependencies number is not something can be addressed easily. For instance, the I would propose to discuss about this in a common meeting with the EODC team. |
I just played in temporary virtual envs and compared the list of installed package
of course there are necessary packages for local processing, but I also see a lot related to web dev that do not make sense just for this purpose, e.g.:
(and I guess these drag in a number of smaller secondary dependencies as well) |
tests/local/conftest.py
Outdated
import pandas as pd | ||
|
||
@pytest.fixture() | ||
def create_local_netcdf(tmp_path_factory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this has to be a fixture. I guess for tests you want to produce several kinds of netcdf/geotiff files with varying dimensions, label ranges, etc
so you just need a helper function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, fixture removed and created a helper function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would need rioxarray
if I want to test also geoTIFFs. Does it make sense to include it in tests_require
or not? I wonder if these tests should run only if installing the localprocessing
part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, in CI we want to run all tests, but with localprocessing being quite heavy in terms of dependencies (all have to be installed with each run), this will make our CI a lot slower, so I'm not sure we should do that.
As long as it is an experimental feature we should skip the localprocessing test during the standard CI run and only run them manually as health check while development.
So in short: for now: keep all localprocessing related dependencies (also the ones just for testing) out of the standard requirements, and just put them in the localprocessing extra
return metadata | ||
|
||
def _get_geotiff_metadata(file_path): | ||
data = rioxarray.open_rasterio(file_path.as_posix(),chunks={}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like there is a lot of overlap/duplication with _get_netcdf_zarr_metadata
Can't this be handled more generically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it can be simplified, but at the moment I would like to focus on other more important parts. Could we proceed with the merge and tackle this later?
I just had a look again at the dependency problem now that
|
I finetuned the PR's branch (e.g. so that no regular non-localprocessing code paths where touched), Great work, thanks! |
@soxofaan I'm starting this PR to discuss how to properly implement the functionality required for client side processing.