Implement periodic refresh of the base dir and the remote server #375

ricardogsilva · 2021-02-25T11:35:44Z

Trends.Earth shall periodically refresh its own datasets list. Refreshing means:

Scanning the base dir for the presence of new datasets
Polling the remote server for updates on datasets that are being generated, and possibly initiating their download

Implement the code that performs this periodic refresh in an asynchronous fashion, in order to prevent locking up the main QGIS UI. After refreshing, the dataset list shown on the datasets tab of the dock shall also be updated.

Set the refresh interval to 60 seconds by default. We will deal with how this is exposed to the user in #374.
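
A minimal sketch of the periodic, non-blocking refresh described above, using only the Python standard library. The `PeriodicRefresher` name and its `refresh` callback are hypothetical and not part of the plugin. Inside QGIS the same idea would more likely be built on a Qt timer and a background task, with the datasets tab updated from the main thread once the refresh finishes.

```python
import threading
from typing import Callable


class PeriodicRefresher:
    """Call `refresh` every `interval` seconds on a background thread."""

    def __init__(self, refresh: Callable[[], None], interval: float = 60.0):
        self._refresh = refresh
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Wake the worker immediately and wait for it to finish.
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called,
        # so the loop runs one refresh per interval until it is stopped.
        while not self._stop.wait(self._interval):
            self._refresh()
```

The callback would do the two steps listed above (scan the base dir, poll the server); because it runs off the main thread, a slow server response does not freeze the UI.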

luipir · 2021-03-24T12:06:36Z

As reported in #371, this should be the local folder organization:

Datasets can be located everywhere on the user's machine, just as long as their respective .json file is under the Trends.Earth base dir. This means that inside the json file there must be a property that informs the plugin of the full path to the dataset. The current implementation features a file property, which we can continue using, but with a full path instead of just a filename.

Trends.Earth base dir

The structure of the base dir shall be:

> {trends-earth-base-dir}/
|   v jobs/
|   |   v {algorithm-group-1}/
|   |   |   v {algorithm-name-1}/
|   |   |   |   v {execution-date-1}/
|   |   |   |   |   job_1.json
|   |   |   |   |   job_2.json
|   |   |   |   |   ...
|   |   |   |   |   job_n.json
|   |   |   |   > {execution-date-2}/
|   |   |   |   > ...
|   |   |   |   > {execution-date-n}/
|   |   |   > {algorithm-name-2}/
|   |   |   > ...
|   |   |   > {algorithm-name-n}/
|   |   > {algorithm-group-2}/
|   |   > ...
|   |   > {algorithm-group-n}/
|   |
|   v downloaded-sample-datasets/
|   |   dataset_1.json
|   |   dataset_2.json
|   |   ...
|   |   dataset_n.json
|   v imported-datasets/
|   |   dataset_1.json
|   |   dataset_2.json
|   |   ...
|   |   dataset_n.json
|   v in-transit/
|   |   dataset_1.json
|   |   dataset_2.json
|   |   ...
|   |   dataset_n.json
|   v outputs/
|   |   v {algorithm-group-1}/
|   |   |   v {algorithm-name-1}/
|   |   |   |   v {execution-date-1}/
|   |   |   |   |   dataset_1.json
|   |   |   |   |   dataset_2.json
|   |   |   |   |   ...
|   |   |   |   |   dataset_n.json
|   |   |   |   > {execution-date-2}/
|   |   |   |   > ...
|   |   |   |   > {execution-date-n}/
|   |   |   > {algorithm-name-2}/
|   |   |   > ...
|   |   |   > {algorithm-name-n}/
|   |   > {algorithm-group-2}/
|   |   > ...
|   |   > {algorithm-group-n}/

jobs - This tree contains jobs that have run or are still running, whether locally or remotely
downloaded-sample-datasets - This directory is used to put all json files and all datasets that are downloaded from Trends.Earth data servers
imported-datasets - This directory is used to put all json files of datasets that are imported. It may optionally also be used to copy the imported files (but we'll likely provide a UI control for the user to decide if the dataset shall be copied or not)
in-transit-datasets - This directory is used to put all json files of datasets that are still being generated on a remote server. The contents of this directory are therefore volatile.
outputs - All datasets and json files that are generated by a Trends.Earth algorithm shall be placed under this directory. It is further subdivided by each algorithm's group, name, and execution date (as in year, month, day)
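
The "scan the base dir" half of the refresh could look roughly like the sketch below, which walks the dataset directories from the layout above and loads every `.json` descriptor it finds. The function name, the `descriptor_path` key, and the choice to skip unreadable files are assumptions made for illustration; only the directory names and the `file` property come from this thread.

```python
import json
from pathlib import Path
from typing import Dict, List

# Directory names as drawn in the tree above. "jobs" is left out because it
# holds job descriptors rather than dataset descriptors.
DATASET_DIRS = (
    "downloaded-sample-datasets",
    "imported-datasets",
    "in-transit",
    "outputs",
)


def scan_base_dir(base_dir: Path) -> List[Dict]:
    """Return the parsed content of every dataset descriptor under base_dir."""
    found = []
    for name in DATASET_DIRS:
        root = base_dir / name
        if not root.is_dir():
            continue
        # rglob also descends into {group}/{name}/{date}/ under outputs/.
        for descriptor in sorted(root.rglob("*.json")):
            try:
                data = json.loads(descriptor.read_text(encoding="utf-8"))
            except (OSError, json.JSONDecodeError):
                continue  # skip unreadable or half-written descriptors
            if not isinstance(data, dict):
                continue
            # Hypothetical extra key: remember where the descriptor lives.
            data["descriptor_path"] = str(descriptor)
            found.append(data)
    return found
```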

luipir · 2021-03-26T15:40:29Z

@ricardogsilva I can't move the card to Ready or In Progress

luipir · 2021-04-07T15:04:28Z

but with a full path instead of just a filename

@ricardogsilva full path, or path relative to the Trends.Earth base dir? Or relative by default, and full path only if it's an absolute path?
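
For illustration only, the last option in the question (relative by default, absolute paths kept as they are) could be handled as sketched below. `resolve_dataset_path` is a hypothetical helper, and nothing in this thread records which option was finally chosen.

```python
from pathlib import Path


def resolve_dataset_path(file_value: str, base_dir: Path) -> Path:
    """Resolve the descriptor's `file` property to a usable path."""
    path = Path(file_value)
    if path.is_absolute():
        # The dataset lives somewhere else on the user's machine.
        return path
    # Relative values are interpreted against the Trends.Earth base dir.
    return base_dir / path
```

With this rule, descriptors for datasets kept under the base dir stay valid if the user moves the base dir, while imported files elsewhere on disk still resolve.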

luipir · 2021-04-08T16:54:59Z

@ricardogsilva @Samweli FYI, in https://github.com/luipir/trends.earth/tree/implement_periodic_refresh_%23375 I started implementing the interaction with the server...
At the moment, any run should save a run descriptor in the Trends.Earth base dir, and that descriptor will be the basis for populating datasets and updating the status of the run.

The implementation tries to follow the same architecture used by @azvoleff to read and validate responses (e.g. using marshmallow schemas), which allows us to load (and validate) and dump objects in a simple way.

I used a schema that wasn't used anywhere in the code, APIResponseSchema, which seems to follow the possible responses of the server... BTW, how the schema is populated depends on the status of the processing. For this reason I had to create some workarounds, e.g. to infer the location where the json descriptor should be saved, because there wasn't any indication of the script name in the first response of the run.

I'll continue with populating the datasets (with a path-walk visitor that instantiates a dataset descriptor) and updating the run descriptor from the dataset descriptor.

luipir · 2021-04-21T10:31:54Z

As discussed in Slack:
Luigi Pirelli 12:04 PM
@ricardo @Samweli Mwakisambwe a design doubt... We are planning to have processing buttons and datasets, but the original code makes a distinction between running the algorithm, Jobs, and datasets (result download). In the original code the Dataset json is a copy of the Job json, but independent, so that bands can be moved to other places...
In our design Jobs and datasets are almost fused. After a run, the related Job is saved and a dataset representation is shown in the datasets tab. From that tab, the Job status is controlled and result data are downloaded if necessary (e.g. a dataset is materialised locally). At this moment, in the current design, Job and Dataset are the same entity.
Is all this correct?... In the meantime I'll keep writing the design with this constraint.

ricardo 12:15 PM
not sure I understand everything, but it seems to me that the job representation and the dataset representation should be different entities.
A job representation is built from an algorithm that is currently being run - it can be returned from the GEE API, or it can be generated by an algorithm that is running locally. Job representations should only exist while the relevant algorithm is being executed/downloaded, as they are the way we can keep track of new datasets that are not in the plugin's base dir yet (they are still being generated). IMO, job representations now become an implementation detail and should not be directly shown to the end user. We ought to be able to build dataset representations from job representations and then show those in our datasets tab.
Dataset representations are not necessarily bound to a running job, although they can be. For example, when a user chooses to import some file from their filesystem into Trends.Earth - in this case we build a dataset representation and there is no underlying job.
Does this answer your question @luigi Pirelli?

Luigi Pirelli 12:20 PM
Probably... I'm just wary of having to keep all the linked file lists of the different entities in sync with each other
We should add a Jobs subdirectory in the trends-earth-base-dir in which to store Jobs
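
The distinction ricardo draws in the Slack exchange above (jobs and datasets as separate entities, with datasets optionally built from jobs) could be sketched with two small types. All class, field, and function names here are hypothetical and do not reflect the plugin's actual models; the status values are illustrative.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Job:
    """Tracks an algorithm run while it is being executed or downloaded."""
    run_id: str
    algorithm_group: str
    algorithm_name: str
    status: str  # e.g. "RUNNING" or "FINISHED" (illustrative values)


@dataclass
class Dataset:
    """What the datasets tab would show to the user."""
    name: str
    file: Optional[str] = None    # path to the data, once it exists locally
    job_id: Optional[str] = None  # None when there is no underlying job


def dataset_from_job(job: Job) -> Dataset:
    # The data may not exist locally yet, so `file` stays unset.
    return Dataset(name=f"{job.algorithm_group}/{job.algorithm_name}", job_id=job.run_id)


def dataset_from_import(path: str) -> Dataset:
    # An imported file has no job behind it.
    return Dataset(name=Path(path).name, file=path)
```

Keeping `job_id` optional covers both cases in the discussion: a dataset that is still tied to a running job, and an imported one with no job at all.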

luipir · 2021-04-28T08:12:26Z

FYI there is no API (or I can't find one) to get a single execution by run_id, so I always have to fetch the whole execution list and update all datasets together. I can't update datasets one by one, nor run the updates asynchronously one by one.
This is an API limitation.
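
Given that limitation, a refresh has to work from one bulk listing. The sketch below shows that shape under stated assumptions: `fetch_all_executions` is a hypothetical callable standing in for the server request, and the `id` and `status` keys are illustrative rather than the real response fields.

```python
from typing import Callable, Dict, List


def refresh_all_jobs(
    local_jobs: Dict[str, dict],
    fetch_all_executions: Callable[[], List[dict]],
) -> List[str]:
    """Update every known job from a single listing; return ids that changed."""
    # One request for everything, then index by id for cheap lookups.
    remote_by_id = {e["id"]: e for e in fetch_all_executions()}
    changed = []
    for run_id, job in local_jobs.items():
        remote = remote_by_id.get(run_id)
        if remote is None:
            continue  # not in the listing; leave the local copy untouched
        if remote.get("status") != job.get("status"):
            job["status"] = remote.get("status")
            changed.append(run_id)
    return changed
```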

luipir · 2021-04-29T11:22:25Z

@ricardogsilva should be implemented via #395

luipir · 2021-04-30T09:03:27Z

In the meantime I'll continue with downloading datasets

luipir · 2021-04-30T10:19:29Z

in-transit-datasets - This directory is used to put all json files of datasets that are still being generated on a remote server. The contents of this directory are therefore volatile.

What is the reason for this folder? Is it really necessary? The status is inside the json

ricardogsilva added the Enhancement and size: 8 (This is a full day job.) labels Feb 25, 2021

ricardogsilva added this to the Final Decision Tree implementation plan milestone Feb 25, 2021

ricardogsilva added this to Backlog in Sprint 2 (Plugin dock) - Decision Tree implementation Feb 25, 2021

ricardogsilva assigned luipir Mar 19, 2021

ricardogsilva removed this from Backlog in Sprint 2 (Plugin dock) - Decision Tree implementation Mar 19, 2021

ricardogsilva added this to Backlog in Sprint 3 (Plugin dock) - Decision Tree implementation Mar 19, 2021

luipir moved this from Backlog to In Progress in Sprint 3 (Plugin dock) - Decision Tree implementation Mar 24, 2021

luipir moved this from In Progress to Backlog in Sprint 3 (Plugin dock) - Decision Tree implementation Mar 25, 2021

luipir moved this from Backlog to In Progress in Sprint 3 (Plugin dock) - Decision Tree implementation Mar 26, 2021

luipir moved this from In Progress to Ready in Sprint 3 (Plugin dock) - Decision Tree implementation Mar 30, 2021

luipir moved this from Ready to In Progress in Sprint 3 (Plugin dock) - Decision Tree implementation Mar 31, 2021

luipir added a commit to luipir/trends.earth that referenced this issue Apr 29, 2021

Datasets periodic refresh and centralised Jobs/Datasets memory store management. Fixes ConservationInternational#375

183204f

ricardogsilva linked a pull request May 3, 2021 that will close this issue

Datasets periodic refresh and centralised Jobs/Datasets memory store management. Fixes #375 #395

Merged

ricardogsilva pushed a commit that referenced this issue May 3, 2021

Datasets periodic refresh and centralised Jobs/Datasets memory store management. Fixes #375 (#395)

5f7562d

ricardogsilva closed this as completed May 3, 2021

Sprint 3 (Plugin dock) - Decision Tree implementation automation moved this from In Progress to Done May 3, 2021

This was referenced May 3, 2021

Hook downloading of remote data action to the button on the bottom of the datasets section #405

Closed

Hook importing of local data action to the button on the bottom of the datasets section #406

Closed
