
Implement periodic refresh of the base dir and the remote server #375

Closed
ricardogsilva opened this issue Feb 25, 2021 · 9 comments · Fixed by #395

Comments

@ricardogsilva

ricardogsilva commented Feb 25, 2021

Trends.Earth shall periodically refresh its own dataset list. Refreshing means:

  1. Scanning the base dir for the presence of new datasets
  2. Polling the remote server for updates on datasets being generated, and possibly initiating their download

Implement the code that performs this periodic refresh in an asynchronous fashion, in order to prevent locking up the main QGIS UI. After refreshing, the dataset list shown on the datasets tab of the dock shall also be updated.

Set the refresh interval to 60 seconds by default. We will deal with how this is exposed to the user in #374.
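
A minimal sketch of step 1 (scanning the base dir) run on a timer, using only the Python standard library. All names here (`PeriodicRefresher`, `make_base_dir_refresh`, `scan_base_dir`) are hypothetical. Inside QGIS the more native choice would probably be a `QTimer` that schedules a `QgsTask`, and since Qt widgets may only be touched from the main thread, the dataset list in the dock would have to be updated via a signal rather than directly from the background callback. Step 2 (polling the server) would be a second refresh callback of the same shape.

```python
import threading
from pathlib import Path
from typing import Callable, Set

# Default interval from this issue; how users change it is tracked in #374.
DEFAULT_REFRESH_INTERVAL_S = 60.0


def scan_base_dir(base_dir: Path) -> Set[Path]:
    """Return every .json descriptor found anywhere under the base dir."""
    return set(base_dir.rglob("*.json"))


def make_base_dir_refresh(base_dir: Path,
                          on_new: Callable[[Set[Path]], None]) -> Callable[[], None]:
    """Build a refresh callback that reports descriptors not seen on the last scan."""
    known: Set[Path] = set()

    def refresh() -> None:
        nonlocal known
        current = scan_base_dir(base_dir)
        new = current - known
        known = current  # also forgets descriptors deleted since the last scan
        if new:
            on_new(new)

    return refresh


class PeriodicRefresher:
    """Call `refresh` every `interval` seconds on a background (daemon) thread."""

    def __init__(self, refresh: Callable[[], None],
                 interval: float = DEFAULT_REFRESH_INTERVAL_S) -> None:
        self._refresh = refresh
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # wait() returns True as soon as stop() sets the event, ending the loop;
        # otherwise it times out after `interval` seconds and we refresh again.
        while not self._stop.wait(self._interval):
            self._refresh()
```

Using `Event.wait()` as the sleep means `stop()` takes effect immediately instead of after up to 60 seconds.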

@luipir
Contributor

luipir commented Mar 24, 2021

As reported in #371, this should be the local folder organization:

Datasets can be located anywhere on the user's machine, as long as their respective .json file is under the Trends.Earth base dir. This means that the json file must contain a property that tells the plugin the full path to the dataset. The current implementation features a file property, which we can keep using, but with a full path instead of just a filename.

Trends.Earth base dir

The structure of the base dir shall be:

v {trends-earth-base-dir}/
|   v jobs/
|   |   v {algorithm-group-1}/
|   |   |   v {algorithm-name-1}/
|   |   |   |   v {execution-date-1}/
|   |   |   |   |   job_1.json
|   |   |   |   |   job_2.json
|   |   |   |   |   ...
|   |   |   |   |   job_n.json
|   |   |   |   > {execution-date-2}/
|   |   |   |   > ...
|   |   |   |   > {execution-date-n}/
|   |   |   > {algorithm-name-2}/
|   |   |   > ...
|   |   |   > {algorithm-name-n}/
|   |   > {algorithm-group-2}/
|   |   > ...
|   |   > {algorithm-group-n}/
|   |
|   v downloaded-sample-datasets/
|   |   dataset_1.json
|   |   dataset_2.json
|   |   ...
|   |   dataset_n.json
|   v imported-datasets/
|   |   dataset_1.json
|   |   dataset_2.json
|   |   ...
|   |   dataset_n.json
|   v in-transit/
|   |   dataset_1.json
|   |   dataset_2.json
|   |   ...
|   |   dataset_n.json
|   v outputs/
|   |   v {algorithm-group-1}/
|   |   |   v {algorithm-name-1}/
|   |   |   |   v {execution-date-1}/
|   |   |   |   |   dataset_1.json
|   |   |   |   |   dataset_2.json
|   |   |   |   |   ...
|   |   |   |   |   dataset_n.json
|   |   |   |   > {execution-date-2}/
|   |   |   |   > ...
|   |   |   |   > {execution-date-n}/
|   |   |   > {algorithm-name-2}/
|   |   |   > ...
|   |   |   > {algorithm-name-n}/
|   |   > {algorithm-group-2}/
|   |   > ...
|   |   > {algorithm-group-n}/
  • jobs - This tree contains jobs that have run or are still running, locally or remotely
  • downloaded-sample-datasets - This directory holds all json files and all datasets that are downloaded from Trends.Earth data servers
  • imported-datasets - This directory holds the json files of all imported datasets. It may optionally also hold copies of the imported files (but we'll likely provide a UI control for the user to decide whether the dataset shall be copied or not)
  • in-transit-datasets - This directory is used to put all json files of datasets that are still being generated on a remote server. The contents of this directory are therefore volatile.
  • outputs - All datasets and json files that are generated by a Trends.Earth algorithm shall be placed under this directory. It is further subdivided by algorithm group, algorithm name and execution date (as in year, month, day)
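
A sketch of helpers that create the top-level folders and build the per-execution path under `jobs/` or `outputs/`. The folder names are taken from the tree above (which says `in-transit/`, while the bullet list says `in-transit-datasets`). The date folder format is not specified in the layout, so ISO 8601 (`YYYY-MM-DD`) is an assumption, as are the function names.

```python
from datetime import date
from pathlib import Path

# Top-level folders as listed in the tree above.
TOP_LEVEL_DIRS = ("jobs", "downloaded-sample-datasets", "imported-datasets",
                  "in-transit", "outputs")

# Only these trees are discretized by algorithm group/name/execution date.
ALGORITHM_TREES = ("jobs", "outputs")


def ensure_layout(base_dir: Path) -> None:
    """Create the top-level folders of the base dir if they are missing."""
    for name in TOP_LEVEL_DIRS:
        (base_dir / name).mkdir(parents=True, exist_ok=True)


def execution_dir(base_dir: Path, tree: str, algorithm_group: str,
                  algorithm_name: str, execution_date: date) -> Path:
    """Return {base}/{tree}/{group}/{name}/{date}/ for the jobs or outputs tree.

    The ISO 8601 date folder name is an assumption; the layout only says
    year, month and day.
    """
    if tree not in ALGORITHM_TREES:
        raise ValueError(f"{tree!r} is not discretized by algorithm")
    return (base_dir / tree / algorithm_group / algorithm_name
            / execution_date.isoformat())
```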

@luipir
Contributor

luipir commented Mar 26, 2021

@ricardogsilva I can't move the card to "Ready" or "In progress"

@luipir
Contributor

luipir commented Apr 7, 2021

but with a full path instead of just a filename

@ricardogsilva full path, or a path relative to the Trends.Earth base dir? Or relative by default, with a full path only when it is an absolute path?
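
To make the third option concrete, a sketch of how the `file` property could be resolved when reading and chosen when writing: relative to the base dir by default, absolute when the dataset lives outside it. The function names are hypothetical, and storing relative paths with forward slashes (to keep descriptors portable between operating systems) is an assumption.

```python
from pathlib import Path


def resolve_dataset_path(base_dir: Path, file_value: str) -> Path:
    """Turn a descriptor's `file` value into a usable path.

    Absolute values are used as-is (dataset outside the base dir);
    anything else is taken as relative to the Trends.Earth base dir.
    """
    path = Path(file_value)
    if path.is_absolute():
        return path
    return base_dir / path


def file_value_to_store(base_dir: Path, dataset_path: Path) -> str:
    """Pick the `file` value to write: relative if under base_dir, else absolute."""
    try:
        return dataset_path.relative_to(base_dir).as_posix()
    except ValueError:  # not inside base_dir
        return str(dataset_path)
```

A side benefit of relative values is that the whole base dir can be moved without rewriting every descriptor.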

@luipir
Contributor

luipir commented Apr 8, 2021

@ricardogsilva @Samweli FYI, in https://github.com/luipir/trends.earth/tree/implement_periodic_refresh_%23375 I have started implementing the interaction with the server.
At the moment, every run should save a run descriptor in the Trends.Earth base dir, and that descriptor will be the basis for populating datasets and updating the status of the run.

The implementation tries to follow the same architecture used by @azvoleff to read and validate responses (i.e. using marshmallow schemas), which lets us load (and validate) and dump objects against a schema in a simple way.
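
The load/validate/dump round trip described above, sketched with the standard library only so it runs without marshmallow installed. This is a stand-in, not the plugin's code: in marshmallow the same idea is a `Schema` subclass with `fields.Str(required=True)` fields, whose `load()` raises `marshmallow.ValidationError` and whose `dump()` serializes. The field names (`id`, `status`, `script_name`) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


class ValidationError(ValueError):
    """Raised when a descriptor does not match the expected shape."""


@dataclass
class RunDescriptor:
    id: str
    status: str
    script_name: Optional[str] = None  # may be missing in the first response

    @classmethod
    def load(cls, data: Dict[str, Any]) -> "RunDescriptor":
        # Validate required fields and types before building the object.
        for key in ("id", "status"):
            if not isinstance(data.get(key), str):
                raise ValidationError(f"missing or invalid field {key!r}")
        script = data.get("script_name")
        if script is not None and not isinstance(script, str):
            raise ValidationError("invalid field 'script_name'")
        return cls(id=data["id"], status=data["status"], script_name=script)

    def dump(self) -> Dict[str, Any]:
        # Plain dict, ready for json.dumps().
        return asdict(self)


def loads(text: str) -> RunDescriptor:
    """Parse and validate a descriptor from JSON text."""
    return RunDescriptor.load(json.loads(text))
```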

I used a schema that was not yet used in the code, APIResponseSchema, which seems to follow the possible responses of the server. However, which fields of the schema are populated depends on the status of the processing. For this reason I had to create some workarounds, e.g. to infer the location where the json descriptor is saved, because the first response of a run gives no indication of the script name.
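
One possible way (not necessarily the workaround in the branch above) to save a run descriptor under the `jobs/` tree when the first response carries no script name: fall back to placeholder folders. The placeholder names, the `{run_id}.json` file name and the ISO date folder are all hypothetical.

```python
import json
from datetime import date
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical placeholder folders used until the script name is known.
UNKNOWN_GROUP = "unknown-group"
UNKNOWN_SCRIPT = "unknown-script"


def job_descriptor_path(base_dir: Path, run_id: str, execution_date: date,
                        algorithm_group: Optional[str] = None,
                        script_name: Optional[str] = None) -> Path:
    """jobs/{group}/{script}/{date}/{run_id}.json, with placeholders if unknown."""
    group = algorithm_group or UNKNOWN_GROUP
    name = script_name or UNKNOWN_SCRIPT
    return (base_dir / "jobs" / group / name / execution_date.isoformat()
            / f"{run_id}.json")


def save_job_descriptor(path: Path, descriptor: Dict[str, Any]) -> None:
    """Write the descriptor, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename, so a concurrent refresh
    # never reads a half-written json file.
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(descriptor, indent=2))
    tmp.replace(path)
```

Once a later response carries the script name, the file could be moved to its final location with `Path.replace()`.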

I'll continue with populating the datasets (with a path-walk visitor that instantiates a dataset descriptor for each json file) and updating the run descriptor from the dataset descriptor.
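
A sketch of such a path walk: visit the dataset folders of the base dir and build one descriptor object per readable json file. `DatasetDescriptor` and `walk_datasets` are hypothetical names, and which folders count as dataset folders (here everything in the layout except `jobs/`) is an assumption.

```python
import json
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Iterable, List

DATASET_DIRS = ("downloaded-sample-datasets", "imported-datasets",
                "in-transit", "outputs")


@dataclass
class DatasetDescriptor:
    json_path: Path                                  # where the descriptor lives
    data: Dict[str, Any] = field(default_factory=dict)


def walk_datasets(base_dir: Path,
                  subdirs: Iterable[str] = DATASET_DIRS) -> List[DatasetDescriptor]:
    """Instantiate a DatasetDescriptor for every json file under the dataset dirs."""
    descriptors: List[DatasetDescriptor] = []
    for sub in subdirs:
        # os.walk silently yields nothing if the folder does not exist yet.
        for root, _dirs, files in os.walk(base_dir / sub):
            for name in sorted(files):
                if not name.endswith(".json"):
                    continue
                path = Path(root) / name
                try:
                    data = json.loads(path.read_text())
                except (OSError, json.JSONDecodeError):
                    continue  # skip unreadable or half-written files
                descriptors.append(DatasetDescriptor(path, data))
    return descriptors
```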

@luipir
Contributor

luipir commented Apr 21, 2021

As discussed on Slack:
Luigi Pirelli 12:04 PM
@ricardo @Samweli Mwakisambwe a design doubt... we are planning to have processing buttons and datasets, but the original code distinguishes between running the algorithm, Jobs and datasets (result download). In the original code the Dataset json is a copy of the Job json, but kept independent so that bands can be moved to other places...
In our design, Jobs and datasets are almost fused. After a run, the related Job is saved and its dataset representation is shown in the datasets tab. From that tab, the Job status is tracked and result data are downloaded if necessary (e.g. when a dataset is materialised locally). At the moment, in the current design, Job and Dataset are the same entity.
Is all this correct? In the meantime I'll keep writing the design with this constraint.

ricardo 12:15 PM
not sure I understand everything, but it seems to me that the job representation and the dataset representation should be different entities.
job representation is built from an algorithm currently being run - it can be returned from the GEE API, or it can be generated by an algorithm that is running locally. Job representations should only exist while the relevant algorithm is being executed/downloaded, as they are the way we can keep track of new datasets that are not in the plugin's base dir yet (they are still being generated). IMO, job representations now become an implementation detail and should not be directly shown to the end user. We ought to be able to build dataset representations from job representations and then show those in our datasets tab.
dataset representations are not necessarily bound to a running job, although they can be. For example, when a user chooses to import some file from his filesystem into Trends.Earth - in this case we build a dataset representation and there is no underlying job.
Does this answer your question @luigi Pirelli?
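
The separation proposed above, sketched as two small types: a Job that tracks a run, and a Dataset that the datasets tab shows and that may or may not point at a Job. All names and fields are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Job:
    """Tracks an algorithm run (GEE or local) until its results are local."""
    id: str
    script_name: str
    status: str


@dataclass
class Dataset:
    """What the datasets tab shows; may or may not have an underlying job."""
    name: str
    path: Optional[Path] = None  # None while the data is not materialised locally
    job: Optional[Job] = None    # None for e.g. imported datasets


def dataset_from_job(job: Job) -> Dataset:
    # Dataset representation built from a job representation.
    return Dataset(name=f"{job.script_name} ({job.id})", job=job)


def dataset_from_import(path: Path) -> Dataset:
    # Imported file: a dataset with no underlying job.
    return Dataset(name=path.stem, path=path)
```

Keeping `job` as an optional reference means the UI only ever deals with `Dataset`, while jobs stay an implementation detail.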

Luigi Pirelli 12:20 PM
Probably... I'm just worried about keeping the file lists of all these different linked entities in sync with each other.
We should add a jobs subdirectory to the trends-earth-base-dir in which to store Jobs.

@luipir
Contributor

luipir commented Apr 28, 2021

FYI, there is no API (or I can't find one) to get a single execution by run_id. I always have to get the whole execution list and update all datasets together, so I can't update datasets one by one, nor run the updates asynchronously one by one.
This is an API limitation.
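
Given that limitation, a refresh tick would fetch the whole execution list once and reconcile it locally. A sketch of that reconciliation, where the `id` and `status` keys of each execution are assumptions about the server response and `apply_execution_list` is a hypothetical name:

```python
from typing import Any, Dict, Iterable, List


def apply_execution_list(local_status: Dict[str, str],
                         executions: Iterable[Dict[str, Any]]) -> List[str]:
    """Update local job statuses from the full execution list in one pass.

    Returns the ids whose status changed, so only those datasets need a UI
    update or a download.
    """
    by_id = {e["id"]: e for e in executions}  # index the list once
    changed: List[str] = []
    for run_id, old_status in list(local_status.items()):
        remote = by_id.get(run_id)
        if remote is not None and remote["status"] != old_status:
            local_status[run_id] = remote["status"]
            changed.append(run_id)
    return changed
```

Indexing the list by id first keeps the pass linear in the number of executions, even though every tick has to download all of them.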

luipir added a commit to luipir/trends.earth that referenced this issue Apr 29, 2021
@luipir
Contributor

luipir commented Apr 29, 2021

@ricardogsilva this should be implemented via #395

@luipir
Contributor

luipir commented Apr 30, 2021

In the meantime I'll continue with downloading datasets.

@luipir
Contributor

luipir commented Apr 30, 2021

  • in-transit-datasets - This directory is used to put all json files of datasets that are still being generated on a remote server. The contents of this directory are therefore volatile.

What is the reason for this folder? Is it really necessary? The status is already inside the json.
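
To illustrate the alternative raised here, a sketch that derives the in-transit set from each descriptor's status instead of from a dedicated folder. The `status` key and its values are hypothetical. The trade-off is that this reads and parses every descriptor on each refresh, whereas a dedicated folder turns the same question into a cheap directory listing.

```python
import json
from pathlib import Path
from typing import Iterable, List

# Hypothetical status values; the real ones come from the server's API.
IN_TRANSIT_STATUSES = {"PENDING", "RUNNING"}


def in_transit(descriptor_paths: Iterable[Path]) -> List[Path]:
    """Select descriptors whose status says the remote run is not finished."""
    result: List[Path] = []
    for path in descriptor_paths:
        status = json.loads(path.read_text()).get("status")
        if status in IN_TRANSIT_STATUSES:
            result.append(path)
    return result
```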

Labels
Enhancement size: 8 This is a full day job.