
Datasets periodic refresh and centralised Jobs/Datasets memory store management. Fixes #375 #395

Conversation

luipir (Contributor) commented on Apr 29, 2021:

Datasets are scanned from:

- cached jobs
- eventually, any other JSON file present in base_data_directory (skipping the jobs folder)

Polling is currently set to 60000 ms, read from the config key "trends_earth/advanced/refresh_polling_time" (see the sketch just below).
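For illustration, a minimal sketch of how such a polling timer could be wired up in a PyQt-based QGIS plugin. Only the settings key and the 60000 ms default come from this PR; the helper name and the settings-reading details are assumptions:

```python
from qgis.PyQt.QtCore import QSettings, QTimer

def start_refresh_polling(refresh_callback):
    # hypothetical helper: read the polling interval from the plugin config,
    # falling back to the 60000 ms default mentioned above
    interval_ms = int(
        QSettings().value("trends_earth/advanced/refresh_polling_time", 60000)
    )
    timer = QTimer()
    timer.timeout.connect(refresh_callback)  # fires every interval_ms
    timer.start(interval_ms)
    return timer  # caller must keep a reference, or the timer is garbage collected
```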

Next steps are:

- check for locally generated Jobs/Datasets
- download remote datasets
NOTE: there are some architecturally critical aspects:

A) Originally there is no difference between Job and Dataset => I hope I coded this so as to avoid misalignments.
B) Jobs = Datasets, and the Job list is only an API image of what is available remotely => Jobs and Datasets are naturally aligned.
C) Job Datasets are downloaded to a user-defined location, i.e. there is no constraint on where datasets or job descriptors are saved => this can be a source of misalignment.

The approach to these critical aspects:

- Jobs and Datasets are sets of Job and Dataset objects and are singletons (a kind of memory store, with an image of its components saved as JSON files in base_data_directory); see the sketch after this list
- any Job or Dataset knows where to save itself (using its dump method)
- => a procedure and criteria to clean Jobs/Datasets are still needed
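A minimal sketch of the memory-store idea, under assumed names (Dataset, base_data_directory); the PR's actual classes live in LDMP and may differ:

```python
import json
from pathlib import Path

class Dataset:
    """Illustrative stand-in for the PR's Dataset descriptor."""

    def __init__(self, name: str, attributes: dict):
        self.name = name
        self.attributes = attributes

    def dump(self, base_data_directory: Path) -> Path:
        # each Dataset knows where to save itself as a JSON image
        path = base_data_directory / f"{self.name}.json"
        path.write_text(json.dumps(self.attributes, indent=2))
        return path
```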

```python
return wrapper


def json_serial(obj):
```
luipir (author): FYI, moved here so it can be used in other contexts.
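For context, a json_serial helper is typically passed as the default hook of json.dumps to handle types the stdlib encoder cannot. A sketch of the usual shape (the datetime handling is an assumption, not read from the diff):

```python
import json
from datetime import date, datetime

def json_serial(obj):
    """JSON serializer for objects not serializable by default."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError(f"Type {type(obj)} is not JSON serializable")

# usage: json.dumps(payload, default=json_serial)
```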

```python
job_dict['script'] = {"name": script_name, "slug": script_slug}

# do import here to avoid circular import
from LDMP.jobs import Jobs
```
luipir (author): the import is done here to avoid a circular import.
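The pattern, sketched with hypothetical surrounding code: a module-level import of LDMP.jobs would run at import time and close the cycle, whereas an import inside the function body only runs at call time, after both modules are fully loaded:

```python
def attach_script(job_dict, script_name, script_slug):
    # hypothetical function body; only the deferred-import pattern is the point
    job_dict['script'] = {"name": script_name, "slug": script_slug}

    # do import here to avoid circular import
    from LDMP.jobs import Jobs
    return Jobs()  # illustrative use of the lazily imported singleton
```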


```python
class tr_jobs(object):
    def tr(message):
        return QCoreApplication.translate("tr_jobs", message)


def json_serial(obj):
```
luipir (author): moved to __init__ so it can also be used by Datasets.py.



```python
@singleton
class Jobs(QObject):
```
luipir (author): DataStore containing all downloaded Job responses.
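The @singleton decorator itself is not shown in this hunk; a common implementation of the pattern looks roughly like this (a sketch of the general technique, not necessarily the PR's exact code):

```python
import functools

def singleton(cls):
    instances = {}

    @functools.wraps(cls)
    def get_instance(*args, **kwargs):
        # create the one instance lazily, then always return the same object
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance
```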

```python
    def __init__(self, datasets: Optional[List[Dataset]] = None):


@singleton
class Datasets(QObject):
```
luipir (author): centralised DataStore with all Datasets available on disk or generated by Jobs.
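A sketch of the signal-driven store idea: consumers connect to an updated signal and refresh whenever the store's contents change. Class and method names here are illustrative, assuming the standard PyQt signal machinery that QObject provides:

```python
from qgis.PyQt.QtCore import QObject, pyqtSignal

class DatasetsStoreSketch(QObject):
    updated = pyqtSignal()

    def __init__(self):
        super().__init__()
        self._datasets = {}

    def replace_all(self, datasets: dict):
        self._datasets = dict(datasets)
        self.updated.emit()  # e.g. a dock widget reloads its list on this signal
```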

```python
        self.updated.emit()


class DatasetSchema(Schema):
```
luipir (author): the schema is located here and not in schema.py because schema.py belongs to a nested project that I don't want to (nor have permission to) touch.
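Assuming Schema here is a marshmallow Schema (as used by the nested trends.earth schemas), a local DatasetSchema could look roughly like the sketch below; the field names and the model class are illustrative, not taken from the PR:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

from marshmallow import Schema, fields, post_load

@dataclass
class DatasetSketch:
    # minimal stand-in for the plugin's Dataset model
    name: str
    creation_date: Optional[datetime] = None
    source: Optional[str] = None

class DatasetSchemaSketch(Schema):
    name = fields.Str(required=True)
    creation_date = fields.DateTime(allow_none=True)
    source = fields.Str(allow_none=True)

    @post_load
    def make_dataset(self, data, **kwargs):
        # hand back a model object instead of a plain dict
        return DatasetSketch(**data)
```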

ricardogsilva linked an issue on May 3, 2021 that may be closed by this pull request.

ricardogsilva left a comment:


As discussed online, let's get this merged!

ricardogsilva merged commit 5f7562d into ConservationInternational:decision-trees on May 3, 2021.
Successfully merging this pull request may close these issues:

- Implement periodic refresh of the base dir and the remote server