Add filters to job/task Rest API #370

Open
jcdiazvelez opened this issue Mar 15, 2024 · 1 comment
@jcdiazvelez
Contributor

I would like to be able to retrieve a subset of jobs or tasks. The primary need at the moment is to restrict the number of tasks when analyzing benchmark statistics (/datasets/{dset.did}/bulk/task_stats). For large datasets this becomes prohibitively expensive.
I can imagine other, more general uses for filters. For the example I have in mind, you might filter by "jobnumber < 100", or by something like "taskname == ppc".

@dsschult
Collaborator

dsschult commented Apr 2, 2024

Due to the database structure and contents, it's hard to do this directly with a single API query, but I could certainly write an example script and modify the API to make something like this possible:

import json

# Fetch all completed tasks, then filter client-side by name and job index.
# (The await below assumes this runs inside an async function.)
ret = await client.request("GET", f"/datasets/{dataset_id}/tasks", {"status": "complete"})
task_ids = []
for task_id, task in ret.items():
    if task["name"] == "propagate" and task["job_index"] < 1000:
        task_ids.append(task_id)

# Request stats in batches of 100 task ids per query.
task_stats = {}
while task_ids:
    query_task_ids = task_ids[:100]
    task_ids = task_ids[100:]
    for line in client.request_stream("GET", f"/datasets/{dataset_id}/bulk/task_stats", {"last": True, "task_ids": query_task_ids}):
        stat = json.loads(line)
        task_stats[stat["task_id"]] = stat
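
The filter-then-batch pattern in the script above can be tried without a server. The following is only a rough sketch under stated assumptions: `select_task_ids` and `batched` are hypothetical helper names, and the sample task dictionary is made up, shaped like the `/tasks` response the script assumes (task id mapped to a dict with `name` and `job_index`).

```python
from typing import Any, Dict, Iterator, List


def select_task_ids(
    tasks: Dict[str, Dict[str, Any]], name: str, max_job_index: int
) -> List[str]:
    """Return ids of tasks with the given name and job_index < max_job_index."""
    return [
        task_id
        for task_id, task in tasks.items()
        if task["name"] == name and task["job_index"] < max_job_index
    ]


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up sample data: 500 tasks, alternating between two task names.
tasks = {
    f"t{i}": {"name": "propagate" if i % 2 == 0 else "ppc", "job_index": i}
    for i in range(500)
}

selected = select_task_ids(tasks, "propagate", 300)
batches = list(batched(selected, 100))
print(len(selected), [len(b) for b in batches])  # 150 [100, 50]
```

Each batch would then go into one `bulk/task_stats` request, as in the loop above. Keeping the filter as a plain function makes it easy to swap in other conditions, such as the "jobnumber < 100" or "taskname == ppc" cases from the original request.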
