Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Enhancement/get node resources #2987

Merged

Conversation

sanderegg
Copy link
Member

@sanderegg sanderegg commented Apr 12, 2022

What do these changes do?

A resource is a number (int or float) or a string that represents a named resource (currently such as CPU, RAM, VRAM, AIRAM, MPI).
Each resource has 2 such values, one that represents a reservation (for schedule-time) to define if a service can be started (is there still machine with enough CPUs to start he service?) and a limit (for runtime) that is used to stop a service that would go over that limit.
A resource limit is always at least equal to a resource reservation.

Currently the resources used for running osparc services are statically defined on the service docker images.
These values may be retrieved using the following new API endpoints in the webserver:

  • GET /catalog/services/{service_key}/{service_version}/resources
  • GET /projects/{puuid}/nodes/{node_uuid}/resources

And in the catalog

  • GET /services/{service_key}/{service_version}/resources

In this PR, only the static values are ever retrieved. In the near future, it will be possible to modify the limits on a particular service.

This means the current implementation in the webserver ALWAYS returns the values from the catalog service

NOTE: this implementation returns "raw" resources such as VRAM. these are used as is for running dynamic services

Related issue/s

ITISFoundation/osparc-issues#618

How to test

Checklist

@sanderegg sanderegg added this to the Macarons milestone Apr 12, 2022
@sanderegg sanderegg self-assigned this Apr 12, 2022
@codecov
Copy link

codecov bot commented Apr 12, 2022

Codecov Report

Merging #2987 (9fa8567) into master (74f158c) will increase coverage by 0.0%.
The diff coverage is 83.6%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #2987    +/-   ##
=======================================
  Coverage    79.5%   79.6%            
=======================================
  Files         686     688     +2     
  Lines       28571   28694   +123     
  Branches     3683    3701    +18     
=======================================
+ Hits        22731   22847   +116     
+ Misses       5030    5029     -1     
- Partials      810     818     +8     
Flag Coverage Δ
integrationtests 65.5% <30.5%> (-0.3%) ⬇️
unittests 75.3% <83.6%> (+0.1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...mcore_service_catalog/api/dependencies/director.py 100.0% <ø> (+18.1%) ⬆️
...er/src/simcore_service_webserver/security_roles.py 100.0% <ø> (ø)
...ages/models-library/src/models_library/generics.py 95.1% <50.0%> (+3.2%) ⬆️
...er/src/simcore_service_webserver/catalog_client.py 57.6% <64.2%> (+1.1%) ⬆️
...s-library/src/models_library/services_resources.py 73.3% <73.3%> (ø)
...vice_webserver/projects/projects_nodes_handlers.py 71.7% <82.5%> (+4.7%) ⬆️
...src/simcore_service_catalog/api/routes/services.py 66.1% <83.7%> (+7.0%) ⬆️
...rary/src/models_library/service_settings_labels.py 100.0% <100.0%> (ø)
...mcore_service_catalog/api/dependencies/services.py 100.0% <100.0%> (ø)
...talog/src/simcore_service_catalog/core/settings.py 100.0% <100.0%> (ø)
... and 16 more

@sanderegg sanderegg force-pushed the enhancement/get_node_resources branch 3 times, most recently from 8270e68 to 295b0e7 Compare April 20, 2022 20:47
@sanderegg sanderegg changed the title WIP: ✨ Enhancement/get node resources ✨ Enhancement/get node resources Apr 21, 2022
@sanderegg sanderegg marked this pull request as ready for review April 21, 2022 08:36
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. Some comments and thoughts

logger.debug("received %s", f"{service_settings}")

def _from_service_settings(
settings: List[SimcoreServiceSettingLabelEntry],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THOUGHT: I wonder if at some point this function should be attached as utils close to SimcoreServiceSettingsLabelEntrty .

See for instance in integration-library
There a model can be exported to or parsed from to label annotations

NOTE1: export and parse are the fundamental operations in pydantic

NOTE2: OCI standard officially call to the docker labels as "label annotations".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes unless there are dependencies that the model_library cannot cope with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I find a bit confusing to rename labels into annotations actually. but if the next docker version does it I'll do it too.

get_default_service_resources
),
):
# TODO: --> PC: I'll need to go through that with you for function services,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have to review this. Currently these servcies do not consume any significant resources. So either we set them to 0 or remove them from the listing. Let's discuss about it offline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 is out of the discussion. I disagree. or we will end up with things suddenly consuming resource somewhere totally wild.

aiohttp_client(app, server_kwargs={"port": app_cfg["main"]["port"]})
)
# @pytest.fixture
# def app_cfg(default_app_cfg: ConfigDict, unused_tcp_port_factory: Callable):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to delete? Why commented

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, that is what I was trying to ask you this morning...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to discuss app_cfg and client fixtures, cause I find them a bit confusing. and I wonder if I'm not missing something there.

@sanderegg sanderegg force-pushed the enhancement/get_node_resources branch from 7a699ad to 9fa8567 Compare April 21, 2022 20:24
Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Just not sure about the 2Gb limit for all services which do not specify any limit. As said below I would go for something much lower and let the users adjust if their services fail.

Comment on lines +1093 to +1096
"bootOptions": {
"title": "Bootoptions",
"type": "object",
"description": "Some services provide alternative parameters to be injected at boot time. The user selection should be stored here, and it will overwrite the services's defaults."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come I see bootOptions here? Uhm... I wonder what happened.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well this is generated with the make openapi-spec recipe.. not sure when it was run last time...

Comment on lines +31 to +32
"limit": ByteSize(2 * 1024**3),
"reservation": ByteSize(2 * 1024**3),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, so by default you get 2 GB limit and reservation? Are we sure about this? Should we not bring the values in the MB ranges?

For example, when defining services like the sim4life. You have an Nginx without any constraints. It is not worth it to assign so many resources for a service which requires almost 100 MB.

I'd suggest to bump it down to 100MB (or something in this range), and let people explicitly ask for more resources if required. I think this will help with reducing memory consumption by a lot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so that is the value that was here already before. Open to discussion but in a separate PR cause it might have other implications. That is what the current director-v0 also does (but not wired to this one for now), so it could have issues with some older services.

Now concerning your use case:

  • this call returns the total envelope, not the nginx service. so for computing the nginx service in your compose it should go with: get service limit/resources - what is defined in the compose for specifics = nginx (+ maybe other) services limits

@sanderegg sanderegg merged commit 98d6049 into ITISFoundation:master Apr 22, 2022
@sanderegg sanderegg deleted the enhancement/get_node_resources branch April 22, 2022 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants