✨ Use S3 links in internal computational backend cluster, prepare for temporary tokens (⚠️ devops) #3006

Merged

Conversation

sanderegg
Member

@sanderegg sanderegg commented Apr 26, 2022

What do these changes do?

Until this PR, data upload/download through the computational services worked as follows:

  • Download:
    1. director-v2 asks node_ports for the presigned links; node_ports asks storage, which asks minio (with a TTL of X, where X is large)
    2. director-v2 passes the presigned links to the dask-sidecar, which performs the actual download
  • Upload:
    1. director-v2 asks node_ports for the presigned links; node_ports asks storage, which asks minio (with a TTL of X, where X is large)
    2. director-v2 passes the presigned links to the dask-sidecar, which uses them to upload the service outputs and the log file

New env variables (⚠️ devops):

  • COMPUTATIONAL_BACKEND_DEFAULT_CLUSTER_FILE_LINK_TYPE defines the file link type to use with the internal cluster (defaults to S3; available values: s3, presigned)
  • COMPUTATIONAL_BACKEND_DEFAULT_FILE_LINK_TYPE defines the file link type to use with the other clusters (defaults to presigned)
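
To illustrate how the two variables and their defaults could be interpreted, here is a minimal sketch using only the standard library. The `FileLinkType` enum, `read_link_type` and `load_link_types` are hypothetical names for this example; the actual director-v2 settings classes are not shown in this PR description and may differ.

```python
import os
from enum import Enum


class FileLinkType(str, Enum):
    """The two link types named in the PR description (hypothetical enum)."""

    S3 = "s3"
    PRESIGNED = "presigned"


def read_link_type(name, default, env=None):
    """Read one link type from the environment, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    # Accept "S3" as well as "s3"; an unknown value raises ValueError.
    return FileLinkType(raw.strip().lower())


def load_link_types(env=None):
    """Return the link type for the internal cluster and for all other clusters."""
    return {
        "internal_cluster": read_link_type(
            "COMPUTATIONAL_BACKEND_DEFAULT_CLUSTER_FILE_LINK_TYPE",
            FileLinkType.S3,
            env,
        ),
        "other_clusters": read_link_type(
            "COMPUTATIONAL_BACKEND_DEFAULT_FILE_LINK_TYPE",
            FileLinkType.PRESIGNED,
            env,
        ),
    }
```

With no variables set, this sketch yields S3 links for the internal cluster and presigned links everywhere else, which matches the defaults listed above.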

When using S3 links, the consumer needs access to the underlying S3 backend (presigned links, by contrast, embed this access in the link itself). Therefore director-v2 now also transmits the S3 access credentials to the dask-sidecar when the S3 file link type is enabled.
In this PR, the default credentials are passed. In the next iteration, temporary S3 access shall be generated through the AWS STS interface, which should allow the use of external clusters at the price of some security.
Nevertheless, this solution cannot scale very far, as STS caps the number of these temporary credentials at 1000.
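
The difference between the two link types can be sketched as follows. This assumes SigV4-style presigned URLs, which carry their signature in the `X-Amz-Signature` query parameter; `link_kind`, `payload_for_sidecar` and the payload keys are hypothetical and only illustrate that credentials need to travel alongside S3 links but not alongside presigned ones.

```python
from urllib.parse import parse_qs, urlsplit


def link_kind(link):
    """Classify a file link as 's3' or 'presigned' (SigV4 query signature assumed)."""
    parts = urlsplit(link)
    if parts.scheme == "s3":
        return "s3"
    if parts.scheme in ("http", "https") and "X-Amz-Signature" in parse_qs(parts.query):
        return "presigned"
    raise ValueError(f"unsupported file link: {link!r}")


def payload_for_sidecar(link, credentials):
    """Attach credentials only when the link does not embed access itself.

    The payload layout is made up for this example.
    """
    if link_kind(link) == "s3":
        return {"url": link, "s3_credentials": credentials}
    return {"url": link}
```

For example, `payload_for_sidecar("s3://bucket/key", creds)` includes the credentials, while the same call with a presigned HTTP URL returns only the URL.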

New storage API:

  • POST v0/simcore-s3:access returns the S3 credentials (currently returns the default ones)
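
As a rough sketch of how a caller might consume this endpoint's response: the handler snippet reviewed in this PR returns `{"data": s3_settings.dict()}`, so the credentials sit inside a `data` envelope. The helper name `unwrap_s3_access` is hypothetical, and the field names inside `data` are not specified here, so the sketch treats them as opaque.

```python
import json


def unwrap_s3_access(body):
    """Extract the credentials object from a {"data": {...}} response body."""
    envelope = json.loads(body)
    if not isinstance(envelope, dict) or not isinstance(envelope.get("data"), dict):
        raise ValueError("response has no 'data' object")
    # The keys inside 'data' depend on the storage service's S3 settings model.
    return envelope["data"]
```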

NOTE

Another solution might be to use the multipart upload options instead, at least for the frontend and probably also for any external dask cluster.

Bonus:

  • removed all the pytest-docker infrastructure in storage and switched to pytest-simcore for all unit tests (moved to ♻️ Maintenance/remove pytest docker storage #3011 )
  • fixes the case where the dask-sidecar would not unzip a zipped input file that is not meant to stay zipped in the service (for example osparc-python-runner)
  • fixes the download/upload progress percentage not increasing, and adds the average transfer rate to the logs
  • when an error happens in a dask-sidecar task, it is now shown in the director-v2 logs (it cannot always be shown, however, because of missing packages; @pcrespov not sure I will keep it)
  • overall cleanup
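
The progress percentage and average transfer rate mentioned above can be computed along these lines. This is only a sketch of the arithmetic, not the code from this PR; `TransferProgress` and its injectable `clock` parameter are hypothetical names.

```python
import time


class TransferProgress:
    """Track how much of a transfer is done and how fast it went on average."""

    def __init__(self, total_bytes, clock=time.monotonic):
        self.total_bytes = total_bytes
        self.transferred = 0
        self._clock = clock
        self._start = clock()

    def update(self, chunk_size):
        """Record that another chunk of chunk_size bytes was transferred."""
        self.transferred += chunk_size

    @property
    def percent(self):
        """Progress in percent, capped at 100; an empty transfer counts as done."""
        if self.total_bytes <= 0:
            return 100.0
        return min(100.0, 100.0 * self.transferred / self.total_bytes)

    @property
    def average_rate(self):
        """Average rate in bytes per second since the tracker was created."""
        elapsed = self._clock() - self._start
        return self.transferred / elapsed if elapsed > 0 else 0.0
```

Calling `update()` from the chunk-reading loop and logging `percent` and `average_rate` after each chunk gives a steadily increasing percentage.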

Related issue/s

How to test

Run make build up-prod or make build-devel up-devel.
Create a pipeline with inputs, then run the pipeline.

Checklist

@sanderegg sanderegg added this to the Macarons milestone Apr 26, 2022
@sanderegg sanderegg self-assigned this Apr 26, 2022
@sanderegg sanderegg force-pushed the enhancement/generate_temp_s3_access branch from da86799 to 6ffef00 Compare April 26, 2022 16:13
@codecov

codecov bot commented Apr 26, 2022

Codecov Report

Merging #3006 (72b58ce) into master (122703f) will increase coverage by 0.0%.
The diff coverage is 90.9%.

@@           Coverage Diff           @@
##           master   #3006    +/-   ##
=======================================
  Coverage    79.7%   79.7%            
=======================================
  Files         688     690     +2     
  Lines       28748   28872   +124     
  Branches     3707    3719    +12     
=======================================
+ Hits        22917   23038   +121     
+ Misses       5015    5010     -5     
- Partials      816     824     +8     
Flag Coverage Δ
integrationtests 65.7% <84.4%> (+0.1%) ⬆️
unittests 75.5% <87.7%> (+0.1%) ⬆️

Impacted Files Coverage Δ
...ls-library/src/models_library/projects_nodes_io.py 96.3% <ø> (ø)
...rvice_dask_sidecar/computational_sidecar/errors.py 100.0% <ø> (ø)
...-sidecar/src/simcore_service_dask_sidecar/tasks.py 80.5% <ø> (ø)
...es/dynamic_sidecar/docker_service_specs/sidecar.py 78.5% <ø> (ø)
...-sdk/src/simcore_sdk/node_ports_v2/nodeports_v2.py 94.3% <50.0%> (+<0.1%) ⬆️
...rc/simcore_service_director_v2/core/application.py 90.3% <50.0%> (-1.4%) ⬇️
...r-v2/src/simcore_service_director_v2/utils/dask.py 95.1% <60.0%> (-1.4%) ⬇️
...k/src/simcore_sdk/node_ports_common/filemanager.py 81.8% <71.4%> (ø)
.../simcore-sdk/src/simcore_sdk/node_ports_v2/port.py 84.9% <75.0%> (-0.5%) ⬇️
...simcore_service_director_v2/modules/dask_client.py 91.9% <88.2%> (-0.5%) ⬇️
... and 24 more

@sanderegg sanderegg force-pushed the enhancement/generate_temp_s3_access branch 2 times, most recently from 09b96b1 to ba5895a Compare April 28, 2022 20:36
@sanderegg sanderegg changed the title from "✨ Allow generation of temporary S3 access" to "✨ Use S3 links in internal computational backend cluster, prepare for temporary tokens" Apr 29, 2022
@sanderegg sanderegg changed the title from "✨ Use S3 links in internal computational backend cluster, prepare for temporary tokens" to "✨ Use S3 links in internal computational backend cluster, prepare for temporary tokens (⚠️ devops)" Apr 29, 2022
@sanderegg sanderegg force-pushed the enhancement/generate_temp_s3_access branch from 4b068f3 to 6b45df0 Compare April 29, 2022 09:59
@sanderegg sanderegg force-pushed the enhancement/generate_temp_s3_access branch from 6b45df0 to 95db602 Compare April 29, 2022 10:30
@sanderegg sanderegg marked this pull request as ready for review April 29, 2022 10:31
Member

@pcrespov pcrespov left a comment

🎉 Nice! Just minor comments

Contributor

@GitHK GitHK left a comment

Very nice. Please see my comments regarding the settings.

mocked_log_publishing_cb.assert_called()
mocked_log_publishing_cb.reset_mock()

# USE-CASE 3: if destination is a zip, but we pass a target mime type that is not, then we decompress
Contributor

Is this description correct?
I would expect not to decompress.

Member Author

It is correct. It uses the target mime type, which is what the service expects. Example:
if my service has an input port defined as
type: data:*/*
then it wants the file unzipped.

If the input port is defined as
type: data:application/zip
then it wants a zip file.
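
The rule described in this reply can be sketched as a small predicate. This is an illustration under assumptions, not the dask-sidecar implementation: `port_mime_type` and `should_unzip` are hypothetical names, and the sketch recognizes a zip by its file extension, whereas the real code may detect it differently.

```python
ZIP_MIME = "application/zip"


def port_mime_type(port_type):
    """Strip the 'data:' prefix from a port type such as 'data:*/*'."""
    prefix = "data:"
    if not port_type.startswith(prefix):
        raise ValueError(f"not a file port type: {port_type!r}")
    return port_type[len(prefix):]


def should_unzip(file_name, port_type):
    """Unzip only when the file is a zip and the port does not ask for a zip."""
    is_zip = file_name.lower().endswith(".zip")
    return is_zip and port_mime_type(port_type) != ZIP_MIME
```

With this predicate, a zipped input sent to a `data:*/*` port is decompressed, while the same file sent to a `data:application/zip` port is kept as is.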

s3_settings: S3Settings = await sts.get_or_create_temporary_token_for_user(
    request.app, user_id
)
return {"data": s3_settings.dict()}
Contributor

I see the default values. Regarding my PR #2993, I would still not add this. I will enable it once we decide how to use STS.

Member Author

Yes, do not add it. My gut feeling is that the STS approach is probably not the way to go.

@sonarcloud

sonarcloud bot commented May 2, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication

@sanderegg sanderegg requested a review from GitHK May 2, 2022 10:12
Contributor

@GitHK GitHK left a comment

👍

Comment on lines +110 to +124
@cached_property
def endpoint(self) -> str:
    return AnyHttpUrl.build(
        scheme="http",
        host=self.STORAGE_HOST,
        port=f"{self.STORAGE_PORT}",
        path=f"/{self.STORAGE_VTAG}",
    )

@cached_property
def storage_endpoint(self) -> str:
    """used to re-create STORAGE_ENDPOINT: used by node_ports and must be
    in style host:port
    without scheme or version tag"""
    return f"{self.STORAGE_HOST}:{self.STORAGE_PORT}"
Contributor

You might already be aware of this one, just wanted to point it out.

class MixinServiceSettings:
    """Mixin with common helpers based on validated fields with canonical name

    Example:
      - Subclass should define host, port and vtag fields as

        class MyServiceSettings(BaseCustomSettings, MixinServiceSettings):
            {prefix}_HOST: str
            {prefix}_PORT: PortInt
            {prefix}_VTAG: VersionTag  [Optional]

            # optional
            {prefix}_SCHEME: str (urls default to http)
            {prefix}_USER: str
            {prefix}_PASSWORD: SecretStr
    """

Member Author

Nope, but good to know. Anyway, this one is a different case, and I expect it to disappear soonish.

@sanderegg sanderegg merged commit 85c998f into ITISFoundation:master May 2, 2022
@sanderegg sanderegg deleted the enhancement/generate_temp_s3_access branch May 2, 2022 11:48
3 participants