Integrating Dataset functionality with job scheduling and Docker image for ngen worker container #148
Conversation
Force-pushed from 7a5f7c7 to 574eef6
Creating new package for (pending) relocation of core types that will need to be migrated here to avoid circular dependencies.
Updating the Dockerfile for ngen-deps to also install the expected dependencies for the Python test BMI package via pip, to help optimize the main ngen build.
Updating the Dockerfile for ngen-deps to install s3fs-fuse as a system dependency, as it will be used to mount object store buckets in the local file system.
Updating the Dockerfile instruction that builds the (now) noah-owp-modular submodule.
Small optimization to combine two RUN statements into one and reduce layers.
Updating to have image create the parent directories that will contain DMOD dataset directories during execution.
Updating the entrypoint script to account for Dataset functionality as the means of moving data around within the system (e.g., config, forcing, and hydrofabric data); also updating the script to be able to mount object store datasets into the container's file system via s3fs.
Adding support for holding SecretReferences and environment variable values within helper DockerServiceParameters type.
Updating Launcher.create_service() to have it utilize the newly added properties to DockerServiceParameters for secrets and environment variables.
Adding a new _generate_docker_cmd_args function, along with several helper functions it uses, to support generating Docker entrypoint CMD args appropriately after the recent changes for dataset utilization.
Adjusting Launcher.start_service() to use new function for generating Docker CMD arg values, and modifying things to ensure a new set of CMD args is generated for each individual allocation/worker, since these in part reflect dataset needs and thus could be different.
Updating Launcher.create_service() to have the DockerServiceParameters used be created (when appropriate) with Docker secrets for object store user access and MinIO-deployment-related environment variables.
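Since each worker's CMD args now partly encode its dataset needs, the entrypoint has to be able to decode them. A minimal, purely illustrative sketch of such decoding; the `name:category` arg format and the helper name are assumptions here, not the actual DMOD convention:

```shell
#!/bin/sh
# Illustrative only: parse hypothetical per-worker CMD args of the form
# <dataset_name>:<category> and report what would be handled for each.
parse_dataset_args() {
    for _arg in "$@"; do
        _name="${_arg%%:*}"      # text before the first ':'
        _category="${_arg#*:}"   # text after the first ':'
        echo "dataset=${_name} category=${_category}"
    done
}

parse_dataset_args huc01-forcings:forcing huc01-hydrofabric:hydrofabric
```

Because the args are generated per allocation/worker, two workers in the same job can receive different lists.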
Force-pushed from 574eef6 to 71390bd
_MOUNT_DIR="${ALL_DATASET_DIR}/${2}/${1}"
# TODO (later): this is a non-S3 implementation URL; add support for S3 directly also
# This is based on the nginx proxy config (hopefully)
_URL="http://minio_proxy:9000/"
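For context on the snippet above, here is a hedged sketch of how the mount-point derivation and the s3fs mount might fit together. Everything beyond `_MOUNT_DIR` and `_URL` (the base directory value, the function name, the s3fs options) is an assumption for illustration, and the actual s3fs call is left commented out since it requires credentials and a running MinIO proxy:

```shell
#!/bin/sh
# Illustrative sketch; mirrors the mount-point derivation from the diff above.
# The base dir and function name are assumptions, not actual DMOD code.
ALL_DATASET_DIR="${TMPDIR:-/tmp}/dmod_datasets"

mount_object_store_dataset() {
    # $1: dataset (bucket) name; $2: dataset category (config, forcing, etc.)
    _MOUNT_DIR="${ALL_DATASET_DIR}/${2}/${1}"
    # Non-S3 proxy URL, per the nginx proxy config noted in the diff
    _URL="http://minio_proxy:9000/"
    mkdir -p "${_MOUNT_DIR}"
    # A real mount would need a credentials file and look roughly like:
    # s3fs "${1}" "${_MOUNT_DIR}" -o url="${_URL}" -o use_path_request_style
    echo "${_MOUNT_DIR}"
}

mount_object_store_dataset example-dataset forcing
```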
I think I see what is going on here??? But not 💯 sure...
Again, I'm not sure exactly why I was planning to do things the way I was, and then seemingly (temporarily) not doing them that way...
I've cleaned this up a bit, and the _URL is now based on the proxy hostname. I also just fixed a problem where the proxy hostname and service name in the HA config hadn't been consistent with this.
Updating the non-desktop configuration of the object_store stack so that the proxy service name and its hostname are 'minio_proxy', consistent with the desktop config and avoiding any unexpected collisions with other generic 'nginx' services.
Removing the function (and its usage) that determined which MinIO URL to use based on the data category, since it did not always use the proxy.
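As a rough illustration of what the naming-consistency fix amounts to (a hypothetical stack-file fragment, not the actual DMOD config):

```yaml
# Hypothetical object_store stack fragment; only the 'minio_proxy'
# service name / hostname pairing reflects the change described above.
services:
  minio_proxy:
    image: nginx
    hostname: minio_proxy
    ports:
      - "9000:9000"
```

With the service name and hostname matching, the entrypoint's hard-coded `http://minio_proxy:9000/` URL resolves in both the desktop and non-desktop deployments.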
Work related to preparing the Docker image itself to handle data via Datasets, in particular object store datasets that have the backing data bucket mounted directly into the container file system. Also updating scheduler code to integrate this functionality.
Note that this should remain a draft PR until #147 is complete.