
Integrating Dataset functionality with job scheduling and Docker image for ngen worker container #148

Merged — 24 commits (Mar 17, 2022)
facd6e8
Add minio package as a project requirement.
robertbartel Mar 1, 2022
bdce172
Remove TODO comment for completed task.
robertbartel Mar 2, 2022
aedc112
Creating dmod.core package for core types.
robertbartel Mar 2, 2022
f5e34ea
Update ngen-deps Dockerfile with testing deps.
robertbartel Mar 2, 2022
68c5ef2
Update ngen-deps Dockerfile with s3fs-fuse deps.
robertbartel Mar 2, 2022
e63285a
Update ngen Dockerfile from noah-mp to noah-owp.
robertbartel Mar 2, 2022
8c1806e
Combine layers in ngen Dockerfile.
robertbartel Mar 2, 2022
a8d1f4b
Update ngen Dockerfile to create datasource dirs.
robertbartel Mar 2, 2022
7273f02
Update ngen entrypoint script for datasets.
robertbartel Mar 4, 2022
6eff182
Add secrets, env vars to DockerServiceParameters.
robertbartel Mar 4, 2022
aca457d
Update create_service() for service param updates.
robertbartel Mar 4, 2022
1de2f4d
Add Launcher helper functions for Docker CMD args.
robertbartel Mar 4, 2022
739da7e
Adjust service creation in Launcher.
robertbartel Mar 4, 2022
97166c7
Update create_service params for object store use.
robertbartel Mar 4, 2022
e91587c
Bump dmod-scheduler version, reflecting elsewhere.
robertbartel Mar 4, 2022
c0a7bc7
Add CONFIG DataCategory value.
robertbartel Mar 4, 2022
ea25296
Add simple NGEN_OUTPUT DataFormat value.
robertbartel Mar 4, 2022
134cb3e
Turn off secure in object store client.
robertbartel Mar 4, 2022
71390bd
Fix type hint problem with Job in scheduler.py.
robertbartel Mar 10, 2022
2980850
Consistent name of minio proxy service.
robertbartel Mar 16, 2022
5e54315
Ensure worker entrypoint uses minio proxy.
robertbartel Mar 17, 2022
1768905
Fix inverted boolean test condition.
robertbartel Mar 17, 2022
6ce35f2
Remove unnecessary env var add to workers.
robertbartel Mar 17, 2022
91600f6
Fix sytax problem after env_var removal.
robertbartel Mar 17, 2022
12 changes: 7 additions & 5 deletions docker/main/ngen/Dockerfile
@@ -1,4 +1,5 @@
ARG DOCKER_INTERNAL_REGISTRY
ARG DATASET_DIRECTORIES="config forcing hydrofabric observation output"
#FIXME is base no longer nwm specific??? How about deps?
#Base is missing a few simple deps, like git...
#FROM ${DOCKER_INTERNAL_REGISTRY}/nwm-base
@@ -45,7 +46,7 @@ RUN git clone --single-branch --branch $BRANCH $REPO_URL \
&& chmod u+x build_sub \
&& if [ "${NGEN_ACTIVATE_FORTRAN}" == "ON" ]; then \
./build_sub extern/iso_c_fortran_bmi; \
if [ "${BUILD_NOAH_OWP}" == "true" ] ; then ./build_sub extern/noah-mp-modular; fi; \
if [ "${BUILD_NOAH_OWP}" == "true" ] ; then ./build_sub extern/noah-owp-modular; fi; \
fi \
&& if [ "${NGEN_ACTIVATE_C}" == "ON" ]; then \
if [ "${BUILD_CFE}" == "true" ] ; then ./build_sub extern/cfe; fi; \
@@ -82,12 +83,13 @@ RUN git clone --single-branch --branch $BRANCH $REPO_URL \
&& cd $WORKDIR && rm -rf ngen boost

USER root
#Remove the boost headers now that ngen is compiled
RUN rm -rf ${BOOST_ROOT}
RUN echo "export PATH=${PATH}" >> /etc/profile
# Remove the boost headers now that ngen is compiled; also persist the PATH update in /etc/profile
RUN rm -rf ${BOOST_ROOT} && echo "export PATH=${PATH}" >> /etc/profile
USER ${USER}
COPY --chown=${USER} entrypoint.sh ${WORKDIR}
RUN chmod +x ${WORKDIR}/entrypoint.sh
# Change permissions for entrypoint and make sure dataset volume mount parent directories exist
RUN chmod +x ${WORKDIR}/entrypoint.sh \
&& for d in ${DATASET_DIRECTORIES}; do mkdir -p /dmod/dataset/${d}; done
WORKDIR ${WORKDIR}
ENV PATH=${WORKDIR}:$PATH
ENTRYPOINT ["entrypoint.sh"]
7 changes: 6 additions & 1 deletion docker/main/ngen/deps/Dockerfile
@@ -105,13 +105,18 @@ RUN echo "cd ${WORKDIR}" >> ${USER_HOME}/.profile \
#&& mkdir -p ${WORKDIR}/domains

# Handle final required dependencies separately so we don't have to, e.g., rebuild MPI if we want to update Python
# Also include (not quite authoritative) pip packages required for the test Python BMI library
ARG REQUIRE="sudo gcc g++ musl-dev make cmake tar git gfortran libgfortran python3>=${MIN_PYTHON} python3-dev>=${MIN_PYTHON} py3-pip py3-numpy>=${MIN_NUMPY} py3-numpy-dev>=${MIN_NUMPY} py3-pandas netcdf netcdf-dev hdf5 hdf5-dev bzip2 texinfo expat expat-dev flex bison"
RUN apk update && apk upgrade \
&& if [ -n "${REPOS}" ]; then \
apk add --repository ${REPOS} --no-cache ${REQUIRE}; \
else \
apk add --no-cache ${REQUIRE}; \
fi
fi \
&& pip install numpy pandas pyyaml bmipy netCDF4

# For now, the s3fs-fuse package is only available in the Alpine testing repo
RUN apk add --repository http://dl-cdn.alpinelinux.org/alpine/edge/testing s3fs-fuse

ENV BOOST_ROOT=${WORKDIR}/boost
USER ${USER}
107 changes: 91 additions & 16 deletions docker/main/ngen/entrypoint.sh
@@ -2,26 +2,101 @@
# $1 will have the number of nodes associated with this run
# $2 will have the host string in MPI form, i.e. hostname:N, hostname:M
# $3 will have the unique job id
# $4 will be the name of the output dataset (which will imply a directory location)
# $5 will be the name of the hydrofabric dataset (which will imply a directory location)
# $6 will be the name of the configuration dataset (which will imply a directory location)
# $7 and beyond will have colon-joined category+name strings (e.g., FORCING:aorc_csv_forcings_1) for Minio object store
# datasets to mount

MPI_NODE_COUNT="${1:?No MPI node count given}"
MPI_HOST_STRING="${2:?No MPI host string given}"
JOB_ID=${3:?No Job id given}
OUTPUT_DATASET_NAME="${4:?}"
HYDROFABRIC_DATASET_NAME="${5:?}"
CONFIG_DATASET_NAME="${6:?}"

ACCESS_KEY_SECRET="object_store_exec_user_name"
SECRET_KEY_SECRET="object_store_exec_user_passwd"
DOCKER_SECRETS_DIR="/run/secrets"
ACCESS_KEY_FILE="${DOCKER_SECRETS_DIR}/${ACCESS_KEY_SECRET}"
SECRET_KEY_FILE="${DOCKER_SECRETS_DIR}/${SECRET_KEY_SECRET}"

ALL_DATASET_DIR="/dmod/dataset"
OUTPUT_DATASET_DIR="${ALL_DATASET_DIR}/output/${OUTPUT_DATASET_NAME}"
HYDROFABRIC_DATASET_DIR="${ALL_DATASET_DIR}/hydrofabric/${HYDROFABRIC_DATASET_NAME}"
CONFIG_DATASET_DIR="${ALL_DATASET_DIR}/config/${CONFIG_DATASET_NAME}"

S3FS_PASSWD_FILE="${HOME}/.passwd-s3fs"

# Mount an object store dataset of the given name and data category (which implies mount point directory)
mount_object_store_dataset()
{
# Dataset name is $1
# Dataset category (lower case) is $2
_MOUNT_DIR="${ALL_DATASET_DIR}/${2}/${1}"
# TODO (later): this is a non-S3 implementation URL; add support for S3 directly also
# This is based on the nginx proxy config (hopefully)
_URL="http://minio_proxy:9000/"
[Review comment — Member]
I think I see what is going on here??? But not 💯 sure...

[Reply — Contributor Author]
Again, not sure why exactly I was doing things the way I was planning, and then seemingly temporarily not doing them that way ...

I've cleaned this up a bit, but the _URL is based on the proxy hostname. I also fixed a problem (i.e., just now) where the proxy hostname and service name hadn't been consistent with this in the HA config.

s3fs ${1} ${_MOUNT_DIR} -o passwd_file=${HOME}/.passwd-s3fs -o url=${_URL} -o use_path_request_style
}

parse_object_store_strings()
{
while [ ${#} -gt 0 ]; do
_CAT="$(echo "${1}"| sed -e 's/\([^:]*\):.*/\1/' | awk '{print tolower($0)}')"
_NAME="$(echo "${1}"| sed -e 's/\([^:]*\):\(.*\)/\2/')"
mount_object_store_dataset ${_NAME} ${_CAT}
shift
done
}

check_for_dataset_dir()
{
# Dataset dir is $1
_CATEG="$(echo "${1}" | sed "s|${ALL_DATASET_DIR}/\([^/]*\)/.*|\1|" | awk '{print toupper($0)}')"
if [ ! -d "${1}" ]; then
echo "Error: expected ${_CATEG} dataset directory ${1} not found." >&2
exit 1
fi
}

# Read Docker Secrets files for Object Store access, if they exist
if [ -e "${ACCESS_KEY_FILE}" ]; then
ACCESS_KEY="$(cat "${ACCESS_KEY_FILE}")"
fi
if [ -e "${SECRET_KEY_FILE}" ]; then
SECRET_KEY="$(cat "${SECRET_KEY_FILE}")"
fi

# Execute object store routine if we have an access key
if [ -n "${ACCESS_KEY:-}" ]; then
# Of course, bail if we don't have the secret key also
if [ -z "${SECRET_KEY:-}" ]; then
echo "Error: ACCESS_KEY provided for Minio object store access, but no SECRET_KEY provided" >&2
exit 1
fi

# Configure auth for s3fs
echo ${ACCESS_KEY}:${SECRET_KEY} > "${S3FS_PASSWD_FILE}"
chmod 600 "${S3FS_PASSWD_FILE}"

# Parse args and mount any object stores datasets appropriately
parse_object_store_strings ${@:7}
fi

# Sanity check that the output, hydrofabric, and config datasets are available (i.e., their directories are in place)
check_for_dataset_dir "${CONFIG_DATASET_DIR}"
check_for_dataset_dir "${HYDROFABRIC_DATASET_DIR}"
check_for_dataset_dir "${OUTPUT_DATASET_DIR}"

# Move to the output dataset mounted directory
cd ${OUTPUT_DATASET_DIR}

#Make sure we are in workdir
cd ${WORKDIR}
#This is the input location that image_and_domain.yaml specifies as the run time mount location
domain_location=/ngen/data
#This is the output location that image_and_domain.yaml specifies as the run time mount location
output_dir=/ngen/output
#Create a tmp dir based on the job id to dump output to
tmp_domain=$output_dir/tmp_$3
mkdir -p $tmp_domain
#Soft link the mounted static inputs
ln -s $domain_location $tmp_domain/
#cd to the tmp dir to run
cd $tmp_domain
#Execute the model on the linked data
ngen ./data/catchment_data.geojson "" ./data/nexus_data.geojson "" ./data/refactored_example_realization_config.json > std_out.log 2> std_err.log
ngen ${HYDROFABRIC_DATASET_DIR}/catchment_data.geojson "" ${HYDROFABRIC_DATASET_DIR}/nexus_data.geojson "" ${CONFIG_DATASET_DIR}/realization_config.json > std_out.log 2> std_err.log

#Capture the return value to use as service exit code
ngen_return=$?
echo 'ngen returned with a return value: ' $ngen_return
#Remove soft link, which will have the same name as the last element of domain_location path
unlink data
#Exit with the model's exit code
exit $ngen_return
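The entrypoint's `parse_object_store_strings` expects colon-joined `CATEGORY:name` tokens (e.g., `FORCING:aorc_csv_forcings_1`). The same parsing convention can be sketched in Python, which may be useful on the scheduler side when assembling or validating these arguments; the function name and return shape below are illustrative, not part of the PR:

```python
def parse_dataset_arg(token: str) -> tuple:
    """Split a 'CATEGORY:name' token into (category, name).

    The category is lowercased to match the mount directory layout
    (/dmod/dataset/<category>/<name>); only the first colon is treated
    as the separator, so dataset names may themselves contain colons.
    """
    category, _, name = token.partition(":")
    if not category or not name:
        raise ValueError("expected 'CATEGORY:name', got: {!r}".format(token))
    return category.lower(), name
```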
4 changes: 2 additions & 2 deletions docker/object_store/docker-compose.yml
@@ -75,9 +75,9 @@ services:
- exec_user_passwd
- exec_user_name

nginx:
minio_proxy:
image: nginx:1.21.1-alpine
hostname: nginx
hostname: minio_proxy
networks:
#- minio_distributed
- requests-net
2 changes: 2 additions & 0 deletions python/lib/core/README.md
@@ -0,0 +1,2 @@
# About
Python package for core types depended upon by multiple other DMOD Python packages, which must be located in an isolated "core" package to avoid circular dependencies.
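The dependency problem the new `dmod.core` package solves can be illustrated abstractly: if two packages each define types the other imports, Python's import machinery fails with a circular import, but both can safely depend on a third, shared "core" package. The class and module names below are hypothetical stand-ins for illustration, not actual DMOD modules:

```python
# core.py — shared base types; depends on nothing else in the project
class Serializable:
    def to_dict(self):
        return dict(vars(self))

# A "scheduler" module and a "modeldata" module can now both do
#   from core import Serializable
# without importing each other, avoiding the circular-import failure
# that occurs when each defines types the other needs.
class Job(Serializable):              # would live in the scheduler package
    def __init__(self, job_id):
        self.job_id = job_id

class DataRequirement(Serializable):  # would live in the modeldata package
    def __init__(self, category):
        self.category = category
```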
3 changes: 3 additions & 0 deletions python/lib/core/dmod/core/__init__.py
@@ -0,0 +1,3 @@
from ._version import __version__

name = 'core'
1 change: 1 addition & 0 deletions python/lib/core/dmod/core/_version.py
@@ -0,0 +1 @@
__version__ = '0.5.0'
19 changes: 19 additions & 0 deletions python/lib/core/setup.py
@@ -0,0 +1,19 @@
from setuptools import setup, find_namespace_packages

with open('README.md', 'r') as readme:
long_description = readme.read()

exec(open('dmod/core/_version.py').read())

setup(
name='dmod-core',
version=__version__,
description='',
long_description=long_description,
author='',
author_email='',
url='',
license='',
install_requires=[],
packages=find_namespace_packages(exclude=('tests', 'schemas', 'ssl', 'src'))
)
10 changes: 6 additions & 4 deletions python/lib/modeldata/dmod/modeldata/data/meta_data.py
@@ -29,6 +29,8 @@ class DataFormat(Enum):
"V2D": float, "PSFC": float, "SWDOWN": float, "LWDOWN": float, "offset": int}
)
""" The default format for "raw" AORC forcing data. """
NGEN_OUTPUT = (3, ["id", "Time"], None, {"id": str, "Time": datetime})
""" Representation of the format for Nextgen output, with unknown/unspecified configuration of output fields. """
# TODO: consider whether a datetime format string is necessary for each type value
# TODO: consider whether something to indicate the time step size is necessary
# TODO: need format specifically for Nextgen model output (i.e., for evaluations)
@@ -413,10 +415,11 @@ class DataCategory(Enum):
"""
The general category values for different data.
"""
FORCING = 0
HYDROFABRIC = 1
OUTPUT = 2
CONFIG = 0
FORCING = 1
HYDROFABRIC = 2
OBSERVATION = 3
OUTPUT = 4

@classmethod
def get_for_name(cls, name_str: str) -> Optional['DataCategory']:
@@ -437,7 +440,6 @@ def __init__(self, begin: datetime, end: datetime, variable: Optional[str] = Non
datetime_pattern=self.get_datetime_str_format())


# TODO: ***** fix DataDomain to include a data format value, then refactor this type to just use that as the domain
class DataRequirement(Serializable):
"""
A definition of a particular data requirement needed for an execution task.
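The diff shows `DataCategory` gaining a `CONFIG` member (with the other values renumbered), plus a `get_for_name` lookup whose body is collapsed in the view above. A self-contained sketch of how such a lookup plausibly behaves — case-insensitive name matching is an assumption here, since the real method body isn't visible in the diff:

```python
from enum import Enum
from typing import Optional

class DataCategory(Enum):
    CONFIG = 0
    FORCING = 1
    HYDROFABRIC = 2
    OBSERVATION = 3
    OUTPUT = 4

    @classmethod
    def get_for_name(cls, name_str: str) -> Optional['DataCategory']:
        # Assumed behavior: match member names ignoring case and
        # surrounding whitespace; return None when nothing matches
        cleaned = name_str.strip().upper()
        for member in cls:
            if member.name == cleaned:
                return member
        return None
```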
@@ -140,7 +140,8 @@ def __init__(self, minio_host_str: str, access_key: Optional[str] = None, secret
super(ObjectStoreDatasetManager, self).__init__(datasets)
# TODO: add checks to ensure all datasets passed to this type are ObjectStoreDataset
self._minio_host_str = minio_host_str
self._client = Minio(minio_host_str, access_key=access_key, secret_key=secret_key)
# TODO (later): may need to look at turning this back on
self._client = Minio(minio_host_str, access_key=access_key, secret_key=secret_key, secure=False)

def _decode_object_name_to_file_path(self, object_name: str) -> str:
"""
2 changes: 1 addition & 1 deletion python/lib/monitor/setup.py
@@ -17,6 +17,6 @@
author_email='',
url='',
license='',
install_requires=['docker', 'Faker', 'dmod-communication>=0.4.2', 'dmod-redis>=0.1.0', 'dmod-scheduler>=0.4.0'],
install_requires=['docker', 'Faker', 'dmod-communication>=0.4.2', 'dmod-redis>=0.1.0', 'dmod-scheduler>=0.5.0'],
packages=find_namespace_packages(exclude=('test', 'src'))
)
2 changes: 1 addition & 1 deletion python/lib/scheduler/dmod/scheduler/_version.py
@@ -1 +1 @@
__version__ = '0.4.1'
__version__ = '0.5.0'