Refactor fv3 runtime modules and image construction #185

Merged
merged 14 commits into from
Mar 21, 2020
22 changes: 17 additions & 5 deletions Makefile
@@ -3,6 +3,7 @@
#################################################################################
# GLOBALS #
#################################################################################
VERSION = 0.1.0
ENVIRONMENT_SCRIPTS = .environment-scripts
PROJECT_DIR := $(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))
BUCKET = [OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')
@@ -22,8 +23,22 @@ endif
#################################################################################
# COMMANDS #
#################################################################################
build_image:
docker build . -t $(IMAGE) -t $(GCR_IMAGE)
.PHONY: wheels build_images push_image
wheels:
pip wheel --no-deps .
pip wheel --no-deps external/vcm

# pattern rule for building docker images
build_image_%:
Contributor


Neat. 🤩

docker build -f docker/$*/Dockerfile . -t us.gcr.io/vcm-ml/$*:$(VERSION)

build_image_prognostic_run: wheels

build_images: build_image_fv3net build_image_prognostic_run

push_image:
docker push us.gcr.io/vcm-ml/fv3net:$(VERSION)
docker push us.gcr.io/vcm-ml/prognostic_run:$(VERSION)

enter: build_image
docker run -it -v $(shell pwd):/code \
@@ -33,9 +48,6 @@ enter: build_image
# -e GOOGLE_APPLICATION_CREDENTIALS=/google_creds.json \
# -v $(HOME)/.config/gcloud/application_default_credentials.json:/google_creds.json \

push_image: build_image
docker push $(GCR_IMAGE)


## Make Dataset
.PHONY: data update_submodules create_environment overwrite_baseline_images
13 changes: 13 additions & 0 deletions README.md
@@ -72,6 +72,18 @@ The main data processing pipelines for this project currently utilize Google Clo
Dataflow and Kubernetes with Docker images. Run scripts to deploy these workflows
along with information can be found under the `workflows` directory.

## Building the fv3net docker images

The workflows use a pair of common images:

|Image| Description|
|-----|------------|
| `us.gcr.io/vcm-ml/prognostic_run` | fv3gfs-python with minimal fv3net and vcm installed |
| `us.gcr.io/vcm-ml/fv3net` | fv3net image with all dependencies including plotting |

These images can be built and pushed to GCR using `make build_images` and
`make push_image`, respectively.
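
As a rough illustration of how the `build_image_%` pattern rule in the Makefile turns a target name into a build command, the sketch below mimics Make's stem substitution (`$*`) in Python. The helper `docker_build_command` is hypothetical and exists only for this example; the registry prefix and version are copied from the Makefile in this PR.

```python
REGISTRY = "us.gcr.io/vcm-ml"  # registry prefix used in the Makefile
VERSION = "0.1.0"  # mirrors VERSION in the Makefile


def docker_build_command(target: str, version: str = VERSION) -> str:
    """Return the command that `make <target>` would run.

    Hypothetical helper for illustration only: it mimics how Make binds
    the pattern stem ($*) in the `build_image_%` rule.
    """
    prefix = "build_image_"
    # A pattern rule's % must match a non-empty string.
    if not target.startswith(prefix) or target == prefix:
        raise ValueError(f"{target!r} does not match build_image_%")
    stem = target[len(prefix):]  # what Make exposes as $*
    return (
        f"docker build -f docker/{stem}/Dockerfile . "
        f"-t {REGISTRY}/{stem}:{version}"
    )


for name in ("build_image_fv3net", "build_image_prognostic_run"):
    print(docker_build_command(name))
```

Under these assumptions the tag comes out as `0.1.0`, whereas the workflow files in this PR reference `prognostic_run:v0.1.0`, so it may be worth checking which form is intended.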

## Dataflow

Dataflow jobs run in a "serverless" style where data is piped between workers who
@@ -117,6 +129,7 @@ If you get an error `Could not create workflow; user does not have write access
trying to submit the dataflow job, do `gcloud auth application-default login` first and then retry.



## Deploying on k8s with fv3net

Docker images with the python-wrapped model and fv3run are available from the
30 changes: 0 additions & 30 deletions docker/Dockerfile.kubernetes

This file was deleted.

22 changes: 0 additions & 22 deletions docker/download_inputdata.sh

This file was deleted.

10 changes: 3 additions & 7 deletions Dockerfile → docker/fv3net/Dockerfile
@@ -9,7 +9,6 @@ ENV PROJECT_NAME=fv3net
USER root
RUN apt-get update && apt-get install -y gfortran
ADD environment.yml $FV3NET/
ADD Makefile $FV3NET/
ADD .environment-scripts $ENVIRONMENT_SCRIPTS
RUN fix-permissions $FV3NET
WORKDIR $FV3NET
@@ -20,21 +19,18 @@ ENV PATH=/opt/conda/envs/fv3net/bin:$PATH
RUN bash $ENVIRONMENT_SCRIPTS/build_environment.sh $PROJECT_NAME
RUN jupyter labextension install @pyviz/jupyterlab_pyviz

# Add rest of fv3net directory
USER root
ADD . $FV3NET
# install gcloud sdk
RUN cd / && \
curl https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-284.0.0-linux-x86_64.tar.gz |\
tar xz
ENV PATH=/google-cloud-sdk/bin:${PATH}
#RUN /google-cloud-sdk/bin/gcloud init

# Add rest of fv3net directory
ADD . $FV3NET

RUN fix-permissions $FV3NET
USER $NB_UID

# RUN gcloud init

# setup the local python packages

RUN bash $ENVIRONMENT_SCRIPTS/install_local_packages.sh $PROJECT_NAME
18 changes: 0 additions & 18 deletions docker/install_gcloud.sh

This file was deleted.

8 changes: 8 additions & 0 deletions docker/prognostic_run/Dockerfile
@@ -0,0 +1,8 @@
FROM us.gcr.io/vcm-ml/fv3gfs-python:v0.2.1


COPY docker/prognostic_run/requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt
COPY fv3net-0.1.0-py3-none-any.whl /wheels/fv3net-0.1.0-py3-none-any.whl
COPY vcm-0.1.0-py3-none-any.whl /wheels/vcm-0.1.0-py3-none-any.whl
RUN pip3 install --no-deps /wheels/fv3net-0.1.0-py3-none-any.whl && pip3 install /wheels/vcm-0.1.0-py3-none-any.whl
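
The wheel filenames copied into this image hard-code `0.1.0`, so they presumably need to stay in step with `VERSION` in the Makefile and `__version__` in `fv3net/__init__.py`. The sketch below shows one way to derive the expected filename from a package name and version. The function `wheel_filename` is a hypothetical helper, and the `py3-none-any` tag is assumed from the filenames above rather than computed.

```python
import re


def wheel_filename(name: str, version: str) -> str:
    """Build the filename of a pure-Python, Python-3-only wheel.

    Hypothetical helper: the `py3-none-any` tag is an assumption taken
    from the filenames in the Dockerfile, not derived from the package.
    """
    # The wheel spec escapes runs of characters other than letters,
    # digits, "_" and "." in the project name with a single underscore.
    escaped = re.sub(r"[^\w\d.]+", "_", name)
    return f"{escaped}-{version}-py3-none-any.whl"


VERSION = "0.1.0"  # mirrors VERSION in the Makefile
for package in ("fv3net", "vcm"):
    print(wheel_filename(package, VERSION))
```

If the version is bumped, the `COPY` and `pip3 install` lines in this Dockerfile would need the same update unless the filename is passed in some other way.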
5 changes: 5 additions & 0 deletions docker/prognostic_run/requirements.txt
@@ -0,0 +1,5 @@
scikit-learn==0.22.1
dask
joblib
zarr
scikit-image
1 change: 1 addition & 0 deletions environment.yml
@@ -18,6 +18,7 @@ dependencies:
- h5netcdf
- h5py>=2.10
- hypothesis
- pandas=1.0.1
- intake
- intake-xarray
- metpy
2 changes: 2 additions & 0 deletions fv3net/__init__.py
@@ -2,3 +2,5 @@

TOP_LEVEL_DIR = pathlib.Path(__file__).parent.parent.absolute()
COARSENED_DIAGS_ZARR_NAME = "gfsphysics_15min_coarse.zarr"

__version__ = "0.1.0"
3 changes: 3 additions & 0 deletions fv3net/runtime/__init__.py
@@ -0,0 +1,3 @@
from . import sklearn_interface as sklearn
from .state_io import init_writers, append_to_writers, CF_TO_RESTART_MAP
from .config import get_runfile_config, get_namelist
@@ -10,7 +10,7 @@ class dotdict(dict):
__delattr__ = dict.__delitem__


-def get_config():
+def get_runfile_config():
Contributor


What do you mean by a runfile config? Should we formalize this somehow? I see the value of the concept, but given what this function does it seems like get_sklearn_config or something would be a better name.

Contributor Author


I agree with you, and have removed this function in #193. I think I will merge this as is though, just to get the ball rolling.

with open("fv3config.yml") as f:
config = yaml.safe_load(f)
return dotdict(config["scikit_learn"])
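
To make the behaviour under discussion concrete, here is a minimal sketch of what `get_runfile_config` hands back: the `scikit_learn` section of the config wrapped in a `dotdict`, so that keys read as attributes. Only the `__delattr__` line of `dotdict` is visible in this diff; the `__getattr__` and `__setattr__` lines are assumed from the common idiom, and the bucket path is a placeholder. The YAML loading step is replaced by a plain dict so the example has no third-party dependencies.

```python
class dotdict(dict):
    """dict whose keys can also be accessed as attributes."""

    __getattr__ = dict.get  # assumed; missing keys yield None
    __setattr__ = dict.__setitem__  # assumed
    __delattr__ = dict.__delitem__  # shown in the diff


# Stand-in for the parsed contents of fv3config.yml (placeholder values).
config = {
    "scikit_learn": {
        "model": "gs://example-bucket/sklearn_model.pkl",
        "zarr_output": "diags.zarr",
    }
}

args = dotdict(config["scikit_learn"])
print(args.model)        # attribute-style access, as in sklearn_runfile.py
print(args.zarr_output)
print(args.missing)      # no such key, so dict.get returns None
```

Because the function only reads the `scikit_learn` section, a name tied to that section, as the review comment suggests, would describe it more directly.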
@@ -3,10 +3,12 @@
from sklearn.externals import joblib
from sklearn.utils import parallel_backend

-import state_io
+from . import state_io

__all__ = ["open_model", "predict", "update"]

def open_sklearn_model(url):

def open_model(url):
# Load the model
with fsspec.open(url, "rb") as f:
return joblib.load(f)
@@ -30,17 +32,3 @@ def update(model, state, dt):
)

return state_io.rename_to_orig(updated), state_io.rename_to_orig(tend)


if __name__ == "__main__":
import sys

state_path = sys.argv[1]
model = open_sklearn_model(sys.argv[2])

with open(state_path, "rb") as f:
data = state_io.load(f)

tile = data[0]
preds = update(model, tile, dt=1)
print(preds)
2 changes: 1 addition & 1 deletion workflows/end_to_end/example-workflow-config.yaml
@@ -84,4 +84,4 @@ experiment:
extra_args:
prognostic_yaml_adjust: workflows/prognostic_c48_run/prognostic_config.yml
ic_timestep: "20160801.001500"
-docker_image: us.gcr.io/vcm-ml/prognostic-run-orchestration
+docker_image: us.gcr.io/vcm-ml/prognostic_run:v0.1.0
10 changes: 0 additions & 10 deletions workflows/prognostic_c48_run/Dockerfile

This file was deleted.

25 changes: 7 additions & 18 deletions workflows/prognostic_c48_run/Makefile
@@ -1,4 +1,4 @@
-IMAGE=test-image
+IMAGE = us.gcr.io/vcm-ml/prognostic_run:v0.1.0
KEY_ARGS= -v $(GOOGLE_APPLICATION_CREDENTIALS):/key.json \
-e GOOGLE_APPLICATION_CREDENTIALS=/key.json
LOCAL_DIR_ARGS = -w /code -v $(shell pwd):/code
@@ -7,26 +7,10 @@ RUN_ARGS = --rm $(KEY_ARGS) $(LOCAL_DIR_ARGS) $(IMAGE)
RUN_INTERACTIVE = docker run -ti $(RUN_ARGS)
RUN ?= docker run $(RUN_ARGS)
SKLEARN_MODEL = gs://vcm-ml-data/test-annak/ml-pipeline-output/2020-01-17_rf_40d_run.pkl
FV3CONFIG = fv3config.yml
FV3NET_VERSION ?=2020-01-23-prognostic-rf
FV3CONFIG = gs://vcm-ml-data/end-to-end-experiments/2020-02-26-physics-off/annak-prognostic-physics-off-1773255e/prognostic_run_prognostic_yaml_adjust_prognostic_config.yml_ic_timestep_20160801.001500_docker_image_prognostic-run-orchestration/job_config/fv3config.yml

all: sklearn_run

fv3net-0.1.0-py3-none-any.whl:
pip wheel --no-deps git+ssh://git@github.com/VulcanClimateModeling/fv3net.git@$(FV3NET_VERSION)

build: fv3net-0.1.0-py3-none-any.whl
docker build . -t $(IMAGE)

fv3net-local:
pip wheel --no-deps ../../.

vcm-local:
pip wheel --no-deps ../../external/vcm

build_local: fv3net-local vcm-local
docker build . -t $(IMAGE)

dev:
$(RUN_INTERACTIVE) bash

@@ -36,9 +20,14 @@ test_run_sklearn: state.pkl
state.pkl:
fv3run --dockerimage test-image --runfile save_state_runfile.py $(FV3CONFIG) save_state/
cp save_state/state.pkl .

sklearn_run_local: #rundir
fv3run --dockerimage $(IMAGE) --runfile sklearn_runfile.py $(FV3CONFIG) rundir

sklearn_run: #rundir
fv3run --dockerimage us.gcr.io/vcm-ml/prognostic-run-orchestration --runfile sklearn_runfile.py $(FV3CONFIG) ../../scratch/rundir

clean:
rm -rf net_precip net_heating/ PW

.PHONY: fv3net vcm build dev sklearn_run
3 changes: 1 addition & 2 deletions workflows/prognostic_c48_run/fv3config.yml
@@ -1,5 +1,4 @@
scikit_learn:
model: gs://vcm-ml-data/test-annak/ml-pipeline-output/2020-01-17_rf_40d_run.pkl
scikit_learn: model:gs://vcm-ml-data/end-to-end-experiments/2020-02-26-physics-off/annak-prognostic-physics-off/train_sklearn_model_train-config-file_example_base_rf_training_config.yml_delete-local-results-after-upload_False/sklearn_model.pkl
zarr_output: diags.zarr
data_table: default
diag_table: gs://vcm-ml-data/2020-01-15-noahb-exploration/2hr_strong_dampingone_step_config/C48/20160805.000000/diag_table
18 changes: 8 additions & 10 deletions workflows/prognostic_c48_run/sklearn_runfile.py
@@ -3,18 +3,16 @@
import zarr

import fv3gfs
import sklearn_interface
import state_io
from fv3gfs._wrapper import get_time
from fv3net import runtime
from mpi4py import MPI
import config

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

SPHUM = "specific_humidity"
DELP = "pressure_thickness_of_atmospheric_layer"
-VARIABLES = list(state_io.CF_TO_RESTART_MAP) + [DELP]
+VARIABLES = list(runtime.CF_TO_RESTART_MAP) + [DELP]

cp = 1004
gravity = 9.81
@@ -32,8 +30,8 @@ def compute_diagnostics(state, diags):
)


-args = config.get_config()
-NML = config.get_namelist()
+args = runtime.get_runfile_config()
+NML = runtime.get_namelist()
TIMESTEP = NML["coupler_nml"]["dt_atmos"]

times = []
@@ -55,7 +53,7 @@ def compute_diagnostics(state, diags):

if rank == 0:
logger.info("Downloading Sklearn Model")
-MODEL = sklearn_interface.open_sklearn_model(args.model)
+MODEL = runtime.sklearn.open_model(args.model)
logger.info("Model downloaded")
else:
MODEL = None
@@ -81,7 +79,7 @@ def compute_diagnostics(state, diags):

if rank == 0:
logger.debug("Computing RF updated variables")
-preds, diags = sklearn_interface.update(MODEL, state, dt=TIMESTEP)
+preds, diags = runtime.sklearn.update(MODEL, state, dt=TIMESTEP)
if rank == 0:
logger.debug("Setting Fortran State")
fv3gfs.set_state(preds)
@@ -91,8 +89,8 @@ def compute_diagnostics(state, diags):
diagnostics = compute_diagnostics(state, diags)

if i == 0:
-writers = state_io.init_writers(GROUP, comm, diagnostics)
-state_io.append_to_writers(writers, diagnostics)
+writers = runtime.init_writers(GROUP, comm, diagnostics)
+runtime.append_to_writers(writers, diagnostics)

times.append(get_time())
