Fixing directory issue of local metadata provider with AWS Batch #141
There is an issue in syncing metadata for the local metadata provider when using AWS Batch.
I am using the local metadata provider, executing on AWS Batch with the S3 datastore. I was able to complete the flow successfully on Batch and was trying to analyze the results in a notebook. But when I access properties of the run, like `run.data`, it throws an error.
I am using https://github.com/valayDave/mnist-experiments-with-metaflow/blob/without_conda_test/hello_mnist.py as the file and running the following command:

`python hello_mnist.py --with batch:cpu=2,memory=4000,image=tensorflow/tensorflow:latest-py3 run --num_training_examples 1000`
The odd thing is that there are two `.metaflow` folders in my directory (`.metaflow` and `.metaflow/.metaflow`), and both of them hold the same flows and runs. When I moved the data under `.metaflow/FLOWNAME/RUN_ID/STEP_NAME/TASK_ID/_meta` and executed my notebook, it worked perfectly fine. I am using version 2.0.2.
I investigated and found that there is a data sync from S3 after each step, which brings metadata back to the client and performs a copy-tree operation.
On Batch, because `METAFLOW_DATASTORE_SYSROOT_LOCAL` is set to `DATASTORE_LOCAL_DIR` (which is `DATASTORE_LOCAL_DIR = '.metaflow'` in `metaflow_config`), the flow-related files are created under `.metaflow/.metaflow`. So when the sync tars and untars the data back on the local client and performs the copy tree, it creates a new `.metaflow` folder on the local machine.
Removing `METAFLOW_DATASTORE_SYSROOT_LOCAL` fixed the problem.
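The doubled path is just the sysroot override joined with the default local directory. A quick sanity check, using the names from `metaflow_config` but with the join itself only illustrative of the observed layout, not Metaflow's exact code path:

```python
import os

DATASTORE_LOCAL_DIR = ".metaflow"                        # metaflow_config default
METAFLOW_DATASTORE_SYSROOT_LOCAL = DATASTORE_LOCAL_DIR   # value seen on Batch

# With the override set to '.metaflow', prefixing it onto the local
# datastore dir yields the nested folder the client does not expect:
nested = os.path.join(METAFLOW_DATASTORE_SYSROOT_LOCAL, DATASTORE_LOCAL_DIR)
print(nested)  # .metaflow/.metaflow

# With the override removed, the root is simply the default:
print(DATASTORE_LOCAL_DIR)  # .metaflow
```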