Fixing directory issue of local metadata provider with AWS Batch #141
There is an issue in syncing metadata for the local metadata provider when using AWS Batch.
I am using the local metadata provider, executing on AWS Batch with the S3 datastore. I was able to complete the flow successfully on Batch and was trying to analyze the results in a notebook. But when I access properties of the run, like `run.data`, it throws an error.
I am using https://github.com/valayDave/mnist-experiments-with-metaflow/blob/without_conda_test/hello_mnist.py as the file and running the following command:

`python hello_mnist.py --with batch:cpu=2,memory=4000,image=tensorflow/tensorflow:latest-py3 run --num_training_examples 1000`
The odd thing is that there are two `.metaflow` folders in my directory (`.metaflow` and `.metaflow/.metaflow`), and both of them hold the same flows and runs. When I moved the data under `.metaflow/FLOWNAME/RUN_ID/STEP_NAME/TASK_ID/_meta` and executed my notebook, it worked perfectly fine. I am using version 2.0.2.
I investigated and found that there is a data sync from S3 after each step, which brings metadata back to the client and performs a copy-tree operation.
On Batch, because `METAFLOW_DATASTORE_SYSROOT_LOCAL` is set to `DATASTORE_LOCAL_DIR` (which is `DATASTORE_LOCAL_DIR = '.metaflow'` in `metaflow_config`), the flow-related files are created under `.metaflow/.metaflow`. So when the sync tars and untars the data back on the local client and performs the copy tree, it creates a new `.metaflow` folder on the local machine.
Removing `METAFLOW_DATASTORE_SYSROOT_LOCAL` fixed the problem.
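The doubled path is just the sysroot override joined with the default local directory. A quick sanity check, using the names from `metaflow_config` but with the join itself only illustrative of the observed layout, not Metaflow's exact code path:

```python
import os

DATASTORE_LOCAL_DIR = ".metaflow"                        # metaflow_config default
METAFLOW_DATASTORE_SYSROOT_LOCAL = DATASTORE_LOCAL_DIR   # value seen on Batch

# With the override set to '.metaflow', prefixing it onto the local
# datastore dir yields the nested folder the client does not expect:
nested = os.path.join(METAFLOW_DATASTORE_SYSROOT_LOCAL, DATASTORE_LOCAL_DIR)
print(nested)  # .metaflow/.metaflow

# With the override removed, the root is simply the default:
print(DATASTORE_LOCAL_DIR)  # .metaflow
```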