Is your feature request related to a problem? Please describe.
When running a UDF that performs some initialization (commonly, downloading and caching a model on disk), multiple workers can attempt that initialization at the same time and end up thrashing each other.
We should provide a mechanism for node-level initialization in a UDF that executes once per node. Alternatively, if the user were given a unique worker ID through daft.context, they could do more intelligent things, such as using a different cache folder for each stateful UDF's initialization.
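To make the second idea concrete, here is a minimal sketch of a per-worker cache folder. It assumes the OS process ID is an acceptable stand-in for the worker ID, since the request above is precisely that Daft does not yet expose one. The function name `per_worker_cache_dir` and the `myudf-cache` folder name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def per_worker_cache_dir(base: str = "myudf-cache") -> Path:
    """Return a cache directory that is private to the current process."""
    # Assumption: os.getpid() stands in for a Daft-provided worker ID,
    # which this issue asks for but which is not used here.
    path = Path(tempfile.gettempdir()) / base / f"worker-{os.getpid()}"
    # exist_ok=True makes repeated calls from the same worker safe.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

The trade-off is that each worker downloads its own copy of the model, so this avoids contention at the cost of extra disk space and bandwidth.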
This is possible already without any Daft-provided functionality. Users can utilize a library such as https://github.com/tox-dev/py-filelock to perform node-level locking.
from filelock import FileLock

class MyUDF:
    def __init__(self):
        # Hold a node-wide file lock so only one worker downloads at a time.
        with FileLock("/tmp/.myudf.lock"):
            download_model_to_disk()
        self.model = load_model()
Documentation will be added in an FAQ section for UDFs.
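A bare lock serializes the workers, but each one still calls the download once it acquires the lock. A possible refinement, sketched here, checks for the finished file inside the lock so the download runs at most once per node. The helper name `ensure_downloaded` and the `download` callback are hypothetical; only `FileLock` comes from py-filelock.

```python
import os
from pathlib import Path
from typing import Callable

from filelock import FileLock


def ensure_downloaded(target: Path, download: Callable[[Path], None]) -> bool:
    """Run download() at most once per node; return True if this caller ran it."""
    # The lock file lives next to the target so all workers on the node share it.
    with FileLock(str(target) + ".lock"):
        if target.exists():
            # Another worker finished the download while we were waiting.
            return False
        # Download to a temporary name, then rename atomically, so a crash
        # mid-download never leaves a partial file at the final path.
        tmp = target.with_name(target.name + ".tmp")
        download(tmp)
        os.replace(tmp, target)
        return True
```

In a UDF's `__init__`, this would replace the plain `with FileLock(...)` block, with model loading left outside the lock so workers can load in parallel once the file exists.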