Fix issue #97 (overwriting data storage folders with joblib) #99

Merged: 2 commits merged on Nov 15, 2021

timcallow
Contributor

This PR fixes issue #97. The folder that joblib writes large arrays to (and later reads them from) is now created with a random name, so that when multiple jobs are submitted from the same directory, the data written by one job does not overwrite the data of another.
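
A minimal, stdlib-only sketch of the failure mode and the fix, assuming each job writes its data to a file inside the storage folder. The helpers `write_job_data` and `read_job_data`, the fixed folder name `joblib_memmap` and the file name `data.pkl` are hypothetical; `pickle` stands in for joblib's array dumping, and the two "jobs" run one after the other rather than truly concurrently.

```python
import os
import pickle
import random
import string
import tempfile


def write_job_data(folder, data):
    """Hypothetical stand-in for a job dumping its arrays into `folder`."""
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "data.pkl"), "wb") as f:
        pickle.dump(data, f)


def read_job_data(folder):
    """Hypothetical stand-in for a job reading its arrays back."""
    with open(os.path.join(folder, "data.pkl"), "rb") as f:
        return pickle.load(f)


def random_folder_name(k=30):
    # Same naming scheme as the PR: k characters from A-Z and 0-9.
    return "".join(random.choices(string.ascii_uppercase + string.digits, k=k))


with tempfile.TemporaryDirectory() as workdir:
    # Before the fix: both jobs use the same fixed folder name.
    shared = os.path.join(workdir, "joblib_memmap")
    write_job_data(shared, [1, 2, 3])  # job A writes its data
    write_job_data(shared, [9, 9, 9])  # job B silently overwrites it
    job_a_fixed = read_job_data(shared)

    # After the fix: each job gets its own randomly named folder.
    folder_a = os.path.join(workdir, random_folder_name())
    folder_b = os.path.join(workdir, random_folder_name())
    write_job_data(folder_a, [1, 2, 3])
    write_job_data(folder_b, [9, 9, 9])
    job_a_random = read_job_data(folder_a)

print(job_a_fixed)   # [9, 9, 9]: job A now sees job B's data
print(job_a_random)  # [1, 2, 3]: job A's data is intact
```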

Comment on lines 159 to 161
```python
joblib_folder = "".join(
    random.choices(string.ascii_uppercase + string.digits, k=30)
)
```
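
A rough estimate of how unlikely a name clash is, assuming names are drawn uniformly at random from the 36 characters A-Z and 0-9 with length 30, as in the snippet above. It uses the standard birthday-problem formula; `collision_probability`, `approx_collision_probability` and the job count of 1000 are illustrative assumptions, not part of the PR.

```python
import math
import string

ALPHABET_SIZE = len(string.ascii_uppercase + string.digits)  # 36 characters
NAME_LENGTH = 30                                              # k=30 in the snippet
N_NAMES = ALPHABET_SIZE ** NAME_LENGTH                        # exact integer, ~4.9e46


def collision_probability(n_jobs):
    """Probability that at least two of `n_jobs` random names coincide:
    1 - prod_{i < n_jobs} (1 - i / N_NAMES).

    Each factor is so close to 1 that it would round to exactly 1.0 in
    floating point, so the product is accumulated in log space using
    log1p/expm1.
    """
    log_no_clash = sum(math.log1p(-i / N_NAMES) for i in range(n_jobs))
    return -math.expm1(log_no_clash)


def approx_collision_probability(n_jobs):
    """Small-probability approximation n(n - 1) / (2N)."""
    return n_jobs * (n_jobs - 1) / (2 * N_NAMES)


print(collision_probability(1000))
print(approx_collision_probability(1000))
```

For 1000 simultaneous jobs both values are on the order of 1e-41, which is why a clash is described as very unlikely; the remaining concern is that, if one did occur, nothing would detect it.
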
Contributor

Although it is very unlikely, if a folder named joblib_folder already exists, it will still be (silently) overwritten. I would suggest:

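```python
import os
import random
import string

# Retry until os.mkdir succeeds: mkdir raises FileExistsError if a folder
# with that name already exists, so an existing folder is never reused.
while True:
    try:
        joblib_folder = "".join(random.choices(string.ascii_uppercase + string.digits, k=30))
        os.mkdir(joblib_folder)
        break
    except FileExistsError as e:
        print(e)
```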
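
A possible alternative that was not part of this PR: the standard library's `tempfile.mkdtemp` already implements a create-or-retry loop like the one suggested above. It picks a random name, creates the directory atomically (readable only by the creating user), and tries another name if that one is taken. The helper `make_joblib_folder`, the `joblib_` prefix and the cleanup with `shutil.rmtree` are assumptions for illustration; the thread does not say how or whether the folder is removed.

```python
import os
import shutil
import tempfile


def make_joblib_folder(parent="."):
    """Hypothetical helper: create a fresh, uniquely named folder under
    `parent` (by default the directory the job runs in) and return its path.

    tempfile.mkdtemp never reuses an existing directory; on a name clash
    it simply tries another random name.
    """
    return tempfile.mkdtemp(prefix="joblib_", dir=parent)


folder_a = make_joblib_folder()
folder_b = make_joblib_folder()
try:
    # Each job would dump and load its large arrays inside its own folder.
    print(folder_a != folder_b)  # True
finally:
    # Remove the folders once the jobs no longer need the data.
    shutil.rmtree(folder_a)
    shutil.rmtree(folder_b)
```
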

Contributor Author

I've made the suggested change.

@timcallow timcallow merged commit 3d2856e into atomec-project:develop Nov 15, 2021
@timcallow timcallow deleted the fix_joblib_datastorage branch November 15, 2021 14:30
Linked issue: Parallelization fails for jobs running simultaneously from same folder
2 participants