Multiprocess Fails Accessing Sharrow Cache #876
Comments
Is it somehow recompiling? The trace shows it's calling sharrow flow rewrite(). It would make sense if multiple processes are rewriting the same file and they clash.
Tried mp with sharrow on for the SANDAG ABM3 model as well, using 4 cores. It also crashed with the same error in vehicle allocation. I agree that the problem seems to be sharrow recompiling after missing one of the vehicle type alternatives. Interestingly, I do not see sharrow recompiling the vehicle_allocation model when running in single process... Log files are attached below:
It looks like you are maybe setting the We can tell it's not a data-type problem because two processes are competing to compile the same "flow", which has a unique hash that ensures the data types are actually the same (if they were different it wouldn't crash, as there would be different files with different hashes; it would just be slow). You can fix this by using the same explicit absolute-path cache location for both model runs, or just by activating the persistent sharrow cache, as I did here in the example exercise script. This moves the sharrow cache out of a run-specific directory and into a common run-agnostic directory, so it gets re-used in cases like this.
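To make the hash argument above concrete, here is a minimal sketch. It is not sharrow's actual hashing scheme: `flow_cache_name`, the expressions, and the column names are all hypothetical. It only illustrates why identical definitions and dtypes lead two workers to the same cache file, while a differing dtype leads to a separate file.

```python
import hashlib


def flow_cache_name(expressions, dtypes):
    """Hypothetical cache key: a hash of the expressions plus the input dtypes.

    Assumption for illustration only; sharrow's real key is computed differently.
    """
    payload = repr((tuple(expressions), tuple(sorted(dtypes.items()))))
    return "flow_" + hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Made-up expressions standing in for a model specification.
exprs = ["veh_type == alt_type", "age * coef_age"]

# Two worker processes given identical definitions and dtypes...
worker_a = flow_cache_name(exprs, {"veh_type": "int8", "age": "float64"})
worker_b = flow_cache_name(exprs, {"veh_type": "int8", "age": "float64"})

# ...versus a worker that sees a different dtype for one input.
worker_c = flow_cache_name(exprs, {"veh_type": "category", "age": "float64"})

print(worker_a == worker_b)  # same name, so both workers target the same file
print(worker_a == worker_c)  # different name, so a separate file gets compiled
```

In this picture, a clash can only happen between workers that agree on the key, which is the case where both try to write the same file at once.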
I got it in trip destination as well
@JoeJimFlood are you running multiprocessing with a small(ish) household sample size? For the fuel and body type variables discussed higher up this thread, they were getting created as categorical within a preprocessor, and so did not have a stable dtype. Your issue with
I am closing this, as the original bug in this issue has been addressed by changing the data type of variables in the vehicle allocation pre-processor. @JoeJimFlood, if you are still encountering problems you can reopen or create a new issue with additional details.
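For anyone landing here later, a rough sketch of why a categorical created inside a preprocessor can be unstable across subprocesses with a small sample. This is plain Python, not ActivitySim or pandas code, and the body type names and helper functions are hypothetical. In pandas terms the contrast is roughly between letting `astype("category")` infer categories from the data and declaring them up front with an explicit `CategoricalDtype`.

```python
def inferred_categories(values):
    """Categories inferred from whatever values this slice happens to contain."""
    return tuple(sorted(set(values)))


def encode(values, categories):
    """Map each value to its integer position in the category list."""
    index = {category: i for i, category in enumerate(categories)}
    return [index[value] for value in values]


# Hypothetical full list of body types, declared once for every worker.
ALL_BODY_TYPES = ("Car", "SUV", "Truck", "Van")

# Each worker only sees its own slice of a small household sample.
slice_for_worker_1 = ["Car", "SUV", "Truck"]
slice_for_worker_2 = ["Car", "Truck"]  # no "SUV" households in this slice

inferred_1 = inferred_categories(slice_for_worker_1)
inferred_2 = inferred_categories(slice_for_worker_2)

# Inferred categories differ per worker, so "Truck" gets a different code.
print(encode(["Truck"], inferred_1), encode(["Truck"], inferred_2))

# Declared categories are identical everywhere, so the code is stable.
print(encode(["Truck"], ALL_BODY_TYPES))
```

With inferred categories, the set a worker ends up with depends on which households it was handed, which fits the observation that the problem showed up in multiprocess runs but not in single process.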
Describe the bug
Model run is crashing with a permissions error trying to access the sharrow flow cache in the vehicle allocation model.
I was successfully able to complete the run with 4, 8, 12, and 20 cores, but 16 and 24 cores failed. (Machine has 24 cores total.) I suspect the two runs that failed just got unlucky with the different processes trying to access the same file at the same time. I do not know why this has happened on the vehicle allocation model both times though.
Potential fix: Put in a wait statement if access to the sharrow cache is denied?
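A minimal sketch of what such a wait could look like, assuming the failure surfaces as a Python `PermissionError`. The helper `retry_on_permission_error` and the simulated `flaky_cache_read` are hypothetical and not part of sharrow or ActivitySim; where a wrapper like this would actually need to go in the cache access code is not something I have checked.

```python
import time


def retry_on_permission_error(func, attempts=5, initial_delay=0.1, backoff=2.0):
    """Call func(), waiting and retrying if it raises PermissionError.

    The delay grows by `backoff` after each failed attempt; the last
    failure is re-raised so the caller still sees the original error.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except PermissionError:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= backoff


# Simulated cache read that is denied twice before succeeding.
calls = {"count": 0}


def flaky_cache_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise PermissionError("cache file is in use by another process")
    return "flow loaded"


result = retry_on_permission_error(flaky_cache_read, initial_delay=0.01)
print(result, "after", calls["count"], "attempts")
```

This would only paper over the collision rather than stop the processes from compiling the same thing at the same time, so it is a workaround at best.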
To Reproduce
Steps to reproduce the behavior:
Run the MTC example model with sharrow on and multiprocessing enabled with many cores.
Screenshots
![image](https://private-user-images.githubusercontent.com/51132108/347057674-ff099dd8-011b-48c9-aa0c-248f75adbd0b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzNDIxMTUsIm5iZiI6MTczOTM0MTgxNSwicGF0aCI6Ii81MTEzMjEwOC8zNDcwNTc2NzQtZmYwOTlkZDgtMDExYi00OGM5LWFhMGMtMjQ4Zjc1YWRiZDBiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDA2MzAxNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWE5YjgxMWJkZTU1YzYyMGExMjUxNzRkNDQ4NDU3ZWRiMjEyNWJjMjVmMDJhMzc5ZWQwZWMxZmE2OGFmZGQxMDgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.xHH5CoFIlO3uZsixTJoH6oBeLlgsFaQ1pyn_PwA3gv0)
crash_logs.zip
Additional context
I will try to re-run 16 and 24 and see if I hit the same error again.