collect_results! problem on AWS external drive #409

Closed
denfc opened this issue Mar 20, 2024 · 9 comments
Labels
data Related to data management help wanted The developers need some help here! running-listing Functionality for running and listing simulation runs saving-files Functionality for saving files

Comments

@denfc

denfc commented Mar 20, 2024

I love Dr Watson and have been using it pretty much every day since I found it last year. Ready to scale up some code I've been running on AWS, I left the code where it is but --- only! --- changed the output directory to an AWS S3 account (because the final output will overflow my 55GB regular account) by replacing the first line below (which was working perfectly) with the second:

const bsonFileDir::String = datadir("bsonOut")
changed to
const bsonFileDir::String = "/mnt/illData/bsonOut"

I ran the code and was pleased to see the 222 output files arrive successfully in /mnt/illData/bsonOut via, e.g., wsave(joinpath(bsonFileDir, bsonName), resultD), but

collect_results!(bsonFileDir, update = true, black_list = [:Class])

just hung, unable to read the files that the same script had put there seconds before. A quick search of the web suggested that for an external drive one can use @load for an individual file, but that doesn't get collect_results! working. Interestingly, readdir works, so Julia has no problem seeing the files.

I'm running Julia Version 1.10.2 (2024-03-01) and DrWatson v2.14.1. I could, of course, replicate the project on the S3 drive, and perhaps the problem will go away, but I would prefer to have only the one version sitting where it is. As I'm sure others have used external drives, I'm wondering if this problem is perhaps unique to AWS or is there a work-around or some error that I've missed?

Thanks for any insight that can be provided.

-- denfc

P.S. Running it in a (VSCode) "process" instead of the REPL allowed me to see the error messages:

[ Info: Starting a new result collection...
[ Info: Scanning folder /mnt/illData/bsonOut for result files.
[ Info: Added 222 entries. Updated 0 entries. Deleted 0 entries.

[17083] signal (7.2): Bus error
in expression starting at /home/ubuntu/... Script.jl:54
unsafe_store! at ./pointer.jl:146 [inlined]
unsafe_store! at ./pointer.jl:146 [inlined]
jlunsafe_store! at /home/ubuntu/.julia/packages/JLD2/VWinU/src/JLD2.jl:51 [inlined]
jlunsafe_store! at /home/ubuntu/.julia/packages/JLD2/VWinU/src/misc.jl:15 [inlined]
_write at /home/ubuntu/.julia/packages/JLD2/VWinU/src/mmapio.jl:190 [inlined]
jlwrite at /home/ubuntu/.julia/packages/JLD2/VWinU/src/misc.jl:27 [inlined]
commit at /home/ubuntu/.julia/packages/JLD2/VWinU/src/datatypes.jl:348
h5fieldtype at /home/ubuntu/.julia/packages/JLD2/VWinU/src/data/writing_datatypes.jl:378
h5type at /home/ubuntu/.julia/packages/JLD2/VWinU/src/data/writing_datatypes.jl:384
commit at /home/ubuntu/.julia/packages/JLD2/VWinU/src/data/writing_datatypes.jl:200
commit_compound at /home/ubuntu/.julia/packages/JLD2/VWinU/src/data/writing_datatypes.jl:185
unknown function (ip: 0x7ff1747688f9)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
h5fieldtype at /home/ubuntu/.julia/packages/JLD2/VWinU/src/data/writing_datatypes.jl:105
unknown function (ip: 0x7ff174768d35)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
commit_compound at /home/ubuntu/.julia/packages/JLD2/VWinU/src/data/writing_datatypes.jl:159
unknown function (ip: 0x7ff1747688f9)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
h5type at /home/ubuntu/.julia/packages/JLD2/VWinU/src/data/writing_datatypes.jl:137
unknown function (ip: 0x7ff174765b45)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
h5type at /home/ubuntu/.julia/packages/JLD2/VWinU/src/data/writing_datatypes.jl:142
write_dataset at /home/ubuntu/.julia/packages/JLD2/VWinU/src/datasets.jl:653
#write#110 at /home/ubuntu/.julia/packages/JLD2/VWinU/src/compression.jl:137
write at /home/ubuntu/.julia/packages/JLD2/VWinU/src/compression.jl:125 [inlined]
#write#109 at /home/ubuntu/.julia/packages/JLD2/VWinU/src/compression.jl:121 [inlined]
write at /home/ubuntu/.julia/packages/JLD2/VWinU/src/compression.jl:121
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
#89 at /home/ubuntu/.julia/packages/JLD2/VWinU/src/fileio.jl:14
unknown function (ip: 0x7ff174760a15)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
#jldopen#69 at /home/ubuntu/.julia/packages/JLD2/VWinU/src/loadsave.jl:4
jldopen at /home/ubuntu/.julia/packages/JLD2/VWinU/src/loadsave.jl:1 [inlined]
#fileio_save#88 at /home/ubuntu/.julia/packages/JLD2/VWinU/src/fileio.jl:6 [inlined]
fileio_save at /home/ubuntu/.julia/packages/JLD2/VWinU/src/fileio.jl:5
unknown function (ip: 0x7ff174760319)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
#action#33 at /home/ubuntu/.julia/packages/FileIO/xOKyx/src/loadsave.jl:219
action at /home/ubuntu/.julia/packages/FileIO/xOKyx/src/loadsave.jl:196 [inlined]
#action#32 at /home/ubuntu/.julia/packages/FileIO/xOKyx/src/loadsave.jl:185 [inlined]
action at /home/ubuntu/.julia/packages/FileIO/xOKyx/src/loadsave.jl:185 [inlined]
#save#20 at /home/ubuntu/.julia/packages/FileIO/xOKyx/src/loadsave.jl:129
save at /home/ubuntu/.julia/packages/FileIO/xOKyx/src/loadsave.jl:125 [inlined]
#_wsave#34 at /home/ubuntu/.julia/packages/DrWatson/rXaRB/src/DrWatson.jl:33 [inlined]
_wsave at /home/ubuntu/.julia/packages/DrWatson/rXaRB/src/DrWatson.jl:33 [inlined]
#wsave#35 at /home/ubuntu/.julia/packages/DrWatson/rXaRB/src/DrWatson.jl:44 [inlined]
wsave at /home/ubuntu/.julia/packages/DrWatson/rXaRB/src/DrWatson.jl:42 [inlined]
#collect_results!#89 at /home/ubuntu/.julia/packages/DrWatson/rXaRB/src/result_collection.jl:200
collect_results! at /home/ubuntu/.julia/packages/DrWatson/rXaRB/src/result_collection.jl:84
#collect_results!#88 at /home/ubuntu/.julia/packages/DrWatson/rXaRB/src/result_collection.jl:74
collect_results! at /home/ubuntu/.julia/packages/DrWatson/rXaRB/src/result_collection.jl:74
unknown function (ip: 0x7ff174733079)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
include_string at ./loading.jl:2076
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
_include at ./loading.jl:2136
include at ./Base.jl:495
jfptr_include_46403.1 at /home/ubuntu/julia-1.10.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
exec_options at ./client.jl:318
_start at ./client.jl:552
jfptr__start_82738.1 at /home/ubuntu/julia-1.10.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x7ff175a29d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 419945137 (Pool: 419904699; Big: 40438); GC: 147

  • Terminal will be reused by tasks, press any key to close it.
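
Read from the bottom up, the trace shows the bus error being raised inside `wsave`, called from `collect_results!` (`result_collection.jl:200`), while JLD2 writes through its memory-mapped backend (`mmapio.jl`). This happens after the 222 entries were added, so the crash is in saving the collected results file rather than in reading the BSON files. With a single folder argument, `collect_results!` places that results file next to the scanned folder, which here would also be on the mounted drive. A minimal sketch of one possible workaround, assuming the memory-mapped write onto the mount is the culprit (not confirmed in this thread), is the two-argument form, which keeps the results file on local disk. The local file name is hypothetical.

```julia
using DrWatson, DataFrames   # collect_results! needs DataFrames to be loaded

# Result files stay on the mounted drive, as in the report above.
const bsonFileDir = "/mnt/illData/bsonOut"

# Hypothetical local path for the collected table (inside the project's data folder).
const resultsFile = datadir("results_bsonOut.jld2")

# Two-argument form: the results file first, then the folder to scan.
df = collect_results!(resultsFile, bsonFileDir; update = true, black_list = [:Class])
```

If this runs cleanly, the finished results file can be copied to the mount afterwards.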
@Datseris Datseris added help wanted The developers need some help here! running-listing Functionality for running and listing simulation runs saving-files Functionality for saving files data Related to data management labels Mar 21, 2024
@Datseris
Member

Thanks for raising the issue. I have never used AWS so unfortunately I have no idea how to help here... Hopefully a good soul will see this issue and give us some light!

@denfc
Author

denfc commented Mar 21, 2024

Thanks for the response, George. As noted above, I was able now to add the error messages. I'll start searching for something about them, and I'm going to share them with a couple of people who know more about AWS than I. -- Denis

@JonasIsensee
Member

Can you try passing iotype=IOStream to the loading call?
(Are kws forwarded through collect_results?)
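
For reference, a minimal sketch of what `iotype = IOStream` looks like when calling JLD2 directly, outside `collect_results!`. It asks JLD2 to use plain stream I/O instead of its default memory-mapped I/O. The file path is a temporary stand-in; pointing it at the mounted drive would be the actual experiment, and whether that avoids the bus error is untested here.

```julia
using JLD2

# Hypothetical file location; replace with a path on the mounted drive to test.
path = joinpath(mktempdir(), "example.jld2")

# Write with stream I/O instead of the default memory-mapped I/O.
jldopen(path, "w"; iotype = IOStream) do f
    f["x"] = collect(1:5)
end

# Read the dataset back, again without memory mapping.
x = jldopen(path, "r"; iotype = IOStream) do f
    f["x"]
end
```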

@denfc
Author

denfc commented Mar 21, 2024

Although I may run into the problem again, for now I'm bypassing it. The problem does indeed lie with my use of AWS S3, which, although designed "for virtually any use case", is cheapest, and correspondingly slowest, when used for long-term storage, which is what I primarily have it for. The colleague who originally set me up on AWS and I played with the options for mounting the S3 drive, but we could not make it fast enough to be ready when Julia wants to load files.

Instead, we increased the amount of storage I have in my standard AWS account, which should just allow this first planned scale-up to work. After it does, I'll copy the Dr Watson results file into S3. If the next scale-up requires more space, I'm not sure what I'll do (spend money?!), so it would be great to have a Julia solution.
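
The "write locally, then copy to S3" approach can be sketched with the standard library alone. The helper name `save_then_copy` is hypothetical, and a plain text file stands in for the results file: all writing happens in a local temporary folder, and the slow drive only sees one sequential copy.

```julia
# Hypothetical helper: run `write_fn(localpath)` on local disk,
# then copy the finished file to `dest` in a single step.
function save_then_copy(write_fn, dest::AbstractString)
    mktempdir() do tmp
        localpath = joinpath(tmp, basename(dest))
        write_fn(localpath)                 # all writes happen on local disk
        mkpath(dirname(dest))               # make sure the target folder exists
        cp(localpath, dest; force = true)   # one copy onto the slow drive
    end
    return dest
end

# Usage: a temporary folder stands in for the mounted drive.
dest = joinpath(mktempdir(), "archive", "results.txt")
save_then_copy(dest) do p
    write(p, "collected results")
end
```

With DrWatson, the `do` block would presumably call something like `wsave(p, data)` instead of `write`.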

I'll leave it up to George to comment on @JonasIsensee's suggestion of modifying the collect_results! function to pass iotype = IOStream. Might that work?

@Datseris
Member

The rest of the saving/loading functions have been updated to allow passing arbitrary keywords to the save/load command, but apparently collect_results! has slipped through and does not have such an option. It should be a simple PR that adds another keyword to collect_results! for propagating a named tuple of arguments to the load call (which would allow setting the IO type).
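
A rough sketch of that design, using only the standard library. Everything here is hypothetical: `collect_sketch` and the keyword name `load_kwargs` are illustrations rather than DrWatson's API, and `fake_load` is a stand-in loader that only records the keywords it receives.

```julia
# Hypothetical stand-in for a loader such as `wload`;
# it only records which keywords were forwarded to it.
fake_load(path; kwargs...) = (file = basename(path), kwargs = (; kwargs...))

# Sketch of the proposal: `load_kwargs` is a named tuple that is
# splatted into every load call made while scanning the folder.
function collect_sketch(folder; load_kwargs = NamedTuple(), load = fake_load)
    files = sort(readdir(folder; join = true))
    return [load(f; load_kwargs...) for f in files]
end

# Two empty files stand in for result files.
dir = mktempdir()
touch(joinpath(dir, "a.jld2"))
touch(joinpath(dir, "b.jld2"))

rows = collect_sketch(dir; load_kwargs = (iotype = IOStream,))
```

A named tuple keeps the interface open-ended: any keyword the underlying loader understands can be passed through without the collecting function having to know about it.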

@Datseris
Member

(note: whether specifying IOtype would work or not I have no idea; I have no idea about AWS in general and I am beyond capacity to learn it now...)

@denfc
Author

denfc commented Mar 21, 2024

@Datseris Often I feel that "beyond capacity" could be my name ... .

But for future reference, I did want to note that I'm having no difficulty using jldopen in the REPL to read and work with a jld2 file that I (just) created (via a Pluto notebook) in that same S3 drive by opening and working with a single large hdf5 file also on S3.

Maybe something to do with BSON files? We'll find out one day, but there's no rush.

@denfc
Author

denfc commented Mar 22, 2024

https://juliacloud.github.io/AWSS3.jl/stable/

Haven't played with it yet.

@denfc denfc closed this as not planned Apr 1, 2024
@denfc
Author

denfc commented Apr 1, 2024

I played with the above package for a short time, but I did not get collect_results! to work with S3. I have a couple of ideas, but instead I have managed to work around my problem, so I'm no longer investigating it, and I'm closing the issue.

3 participants