DOLfYN: Chunking Data #314

Open
ssolson opened this issue Apr 26, 2024 · 2 comments

Comments

ssolson (Contributor) commented Apr 26, 2024

@jmcvey3 I was talking to a DOLfYN user at Sandia today who is working with a large file and had to write their own chunking script around DOLfYN because the file was larger than their RAM. DOLfYN froze on them, which was a bad user experience.

Would it be both possible and useful to add a chunking feature to DOLfYN?

jmcvey3 (Contributor) commented Apr 27, 2024

Yes, though it might be a separate tool. Let me ping Levi; I believe he ran some tool to split up the massive binary files before even reading them with dolfyn.

lkilcher commented May 8, 2024

I have a vague recollection of trying (maybe succeeding?) to create a script that chunks binary files into smaller binary files, but honestly I don't know where it is if it ever was successful. Also, this seems like a difficult piece of code to maintain, so I wouldn't recommend baking it into DOLfYN.

Most of the binary reader tools allow you to specify a range of pings you want to read. Therefore, you should be able to write a script that loops over ranges of pings. In other words: I don't think I typically chunked the binary files themselves; instead, I chunked the data when reading it into "raw" ncdf files.
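A minimal sketch of that "loop over ranges of pings" idea, in case it helps. The helper names (`ping_ranges`, `convert_in_chunks`) and the two callables are hypothetical and not part of DOLfYN; the reader and writer are passed in so the loop itself does not assume any particular reader signature.

```python
from typing import Any, Callable, Iterator, Tuple


def ping_ranges(n_pings: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) ping ranges that cover 0..n_pings."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_pings, chunk_size):
        # The last range is clipped so it never runs past the end of the file.
        yield start, min(start + chunk_size, n_pings)


def convert_in_chunks(
    read_range: Callable[[int, int], Any],   # hypothetical: reads pings start..stop
    save_chunk: Callable[[Any, int], None],  # hypothetical: writes one chunk to disk
    n_pings: int,
    chunk_size: int,
) -> int:
    """Read and save one ping range at a time; return the number of chunks."""
    count = 0
    for index, (start, stop) in enumerate(ping_ranges(n_pings, chunk_size)):
        chunk = read_range(start, stop)
        save_chunk(chunk, index)
        del chunk  # drop the reference before the next range is read
        count += 1
    return count
```

With DOLfYN, `read_range` would presumably wrap the reader's ping-range argument (I believe this is `nens` on `dolfyn.read`, but treat that as an assumption and check the reader's docs, including whether the stop index is inclusive), and `save_chunk` would write each piece to its own "raw" netCDF file. The total ping count `n_pings` has to come from somewhere else, for example a known sample rate and deployment length.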

If that doesn't work, let me know what kind of file it is, and maybe try killing the process shortly before memory runs out and letting me know what top-level function(s) are running when you do this? That will at least tell us what loop is creating the memory leak.
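One possible way to capture which functions are running without having to time the kill by hand is Python's standard-library `faulthandler`, which can dump the current call stack to a file on a timer. This is only a sketch; the helper names and the log path are hypothetical.

```python
import faulthandler


def start_stack_dump(interval_s: float, log_path: str):
    """Write the call stack of all threads to log_path every interval_s seconds."""
    log = open(log_path, "w")
    # repeat=True keeps dumping until cancelled; the process is not terminated.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=log)
    return log


def stop_stack_dump(log) -> None:
    """Cancel the periodic dump and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log.close()
```

Calling `start_stack_dump(30, "stacks.log")` just before the read should leave a record of the active call stack every 30 seconds, so the last entry written before the process is killed shows roughly where it was stuck.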

Does that help?
