New GUI idea based on H5Py #33
@ErichZimmer is back :) @eguvep - what do you think?
@ErichZimmer can you please take a look at NetCDF files? The xarray project and our sister project pivpy use it (a competitor to HDF5), and xarray provides some great extensions over pandas that make things easy, e.g. averaging along named dimensions.
@alexlib I made a mostly functional GUI built around HDF5 that is parallel-capable through some workarounds for current limitations of the dependencies. So far, the only extra dependency for this GUI is h5py. I'll try other designs to compare their performance, but HDF5 is performing pretty well so far... PS: please ignore spelling errors, as I am low on time and my mobile hotspot won't let me edit previous posts for some reason.
Dear Erich! I am very happy to read that you are back, and it is great to see your immediate productive postings!

Another thing – mentioned in our previous discussion – is compatibility. Our simple and stupid CSV files are the lowest common denominator with almost every other code (like awk or other command-line tools; they are even human-readable), and we follow the UNIX philosophy by using text files. I would strongly vote for a CSV import and export option so as not to destroy this compatibility. Or are there any command-line tools for extracting HDF5 data (I am a novice in HDF5)? Can we be sure that changes in the HDF5 code do not break the GUI? As far as I can see, HDF5 seems to be fairly mature, right?

I had a quick look at the other data formats, @alexlib, and I have worked with NetCDF before (there is a JPIV extension for generating synthetic PIV images based on that format and the SIG project). It is hard to tell – if not impossible – which format is best. HDF5 seems to be slightly more flexible, so it seems possible to put really everything into the files. Everything that is hard to decide can be decided randomly, in my opinion ;-) So let's give HDF5 a try!

Regards! Peter
I think we need to split the two topics. My suggestion is to try pandas with CSV first, and then, if the performance is not sufficient, keep working with pandas plus HDF5. Regarding HDF5 - the files are fast and flexible, and there are tools like HDFView or h5dump that help to inspect their content. NetCDF - the only benefit is the straightforward continuation and connection with pivpy; after all, we probably want the GUI to also cover post-processing: colorful images, vorticity, strain, etc.
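A minimal sketch of that pandas-first approach (file name and column layout are made up; pandas' HDF5 writer needs the optional PyTables package):

```python
import pandas as pd

# plain-text results stay the lowest common denominator
# (hypothetical file name and columns)
df = pd.read_csv('frame_0001.txt', sep='\t',
                 names=['x', 'y', 'u', 'v', 'mask'])

# if CSV I/O turns out too slow, the same dataframe can be mirrored
# to HDF5 (requires the optional 'tables' dependency)
df.to_hdf('session.h5', key='frame_0001', mode='a')
df2 = pd.read_hdf('session.h5', key='frame_0001')
```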
For clarity, the PIV database could even be a separate project. A PIV-database object could provide methods
Great idea. What is the structure of this project? For pivpy we use the xarray Dataset - it's a pandas DataFrame on steroids, with metadata attached to it. I didn't find another solution that gives me the option to average along a "named" dimension and provides the underlying mechanics of all kinds of numerical operators.
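For example, a toy "PIV session" in xarray looks like this (sizes and names are made up):

```python
import numpy as np
import xarray as xr

# u, v velocity fields over time, with metadata attached
ds = xr.Dataset(
    {
        'u': (('t', 'y', 'x'), np.random.rand(100, 32, 32)),
        'v': (('t', 'y', 'x'), np.random.rand(100, 32, 32)),
    },
    attrs={'units': 'pixel/dt'},  # metadata travels with the data
)

# average along a *named* dimension - no axis-number bookkeeping
mean_flow = ds.mean(dim='t')

# the usual numerical operators keep working on the labeled arrays
fluct = ds - mean_flow
```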
Do you know any way of chunking PIV data so the user doesn't have to load too much into memory? The reason I chose an HDF5 format is that at most two complete PIV results (2 frames) are loaded into memory, and the rest is stored on the hard drive. In my case, I analyzed ~3,000 images to get ~3,000 results, of which I only have to load and work on one result at a time in the GUI. Since I am not using chunking, the results can have different sizes for whatever reason (different windowing/overlap). Perhaps NetCDF partnered with xarray would be the way to go, but for now I'll stick with an HDF5 format until I learn more about NetCDF (I like xarray a lot, though, so I'll try). By the way, there is an export page in the GUI to export our results in multiple different ways and file types.
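The one-result-at-a-time pattern looks roughly like this with h5py (a minimal sketch; group and dataset names are made up):

```python
import h5py

# load exactly one result; everything else stays on the hard drive
with h5py.File('session.h5', 'r') as f:
    u = f['frame_0042/u'][...]  # only this dataset is read into RAM
    v = f['frame_0042/v'][...]
```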
The parallel or chunked reading is not from xarray, but from dask.
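A minimal sketch of how that looks in practice (needs dask installed; file name and dimension are hypothetical):

```python
import xarray as xr

# passing `chunks` makes xarray back its arrays with dask,
# so data is read lazily, chunk by chunk
ds = xr.open_dataset('session.nc', chunks={'t': 10})

mean_flow = ds['u'].mean(dim='t')  # builds a task graph, reads nothing yet
result = mean_flow.compute()       # now dask streams the chunks through RAM
```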
Well, this got a little more confusing... But I'll see what I can do, as the GUI is currently made to switch internal formats relatively easily. For HDF5, the GUI is set up like this:

PS: nearly flipped out hitting the "close with comment" button, since that thing is HUGE on my phone :(
Some simple facts: NetCDF4 = HDF5 with some extra limitations and its own API; same performance. Which branch are you on, @ErichZimmer? I'll try to see whether I understand if there is a point in using xarray + netCDF files for it.
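The "NetCDF4 is HDF5 underneath" point is easy to verify (a small sketch; needs the netcdf4 package installed):

```python
import numpy as np
import xarray as xr
import h5py

# write a tiny netCDF4 file...
xr.Dataset({'u': (('y', 'x'), np.zeros((4, 4)))}).to_netcdf(
    'demo.nc', engine='netcdf4')

# ...and open it with plain h5py: it is a valid HDF5 file
with h5py.File('demo.nc', 'r') as f:
    print(list(f.keys()))  # ['u', 'x', 'y'] - data plus netCDF dimension scales
```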
@alexlib,
Hi @ErichZimmer, |
That looks very impressive!
@ErichZimmer looks very nice. We need to figure out how to merge this into the existing one, through an add-in or otherwise, by some coding.
@eguvep Currently, the advanced GUI is not compatible with the add-ins system. However, an option can be selected in the add-ins panel to enable the advanced GUI and all its features. I just have to figure out how the list boxes are going to be coded, as they are completely incompatible with the simple GUI.
Computational speed is important, but only if we find a way to keep the package installation as simple as it is now. We should probably first try numba. We can always create a professional version with a different name and installation instructions, e.g. openpiv-python-pro for the advanced users.
I think we could make it compatible by making the code more modular with the help of the add-in system. In my dreams ;-) every user can compose her or his individual GUI by selecting or deselecting the features they need or do not need.
@alexlib numba works great on the

@eguvep That would be very nice and is a great idea. Some GUIs have good control over which features are needed and which aren't. The advanced GUI is starting to incorporate this in an attempt to make the main code similar enough to the simpler version, but it is hard to combine the two, as there isn't much duplicated code (the only similar function is initializing the widgets). I probably wasn't thinking about the add-ins system until I was done with the main functions.
@eguvep @ErichZimmer please also take a look at the way the GUI for this tracker is arranged. It seems quite simple in terms of an uncluttered environment with multiple options. I think this is the same concept as for napari.
@eguvep @alexlib

Regards,

PS: maybe we can create an executable with an embedded Python interpreter for users who don't want to bother with installing Python. If we do go this route, an executable would have to be made for each operating system. Just stay away from the tools that attempt to transpile Python to C or C++ - they'll make you lose your hair by the end of the day ;)
Should we keep the h5py- or netCDF-based GUIs? Using them allows for a huge number of opportunities before exporting files, but at the cost of complexity and some additional computation.
I agree that one of those would be great. I think the main part here is fast I/O and, if possible, access from outside the GUI, e.g. from a Jupyter notebook - allowing interaction with the data from a post-processing package. I do not mind h5py or netCDF - as long as we interface them in the future, i.e. we will add h5data.to_netcdf() and netcdfdata.to_h5() later on.
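A sketch of what such an interface could look like (the method names above are placeholders; this version uses xarray as the bridge and assumes the per-frame group layout discussed earlier):

```python
import h5py
import xarray as xr

def h5_frame_to_netcdf(h5_path, frame, nc_path):
    """Copy one PIV result from an HDF5 session file to netCDF.
    Assumes 2D datasets named 'u' and 'v' under a per-frame group."""
    with h5py.File(h5_path, 'r') as f:
        g = f[frame]
        ds = xr.Dataset({name: (('y', 'x'), g[name][...])
                         for name in ('u', 'v')})
    ds.to_netcdf(nc_path)  # now readable from pivpy/xarray notebooks

h5_frame_to_netcdf('session.h5', 'frame_0042', 'frame_0042.nc')
```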
From my point of view, one of the main design goals of the GUI is simplicity, so that non-programmers can easily understand and contribute. The add-in system structures the code even more, to make it even more accessible. On the other hand, I see the advantages of an efficient binary file format. Do you really see no way of using h5py within the scope of a plug-in? That would be the most desirable solution, in my opinion.
Additionally, a second-order image dewarping function is being developed, but it is going slowly due to my lack of expertise in mathematics (took too long of a break :P).
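For reference, the core of such a dewarping is usually a least-squares fit of second-order polynomials between matched image and world points. A hedged sketch (not the code under development; names are made up):

```python
import numpy as np

def _design_matrix(pts):
    """All polynomial terms up to second order for points of shape (N, 2)."""
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_second_order_mapping(img_pts, world_pts):
    """Least-squares fit of the image -> world mapping; needs N >= 6 points."""
    coeffs, *_ = np.linalg.lstsq(_design_matrix(img_pts), world_pts,
                                 rcond=None)
    return coeffs  # shape (6, 2): one column per world coordinate

def apply_mapping(coeffs, pts):
    """Map image points to dewarped world coordinates."""
    return _design_matrix(pts) @ coeffs
```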
Great. Where is it? We had another repo by Theo with a similar development - better if we learn from both.
My internet is a little too slow to push the GUIs to my fork, so I'll try again later. The theory is based on the article,
please see https://github.com/TKaeufer/Open_PIV_mapping
Is the repository public?
see my fork - I invited you https://github.com/alexlib/Open_PIV_mapping
That repository is quite different from my attempt, which uses a meshed region of interest and for-loops to find the points. As soon as I get a decent internet connection, I'll hopefully get everything pushed to a fork or repository for everyone to see (including my spaghetti-coding skills :P).
Just did some tests, and your fork/repository is considerably better and more robust than my implementation. I'll see if there are some enhancements/refactorings I can do. Does this repository allow for the calibration of vectors as a post-processing method?
It is Theo's work in progress; he has chosen to work in image space. But it should work on vectors as well.
The image pair is from PIV Challenge 2014 case A (testing micro-PIV). |
To avoid major overhead with shared dictionaries, the files are stored in a temporary folder before being loaded into the GUI and deleted. This makes multiprocessing as fast as the simple GUI and removes the need for a batch size. Is this method alright?
I am not quite sure about the step of saving to npz and then loading into HDF5 - could it maybe be stored in HDF5 directly, to save one conversion or loading/saving step?
h5py doesn't directly support parallel writing, so it's either this weird workaround or the other one based on a shared-memory dictionary that is then loaded into h5py. I am still looking for better options through mpi4py, but so far it hasn't been successful, and it complicates the installation process of the GUI. In my opinion, this issue is one of the few problems with h5py where other approaches (e.g. not using h5py, as in the simple GUI) would be better.
I understand. So there are two options: a) use multiprocessing and RAM - keep all the parallel results in memory; b) have every worker store its result in a separate temporary file and then combine them.
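A minimal sketch of option (b) with a single-writer combine step (names are made up, and the random array stands in for a real PIV pass):

```python
import os
import tempfile
import multiprocessing as mp

import h5py
import numpy as np

def worker(args):
    """Each worker writes its own temporary file - no shared state."""
    frame, tmpdir = args
    u = np.random.rand(64, 64)  # stand-in for a real PIV analysis
    np.save(os.path.join(tmpdir, f'frame_{frame:04d}.npy'), u)
    return frame

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmpdir:
        with mp.Pool() as pool:
            frames = pool.map(worker, [(i, tmpdir) for i in range(8)])
        # a single writer combines everything into one HDF5 session file
        with h5py.File('session.h5', 'w') as f:
            for i in frames:
                path = os.path.join(tmpdir, f'frame_{i:04d}.npy')
                f.create_dataset(f'frame_{i:04d}/u', data=np.load(path))
                os.remove(path)  # free temporary disk space as we go
```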
Take a look at Zarr - can it help? It seems to have some solution, and it's pip-installable.
It would be great to incorporate the script into something like OpenPIV.tools or its own calibration module, as some cameras (e.g. my Raspberry Pi-controlled 1 MP global-shutter sensor) have quite a fisheye distortion, which messes up the measurements.
The subpixel function works for the original script, so I'll simply use the original script by Theo.
Good idea. Please move the discussion to the openpiv-python repo issues.
Zarr is creating a file for each frame, so I'll have to figure out what I'm doing wrong here. It does allow multiprocessing though ;)
Using npy files wasn't a smart decision. They save and load fast, but the individual files can reach 3 MB for 50,000 vectors. For large sessions, this uses up quite a bit of space before it is deleted. Zarr is still making a bunch of files and, in a way, acts like the temporary npy files. I'll try mpi4py again for the built-in parallel support in h5py.
Using a batch system similar to the shared-memory dictionary system, the results can be processed in parallel and loaded in serial. If we use this system, then Zarr might be a good storage format, as it operates in a very similar fashion, with multiple linked files.
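A small sketch of that idea with Zarr's v2 API (shapes are made up). With one chunk per frame, the "bunch of files" maps one-to-one onto frames, and whole-chunk writes from separate workers don't conflict:

```python
import numpy as np
import zarr

# one chunk per PIV frame: each chunk becomes one file on disk
store = zarr.DirectoryStore('session.zarr')
root = zarr.group(store=store, overwrite=True)
u = root.zeros('u', shape=(3000, 128, 128),
               chunks=(1, 128, 128), dtype='f4')

# a worker writes its whole frame (exactly one chunk) in one assignment
u[42] = np.random.rand(128, 128).astype('f4')

# the GUI later reads one frame at a time, leaving the rest on disk
frame = u[42]
```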
It also allows for exporting the session in HDF5 and netCDF. |
I found that the temporary-file system works best, so I'll keep it for now. It doesn't take any permanent space on the hard drive.
Here is the somewhat buggy h5py GUI.
It requires h5py as an extra dependency. |
To avoid polluting your GUI with features that cannot be merged (at least I wasn't able to, due to my basic programming knowledge), I'm going to close this issue so I can focus more on your GUI.
I also moved the h5py GUI to a new repository to avoid accidentally pushing the wrong GUI to my fork of your GUI. I honestly like your GUI a little more because of its simplicity.
Background/Issue
The current GUI stores data in separate files, which can make more thorough data processing hard. A previously suggested solution was to store all results in a single dictionary dataset and export the results in whatever manner the user deems sufficient. However, on large processing sessions (>60,000 images), the GUI can become quite slow, especially on lower-performing laptops, and its performance keeps decreasing as the session grows. This can disrupt efficient workflows and increase glitches (mostly on lower-performing computers).
Proposed solution
After exploring different ways of storing huge amounts of data, h5py was found to perform pretty well even on underperforming computers (e.g. my laptop 😢). When properly configured, most data is stored on the hard drive, leaving RAM mostly unused, unlike dictionary-style designs. Additionally, the structure of an HDF5 file makes it very simple to load specific sections of data/results, which has its advantages. Taking advantage of these features, the HDF5 file is structured like the following:
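One possible layout along those lines (a hypothetical sketch with invented names; session-wide settings as root attributes and one group per image pair):

```python
import h5py
import numpy as np

with h5py.File('session.h5', 'w') as f:
    # session-wide processing parameters as root attributes
    f.attrs['window_size'] = 32
    f.attrs['overlap'] = 16
    # one group per image pair; dataset shapes may differ per frame
    g = f.create_group('frame_0001')
    for name in ('x', 'y', 'u', 'v'):
        g.create_dataset(name, data=np.zeros((64, 64)))
```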
Possible downfalls
PS: I'm back 😁 (got medically discharged due to an injury) and ready to relearn everything and hopefully not be so ill-informed about testing methods as I was back then -_-. Additionally, your input on using HDF5 or other formats for storage would be helpful for further research and design.