How to handle data with samples #19
Comments
This speaks to a more general discussion about what should be in filestore and what should be in databroker. For my money this is less a philosophical issue and more a file-size issue: if the file size is greater than XXXX, it should go into filestore. It may also be a searchability issue, though. We don't want to search through large data files for metadata, but we also don't want to bury large amounts of metadata in big data files that are not in databroker. What are your thoughts? The file-size limit may be the simplest thing.
On a similar topic: wherever the x-ray data go, when we find out later what the x-ray wavelength was, how do we then associate it? A mutable databroker? A non-mutable one, but with some kind of event stream that zips the info together?
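The "non-mutable event stream" option might look something like the sketch below: the original measurement record is never modified, and the later-discovered wavelength arrives as a new event that gets joined to it by a shared identifier. All names and values here are illustrative, not a real databroker API or real beamline numbers.

```python
# Hedged sketch: append-only events joined ("zipped") by uid.
events = [
    {"uid": "scan-001", "type": "measurement", "data": "xrd_pattern.chi"},
    # Later, once the wavelength is known, append a new event rather
    # than mutating the original record (illustrative value):
    {"uid": "scan-001", "type": "metadata", "wavelength_angstrom": 0.1867},
]

def zip_events(events):
    """Merge all events sharing a uid into one combined view."""
    merged = {}
    for ev in events:
        combined = merged.setdefault(ev["uid"], {})
        combined.update({k: v for k, v in ev.items() if k != "type"})
    return merged
```

This keeps immutability while still letting later metadata "catch up" with earlier data at read time.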
I am talking strictly about the numerical array data. I presume that we'd parse any metadata in those files out into a dict somewhere? To your second post:
I am not sure of the best way forward. I thought about parsing out the numerical array data, but then we need tools to do that reliably, which is fine when people are using known file formats but could be a big overhead to maintain. The reason it is an important question is that if we are generating thousands of processed PDFs, F(Q)s, etc., then when we decide to store rather than recompute them, do we parse those arrays out to a filestore and propagate a token, or do we just store the arrays in databroker? I don't know the answer. The current issue is just forcing our hand to make this decision, I guess.
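The "parse out the arrays and propagate a token" option above could be sketched as follows. The `filestore` dict stands in for a real external store, and every function name here is hypothetical; the point is only the shape of the indirection.

```python
# Illustrative token-propagation sketch; not a real filestore API.
import uuid

filestore = {}  # stands in for an external array store

def store_array(array):
    """Put the heavy numerical data in the store; return a token."""
    token = str(uuid.uuid4())
    filestore[token] = array
    return token

def make_document(sample_name, array):
    """The databroker-side document holds metadata plus the token."""
    return {"sample": sample_name, "data_token": store_array(array)}

def resolve(document):
    """Dereference the token to recover the array on demand."""
    return filestore[document["data_token"]]
```

The alternative (storing the arrays directly in databroker) removes the indirection at the cost of larger, less searchable documents.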
The answers to 2 and 3 will depend on performance, I guess.
In the Pro category we should add:
The file format issue is a problem no matter which way we turn. If we are going to store data, we will either a) need to parse it on the way in to some uniform storage method (filestore, HDF5, filestore+HDF5, raw JSON, etc.). I would say the definition of overkill is "creating a complex solution where a simple one works just fine"? 😄
Usually when we are taking data it gets put into a databroker, and the databroker takes care of storing large data sets. However, we may get data along with the samples (e.g., XRD patterns taken on a lab source) which are not in the databroker. How should we handle this?
The way I see this there are two options (although there may be more):