spool with zero patches for binary data #381

Closed
ahmadtourei opened this issue May 29, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@ahmadtourei
Collaborator

ahmadtourei commented May 29, 2024

Description

Indexing with sp = dc.spool(data_path).update() runs to 100% and appears to create the index file in the data directory (verified by index_path_1==index_path_2==data_path=="/mnt/DAS/data_1", as below). However, printing the spool (sp) triggers indexing again and shows a spool with zero patches:

DASCore DirectorySpool 🧵 (0 Patches) Path: /mnt/DAS/data_1

index_path_1 = dc.spool(data_path, index_path=data_path).indexer.index_path
index_path_2 = dc.spool(data_path).indexer.index_path

Data is in binary ".raw" format. Each data directory has an XML file with metadata. I see the BinaryReader class in dascore.io but I'm unsure if we currently support this raw format with XML metadata.

Example

Expected behavior

Versions

  • OS [e.g. Ubuntu 20.04]: Ubuntu 22.04.3 2023.10.17 LTS
  • DasCore Version [e.g. 0.0.5]: 0.1.1
  • Python Version [e.g. 3.10]: 3.12.3
ahmadtourei added the bug label May 29, 2024
@d-chambers
Contributor

Interesting.

Can you check if the index actually exists? For example: spool.indexer.index_path.exists().

I think what is happening is that DASCore can't read any of the files in the directory, so it goes through all of them (hence the progress bar) but ends up not creating an index because none of the files have recognizable contents. Then, when you print the spool, it sees there is no index file and tries to index again.
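A quick way to see what the directory actually contains is sketched below. It uses only the standard library and is not part of DASCore; the function name summarize_directory is made up for illustration. It counts files by suffix and flags those that begin with the HDF5 signature. DASCore also reads some non-HDF5 formats, and HDF5 files with a user block do not start with the signature, so treat the result as a rough hint only.

```python
from collections import Counter
from pathlib import Path

# Signature found at byte 0 of HDF5 files that have no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def summarize_directory(data_path):
    """Count files by suffix and collect those starting with the HDF5 signature.

    Hypothetical helper for illustration; not a DASCore function.
    """
    suffixes = Counter()
    hdf5_files = []
    for path in sorted(Path(data_path).rglob("*")):
        if not path.is_file():
            continue
        suffixes[path.suffix.lower()] += 1
        # Read only the first few bytes, in binary mode.
        with open(path, "rb") as fi:
            if fi.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                hdf5_files.append(path)
    return suffixes, hdf5_files
```

If hdf5_files comes back empty and the suffix counts show only ".raw" and ".xml", that would be consistent with every file being skipped during indexing.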

We don't currently support this file format, but perhaps we could. We don't yet support reading formats that span multiple files, but Madagascar does something similar, so it may be worth looking into. Is there a spec/example file you can share?

The BinaryReader isn't for a specific format; it's just a way that FiberIO subclasses tell DASCore they need to read the file in binary mode (e.g., open(data_path, 'rb')) as opposed to using pytables or h5py.
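To make the binary-mode point concrete, here is a minimal standard-library sketch of reading a raw file whose shape comes from a sidecar XML file. Everything format-specific is an assumption for illustration: the n_channels tag, native-endian 4-byte float samples, and a time-major layout are invented here, and read_raw_with_xml is not a DASCore function. A real reader would follow the vendor's spec.

```python
import array
import xml.etree.ElementTree as ET


def read_raw_with_xml(raw_path, xml_path):
    """Read float samples from a raw file, shaped by a sidecar XML file.

    Hypothetical helper; tag name, dtype, and layout are assumptions.
    """
    root = ET.parse(xml_path).getroot()
    # Assumed tag name; a real format defines its own schema.
    n_channels = int(root.findtext("n_channels"))

    # Assumed sample type: 4-byte floats in native byte order.
    samples = array.array("f")
    with open(raw_path, "rb") as fi:  # binary mode: read() returns bytes
        samples.frombytes(fi.read())

    n_times = len(samples) // n_channels
    # Assumed time-major layout: one row per time sample, one column per channel.
    return [
        list(samples[i * n_channels:(i + 1) * n_channels])
        for i in range(n_times)
    ]
```

The raw file alone carries no shape or dtype information, which is why a format like this needs both files to be read together.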

@ahmadtourei
Collaborator Author

Can you check if the index actually exists?

Yes, it returns True. However, I cannot see it in the directory using "ls -a" in the terminal.

I'll update you about sharing an example file tomorrow. Thanks!

@d-chambers
Contributor

closed by #384
