Use dusql to create a table of file info #9

Open
aidanheerdegen opened this issue Jun 18, 2019 · 4 comments

Comments

@aidanheerdegen
Member

When nccompress times out and is re-run on the same directory it ends up trying to open every file to check if it is netCDF. This is slow and inefficient if it has already done that before.

Could use dusql to create a file info database, with a table that stores each file's format and whether it has already been compressed, so the work is not duplicated when it is run again.
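A rough sketch of what such a table might look like, using plain `sqlite3` rather than dusql's actual schema. The table name `file_info`, its columns, and the helper functions are all hypothetical and only illustrate the idea of caching the format check and the compression status:

```python
import sqlite3


def open_cache(db_path=":memory:"):
    """Open (or create) a hypothetical cache of per-file results."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS file_info (
            path       TEXT PRIMARY KEY,
            is_netcdf  INTEGER NOT NULL,  -- 1 if the file was identified as netCDF
            compressed INTEGER NOT NULL   -- 1 if it has already been compressed
        )
        """
    )
    return conn


def record(conn, path, is_netcdf, compressed):
    """Store or overwrite the cached result for one file."""
    conn.execute(
        "INSERT OR REPLACE INTO file_info VALUES (?, ?, ?)",
        (path, int(is_netcdf), int(compressed)),
    )
    conn.commit()


def lookup(conn, path):
    """Return (is_netcdf, compressed), or None if the file has no cached entry."""
    row = conn.execute(
        "SELECT is_netcdf, compressed FROM file_info WHERE path = ?", (path,)
    ).fetchone()
    return None if row is None else (bool(row[0]), bool(row[1]))
```

On a re-run, a `lookup` that returns `(True, True)` or `(False, False)` would let the file be skipped without opening it.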

@ccarouge
Member

Can it handle independent changes made to the directory well? I mean would nccompress need to run a dusql update each time it runs to make sure the database is up to date? How long would that take? How would the update keep the file info table in sync? Maybe those are trivial questions when dealing with databases but I don't know much about databases.

@aidanheerdegen
Member Author

Those are good questions and I don't know all the answers myself. At the very least, dusql can be re-run to update the database, and we can still tell whether a file's modification time has changed since its status was last determined (provided that information is saved in the proposed table, which it clearly should be).
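The modification-time check could be as simple as the sketch below. It assumes the proposed table stores the `st_mtime_ns` value seen when the file's status was last determined; the function name `is_stale` and that stored value are assumptions for illustration, not anything dusql provides today:

```python
import os


def is_stale(path, cached_mtime_ns):
    """Return True if the cached status for `path` can no longer be trusted.

    `cached_mtime_ns` is the modification time (in nanoseconds) recorded when
    the status was last determined, or None if there is no record.
    """
    if cached_mtime_ns is None:
        return True  # never checked before
    try:
        current = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return True  # file was removed since the last check
    return current != cached_mtime_ns
```

Comparing for inequality rather than "newer than" also catches files that were replaced by an older copy.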

@ScottWales

We could add a minimum age to dusql.scan(), so that it only runs a full scan if the last scan was > 1 hour ago or whatever.
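Something along these lines, as a sketch only. `scan_due` and the stored `last_scan` timestamp are hypothetical; this is not an existing argument of `dusql.scan()`:

```python
import time


def scan_due(last_scan, min_age=3600.0, now=None):
    """Return True if a full scan should run.

    `last_scan` is the Unix timestamp of the previous scan (None if there has
    never been one) and `min_age` is the minimum age in seconds.
    """
    if last_scan is None:
        return True
    if now is None:
        now = time.time()
    return now - last_scan > min_age
```

The `now` argument is only there so the check can be exercised without waiting an hour.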

The scan takes however long it takes to os.walk() the directory tree, plus however long it takes to ID the file type for newly added files (file ID isn't currently implemented, but should be doable from looking at the first few bytes of the file).
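A minimal sketch of identifying files from their first few bytes. The function name `sniff_format` and its return labels are made up for illustration. Classic netCDF files start with `CDF` followed by a version byte (1, 2 or 5), and netCDF-4 files are HDF5 containers that start with the 8-byte HDF5 signature. The sketch only checks offset 0, so an HDF5 file with a user block would be missed, and an HDF5 match does not prove the file is netCDF-4:

```python
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"
NETCDF_CLASSIC_MAGIC = (b"CDF\x01", b"CDF\x02", b"CDF\x05")


def sniff_format(path):
    """Guess the file format from the first 8 bytes; return None if unknown."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(NETCDF_CLASSIC_MAGIC):
        return "netcdf-classic"
    if head == HDF5_MAGIC:
        return "hdf5"  # possibly netCDF-4, but any HDF5 file matches
    return None
```

This reads only 8 bytes per file, so it should be much cheaper than opening each file with a netCDF library.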

@ScottWales

See coecms/dusql#29
