Allow local file caching to be disabled when appropriate #6

zaneselvans · 2022-04-07T01:19:39Z

Local file caching via simplecache:: is hugely valuable when you have a lot of cheap disk and a slower net connection (WFH), but it's not necessarily appropriate in a cloud computing context (e.g. our JupyterHub or CI/CD) where the network is extremely fast, there are no data egress fees, and fast disk is more likely to be constrained.

If we are going to use our Intake data catalog as a primary means of accessing versioned, processed data, the user should be able to turn off caching when appropriate. Is this as easy as not setting PUDL_INTAKE_CACHE so there's no designated location for the cache? Or can it / should it be set explicitly in the arguments to the data source?
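For illustration, here is a minimal sketch of the first option: only prepend the `simplecache::` prefix when `PUDL_INTAKE_CACHE` is set. The helper name `build_urlpath` and the bucket path are hypothetical, and this is not necessarily how the catalog ends up handling it. The one thing it relies on is fsspec's URL chaining convention, where options for a chained protocol are passed under a key named after that protocol.

```python
import os


def build_urlpath(base_url, cache_dir=None):
    """Return (urlpath, storage_options), caching only if cache_dir is set.

    Hypothetical helper, for illustration only.
    """
    if cache_dir:
        # fsspec URL chaining: "simplecache::" wraps the remote filesystem,
        # and its options live under the "simplecache" key.
        return (
            f"simplecache::{base_url}",
            {"simplecache": {"cache_storage": cache_dir}},
        )
    # No cache location configured: read straight from the remote store.
    return base_url, {}


# Placeholder path, not the real PUDL bucket.
BASE_URL = "gs://example-bucket/hourly_emissions_epacems.parquet"

urlpath, storage_options = build_urlpath(
    BASE_URL, os.environ.get("PUDL_INTAKE_CACHE")
)
```

The resulting pair could then be handed to something like `fsspec.open(urlpath, **storage_options)`. An unset or empty `PUDL_INTAKE_CACHE` falls through to the uncached path, so cloud environments would get direct reads without any extra configuration.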



With some pointers from @martindurant in [this issue](intake/intake-parquet#26) I got anonymous public access working, and caching can now be turned off when appropriate. Accessing the partitioned data is still very slow in a variety of contexts for reasons I don't understand. I also hit a snag attempting to create a consolidated external `_metadata` file to hopefully speed up access to the partitioned data, so... not sure what to do there. The current Tox/pytest setup expects to find data locally, which won't work right now on GitHub. The tests need to be set up better for real-world use, and less for exploring different catalog configurations. Closes #5, #6

zaneselvans · 2022-04-19T04:43:52Z

Fixed in 7fb38ff

zaneselvans added the intake (Intake data catalogs), performance (Make data go faster by using less memory, disk, network, compute, etc.), and Epic labels and removed the Epic label Apr 7, 2022

zaneselvans mentioned this issue Apr 7, 2022

EPA CEMS Intake Catalog catalyst-cooperative/pudl#1564

Open

15 tasks

zaneselvans changed the title ~~Ensure users can disable local file caching when appropriate~~ Allow local file caching to be disabled when appropriate Apr 11, 2022

zaneselvans mentioned this issue Apr 14, 2022

Allowing public catalog access and efficient file caching intake/intake-parquet#26

Closed

zaneselvans closed this as completed Apr 19, 2022

zaneselvans self-assigned this Apr 21, 2022
