refactor fips filtering mechanics #338

knaaptime · 2022-06-30T18:45:23Z

Currently, CI tests are failing on Windows because the datasets are too large to hold in memory on the CI-provisioned VMs. It's not a bug, since things are working as intended, but the same failure will happen to anyone on a memory-constrained machine (or, say, Binder).

The current design of the DataStore class is basically brute-force. When you ask for a dataset, it loads the whole thing into memory from a parquet file (either local or remote), then filters it down to the subset you require (using the get_* functions). That was by design at first, but there are now good filtering options that can be passed to the pandas/geopandas read_parquet functions that make it possible to do the subsetting during file I/O, so that only the necessary data gets loaded into memory. This is a lot more efficient, but the filtering syntax is a bit cumbersome, so it will require some serious refactoring.
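To give a sense of what the filtering syntax looks like, here is a rough sketch, not the actual refactor. It assumes the parquet files store FIPS codes as strings in a column called `geoid` (a guess for illustration), and the helper names `fips_prefix_filters` and `read_subset` are hypothetical. The tuple-style filters have no "starts with" operator, so a prefix match is written as a string range, and multiple prefixes are OR-ed together as a list of AND-groups.

```python
def fips_prefix_filters(fips_codes, column="geoid"):
    """Build pyarrow-style filters matching rows whose `column` starts
    with any of the given FIPS prefixes.

    `column="geoid"` is an assumed column name, not taken from the library.
    """
    filters = []
    for code in fips_codes:
        code = str(code)
        if not code:
            raise ValueError("FIPS codes must be non-empty strings")
        # Smallest string greater than every string starting with `code`:
        # bump the last character by one code point ("06" -> "07").
        upper = code[:-1] + chr(ord(code[-1]) + 1)
        # Inner list = conditions AND-ed together; outer list = OR.
        filters.append([(column, ">=", code), (column, "<", upper)])
    return filters


def read_subset(path, fips_codes, column="geoid"):
    """Read only the matching rows instead of loading the whole file."""
    import pandas as pd  # imported lazily; needs the pyarrow engine

    return pd.read_parquet(
        path,
        engine="pyarrow",
        filters=fips_prefix_filters(fips_codes, column),
    )
```

Something like `read_subset("tracts.parquet", ["06", "11001"])` (the file name is made up) would then ask for California plus the District of Columbia's single county in one read. As far as I understand, the same `filters` argument can be passed to geopandas' `read_parquet`, though that depends on the installed version.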

I've got some working code that takes a better approach, but I'll still need to road-test it for a while.

knaaptime self-assigned this Jun 30, 2022
