Manage temp tensor files in memory rather than sending them to storage #2819

nvoxland-al · 2024-04-04T19:54:40Z

🚀 🚀 Pull Request

Bug fix (non-breaking change which fixes expected existing functionality)
Enhancement/New feature (adds functionality without impacting existing logic)
Breaking change (fix or feature that would cause existing functionality to change)

With a large number of temp tensors, on-disk metadata management becomes time-consuming. This PR avoids that overhead by keeping temp tensors in memory.

This PR does not attempt to limit the size of the temp tensor cache. Temp tensors are currently used only for class_labels, which should not involve large amounts of data.
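
For readers unfamiliar with the idea, here is a minimal sketch of routing temp-tensor keys to an in-process dict instead of the backing storage. It is not the code in this PR: the `TempAwareStore` class, the `__temp` key prefix, and the `clear_temp` helper are hypothetical names chosen for illustration, and a plain dict stands in for the real storage provider.

```python
from typing import Dict, Iterator, MutableMapping


class TempAwareStore(MutableMapping):
    """Hypothetical wrapper: temp keys stay in memory, everything else
    is forwarded to the backing store."""

    def __init__(self, backing: MutableMapping, temp_prefix: str = "__temp") -> None:
        self.backing = backing
        self.temp_prefix = temp_prefix  # assumed naming convention for temp tensors
        self.temp: Dict[str, bytes] = {}  # unbounded, mirroring "no cache limit"

    def _is_temp(self, key: str) -> bool:
        return key.startswith(self.temp_prefix)

    def __getitem__(self, key: str) -> bytes:
        return self.temp[key] if self._is_temp(key) else self.backing[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        if self._is_temp(key):
            self.temp[key] = value  # never reaches the backing store
        else:
            self.backing[key] = value

    def __delitem__(self, key: str) -> None:
        if self._is_temp(key):
            del self.temp[key]
        else:
            del self.backing[key]

    def __iter__(self) -> Iterator[str]:
        yield from self.backing
        yield from self.temp

    def __len__(self) -> int:
        return len(self.backing) + len(self.temp)

    def clear_temp(self) -> None:
        """Drop all temp entries, e.g. once a transform has finished."""
        self.temp.clear()


# Usage: only the non-temp key is written to the backing store.
backing: Dict[str, bytes] = {}
store = TempAwareStore(backing)
store["__temp_labels/chunks/0"] = b"cat"
store["images/chunks/0"] = b"\x89PNG"
```

Because the temp dict lives in a single process, a sketch like this would not share temp data across worker processes without extra handling.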

nvoxland-al · 2024-04-04T19:55:11Z

This currently does not work with scheduler=processed. I am going to get feedback before looking into handling that better.

codecov · 2024-04-04T21:03:54Z

Attention: Patch coverage is 96.03175%, with 5 lines in your changes missing coverage. Please review.

Files	Patch %	Lines
deeplake/core/storage/provider.py	94.44%	3 Missing ⚠️
deeplake/core/storage/local.py	92.30%	1 Missing ⚠️
deeplake/core/storage/lru_cache.py	90.90%	1 Missing ⚠️


sonarcloud · 2024-04-12T01:23:43Z

Manage temp tensor files in memory rather than sending them to storage

694a264

nvoxland-al marked this pull request as ready for review April 4, 2024 19:59

nvoxland added 5 commits April 4, 2024 18:48

Merge remote-tracking branch 'origin/main' into in_memory_temp_tensors

a91448d

Merge branch 'refs/heads/main' into in_memory_temp_tensors

f9ba8b5

Support processed scheduler with in-memory temp_tensors

a1b1b9f

Fixed issue from merge

bfea813

Formatting fixes

71fbed0

nvoxland-al marked this pull request as draft April 19, 2024 13:42