dataset/tensor info alongside meta #1066
hub/constants.py
Outdated
# info is 100% optional user-defined information
DATASET_INFO_FILENAME = "dataset_info.json"
TENSOR_INFO_FILENAME = "tensor_info.json"
2 additional small files. This means that in total we will have the following small files:
- dataset meta/dataset info (2)
- tensor meta/tensor info (2 * num_tensors)
- chunk_ids (1 * num_tensors)
For a dataset with 2 tensors, there will be 8 small files.
For a dataset with 4 tensors, there will be 14 small files.
The more tensors we add, the more small files the dataset will contain. This isn't a problem now, but we should keep it in mind as we proceed.
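The count above works out to 2 + 3 * num_tensors. A minimal sketch of that arithmetic follows; `count_small_files` is a hypothetical helper written for illustration and is not part of this PR.

```python
def count_small_files(num_tensors: int) -> int:
    """Count small files per the breakdown in the comment above."""
    dataset_files = 2          # dataset meta + dataset info
    per_tensor_files = 2 + 1   # tensor meta + tensor info, plus chunk_ids
    return dataset_files + per_tensor_files * num_tensors


for n in (2, 4):
    print(n, "tensors ->", count_small_files(n), "small files")
```

The growth is linear in the number of tensors, which is why it is not a concern yet.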
@@ -108,8 +108,10 @@ def update_headers(self, incoming_num_bytes: int, sample_shape: Tuple[int]):
         self.shapes_encoder.add_shape(sample_shape, 1)
         self.byte_positions_encoder.add_byte_position(num_bytes_per_sample, 1)
 
-    def __len__(self):
+    @property
+    def nbytes(self):
Replaced `len` with `nbytes` for all cachables so that cachable classes can have their `len` computed without returning the number of bytes.
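A rough sketch of why the two are kept apart: once the byte size lives in an `nbytes` property, `__len__` is free to mean "number of items". `SampleIndex` below is a hypothetical class made up for illustration; it is not one of the cachables touched by this PR.

```python
class SampleIndex:
    """Hypothetical cachable-style object (illustration only)."""

    def __init__(self):
        self._sizes = []  # byte size of each stored sample

    def add(self, payload: bytes) -> None:
        self._sizes.append(len(payload))

    @property
    def nbytes(self) -> int:
        # Total size in bytes, which a cache would use for accounting.
        return sum(self._sizes)

    def __len__(self) -> int:
        # Number of samples, independent of how many bytes they occupy.
        return len(self._sizes)


index = SampleIndex()
index.add(b"abcd")
index.add(b"xy")
print(len(index), index.nbytes)
```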
@@ -100,6 +100,7 @@ def local_path(request):
         return
 
     path = _get_storage_path(request, LOCAL)
+    LocalProvider(path).clear()
Tests will fail if `--keep-storage` was passed on the previous test run. This fixes it.
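The idea is to wipe whatever an earlier run left behind before handing the path to a test. A standard-library sketch of that pattern is below; `fresh_storage_dir` and the `stale.json` file are hypothetical and only stand in for the `LocalProvider(path).clear()` call in the diff.

```python
import os
import shutil
import tempfile


def fresh_storage_dir(path: str) -> str:
    """Remove anything a previous run left at `path`, then recreate it empty."""
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path


# Simulate a previous run that kept its storage around.
root = tempfile.mkdtemp()
storage = os.path.join(root, "storage")
os.makedirs(storage)
with open(os.path.join(storage, "stale.json"), "w") as f:
    f.write("{}")

fresh_storage_dir(storage)
leftover = os.listdir(storage)  # files still present after clearing
shutil.rmtree(root)             # clean up the temporary directory
```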
Codecov Report
@@ Coverage Diff @@
## main #1066 +/- ##
==========================================
+ Coverage 89.98% 90.23% +0.25%
==========================================
Files 95 98 +3
Lines 4244 4436 +192
==========================================
+ Hits 3819 4003 +184
- Misses 425 433 +8
Minor changes. Should be good to merge once addressed.
🚀 🚀 Pull Request
info vs meta
Allows the user to do:
No need for `with` or `ds.flush`. However, if a user changes the items without calling `ds.update`, the changes will not be persisted. To persist them, they can still call `ds.info.update()`. If they make changes inside a `with` block, it will still flush and persist their changes.
What's new?
Added a new `CallbackCachable` class that is based on `Cachable`. It is only used by the `Info` class (also new) and exists in the API only.
We may get rid of `Cachable` and `CallbackCachable` in the future; for now, while this is under review, I will not be removing cachables in this PR.
`Info` is a purely API class. This also means that callbacks are OK, because they will be used infrequently.
Checklist:
coverage-rate up
Changes
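For readers unfamiliar with the callback approach described under "What's new?", here is a minimal sketch of the general pattern, assuming a dict-like object that invokes a callback whenever `update` is called. `CallbackDict` and the `writes` list are hypothetical stand-ins; they are not the `Info` or `CallbackCachable` implementations from this PR.

```python
import json
from typing import Any, Callable, Dict, List


class CallbackDict:
    """Hypothetical dict-like object that persists itself via a callback."""

    def __init__(self, on_change: Callable[[str], None]):
        self._data: Dict[str, Any] = {}
        self._on_change = on_change

    def update(self, **kwargs: Any) -> None:
        self._data.update(kwargs)
        # Serialize and hand the snapshot to the callback (the "persist" step).
        self._on_change(json.dumps(self._data))

    def __getitem__(self, key: str) -> Any:
        return self._data[key]


writes: List[str] = []            # stands in for writes to storage
info = CallbackDict(writes.append)

info.update(tags=["a"])           # triggers the callback: one write
info["tags"].append("b")          # in-place change: no callback, nothing written
info.update()                     # explicit update persists the pending change
print(writes)
```

This mirrors the behaviour described in the PR text: in-place edits are not persisted until an update is called explicitly.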