[AL-1579] Auto htype #1370

Merged
farizrahman4u merged 10 commits into main from fr_auto_htype
Dec 23, 2021

Conversation

farizrahman4u
Contributor

🚀 🚀 Pull Request

Checklist:

  • My code follows the style guidelines of this project and the Contributing document
  • I have commented my code, particularly in hard-to-understand areas
  • I have kept the coverage-rate up
  • I have performed a self-review of my own code and resolved any problems
  • I have checked to ensure there aren't any other open Pull Requests for the same change
  • I have described and made corresponding changes to the relevant documentation
  • New and existing unit tests pass locally with my changes

Changes

Detect json and text htypes automatically. https://activeloop.atlassian.net/browse/AL-1579
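
A minimal sketch of what automatic detection could look like from the user's side, assuming a tensor created without an htype stays undecided until its first sample arrives. `TensorSketch` and `guess_htype` are hypothetical names, and the mapping (dict to "json", str to "text", anything else to "generic") is an assumption drawn from the description above, not the hub implementation.

```python
from typing import Any, List, Optional


def guess_htype(sample: Any) -> str:
    """Hypothetical rule: dict -> "json", str -> "text", else "generic"."""
    if isinstance(sample, dict):
        return "json"
    if isinstance(sample, str):
        return "text"
    return "generic"


class TensorSketch:
    """Toy tensor whose htype is decided by the first appended sample."""

    def __init__(self, htype: Optional[str] = None) -> None:
        self.htype = htype  # None means "not decided yet"
        self.samples: List[Any] = []

    def append(self, sample: Any) -> None:
        if self.htype is None:
            # First write: infer the htype instead of requiring it up front.
            self.htype = guess_htype(sample)
        self.samples.append(sample)


notes = TensorSketch()
notes.append("hello")          # htype is inferred from the str sample

explicit = TensorSketch(htype="generic")
explicit.append({"a": 1})      # explicit htype is left untouched
```

In this sketch only the `None` case triggers inference; an htype passed at creation time always wins.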

CLAassistant commented Dec 2, 2021

CLA assistant check
All committers have signed the CLA.

@farizrahman4u farizrahman4u marked this pull request as ready for review December 7, 2021 13:23
@tatevikh tatevikh requested review from jraman, AbhinavTuli and aliubimov and removed request for AbhinavTuli December 9, 2021 23:52
@@ -811,3 +811,17 @@ def test_empty_extend(memory_ds):
ds.create_tensor("y")
ds.y.extend(np.zeros((len(ds), 3)))
assert len(ds) == 0

Contributor

Add test case for when set_htype is called and self.htype is not None.

Contributor Author

set_htype is internal, analogous to set_dtype; that case shouldn't be hit from user space.

@@ -811,3 +811,17 @@ def test_empty_extend(memory_ds):
ds.create_tensor("y")
ds.y.extend(np.zeros((len(ds), 3)))
assert len(ds) == 0

Contributor

Add test case for when set_htype is called and self.length is > 0.

Contributor Author

set_htype is internal, analogous to set_dtype; that case shouldn't be hit from user space.

@@ -88,6 +80,38 @@ def set_dtype(self, dtype: np.dtype):

        self.dtype = dtype.name

    def set_htype(self, htype: str, **kwargs):
        """Should only be called once."""
Contributor

More documentation please.

  • If self.htype is present, it must be None.
  • If self.length is present, it must be zero.

Contributor Author

The docstring is similar to that of set_dtype. This is an internal method, and the conditions are explicit in the checks.

        """Should only be called once."""
        ffw_tensor_meta(self)

        if getattr(self, "htype", None) is not None:
Contributor

Add a test case when htype is not yet present.

Contributor Author

This case is hit by default when create_tensor is called.

                f"Tensor meta already has a htype ({self.htype}). Incoming: {htype}."
            )

        if getattr(self, "length", 0) > 0:
Contributor

Add a test case when length is not yet present.

Contributor Author

This case is hit by default when create_tensor is called.
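
The two guard conditions discussed in these threads can be sketched in isolation. This is an illustration on a stripped-down meta object, not the code under review: `TensorMetaSketch` is a hypothetical class, `ValueError` stands in for whatever exception hub actually raises, and the second error message is invented for the example. Only the first message and the two `getattr` conditions come from the diff.

```python
from typing import Optional


class TensorMetaSketch:
    """Hypothetical stand-in for a tensor meta with a one-shot htype."""

    def __init__(self) -> None:
        self.htype: Optional[str] = None
        self.length = 0

    def set_htype(self, htype: str) -> None:
        """Should only be called once, and only while the tensor is empty."""
        # Guard 1: an htype that is already set must not be overwritten.
        if getattr(self, "htype", None) is not None:
            raise ValueError(
                f"Tensor meta already has a htype ({self.htype}). Incoming: {htype}."
            )
        # Guard 2: samples already written were stored without this htype.
        if getattr(self, "length", 0) > 0:
            raise ValueError(
                f"Cannot set htype on a tensor with {self.length} samples."
            )
        self.htype = htype


meta = TensorMetaSketch()
meta.set_htype("json")  # first call succeeds

try:
    meta.set_htype("text")  # second call trips guard 1
    second_call_error = None
except ValueError as exc:
    second_call_error = str(exc)
```

Both guards use `getattr` with a default, so a meta object that lacks the attribute entirely behaves like a fresh one, which matches the author's point that the "not yet present" case is the default path after create_tensor.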

@@ -37,6 +37,21 @@ def get_dtype(val: Union[np.ndarray, Sequence, Sample]) -> np.dtype:
    raise TypeError(f"Cannot infer numpy dtype for {val}")


def get_htype(val: Union[np.ndarray, Sequence, Sample]) -> str:
    if isinstance(val, np.ndarray):
Contributor

Add docstring.

@@ -37,6 +37,21 @@ def get_dtype(val: Union[np.ndarray, Sequence, Sample]) -> np.dtype:
    raise TypeError(f"Cannot infer numpy dtype for {val}")


def get_htype(val: Union[np.ndarray, Sequence, Sample]) -> str:
Contributor

samples instead of val would be easier for me to parse.

Contributor Author

Method is analogous to get_dtype.

    if isinstance(val, np.ndarray):
        return "generic"
    types = set((map(type, val)))
    if dict in types:
Contributor

Don't all types have to be dict? Or, what's the case when not all are dicts (please add test case)?

Contributor Author

Test added.
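
To make the reviewer's question concrete, here is a small comparison of the membership check quoted in the diff (`dict in types`) with the stricter "all samples are dicts" reading. The helper names are hypothetical, and what hub returns in each branch is not shown in this thread, so the sketch only reports which predicate matches.

```python
from typing import Any, Sequence, Set


def sample_types(samples: Sequence[Any]) -> Set[type]:
    """Exact Python types present in a batch of samples."""
    return set(map(type, samples))


def has_any_dict(samples: Sequence[Any]) -> bool:
    # Mirrors the membership check in the diff: one dict is enough.
    return dict in sample_types(samples)


def has_only_dicts(samples: Sequence[Any]) -> bool:
    # Stricter alternative: every sample must be exactly a dict.
    return sample_types(samples) == {dict}


mixed = [{"a": 1}, [1, 2], "text"]
uniform = [{"a": 1}, {"b": 2}]

mixed_result = (has_any_dict(mixed), has_only_dicts(mixed))
uniform_result = (has_any_dict(uniform), has_only_dicts(uniform))
```

The two predicates differ only on mixed batches: the membership check accepts a batch as soon as one dict appears, while the strict check rejects it. Because both compare exact types, a dict subclass such as collections.OrderedDict would match neither.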

hub/util/casting.py (review thread resolved)
codecov bot commented Dec 23, 2021

Codecov Report

Merging #1370 (e36f021) into main (3ecaacb) will decrease coverage by 0.04%.
The diff coverage is 93.15%.

@@            Coverage Diff             @@
##             main    #1370      +/-   ##
==========================================
- Coverage   92.27%   92.22%   -0.05%     
==========================================
  Files         174      174              
  Lines       13850    13906      +56     
==========================================
+ Hits        12780    12825      +45     
- Misses       1070     1081      +11     
| Flag | Coverage Δ |
|---|---|
| unittests | 92.22% <93.15%> (-0.05%) ⬇️ |
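
The headline figure (0.04%) and the table delta (-0.05%) look inconsistent, but both follow from the same hit and line counts if percentages are truncated rather than rounded. That truncation is inferred from the numbers in this report, not taken from Codecov's documentation. A short check using exact fractions:

```python
from fractions import Fraction

# Hits and lines taken from the coverage diff above.
before = Fraction(12780, 13850)  # main
after = Fraction(12825, 13906)   # this pull request


def truncated_hundredths(ratio: Fraction) -> int:
    """Percentage expressed in hundredths of a percent, truncated toward zero."""
    return int(ratio * 10000)


before_h = truncated_hundredths(before)   # 9227, i.e. 92.27%
after_h = truncated_hundredths(after)     # 9222, i.e. 92.22%

# Delta between the two displayed (already truncated) percentages.
table_delta_h = after_h - before_h        # -5, i.e. -0.05%

# Delta computed exactly first, then truncated.
exact_delta_h = truncated_hundredths(after - before)  # -4, i.e. -0.04%
```

The exact change is about -0.048 percentage points, so subtracting the displayed values gives -0.05 while truncating the exact difference gives 0.04.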

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| hub/core/serialize.py | 97.89% <66.66%> (-1.02%) ⬇️ |
| hub/core/meta/tensor_meta.py | 86.66% <92.00%> (-0.18%) ⬇️ |
| hub/util/casting.py | 84.12% <92.85%> (+2.49%) ⬆️ |
| hub/__init__.py | 96.96% <100.00%> (ø) |
| hub/api/tests/test_api.py | 100.00% <100.00%> (ø) |
| hub/core/chunk_engine.py | 96.66% <100.00%> (+0.01%) ⬆️ |
| hub/core/dataset/dataset.py | 93.52% <100.00%> (ø) |
| hub/integrations/tests/test_pytorch_dataloader.py | 95.69% <0.00%> (-2.16%) ⬇️ |
| hub/integrations/pytorch/dataset.py | 91.49% <0.00%> (-1.62%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d88131...e36f021.

@farizrahman4u farizrahman4u merged commit 83146b3 into main Dec 23, 2021
@farizrahman4u farizrahman4u deleted the fr_auto_htype branch December 23, 2021 09:35