[Feature] pretty prints of objects #1439

Closed
davidbuniat opened this issue Jan 15, 2022 · 25 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@davidbuniat
Member

davidbuniat commented Jan 15, 2022

🚨🚨 Feature Request

If your feature will improve HUB

To explore the structure of a dataset, it is convenient to have nicer, more informative prints of dataset objects and samples.

Description of the possible solution

1) show ds

Currently:

> ds
Dataset(path='hub://activeloop/abalone_full_dataset', tensors=['length', 'diameter', 'height', 'weight'])

Something along these lines would work (taken from SQLite):

> ds
path: "hub://activeloop/abalone_full_dataset", samples: 1532596

tensor    htype        dtype    shape       compression
------    ------       ------   ------      -----------
length    image        uint8    256x256x3   jpeg
diameter  image        float32  512x512x3   zstd
height    image        float32  512x512x3   zstd
weight    class_label  int32    32          None

and, in a Jupyter notebook, shown as a table similar to pandas.
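A minimal sketch of how the SQLite-style layout above and a pandas-like table in Jupyter could both be produced, assuming the per-tensor values have already been collected as strings. format_table and DatasetSummary are hypothetical names, not part of Hub. _repr_html_ is the IPython display hook that Jupyter calls, when it is defined, to render rich HTML output.

```python
import html


def format_table(headers, rows):
    """Render rows as a plain-text table with a dashed rule under the header."""
    cells = [[str(c) for c in row] for row in rows]
    # Each column is as wide as its longest entry, header included.
    widths = [
        max([len(h)] + [len(row[i]) for row in cells]) for i, h in enumerate(headers)
    ]

    def line(values):
        return "  ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    out = [line(headers), line(["-" * w for w in widths])]
    out.extend(line(row) for row in cells)
    return "\n".join(out)


class DatasetSummary:
    """Hypothetical summary object: text table in a terminal, HTML table in Jupyter."""

    def __init__(self, path, num_samples, headers, rows):
        self.path = path
        self.num_samples = num_samples
        self.headers = headers
        self.rows = rows

    def __repr__(self):
        header = f'path: "{self.path}", samples: {self.num_samples}'
        return header + "\n\n" + format_table(self.headers, self.rows)

    def _repr_html_(self):
        # Jupyter renders this HTML string instead of the plain __repr__ text.
        head = "".join(f"<th>{html.escape(str(h))}</th>" for h in self.headers)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


summary = DatasetSummary(
    "hub://activeloop/abalone_full_dataset",
    1532596,
    ["tensor", "htype", "dtype", "shape", "compression"],
    [
        ["length", "image", "uint8", "256x256x3", "jpeg"],
        ["weight", "class_label", "int32", "32", "None"],
    ],
)
print(summary)
```

With this design, the same object prints as text in a terminal and renders as a table in a notebook, with no extra function call.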

2) show ds.tensor

Currently:

> ds.height
Tensor(key='height')

At least provide full information about the tensor:

> ds.height
Tensor(
    key='height', 
    htype='image', 
    dtype='uint8', 
    shape=(256, 256, 3), 
    sample_compression='jpeg'
)
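One way to get the multi-line form proposed above is a custom __repr__ that prints one field per line. This is a sketch on a hypothetical TensorInfo dataclass holding the same fields as the example. It is not Hub's actual Tensor class. In Hub, the values would come from the tensor and its meta.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(repr=False)
class TensorInfo:
    """Hypothetical stand-in for a tensor's metadata; not Hub's Tensor class."""

    key: str
    htype: str
    dtype: str
    shape: Tuple[Optional[int], ...]
    sample_compression: Optional[str] = None

    def __repr__(self):
        # One "name=value" pair per line; repr() keeps strings quoted.
        body = ",\n".join(f"    {f.name}={getattr(self, f.name)!r}" for f in fields(self))
        return f"Tensor(\n{body}\n)"


print(TensorInfo("height", "image", "uint8", (256, 256, 3), "jpeg"))
```

Because the shape goes through repr(), a dynamic shape such as (None, None, 3) prints as-is.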

or, to be consistent with 1):

> ds.height
tensor    htype    dtype     shape       compression
------    ------   ------    ------      -----------
height    image    float32   512x512x3   zstd

3) show ds[0:5] samples

> ds[0:5]
    length    diameter      height     weight
    ------    --------      ------     ------
0   0.5       [[0.,...,0]]  "sent.."   dog
1   0.5       [[0.,...,0]]  "text a"   dog
2   0.5       [[0.,...,0]]  "text b"   dog

and, in a Jupyter notebook, visualize images (and other htypes).
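For the ds[0:5] view, each cell needs a short, single-line preview. This is a minimal sketch that assumes samples arrive as NumPy arrays, Python strings, or scalars. Large arrays are summarized with numpy.array2string: threshold=0 always triggers summarization, and edgeitems=1 keeps one item at each edge of every axis. Long text is cut to a fixed width and ends with "..". preview_cell and max_chars are hypothetical names.

```python
import numpy as np


def preview_cell(value, max_chars=6):
    """Hypothetical helper: a compact, single-line preview of one sample."""
    if isinstance(value, np.ndarray) and value.size > 1:
        # threshold=0 always summarizes; edgeitems=1 keeps one item per edge of each axis.
        text = np.array2string(value, threshold=0, edgeitems=1, separator=",")
        # array2string wraps multi-dimensional arrays; collapse to one line.
        return " ".join(text.split())
    if isinstance(value, str):
        # Keep short strings intact; otherwise cut and mark the truncation with "..".
        shown = value if len(value) <= max_chars else value[: max_chars - 2] + ".."
        return f'"{shown}"'
    return str(value)


row = [0.5, np.zeros((512, 512)), "sentence"]
print("  ".join(preview_cell(v) for v in row))
```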

Notes

  • Feel free to provide a better format for printing dataset, tensor and sample classes
  • Feel free to suggest other important classes/objects that need to be printed properly for exploring the structure
@davidbuniat davidbuniat added enhancement New feature or request good first issue Good for newcomers labels Jan 15, 2022
@davidbuniat davidbuniat changed the title [BUG] pretty prints of objects [Feature] pretty prints of objects Jan 15, 2022
@SaiNikhileshReddy

I would like to take up this issue. I also have a few additional ideas, such as a schema view for the dataset.

The idea is to display the htype, dtype, and sample_compression under each tensor's name in the schema, to give more insight into the schema's structure.

Example of the above idea:

[screenshot]

With ds.schema(verbose=False):

./coco-hub
├── images
└── boxes

With ds.schema(verbose=True):

./coco-hub
├── images
│   image | jpg | dynamic
└── boxes
    bbox | lz4 | dynamic
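The tree layout proposed above can be sketched with a small recursive printer. This assumes the schema is given as a nested dict, with None for a tensor and a dict for a group (which also covers nested groups), and that the verbose detail strings are precomputed, for example from tensor_info(). render_schema and its arguments are hypothetical and are not part of the Hub API.

```python
def render_schema(name, tree, verbose=False, info=None):
    """Hypothetical helper: draw tensors and (nested) groups as a tree.

    tree: dict mapping a name to None (a tensor) or a nested dict (a group).
    info: optional dict mapping a tensor path ("group/tensor") to a detail line.
    """
    info = info or {}
    lines = [name]

    def walk(node, prefix, path):
        items = list(node.items())
        for i, (key, child) in enumerate(items):
            last = i == len(items) - 1
            lines.append(prefix + ("└── " if last else "├── ") + key)
            # Continue the vertical bar below this entry unless it was the last one.
            child_prefix = prefix + ("    " if last else "│   ")
            full = f"{path}/{key}" if path else key
            if child is None:
                if verbose and full in info:
                    lines.append(child_prefix + info[full])
            else:
                walk(child, child_prefix, full)

    walk(tree, "", "")
    return "\n".join(lines)


print(render_schema(
    "./coco-hub",
    {"images": None, "boxes": None},
    verbose=True,
    info={"images": "image | jpg | dynamic", "boxes": "bbox | lz4 | dynamic"},
))
```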

@davidbuniat
Member Author

@SaiNikhileshReddy Looks great! Let's get started!

@neel2299
Contributor

neel2299 commented Mar 7, 2022

Hello!! I see that this issue has already been assigned. Can I still try to work on it? I am new to open source, and this issue seems like a nice place to start!!

@SaiNikhileshReddy

@neel2299 I don't have any problem with it. I'm partially done. Currently, I'm trying to integrate my solution into Hub.

@neel2299
Contributor

neel2299 commented Mar 7, 2022

Great! I will start right away :)

@neel2299
Contributor

neel2299 commented Mar 8, 2022

Which branch should I open the PR against?

@mikayelh
Collaborator

mikayelh commented Mar 8, 2022

@tatevikh can best advise you here, @neel2299. Thanks a lot for the contribution! In the meantime, you can pick another issue here.

@tatevikh
Collaborator

tatevikh commented Mar 8, 2022

Hi @neel2299 ! Feel free to make the PR for the main branch. Thanks for your interest in Hub.

@FayazRahman
Contributor

@SaiNikhileshReddy @neel2299 Any updates on this issue? Do tell me if you need any help!

@SaiNikhileshReddy

@FayazRahman

I will push the code for pretty prints in 2-3 days.

I have been working on dataset notebooks in the hub/examples repo.

@SaiNikhileshReddy

SaiNikhileshReddy commented Mar 19, 2022

@FayazRahman I got an error when running the Hub source files (unmodified).

I executed this command: pytest -x --local .
[screenshot]

Should this be ignored?

@neel2299
Contributor

neel2299 commented Mar 19, 2022

@FayazRahman Thank you for lending a hand! Much needed... Until now I was only running the tests in core/tests; thanks to @SaiNikhileshReddy's post, I noticed that there are other tests. Could you please take a look at my PR and tell me whether I am heading in the right direction? I will adjust the code to handle some edge cases that the tests point to.

My main concern is whether the code in the __str__ method is getting too cluttered. Link to the PR

@neel2299
Contributor

@SaiNikhileshReddy I googled and found that the convention is to add a "# type: ignore" comment where the type check is not required; it's also used in our testing scripts. It would be worth checking where the error happened and whether "# type: ignore" is present there. That said, someone more experienced could answer this best.

[screenshot]
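For reference, the "# type: ignore" convention comes from PEP 484 and is honored by static checkers such as mypy. The comment silences the checker for that one line only and has no effect at runtime. first_item is a made-up function for illustration. The thread doesn't make clear whether this applies to the error reported above.

```python
from typing import List


def first_item(items: List[int]) -> int:
    """Return the first element of a list of ints."""
    return items[0]


# mypy reports an incompatible argument type here (a tuple where List[int] is declared),
# but indexing a tuple works fine at runtime; the comment suppresses the report for this line.
value = first_item((7, 8, 9))  # type: ignore
print(value)
```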

@SaiNikhileshReddy

Thanks @neel2299 for sharing this. @FayazRahman has mentioned that it is a known issue and can be ignored.

@SaiNikhileshReddy

SaiNikhileshReddy commented Mar 23, 2022

@davidbuniat @FayazRahman @mikayelh
I have been working on generating the output below. I will push the code for both the table and the schema soon.

Any feedback on the outputs below will help resolve the issue quickly.

Command Line Output:
[screenshot]

Jupyter Notebook Output:
[screenshot]

@neel2299
Contributor

The formatting is really nice!!
By the way, I see that you too have not included the compression type in the table (I had trouble with it too). It is in ds.tensor.meta.sample_compression: when a tensor is created, its compression type is stored in its meta, so it's there.

@mikayelh
Collaborator

Wow, good job @SaiNikhileshReddy, pretty exciting!

@SaiNikhileshReddy

@davidbuniat @mikayelh @FayazRahman
Below is the generated schema for the coco-train Hub dataset.

For verbose=False:
[screenshot]

For verbose=True:
[screenshot]

@mikayelh
Collaborator

Can we replace the word "verbose" with "detailed"? "Verbose" is slightly advanced vocabulary and may not be clear to speakers of English as a second language. :)
I defer to @istranic regarding the rest.

@davidbuniat
Member Author

davidbuniat commented Mar 29, 2022

@mikayelh "verbose" is mostly used for detailed logs, so I believe it is fine here, even though that context is usually debug logs.

@SaiNikhileshReddy Other than that, it looks great. For dynamic shapes, I believe the fixed dimensions (or at least the number of dimensions) are important, so in verbose mode it would be better to show Nones instead of "Dynamic".

Also, instead of having a separate API, it would be great to embed this into the ds object.

@SaiNikhileshReddy

SaiNikhileshReddy commented Mar 29, 2022

@davidbuniat Can you suggest how to display the shape values when they differ across images?

def tensor_info(tensor, full_shape=False):
    """Return [htype, dtype, compression, shape] display values for a tensor."""
    meta = tensor.meta
    # Prefer sample compression; fall back to chunk compression (either may be None).
    compression = meta.sample_compression or meta.chunk_compression

    if full_shape:
        # (number of samples, smallest sample shape, largest sample shape)
        shape = (len(tensor), meta.min_shape, meta.max_shape)
    else:
        shape = "Dynamic"

    return [tensor.htype, tensor.dtype, compression, str(shape)]

@SaiNikhileshReddy

@davidbuniat The table and the schema both use tensor_info() to retrieve information about the tensor. The next step would be to incorporate them into the Hub API so they are shown without any explicit function call, except for the schema.

Does Hub support nested groups?

@davidbuniat
Member Author

Regarding shapes, you have two options. The easy one: just show tensor.shape, which includes Nones. The slightly more sophisticated one: show min_shape[i]:max_shape[i] for every dimension i, and only a single integer when the two are the same.
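Both options can be sketched from a tensor's per-dimension minimum and maximum sample shapes, which are the meta.min_shape and meta.max_shape values used in tensor_info() earlier in the thread. The function names are hypothetical. The first function assumes that a dimension shown as None is one that varies across samples. The second produces the min:max form.

```python
from typing import Optional, Sequence, Tuple


def shape_with_nones(min_shape: Sequence[int], max_shape: Sequence[int]) -> Tuple[Optional[int], ...]:
    """Option 1: keep fixed dimensions, use None where samples differ."""
    return tuple(lo if lo == hi else None for lo, hi in zip(min_shape, max_shape))


def shape_with_ranges(min_shape: Sequence[int], max_shape: Sequence[int]) -> str:
    """Option 2: "min:max" for varying dimensions, a single integer when fixed."""
    parts = [str(lo) if lo == hi else f"{lo}:{hi}" for lo, hi in zip(min_shape, max_shape)]
    return "(" + ", ".join(parts) + ")"


# e.g. images whose height and width vary but which always have 3 channels
print(shape_with_nones([100, 120, 3], [640, 480, 3]))   # (None, None, 3)
print(shape_with_ranges([100, 120, 3], [640, 480, 3]))  # (100:640, 120:480, 3)
```

The range form costs a few more characters, but it also tells the reader how large the samples can get.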

Sounds great! By the way, it is much better to have this discussion on a specific PR rather than in the issue here.

Yes, it seems Hub supports nested groups.

@SaiNikhileshReddy

Sure @davidbuniat. I'll integrate the code in hub and raise these doubts in that PR. Thanks for clarifying and sharing feedback!

tatevikh pushed a commit that referenced this issue Apr 7, 2022
…1439. (#1543)

* done it again

* attribute name

* attribute name

* making the changes requested

* black formatting

* Handling None Types

* made requested changes

* minor changes

* black

* adding tests

* black

* removing comments

* summary test

* removing unneeded import

* added test for local path
@farizrahman4u
Contributor

Closed by #1543


7 participants