[Feature] pretty prints of objects #1439

davidbuniat · 2022-01-15T19:45:15Z

🚨🚨 Feature Request

If your feature will improve `HUB`

To explore the structure of a dataset, it is convenient to have nicer and more informative prints of dataset objects and samples.

Description of the possible solution

1) show ds

now

> ds
Dataset(path='hub://activeloop/abalone_full_dataset', tensors=['length', 'diameter', 'height', 'weight'])

Something along these lines would work (format taken from SQLite):

> ds
path: "hub://activeloop/abalone_full_dataset", samples:  1532596

tensor    htype        dtype    shape       compression
------    ------       ------   ------      -----------
length    image        uint8    256x256x3   jpeg
diameter  image        float32  512x512x3   zstd
height    image        float32  512x512x3   zstd
weight    class_label  int32    32          None

and in a Jupyter notebook, shown as a table similar to pandas
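A minimal sketch of how such a plain-text table could be produced, assuming the per-tensor metadata has already been collected. `TensorInfo` and `format_summary` are hypothetical names used only for illustration; they are not part of the Hub API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TensorInfo:
    """Hypothetical container for the columns shown in the proposed table."""
    name: str
    htype: str
    dtype: str
    shape: Tuple[int, ...]
    compression: Optional[str]


def format_summary(tensors: List[TensorInfo]) -> str:
    """Render tensor metadata as a left-aligned plain-text table."""
    header = ["tensor", "htype", "dtype", "shape", "compression"]
    rows = [
        [t.name, t.htype, t.dtype, "x".join(map(str, t.shape)), str(t.compression)]
        for t in tensors
    ]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(r[i]) for r in [header] + rows) for i in range(len(header))]

    def fmt(cells):
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(header), fmt(["-" * w for w in widths])]
    lines.extend(fmt(r) for r in rows)
    return "\n".join(lines)
```

For the notebook case, the same rows could be returned as an HTML table from a `_repr_html_` method, which IPython looks for when displaying an object.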

2) show ds.tensor

now

> ds.height
Tensor(key='height')

at least provide full information about the tensor

> ds.height
Tensor(
    key='height', 
    htype='image', 
    dtype='uint8', 
    shape=(256, 256, 3), 
    sample_compression='jpeg'
)

or, to be consistent with 1):

> ds.height
tensor    htype    dtype     shape       compression
------    ------   ------    ------      -----------
height    image    float32   512x512x3   zstd
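The multi-line form proposed above can be sketched with a custom `__repr__`. `TensorView` is a hypothetical stand-in class used only to show the formatting; it is not the Hub `Tensor` class, and the attribute names simply mirror the fields listed in the proposal.

```python
class TensorView:
    """Hypothetical stand-in for a tensor object (not the Hub Tensor class)."""

    def __init__(self, key, htype, dtype, shape, sample_compression=None):
        self.key = key
        self.htype = htype
        self.dtype = dtype
        self.shape = shape
        self.sample_compression = sample_compression

    def __repr__(self):
        # One field per line, in the order used in the proposal.
        fields = ["key", "htype", "dtype", "shape", "sample_compression"]
        body = ",\n".join(f"    {name}={getattr(self, name)!r}" for name in fields)
        return f"Tensor(\n{body}\n)"
```

Because the interactive interpreter calls `repr()` on a bare expression, typing the object name at the prompt would show this layout without any extra function call.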

3) show ds[0:5] sample

> ds[0:5]
    length    diameter     height     weight
    ------    --------     ------     ------
0      0.5    [[0.,...,0]] "sent.."      dog
1      0.5    [[0.,...,0]] "text a"      dog
2      0.5    [[0.,...,0]] "text b"      dog

and in a Jupyter notebook, visualize images (and other htypes)

Notes

Feel free to propose a better format for printing the dataset, tensor and sample classes.
Feel free to suggest other important classes/objects that need to be printed properly for exploring the structure.


sainikhileshreddy · 2022-02-13T18:33:26Z

I would like to take up this issue. I also have a few additional ideas for it, such as a schema view for the dataset.

The idea is to display the dtype, htype and sample_compression under the name of each tensor in the schema, to give greater insight into the schema structure.

Example of above idea:

if ds.schema(verbose=False) then

./coco-hub
├── images
├── boxes

if ds.schema(verbose=True) then

./coco-hub
├── images
    image | jpg | dynamic
├── boxes
    bbox | lz4 | dynamic
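A rough sketch of how the two schema views above could be rendered, assuming the tensor details are passed in as a plain dictionary. `render_schema` is a hypothetical helper, not the `ds.schema` implementation, and unlike the example above it draws the last entry with `└──`, as the `tree` command does.

```python
def render_schema(root, tensors, verbose=False):
    """Render a tree view; `tensors` maps name -> (htype, compression, shape)."""
    lines = [root]
    names = list(tensors)
    for i, name in enumerate(names):
        last = i == len(names) - 1
        lines.append(("└── " if last else "├── ") + name)
        if verbose:
            htype, compression, shape = tensors[name]
            # Keep the vertical guide running while more entries follow.
            indent = "    " if last else "│   "
            lines.append(f"{indent}{htype} | {compression} | {shape}")
    return "\n".join(lines)


# Example data taken from the schema shown above.
coco = {
    "images": ("image", "jpg", "dynamic"),
    "boxes": ("bbox", "lz4", "dynamic"),
}
print(render_schema("./coco-hub", coco, verbose=True))
```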

davidbuniat · 2022-02-15T06:25:46Z

@sainikhileshreddy looks great! Let's get started!

neel2299 · 2022-03-07T16:49:38Z

Hello!! I see that this issue has already been assigned. Can I still try to work on it? I am new to open source, this issue seems like a nice place to start!!

sainikhileshreddy · 2022-03-07T16:53:16Z

@neel2299 I don't have any problem with it. I'm partially done. Currently, I'm trying to integrate my solution into the hub.

neel2299 · 2022-03-07T17:48:47Z

Great! I will start right away :)

neel2299 · 2022-03-08T16:22:53Z

Which branch should I make the PR for?

mikayelh · 2022-03-08T16:46:13Z

@tatevikh would advise you best here, @neel2299 , thanks a lot for the contribution! In the meantime, you can pick another issue here.

tatevikh · 2022-03-08T23:31:43Z

Hi @neel2299 ! Feel free to make the PR for the main branch. Thanks for your interest in Hub.

FayazRahman · 2022-03-16T07:10:39Z

@sainikhileshreddy @neel2299 Any updates on this issue? Do tell me if you need any help!

sainikhileshreddy · 2022-03-16T07:25:32Z

@FayazRahman

I will push the code for pretty prints in 2-3 days.

I have been working on dataset notebooks in the hub/examples repo.

sainikhileshreddy · 2022-03-19T02:06:37Z

@FayazRahman I got an error when running the unmodified hub source files.

I executed this command: pytest -x --local .

Should this be ignored?

neel2299 · 2022-03-19T16:06:55Z

@FayazRahman Thank you for lending a helping hand! Much needed... Until now I was running only the tests in core/tests. Thanks to @sainikhileshreddy's post I noticed that we had other tests. Can you please give my PR a look and tell me whether I am heading in the right direction? I will be changing the code a bit to handle some edge cases the tests point to.

My main concern is whether the code in the str method is getting too cluttered. link to the PR

neel2299 · 2022-03-19T16:21:29Z

@sainikhileshreddy I googled and found that the convention is to add the comment "# type: ignore" when the check is not mandatory. It is also written in our testing scripts. I think it would be worth checking where the error happened and whether "# type: ignore" is present there. Someone more experienced would be able to answer best, though.

sainikhileshreddy · 2022-03-19T17:48:54Z

Thanks @neel2299 for sharing this. @FayazRahman has mentioned that it is a known issue and can be ignored.

sainikhileshreddy · 2022-03-23T12:49:49Z

@davidbuniat @FayazRahman @mikayelh
I have been working on generating the output below. I will push the code for both the table and the schema soon.

Any feedback on the outputs below will help resolve the issue quickly.

Command Line Output:

Jupyter Notebook Output:

neel2299 · 2022-03-23T20:14:55Z

The formatting is really nice!!
Btw, I see that you have not included the compression type in the table either (I had trouble with that too). It is in ds.tensor.meta.sample_compression.
When a tensor is created, the compression type is stored in its meta, so it is available there.

mikayelh · 2022-03-23T21:16:21Z

wow, good job @sainikhileshreddy , pretty exciting!

sainikhileshreddy · 2022-03-29T19:43:55Z

@davidbuniat @mikayelh @FayazRahman
Below are the generated schemas for the coco-train hub dataset.

For verbose = False;

For verbose = True;

mikayelh · 2022-03-29T20:24:55Z

Can we replace the word "verbose" with "detailed"? "verbose" is slightly advanced vocabulary and might not be clear to speakers of English as a second language. :)
I defer to @istranic regarding the rest.

davidbuniat · 2022-03-29T20:28:36Z

@mikayelh verbose is commonly used for detailed logging levels, so I believe it is fine here, even though the usual context is debug logs.

@sainikhileshreddy Other than that, it looks great. For dynamic shapes, I believe the fixed dimensions and the number of dimensions are important, so in verbose mode it would be better to show Nones instead of Dynamic.

Also, instead of having a separate API, it would be great to embed this into the ds object.

sainikhileshreddy · 2022-03-29T20:30:54Z

@davidbuniat Can you suggest how to present the shape values if they differ across the images?

def tensor_info(tensor, full_shape=False):
    """Return [htype, dtype, compression, shape] for a tensor."""
    htype = tensor.htype
    dtype = tensor.dtype

    # Prefer sample compression; fall back to chunk compression (may be None).
    compression = tensor.meta.sample_compression
    if compression is None:
        compression = tensor.meta.chunk_compression

    if full_shape:
        # (number of samples, per-dimension minimum, per-dimension maximum)
        shape = (len(tensor), tensor.meta.min_shape, tensor.meta.max_shape)
    else:
        shape = "Dynamic"

    return [htype, dtype, compression, str(shape)]

sainikhileshreddy · 2022-03-29T20:34:17Z

@davidbuniat The table and the schema both use tensor_info() to retrieve information about the tensor. The next step would be to incorporate them into the hub API so that no function call is needed, except for the schema.

Does hub support nested groups?

davidbuniat · 2022-03-29T20:41:15Z

Regarding shapes, you have two options: the easy one is to just show tensor.shape, which includes Nones; the slightly more sophisticated one is to show min_shape[i]:max_shape[i] for all i, and if the two are the same, show only one integer.

Sounds great! Btw, it is much better to have this discussion on a specific PR rather than on the issue here.

Yes, it seems hub supports nested groups.
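The second shape option described above (`min_shape[i]:max_shape[i]`, collapsed to one integer when both are equal) could look roughly like this. `format_shape` is a hypothetical helper name, and it assumes `min_shape` and `max_shape` are equal-length sequences of integers, as read from `tensor.meta` in the snippet earlier in the thread.

```python
def format_shape(min_shape, max_shape):
    """Format per-dimension ranges, e.g. (200:640, 300:480, 3)."""
    if len(min_shape) != len(max_shape):
        raise ValueError("min_shape and max_shape must have the same length")
    parts = [
        # A fixed dimension is shown once; a varying one as min:max.
        str(lo) if lo == hi else f"{lo}:{hi}"
        for lo, hi in zip(min_shape, max_shape)
    ]
    return "(" + ", ".join(parts) + ")"


print(format_shape((200, 300, 3), (640, 480, 3)))  # (200:640, 300:480, 3)
```

This keeps the number of dimensions visible and makes the fixed dimensions (here the channel count) stand out, which is the information requested for verbose mode.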

sainikhileshreddy · 2022-03-29T20:49:47Z

Sure @davidbuniat. I'll integrate the code in hub and raise these doubts in that PR. Thanks for clarifying and sharing feedback!

…1439. (#1543) * done it again * attribute name * attribute name * making the changes requested * black formatting * Handling None Types * made requested changes * minor changes * black * adding tests * black * removing comments * summary test * removing unneeded import * added test for local path

farizrahman4u · 2022-04-11T17:48:11Z

Closed by #1543

davidbuniat added enhancement New feature or request good first issue Good for newcomers labels Jan 15, 2022

davidbuniat changed the title ~~[BUG] pretty prints of objects~~ [Feature] pretty prints of objects Jan 15, 2022

mikayelh assigned sainikhileshreddy Feb 13, 2022

tatevikh assigned neel2299 Mar 7, 2022

sainikhileshreddy mentioned this issue Apr 11, 2022

Added prettytable package code with licence #1589

Closed

7 tasks

farizrahman4u closed this as completed Apr 11, 2022
