# Images Transpose

Some early datasets store the `images` sub-dataset in channels-first shape `(N, 3, H, W)`, but we want it channels-last: `(N, H, W, 3)`.

This notebook transposes `images` and rewrites the dataset file in place.
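As a quick sanity check of the axis order, here is a minimal sketch on a toy array (the array and its sizes are made up for illustration, not taken from the real dataset). It shows that `transpose(0, 2, 3, 1)` moves the channel axis to the end while keeping every pixel in place, whereas `reshape` to the same shape would scramble the data.

```python
import numpy as np

# Hypothetical toy batch: 2 images, 3 channels, 4x5 pixels, channels-first.
chw = np.arange(2 * 3 * 4 * 5, dtype=np.uint8).reshape(2, 3, 4, 5)

# Move the channel axis to the end: (N, 3, H, W) -> (N, H, W, 3).
hwc = chw.transpose(0, 2, 3, 1)

# Each pixel keeps its value: hwc[n, h, w, c] == chw[n, c, h, w].
assert hwc.shape == (2, 4, 5, 3)
assert hwc[1, 2, 3, 0] == chw[1, 0, 2, 3]

# reshape produces the right shape but the wrong pixel layout.
wrong = chw.reshape(2, 4, 5, 3)
assert wrong.shape == hwc.shape
assert not np.array_equal(wrong, hwc)
```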

In [65]:
import h5py

## Define the `view_dataset` function

In [66]:
def _print_hdf5(name, obj):
    indent = "  " * name.count("/")
    if isinstance(obj, h5py.Dataset):
        print(f"{indent}[Dataset] {name} shape={obj.shape} dtype={obj.dtype}")
    elif isinstance(obj, h5py.Group):
        print(f"{indent}[Group]   {name}")

def view_dataset(dataset_path):
    with h5py.File(dataset_path, "r") as f:
        f.visititems(_print_hdf5)

## Set parameters and load the dataset

In [67]:
# Path
dataset_path = "/Users/zeyuxie/Desktop/Ra/datasets/Ra_128_indexed_binned.h5"

# Load dataset
with h5py.File(dataset_path, "r") as f:
    images = f["images"][:]
view_dataset(dataset_path)

[Dataset] images shape=(300, 3, 128, 128) dtype=uint8
[Dataset] index_train shape=(150,) dtype=int64
[Dataset] index_valid shape=(150,) dtype=int64
[Dataset] labels shape=(300,) dtype=float64
[Dataset] types shape=(300,) dtype=int32


## Transpose the `images` sub-dataset and edit the dataset

In [68]:
# Move the channel axis to the end: (N, 3, H, W) -> (N, H, W, 3)
images = images.transpose(0, 2, 3, 1)

# Edit dataset
with h5py.File(dataset_path, "r+") as f:
    del f["images"]
    f.create_dataset("images", data=images)
view_dataset(dataset_path)

[Dataset] images shape=(300, 128, 128, 3) dtype=uint8
[Dataset] index_train shape=(150,) dtype=int64
[Dataset] index_valid shape=(150,) dtype=int64
[Dataset] labels shape=(300,) dtype=float64
[Dataset] types shape=(300,) dtype=int32
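Running the transpose cell a second time would permute the axes again and leave `images` in the wrong shape. One possible guard, sketched below, checks the stored shape before rewriting. The function name `transpose_images_if_needed`, the `key` parameter, and the rule "a last axis of size 3 means already channels-last" are assumptions for illustration, not part of the original notebook. The demo runs on a throwaway file with random data.

```python
import os
import tempfile

import h5py
import numpy as np


def transpose_images_if_needed(dataset_path, key="images"):
    """Convert f[key] from (N, 3, H, W) to (N, H, W, 3).

    Returns True if the dataset was rewritten, False if it was left alone.
    """
    with h5py.File(dataset_path, "r+") as f:
        shape = f[key].shape
        # Assumption: a 4-D dataset whose last axis is 3 is already channels-last.
        if len(shape) != 4 or shape[-1] == 3 or shape[1] != 3:
            return False
        images = f[key][:].transpose(0, 2, 3, 1)
        del f[key]
        f.create_dataset(key, data=images)
        return True


# Demo on a temporary file (hypothetical data, not the real dataset).
tmp_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
original = np.random.randint(0, 256, size=(4, 3, 8, 8), dtype=np.uint8)
with h5py.File(tmp_path, "w") as f:
    f.create_dataset("images", data=original)

first = transpose_images_if_needed(tmp_path)   # rewrites the dataset
second = transpose_images_if_needed(tmp_path)  # no-op on the second run

with h5py.File(tmp_path, "r") as f:
    result = f["images"][:]
print(first, second, result.shape)
```

Note that, as in the cell above, the recreated dataset does not inherit attributes or compression settings from the deleted one, so copy those over separately if the original dataset has any.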
