Dataset is not opening #13

kanika02 · 2021-09-18T06:58:14Z

I was trying to open the .mdb dataset files, but I could not access or read them. I have MS Access, and I tried other options too. Could you help me out?

JingyeChen · 2021-09-18T14:35:07Z

Hello, the LMDB .mdb files cannot be opened directly.
To visualize the images in the .mdb files, you could refer to the class lmdbDataset_real in
scene-text-telescope/dataset/dataset.py
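
For context: the .mdb file here is an LMDB key-value data file, which shares its extension with the older MS Access format but is otherwise unrelated, so MS Access cannot read it. As a rough sketch (not code from the repository), the keys stored in such a file can be listed with the third-party `lmdb` Python package. The helper names `key_prefix` and `summarize_lmdb` are hypothetical, and `root` is assumed to be the directory that contains data.mdb.

```python
from collections import Counter


def key_prefix(key):
    """Return the part of an LMDB key before a trailing '-<number>' suffix."""
    text = key.decode('utf-8', errors='replace')
    head, sep, tail = text.rpartition('-')
    # Keys such as b'num-samples' have no numeric suffix and are kept whole.
    return head if sep and tail.isdigit() else text


def summarize_lmdb(root):
    """Count the keys per prefix in the LMDB directory `root` (hypothetical helper)."""
    # Third-party package (pip install lmdb); imported lazily so that
    # key_prefix can be used even when lmdb is not installed.
    import lmdb

    env = lmdb.open(root, readonly=True, lock=False, readahead=False)
    try:
        with env.begin(write=False) as txn:
            # Iterating a cursor yields (key, value) pairs as bytes.
            return Counter(key_prefix(key) for key, _value in txn.cursor())
    finally:
        env.close()


# Usage (the path is a placeholder for your own dataset directory):
# for prefix, count in sorted(summarize_lmdb('path/to/test/easy').items()):
#     print(prefix, count)
```

If the dataset follows the layout read by the script later in this thread, the summary should show one `num-samples` key plus `label`, `image_hr`, and `image_lr` entries.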

If you have other questions, feel free to leave an issue :D

kanika02 · 2021-09-21T09:41:36Z

I am not able to visualize the images. Could you give more information about the class lmdbDataset_real in dataset.py?

JingyeChen · 2021-09-22T02:12:20Z

import torch
from torch.utils.data import Dataset
from torch.utils.data import sampler
import torchvision.transforms as transforms
import lmdb
import six
import sys
import bisect
import warnings
from PIL import Image
from IPython.display import display  # display() is only predefined inside Jupyter/IPython
import numpy as np
import string

def buf2PIL(txn, key, type='RGB'):
    imgbuf = txn.get(key)
    buf = six.BytesIO()
    buf.write(imgbuf)
    buf.seek(0)
    im = Image.open(buf).convert(type)
    return im

class lmdbDataset(Dataset):
    def __init__(self, root=None, voc_type='upper', max_len=100, test=False):
        super(lmdbDataset, self).__init__()
        self.env = lmdb.open(
            root,
            max_readers=1,
            readonly=True,
            lock=False,
            readahead=False,
            meminit=False)

        if not self.env:
            print('cannot open lmdb from %s' % (root))
            sys.exit(0)

        with self.env.begin(write=False) as txn:
            nSamples = int(txn.get(b'num-samples'))
            self.nSamples = nSamples
        self.voc_type = voc_type
        self.max_len = max_len
        self.test = test

    def __len__(self):
        return self.nSamples

    def __getitem__(self, index):
        assert 0 <= index < len(self), 'index range error'
        index += 1
        txn = self.env.begin(write=False)
        label_key = b'label-%09d' % index
        word = str(txn.get(label_key).decode())
        img_HR_key = b'image_hr-%09d' % index  # 128*32
        img_lr_key = b'image_lr-%09d' % index  # 64*16
        try:
            img_HR = buf2PIL(txn, img_HR_key, 'RGB')
            img_lr = buf2PIL(txn, img_lr_key, 'RGB')
        except IOError:
            # Unreadable image buffer: fall back to a later sample.
            return self[index + 1]
        # label_str = str_filt(word, self.voc_type)
        label_str = word
        return img_HR, img_lr, label_str

# Visualize the dataset
def check_dataset(root, index=None):
    dataset = lmdbDataset(root)
    length = len(dataset)
    if index is None:
        for i in range(length):
            image_hr, image_lr, label = dataset[i]

            image_hr = image_hr.resize((128,32))
            image_lr = image_lr.resize((128,32))

            print('****')
            display(image_hr)
            display(image_lr)
            print(label)
    else:
        image_hr, image_lr, label = dataset[index]
        display(image_hr)

if __name__ == '__main__':
    check_dataset('./FudanOCR/scene-text-telescope/dataset/mydata/test/easy',467)

Hello, you could follow this script to visualize the images in the .mdb file.
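
Note that `display` only renders images inside a Jupyter/IPython notebook. Outside a notebook, one option is to write the images to disk instead. The following is a minimal sketch under that assumption, not part of the repository: `sample_keys`, `export_sample`, and the `lmdb_preview` output directory are hypothetical names, while the key format is the one used by `lmdbDataset.__getitem__` above.

```python
import io
import os


def sample_keys(index):
    """Map a 0-based sample index to the 1-based LMDB keys used in this thread."""
    n = index + 1
    return {
        'label': b'label-%09d' % n,
        'image_hr': b'image_hr-%09d' % n,
        'image_lr': b'image_lr-%09d' % n,
    }


def export_sample(root, index, out_dir='lmdb_preview'):
    """Save one HR/LR image pair as PNG files; return (hr_path, lr_path, label)."""
    # Third-party packages, imported lazily so sample_keys works without them.
    import lmdb
    from PIL import Image

    keys = sample_keys(index)
    os.makedirs(out_dir, exist_ok=True)
    env = lmdb.open(root, readonly=True, lock=False, readahead=False)
    try:
        with env.begin(write=False) as txn:
            label = txn.get(keys['label']).decode()
            paths = []
            for name in ('image_hr', 'image_lr'):
                # Each value is an encoded image file held in memory.
                image = Image.open(io.BytesIO(txn.get(keys[name]))).convert('RGB')
                path = os.path.join(out_dir, '%s-%09d.png' % (name, index + 1))
                image.save(path)
                paths.append(path)
    finally:
        env.close()
    return paths[0], paths[1], label


# Usage (the path is a placeholder for your own dataset directory):
# print(export_sample('path/to/test/easy', 0))
```

The saved PNG files can then be opened with any image viewer.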

kanika02 · 2021-09-22T06:08:42Z

Thank you so much, it was really helpful.

kanika02 closed this as completed Sep 27, 2021
