Dataset is not opening #13

Closed
kanika02 opened this issue Sep 18, 2021 · 4 comments

Comments

@kanika02

I was trying to open the .mdb dataset files, but I could not open or read them. I have MS Access, and I tried other options too. Could you help me out?

@JingyeChen
Member

Hello, the .lmdb files cannot be accessed directly.
To visualize the images in the .mdb files, you could refer to the class lmdbDataset_real in
scene-text-telescope/dataset/dataset.py

If you have other questions, feel free to leave an issue :D
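
One likely source of the confusion: LMDB stores its data in a file named data.mdb, which shares an extension with Microsoft Access databases but is an unrelated key-value format, so Access cannot read it. Below is a minimal sketch for inspecting what such a folder contains. It assumes the key layout used by the script posted later in this thread (`label-%09d`, `image_hr-%09d`, `image_lr-%09d`, numbered from 1). `sample_keys` and `list_keys` are hypothetical helper names, not part of the repository.

```python
def sample_keys(index):
    """Return the LMDB keys for a 0-based sample index (stored keys start at 1)."""
    n = index + 1
    return {
        'label': b'label-%09d' % n,
        'image_hr': b'image_hr-%09d' % n,
        'image_lr': b'image_lr-%09d' % n,
    }


def list_keys(root, limit=10):
    """Return up to `limit` raw keys from the LMDB folder at `root`."""
    import lmdb  # third-party package (pip install lmdb), imported lazily

    # `root` is the folder that contains data.mdb, not the .mdb file itself.
    env = lmdb.open(root, readonly=True, lock=False, readahead=False)
    keys = []
    with env.begin(write=False) as txn:
        # Iterating a cursor yields (key, value) pairs in key order.
        for key, _ in txn.cursor():
            if len(keys) >= limit:
                break
            keys.append(key)
    env.close()
    return keys
```

Calling `list_keys` on a dataset folder should show whether the keys follow the assumed pattern before any images are decoded.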

@kanika02
Author

I am not able to visualize the images. Can you give more information about the class lmdbDataset_real in dataset.py?

@JingyeChen
Member

JingyeChen commented Sep 22, 2021

import torch
from torch.utils.data import Dataset
from torch.utils.data import sampler
import torchvision.transforms as transforms
import lmdb
import six
import sys
import bisect
import warnings
from PIL import Image
from IPython.display import display  # display() below assumes a Jupyter/IPython session
import numpy as np
import string

def buf2PIL(txn, key, type='RGB'):
    imgbuf = txn.get(key)
    buf = six.BytesIO()
    buf.write(imgbuf)
    buf.seek(0)
    im = Image.open(buf).convert(type)
    return im

class lmdbDataset(Dataset):
    def __init__(self, root=None, voc_type='upper', max_len=100, test=False):
        super(lmdbDataset, self).__init__()
        self.env = lmdb.open(
            root,
            max_readers=1,
            readonly=True,
            lock=False,
            readahead=False,
            meminit=False)

        if not self.env:
            print('cannot open lmdb from %s' % (root))
            sys.exit(0)

        with self.env.begin(write=False) as txn:
            nSamples = int(txn.get(b'num-samples'))
            self.nSamples = nSamples
        self.voc_type = voc_type
        self.max_len = max_len
        self.test = test

    def __len__(self):
        return self.nSamples

    def __getitem__(self, index):
        # Valid 0-based indices are 0 .. len(self) - 1
        assert 0 <= index < len(self), 'index range error'
        index += 1  # keys in the LMDB are numbered from 1
        txn = self.env.begin(write=False)
        label_key = b'label-%09d' % index
        word = str(txn.get(label_key).decode())
        img_HR_key = b'image_hr-%09d' % index  # 128*32
        img_lr_key = b'image_lr-%09d' % index  # 64*16
        try:
            img_HR = buf2PIL(txn, img_HR_key, 'RGB')
            img_lr = buf2PIL(txn, img_lr_key, 'RGB')
        except IOError:
            # Unreadable image: fall back to the next sample, wrapping at the end
            return self[index % len(self)]
        if len(word) > self.max_len:
            # Label too long: fall back to the next sample as well
            return self[index % len(self)]
        # label_str = str_filt(word, self.voc_type)
        label_str = word
        return img_HR, img_lr, label_str

# Visualize the dataset
def check_dataset(root, index=None):
    dataset = lmdbDataset(root)
    length = len(dataset)
    if index is None:
        for i in range(length):
            image_hr, image_lr, label = dataset[i]

            image_hr = image_hr.resize((128,32))
            image_lr = image_lr.resize((128,32))

            print('****')
            display(image_hr)
            display(image_lr)
            print(label)
    else:
        image_hr, image_lr, label = dataset[index]
        display(image_hr)

if __name__ == '__main__':
    check_dataset('./FudanOCR/scene-text-telescope/dataset/mydata/test/easy',467)

Hello, you can use this script to visualize the images in the .mdb files.
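
The script above relies on `display`, which is only available in a Jupyter/IPython session. For use from a plain terminal, a minimal sketch that writes one HR/LR pair to PNG files could look like the following. It assumes the same key layout as the script, and `output_paths` and `export_sample` are hypothetical names introduced here for illustration.

```python
import io
import os


def output_paths(out_dir, index):
    """Build file names for the HR and LR images of a 0-based sample index."""
    stem = 'sample_%09d' % (index + 1)
    return (os.path.join(out_dir, stem + '_hr.png'),
            os.path.join(out_dir, stem + '_lr.png'))


def export_sample(root, index, out_dir='.'):
    """Save one HR/LR pair from the LMDB folder `root` as PNGs; return the label."""
    import lmdb  # third-party package (pip install lmdb), imported lazily
    from PIL import Image  # third-party package (pip install pillow)

    n = index + 1  # stored keys are assumed to start at 1
    env = lmdb.open(root, readonly=True, lock=False, readahead=False)
    with env.begin(write=False) as txn:
        # With the default buffers=False, get() returns bytes copies that
        # remain valid after the transaction ends.
        label = txn.get(b'label-%09d' % n).decode()
        hr_buf = txn.get(b'image_hr-%09d' % n)
        lr_buf = txn.get(b'image_lr-%09d' % n)
    env.close()

    os.makedirs(out_dir, exist_ok=True)
    hr_path, lr_path = output_paths(out_dir, index)
    Image.open(io.BytesIO(hr_buf)).convert('RGB').save(hr_path)
    Image.open(io.BytesIO(lr_buf)).convert('RGB').save(lr_path)
    return label
```

The saved files can then be opened with any image viewer, and the returned label printed alongside them.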

@kanika02
Author

Thank you so much, it was really helpful.
