# Dataset Explorer

To work with the DiffusionDB and ImageNet datasets, both of which are hosted on Hugging Face, follow these steps:

1. Install the requirements listed in `requirements.txt`.
2. Create a Hugging Face account, or log in to your existing one.
3. Generate an access token in your account settings.
4. Run `huggingface-cli login` in your terminal.
5. Paste your access token when prompted. Note that this command-line login is required to access the ImageNet dataset.
6. Accept the terms and conditions on the ImageNet dataset page on Hugging Face: https://huggingface.co/datasets/imagenet-1k.
7. (Optional) Visit the DiffusionDB dataset page on Hugging Face: https://huggingface.co/datasets/poloclub/diffusiondb.
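
Before running the cells below, it can help to confirm that a token is actually available on this machine. The following is a minimal, stdlib-only sketch. `find_hf_token` is a hypothetical helper, not part of any library, and its lookup order (the `HF_TOKEN` environment variable, then `HF_TOKEN_PATH`, then `<HF_HOME>/token` with `HF_HOME` defaulting to `~/.cache/huggingface`) is our assumption about `huggingface_hub`'s defaults. For an authoritative check, `huggingface_hub.whoami()` asks the Hub which account the stored token belongs to.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a Hugging Face token if one appears to be configured, else None."""
    # 1. An explicit HF_TOKEN environment variable takes precedence.
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token

    # 2. Otherwise look for the token file written by `huggingface-cli login`.
    #    Assumed default location: $HF_HOME/token, where HF_HOME falls back to
    #    $XDG_CACHE_HOME/huggingface or ~/.cache/huggingface.
    if env.get("HF_TOKEN_PATH"):
        token_path = Path(env["HF_TOKEN_PATH"])
    else:
        cache_home = Path(env.get("XDG_CACHE_HOME", "~/.cache"))
        hf_home = Path(env.get("HF_HOME", cache_home / "huggingface"))
        token_path = hf_home / "token"

    try:
        token = token_path.expanduser().read_text().strip()
    except OSError:
        # Missing or unreadable file: treat it as "not logged in".
        return None
    return token or None


if __name__ == "__main__":
    if find_hf_token():
        print("Token found")
    else:
        print("No token found; run huggingface-cli login")
```

Passing a plain dictionary as `env` makes the lookup easy to exercise without touching your real environment.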


In [1]:
import datasets
from tqdm import tqdm

# Optional: uncomment to load the dataset builders and check that the setup is correct
# diffusion_db_builder = datasets.load_dataset_builder("poloclub/diffusiondb", "2m_all")
# image_net_builder = datasets.load_dataset_builder("imagenet-1k")

In [2]:
# Open both datasets in streaming mode, so examples are fetched lazily instead of downloading the full datasets
image_net_dataset = datasets.load_dataset("imagenet-1k", streaming=True)
diffusion_db_dataset = datasets.load_dataset("poloclub/diffusiondb", "2m_all", streaming=True)

In [3]:
# Download the first 10 images from the ImageNet dataset
image_net_images = []
for i, example in tqdm(enumerate(image_net_dataset['train'])):
    if i == 10:
        break
    image_net_images.append(example)

# Download the first 10 images from the DiffusionDB dataset
diffusion_db_images = []
for i, example in tqdm(enumerate(diffusion_db_dataset['train'])):
    if i == 10:
        break
    diffusion_db_images.append(example)

# Print the number of images downloaded for each dataset
print(f"Number of images downloaded from ImageNet: {len(image_net_images)}")
print(f"Number of images downloaded from DiffusionDB: {len(diffusion_db_images)}")

10it [00:01,  9.17it/s]
10it [00:34,  3.40s/it]

Number of images downloaded from ImageNet: 10
Number of images downloaded from DiffusionDB: 10
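
The loop in the cell above collects the first 10 examples with `enumerate` and `break`. The same pattern can be written with `itertools.islice`, which stops pulling from the iterator once it has `n` items. This is a sketch: `take_first` is a hypothetical helper, and `fake_stream` is an infinite generator standing in for a remote split so the example runs offline. The `datasets` library also provides `IterableDataset.take(n)`, which returns a new streaming dataset limited to `n` examples.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def take_first(examples: Iterable[T], n: int) -> List[T]:
    """Collect the first n items from any iterable, e.g. a streaming split."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # islice stops requesting items from the underlying iterator after n.
    return list(islice(examples, n))


def fake_stream():
    # Hypothetical stand-in for a remote stream: yields labels forever.
    i = 0
    while True:
        yield {"label": i}
        i += 1


first_three = take_first(fake_stream(), 3)
print(first_three)  # [{'label': 0}, {'label': 1}, {'label': 2}]
```

With a streaming split, the call would look like `image_net_images = take_first(image_net_dataset['train'], 10)`. The underlying reader may still fetch some data beyond the tenth example, since streamed files are read in chunks.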

In [4]:
image_net_images

[{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=817x363>,
  'label': 726},
 {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x491>,
  'label': 917},
 {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x375>,
  'label': 13},
 {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x375>,
  'label': 939},
 {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x375>,
  'label': 6},
 {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x333>,
  'label': 983},
 {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=221x500>,
  'label': 655},
 {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x333>,
  'label': 579},
 {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=288x173>,
  'label': 702},
 {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x370>,
  'label': 845}]
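
Each ImageNet example is a dictionary holding a PIL image under `'image'` and an integer class index under `'label'`. For larger samples, a compact summary is easier to read than the raw list. The sketch below only assumes each image exposes a PIL-style `.size` tuple of `(width, height)`. `summarize_examples` and `_FakeImage` are hypothetical names, and the stub images reuse sizes and labels from the output above so the example runs without downloading anything.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_examples(examples: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize records shaped like {'image': <PIL image>, 'label': int}."""
    sizes = []
    labels = Counter()
    for example in examples:
        # PIL images report .size as (width, height).
        sizes.append(tuple(example["image"].size))
        labels[example["label"]] += 1
    if not sizes:
        return {"count": 0, "width_range": None, "height_range": None, "labels": {}}
    widths = [w for w, _ in sizes]
    heights = [h for _, h in sizes]
    return {
        "count": len(sizes),
        "width_range": (min(widths), max(widths)),
        "height_range": (min(heights), max(heights)),
        "labels": dict(labels),
    }


class _FakeImage:
    # Minimal stand-in for a PIL image; only .size is used.
    def __init__(self, width: int, height: int) -> None:
        self.size = (width, height)


demo = [
    {"image": _FakeImage(817, 363), "label": 726},
    {"image": _FakeImage(500, 491), "label": 917},
    {"image": _FakeImage(221, 500), "label": 655},
]
summary = summarize_examples(demo)
print(summary)
# {'count': 3, 'width_range': (221, 817), 'height_range': (363, 500), 'labels': {726: 1, 917: 1, 655: 1}}
```

On the real data, `summarize_examples(image_net_images)` returns the same structure. If the streaming split exposes its `features`, the `ClassLabel` method `int2str` may turn the integer labels into class names, e.g. `image_net_dataset['train'].features['label'].int2str(726)`; verify this against your installed `datasets` version.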