# Filtering and Organizing Selected ImageNet Classes

This notebook processes and organizes a selection of classes from the ImageNet dataset, saving images from training, validation, and test splits into class-specific folders. This setup facilitates efficient access to targeted data for analysis or further processing.

---

## Library Imports and Authentication

First, we import the necessary libraries and authenticate with Hugging Face to enable dataset access.

In [1]:
# Import necessary libraries and login to Hugging Face
from huggingface_hub import login
from datasets import load_dataset
import os

# Login to Hugging Face to access datasets.
# Read the token from the HF_TOKEN environment variable instead of hard-coding it.
login(token=os.environ["HF_TOKEN"])

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to C:\Users\User\.cache\huggingface\token
Login successful
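
An access token written directly into a notebook cell is exposed to anyone who can read the file. The following is a minimal sketch of loading it from the environment instead. The variable name `HF_TOKEN` and the helper `read_hf_token` are assumptions made for illustration, not part of the original notebook.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in `env_var`, or raise if it is unset."""
    # Hypothetical helper: strips surrounding whitespace and rejects empty values
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return token


# Usage (requires huggingface_hub and a valid token in the environment):
# from huggingface_hub import login
# login(token=read_hf_token())
```

Failing early with a clear message avoids a less obvious authentication error later, when the dataset download starts.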


---

## Specifying Target Classes and Setting Up Directories

Next, we specify the target classes by their ImageNet class IDs and create a root output directory. Images for each class are later saved into a dedicated subfolder of that directory.

In [2]:
# Define the target classes and their corresponding ImageNet class IDs
target_classes = {
    "butterfly": 321,
    "panda": 388,
    "parrot": 88,
    "pomeranian": 259,
    "goldfish": 1,
    "elephant": 101,
    "monkey": 381,
    "persian_cat": 283,
    "penguin": 145,
    "red_panda": 387,
}

# Create a root directory to save class-specific folders
output_dir = "imagenet_selected_raw_classes"
os.makedirs(output_dir, exist_ok=True)
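
Because the processing loop needs to go from a label back to a class name, it can help to invert the mapping once up front. The sketch below assumes the same `target_classes` dictionary as the cell above. The helper `invert_mapping` is a hypothetical name introduced here, and it also guards against two names accidentally sharing one ID.

```python
# Same name-to-ID mapping as in the cell above
target_classes = {
    "butterfly": 321,
    "panda": 388,
    "parrot": 88,
    "pomeranian": 259,
    "goldfish": 1,
    "elephant": 101,
    "monkey": 381,
    "persian_cat": 283,
    "penguin": 145,
    "red_panda": 387,
}


def invert_mapping(name_to_id: dict) -> dict:
    """Return an ID-to-name dict, raising if two names share one ID."""
    id_to_name = {}
    for name, class_id in name_to_id.items():
        if class_id in id_to_name:
            raise ValueError(
                f"ID {class_id} is used by both {id_to_name[class_id]!r} and {name!r}"
            )
        id_to_name[class_id] = name
    return id_to_name


# Constant-time lookup from label to folder name
id_to_name = invert_mapping(target_classes)
```

With this dictionary, `id_to_name.get(label)` returns either the class name or `None`, which replaces both the membership test and the reverse search.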

---

## Loading Dataset Splits and Saving Images by Class

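In this section, we iterate over each split (train, validation, test). The code loops through every item, checks whether its label is one of the target class IDs, and saves matching images into the corresponding class folder. Note that on the Hugging Face Hub the test split's labels are not public (the dataset card lists them as -1), so in practice only train and validation images are expected to match.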

In [3]:
# Define splits to include all images from train, validation, and test
splits = ["train", "validation", "test"]

for split in splits:
    
    # Load the specific split of the ImageNet dataset
    dataset = load_dataset("imagenet-1k", split=split, trust_remote_code=True)

    # Loop through each item in the dataset
    for index, item in enumerate(dataset):
        
        # Print progress every 1000 iterations
        if index % 1000 == 0:
            print(f"Processing image {index} from the {split} split...")

        class_id = item['label']

        # Check if the class ID is one we want to keep
        if class_id in target_classes.values():
            
            # Get the class name based on the class ID (avoids shadowing the built-in `id`)
            class_name = next(name for name, cid in target_classes.items() if cid == class_id)

            # Create a folder for this class if it doesn't exist
            class_folder = os.path.join(output_dir, class_name)
            os.makedirs(class_folder, exist_ok=True)

            # Load the image and save it to the appropriate folder
            img = item['image']  # PIL Image object
            
            # Save each image with a unique filename to avoid overwrites
            image_filename = os.path.join(class_folder, f"{split}_{index}.jpg")
            img.save(image_filename)

Processing image 0 from the train split...
Processing image 1000 from the train split...
Processing image 2000 from the train split...
[...]
Processing image 888000 from the train split...
[...]
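
The per-item decision made in the loop above (keep or skip, and where to write the file) can be isolated into a small pure function, which makes it easy to check without downloading the dataset. This is a sketch under stated assumptions: `destination_for` and `example_ids` are hypothetical names, the filename pattern mirrors the `{split}_{index}.jpg` scheme used above, and unlabeled items are assumed to carry the label -1.

```python
import os
from typing import Optional


def destination_for(label: int, split: str, index: int,
                    id_to_name: dict, output_dir: str) -> Optional[str]:
    """Return the output path for an item, or None if its label is not selected."""
    class_name = id_to_name.get(label)
    if class_name is None:
        # Label is not a target class (this includes -1 for unlabeled items)
        return None
    return os.path.join(output_dir, class_name, f"{split}_{index}.jpg")


# Tiny illustrative ID-to-name mapping (a subset of the classes above)
example_ids = {1: "goldfish", 388: "panda"}

kept = destination_for(388, "train", 42, example_ids, "imagenet_selected_raw_classes")
skipped = destination_for(-1, "test", 7, example_ids, "imagenet_selected_raw_classes")
```

Here `kept` is a path inside the `panda` folder and `skipped` is `None`, so the caller only needs to create the folder and save the image when a path is returned.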

---

## Completion

Upon completion, the selected images are saved in the `imagenet_selected_raw_classes` directory, with one folder per class and the source split recorded in each filename (`{split}_{index}.jpg`). This layout can be used directly for further analysis or model training.

In [4]:
print("All images from train, validation, and test splits have been saved to class-specific folders.")

All images from train, validation, and test splits have been saved to class-specific folders.
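
To confirm what was actually written, the output folders can be tallied by class and split. The following is a minimal sketch that assumes the `<root>/<class>/<split>_<index>.jpg` layout produced above. The function name `count_saved_images` is hypothetical.

```python
import os
from collections import Counter


def count_saved_images(root: str) -> Counter:
    """Count saved files per (class_name, split) under `root`."""
    counts = Counter()
    if not os.path.isdir(root):
        return counts
    for class_name in sorted(os.listdir(root)):
        class_folder = os.path.join(root, class_name)
        if not os.path.isdir(class_folder):
            continue
        for filename in os.listdir(class_folder):
            stem, ext = os.path.splitext(filename)
            # Only count files that follow the <split>_<index>.jpg pattern
            if ext.lower() != ".jpg" or "_" not in stem:
                continue
            split = stem.rsplit("_", 1)[0]
            counts[(class_name, split)] += 1
    return counts


# Print one line per (class, split) pair found on disk
for (class_name, split), n in sorted(count_saved_images("imagenet_selected_raw_classes").items()):
    print(f"{class_name:12s} {split:10s} {n}")
```

A class or split that is missing from the printed tally indicates that no images matched for it, which is a quick way to spot a wrong class ID or an unlabeled split.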
