# **DomestiQ-1000 Dataset Overview**

## **Introduction**
The **DomestiQ-1000** dataset is a collection of images of household objects, intended for training and evaluating machine learning models on object recognition and classification tasks. The images were collected automatically from web image search with the `bing_image_downloader` package (see the download script below), so they vary in object appearance, background, and lighting conditions.

## **Dataset Structure**
The dataset is organized into **20 categories**, each representing a distinct household object. Images are stored in a structured directory format:

    DomestiQ-1000/
    │── 3-seater_sofa_images_clear/
    │   ├── image_1.jpg
    │   ├── image_2.jpg
    │   └── ...
    │── refrigerator_2-door_images/
    │── microwave_single_door_images/
    │── ...
    └── shoes_images/
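
One way to sanity-check a local copy against this layout is to count the image files in each category folder. The following is a minimal sketch, assuming the dataset root is a local directory named `DomestiQ-1000`; the helper name `count_images` and the extension list are illustrative assumptions, not part of the dataset.

```python
from pathlib import Path

# Extensions treated as images (an assumption; extend the set if other formats are present).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {category_folder_name: number_of_image_files} for each subfolder of root."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1
                for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


if __name__ == "__main__":
    root = Path("DomestiQ-1000")  # hypothetical local path to the dataset root
    if root.is_dir():
        for name, n in count_images(root).items():
            print(f"{name}: {n}")
```

Folders whose count is far from the expected per-category total are worth inspecting by hand.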



# DomestiQ-1000 Dataset Categories

The **DomestiQ-1000** dataset consists of **20 categories** of household objects, each containing approximately **50 images**. The dataset is structured to include variations in object appearance, background, and resolution.

## Categories:

1. **3-seater sofa images clear**  
2. **Refrigerator 2-door images**  
3. **Microwave single door images**  
4. **Kitchen chimney images**  
5. **Split AC images**  
6. **Bed images**  
7. **4-legged chair images**  
8. **Ceiling fan 3 blade images**  
9. **Dining table 6-seater images**  
10. **Ladder aluminum images**  
11. **Mixer grinder images**  
12. **Scissors images**  
13. **Wall clock simple images**  
14. **Induction cooktop images**  
15. **Washing machine front load images**  
16. **Photo frames single images**  
17. **Toaster images**  
18. **Motorcycle helmet images**  
19. **Single knife images**  
20. **Shoes images**  

Each category is stored in a separate folder under **DomestiQ-1000/**. A **metadata.csv** file records the category, source URL, filename, and resolution of each image:

| **Category**                | **URL**                      | **Filename**                     | **Resolution** |
|-----------------------------|-----------------------------|----------------------------------|---------------|
| 3-seater sofa images clear  | `https://image-url.com/1`  | `3-seater_sofa_1.jpg`           | `1200x800`    |
| Refrigerator 2-door images  | `https://image-url.com/2`  | `refrigerator_2-door_1.jpg`     | `1024x768`    |
| Microwave single door images| `https://image-url.com/3`  | `microwave_single_door_1.jpg`   | `800x600`     |
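
A metadata file with these four columns can be summarized with the standard `csv` module. The sketch below is illustrative rather than part of the dataset tooling: the function name `summarize_metadata` is an assumption, and the sample rows are the example rows from the table above.

```python
import csv
import io
from collections import Counter


def summarize_metadata(csv_file):
    """Return (images per category, set of (width, height)) from a metadata CSV file object."""
    per_category = Counter()
    resolutions = set()
    for row in csv.DictReader(csv_file):
        per_category[row["Category"]] += 1
        # Resolution is stored as "<width>x<height>", e.g. "1200x800".
        width, height = row["Resolution"].lower().split("x")
        resolutions.add((int(width), int(height)))
    return per_category, resolutions


# Example rows copied from the table above.
sample = io.StringIO(
    "Category,URL,Filename,Resolution\n"
    "3-seater sofa images clear,https://image-url.com/1,3-seater_sofa_1.jpg,1200x800\n"
    "Refrigerator 2-door images,https://image-url.com/2,refrigerator_2-door_1.jpg,1024x768\n"
)
counts, sizes = summarize_metadata(sample)
print(counts["3-seater sofa images clear"])  # 1
print(sorted(sizes))  # [(1024, 768), (1200, 800)]
```

For the real file, pass an open file handle instead, for example `open("metadata.csv", newline="", encoding="utf-8")`.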

## **Dataset Size**
- **Total Images:** **~1,000**
- **Images per Category:** **~50**
- **Image Resolutions:** Ranging from **800x600** to **1920x1080**
- **File Format:** JPEG and PNG

## **Notes**
- Images were downloaded automatically with the `bing_image_downloader` package, which queries Bing image search.
- Some images may require preprocessing (resizing, background removal, normalization) before model training.
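
Resizing typically starts by choosing a target size that preserves each image's aspect ratio. The sketch below is dependency-free and only computes that size; the function name `fit_within` and the 224-pixel default are illustrative assumptions, not requirements of the dataset. The resulting tuple can then be passed to Pillow's `Image.resize`.

```python
def fit_within(width, height, max_side=224):
    """Scale (width, height) so the longer side equals max_side, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = max_side / max(width, height)
    # Round to whole pixels and never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1200, 800))  # (224, 149)
print(fit_within(800, 600))   # (224, 168)
```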


In [None]:
from bing_image_downloader import downloader
from contextlib import redirect_stdout
import os
import csv
import re
from PIL import Image

categories = [
    "3-seater sofa images clear", "refrigerator 2-door images", "microwave single door images",
    "kitchen chimney images", "split ac images", "bed images", "4-legged chair images",
    "ceiling fan 3 blade images", "dining table 6-seater images", "ladder aluminum images",
    "mixer grinder images", "scissors images", "wall clock simple images", "induction cooktop images",
    "washing machine front load images", "photo frames single images", "toaster images",
    "motorcycle helmet images", "single knife images", "shoes images"
]

def download_images(category, num_images):
    """Download up to num_images images for one category.

    Returns the path of the captured download log and the image folder.
    """
    # bing_image_downloader saves files under <output_dir>/<query>/.
    output_dir = f'DomestiQ-1000/{category}'
    os.makedirs(output_dir, exist_ok=True)

    # Capture the downloader's console output so the source URLs can be
    # parsed later; redirect_stdout restores sys.stdout even if the download fails.
    log_file = f"{output_dir}/download_log.txt"
    with open(log_file, "w") as log, redirect_stdout(log):
        downloader.download(category, limit=num_images, output_dir='DomestiQ-1000',
                            adult_filter_off=True, force_replace=False, timeout=60)

    return log_file, output_dir


In [None]:
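def extract_urls(log_file):
    """Return the source URLs recorded in a captured download log."""
    urls = []
    with open(log_file, "r") as log:
        for line in log:
            # Lines of interest look like "Downloading Image #<n> from <url>".
            match = re.search(r"Downloading Image #\d+ from (.+)", line)
            if match:
                urls.append(match.group(1).strip())
    return urls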
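
To see what the pattern in `extract_urls` captures, the same regular expression can be run over a few in-memory lines. This is a sketch only: `extract_urls_from_lines` is a hypothetical helper, and the sample lines are synthetic, written to fit the pattern rather than copied from real downloader output.

```python
import re

# Same pattern as in extract_urls.
URL_PATTERN = re.compile(r"Downloading Image #\d+ from (.+)")


def extract_urls_from_lines(lines):
    """Return the URL captured from every line that matches URL_PATTERN."""
    urls = []
    for line in lines:
        match = URL_PATTERN.search(line)
        if match:
            urls.append(match.group(1).strip())
    return urls


# Synthetic lines written to fit the pattern; not real downloader output.
sample_lines = [
    "Downloading Image #1 from https://image-url.com/1\n",
    "some unrelated line\n",
    "Downloading Image #2 from https://image-url.com/2\n",
]
print(extract_urls_from_lines(sample_lines))
# ['https://image-url.com/1', 'https://image-url.com/2']
```

Lines that do not match are ignored, so the number of extracted URLs can differ from the number of files actually saved to disk.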


In [None]:
with open('metadata.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Category', 'URL', 'Filename', 'Resolution']) 

    for category in categories:
        log_file, output_dir = download_images(category, 50)
        print(f"Processing logs for: {category}")
        
        image_urls = extract_urls(log_file)

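        # Skip the download log, which lives in the same folder and is not an
        # image, and sort by the numbers in each filename so the order is stable.
        image_files = sorted(
            (f for f in os.listdir(output_dir) if f != 'download_log.txt'),
            key=lambda f: [int(n) for n in re.findall(r'\d+', f)],
        )

        for index, filename in enumerate(image_files):
            file_path = os.path.join(output_dir, filename)

            try:
                with Image.open(file_path) as img:
                    resolution = f"{img.width}x{img.height}"

                # enumerate() is zero-based, so index image_urls directly.
                # The pairing is best-effort: it assumes one saved file per
                # logged URL, in download order.
                image_url = image_urls[index] if index < len(image_urls) else "URL Not Available"

                writer.writerow([category, image_url, filename, resolution])

            except Exception as e:
                print(f"Error processing {file_path}: {e}")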

print("Metadata file 'metadata.csv' created successfully.")


Processing logs for: 3-seater sofa images clear
Error processing DomestiQ-1000/3-seater sofa images clear\download_log.txt: cannot identify image file 'DomestiQ-1000/3-seater sofa images clear\\download_log.txt'
Processing logs for: refrigerator 2-door images
Error processing DomestiQ-1000/refrigerator 2-door images\download_log.txt: cannot identify image file 'DomestiQ-1000/refrigerator 2-door images\\download_log.txt'
Processing logs for: microwave single door images
Error processing DomestiQ-1000/microwave single door images\download_log.txt: cannot identify image file 'DomestiQ-1000/microwave single door images\\download_log.txt'
Processing logs for: kitchen chimney images
Error processing DomestiQ-1000/kitchen chimney images\download_log.txt: cannot identify image file 'DomestiQ-1000/kitchen chimney images\\download_log.txt'
Processing logs for: split ac images
Error processing DomestiQ-1000/split ac images\download_log.txt: cannot identify image file 'DomestiQ-1000/split ac image