# Dataset Builder

This notebook downloads pet images and uses them to build one dataset per pet.

## Get different pet images
The first step is to download pet images from an online API.
Set the number of images to collect for each pet:

In [1]:
number_of_images = 100

We will now collect the requested number of images for the following pets:
* cat
* dog
* fox
* bird
* panda

In [2]:
import requests
import shutil
import os
import json
from tqdm import tqdm
import random

In [3]:
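def save_image(url, path):
    # Stream the image to disk; raise on HTTP errors instead of saving an error page.
    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()
    # Undo gzip/deflate transfer encoding so the saved bytes are the actual image.
    response.raw.decode_content = True
    with open(path, "wb") as out_file:
        shutil.copyfileobj(response.raw, out_file)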


def create_directory_if_missing(path):
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(path, exist_ok=True)


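def get_pet_images(pet_name, number_of_images=number_of_images):
    # The default value is bound when this cell runs; pass number_of_images explicitly if it changes later.
    path = f"../pets/{pet_name}"
    create_directory_if_missing(path)
    url = f"https://some-random-api.ml/animal/{pet_name}"
    for i in tqdm(range(number_of_images)):
        # Each request returns JSON whose "image" field holds the image URL.
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        save_image(response.json()["image"], f"{path}/{pet_name}_{i}.jpg")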


In [4]:
pets = ["cat", "dog", "fox", "bird", "panda"]

for pet in pets:
    print(f"Gathering {number_of_images} {pet} Images")
    get_pet_images(pet, number_of_images)


Gathering 100 cat Images


100%|█████████████████████████████████████████████████| 100/100 [05:37<00:00,  3.37s/it]


Gathering 100 dog Images


100%|█████████████████████████████████████████████████| 100/100 [02:48<00:00,  1.68s/it]


Gathering 100 fox Images


100%|█████████████████████████████████████████████████| 100/100 [00:24<00:00,  4.15it/s]


Gathering 100 bird Images


100%|█████████████████████████████████████████████████| 100/100 [00:35<00:00,  2.85it/s]


Gathering 100 panda Images


100%|█████████████████████████████████████████████████| 100/100 [00:28<00:00,  3.52it/s]
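The download loop in `get_pet_images` requests the same URL on every iteration, so it presumably relies on the API returning a different image each time. If the API can repeat itself, some of the saved files may be duplicates. The following is a minimal sketch of one way to clean that up, assuming duplicates are byte-identical files; `remove_duplicate_images` is a hypothetical helper that is not part of the original notebook.

```python
import hashlib
import os


def remove_duplicate_images(directory):
    """Delete files whose bytes match an earlier file; return the names removed."""
    seen_digests = set()
    removed = []
    # Sort the names so the choice of which copy is kept is deterministic.
    for name in sorted(os.listdir(directory)):
        file_path = os.path.join(directory, name)
        if not os.path.isfile(file_path):
            continue
        with open(file_path, "rb") as image_file:
            digest = hashlib.sha256(image_file.read()).hexdigest()
        if digest in seen_digests:
            os.remove(file_path)
            removed.append(name)
        else:
            seen_digests.add(digest)
    return removed
```

Calling `remove_duplicate_images("../pets/cat")` would leave fewer than `number_of_images` files in that folder if any duplicates were found, so the class sizes in the datasets built from it may no longer be equal.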


## Building the datasets
For each pet we build a two-class dataset: one folder with all images of the favorite pet, and an `others` folder with a random sample of `number_of_images` images drawn from the remaining pets.

In [5]:
def build_dataset(favorite_pet):
    path = f"../datasets/dataset_{favorite_pet}"
    create_directory_if_missing(path)
    # copytree requires that the destination does not exist yet,
    # so delete ../datasets before re-running this cell.
    shutil.copytree(f"../pets/{favorite_pet}", f"{path}/{favorite_pet}")
    create_directory_if_missing(f"{path}/others")
    # Pool the pictures of every pet except the favorite one.
    other_pets = [pet for pet in pets if pet != favorite_pet]
    other_pet_pictures = []
    for pet in other_pets:
        other_pet_pictures += [
            f"../pets/{pet}/{pic}" for pic in os.listdir(f"../pets/{pet}")
        ]
    # Sample number_of_images pictures without replacement for the "others" class.
    random_pictures = random.sample(other_pet_pictures, number_of_images)
    for pic in random_pictures:
        shutil.copy2(pic, f"{path}/others/{os.path.basename(pic)}")
    print(f"Compiled {favorite_pet} dataset")


In [6]:
for pet in pets:
    build_dataset(pet)


Compiled cat dataset
Compiled dog dataset
Compiled fox dataset
Compiled bird dataset
Compiled panda dataset
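
Each dataset directory should now contain two class folders: one named after the favorite pet and one named `others`. As a quick sanity check, the sketch below counts the files in each class folder. It assumes the directory layout produced by `build_dataset`; `count_images_per_class` is a hypothetical helper that is not part of the original notebook.

```python
import os


def count_images_per_class(dataset_path):
    """Map each class subfolder of dataset_path to the number of files in it."""
    counts = {}
    for class_name in sorted(os.listdir(dataset_path)):
        class_path = os.path.join(dataset_path, class_name)
        # Only subfolders are treated as classes; stray files are ignored.
        if os.path.isdir(class_path):
            counts[class_name] = len(os.listdir(class_path))
    return counts
```

For example, `count_images_per_class("../datasets/dataset_cat")` should report the same count for `cat` and `others` if every download succeeded and the cells above were run once.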
