# This notebook controls the creation of datasets

### The settings dictionary below controls the dataset being created. Read the comment above each setting to see what it does. Note that `truth_data_settings` currently has no effect; it will once I know how the ML algorithms validate their training.

In [1]:
settings = {
    
    # data settings control the data used in the creation of the dataset
    "data": {
        
        # "background" controls which dataset is used for the background images
        # Supported background options:
            # "Random": the dataset of random images
            # "Walls": the dataset of wall images
            # "Simple": a dataset of very simple designs
            # "Desktop_Backgrounds": a dataset of desktop backgrounds akin to the Bing daily wallpaper
            # "Landscape": dataset of nature landscapes, akin to Bing daily wallpaper
            # "Blackjack_Best": best blackjack images
            #  etc...
        "background": "1000_Comprehensive", 


        # "use_all_background_images": boolean. If set to true, we will simply place chips on each image and save them
        "use_all_background_images": True, 
        
        # "chips" controls which dataset is used for the object (chip) images
        # Supported chip options:
            # "CLeanedCroppedChips": all the circle-cropped chips (~3700)
            # "Post2020_Cropped": all the non-deprecated chips that have been introduced post-2020 (618)
            # "Post2010_Cropped": all the non-deprecated chips that have been introduced post-2010 (113)
        "chips": "CLeanedCroppedChips"
        },
        
    # Placement settings control how the chips on the dataset are placed on the images
    "placement": {
        
        # "type" is the type of dataset created
        # Supported type options:
            # "singleton": a single chip on the image, guaranteed to not be cut off by the background margins
            # "multiple": multiple chips on the image, guaranteed to not be cut off by the background margins
            # "stacked": ??? - Ask George/Sam
            # "overlaid": ??? - Ask George/Sam
        "type": "multiple",
        
        # Below, the settings for each placement type are defined. You only need to define the settings for the
        # placement type you are using, e.g., only the "singleton_settings" if the "type" is "singleton".
        # Settings defined for other types do not affect the program; they are ignored.
        
        # "singleton_settings" are settings for making a "singleton" dataset
        "singleton_settings": {
            
            # "random_rotation": Boolean for whether to rotate the chips or not
            "random_rotation": True, 
            
            # "size_range": Tuple giving the chip size as a proportion of the image's width.
            # The first value is the lower bound, the second is the upper bound. Each size is drawn from
            # a uniform distribution between these two bounds.
            "size_range": (0.05, 0.06), 
            }, 
        
        # "multiple_settings" are settings used for making a "multiple" dataset
        "multiple_settings": {
            
            # "random_rotation": Boolean for whether to rotate the chips or not
            "random_rotation": True, 
            
            # "size_range": Tuple for how big the chips are, see "singleton_settings"["size_range"]
            "size_range": (0.05, 0.06), 
            
            # "num_chips_range": Tuple of integers for how many chips are placed on each image.
            # The first value is the lower bound, the second is the upper bound.
            # The number of chips is drawn from a uniform distribution between these two bounds.
            "num_chips_range": (7, 12), 
            
            }, 
        
        # "stacked_settings": Unsupported
        "stacked_settings": None,
        
        # "overlaid_settings": Unsupported
        "overlaid_settings": None

        },
    
    # "truth_data_settings" are settings used to control what truth data is recorded
    # Supported truth_data_settings options:
        # "cartesian": records # of chips, x, y, and radius information for each chip
        # "segmentation": records segmentation data, i.e. two-color images marking chip and non-chip regions
    "truth_data_settings": "cartesian"
    
}
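
The `size_range` and `num_chips_range` comments above describe uniform draws between two bounds. Below is a minimal sketch of what that sampling could look like. `sample_chip_layout` is a hypothetical helper, not a function from `lib`, and it assumes that the size proportion refers to the chip's diameter and that both ends of `num_chips_range` are inclusive; the actual behavior of `dataset_class` may differ.

```python
import random


def sample_chip_layout(image_width, size_range, num_chips_range, rng=None):
    """Hypothetical helper: draw a chip count and one pixel diameter per chip."""
    rng = rng or random.Random()

    # Assumption: both bounds are inclusive, so (7, 12) can yield 7..12 chips.
    low_n, high_n = num_chips_range
    num_chips = rng.randint(low_n, high_n)

    # Assumption: the proportion is chip diameter relative to image width.
    low_s, high_s = size_range
    diameters = [rng.uniform(low_s, high_s) * image_width for _ in range(num_chips)]
    return num_chips, diameters


# Example with the "multiple_settings" values above and an assumed 1920 px wide image.
num_chips, diameters = sample_chip_layout(
    image_width=1920,
    size_range=(0.05, 0.06),
    num_chips_range=(7, 12),
    rng=random.Random(0),  # fixed seed so the draw is repeatable
)
print(num_chips, [round(d, 1) for d in diameters])
```

With these values every diameter falls between 96 and 115.2 pixels, which gives a quick feel for how small the chips are relative to the background.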

## In the cell below, we create our dataset object from the dataset_class in lib. We specify the name of the dataset to create and the number of images it should contain.

In [4]:
import lib
from lib.dataset_file import dataset_class

# size: number of images in the dataset you want to make
size = 1000

# folder_name: name of the folder you want the dataset to go into
# NOTE: if a folder with the same name already exists, its content is DELETED (after a y/n prompt) before data is written to it
# NOTE: this folder will be created in /Dataset
folder_name = "1000_Comprehensive_Chip_Dataset"

dataset_object = dataset_class(size=size, folder_name=folder_name, settings=settings)
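
Since a full run takes a few minutes, it can help to sanity-check the settings dictionary before creating the dataset. The sketch below is one possible pre-flight check. `check_settings` and `SUPPORTED_TYPES` are hypothetical names introduced here, not part of `lib`, and the rules only encode what the comments in the settings cell state: the "type" must be one of the listed options, and the matching `<type>_settings` entry must be defined.

```python
# Placement types listed in the settings comments.
SUPPORTED_TYPES = ("singleton", "multiple", "stacked", "overlaid")


def check_settings(settings):
    """Hypothetical pre-flight check; raises ValueError on the first problem found."""
    placement = settings["placement"]
    placement_type = placement["type"]
    if placement_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unknown placement type: {placement_type!r}")

    # Only the settings for the selected type are required; others are ignored.
    key = f"{placement_type}_settings"
    type_settings = placement.get(key)
    if type_settings is None:
        raise ValueError(f'"{key}" must be defined when "type" is "{placement_type}"')

    # Range tuples must be ordered (lower bound, upper bound).
    for range_key in ("size_range", "num_chips_range"):
        if range_key in type_settings:
            low, high = type_settings[range_key]
            if low > high:
                raise ValueError(f'"{range_key}" lower bound {low} exceeds upper bound {high}')
    return True


# Small stand-in dictionary; in the notebook you would pass `settings` instead.
example = {
    "placement": {
        "type": "multiple",
        "multiple_settings": {"size_range": (0.05, 0.06), "num_chips_range": (7, 12)},
    }
}
print(check_settings(example))
```

Because "stacked_settings" and "overlaid_settings" are `None`, selecting either of those types would be rejected by this check, which matches their "Unsupported" status.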

## The cell below creates the dataset; progress is shown by the loading bar. If you have edited any of the Python files (such as dataset_file.py), restart the kernel before running the cell below.

In [5]:
dataset_object.create_dataset()
print("Done!")

Loading Chips: 1it [00:00, 132.05it/s]


Dataset folder Dataset/1000_Comprehensive_Chip_Dataset already exists. Do you want to delete it and create a new one? (y/n)


 y


Deleted Dataset/1000_Comprehensive_Chip_Dataset
Created Dataset/1000_Comprehensive_Chip_Dataset


Creating Dataset: 1000it [02:46,  5.99it/s]


Done!
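
The output above shows the confirm-then-delete behavior for an existing dataset folder. As a rough illustration of that pattern (not the code in `dataset_file.py`, which is not shown here), the sketch below uses a hypothetical helper, `prepare_dataset_folder`, with an injectable `confirm` callable so it can be exercised without typing at a prompt.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_dataset_folder(path, confirm=input):
    """Hypothetical helper: create `path`, asking before deleting an existing folder."""
    folder = Path(path)
    if folder.exists():
        answer = confirm(
            f"Dataset folder {folder} already exists. "
            "Do you want to delete it and create a new one? (y/n) "
        )
        if answer.strip().lower() != "y":
            return False  # leave the existing folder untouched
        shutil.rmtree(folder)
    folder.mkdir(parents=True)
    return True


# Demo inside a temporary directory so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "Dataset" / "demo"
    print(prepare_dataset_folder(target))                         # folder is new
    print(prepare_dataset_folder(target, confirm=lambda _: "n"))  # user declines
```

Returning `False` when the user declines lets the caller skip dataset creation instead of writing into a folder that still holds old images.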


In [10]:
print(dataset_object.truth_data)

[]
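
The empty list is consistent with `truth_data_settings` not being wired up yet. For reference, the "cartesian" option is described as recording the number of chips plus x, y, and radius for each chip. The sketch below shows one shape such a record could take. `ChipRecord`, `cartesian_record`, the pixel-based coordinate convention, and the file name are all assumptions made for illustration; the real format has not been defined.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ChipRecord:
    x: float       # chip center, pixels from the left edge (assumed convention)
    y: float       # chip center, pixels from the top edge (assumed convention)
    radius: float  # chip radius in pixels (assumed unit)


def cartesian_record(image_name, chips):
    """Hypothetical helper: bundle the chip count and per-chip geometry for one image."""
    return {
        "image": image_name,
        "num_chips": len(chips),
        "chips": [asdict(chip) for chip in chips],
    }


# "0000.png" is a placeholder name, not a file produced by the notebook.
record = cartesian_record(
    "0000.png",
    [ChipRecord(120.0, 340.5, 48.0), ChipRecord(800.0, 95.0, 52.5)],
)
print(json.dumps(record, indent=2))
```

A plain dictionary per image keeps the truth data easy to dump to JSON, whichever validation scheme the ML side ends up using.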
