# Dataset Generator

This notebook contains the code that produces the dataset used to train and test the model that identifies circumferences in an area of 100x100 units.

## Constants

The constants below control the generation of the dataset.

In [201]:
NUM_IMAGES = 10 # Number of images generated for each set type
NUM_CIRC = 3 # Number of circumferences per image
RANDOMNESS = 1.0 # Amount of randomness in the points of a circumference. The higher the value, the less defined the circumference, so it is advised not to make it too large
RANGE_POINTS = (30,50) # Range for the number of points in a circumference. Each circumference gets a random number of points inside this interval
RANGE_RADIUS = (5.0, 15.0) # Range for the radius of the circunferences. Values must be float
NOISE_RATIO = 1/10 # Number of random noise points generated per circumference point. These are added at the end of the process

OUTPUT = "../dataset" # Dataset output

## Circumference generator

The first step is a function that returns the points of a circumference given the number of points `n`, the `center` (x offset and y offset) and the radius `rad`. The amount of randomness applied to the x and y coordinates comes from the `RANDOMNESS` constant.

The function places the requested number of points at random positions along the circumference and discards any point that falls outside the 100x100 area.

In [202]:
import random, os
import matplotlib.pyplot as plt
from math import pi, cos, sin, sqrt
import numpy as np

def get_circunference_points(n, center, rad):
    points = []
    for _ in range(n):
        theta = random.random() * 2 * pi # Random angle along the circumference
        # Point on the circumference plus a jitter of up to rad*RANDOMNESS/10 on each axis
        x = center[0] + cos(theta) * rad + (random.random()/10)*rad*RANDOMNESS
        y = center[1] + sin(theta) * rad + (random.random()/10)*rad*RANDOMNESS
        if 0<=x<=100 and 0<=y<=100: # We only add those points inside the valid range
            points.append((x,y))
    return points
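
As a quick sanity check of the generator, the sketch below re-implements the same sampling with the constants passed as arguments and measures how far the points fall from the center. The names `sample_ring`, `randomness` and `seed` are hypothetical and not part of the notebook.

```python
import random
from math import pi, cos, sin, hypot
from statistics import median

def sample_ring(n, center, rad, randomness=1.0, seed=None):
    """Hypothetical stand-in for get_circunference_points with explicit arguments."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        theta = rng.random() * 2 * pi
        # Same jitter as the notebook: between 0 and rad*randomness/10 on each axis
        x = center[0] + cos(theta) * rad + (rng.random() / 10) * rad * randomness
        y = center[1] + sin(theta) * rad + (rng.random() / 10) * rad * randomness
        if 0 <= x <= 100 and 0 <= y <= 100:
            points.append((x, y))
    return points

ring = sample_ring(40, center=(50.0, 50.0), rad=10.0, seed=0)
distances = [hypot(x - 50.0, y - 50.0) for x, y in ring]
print(len(ring), round(min(distances), 2), round(median(distances), 2), round(max(distances), 2))
```

Because the jitter is never negative, the cloud of points is shifted slightly toward larger x and y. Drawing the jitter from a symmetric interval instead would center it on the circumference, at the cost of changing the generated data.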

## Classes

The dataset contains two types of points: those that belong to noise and those that are part of a circumference.

To make the information easier to store, a single class represents both types. Noise is stored with null values (`None`) for the center and the circumference number.

In [203]:
class PointsSet:
    def __init__(self, points, center, circ_no):
        self.points = points
        self.center = center
        self.circ_no = circ_no

    def add_point(self, point):
        self.points.append(point)

    def get_radius(self):
        return np.median([sqrt((self.center[0]-p[0])**2 + (self.center[1]-p[1])**2) for p in self.points])

    def is_noise(self):
        return self.circ_no is None

    def unpack(self):
        if self.is_noise():
            return [[p[0], p[1], None, None, None] for p in self.points] # Same five fields as a circumference row
        else:
            return [[p[0], p[1], self.center[0], self.center[1], self.circ_no] for p in self.points]

    def __str__(self):
        if self.is_noise():
            return f"{len(self.points)} points of noise"
        else:
            return f"Circumference {self.circ_no} has {len(self.points)} points and center at {self.center}"
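
To make the row layout concrete, here is a minimal sketch of the rows `unpack` is meant to emit. `to_rows` is a hypothetical free function, not part of the class, and it assumes noise rows are padded to the same five fields as circumference rows.

```python
def to_rows(points, center=None, circ_no=None):
    """Hypothetical helper mirroring PointsSet.unpack: one row per point."""
    if circ_no is None:
        # Noise: no center and no circumference number
        return [[x, y, None, None, None] for x, y in points]
    return [[x, y, center[0], center[1], circ_no] for x, y in points]

circle_rows = to_rows([(60.0, 50.0), (50.0, 60.0)], center=(50.0, 50.0), circ_no=1)
noise_rows = to_rows([(3.5, 97.0)])
for row in circle_rows + noise_rows:
    print(row)
```

Keeping every row the same length means the rows of all the sets in one image can be concatenated and given a single list of column names.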

## Data Creation

With the previously developed code we can now create the set of circumferences used in one "image", and then add the noise. Before creating the rings, we need a set of centers and radii that satisfies the requested `set_type`.

In [240]:
def get_data(set_type):
    def sample_layout():
        # Draw random centers and radii and report whether any two circumferences
        # collide and whether any of them touches or crosses the edges of the area
        centers = [(random.uniform(0.0, 100.0), random.uniform(0.0, 100.0)) for _ in range(NUM_CIRC)]
        # random.uniform instead of random.randint, since RANGE_RADIUS holds floats and randint expects integers
        radii = [random.uniform(*RANGE_RADIUS) for _ in range(NUM_CIRC)]
        collides = any(any((centers[i][0]-centers[j][0])**2 + (centers[i][1]-centers[j][1])**2 <= (radii[i]+radii[j])**2 for i in range(j+1, NUM_CIRC)) for j in range(NUM_CIRC))
        extends = any(0>=centers[i][0]-radii[i] or radii[i]+centers[i][0]>=100 or 0>=centers[i][1]-radii[i] or radii[i]+centers[i][1]>=100 for i in range(NUM_CIRC))
        return centers, radii, collides, extends

    centers, radii, collides, extends = sample_layout()

    # Redraw the layout until it satisfies the requested set_type
    match set_type:
        case "clean":
            while collides or extends:
                centers, radii, collides, extends = sample_layout()
        case "extends":
            while collides or not extends:
                centers, radii, collides, extends = sample_layout()
        case "collission":
            while not collides:
                centers, radii, collides, extends = sample_layout()

    data = [PointsSet(get_circunference_points(random.randint(*RANGE_POINTS), centers[i], radii[i]), centers[i], i+1) for i in range(NUM_CIRC)]
    '''
    for i in range(NUM_CIRC):
        circ_no = i+1 # Circumferences are numbered from 1, noise has no number (None)
        center = centers[i]
        rad = radii[i]
        n = random.randint(*RANGE_POINTS)

        points = get_circunference_points(n, center, rad)
        circunference = PointsSet(points, center, circ_no)
        data.append(circunference)
    '''
    
    # Add noise
    n = int(sum([len(c.points) for c in data]) * (NOISE_RATIO)) # Total number of points so far in the image * NOISE_RATIO. This gives us the amount of noise to include
    points = [(random.uniform(0.0, 100.0), random.uniform(0.0, 100.0)) for _ in range(n)]
    noise = PointsSet(points, None, None)

    data.append(noise)
    return data
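
The two rejection criteria used by `get_data` can be read in isolation. The sketch below restates them as two small functions. `circles_collide` and `circle_extends` are hypothetical names introduced for illustration, and the `size` parameter assumes the 100x100 area used throughout the notebook.

```python
def circles_collide(c1, r1, c2, r2):
    """True when the two circles touch or overlap (squared distances avoid a sqrt)."""
    dx, dy = c1[0] - c2[0], c1[1] - c2[1]
    return dx * dx + dy * dy <= (r1 + r2) ** 2

def circle_extends(c, r, size=100.0):
    """True when the circle touches or crosses the border of the size x size area."""
    return c[0] - r <= 0 or c[0] + r >= size or c[1] - r <= 0 or c[1] + r >= size

print(circles_collide((20.0, 20.0), 10.0, (35.0, 20.0), 6.0))  # True: centers 15 apart, radii sum to 16
print(circles_collide((20.0, 20.0), 10.0, (50.0, 60.0), 6.0))  # False: centers 50 apart, radii sum to 16
print(circle_extends((95.0, 50.0), 10.0))                      # True: right edge reaches x = 105
print(circle_extends((50.0, 50.0), 10.0))                      # False: fits inside the area
```

Note that under this criterion a circle fully contained in another one also counts as a collision, since the distance between the centers is then smaller than the sum of the radii.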


## Dataset Generation and Storing

Now that we have the tools to create one "image", we can create an entire dataset and save it as a collection of csv files, classified by the type of circumferences generated:

- Circumferences with no collisions and not extending over the edges
- Circumferences extending over the edges
- Circumferences with collisions (may also extend over the edges)

These files can later be used for machine learning and for plotting results.

To store the data we will use the pandas library.

In [241]:
import pandas as pd

dataset_clean = [get_data(set_type="clean") for _ in range(NUM_IMAGES)]
dataset_extend = [get_data(set_type="extends") for _ in range(NUM_IMAGES)]
dataset_collission = [get_data(set_type="collission") for _ in range(NUM_IMAGES)]

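def save_dataset(dataset, set_type):
    out_dir = os.path.join(OUTPUT, set_type)
    os.makedirs(out_dir, exist_ok=True) # Create the output folder if it does not exist yet
    # Start the counter after the highest-numbered csv in the directory so as to not overwrite previous data.
    # Files without a numeric name are ignored, and an empty directory starts at 1
    existing = [int(x.split(".")[0]) for x in os.listdir(out_dir) if x.split(".")[0].isdigit()]
    counter = max(existing, default=0) + 1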
    for data in dataset:
        data_frame_list = []
        for points_set in data:
            data_frame_list.extend(points_set.unpack())
        
        plane_pd = pd.DataFrame(data_frame_list,
                                columns=["point_x", "point_y", "center_x", "center_y", "circ_no"])

        # We save the data in a csv, in the output folder specified, and continue to the next image
        plane_pd.to_csv(OUTPUT+f"/{set_type}/{counter}.csv", sep=";", index=False)
        counter += 1

save_dataset(dataset_clean, "clean")
save_dataset(dataset_extend, "extends")
save_dataset(dataset_collission, "collides")
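
To use the files later, they have to be read back with the same `;` separator. The sketch below is only an illustration under stated assumptions: the rows are a hypothetical hand-written sample in the column layout above, and an in-memory buffer stands in for a file in the output folder. It separates noise from circumference points through the empty `circ_no` field.

```python
import io
import pandas as pd

# Hypothetical sample in the layout written by save_dataset
rows = [
    [60.0, 50.0, 50.0, 50.0, 1],
    [50.0, 60.0, 50.0, 50.0, 1],
    [30.0, 12.0, 30.0, 20.0, 2],
    [3.5, 97.0, None, None, None],  # noise point
]
columns = ["point_x", "point_y", "center_x", "center_y", "circ_no"]

# Write to an in-memory buffer instead of a real file, then read it back
buffer = io.StringIO()
pd.DataFrame(rows, columns=columns).to_csv(buffer, sep=";", index=False)
buffer.seek(0)

plane = pd.read_csv(buffer, sep=";")
noise = plane[plane["circ_no"].isna()]        # rows without a circumference number
circles = plane.dropna(subset=["circ_no"])    # rows that belong to a circumference
points_per_circle = circles.groupby("circ_no").size()
print(len(noise), points_per_circle.to_dict())
```

Because the noise rows leave `circ_no` empty, pandas reads that column as floating point, so the circumference numbers come back as `1.0`, `2.0` and so on.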