# Dataset Generator

This notebook contains the code that produces the dataset used to train and test a model that identifies circumferences in an area of 100x100 units.

## CONSTANTS

Define here the constants used to generate the dataset.

In [22]:
NUM_IMAGES = 10 # Size of the dataset (number of images)
NUM_CIRC = 3 # Number of circumferences per image
RANDOMNESS = 1.0 # Amount of randomness in the points of a circumference. The larger the randomness, the less defined the circumference. It is advised not to make it too large
RANGE_POINTS = (30,50) # Range for the number of points generated for each circumference; the actual number is drawn at random from this interval (points falling outside the 100x100 area are discarded)
RANGE_RADIUS = (5.0, 15.0) # Range for the radius of the circumferences. Values must be float
NOISE_RATIO = 1/10 # Number of random (noise) points generated per circumference point. These are added at the end of the process

OUTPUT = "../dataset" # Output folder for the dataset CSV files

## Circumference generator

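The first thing to do is to create a function that returns the points of a circumference, given the number of points, the center (x and y offsets) and the radius. The amount of randomness applied to the x and y coordinates of each point is taken from the RANDOMNESS constant.

This function generates the requested number of points at random positions along the circumference and discards any point that falls outside the 100x100 area, so it may return fewer points than requested.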

In [23]:
import random
from math import pi, cos, sin

def get_circunference_points(n, center, rad):
    """Return up to n points scattered around a circumference of radius rad with the given center."""
    points = []
    jitter = RANDOMNESS * rad / 10  # maximum displacement of a point from the ideal circumference
    for _ in range(n):
        theta = random.random() * 2 * pi  # random angle along the circumference
        x = center[0] + cos(theta) * rad + random.uniform(-jitter, jitter)
        y = center[1] + sin(theta) * rad + random.uniform(-jitter, jitter)
        if 0 <= x <= 100 and 0 <= y <= 100:  # only keep points inside the valid range
            points.append((x, y))
    return points
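The following sketch reproduces the idea of `get_circunference_points` as a self-contained function. It is a hypothetical rewrite, not the notebook's code: the names `sample_circle`, `jitter` and `rng` are introduced here, and each coordinate is assumed to be displaced by a value drawn uniformly from [-jitter, jitter]. Because each coordinate moves by at most `jitter`, every kept point lies within rad ± jitter·sqrt(2) of the center, which the example checks.

```python
import math
import random

def sample_circle(n, center, rad, jitter, rng=random):
    """Hypothetical helper: up to n points scattered around a circle.

    Each point starts on the ideal circle of radius `rad` around `center`
    and is displaced by at most `jitter` along x and y (assumed uniform).
    Points outside the 100x100 area are discarded.
    """
    points = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2 * math.pi)  # random angle on the circle
        x = center[0] + rad * math.cos(theta) + rng.uniform(-jitter, jitter)
        y = center[1] + rad * math.sin(theta) + rng.uniform(-jitter, jitter)
        if 0 <= x <= 100 and 0 <= y <= 100:  # discard points outside the area
            points.append((x, y))
    return points

rng = random.Random(0)  # seeded generator so the example is reproducible
pts = sample_circle(40, (50.0, 50.0), 10.0, 1.0, rng)

# Distance of every point from the center: within rad +/- jitter*sqrt(2)
dists = [math.hypot(x - 50.0, y - 50.0) for x, y in pts]
print(len(pts), min(dists), max(dists))
```

Passing a seeded `random.Random` instance, as above, is one way to make a generated dataset reproducible; the notebook itself uses the module-level `random` functions without a seed.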

## Classes

There are two types of points in the dataset: those that belong to a circumference and those that are noise.

To make storing the information easier, both types are represented by a single class, PointsSet. For noise, the center and the circumference number are left empty (None).

In [29]:
class PointsSet:
    def __init__(self, points, center, circ_no):
        self.points = points
        self.center = center
        self.circ_no = circ_no

    def add_point(self, point):
        self.points.append(point)

    def is_noise(self):
        return self.circ_no is None

    def unpack(self):
        if self.is_noise():
            return [[p[0], p[1], None, None, None] for p in self.points]  # noise has no center and no circumference number
        else:
            return [[p[0], p[1], self.center[0], self.center[1], self.circ_no] for p in self.points]

    def __str__(self):
        if self.is_noise():
            return f"{len(self.points)} points of noise"
        else:
            return f"Circumference {self.circ_no} has {len(self.points)} points and center in {self.center}"
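For noise points, `unpack()` leaves the center and the circumference number empty. When these rows are put into a pandas DataFrame, the missing values are stored as NaN, and because an integer column cannot hold NaN, `circ_no` becomes a float column (hence the `1.0` values in the table printed at the end of this notebook). A small self-contained check with two hand-made rows; converting to pandas' nullable `"Int64"` dtype is one optional way to get integer circumference numbers back, not something the notebook does:

```python
import pandas as pd

columns = ["point_x", "point_y", "center_x", "center_y", "circ_no"]
rows = [
    [10.0, 20.0, 12.0, 22.0, 1],     # a point belonging to circumference 1
    [50.0, 60.0, None, None, None],  # a noise point: no center, no circumference number
]
df = pd.DataFrame(rows, columns=columns)
print(df.dtypes)  # circ_no is float64 because it contains a missing value

# Nullable integer dtype: keeps 1 as an integer and the missing value as <NA>
circ_int = df["circ_no"].astype("Int64")
print(circ_int)
```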

## Data Creation

With the previously developed code we can now create the set of circumferences for one "image", and then add the noise.

In [30]:
def get_data():
    data = []
    for i in range(NUM_CIRC):
        circ_no = i+1 # Circumferences are numbered from 1; noise points have no number (None)
        center = (random.uniform(0.0, 100.0), random.uniform(0.0, 100.0))
        rad = random.uniform(*RANGE_RADIUS)
        n = random.randint(*RANGE_POINTS)
        points = get_circunference_points(n, center, rad)
        circunference = PointsSet(points, center, circ_no)
        data.append(circunference)
    
    # Add noise
    n = int(sum(len(c.points) for c in data) * NOISE_RATIO) # Number of points generated so far in this image * NOISE_RATIO gives the amount of noise to add
    points = [(random.uniform(0.0, 100.0), random.uniform(0.0, 100.0)) for _ in range(n)]
    noise = PointsSet(points, None, None)

    data.append(noise)
    return data
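As a worked example of the noise computation in `get_data()`: the number of noise points is the number of circumference points actually kept, multiplied by NOISE_RATIO and truncated by `int()`. The helper name `noise_count` is hypothetical and only isolates that one line:

```python
NOISE_RATIO = 1 / 10  # same value as the constant defined above

def noise_count(circle_sizes, ratio=NOISE_RATIO):
    """Hypothetical helper: number of noise points added to one image."""
    return int(sum(circle_sizes) * ratio)  # int() truncates towards zero

print(noise_count([40, 45, 42]))  # 127 circumference points -> 12 noise points
print(noise_count([9]))           # 9 * 0.1 = 0.9 -> no noise at all
```

Because of the truncation, an image whose circumferences keep fewer than 10 points in total receives no noise.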


## Dataset Generation and Storing

Now that we have the tools to create one "image", we can create an entire dataset and save each image as a separate CSV file, which can later be used for machine learning and for plotting results.

To store the data we use the pandas library.

In [31]:
import os
os.makedirs(OUTPUT, exist_ok=True)  # create the output folder if it does not exist yet

In [32]:
import os
import pandas as pd

dataset_list = [get_data() for _ in range(NUM_IMAGES)]

for counter, data in enumerate(dataset_list, start=1):
    # Flatten every PointsSet of this image into the rows of one table
    data_frame_list = []
    for points_set in data:
        data_frame_list.extend(points_set.unpack())

    plane_pd = pd.DataFrame(data_frame_list,
                            columns=["point_x", "point_y", "center_x", "center_y", "circ_no"])

    # Save the image as <counter>.csv in the output folder, then continue with the next image
    plane_pd.to_csv(os.path.join(OUTPUT, f"{counter}.csv"), sep=";", index=False)


In [33]:
print(plane_pd)

       point_x    point_y   center_x   center_y  circ_no
0    56.099394   0.302452  58.214301  11.398316      1.0
1    65.685110   1.175834  58.214301  11.398316      1.0
2    49.692260   2.397215  58.214301  11.398316      1.0
3    71.037204   9.094849  58.214301  11.398316      1.0
4    53.486153   0.138940  58.214301  11.398316      1.0
..         ...        ...        ...        ...      ...
134  83.863523   4.875457        NaN        NaN      NaN
135  45.653520  92.413589        NaN        NaN      NaN
136  28.842515   0.591662        NaN        NaN      NaN
137  43.786063  51.078078        NaN        NaN      NaN
138  86.914788   7.437961        NaN        NaN      NaN

[139 rows x 5 columns]
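Since every image is stored as a `;`-separated CSV, the same separator has to be passed when reading a file back, and the noise points can then be recognised by their missing `circ_no`. A minimal sketch, assuming a hand-made three-row image written to a temporary folder instead of the real files in `../dataset`:

```python
import os
import tempfile
import pandas as pd

columns = ["point_x", "point_y", "center_x", "center_y", "circ_no"]
# Hand-made image: two points of circumference 1 and one noise point
image = pd.DataFrame([[10.0, 20.0, 12.0, 22.0, 1],
                      [14.0, 24.0, 12.0, 22.0, 1],
                      [70.0, 80.0, None, None, None]], columns=columns)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "1.csv")
    image.to_csv(path, sep=";", index=False)  # same format as the notebook
    loaded = pd.read_csv(path, sep=";")        # empty fields are read back as NaN

noise = loaded[loaded["circ_no"].isna()]       # rows without a circumference number
circles = loaded.dropna(subset=["circ_no"])    # rows belonging to a circumference
print(len(noise), circles.groupby("circ_no").size().to_dict())
```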
