# Coordinate Calculator and Label Compiler

Inputs:
- location in pixels of beginning and end of axes
- range of axes
- units of axis range
- user's desired spacing for x coordinates
- NO legend input for now. Open question for later stages: how to specify the different colors/markers that separate the curves
    - could eventually have the mask look for the legend (distinctive enough) to learn which distinguishing features exist between the lines on the graph
    - I will ignore this part for now
- detectron2 output:
    - list of dicts, one dict for each image
    - each dict has a field 'panoptic_seg', which refers to a tuple (possibly a named tuple)
    - the first field in the tuple is a tensor holding the ID of each pixel
    - the second field is a list of dicts, one dict per ID
    - each of these dicts has the entries 'id', which matches an ID in the first-field tensor, and 'category_id'
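
To make the layout concrete, here is a small stand-in for one image's 'panoptic_seg' entry. It uses a NumPy array in place of the tensor, and the IDs and category values are made up; it only mirrors the structure described above and is not real detectron2 output.

```python
import numpy as np

# Stand-in for the per-pixel ID tensor (rows are y, columns are x).
# A real detectron2 result would hold a torch tensor here.
pixel_ids = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 0, 2, 0],
])

# One dict per ID; the values are invented for illustration.
segments_info = [
    {'id': 1, 'category_id': 0},
    {'id': 2, 'category_id': 0},
]

# One dict per image, each holding the (tensor, list of dicts) tuple.
outputs = [{'panoptic_seg': (pixel_ids, segments_info)}]

panoptic_seg = outputs[0]['panoptic_seg']
ids_in_image = [segment['id'] for segment in panoptic_seg[1]]
pixels_per_id = {i: int((panoptic_seg[0] == i).sum()) for i in ids_in_image}
```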
    
Outputs:
- dictionary with the following fields:
    - coordinates: a dictionary with one key per curve label, where each entry is a list of (x, y) tuples in the axis units of the input image
    - units: a tuple of (x units, y units)
    - start_end: the beginning and end of each curve
    - NO labels for now (later stages): a list of labels for each curve, taken from the legend if possible, otherwise automatic (data1, data2, etc.)
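
A rough picture of what that output could look like for two curves. Every number and unit here is made up, and the curve keys are just stringified IDs.

```python
# Hypothetical output for two curves; all values are invented.
example_output = {
    'coordinates': {
        '1': [(0.0, 1.0), (0.5, 1.4), (1.0, 2.1)],
        '2': [(0.0, 3.0), (0.5, 2.6), (1.0, 2.2)],
    },
    'units': ('s', 'V'),          # (x units, y units)
    'start_end': {                # x extent of each curve, in axis units
        '1': (0.0, 1.0),
        '2': (0.0, 1.0),
    },
}

# Curves sampled on the same x grid can be compared point by point.
x_1 = [x for x, _ in example_output['coordinates']['1']]
x_2 = [x for x, _ in example_output['coordinates']['2']]
```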


Options for coordinates:
1. based on the actual axis, choose a set of x coordinates for which we want to know the y coordinates (for each curve)
2. based on pixels, grab points that occur every "few" pixels along the x axis, then convert to coordinates

- the first option can be built on the second: find how many pixels lie between each of the requested x points, then sample at that pixel spacing
- should the user input the spacing between points, or should we use a default that seems reasonable for the range of the x axis?
- should the different curves share uniform x axis values (so y values are directly comparable), or should the data points for each curve start at the 'beginning' of the curve and stop at the 'end'?
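
Combining the two options comes down to converting a spacing in axis units into a spacing in pixels. A minimal sketch, assuming a linear axis; the function name and the numbers are illustrative only.

```python
def spacing_to_pixel_step(coord_spacing, coord_min, coord_max, pixel_min, pixel_max):
    """Convert a spacing in axis units to the equivalent spacing in pixels."""
    pixels_per_unit = (pixel_max - pixel_min) / (coord_max - coord_min)
    return coord_spacing * pixels_per_unit

# x axis runs from 0 to 10 units and spans pixels 100 to 600,
# and the user wants a point every 0.5 units
pixel_step = spacing_to_pixel_step(0.5, 0.0, 10.0, 100, 600)
```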


Outlier cases:
- nested plots
- weird legend formatting

In [2]:
# uniform x axis but also save the very first and last endpoints of the curve 

### Notes, TODO
- adjust the x and y pixel locations to lie on the same horizontal and vertical lines
- put the coordinate and pixel min max values in a dictionary or something more compact

In [1]:
import pandas as pd
import numpy as np

In [3]:
# create the axis info dictionary
def get_axis_info(xcoordinatemin, xcoordinatemax, xpixelmin, xpixelmax, ycoordinatemin, ycoordinatemax, ypixelmin, ypixelmax, max_points, units):
    # if the input from kev is in a more compact form, extract this info below before passing it into the scale functions
    x_scale = get_x_scale(xcoordinatemin, xcoordinatemax, xpixelmin, xpixelmax)
    y_scale = get_y_scale(ycoordinatemin, ycoordinatemax, ypixelmin, ypixelmax)
    pixel_origin = (xpixelmin, ypixelmax) # assumes that y pixel max is the larger y value (bottom of the axis)
    coord_origin = (xcoordinatemin, ycoordinatemin) # axis values at the pixel origin
    axis_info_dict = {'pixel_origin': pixel_origin,
                      'coord_origin': coord_origin,
                      'x_scale': x_scale,
                      'y_scale': y_scale,
                      'step': get_step(max_points, xpixelmin, xpixelmax),
                      'units': units}
    return axis_info_dict

In [None]:
# convert the max points desired into a step size in pixels
def get_step(max_points, xpixelmin, xpixelmax):
    #step = length of x axis in pixels / max_points
    step = (xpixelmax - xpixelmin) / max_points
    return step

In [1]:
# establish scaling from pixel to real units
def get_x_scale(xcoordinatemin, xcoordinatemax, xpixelmin, xpixelmax):
    # the x pixel and x coordinate count up in the same direction
    pixel_range = xpixelmax - xpixelmin
    coordinate_range = xcoordinatemax - xcoordinatemin
    x_scale = pixel_range / coordinate_range
    return x_scale # pixels per coordinate


def get_y_scale(ycoordinatemin, ycoordinatemax, ypixelmin, ypixelmax):
    # y pixels count down from the top while y coordinates count up from the origin;
    # pixel_to_coords already flips the sign of the distance, so keep the scale positive
    pixel_range = ypixelmax - ypixelmin
    coordinate_range = ycoordinatemax - ycoordinatemin
    y_scale = pixel_range / coordinate_range
    return y_scale # pixels per coordinate

In [None]:
# convert pixel location to coordinates
def pixel_to_coords(pixel_loc, axis_info_dict):
    # pixel_loc is a tuple (x, y) of pixel location measured from the top left
    x_pixel_loc = pixel_loc[0]
    coord_x = x_pixel_to_coords(x_pixel_loc, axis_info_dict)
    
    # get signed distance from origin to pixel in y (pixel units), positive upward:
    pixel_distance_y = axis_info_dict['pixel_origin'][1] - pixel_loc[1]
    
    # pixels / (pixels/coord) = coord, offset by the axis value at the origin
    # (falls back to 0 if 'coord_origin' is not in the dict)
    y_origin = axis_info_dict.get('coord_origin', (0, 0))[1]
    coord_y = y_origin + pixel_distance_y / axis_info_dict['y_scale']
    return (coord_x, coord_y)

In [4]:
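# for use later in code, just the x axis pixel to coords
def x_pixel_to_coords(x_pixel_loc, axis_info_dict):
    pixel_distance_x = x_pixel_loc - axis_info_dict['pixel_origin'][0]
    # offset by the axis value at the origin (falls back to 0 if 'coord_origin' is not in the dict)
    x_origin = axis_info_dict.get('coord_origin', (0, 0))[0]
    coord_x = x_origin + pixel_distance_x / axis_info_dict['x_scale']
    return coord_x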
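
A quick worked example of the pixel-to-coordinate conversion, written as a standalone sketch with made-up calibration numbers. It does not call the functions in this notebook, and it assumes both axes start at 0.

```python
# Made-up calibration: x axis 0..10 over pixel columns 100..600,
# y axis 0..5 over pixel rows 450 (bottom) .. 50 (top).
x_scale = (600 - 100) / (10 - 0)   # 50 pixels per x unit
y_scale = (450 - 50) / (5 - 0)     # 80 pixels per y unit
pixel_origin = (100, 450)          # bottom-left corner of the axes

def to_coords(pixel_loc):
    # x grows to the right in both systems
    coord_x = (pixel_loc[0] - pixel_origin[0]) / x_scale
    # pixel rows grow downward, so the distance is measured up from the origin
    coord_y = (pixel_origin[1] - pixel_loc[1]) / y_scale
    return (coord_x, coord_y)

midpoint = to_coords((350, 250))
top_right = to_coords((600, 50))
```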

In [4]:
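# each curve is one ID. get all pixel locations for one ID
def get_pixels_for_id(ID, pixel_tensor):
    # assumes pixel_tensor converts directly to a numpy array (e.g. a CPU tensor)
    pixel_array = np.array(pixel_tensor)
    result = np.where(pixel_array == ID)
    # np.where returns (rows, columns) = (y, x); swap so each tuple is (x, y)
    pixel_lst = list(zip(result[1], result[0]))
    return pixel_lst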
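
One thing to watch: `np.where` on a 2D array returns row indices first and column indices second, i.e. (y, x), while the rest of this notebook treats pixel tuples as (x, y). A tiny standalone check with a made-up array:

```python
import numpy as np

pixel_array = np.array([
    [0, 0, 7],
    [0, 7, 0],
])

rows, cols = np.where(pixel_array == 7)   # rows are y, columns are x
xy_pixels = [(int(x), int(y)) for x, y in zip(cols, rows)]
```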

maybe incorporate later:

convert ID to the label from the legend input

def get_label_for_id():
    return label

In [None]:
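# don't really need this; it is accomplished in the unify_x function
# averages the y pixel values for each unique x pixel value
def clean_pixel_lst(pixel_lst):
    y_by_x = {}
    for x, y in pixel_lst:
        y_by_x.setdefault(x, []).append(y)
    cleaned_pixel_lst = [(x, sum(ys) / len(ys)) for x, ys in sorted(y_by_x.items())]
    return cleaned_pixel_lst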

In [1]:
# find closest number in list to value val
def closest(lst, val):
    lst = np.asarray(lst)
    idx = (np.abs(lst - val)).argmin()
    return lst[idx]
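
A small usage check for the nearest-value lookup, with a made-up grid. The function is repeated here so the snippet runs on its own. Note that `argmin` returns the first minimum, so a value exactly halfway between two grid points snaps to whichever comes first in the list.

```python
import numpy as np

def closest(lst, val):
    lst = np.asarray(lst)
    idx = (np.abs(lst - val)).argmin()
    return lst[idx]

grid = [0, 10, 20, 30]
# 15 is equally far from 10 and 20; the earlier entry wins
snapped = [int(closest(grid, x)) for x in (3, 14, 15, 26)]
```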

In [None]:
# return x and y values of the unified list
def unify_x(pixel_lst, axis_info_dict):
    # snap each pixel to the nearest point on a shared x grid, average the y
    # values that land on each grid point, then fill any empty grid points by
    # linear interpolation between their nearest filled neighbors
    if not pixel_lst:
        return []
    pixel_lst = sorted(pixel_lst)
    
    # step (in pixels) is stored in the axis info dict, based on user input
    step = axis_info_dict['step']
    
    # the grid starts at the pixel origin so every curve shares the same x values;
    # only grid points inside this curve's x extent are kept, so nothing is extrapolated
    x_start = axis_info_dict['pixel_origin'][0]
    x_min, x_max = pixel_lst[0][0], pixel_lst[-1][0]
    x_vals = [float(x) for x in np.arange(x_start, x_max + step, step) if x_min <= x <= x_max]
    if not x_vals:
        return []
    
    # create dictionary of standard x val and the points closest to it
    closest_dict = {}
    for point in pixel_lst:
        key = float(closest(x_vals, point[0]))
        if key in closest_dict:
            closest_dict[key].append(point)
        else:
            closest_dict[key] = [point]
    
    # iterate through keys to average all y values in each set
    for key in closest_dict:
        y_vals = [i[1] for i in closest_dict[key]]
        closest_dict[key] = sum(y_vals) / len(y_vals)
    
    # for all the missing dict keys, make a line between nearest values and fill it in
    existing_keys = sorted(closest_dict.keys())
    for x in x_vals:
        if x not in closest_dict:
            # find the index of first existing x greater than x; the first and last
            # grid points always receive pixels, so x lies strictly between two keys
            i = 0
            while i < len(existing_keys) and existing_keys[i] < x:
                i += 1
            
            x2 = existing_keys[i] # existing x just above missing x
            y2 = closest_dict[x2]
            x1 = existing_keys[i-1] # existing x just below missing x
            y1 = closest_dict[x1]
            
            # find line between bounds
            m = (y1 - y2) / (x1 - x2)
            b = (x1 * y2 - x2 * y1) / (x1 - x2)
            
            # solve for y of x
            closest_dict[x] = m * x + b
    
    # turn dictionary into a sorted list of tuples
    unified_pixel_lst = sorted(closest_dict.items())
    return unified_pixel_lst # a list of (x, y) tuples
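
The gap-filling step is ordinary linear interpolation, so `np.interp` should give the same values for grid points that sit between two filled neighbors. A standalone sketch with made-up numbers, offered as a possible simplification rather than what the function above does:

```python
import numpy as np

# grid points that received pixels, and their averaged y values
known_x = np.array([0.0, 10.0, 30.0])
known_y = np.array([100.0, 90.0, 50.0])

# full grid, including the empty point at x = 20
grid_x = np.array([0.0, 10.0, 20.0, 30.0])
grid_y = np.interp(grid_x, known_x, known_y)

# the same value from the slope/intercept form used in unify_x
x1, y1, x2, y2 = 10.0, 90.0, 30.0, 50.0
m = (y1 - y2) / (x1 - x2)
b = (x1 * y2 - x2 * y1) / (x1 - x2)
y_at_20 = m * 20.0 + b
```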

In [6]:
# step 1 to create coordinate dictionary to add to the output dict
def create_pixel_dict(panoptic_seg):
    # initialize dict
    pixel_dict = {}
    
    # get list of IDs:
    lst_of_dicts = panoptic_seg[1]
    
    for id_dict in lst_of_dicts:
        # get ID:
        ID = id_dict['id']
        pixel_lst = get_pixels_for_id(ID, panoptic_seg[0]) # first field is the per-pixel ID tensor
        #cleaned_pixel_lst = clean_pixel_lst(pixel_lst)
        
        # add the list of pixels for this ID to the pixel dict
        pixel_dict[str(ID)] = pixel_lst
    return pixel_dict

In [7]:
# create coordinate dictionary to add to the output dict
def create_coordinate_dict(pixel_dict, axis_info_dict):
    # initialize dict
    coordinate_dict = {}
    
    for ID in pixel_dict.keys():
        pixel_lst = pixel_dict[ID]
        
        # get unified x axis:
        # TODO: add an if statement to handle the user specifying either step size (in coordinates) or number of points
        unified_pixel_lst = unify_x(pixel_lst, axis_info_dict)
        
        # convert pixels to coordinates
        coordinate_lst = []
        for pixel_loc in unified_pixel_lst:
            coordinate_lst.append(pixel_to_coords(pixel_loc, axis_info_dict))
        
        # add the list of coordinates for this ID to the coordinate dict
        coordinate_dict[str(ID)] = coordinate_lst
    return coordinate_dict

In [None]:
# get the coordinate locations of the start and end of the curve
def get_start_end(pixel_dict, axis_info_dict):
    # initialize dict
    start_end_dict = {}
    
    for ID in pixel_dict.keys():
        pixel_lst = pixel_dict[ID]
        
        # get start and end, assumes x is in the first position
        start = x_pixel_to_coords(min(pixel_lst)[0], axis_info_dict)
        end = x_pixel_to_coords(max(pixel_lst)[0], axis_info_dict)
        # add the start and end tuple for this ID to the start/end dict
        start_end_dict[str(ID)] = (start, end)
    return start_end_dict

In [8]:
# create the output dictionary: the units come directly from the input info,
# alongside the coordinate dictionary and the start/end of each curve
def create_output_dict(panoptic_seg, axis_info_dict):
    pixel_dict = create_pixel_dict(panoptic_seg)
    output_dict = {}
    output_dict['coordinates'] = create_coordinate_dict(pixel_dict, axis_info_dict)
    output_dict['start_end'] = get_start_end(pixel_dict, axis_info_dict)
    output_dict['units'] = axis_info_dict['units']
    return output_dict

In [None]:
# write the results to an Excel workbook (requires the xlsxwriter package)
# TODO: add section to convert to csv
def write_results_to_excel(output_dict):
    x_units = output_dict['units'][0]
    y_units = output_dict['units'][1]
    
    # the context manager saves and closes the workbook on exit
    with pd.ExcelWriter('multiple.xlsx', engine='xlsxwriter') as writer:
        # a summary of the start and end of the x axis for each ID
        starts = []
        ends = []
        ids = []
        for ID in output_dict['start_end'].keys():
            ids.append(ID)
            start = output_dict['start_end'][ID][0]
            end = output_dict['start_end'][ID][1]
            starts.append(start)
            ends.append(end)
        df = pd.DataFrame({'ID': ids,
                           'x start ' + str(x_units): starts,
                           'x end ' + str(x_units): ends})
        df.to_excel(writer, sheet_name='starts_ends')
        
        # the actual data in xy form, one ID per sheet
        column_titles = ['x, ' + str(x_units), 'y, ' + str(y_units)]
        for ID in output_dict['coordinates'].keys():
            # each entry is a list of (x, y) tuples, so each tuple becomes one row
            df = pd.DataFrame(output_dict['coordinates'][ID], columns=column_titles)
            df.to_excel(writer, sheet_name=str(ID))
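
For the CSV output mentioned in the comment above, one possible shape is a small writer per curve using the standard library `csv` module. The function name `write_curve_csv` is hypothetical, and the sample data and units are made up. It writes to any file-like handle, so an in-memory buffer is used here instead of a real file.

```python
import csv
import io

def write_curve_csv(coordinate_lst, units, handle):
    """Write one curve's (x, y) tuples as CSV with unit-labelled headers."""
    writer = csv.writer(handle)
    writer.writerow(['x, ' + str(units[0]), 'y, ' + str(units[1])])
    writer.writerows(coordinate_lst)

buffer = io.StringIO()
write_curve_csv([(0.0, 1.0), (0.5, 1.4)], ('s', 'V'), buffer)
csv_text = buffer.getvalue()
```

Because the header text contains a comma, the `csv` module quotes those two cells automatically.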