# Dataset upscaling pipeline

Requires:
* .png files of the microscope image frames
* The corresponding .h5 file with the analyzed results

TODO: Instead of requiring PNGs, add native support for .ims files. Integrate with Granule Explorer's image-reading file.

Capgemini:
* Don't touch RPA

## Questions to answer

* When cutting granules out of images, what size should the standard cutout be?
    * We need a standard size because of constraints on the model inputs: they must all be the same size.
    * Alternatively, use dynamic upscaling (upscale every cutout to the same size, irrespective of its initial size). This requires remembering the scaling factor for every granule, which adds bookkeeping. (A dataframe could store the factors for when we want to downscale the results again.)

* Granule cutout: the current method upscales the cutout to 1024x1024 with no regard for the granule's original size or aspect ratio. This introduces stretching/distortion, which should not be a problem.
    * Alternatively, upscale the image as close to 1024x1024 as possible while keeping its aspect ratio, then pad any remaining space.
    * I don't expect different results, but it may be worthwhile to try both and compare.
* Look for the 'bump': granule 7 in frame 4 has one. Does every granule have one? If so, there may be a bug in the boundary code. If not, ignore it; it is unlikely to have large consequences.
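The padding alternative and the scale-factor bookkeeping can be sketched together. This is a minimal sketch, not part of the pipeline: `Letterbox`, `letterbox_params`, `to_target` and `to_original` are hypothetical names, and the coordinate mapping assumes the resampler uses the pixel-centre convention. It finds the largest aspect-preserving resize that fits into 1024x1024, centres it with padding, and keeps what is needed to map model predictions back to the original cutout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Letterbox:
    """Resize-and-pad geometry for one cutout (hypothetical helper, not Granule Explorer API)."""
    orig_w: int
    orig_h: int
    new_w: int   # width after the aspect-preserving resize
    new_h: int   # height after the aspect-preserving resize
    pad_x: int   # padding columns added on the left
    pad_y: int   # padding rows added on the top

    @property
    def sx(self) -> float:
        # Effective horizontal scale (accounts for rounding to whole pixels)
        return self.new_w / self.orig_w

    @property
    def sy(self) -> float:
        # Effective vertical scale
        return self.new_h / self.orig_h


def letterbox_params(w: int, h: int, target: int = 1024) -> Letterbox:
    """Largest uniform scale that fits a w x h cutout into target x target, centred."""
    scale = min(target / w, target / h)
    new_w = min(target, round(w * scale))
    new_h = min(target, round(h * scale))
    return Letterbox(w, h, new_w, new_h, (target - new_w) // 2, (target - new_h) // 2)


def to_target(x: float, y: float, lb: Letterbox) -> Tuple[float, float]:
    """Cutout pixel coordinates -> padded target coordinates (pixel-centre convention)."""
    return (lb.pad_x + x * lb.sx + lb.sx / 2 - 0.5,
            lb.pad_y + y * lb.sy + lb.sy / 2 - 0.5)


def to_original(x: float, y: float, lb: Letterbox) -> Tuple[float, float]:
    """Inverse of to_target: map a predicted point back to cutout pixel coordinates."""
    return ((x - lb.pad_x - lb.sx / 2 + 0.5) / lb.sx,
            (y - lb.pad_y - lb.sy / 2 + 0.5) / lb.sy)


# Example: a 40 x 25 px cutout becomes 1024 x 640, with 192 px of padding above and below
lb = letterbox_params(40, 25)
print(lb.new_w, lb.new_h, lb.pad_x, lb.pad_y)  # 1024 640 0 192
```

Storing one `Letterbox` per granule (for example as columns in a dataframe keyed by `granule_id` and `frame`) would be enough to undo the scaling after prediction.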

# Thesis data-processing outline

1. Create dataset
    * Grab granules from frame
    * Upscale image and border to 1024x1024 pixels
2. Train models
3. Get the model results and downscale them back to the original shape
4. Convert the results into Granule Explorer format
5. Compare the model predictions with the Granule Explorer results for valid granules.
    * This gives the deviation between the model and the true (Granule Explorer) result.
6. Apply model to invalid granules. 
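Step 5 needs a concrete deviation measure. One option, assumed here rather than taken from Granule Explorer, is a symmetric mean nearest-point (chamfer-style) distance between the predicted border and the Granule Explorer border, both expressed in original-image pixels. It does not require the two borders to have the same number of points or a point-to-point correspondence. `chamfer_distance` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric mean nearest-point distance between two borders (same pixel units as the input)."""
    if not a or not b:
        raise ValueError("both borders need at least one point")

    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # Average distance from each point in src to its closest point in dst
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return (one_way(a, b) + one_way(b, a)) / 2


# Example: two parallel 2-point segments one pixel apart
print(chamfer_distance([(0, 0), (1, 0)], [(0, 1), (1, 1)]))  # 1.0
```

The brute-force double loop is O(n*m), which is acceptable for borders of a few hundred points; computing it per valid granule would give a distribution of model errors.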

Meeting with Pekka every second week at 14:30; the next one is 20/02/2024.

Machine learning master's thesis: the project must demonstrate sufficient knowledge of machine learning. Simply taking pretrained models (e.g. YOLOv8) without fine-tuning them demonstrates little ability. Any selected model must therefore also be fine-tuned, both to improve its performance and to demonstrate knowledge of the field.

Experiment with freezing layers of pretrained models during fine-tuning; ask Pekka for more guidance later.

In [1]:
from PIL import Image
from pathlib import Path
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import imageio.v3 as iio
import matplotlib.pyplot as plt
import numpy as np

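import pathlib
import platform

# Workaround: objects pickled on Linux (e.g. saved models) store PosixPath instances,
# which cannot be instantiated on Windows; alias PosixPath to WindowsPath so they load.
plt2 = platform.system()
if plt2 == 'Windows':
    pathlib.PosixPath = pathlib.WindowsPath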

# Read .h5 file & grab data

In [2]:
results = pd.read_hdf(Path("data/Analysis_data/2020-02-05_15.41.32-NAs-T1354-GFP_Burst.h5"), mode="r", key="fourier")

frame_id = 4
granule_id = 7 # Id of granule to plot
granule_fourier = results[(results['granule_id'] == granule_id) & (results['frame'] == frame_id)]
magnitude = granule_fourier['magnitude']
order = granule_fourier['order'].tolist()
# granule_fourier[:2]

# Info about a granule
bbox_left = granule_fourier['bbox_left'].iloc[0]
bbox_right = granule_fourier['bbox_right'].iloc[0]
bbox_top = granule_fourier['bbox_top'].iloc[0]
bbox_bottom = granule_fourier['bbox_bottom'].iloc[0]

mean_radius = granule_fourier['mean_radius'].iloc[0]
x_pos = granule_fourier['x'].iloc[0]
y_pos = granule_fourier['y'].iloc[0]
x_pos_relative = x_pos - bbox_left
y_pos_relative = y_pos - bbox_bottom

In [3]:
# All valid granules
analyzed_granule_results = results
valid_granule_fourier = analyzed_granule_results[(analyzed_granule_results['valid'] == True) & (analyzed_granule_results['frame'] == frame_id)]['granule_id']
valid_granule_ids = valid_granule_fourier.unique()
print(f"Valid granules in Frame {frame_id} are", str(valid_granule_ids.tolist()))

Valid granules in Frame 4 are [7, 4, 5, 13, 16, 18, 20, 25, 31, 33, 34, 35]


## Original granule

In [4]:
im = iio.imread('images/raw_granule_image_filter.png')
fig = make_subplots(rows=1, cols=2, 
                    horizontal_spacing=0.01, 
                    vertical_spacing=0.1,
                    subplot_titles=(f"Frame {frame_id}", f"Granule {granule_id} cutout"))

base_image_fig = px.imshow(im)
fig.add_trace(base_image_fig.data[0], 1, 1)

granule_cutout_image = im[bbox_left:bbox_right, bbox_bottom:bbox_top]
granule_cutout = px.imshow(granule_cutout_image)
fig.add_trace(granule_cutout.data[0], 1, 2)
fig.update_layout(width=700, height=400)
fig.show()
# iio.imwrite("granule_cutout.png", im[bbox_left:bbox_right, bbox_bottom:bbox_top])

## Upscale image

docs: https://pillow.readthedocs.io/en/stable/handbook/concepts.html#PIL.Image.Resampling.NEAREST

In [5]:
original_image = Image.fromarray(granule_cutout_image)

# Get the size of the original image
original_width, original_height = original_image.size

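# Define the scaling factor
scale_factor = 10

# Calculate the new size of the image after upscaling
new_width = original_width * scale_factor
new_height = original_height * scale_factor

# Override the size above with the fixed 1024x1024 model input size.
# NOTE: the border overlays below still scale by `scale_factor`, which only lines up
# if the cutout is 1024 / scale_factor pixels wide and high.
new_width, new_height = 1024, 1024

# Resize the image with three resampling filters for comparison
# (`upscaled_image_box` is resampled with NEAREST, not BOX)
upscaled_image_hamming = np.array(original_image.resize((new_width, new_height), Image.Resampling.HAMMING))
upscaled_image_bicubic = np.array(original_image.resize((new_width, new_height), Image.Resampling.BICUBIC))
upscaled_image_box     = np.array(original_image.resize((new_width, new_height), Image.Resampling.NEAREST))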

# OpenCV - Test a non-PIL library for upscaling
# import cv2
# upscaled_image_hamming = cv2.resize(np.array(original_image), (new_width, new_height), interpolation=cv2.INTER_AREA)

# upscaled_image.save("upscaled_image.png")  # Replace "upscaled_image.png" with the desired output file path
# Image.fromarray(upscaled_image_box).save("Not_valid_graunle.png")

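Resizing an image shifts the border coordinates relative to the upscaled pixels. When scaling by a factor $s$ (here $s = 10$), the border must also be shifted by $\phi = \frac{s}{2} - \frac{1}{2}$.

The reason: original pixel $p$ covers the interval $[p, p+1)$, so its centre $p + \frac{1}{2}$ lands at $s\left(p + \frac{1}{2}\right)$ in the upscaled image, and subtracting $\frac{1}{2}$ converts that position back to a pixel index.

A point on the border therefore maps to
$$b(p) = p \cdot s + \phi$$
$$b(p) = p \cdot s + \frac{s}{2} - \frac{1}{2}$$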
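The mapping $b(p)$ translates directly into code. In this sketch the function names are hypothetical; the formula is applied per axis, so it also covers non-uniform factors such as `1024 / original_width` and `1024 / original_height` when the cutout is resized to a fixed 1024x1024. That generalisation assumes the resampler follows the same pixel-centre convention. The last lines check the formula against integer nearest-neighbour upscaling, where each original pixel becomes a block of $s$ pixels whose centre is the mapped position.

```python
from typing import List, Sequence, Tuple


def upscale_coords(xs: Sequence[float], ys: Sequence[float],
                   sx: float, sy: float) -> Tuple[List[float], List[float]]:
    """Map cutout pixel coordinates into an image upscaled by (sx, sy): b(p) = p*s + s/2 - 1/2."""
    return ([x * sx + sx / 2 - 0.5 for x in xs],
            [y * sy + sy / 2 - 0.5 for y in ys])


def downscale_coords(xs: Sequence[float], ys: Sequence[float],
                     sx: float, sy: float) -> Tuple[List[float], List[float]]:
    """Inverse of upscale_coords, e.g. for borders predicted on the upscaled image."""
    return ([(x - sx / 2 + 0.5) / sx for x in xs],
            [(y - sy / 2 + 0.5) / sy for y in ys])


def nearest_upscale_1d(row: Sequence[int], s: int) -> List[int]:
    """Integer nearest-neighbour upscaling of one row: each pixel is repeated s times."""
    return [v for v in row for _ in range(s)]


# Check: with s = 10, original pixel 2 becomes upscaled indices 20..29, whose centre is 24.5
s = 10
up = nearest_upscale_1d([0, 0, 1, 0], s)
block = [i for i, v in enumerate(up) if v == 1]
print((block[0] + block[-1]) / 2, upscale_coords([2], [2], s, s)[0][0])  # 24.5 24.5
```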

In [6]:
from helper_functions import helper_functions as helper_f

fig = make_subplots(rows=2, cols=2, 
                    horizontal_spacing=0.01, 
                    vertical_spacing=0.1,
                    subplot_titles=('Standard image', 'Upscale NEAREST', 'Upscale HAMMING', 'Upscale BICUBIC',))

# Add images to plot
base_image_fig = px.imshow(granule_cutout_image)
fig.add_trace(base_image_fig.data[0], 1, 1)

upscaled_image_fig = px.imshow(upscaled_image_box)
fig.add_trace(upscaled_image_fig.data[0], 1, 2)

upscaled_image_fig = px.imshow(upscaled_image_hamming)
fig.add_trace(upscaled_image_fig.data[0], 2, 1)

upscaled_image_fig = px.imshow(upscaled_image_bicubic)
fig.add_trace(upscaled_image_fig.data[0], 2, 2)

# Calculate and draw the boundary for the first plot
granule_fourier = results[(results['granule_id'] == granule_id) & (results['frame'] == frame_id)]
xs,ys = helper_f.get_coords(granule_fourier, get_relative=True)
fig.add_trace(go.Scatter(x=np.append(xs,xs[0]), y=np.append(ys,ys[0]), marker=dict(color='red', size=16), name=f"100 p border {granule_id}"), row=1, col=1)

phi = scale_factor/2 -  1/2
# Add borders for upscaled images
for row,col in [(1,2),(2,1),(2,2)]:
    fig.add_trace(go.Scatter(x=phi + np.append(xs,xs[0])*scale_factor, y=phi+np.append(ys,ys[0])*scale_factor, marker=dict(color='red', size=16), name=f"100 p border {granule_id}"), row=row, col=col)
    # fig.add_trace(go.Scatter(x=scale_factor/2+np.append(xs,xs[0])*scale_factor, y=scale_factor/2+np.append(ys,ys[0])*scale_factor, marker=dict(color='red', size=16), name=f"100 p border {granule_id}"), row=row, col=col)
    # fig.add_trace(go.Scatter(x=xs, y=ys, marker=dict(color='cyan', size=16), name=f"True Border {id}"), row=1, col=1)

fig.update_layout(title_text=f"Upscaling granule cutouts - (Frame {frame_id} granule {granule_id})", title_x=0.5, width=1200, height=1000, showlegend=True, font_size=11)
# fig.add_image(new)
# mask_img = result.plot(conf=False, line_width=0, font_size=0, img=np.zeros((img_height, img_width, 3), dtype=np.uint8), kpt_radius=0, kpt_line=False, labels=False, boxes=False, masks=True, probs=False)

## Pixel borders

In [7]:
fig = make_subplots(rows=2, cols=2, 
                    horizontal_spacing=0.01, 
                    vertical_spacing=0.1,
                    subplot_titles=('Standard image', 'Upscale NEAREST', 'Upscale HAMMING', 'Upscale BICUBIC',))

# Add images to plot
base_image_fig = px.imshow(granule_cutout_image)
fig.add_trace(base_image_fig.data[0], 1, 1)

upscaled_image_fig = px.imshow(upscaled_image_box)
fig.add_trace(upscaled_image_fig.data[0], 1, 2)

upscaled_image_fig = px.imshow(upscaled_image_hamming)
fig.add_trace(upscaled_image_fig.data[0], 2, 1)

upscaled_image_fig = px.imshow(upscaled_image_bicubic)
fig.add_trace(upscaled_image_fig.data[0], 2, 2)

# Calculate and draw the boundary for the first plot
granule_fourier = results[(results['granule_id'] == granule_id) & (results['frame'] == frame_id)]
xs,ys = helper_f.get_coords(granule_fourier, get_relative=True)
# Close the border by appending the first point (appending xs[-1] would only repeat the last point)
xs2, ys2 = helper_f.pixels_between_points(np.round(np.append(xs,xs[0]),0),np.round(np.append(ys,ys[0]),0))#, precision=10)
fig.add_trace(go.Scatter(x=np.append(xs,xs[0]), y=np.append(ys,ys[0]), marker=dict(color='red', size=16), name=f"400 p border {granule_id}"), row=1, col=1)
# fig.add_trace(go.Scatter(x=np.round(np.append(xs,xs[0]),0), y=np.round(np.append(ys,ys[0]),0), marker=dict(color='cyan', size=16), name=f"Line segment border {granule_id}"), row=1, col=1)
fig.add_trace(go.Scatter(x=xs2,y=ys2, marker=dict(color='cyan', size=16), name=f"True pixel border {granule_id}"), row=1, col=1)

# Add borders for upscaled images
# NOTE: xs3/ys3 are computed from the unscaled xs, ys (the scale-factor arguments are commented out),
# so they may not line up with the upscaled images
xs3, ys3 = helper_f.pixels_between_points(xs,ys)#,100, scale_factor_x=scale_factor, scale_factor_y=scale_factor)
for row,col in [(1,2),(2,1),(2,2)]:
    fig.add_trace(go.Scatter(x=phi+np.append(xs,xs[0])*scale_factor, y=phi+np.append(ys,ys[0])*scale_factor, marker=dict(color='red', size=16), name=f"400 p border {granule_id}"), row=row, col=col)
    fig.add_trace(go.Scatter(x=xs3, y=ys3, marker=dict(color='cyan', size=16), name=f"True pixel border {granule_id}"), row=row, col=col)
    # fig.add_trace(go.Scatter(x=np.round(np.append(xs,xs[0])*scale_factor,0), y=np.round(np.append(ys,ys[0])*scale_factor,0), marker=dict(color='orange', size=16), name=f"Line segment border {granule_id}"), row=row, col=col)

fig.update_layout(title_text="Pixel borders", title_x=0.5, width=1200, height=1000, showlegend=False, font_size=11)
# fig.add_image(new)
# mask_img = result.plot(conf=False, line_width=0, font_size=0, img=np.zeros((img_height, img_width, 3), dtype=np.uint8), kpt_radius=0, kpt_line=False, labels=False, boxes=False, masks=True, probs=False)


divide by zero encountered in double_scalars


divide by zero encountered in double_scalars
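The `divide by zero encountered in double_scalars` warnings above are raised inside `helper_f.pixels_between_points`, whose implementation is not shown here. A plausible cause (an assumption) is a slope computed as dy/dx, which divides by zero for vertical segments. As a division-free alternative, the sketch below rasterises the border with Bresenham's line algorithm; `bresenham` and `polygon_pixels` are hypothetical names, and this is a swapped-in technique, not necessarily what the helper does. Like the cell above, it rounds the vertices to whole pixels, and it closes the border by joining the last vertex back to the first.

```python
from typing import List, Sequence, Tuple


def bresenham(x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """All pixels on the segment (x0, y0) -> (x1, y1), inclusive, using only integer arithmetic."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    err = dx + dy  # combined error term; no division, so vertical/zero-length segments are safe
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return pixels
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += step_x
        if e2 <= dx:  # step in y
            err += dx
            y0 += step_y


def polygon_pixels(xs: Sequence[float], ys: Sequence[float]) -> List[Tuple[int, int]]:
    """Rasterise a closed border: round the vertices, join consecutive ones, close back to the first."""
    # Python's round() rounds halves to even, like np.round
    pts = [(round(x), round(y)) for x, y in zip(xs, ys)]
    out: List[Tuple[int, int]] = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        for p in bresenham(x0, y0, x1, y1):
            if not out or out[-1] != p:  # skip the vertex shared by consecutive segments
                out.append(p)
    return out


print(polygon_pixels([0, 2, 2, 0], [0, 0, 2, 2]))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]
```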

