Scale Analysis Example
====
Using scales from Rabia Sevil, which is why this file is named like it is

We'll read the LIF files, try to segment them out and then run EFA to summarise their shape variation.
So far I've only done this on the ALP scales, since they looked like they'd be easiest to segment.

Research Question
----
#### What morphological differences are there between hom/wt and onto/regen scales?

NB:
 - hom = mutant (their bones form far quicker than normal, or something)
 - wt = wildtype, as they appear in the wild
 - onto = fully formed scales
 - regen = scales that are still growing

We'd expect the hom scales to be weird shapes and the regen scales to be smaller.

Read in the scales
----
I've segmented the scales out following the process in [the segmentation notebook](segmentation.ipynb).

This involved:
 - making an initial rough segmentation by thresholding
 - Running SAM (a transformer-based model from Meta) on the scales, using the rough segmentation as a prior
 - manually tidying them up where necessary
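
As a rough illustration of the first step only, here is a minimal sketch of a global threshold on a synthetic image. The function name `rough_threshold_mask` and the midpoint rule are assumptions made for this example; the actual thresholding used in the segmentation notebook may well differ.

```python
import numpy as np


def rough_threshold_mask(image, threshold=None):
    """Return a boolean foreground mask from a greyscale image.

    If no threshold is given, use the midpoint between the darkest and
    brightest pixel (an arbitrary, illustrative choice).
    """
    image = np.asarray(image, dtype=float)
    if threshold is None:
        threshold = 0.5 * (image.min() + image.max())
    return image > threshold


# Synthetic example: a bright 4x4 square on a dark 8x8 background
toy_image = np.zeros((8, 8))
toy_image[2:6, 2:6] = 200.0
toy_mask = rough_threshold_mask(toy_image)
print(toy_mask.sum())  # number of foreground pixels
```

A mask like this is only a starting point; in the workflow above it is the prior that SAM then refines.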

In [None]:
"""
Read in the images + various segmentations
"""

import pathlib
import tifffile

segmentation_dir = pathlib.Path("segmentation")
assert segmentation_dir.is_dir()

dirs = [
    segmentation_dir / "images",
    segmentation_dir / "mask_priors",
    segmentation_dir / "sam_masks",
    segmentation_dir / "cleaned_masks",
]

names, images, rough_segmentations, sam_segmentations, clean_segmentations = (
    [],
    [],
    [],
    [],
    [],
)
for img_path, rough_path, sam_path, clean_path in zip(
    *(sorted(list(d.glob("*.tif"))) for d in dirs)
):
    name = img_path.name
    assert name == rough_path.name == sam_path.name == clean_path.name

    images.append(tifffile.imread(img_path))
    rough_segmentations.append(tifffile.imread(rough_path))
    sam_segmentations.append(tifffile.imread(sam_path))
    clean_segmentations.append(tifffile.imread(clean_path))
    names.append(name)

In [None]:
import textwrap
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)
# Pick some different looking scales
for axis, i in zip(axes.flat, (0, 90, 180, 270)):
    axis.imshow(images[i])
    axis.set_title("\n".join(textwrap.wrap(names[i], width=20)), fontsize=8)
    axis.set_axis_off()
fig.suptitle("This is what a scale looks like")
fig.tight_layout()

In [None]:
"""
Show a couple of the segmentations at different stages on top of the actual images
"""


def plot_masks(masks, title):
    fig, axes = plt.subplots(6, 6, figsize=(6, 6))
    for axis, img, mask in zip(axes.flat, images, masks):
        axis.imshow(img, cmap="gray")  # "gray" is accepted by all matplotlib versions
        axis.imshow(mask, alpha=0.5, cmap="Reds")
        axis.set_axis_off()

    fig.suptitle(title)
    fig.tight_layout()


plot_masks(
    rough_segmentations, "First we threshold to get a rough idea of the scale shape"
)

In [None]:
plot_masks(sam_segmentations, "Then we use the SAM model to refine the prediction")

In [None]:
plot_masks(
    clean_segmentations,
    "Then I went through by hand and cleaned them up a little in places",
)

Elliptical Fourier Analysis
----
We'll summarise their shapes using Elliptical Fourier Analysis (EFA)
<a name="cite_ref-1"></a><sup>[1]</sup>
<a name="cite_ref-2"></a><sup>[2]</sup>,
which basically decomposes the boundary into sums of ellipses.
The coefficients (strength and direction of each size of ellipse) tell us about the shape of the object.
There's a demonstration of how this works [here](https://reinvantveer.github.io/2019/07/12/elliptical_fourier_analysis.html).

Our edge is constructed as:

\begin{aligned}
x(t) &= a_0 + \sum_{n=1}^{N} \big[a_n \cos(n t) + b_n \sin(n t)\big],\\
y(t) &= c_0 + \sum_{n=1}^{N} \big[c_n \cos(n t) + d_n \sin(n t)\big],
\qquad t \in [0, 2\pi].
\end{aligned}

with:

\begin{aligned}
a_0 = \frac{1}{2\pi}\int_{0}^{2\pi} x(t)\,dt,\qquad
c_0 = \frac{1}{2\pi}\int_{0}^{2\pi} y(t)\,dt.
\end{aligned}

\begin{aligned}
a_n &= \frac{1}{\pi}\int_{0}^{2\pi} x(t)\cos(n t)\,dt, &
b_n &= \frac{1}{\pi}\int_{0}^{2\pi} x(t)\sin(n t)\,dt,\\
c_n &= \frac{1}{\pi}\int_{0}^{2\pi} y(t)\cos(n t)\,dt, &
d_n &= \frac{1}{\pi}\int_{0}^{2\pi} y(t)\sin(n t)\,dt.
\end{aligned}

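These are the standard Fourier-series coefficients for a contour parameterised by $t \in [0, 2\pi]$. Implementations that parameterise the contour by arc length instead (as Kuhl and Giardina do) use different normalisation factors, so the exact prefactors may differ.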

The $a_0$ and $c_0$ coefficients tell us about the locus/centroid of the object - i.e., its centre - which we don't care about here, since the relative position of the scale doesn't matter (we only care about its shape). We therefore only use the coefficients starting from $a_1$ etc.
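
To make the definitions above concrete, here is a minimal numerical sketch that approximates the integrals with sums over equally spaced samples of $t$, then rebuilds the contour from the coefficients. The helper names `efa_coefficients` and `efa_reconstruct` are made up for this example and are not the functions in `scale_morphology`; the test contour is a synthetic ellipse whose coefficients are known in advance.

```python
import numpy as np


def efa_coefficients(x, y, order):
    """Approximate a_n, b_n, c_n, d_n (n = 1..order) for a closed contour
    sampled at M equally spaced values of t in [0, 2*pi)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = len(x)
    t = 2 * np.pi * np.arange(m) / m
    n = np.arange(1, order + 1)[:, None]  # shape (order, 1)
    cos_nt, sin_nt = np.cos(n * t), np.sin(n * t)  # shape (order, M)
    # (1/pi) * integral over [0, 2*pi]  ~  (2/M) * sum over the samples
    return (2 / m) * np.stack(
        [cos_nt @ x, sin_nt @ x, cos_nt @ y, sin_nt @ y], axis=1
    )


def efa_reconstruct(coeffs, centre, t):
    """Evaluate x(t), y(t) from an (order, 4) coefficient array."""
    n = np.arange(1, len(coeffs) + 1)[:, None]
    a, b, c, d = (coeffs[:, k][:, None] for k in range(4))
    x = centre[0] + np.sum(a * np.cos(n * t) + b * np.sin(n * t), axis=0)
    y = centre[1] + np.sum(c * np.cos(n * t) + d * np.sin(n * t), axis=0)
    return x, y


# Synthetic ellipse centred at (3, 1): only a_1 = 2 and d_1 = 0.5 are non-zero
t_toy = 2 * np.pi * np.arange(64) / 64
x_toy = 3 + 2 * np.cos(t_toy)
y_toy = 1 + 0.5 * np.sin(t_toy)

ellipse_coeffs = efa_coefficients(x_toy, y_toy, order=3)
x_rec, y_rec = efa_reconstruct(ellipse_coeffs, (x_toy.mean(), y_toy.mean()), t_toy)
print(np.round(ellipse_coeffs, 3))
```

Moving the ellipse only changes the sample means (which play the role of $a_0$ and $c_0$ here), not the rows of `ellipse_coeffs`, which is why the higher coefficients describe shape independently of position.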

In [None]:
"""
First tidy the masks up a little, because I broke some of them when cleaning them
"""

import numpy as np
from scipy.ndimage import binary_fill_holes
from scale_morphology.scales.segmentation import largest_connected_component

masks = [
    255 * largest_connected_component(binary_fill_holes(m)).astype(np.uint8)
    for m in clean_segmentations
]

In [None]:
"""
Perform EFA on the scales and plot the reconstruction
"""

import numpy as np
from tqdm.notebook import tqdm
from scale_morphology.scales import efa, errors, segmentation


n_edge_points = 100
order = 30

coeffs = []
for scale in tqdm(masks):
    try:
        coeffs.append(efa.coefficients(scale, n_edge_points, order))
    except errors.BadImgError as e:
        coeffs.append(np.full((order, 4), np.nan))
        print(f"\nError processing scale: {e}. NaN coeffs")
coeffs = np.stack(coeffs)

In [None]:
from scale_morphology.scales import plotting

i = 100
fig, axis = plt.subplots(figsize=(8, 8))
axis.imshow(images[i].sum(axis=2).T, origin="lower", cmap="gray")
axis.set_aspect("equal")

locus = np.mean(np.where(masks[i] > 0), axis=1)
plotting.plot_efa(
    locus,
    coeffs[i],
    label="Elliptic Expansion best fit",
    linewidth=3,
    color="#ff00fa",
    axis=axis,
)

x, y = efa.points_around_edge(masks[i], n_edge_points)
axis.plot(x, y, "#00ff05", markersize=3, label="Edges", marker="o", linestyle="none")

axis.set_axis_off()
axis.legend()

fig.suptitle("Scale, edge points and reconstruction")

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

images = np.array(masks)
flat_coeffs = coeffs.reshape((coeffs.shape[0], -1))

pca = PCA(n_components=2)
transformed_coeffs = np.ascontiguousarray(pca.fit_transform(flat_coeffs))

In [None]:
def _colour(name):
    name = name.lower()
    if "hom" in name:
        if "ontogenetic" in name:
            return "Hom Onto"
        return "Hom Regen"
    if "ontogenetic" in name:
        return "WT Onto"
    return "WT Regen"


colours = []
for name in names:
    colours.append(str(_colour(name)))
colours = np.array(colours)

fig, axis = plt.subplots()
for c in np.unique(colours):
    axis.scatter(*transformed_coeffs[colours == c].T, label=c)

axis.legend()
axis.set_xlabel(f"PC1 ({100*pca.explained_variance_ratio_[0]:.2f}% variance)")
axis.set_ylabel(f"PC2 ({100*pca.explained_variance_ratio_[1]:.2f}% variance)")
fig.suptitle("PCA")
fig.tight_layout()

In [None]:
from scale_morphology.scripts.plotting import interpret_dimensions
from IPython.display import Image

interpret_dimensions._plot_pca_importance(
    flat_coeffs, np.zeros(flat_coeffs.shape[0], dtype=bool)
)
Image(filename="../../output/interpretation/importance.png")

In [None]:
sizes = [np.sum(m) / 255 for m in tqdm(np.array(masks))]

In [None]:
plt.plot(sizes, transformed_coeffs[:, 0], ".")
plt.xlabel("Scale area (pixels)")
plt.ylabel("PC1")
plt.title("The first principal component tells us about size")
plt.tight_layout()

plot_dir = pathlib.Path("rabia")
plot_dir.mkdir(exist_ok=True)
plt.savefig(plot_dir / "sizes.png")

In [None]:
# interpret_dimensions._correlation_plot(transformed_coeffs, coeffs)
# Image(filename="../../output/interpretation/correlation.png", )

In [None]:
# n_to_plot = 12
# fig, axes = plt.subplots(1, 2)
#
# for axis, component in zip(axes, pca.components_):
#     axis.bar(np.arange(n_to_plot), component[:n_to_plot])

LDA
----
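LDA (Linear Discriminant Analysis) is better suited to separating our classes than PCA: PCA finds the axes of greatest overall variance without looking at the labels, whereas LDA uses the labels to find the axes that best separate the groups.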
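
The difference is easiest to see on toy data. The sketch below uses two synthetic classes (not the scale data) that share a lot of spread along x but are separated along y, and compares the first PCA axis with a two-class Fisher discriminant computed by hand. It uses plain NumPy rather than scikit-learn, so the numbers will not match `LinearDiscriminantAnalysis` exactly, but the direction should.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic classes: lots of shared spread along x, separated along y
class_a = rng.normal(loc=[0.0, -1.0], scale=[5.0, 0.3], size=(200, 2))
class_b = rng.normal(loc=[0.0, 1.0], scale=[5.0, 0.3], size=(200, 2))
points = np.vstack([class_a, class_b])

# PCA: direction of greatest overall variance (labels are ignored)
centred = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
pca_axis = vt[0]

# Two-class Fisher LDA: w is proportional to S_w^{-1} (mean_b - mean_a)
within = np.cov(class_a, rowvar=False) + np.cov(class_b, rowvar=False)
lda_axis = np.linalg.solve(within, class_b.mean(axis=0) - class_a.mean(axis=0))
lda_axis /= np.linalg.norm(lda_axis)

print("PCA axis:", np.round(pca_axis, 2))
print("LDA axis:", np.round(lda_axis, 2))
```

Here the PCA axis lies almost along x (the direction of largest spread), while the discriminant axis lies almost along y (the direction that separates the classes).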

In [None]:
fig, axis = plt.subplots()
idx = np.arange(flat_coeffs.shape[1])

val = 0.05
variances = flat_coeffs.var(axis=0)  # avoid shadowing the builtin vars()
cutoff = 1000

low = (variances < val) & (variances < cutoff)
high = (variances > val) & (variances < cutoff)
axis.bar(idx[low], variances[low])
axis.bar(idx[high], variances[high])
axis.set_ylabel("variance")
axis.set_title("Do we want to drop low-variance harmonics?")

In [None]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis()
lda_coeffs = lda.fit_transform(flat_coeffs, colours)

In [None]:
from itertools import combinations

ld_names = [f"LD{i}" for i in range(1, lda_coeffs.shape[1] + 1)]
explained_variance = lda.explained_variance_ratio_

ld_df = pd.DataFrame(lda_coeffs, columns=ld_names)
ld_df["colour"] = colours
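# str.strip(".tif") removes any of the characters ".tif" from both ends,
# not the suffix, so use removesuffix (Python 3.9+) instead
ld_df["name"] = [x.removesuffix(".tif") for x in names]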

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for axis, (i, j) in zip(axes, combinations(range(1, 4), 2)):  # 3 choose 2
    x, y = f"LD{i}", f"LD{j}"
    for cls in np.unique(colours):
        d = ld_df[ld_df["colour"] == cls]
        axis.scatter(d[x], d[y], s=25, label=cls)

    # Label each panel once, rather than once per class
    axis.set_xlabel(f"{x} ({100*explained_variance[i-1]:.1f}% variance)")
    axis.set_ylabel(f"{y} ({100*explained_variance[j-1]:.1f}% variance)")

# NB: this is scored on the same data the LDA was fitted to
fig.suptitle(f"Training-set accuracy: {100*lda.score(flat_coeffs, colours):.1f}%")
axes[1].legend()
fig.tight_layout()

But what do these axes mean?
----
Ideally we want to understand intuitively what it means for LD1 to be the best axis for discrimination - e.g. does this mean that the Hom Onto scales are bigger/flatter/lumpier/... etc.?

We can do this in two ways:
 1. Analytically: by looking at the components of the LDA axes, we might be able to work out what they mean (e.g. if they are dominated by the $a_1$ and $d_1$ coefficients, they probably describe size or circularity). This is hard.
 2. Empirically: we can take a grid of points along our LDA axes, project them back into the original coefficient space and then draw the shapes that these correspond to. This means we should be able to see by eye what each axis corresponds to.

### Analytically
The LD axes are linear combinations of the EFA coefficient axes - we'll plot the strength of the components of each EFA coefficient in our LD axes to see if anything jumps out.
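
When reading these plots it helps to translate a column of the flattened coefficient array back into a harmonic and a letter. The sketch below assumes the `(order, 4)` array is flattened row by row and that each row is ordered $(a_n, b_n, c_n, d_n)$, as in `pyefd`; the helper name `coefficient_label` is made up for this example.

```python
def coefficient_label(flat_index):
    """Map a column of the flattened (order, 4) array to a name like 'b_2'.

    Assumes row-major flattening and (a, b, c, d) ordering within each row.
    """
    harmonic, position = divmod(flat_index, 4)
    return f"{'abcd'[position]}_{harmonic + 1}"


print([coefficient_label(k) for k in range(6)])
```

With `order = 30` there are 120 columns, so the last one (index 119) corresponds to $d_{30}$.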

In [None]:
from matplotlib.colors import LogNorm

fig, axis = plt.subplots(figsize=(12, 6))

im = axis.imshow(np.abs(lda.scalings_).T, aspect="auto", norm=LogNorm())
axis.set_xlabel("EFA Coefficient N")
axis.set_ylabel("LDA axis")

axis.set_yticks(range(3), [f"LD{i}" for i in range(1, 4)])

cbar = fig.colorbar(im)
cbar.ax.set_ylabel("Importance", rotation=-90)

fig.suptitle(
    "Only the first few EFA coefficients are important for separating the classes"
)

In [None]:
n_components = 4
fig, axes = plt.subplots(3, 1, figsize=(6, 6), sharex=True)

for i, (axis, strengths) in enumerate(zip(axes, lda.scalings_.T)):
    p = axis.bar(np.arange(n_components), strengths[:n_components], width=0.2)
    axis.bar_label(p, label_type="edge")
    axis.set_xlabel("")
    axis.set_xticks(range(4), [f"${x}_1$" for x in "abcd"])
    axis.axhline(0, color="k", linestyle="--")

axes[2].set_xlabel("EFA Coeff")
fig.supylabel("Contribution to LDA axes")

fig.suptitle(
    "Only the coefficients $b_1$ and $c_1$ contribute to separating the scale shapes"
)

fig.tight_layout()

What does this mean?
----
We can work out what this means by making some "pure" axes of just these things:
 1. Component 1 - almost equal contribution from $b_1$ and $c_1$
 2. Component 2 - 6:1 ratio of contributions from $b_1$ and $c_1$
 3. Component 3 - 1:-1 ratio of contributions from $b_1$ and $c_1$

Going back to our definitions, this means our components are:

 1. $x = k \sin(t),\ y = k \cos(t) \implies x^2 + y^2 = k^2 \implies$ a circle (clockwise)
 2. $x = 3k \sin(t),\ y = 0.5k \cos(t) \implies x^2 + 36y^2 = 9k^2 \implies$ an ellipse with eccentricity $\frac{\sqrt{35}}{6}$
 3. $x = 0.5k \sin(t),\ y = -0.5k \cos(t) \implies 4x^2 + 4y^2 = k^2 \implies$ a smaller circle (now anticlockwise)
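
These three curves can be sanity-checked numerically with $k = 1$. The sketch below samples each curve, estimates its eccentricity from the largest and smallest distance to the origin, and uses the sign of the shoelace area to get the direction of travel. The helper names are made up for this example, and "clockwise" assumes a y-axis that points up (image coordinates would flip it).

```python
import numpy as np


def first_harmonic(b1, c1, n_points=400):
    """Sample x = b1*sin(t), y = c1*cos(t) at equally spaced t in [0, 2*pi)."""
    t = 2 * np.pi * np.arange(n_points) / n_points
    return b1 * np.sin(t), c1 * np.cos(t)


def signed_area(x, y):
    """Shoelace formula: positive for anticlockwise, negative for clockwise."""
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)


def eccentricity(x, y):
    """Eccentricity of an origin-centred ellipse from its extreme radii."""
    r = np.hypot(x, y)
    return np.sqrt(1 - (r.min() / r.max()) ** 2)


for label, (b1, c1) in {"1": (1, 1), "2": (3, 0.5), "3": (0.5, -0.5)}.items():
    x, y = first_harmonic(b1, c1)
    direction = "anticlockwise" if signed_area(x, y) > 0 else "clockwise"
    print(f"Component {label}: eccentricity {eccentricity(x, y):.3f}, {direction}")
```
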

If we draw these out...

In [None]:
"""
Make some "toy" coefficients only up to first-order to illustrate the difference between these shapes
"""

from pyefd import reconstruct_contour

# Define our axes
# Scaled to have the same approximate relative magnitude as we found
axis_1 = np.array([[0, 1, 1, 0]])
axis_2 = np.array([[0, 3, 0.5, 0]])
axis_3 = np.array([[0, 0.5, -0.5, 0]])

# We'll make 27 example shapes: every combination of -1, 0 and 1 along each axis
toy_coeffs = []
co_ords = []
indices = [-1, 0, 1]
for i in indices:
    for j in indices:
        for k in indices:
            toy_coeffs.append(i * axis_1 + j * axis_2 + k * axis_3)
            co_ords.append(np.array([i, j, k]))

# Convert these to shapes
contours = [reconstruct_contour(c) for c in toy_coeffs]

In [None]:
"""
Plot them
"""

fig, axes = plt.subplot_mosaic(
    """
    ABC.DEF.GHI
    JKL.MNO.PQR
    STU.VWX.YZ1
    """,
    figsize=(8, 4),
    sharex=True,
    sharey=True,
)

for axis, c, co_ord in zip(axes.values(), contours, co_ords):
    axis.scatter(*c.T, s=0.5)
    axis.set_axis_off()
    axis.set_title(co_ord)

fig.tight_layout()

### Empirically
Instead of trying to work out the shapes analytically, we can instead project our axes back into EFD coefficients and plot the resulting shapes.

In [None]:
"""
Turn our LDA axes back into shapes - see what they look like.
"""

axis_1, axis_2, axis_3 = lda.scalings_.T

# We'll make 27 example shapes: every combination of -1, 0 and 1 along each axis
toy_coeffs = []
co_ords = []
indices = [-1, 0, 1]
for i in indices:
    for j in indices:
        for k in indices:
            toy_coeffs.append(
                np.array([i * axis_1 + j * axis_2 + k * axis_3]).reshape((-1, 4))
            )
            co_ords.append(np.array([i, j, k]))

# Convert these to shapes
contours = [reconstruct_contour(c) for c in toy_coeffs]

In [None]:
fig, axes = plt.subplot_mosaic(
    """
    ABC.DEF.GHI
    JKL.MNO.PQR
    STU.VWX.YZ1
    """,
    figsize=(8, 4),
    sharex=True,
    sharey=True,
)

for axis, c, co_ord in zip(axes.values(), contours, co_ords):
    axis.scatter(*c.T, s=0.5)
    axis.set_axis_off()
    axis.set_title(co_ord)

fig.tight_layout()

Ok it's pretty hard to interpret these.

Instead, we'll find the average of each class along the LDA axes, back-project these into EFD coefficients and find what shapes they represent.
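
One caveat when back-projecting: a projection matrix is generally not orthonormal, so multiplying by it again is not an exact inverse. A minimal sketch of one alternative, using the pseudo-inverse, is below. It uses a random stand-in matrix rather than a fitted model, and it assumes the forward map has the form `(x - mean) @ scalings`, which as far as I know is what scikit-learn's LDA does with its default `svd` solver; all the names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_axes = 12, 3

# Stand-ins for a fitted model: a projection matrix and the feature mean
scalings = rng.normal(size=(n_features, n_axes))  # hypothetical, not fitted
feature_mean = rng.normal(size=n_features)


def project(x):
    """Forward map: coefficients -> position along the discriminant axes."""
    return (x - feature_mean) @ scalings


def back_project(z):
    """Coefficients closest to the mean whose projection is exactly z."""
    return z @ np.linalg.pinv(scalings) + feature_mean


class_mean = np.array([1.0, -2.0, 0.5])
recovered = back_project(class_mean)
print(np.allclose(project(recovered), class_mean))
```

Many different coefficient vectors project to the same point, so this picks just one of them (the one closest to the overall mean); it is a way of visualising a typical shape for a class, not a unique reconstruction.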


In [None]:
"""
Find the mean of each class along the LDA axes
"""

classes = np.unique(colours)
class_coords = {c: lda_coeffs[colours == c].mean(axis=0) for c in classes}

fig, axis = plt.subplots(subplot_kw={"projection": "3d"})

for name, c in class_coords.items():
    axis.scatter(*c, label=name)

axis.legend()
fig.suptitle("Class averages in LDA space")
fig.tight_layout()

In [None]:
"""
Project these back into EFD coefficients
"""

# NB: scalings_ is not orthonormal, so multiplying by it is only a rough
# back-projection and not an exact inverse of the LDA transform
projected_classes = [
    np.dot(lda.scalings_, x).reshape(order, 4) for x in class_coords.values()
]

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(8, 8))

for axis, efd_coeffs, name in zip(axes.flat, projected_classes, classes):
    plotting.plot_efa((0, 0), efd_coeffs, axis=axis, marker=".")
    axis.set_aspect("equal")
    axis.set_axis_off()
    axis.set_title(name)

[^1](#cite_ref-1): F. P. Kuhl and C. R. Giardina, 'Elliptic Fourier features of a closed contour', Computer Graphics and Image Processing, vol. 18, no. 3, pp. 236–258, Mar. 1982, doi: 10.1016/0146-664x(82)90034-x.

[^2](#cite_ref-2): N. MacLeod, 'PalaeoMath 101 part 25: the centre cannot hold II: elliptic Fourier analysis', Palaeontol. Assoc. Newslett., vol. 79, pp. 29–43, 2012, http://go.palass.org/65a.

Appendix 1 - dashboard
----
It might be interesting to plot a dashboard showing the dimensionality reduction interactively.

In [None]:
from scale_morphology.scales import dashboard

dashboard.write_dashboard(
    lda_coeffs[:, 0:2],
    images,
    colours,
    names,
    "test_dashboard.html",
    "LDA Components 1&2",
)