# Introduction

This notebook explores the shapelet transformers implemented in aeon. For a wider introduction to shapelets, please read Antoine's notebook first.

If you want to learn about x, y and z, go to Antoine's notebook.

Here you'll learn about a, b and c.

# A little bit about each transformer
- talk generally about the purpose of transforms as opposed to shapelet trees
- talk about their params and logic and contributions and flaws (write in order of release)

In [1]:
import warnings

warnings.filterwarnings("ignore")

from aeon.registry import all_estimators

for k, v in all_estimators("transformer", filter_tags={"algorithm_type": "shapelet"}):
    print(f"{k}: {v}")  # TODO: SAST is not appearing here waiting for PR to merge

RSAST: <class 'aeon.transformations.collection.shapelet_based._rsast.RSAST'>
RandomDilatedShapeletTransform: <class 'aeon.transformations.collection.shapelet_based._dilated_shapelet_transform.RandomDilatedShapeletTransform'>
RandomShapeletTransform: <class 'aeon.transformations.collection.shapelet_based._shapelet_transform.RandomShapeletTransform'>


## Shapelet transform

## Random Dilated Shapelet Transform

## Scalable and Accurate Subsequence Transform

## Random Scalable and Accurate Subsequence Transform

# The Gun/No Gun classification problem

The Gun/NoGun motion capture time series dataset is perhaps the most studied time
series classification problem in the literature.

This dataset involves one female actor and one male actor making a motion with their hand, sometimes holding a gun and sometimes not. The classification problem is to determine whether they were holding a prop or just miming the action. The problem is made somewhat more complicated by the fact that the two actors differ in height (by 12 inches) and “style”.

The two classes are:

Gun-Draw:
- the actors have their hands by their sides. They draw a replica gun from a hip-mounted holster, point it at a target for approximately one second, then return the gun to the holster, and their hands to their sides.

Point:
- the actors have their hands by their sides. They point with their index fingers to a target for approximately one second, and then return their hands to their sides.

For both classes, the study tracked the centroid of the actor's right hand in both the X- and Y-axes, which appear to be highly correlated. Because of this, the data in the archive is just the X-axis, making this a univariate time series. Class 1 is "gun" and class 2 is "no gun (pointing)".

In [None]:
import numpy as np

from aeon.datasets import load_classification

X_gun_train, y_gun_train = load_classification("GunPoint", split="train")
X_gun_test, y_gun_test = load_classification("GunPoint", split="test")

X_gun_full = np.concatenate((X_gun_train, X_gun_test), axis=0)

print(f"Shape of the dataset: {X_gun_full.shape}")
print(f"Number of channels = {X_gun_train.shape[1]}")
print(f"Length of each time series = {X_gun_train.shape[2]}")
print(f"Number of training samples = {X_gun_train.shape[0]}")
print(f"Number of testing samples = {X_gun_test.shape[0]}")

As you can see, we have 200 different time series, each 150 datapoints long. The train/test split follows the original paper with 50 samples taken for training and the rest for testing, with each actor and class being equally represented in each. 

Note: Time series classification follows its own train/test split conventions rather than the 70/30 split common in wider ML. Eamonn, who set up the archive, chose to make the train sets smaller so that the classification problems would be even harder to solve!

---


The two graphs below have the time series from the dataset plotted for each class.

*Can we find a big difference between class 1 and class 2?* The data is noisy, so separating the classes using global patterns is hard. Shapelets instead let us find locally discriminative subsequences that distinguish the two.
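
To make the global-versus-local argument concrete, here is a minimal sketch on synthetic data (not GunPoint). The helper `min_window_distance` and the three toy series are illustrative assumptions: two series contain the same bump at different positions, and a third contains no bump at all.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def min_window_distance(series, pattern):
    """Smallest Euclidean distance between `pattern` and any same-length window."""
    windows = sliding_window_view(series, len(pattern))
    return float(np.sqrt(((windows - pattern) ** 2).sum(axis=1)).min())


bump = np.array([1.0, 2.0, 1.0])

# Three synthetic series of length 40 (illustrative only).
early = np.zeros(40)
early[10:13] = bump  # bump near the start
late = np.zeros(40)
late[30:33] = bump  # the same bump, shifted
flat = np.zeros(40)  # no bump at all

# Global comparison: point-by-point Euclidean distance between whole series.
global_early_late = float(np.linalg.norm(early - late))
global_early_flat = float(np.linalg.norm(early - flat))

# Local comparison: best match of the bump anywhere in each series.
local = {
    name: min_window_distance(series, bump)
    for name, series in [("early", early), ("late", late), ("flat", flat)]
}

print(f"global early vs late: {global_early_late:.3f}")
print(f"global early vs flat: {global_early_flat:.3f}")
print(local)
```

Globally, the early series looks closer to the flat series than to the late one, even though early and late share the same pattern. The local distance groups the two bump series together regardless of where the bump occurs.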

In [None]:
import matplotlib.pyplot as plt

class_1_indices = []
class_2_indices = []

# Populate the class-specific lists
for i in range(len(y_gun_train)):
    if y_gun_train[i] == "1":
        class_1_indices.append(i)
    elif y_gun_train[i] == "2":
        class_2_indices.append(i)

# Create a figure arranged horizontally
fig, axs = plt.subplots(1, 2, figsize=(15, 6), sharey=True)

# Plot the first class (class 1 = gun)
for i in class_1_indices:
    axs[0].plot(X_gun_train[i][0])
axs[0].set_title("Time series with Gun in the dataset.")
axs[0].set_ylim(-3, 2.5)  # Set the y-axis range for comparability
axs[0].legend(["Class 1"])

# Plot the second class (class 2 = no gun)
for i in class_2_indices:
    axs[1].plot(X_gun_train[i][0])
axs[1].set_title("Time series with No Gun in the dataset.")
axs[1].set_ylim(-3, 2.5)  # Set the y-axis range for comparability
axs[1].legend(["Class 2"])

plt.tight_layout()
plt.show()

You can roughly make out two groups of time series, one for the female actor and one for the male actor. As described for the dataset, the male actor is 12 inches taller, which shows up as the taller and shorter groups of series.

# Discuss how we will explore the transform, compare them, and hopefully understand something about the motion data

- track how long it takes to fit the transform
- show the head of a pandas df to explain how the transform looks
- generate 10 shapelets for each transform and plot all 10 on a graph, keeping their positions (don't just layer them)
- choose a shapelet for each class and do the three plots from the visualisation tool
    - provide some insights on these

- compare the shapelets
    - compare their lengths
    - going to fit a classifier to help us rank shapelets

- run through different transformers with different configurations
    - this is an optional extension (I think the above is sufficient for the notebook)
        - if a parameter obviously improves a transform, then use it

- Note: useful visualisation tools are ShapeletVisualizer, ShapeletTransformVisualizer
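
For the timing step in the plan above, a small helper keeps the measurements consistent across transforms. This is a sketch only: the `timed` context manager and the `fit_times` dictionary are hypothetical names, and the summation stands in for a real `fit` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


fit_times = {}
with timed("dummy_fit", fit_times):
    sum(i * i for i in range(100_000))  # stand-in for transformer.fit(X, y)

print({name: f"{seconds:.4f} s" for name, seconds in fit_times.items()})
```

`time.perf_counter` is a monotonic, high-resolution clock, which makes it better suited to measuring short intervals than `time.time`.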

# Fitting the Transforms to the training data

Here is a dataframe representing the testing data. Each row is a time series and each column is the value at one time point.

In [None]:
import pandas as pd

# Each series is univariate, so drop the channel axis:
# rows are time series, columns are time points.
pd.DataFrame(X_gun_test[:, 0, :])

#### Random Shapelet Transform

The aeon implementation of the algorithm matches the experimental parameters explored for the GunPoint problem; the only parameter that needed to be set was max_shapelets = 10.

The paper filtered the GunPoint dataset using the length parameters specified in the original shapelet paper to allow for a fair comparison between the two methods. For MAXLEN, the original paper always set the longest possible length to the length of the shortest time series in the dataset. For MINLEN, it hardcoded the shortest possible length to three, since three is the minimum meaningful length.
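
To get a feel for what these length bounds imply, the sketch below counts how many candidate subsequences a full enumeration would visit on the GunPoint training set (50 series of length 150, MINLEN = 3, MAXLEN = 150). The function name `count_candidates` is illustrative and not part of aeon.

```python
def count_candidates(n_series, series_length, min_len=3, max_len=None):
    """Number of (series, start, length) candidates in a full enumeration."""
    if max_len is None:
        max_len = series_length
    # A window of a given length can start at series_length - length + 1 positions.
    per_series = sum(
        series_length - length + 1 for length in range(min_len, max_len + 1)
    )
    return n_series * per_series


n_candidates = count_candidates(n_series=50, series_length=150)
print(n_candidates)  # 551300
```

Each series contributes 148 + 147 + ... + 1 = 11026 candidates, which is why randomly sampling candidates is attractive compared with evaluating all of them.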


You can use the transform directly in aeon, but we will mostly explore via the transform classifier because it lets us rank the shapelets.

In [None]:
import time

import pandas as pd

from aeon.transformations.collection.shapelet_based import RandomShapeletTransform

start_time = time.time()
rst = RandomShapeletTransform(max_shapelets=10, random_state=99).fit(
    X_gun_train, y_gun_train
)
end_time = time.time()

# Calculate and print the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to fit: {elapsed_time:.4f} seconds")

pd.DataFrame(rst.transform(X_gun_test)).head()

The table shows what the transformed data looks like: each row is a test series and each column holds its distance to one of the 10 shapelets.
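
As a rough illustration of where such a table comes from, the sketch below computes a shapelet-distance feature matrix with plain NumPy. It is an assumption-laden simplification and not aeon's implementation: the helpers `z_normalise`, `shapelet_distance` and `shapelet_transform` are hypothetical, the data is random, and the shapelets are simply cut out of two of the series.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def z_normalise(a, eps=1e-8):
    """Z-normalise along the last axis; eps guards against constant windows."""
    mean = a.mean(axis=-1, keepdims=True)
    std = a.std(axis=-1, keepdims=True)
    return (a - mean) / (std + eps)


def shapelet_distance(series, shapelet):
    """Minimum distance between the shapelet and every same-length window."""
    windows = z_normalise(sliding_window_view(series, len(shapelet)))
    diffs = windows - z_normalise(shapelet)
    return float(np.sqrt((diffs**2).sum(axis=1)).min())


def shapelet_transform(X, shapelets):
    """Map X of shape (n_cases, n_timepoints) to (n_cases, n_shapelets)."""
    return np.array([[shapelet_distance(s, sh) for sh in shapelets] for s in X])


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 30))  # four random univariate series
shapelets = [X[0, 5:12].copy(), X[2, 10:20].copy()]  # cut from series 0 and 2

features = shapelet_transform(X, shapelets)
print(features.shape)
print(features.round(3))
```

A series that contains a shapelet gets a distance of (numerically) zero in that shapelet's column, while other series get larger values. A downstream classifier then works on these columns instead of the raw time points.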

In [None]:
import matplotlib.pyplot as plt

# Define the class of interest
gun_class = "1"
nogun_class = "2"

shapelets = rst.shapelets

# Each shapelet is a tuple: index 2 is its start position in the source series,
# index 5 is its class label and index 6 holds its values.
shapelet_gun_vals = []
shapelet_gun_pos = []

shapelet_nogun_vals = []
shapelet_nogun_pos = []

for shapelet in shapelets:
    if shapelet[5] == gun_class:  # Filter by class
        shapelet_gun_vals.append(shapelet[6])
        shapelet_gun_pos.append(shapelet[2])
    elif shapelet[5] == nogun_class:
        shapelet_nogun_vals.append(shapelet[6])
        shapelet_nogun_pos.append(shapelet[2])

# Create a figure with 2 subplots arranged horizontally
fig, axs = plt.subplots(1, 2, figsize=(15, 6), sharey=True)

# Plot the shapelets extracted from the gun class
for i in range(len(shapelet_gun_vals)):
    x_values = [x + shapelet_gun_pos[i] for x in range(len(shapelet_gun_vals[i]))]
    axs[0].plot(x_values, shapelet_gun_vals[i])

axs[0].set_title("Shapelets from rst (class 1, gun)")
axs[0].set_xlabel("Position")
axs[0].set_ylabel("Values")

# Plot the shapelets extracted from the no-gun class
for i in range(len(shapelet_nogun_vals)):
    x_values = [x + shapelet_nogun_pos[i] for x in range(len(shapelet_nogun_vals[i]))]
    axs[1].plot(x_values, shapelet_nogun_vals[i])

axs[1].set_title("Shapelets from rst (class 2, no gun)")
axs[1].set_xlabel("Position")
axs[1].set_ylabel("Values")


# Adjust layout to prevent overlap
plt.tight_layout()
plt.show()

Here, we train a random forest and a ridge classifier on the transformed data. The purpose is not to evaluate classification performance but to use the importance each classifier assigns to the shapelet features to help compare shapelets later on.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifierCV

from aeon.classification.shapelet_based import ShapeletTransformClassifier

rst_rf = ShapeletTransformClassifier(
    estimator=RandomForestClassifier(ccp_alpha=0.01), max_shapelets=10, random_state=99
).fit(X_gun_train, y_gun_train)
# rst_rf._transformer = rst

rst_rccv = ShapeletTransformClassifier(
    estimator=RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),
    max_shapelets=10,
    random_state=99,
).fit(X_gun_train, y_gun_train)
# rst_rccv._transformer = rst

In [None]:
import matplotlib.pyplot as plt

# First set of shapelets (from rst_rccv)
shapelets_rccv = rst_rccv._transformer.shapelets
shapelet_vals_rccv = []
shapelet_pos_rccv = []

for shapelet in shapelets_rccv:
    shapelet_vals_rccv.append(shapelet[6])
    shapelet_pos_rccv.append(shapelet[2])

# Second set of shapelets (from rst_rf)
shapelets_rf = rst_rf._transformer.shapelets
shapelet_vals_rf = []
shapelet_pos_rf = []

for shapelet in shapelets_rf:
    shapelet_vals_rf.append(shapelet[6])
    shapelet_pos_rf.append(shapelet[2])

# Create a figure with 2 subplots arranged horizontally
fig, axs = plt.subplots(1, 2, figsize=(15, 6), sharey=True)

# Plot the first set of shapelets (rst_rccv)
for i in range(len(shapelet_vals_rccv)):
    x_values = [x + shapelet_pos_rccv[i] for x in range(len(shapelet_vals_rccv[i]))]
    axs[0].plot(x_values, shapelet_vals_rccv[i])

axs[0].set_title("Shapelets from rst_rccv")
axs[0].set_xlabel("Position")
axs[0].set_ylabel("Values")

# Plot the second set of shapelets (rst_rf)
for i in range(len(shapelet_vals_rf)):
    x_values = [x + shapelet_pos_rf[i] for x in range(len(shapelet_vals_rf[i]))]
    axs[1].plot(x_values, shapelet_vals_rf[i])

axs[1].set_title("Shapelets from rst_rf")
axs[1].set_xlabel("Position")

# Adjust layout to prevent overlap
plt.tight_layout()
plt.show()

We should check whether the two classifiers have the same shapelets, then compare how they're ranked.
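
One way to check whether two fitted transforms hold the same shapelets is to compare where each shapelet was extracted from rather than comparing its values. The sketch below uses hand-written tuples. The layout (quality, length, start, channel, series index, class, values) is an assumption that extends the indices used earlier in this notebook (2 for position, 5 for class, 6 for values), and `shapelet_key` is a hypothetical helper.

```python
def shapelet_key(shapelet):
    """Identify a shapelet by its origin.

    Assumed layout: (quality, length, start, channel, series_index, class, values).
    """
    _, length, start, channel, series_index, *_ = shapelet
    return (series_index, channel, start, length)


# Hypothetical shapelets; the values are omitted because only the origin matters.
set_a = [
    (0.90, 20, 100, 0, 3, "1", None),
    (0.75, 15, 40, 0, 7, "2", None),
]
set_b = [
    (0.90, 20, 100, 0, 3, "1", None),
    (0.60, 30, 10, 0, 12, "2", None),
]

keys_a = {shapelet_key(s) for s in set_a}
keys_b = {shapelet_key(s) for s in set_b}

shared = keys_a & keys_b
print(f"shared: {sorted(shared)}")
print(f"only in A: {sorted(keys_a - keys_b)}")
print(f"only in B: {sorted(keys_b - keys_a)}")
```

With the same `random_state` and `max_shapelets`, the two classifiers would be expected to share all of their shapelets, so any entry in the "only in" sets is worth investigating.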

In [None]:
import pandas as pd

from aeon.visualisation import ShapeletClassifierVisualizer

rst_rccv_vis = ShapeletClassifierVisualizer(rst_rccv)
rst_rf_vis = ShapeletClassifierVisualizer(rst_rf)

# Shapelet indices ordered by importance, for each classifier and class id
rst_rf_vis_index_0 = rst_rf_vis._get_shp_importance(0)[0]
rst_rf_vis_index_1 = rst_rf_vis._get_shp_importance(1)[0]
rst_rccv_vis_index_0 = rst_rccv_vis._get_shp_importance(0)[0]
rst_rccv_vis_index_1 = rst_rccv_vis._get_shp_importance(1)[0]

# Create a dictionary to store the elements at each position in the lists
elements_in_position = {
    "Position": list(range(10)),
    "rst_rf_vis Index 0": rst_rf_vis_index_0,
    "rst_rf_vis Index 1": rst_rf_vis_index_1,
    "rst_rccv_vis Index 0": rst_rccv_vis_index_0,
    "rst_rccv_vis Index 1": rst_rccv_vis_index_1,
}

# Convert the dictionary to a DataFrame
pd.DataFrame(elements_in_position).set_index("Position")

As you can see, different classifiers assign different importance to the same shapelets.
Let's look at each classifier's most important shapelet.
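
To quantify how much two rankings disagree, one option is to compare rank orders directly. The sketch below uses made-up importance scores for five shapelets, so the numbers are purely illustrative; `ranks` is a hypothetical helper and assumes there are no tied scores.

```python
import numpy as np

# Hypothetical importance scores for the same 5 shapelets under two classifiers.
importance_rf = np.array([0.40, 0.05, 0.30, 0.15, 0.10])
importance_ridge = np.array([0.20, 0.10, 0.90, 0.05, 0.60])

# Shapelet indices ordered from most to least important.
order_rf = np.argsort(importance_rf)[::-1]
order_ridge = np.argsort(importance_ridge)[::-1]


def ranks(scores):
    """Rank of each item (0 = most important); assumes no tied scores."""
    r = np.empty(len(scores), dtype=int)
    r[np.argsort(scores)[::-1]] = np.arange(len(scores))
    return r


# Spearman rank correlation is the Pearson correlation of the ranks.
spearman = float(np.corrcoef(ranks(importance_rf), ranks(importance_ridge))[0, 1])

# Shapelets that both classifiers place in their top two.
top2_overlap = set(order_rf[:2].tolist()) & set(order_ridge[:2].tolist())

print(order_rf, order_ridge)
print(f"Spearman correlation: {spearman:.2f}")
print(f"Shared top-2 shapelets: {top2_overlap}")
```

A correlation near 1 means the classifiers agree on the ordering, while a value near 0 means the rankings are largely unrelated. The top-k overlap is a coarser check that focuses on the shapelets most likely to be plotted.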

In [None]:
# fig = rst_rf_vis.visualize_best_shapelets_one_class(
#     X_gun_test,
#     y_gun_test,
#     0,
#     figure_options={"figsize": (18, 12), "nrows": 2, "ncols": 2},
#     id_example_class = 1,
#     id_example_other = 1,
# #TODO: Waiting to PR to be accepted for this to work correctly
#     n_shp = 1,
# )

In [None]:
fig = rst_rccv_vis.visualize_best_shapelets_one_class(
    X_gun_test,
    y_gun_test,
    0,
    figure_options={"figsize": (18, 12), "nrows": 2, "ncols": 2},
    id_example_class=1,
    id_example_other=1,  # TODO: Waiting for PR to be accepted for this to work correctly
)

Now I want to plot the worst shapelet for each classifier and each class.

In [None]:
import copy


def visualize_worst_shapelets_one_class(
    self,
    X,
    y,
    class_id,
    n_shp=1,
    id_example_other=None,
    id_example_class=None,
    class_colors=("tab:green", "tab:orange"),
    scatter_options={  # noqa: B006
        "s": 70,
        "alpha": 0.75,
        "zorder": 1,
        "edgecolor": "black",
        "linewidths": 2,
    },
    x_plot_options={"linewidth": 4, "alpha": 0.9},  # noqa: B006
    shp_plot_options={  # noqa: B006
        "linewidth": 2,
        "alpha": 0.9,
        "linestyle": "--",
    },
    dist_plot_options={"linewidth": 3, "alpha": 0.9},  # noqa: B006
    threshold_plot_options={  # noqa: B006
        "linewidth": 2,
        "alpha": 0.9,
        "color": "purple",
        "label": "threshold",
    },
    boxplot_options={  # noqa: B006
        "patch_artist": True,
        "widths": 0.6,
        "showmeans": True,
        "meanline": True,
        "boxprops": {"linewidth": 1.5},
        "whiskerprops": {"linewidth": 1.5},
        "medianprops": {"linewidth": 1.5, "color": "black"},
        "meanprops": {"linewidth": 1.5, "color": "black"},
        "flierprops": {"linewidth": 1.5},
    },
    figure_options={  # noqa: B006
        "figsize": (20, 12),
        "nrows": 2,
        "ncols": 3,
        "dpi": 200,
    },
    rc_Params_options={  # noqa: B006
        "legend.fontsize": 14,
        "xtick.labelsize": 13,
        "ytick.labelsize": 13,
        "axes.titlesize": 15,
        "axes.labelsize": 15,
    },
    matplotlib_style="seaborn-v0_8",
):
    from sklearn.preprocessing import LabelEncoder

    y = LabelEncoder().fit_transform(y)

    plt.style.use(matplotlib_style)
    plt.rcParams.update(**rc_Params_options)

    idx, _ = self._get_shp_importance(class_id)
    idx = idx[::-1]

    shp_ids = []
    i = 0
    while len(shp_ids) < n_shp and i < idx.shape[0]:
        if idx[i] not in shp_ids:
            shp_ids = shp_ids + [idx[i]]
        i += 1
    X_new = self.estimator._transformer.transform(X)
    mask_class_id = np.where(y == class_id)[0]
    mask_other_class_id = np.where(y != class_id)[0]
    if id_example_class is None:
        id_example_class = np.random.choice(mask_class_id)
    if id_example_other is None:
        id_example_other = np.random.choice(mask_other_class_id)
    figures = []
    for i_shp in shp_ids:
        fig, ax = plt.subplots(**figure_options)
        if ax.ndim == 1:
            n_cols = ax.shape[0]
        else:
            n_cols = ax.shape[1]

        # Plots of features boxplots
        i_ax = 0
        for title, box_data in self._get_boxplot_data(
            X_new, mask_class_id, mask_other_class_id, i_shp
        ):
            if ax.ndim == 1:
                current_ax = ax[i_ax % n_cols]
            else:
                current_ax = ax[i_ax // n_cols, i_ax % n_cols]
            current_ax.set_title(title)
            bplot = current_ax.boxplot(box_data, **boxplot_options)
            current_ax.set_xticklabels(["Other classes", f"Class {class_id}"])
            for patch, color in zip(bplot["boxes"], class_colors):
                patch.set_facecolor(color)
            i_ax += 1

        # Plots of shapelet on X
        x0_plot_options = copy.deepcopy(x_plot_options)
        x0_plot_options.update(
            {
                "label": f"Sample of class {y[id_example_other]}",
                "c": class_colors[0],
            }
        )
        if ax.ndim == 1:
            current_ax = ax[i_ax % n_cols]
        else:
            current_ax = ax[i_ax // n_cols, i_ax % n_cols]
        shp0_scatter_options = copy.deepcopy(scatter_options)
        shp0_scatter_options.update({"c": class_colors[0]})
        self.plot_on_X(
            i_shp,
            X[id_example_other],
            ax=current_ax,
            line_options=x0_plot_options,
            scatter_options=shp0_scatter_options,
        )

        x1_plot_options = copy.deepcopy(x_plot_options)
        x1_plot_options.update(
            {
                "label": f"Sample of class {y[id_example_class]}",
                "c": class_colors[1],
            }
        )
        shp1_scatter_options = copy.deepcopy(scatter_options)
        shp1_scatter_options.update({"c": class_colors[1]})
        self.plot_on_X(
            i_shp,
            X[id_example_class],
            ax=current_ax,
            line_options=x1_plot_options,
            scatter_options=shp1_scatter_options,
        )
        current_ax.set_title("Best match on examples")
        current_ax.legend()

        # Plots of shapelet values
        i_ax += 1
        if ax.ndim == 1:
            current_ax = ax[i_ax % n_cols]
        else:
            current_ax = ax[i_ax // n_cols, i_ax % n_cols]
        self.plot(
            i_shp,
            ax=current_ax,
            line_options=shp_plot_options,
            scatter_options=scatter_options,
        )

        # Plots of distance vectors
        i_ax += 1
        if ax.ndim == 1:
            current_ax = ax[i_ax % n_cols]
        else:
            current_ax = ax[i_ax // n_cols, i_ax % n_cols]
        d0_plot_options = copy.deepcopy(dist_plot_options)
        d0_plot_options.update(
            {
                "c": class_colors[0],
                "label": f"Distance vector of class {y[id_example_other]}",
            }
        )
        self.plot_distance_vector(
            i_shp,
            X[id_example_other],
            ax=current_ax,
            show_legend=False,
            show_threshold=False,
            line_options=d0_plot_options,
        )
        d1_plot_options = copy.deepcopy(dist_plot_options)
        d1_plot_options.update(
            {
                "c": class_colors[1],
                "label": f"Distance vector of class {y[id_example_class]}",
            }
        )
        self.plot_distance_vector(
            i_shp,
            X[id_example_class],
            ax=current_ax,
            line_options=d1_plot_options,
        )
        current_ax.legend()
        current_ax.set_title("Distance vectors of examples")
        figures.append(fig)
    return figures

In [None]:
fig = visualize_worst_shapelets_one_class(
    rst_rccv_vis,
    X_gun_test,
    y_gun_test,
    0,
    figure_options={"figsize": (18, 12), "nrows": 2, "ncols": 2},
    id_example_class=1,
    id_example_other=1,  # TODO: Waiting for PR to be accepted for this to work correctly
)

In [None]:
# fig = visualize_worst_shapelets_one_class(
#     rst_rf_vis,
#     X_gun_test,
#     y_gun_test,
#     0,
#     figure_options={"figsize": (18, 12), "nrows": 2, "ncols": 2},
#     id_example_class = 1,
#     id_example_other = 1,
# #TODO: Waiting to PR to be accepted for this to work correctly
# )

#### Random Dilated Shapelet Transform

The paper did not explore the GunPoint problem; however, it defined the following default parameters:

- proportion of z-normalised shapelets = 0.8
- number of shapelets to generate = 10000
- the set of possible lengths of shapelets = [11]
- The percentile boundaries used to sample the occurrence threshold: P1 = 5, P2 = 10
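
The parameters above are easier to read alongside a small sketch of how a dilated shapelet is matched. This is a simplified illustration under stated assumptions, not aeon's code: the helper names are hypothetical, Euclidean distance is used for simplicity, z-normalisation is omitted, and the threshold is sampled from the distance vector of the shapelet's own source series rather than from another series of the same class. The three returned features (minimum distance, location of the best match, number of windows within the threshold) follow the description in the RDST paper.

```python
import numpy as np


def dilated_distance_vector(series, shapelet, dilation):
    """Distance from the shapelet to every dilated window of the series."""
    length = len(shapelet)
    span = (length - 1) * dilation + 1  # time points covered by one window
    n_windows = len(series) - span + 1
    # Row i holds the indices i, i + d, i + 2d, ... of one dilated window.
    idx = np.arange(n_windows)[:, None] + dilation * np.arange(length)[None, :]
    return np.sqrt(((series[idx] - shapelet) ** 2).sum(axis=1))


def rdst_features(series, shapelet, dilation, threshold):
    """Return (min distance, position of best match, shapelet occurrence)."""
    d = dilated_distance_vector(series, shapelet, dilation)
    return float(d.min()), int(d.argmin()), int((d < threshold).sum())


rng = np.random.default_rng(0)
series = rng.normal(size=150)  # random stand-in for one GunPoint series

length, dilation, start = 11, 3, 20
stop = start + (length - 1) * dilation + 1
shapelet = series[start:stop:dilation].copy()  # 11 points, 3 steps apart

# Sample the occurrence threshold between the P1 = 5 and P2 = 10 percentiles.
dist = dilated_distance_vector(series, shapelet, dilation)
p1, p2 = np.percentile(dist, [5, 10])
threshold = rng.uniform(p1, p2)

min_dist, best_start, occurrences = rdst_features(
    series, shapelet, dilation, threshold
)
print(min_dist, best_start, occurrences)
```

With a dilation of 3, a shapelet of length 11 covers 31 time points, so dilation lets short shapelets describe patterns that unfold over a longer stretch of the series.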

In [None]:
import time

from aeon.transformations.collection.shapelet_based import (
    RandomDilatedShapeletTransform,
)

shapelet_lengths = list(range(3, 151))

start_time = time.time()
rdst = RandomDilatedShapeletTransform(
    max_shapelets=10, shapelet_lengths=shapelet_lengths
).fit(X_gun_train, y_gun_train)
end_time = time.time()

# Calculate and print the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to fit: {elapsed_time:.4f} seconds")

pd.DataFrame(rdst.transform(X_gun_test)).head()

Here, we train a random forest and a ridge classifier on the transformed data. The purpose is not to evaluate classification performance but to use the importance each classifier assigns to the shapelet features to help compare shapelets later on.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifierCV

from aeon.classification.shapelet_based import RDSTClassifier

rdst_rf = RDSTClassifier(
    estimator=RandomForestClassifier(ccp_alpha=0.01), random_state=0
).fit(X_gun_train, y_gun_train)

rdst_rccv = RDSTClassifier(
    estimator=RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)), random_state=0
).fit(X_gun_train, y_gun_train)

#### Scalable and Accurate Subsequence Transform

In [None]:
import time

from aeon.transformations.collection.shapelet_based import SAST

start_time = time.time()
sast = SAST().fit(X_gun_train, y_gun_train)
end_time = time.time()

# Calculate and print the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to fit: {elapsed_time:.4f} seconds")

pd.DataFrame(sast.transform(X_gun_test)).head()

Here, we train a random forest and a ridge classifier on the transformed data. The purpose is not to evaluate classification performance but to use the importance each classifier assigns to the shapelet features to help compare shapelets later on.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifierCV

from aeon.classification.shapelet_based import SASTClassifier

sast_rf = SASTClassifier(classifier=RandomForestClassifier(ccp_alpha=0.01), seed=0).fit(
    X_gun_train, y_gun_train
)


sast_rccv = SASTClassifier(
    classifier=RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)), seed=0
).fit(X_gun_train, y_gun_train)

#### Random Scalable and Accurate Subsequence Transform

In [None]:
import time

from aeon.transformations.collection.shapelet_based import RSAST

start_time = time.time()
rsast = RSAST().fit(X_gun_train, y_gun_train)
end_time = time.time()

# Calculate and print the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to fit: {elapsed_time:.4f} seconds")

pd.DataFrame(rsast.transform(X_gun_test)).head()

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifierCV

from aeon.classification.shapelet_based import RSASTClassifier

rsast_rf = RSASTClassifier(
    classifier=RandomForestClassifier(ccp_alpha=0.01), seed=0
).fit(X_gun_train, y_gun_train)

rsast_rccv = RSASTClassifier(
    classifier=RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)), seed=0
).fit(X_gun_train, y_gun_train)

# Interpreting the shapelets

### Random Shapelet Transform

In [None]:
import matplotlib.pyplot as plt

shapelets = rst.shapelets
shapelet_vals = []
shapelet_pos = []

for shapelet in shapelets:
    shapelet_vals.append(shapelet[6])
    shapelet_pos.append(shapelet[2])

for i in range(len(shapelet_vals)):
    x_values = [x + shapelet_pos[i] for x in range(len(shapelet_vals[i]))]
    plt.plot(x_values, shapelet_vals[i])

# Show the plot
plt.show()

In [None]:
from matplotlib import pyplot as plt

from aeon.visualisation import ShapeletTransformerVisualizer

rst_vis = ShapeletTransformerVisualizer(rst)
id_shapelet = 3  # Identifier of the shapelet

fig_rst, ax_rst = plt.subplots(ncols=3, figsize=(15, 5))
rst_vis.plot(
    id_shapelet,
    ax=ax_rst[0],
    scatter_options={"c": "purple"},
    line_options={"linestyle": "-."},
)
rst_vis.plot_on_X(
    id_shapelet,
    X_gun_test[10],
    ax=ax_rst[1],
    line_options={"linewidth": 3, "alpha": 0.5},
)
ax_rst[1].set_title(f"Best match of shapelet {id_shapelet} on X")
rst_vis.plot_distance_vector(
    id_shapelet, X_gun_test[10], ax=ax_rst[2], line_options={"c": "brown"}
)
ax_rst[2].set_title(f"Distance vector of shapelet {id_shapelet} on X")

### Random Dilated Shapelet Transform

In [None]:
from matplotlib import pyplot as plt

from aeon.visualisation import ShapeletTransformerVisualizer

rdst_vis = ShapeletTransformerVisualizer(rdst)
id_shapelet = 2  # Identifier of the shapelet

fig_rdst, ax_rdst = plt.subplots(ncols=3, figsize=(15, 5))
rdst_vis.plot(
    id_shapelet,
    ax=ax_rdst[0],
    scatter_options={"c": "purple"},
    line_options={"linestyle": "-."},
)
rdst_vis.plot_on_X(
    id_shapelet,
    X_gun_test[1],
    ax=ax_rdst[1],
    line_options={"linewidth": 3, "alpha": 0.5},
)
ax_rdst[1].set_title(f"Best match of shapelet {id_shapelet} on X")
rdst_vis.plot_distance_vector(
    id_shapelet, X_gun_test[1], ax=ax_rdst[2], line_options={"c": "brown"}
)
ax_rdst[2].set_title(f"Distance vector of shapelet {id_shapelet} on X")

### SAST

In [None]:
from aeon.visualisation import ShapeletTransformerVisualizer

sast_vis = ShapeletTransformerVisualizer(sast)
id_shapelet = 0  # Identifier of the shapelet

fig = sast_vis.plot_on_X(id_shapelet, X_gun_test[0], figure_options={"figsize": (7, 4)})

In [None]:
from matplotlib import pyplot as plt

fig, ax = plt.subplots(ncols=3, figsize=(15, 5))
sast_vis.plot(
    id_shapelet,
    ax=ax[0],
    scatter_options={"c": "purple"},
    line_options={"linestyle": "-."},
)
sast_vis.plot_on_X(
    id_shapelet, X_gun_test[1], ax=ax[1], line_options={"linewidth": 3, "alpha": 0.5}
)
ax[1].set_title(f"Best match of shapelet {id_shapelet} on X")
sast_vis.plot_distance_vector(
    id_shapelet, X_gun_test[1], ax=ax[2], line_options={"c": "brown"}
)
ax[2].set_title(f"Distance vector of shapelet {id_shapelet} on X")

## Misc

In [None]:
from collections import Counter

shapelets = rst.shapelets
classes = []
for shapelet in shapelets:
    classes.append(shapelet[5])
Counter(classes)

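# Here we can see that the RST has kept 5 shapelets for each class.
# The docstring says each class value has its own maximum, which it describes as
# n_classes / max_shapelets. Here, 10 shapelets over 2 classes gives 5 per class,
# which enforces the same number of shapelets for each class.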

We can see that the NoGun class has a “dip” where the actor puts her hand down by
her side, and inertia carries her hand a little too far and she is forced to correct for it
(a phenomenon known as “overshoot”). In contrast, when the actor has the gun, she
returns her hand to her side more carefully, feeling for the gun holster, and no dip is
seen.

In [19], they identified that the most important shapelet for classification was when the actor lowered their arm; if they had no gun, a phenomenon called overshoot occurred and caused a dip in the data.
<br>
![image.png](attachment:image.png)
<br>
The shapelet decision tree trained by [19] contains a single
shapelet corresponding to the arm being lowered back into
position at the end of the series.

- To demonstrate that our
filter agrees with this and extracts the important information
from the data, we filtered the GunPoint data set using the
length parameters specified in the original paper to allow for
a fair comparison between the two methods. The top five
shapelets that we extracted are presented in Figure 5, along
with the shapelet reported by [19].
- The graphs in Figure 5 show that each of the top five
shapelets from our filter were very closely matched with
the shapelet from [19], reinforcing the notion that our filter
produces interpretable results. Furthermore, if we extract
the top ten shapelets from the filter we can gain even further
insight. Figure 6 shows that the top ten shapelets form two
distinct clusters. Interestingly, the shapelets to the left of the
figure correspond to the moments where the arm is lifted and
are instances where there is a gun. These shapelets could
correspond to the subtle extra movements required to lift the
prop, aiding classification by providing more information.
![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)



In the case where subsampling completely
removes one actor from the training data, the performance on the test set, where the two
actors are present, could be reduced.