This is a visualisation of a machine learning technique known as k-means clustering
for Matt Parker's Christmas Tree project. It accepts coordinates in GIFT format as coords.csv
and outputs kmeans_vis.csv, a CSV file containing a light sequence that visualises
k-means clustering and is compatible with Matt's Christmas tree display code.
K-means clustering is a widely studied and utilised algorithm in the field of
machine learning. The goal is to take some data and partition it into k
clusters, so that new data can then be assigned to one of those clusters. It does
this by trying to find the centre of each cluster, then assigning each data point
to a cluster based on which of the centres it is closest to. What you can do is
start off with initial centre positions, find the clusters generated by those centre
positions, then find the means of those clusters and set those as the new centre
positions. If we iterate this multiple times, our model will eventually converge,
giving us our clusters. It's designed to work on data that is grouped in some way
to try and predict what group a new data point should fit into, which isn't exactly
applicable to a Christmas tree where the points are largely random, but I've found
that this gives us both a good enough sample of data, even if it's not ideal, and
an excellent means to visualise what the algorithm is doing.
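To make the loop described above concrete, here is a minimal sketch of it in Haskell. It is not the code in this repository: the names Point, distSq, nearest, mean, step and kMeans are made up for illustration, and it assumes each data point is a 3D coordinate and that the initial centres are supplied by the caller.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Assumption: every data point is an (x, y, z) coordinate.
type Point = (Double, Double, Double)

-- Squared Euclidean distance (enough for comparing distances).
distSq :: Point -> Point -> Double
distSq (x1, y1, z1) (x2, y2, z2) = sq (x1 - x2) + sq (y1 - y2) + sq (z1 - z2)
  where
    sq d = d * d

-- Index of the centre nearest to a point (centres must be non-empty).
nearest :: [Point] -> Point -> Int
nearest centres p =
  fst (minimumBy (comparing (distSq p . snd)) (zip [0 ..] centres))

-- Mean of a non-empty list of points.
mean :: [Point] -> Point
mean ps = (sx / n, sy / n, sz / n)
  where
    n = fromIntegral (length ps)
    (sx, sy, sz) =
      foldr (\(x, y, z) (a, b, c) -> (x + a, y + b, z + c)) (0, 0, 0) ps

-- One iteration: assign every point to its nearest centre, then move
-- each centre to the mean of its cluster. A centre that ends up with
-- no points is left where it is.
step :: [Point] -> [Point] -> [Point]
step points centres = zipWith update [0 ..] centres
  where
    labels = map (nearest centres) points
    update i c =
      case [p | (p, l) <- zip points labels, l == i] of
        [] -> c
        cluster -> mean cluster

-- Repeat the step until the centres stop moving, or until the
-- iteration budget runs out.
kMeans :: Int -> [Point] -> [Point] -> [Point]
kMeans maxIters points = go maxIters
  where
    go n cs
      | n <= 0 = cs
      | cs' == cs = cs
      | otherwise = go (n - 1) cs'
      where
        cs' = step points cs
```

For example, kMeans 10 points initialCentres returns the final centre positions. Collecting each intermediate list of centres instead of only the last one would give one set of clusters per iteration, which is the kind of sequence a visualisation can step through.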
Essentially, this takes the positions of the lights as its data and displays each iteration as colours on those lights. Each light is given a colour based on which cluster it belongs to, and the light closest to the centre of each cluster is given a different colour. The closest light is used rather than the centre itself because the centre might fall in between lights, so the closest light is a good enough approximation for this visualisation.
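As a rough illustration of that colouring rule, the sketch below colours a single frame. It is not taken from this repository: Colour, closestIndex and colourFrame are hypothetical names, colours are assumed to be RGB triples, the palette is assumed to be non-empty, and it assumes one shared highlight colour for all centre lights.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

type Point = (Double, Double, Double)

-- Assumption: a colour is an RGB triple with components 0-255.
type Colour = (Int, Int, Int)

-- Squared Euclidean distance between two points.
distSq :: Point -> Point -> Double
distSq (x1, y1, z1) (x2, y2, z2) = sq (x1 - x2) + sq (y1 - y2) + sq (z1 - z2)
  where
    sq d = d * d

-- Index of the candidate closest to the target (candidates non-empty).
closestIndex :: [Point] -> Point -> Int
closestIndex candidates target =
  fst (minimumBy (comparing (distSq target . snd)) (zip [0 ..] candidates))

-- Colour one frame: each light takes the palette colour of its nearest
-- centre, except the light nearest each centre, which takes the
-- highlight colour. The palette is assumed non-empty and wraps around
-- if there are more clusters than colours.
colourFrame :: [Colour] -> Colour -> [Point] -> [Point] -> [Colour]
colourFrame palette highlight centres lights = zipWith pick [0 ..] lights
  where
    -- For each centre, the index of the light standing in for it.
    centreLights = map (closestIndex lights) centres
    pick i light
      | i `elem` centreLights = highlight
      | otherwise =
          palette !! (closestIndex centres light `mod` length palette)
```

Calling colourFrame once per iteration, with that iteration's centres, would give one list of colours per frame, in the same order as the lights.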
In the source of this code, there is Constants.hs. I have laid it out as best
I can so you don't need intimate knowledge of Haskell to change it (though you
will need to have stack set up on your machine to run the code). By default,
it will visualise k = 2, then k = 3, then k = 5, then k = 7, with colours as dictated
by that file, but these can be changed. You can also speed up or slow down the rate
at which the iterations are displayed. If you don't have stack installed on your
machine, this comes preloaded with a kmeans_vis.csv that uses the default parameters,
which you can just run on the tree visualiser linked in resources.
- xmastree2021 - The project that this was created for. It collates a bunch of light sequences like this one.
- MPTree - You can load coords.csv and kmeans_vis.csv into this to get a quick visualisation of the output.