Boxplot Visualization
The main functionality of Appollon is the visualization of the user's Spotify data. This page explains why box-and-whisker plots (boxplots) were chosen to represent the audio features of songs.
Boxplots are graphs used to represent a data set's
- minimum: the lowest data point in the set excluding outliers
- maximum: the highest data point in the set excluding outliers
- median: the middle value in the data set
- first quartile: the median of the lower half of the data set
- third quartile: the median of the upper half of the data set
- interquartile range: the distance between the upper and lower quartiles
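The metrics listed above can be computed in a few lines. The following is a minimal sketch, not the Appollon implementation: the names `BoxplotSummary`, `medianOfSorted` and `summarize` are hypothetical, and the 1.5 × IQR rule used to decide what counts as an outlier is a common convention that this page does not prescribe.

```typescript
// Hypothetical result type holding the metrics a boxplot displays.
interface BoxplotSummary {
  min: number; // lowest data point that is not an outlier
  max: number; // highest data point that is not an outlier
  median: number;
  q1: number;
  q3: number;
  iqr: number;
  outliers: number[];
}

// Median of an already sorted, non-empty array.
const medianOfSorted = (sorted: number[]): number => {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
};

const summarize = (data: number[]): BoxplotSummary => {
  if (data.length < 2) throw new Error("need at least two data points");
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...data].sort((a, b) => a - b);
  const half = Math.floor(sorted.length / 2);
  // Quartiles as medians of the lower and upper half of the data set.
  const q1 = medianOfSorted(sorted.slice(0, half));
  const q3 = medianOfSorted(sorted.slice(sorted.length - half));
  const iqr = q3 - q1;
  // Assumption: points further than 1.5 * IQR from the box are outliers.
  const lowFence = q1 - 1.5 * iqr;
  const highFence = q3 + 1.5 * iqr;
  const inliers = sorted.filter((x) => x >= lowFence && x <= highFence);
  return {
    min: inliers[0],
    max: inliers[inliers.length - 1],
    median: medianOfSorted(sorted),
    q1,
    q3,
    iqr,
    outliers: sorted.filter((x) => x < lowFence || x > highFence),
  };
};
```

For the input `[1, 2, 3, 4, 5, 6, 7, 8, 100]`, this sketch reports the value 100 as an outlier, so the maximum shown by the whisker is 8.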
Further information on boxplots can be found here and here.
 currently not visible because the license has to be checked
Different parts of a boxplot | image Michael Galarnyk taken from [1]
Boxplots can easily handle large amounts of data, because the data is reduced to the key metrics mentioned above [2]. This is particularly important because the number of songs in a user's playlists is not known in advance. Furthermore, just like histograms, boxplots provide a clear visual summary of a data set, making it easy to compare data [2]. This matters because a major use case of Appollon is the comparison of playlists and personal statistics, as discussed on the market analysis page. Another advantage of boxplots is that they show outliers, which are often omitted in other graphs [2].
While there was no official beta testing, several potential users who viewed the site or prototype during different stages of development gave the feedback that the boxplots were not intuitive. Empirical evidence also suggests that boxplots are often misinterpreted [3].
To help users read the boxplots representing their data, a section explaining how the boxplots should be interpreted was added to the about page.
Screenshot of the about page detailing how to read the boxplots
While there are some React libraries providing boxplot components (e.g. this, this), none of them fit the use case of Appollon. Therefore, it was decided to write a boxplot component from scratch.
The approach chosen was to first normalize the data, then calculate all the statistics, and finally use those to render an SVG. This posed several challenges; a very detailed summary of the implementation can be found in this comment.
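The normalization step can be pictured as mapping each raw value to a horizontal percentage inside the SVG. The sketch below is only an illustration under stated assumptions: `toPercent` is a hypothetical helper, the scale is assumed to be sorted in ascending order, the usable width of 92 percent is borrowed from the `scaleStep` calculation shown further down, and the offset of 4 percent is a guess.

```typescript
// Hypothetical helper: maps a raw value to an x position in percent.
// Assumptions: `scale` is ascending, its first and last entries mark the
// ends of the axis, and the axis covers `usableWidth` percent of the SVG
// starting at `offset` percent.
const toPercent = (
  value: number,
  scale: number[],
  usableWidth = 92,
  offset = 4
): number => {
  if (scale.length < 2) throw new Error("scale needs at least two entries");
  const min = scale[0];
  const max = scale[scale.length - 1];
  if (max === min) throw new Error("scale must span a non-zero range");
  // Linear interpolation between the two ends of the axis.
  return offset + ((value - min) / (max - min)) * usableWidth;
};
```

With the scale `[1, 2, 3, 4, 5, 6, 7]`, the value 4 lands at 50 percent under these assumptions, which can then be used directly in attributes such as `x1` of an SVG `<line>`.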
There are a few different variants of this boxplot component for different use cases:
The basic BoxPlot component takes in props containing styling data and statistical data and renders them as an SVG.
const BoxPlot = (props: StylingProps & StatisticalBoxPlotProps) => {
  return (
    <div className="w-full">
      <svg width={props.svgWidth} height={props.svgHeight}>
        <line
          x1={`${props.lowerWhisker}%`}
          y1="0%"
          x2={`${props.lowerWhisker}%`}
          y2="60%"
          stroke={props.textColor}
          strokeWidth="2"
        />
        [...]
      </svg>
    </div>
  );
};
The calculateData function calculates the values needed to display a boxplot from a provided array of data.
const calculateData = (
  data: number[],
  scale: number[],
  svgWidth: number,
  svgHeight: number
) => {
  ...
  return {
    median: data[Math.floor(data.length / 2)],
    q1: q1,
    q3: q3,
    lowerWhisker: lowerWhisker,
    upperWhisker: upperWhisker,
    outliers: data.filter((x) => x < lowerWhisker || x > upperWhisker),
    svgWidth: svgWidth,
    svgHeight: svgHeight,
    scaleStep: 92 / (scale.length - 1),
    offset: offset,
    scale: scale,
  };
};
The BoxPlotWithoutPopover component takes in raw data, calls the calculation function and displays the result using the BoxPlot component from above.
export const BoxPlotWithoutPopover = (props: StylingProps & DataProps) => {
  const stats = calculateData(
    props.data,
    props.scale,
    props.svgWidth,
    props.svgHeight
  );
  return <BoxPlot {...{ ...stats, ...props }} />;
};
Calling this component like this:
<BoxPlotWithoutPopover
  data={[
    2.323, 2, 4, 2, 1.43, 2.12, 5.323, 1.123, 2.234, 3.092, 2.32, 3.23,
    2.32,
  ]}
  scale={[1, 2, 3, 4, 5, 6, 7]}
  textColor={theme.extend.colors.textDark}
  boxColor={theme.extend.colors.primary}
  medianColor={theme.extend.colors.textLight}
  svgHeight={100}
  svgWidth={384}
/>
produces the following:
Screenshot of a basic boxplot
The BoxPlotWithPopover component adds a popover panel, triggered by clicking the boxplot, which displays the median, the lower and upper whiskers, the number of outliers and the two quartiles.
export const BoxPlotWithPopover = (props: StylingProps & DataProps) => {
  ...
  return (
    <Popover className="relative">
      <Popover.Button className="focus:outline-none">
        <BoxPlot {...{ ...stats, ...props }} />
      </Popover.Button>
      <Popover.Panel>
        <Popover.Button className="absolute z-10 w-full top-0 ...">
          <div className="grid grid-cols-2 justify-items-start">
            <p className="col-span-2 justify-self-center">Stats:</p>
            <p>left whisker: {stats.lowerWhisker}</p>
            <p>right whisker: {stats.upperWhisker}</p>
            <p>q1: {stats.q1}</p>
            <p>q3: {stats.q3}</p>
            <p>median: {stats.median}</p>
            <p>amount of outliers: {stats.outliers.length}</p>
          </div>
        </Popover.Button>
      </Popover.Panel>
    </Popover>
  );
};
This component is called in the same way as the one without popover. The popover has not yet been styled properly; currently it looks as follows:
Screenshot of a boxplot with active popover
In contrast to the other boxplot components, PreCalculatedBoxPlot directly takes in the values needed to create the SVG, eliminating the need to pass in the whole data set. This is necessary because the backend would otherwise have to send all audio features for all songs in a playlist, which would increase loading time considerably. With this component, the needed values can be calculated on the server. Currently it only returns the basic boxplot component, but further operations will probably take place in this wrapper, which is why it seemed appropriate to create it already.
export const PreCalculatedBoxPlot = (
  props: StylingProps & StatisticalBoxPlotProps
) => {
  return <BoxPlot {...props} />;
};
This component is called like this:
<PreCalculatedBoxPlot
  {...{
    ...data,
    textColor: theme.extend.colors.textDark,
    boxColor: theme.extend.colors.primary,
    medianColor: theme.extend.colors.textLight,
    svgHeight: 100,
    svgWidth: 384,
  }}
/>
The output visualization does not differ from the boxplot without popover.
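Because the statistics for this component arrive from the server, it can help to check the payload before spreading it into the props. The following sketch is an assumption throughout: the field list of `StatisticalBoxPlotProps` is inferred from the return value of `calculateData` (with `svgWidth` and `svgHeight` assumed to belong to the styling props), and the type guard `isStatisticalBoxPlotProps` is a hypothetical helper, not part of Appollon.

```typescript
// Assumption: fields inferred from what calculateData returns; the real
// StatisticalBoxPlotProps type may differ.
interface StatisticalBoxPlotProps {
  median: number;
  q1: number;
  q3: number;
  lowerWhisker: number;
  upperWhisker: number;
  outliers: number[];
  scaleStep: number;
  offset: number;
  scale: number[];
}

const numericKeys = [
  "median",
  "q1",
  "q3",
  "lowerWhisker",
  "upperWhisker",
  "scaleStep",
  "offset",
] as const;

const isNumberArray = (value: unknown): value is number[] =>
  Array.isArray(value) && value.every((x) => typeof x === "number");

// Hypothetical type guard: true only if every expected field is present
// and has the expected type.
const isStatisticalBoxPlotProps = (
  value: unknown
): value is StatisticalBoxPlotProps => {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    numericKeys.every((key) => typeof record[key] === "number") &&
    isNumberArray(record["outliers"]) &&
    isNumberArray(record["scale"])
  );
};
```

A guard like this would let the calling code fall back to an error state instead of rendering an SVG with missing coordinates when the server response is incomplete.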
[1] Brennan Whitfield, 2023, Understanding Boxplots, accessed 28.07.2023
[2] Alice Ladkin, 2018, Advantages & Disadvantages of a Box Plot, accessed 28.07.2023
[3] Stephanie Lem et al., 2012, The heuristic interpretation of box plots, accessed 28.07.2023
- continually load the data into a histogram
- this can then be displayed as a boxplot
- when more data is loaded, the histogram and boxplot update
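The idea in the list above could look roughly like the following sketch. It is not existing Appollon code: `StreamingHistogram` is a hypothetical class, the value range and bin count must be known up front, and the quantiles it returns are estimates obtained by linear interpolation within a bin, so they are only as precise as the bin width.

```typescript
// Hypothetical fixed-width histogram that is filled batch by batch.
class StreamingHistogram {
  private readonly min: number;
  private readonly max: number;
  private readonly counts: number[];
  private total = 0;

  constructor(min: number, max: number, binCount: number) {
    if (!(max > min) || binCount < 1) throw new Error("invalid histogram setup");
    this.min = min;
    this.max = max;
    this.counts = new Array<number>(binCount).fill(0);
  }

  // Adds a batch of values; values outside [min, max] go into the edge bins.
  add(values: number[]): void {
    const width = (this.max - this.min) / this.counts.length;
    for (const v of values) {
      const raw = Math.floor((v - this.min) / width);
      const bin = Math.min(this.counts.length - 1, Math.max(0, raw));
      this.counts[bin] += 1;
      this.total += 1;
    }
  }

  // Estimates the p-quantile (0 <= p <= 1) by interpolating inside the bin
  // in which the cumulative count reaches p * total.
  quantile(p: number): number {
    if (this.total === 0) throw new Error("histogram is empty");
    const width = (this.max - this.min) / this.counts.length;
    const target = p * this.total;
    let cumulative = 0;
    for (let i = 0; i < this.counts.length; i++) {
      const count = this.counts[i];
      if (count > 0 && cumulative + count >= target) {
        const fraction = (target - cumulative) / count;
        return this.min + (i + fraction) * width;
      }
      cumulative += count;
    }
    return this.max;
  }
}
```

After each loaded batch, `quantile(0.25)`, `quantile(0.5)` and `quantile(0.75)` could be passed on as the first quartile, median and third quartile, so the boxplot updates without keeping every data point in memory.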