HIE provides a hierarchical level-of-detail visualization for large-scale image datasets, enabling interactive exploration of the data. You can try HIE online with an example dataset we provide.
This is joint work by Alex Bäuerle, Christian van Onzenoodt, Daniel Jönsson, and Timo Ropinski. HIE will be published as a short paper at EuroVis 2023.
The image explorer shows groups of images arranged on a 2D plane. To explore the data, you can zoom and pan. Zooming in reveals progressively smaller groups until only a single image is visible. Every group can be selected to reveal additional information.
Selecting groups (and single images) can be done in two ways:

- Selecting them one by one with a click
- Using the lasso tool (`ALT` or the button on top)
We support selecting two non-overlapping sets of groups and offer various ways to compare them.
To switch between selecting group A and group B, use `x` or the button next to the lasso tool. To unselect all groups at once, press `ESC`.
The dataset can be filtered using a simple UI and more complex queries. To add a filter, open the right panel and click "add filter". If multiple filters are present, they are processed in order. Filters can be combined with `OR` and `AND`; these operations apply to all the previous filters in the list.
Under each filter, the corresponding arquero query is displayed. It can be edited directly to achieve more complex filter operations.
In the bottom right of the view is a minimap. The minimap always displays the groups of the unfiltered dataset.
Once one or multiple groups are selected, the left sidebar opens, and the user is provided with additional information on the selected groups:
- the representative image of the selections
- the number of images in each group as a bar chart
Furthermore, if the dataset provides suitable columns such as labels, outlier scores, or probabilities, the user can select columns in the dropdown menu to be displayed in a chart. Discrete values are displayed as a mirrored bar chart (if two selections are chosen) or as a pie chart (if only one selection is chosen). Continuous values are displayed as a box plot or histogram.
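The chart selection above boils down to two inputs: whether the column is discrete and how many selections exist. A minimal sketch of that decision (our own illustration, with hypothetical names):

```python
def pick_chart(is_discrete: bool, num_selections: int) -> str:
    """Choose a chart type following the rules described above:
    discrete columns get a mirrored bar chart (two selections)
    or a pie chart (one selection); continuous columns get a
    box plot or histogram regardless of selection count."""
    if is_discrete:
        return "mirrored_bar" if num_selections == 2 else "pie"
    return "box_plot_or_histogram"
```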
Settings can be changed in the settings menu (gearwheel) in the upper right corner. You can adjust the following:

- **Resolution**
  - determines how many hexagons are laid over the data and displayed on screen (higher resolution -> more hexagons, and vice versa)
  - effectively changes the number of columns displayed (the default setting is 10)
  - note: since hexagons have to be shifted by half a hexagon every row to get a perfect fit, only columns with hexagons at the same height are counted as one column, and very high values (>20) may lead to performance issues
- **Color query**
  - This property determines the hexagon outline color. It can be modified with an arquero expression. If the query results in a number, a continuous color scale is applied; one can switch between different scales. For other values, the categorical Tableau-10 palette is used. If needed, the colors may repeat.
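The categorical fallback can be illustrated with a small sketch (our own code, not HIE's): each distinct query result gets the next palette color in order of first appearance, and colors repeat once the ten entries are exhausted. A generic stand-in palette is used here in place of the actual Tableau-10 hex values:

```python
# Stand-in palette of ten entries; HIE uses the Tableau-10 scheme.
PALETTE = [f"color-{i}" for i in range(10)]

def categorical_colors(values):
    """Assign a palette color to each value. Distinct values are
    colored in order of first appearance; after ten distinct
    values, the palette wraps around and colors repeat."""
    index = {}
    colors = []
    for v in values:
        if v not in index:
            index[v] = len(index) % len(PALETTE)
        colors.append(PALETTE[index[v]])
    return colors
```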
To use HIE on your own data, the data needs to be processed first. To get started, clone this repository and initialize it as follows:

- run `yarn` to initialize the project and install all necessary packages
- run `yarn build` to compile and build the project
A dataset needs a descriptive `.arrow` file; this can be obtained in different ways:

- **Predefined data (mnist, flowers, cifar-10)**
  - Can be automatically downloaded with the `data_provider.py` script
  - `data_provider.py` can easily be expanded to support other image datasets where the folder of an image determines its class
- **Custom data**
  - The expected Arrow IPC file requires the following columns:
    - `image_id`: a unique identifier
    - `file_path`: relative path to the image file
    - [optional] more columns containing data (e.g. `label`, `classification`, `probability`, ...)
    - [optional] a representation of the image (e.g. `activations`, `feature_vector`, ...)
- **Dimensionality reduction**
  - For dimensionality reduction, the backend expects an Arrow IPC table containing: (`id`, `x`, `y`). This table can be automatically generated with `data_processing.py`. The script supports 3 image representations (`-enc`):
    - `pixels`: raw pixel data
    - `vgg-16`: feature vector generated from a pre-trained VGG-16 model
    - `arrow.<column>`: take custom data from the dataset description table
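As an illustration of what this table holds, the sketch below projects high-dimensional representations to 2D points with an `id` for each row. A plain PCA via `numpy` is used as a stand-in for the reductions `data_processing.py` actually offers:

```python
import numpy as np

rng = np.random.default_rng(0)
ids = [f"{i:02d}" for i in range(100)]
features = rng.normal(size=(100, 64))  # e.g. activation vectors

# Center the data and project onto the top two principal components.
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
points = centered @ vt[:2].T           # shape (100, 2)

# The backend expects exactly three columns: id, x, y.
reduced = {"id": ids, "x": points[:, 0], "y": points[:, 1]}
```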
The backend takes in a configuration file (`-c`); this contains 3 properties:

- `table`: path to the dataset Arrow IPC table
- `points2d`: path to the dimensionality reduction Arrow IPC table
- `imgDataRoot`: path to the relative root of image paths
Running `yarn start -c [config_path]` will then start the backend server.
- Adjust `SERVER_ADDRESS` in `src/config.ts` to the backend IP.
- Build the static site with `yarn build` and then start it with `yarn start` (or run dev mode with `yarn dev` for a live preview).
We have an exported table `raw_data.arrow` containing the columns: (`image_id`, `file_path`, `label`, `prediction`, `probability`, `activations`). `activations` contains the representation of the image we want to use going forward. We put the table in `backend/data/example/raw_data.arrow` and the images in `backend/data/example/images`.
One row might look like this:
| image_id | file_path | label | prediction | probability | activations |
|---|---|---|---|---|---|
| '01' | 'data/example/images/01.jpg' | 'bird' | 'plane' | 0.2543 | [0.2342,...,0.3243] |
Then, we run dimensionality reduction with `python data_processing.py data/example/raw_data.arrow data/example/ -enc arrow.activations -dim umap`. This generates the file `data/example/raw_data_umap.arrow`.
Now, we write a config for the server:

```json
// backend/configurations/config_example.json
{
  "table": "data/example/raw_data.arrow",
  "points2d": "data/example/raw_data_umap.arrow",
  "imgDataRoot": "../"
}
```
We can then start the backend server with `cd backend && yarn start -c configurations/config_example.json`.
The frontend needs no further adjustment for different datasets and can be started with `cd hierarchical-image-explorer && yarn build && yarn start`.