This is the repo for the annotation tool developed to annotate CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer. Code for training models on CSAW-M can be found here. For a general understanding of how the tool is organized, please refer to the section "CSAW-M dataset creation" of our paper.
Below you can see an example window of our annotation tool, with the query and reference images shown side by side.
- Create a conda environment using the provided `env.yml` file: `conda env create -f env.yml`. This will create an environment named `annotation2`.
- Make sure the environment is active: `conda activate annotation2`
- Download the CSAW-M dataset (TODO: ADD THE LINK HERE)
- Create a folder named `data` in the project folder (beside the `src` folder).
- The images that you want to sort perfectly should be in `data/test_imgs`, and the ones that you want to put in different bins should be in `data/train_imgs`.
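The folder layout described above can be prepared by hand, or with a few lines of Python. The snippet below is only a convenience sketch: the helper name `make_data_layout` is hypothetical and not part of the tool, and it is demonstrated in a temporary directory rather than in the real project folder.

```python
from pathlib import Path
import tempfile


def make_data_layout(project_root: Path) -> dict:
    """Create data/test_imgs and data/train_imgs under project_root.

    Hypothetical helper, not part of the annotation tool itself.
    """
    data_dir = project_root / "data"
    layout = {
        "test": data_dir / "test_imgs",    # images to be sorted perfectly
        "train": data_dir / "train_imgs",  # images to be placed into bins
    }
    for folder in layout.values():
        # exist_ok=True makes the helper safe to call more than once
        folder.mkdir(parents=True, exist_ok=True)
    return layout


# Demonstration in a throwaway directory; in practice project_root would be
# the folder that contains src/.
with tempfile.TemporaryDirectory() as tmp:
    created = make_data_layout(Path(tmp))
    for mode, folder in created.items():
        print(mode, folder.relative_to(tmp))
```

After creating the folders, the downloaded images still need to be copied into `data/test_imgs` and `data/train_imgs` manually.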
- `--session_name`: the name of the session, either `sort` (for sorting images) or `split` (for splitting the perfectly sorted list into bins; see the examples below).
- `--data_mode`: the kind of data being rated, either `test` (for perfectly sorting `data/test_imgs`) or `train` (for sorting `data/train_imgs` into bins).
- `--annotator`: the name of the annotator. This should always be provided when beginning a session for rating test/train images.
- `--new` or `--already`: an annotator using the tool for the first time should use `--new`; otherwise they should use `--already`.
- `--max_imgs_per_session`: the number of images to rate in each session. This can be set to a small number for shorter sessions.
- `--ui_verbosity`: how verbose the UI should be (default: 1). Use 2 to also see the image names, and 3 to also see the search intervals and other details in the UI. A value of 4 is used for debugging (set automatically when using `--debug`).
- `--resize_factor`: the factor by which the images are resized.
- `--debug`: shows verbose information in the window, useful for debugging.
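To make the relationship between these options concrete, here is a minimal `argparse` sketch of how such a command line could be declared. It is not the tool's actual code: the argument types, the `choices`, the mutual exclusion of `--new` and `--already`, and the helper names `build_parser` and `parse` are all assumptions made for illustration. `--n_bins` is included because it appears in the Step 2 command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; the real main.py may declare its options differently."""
    parser = argparse.ArgumentParser(description="Annotation tool options (sketch)")
    parser.add_argument("--session_name", choices=["sort", "split"], required=True)
    parser.add_argument("--data_mode", choices=["test", "train"])
    parser.add_argument("--annotator")
    # Assumption: --new and --already cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--new", action="store_true")
    group.add_argument("--already", action="store_true")
    # Assumption: numeric options are parsed as int / float.
    parser.add_argument("--max_imgs_per_session", type=int)
    parser.add_argument("--ui_verbosity", type=int, default=1, choices=[1, 2, 3, 4])
    parser.add_argument("--resize_factor", type=float)
    parser.add_argument("--n_bins", type=int)
    parser.add_argument("--debug", action="store_true")
    return parser


def parse(argv):
    """Parse argv and apply the documented rule that --debug implies verbosity 4."""
    args = build_parser().parse_args(argv)
    if args.debug:
        args.ui_verbosity = 4
    return args


# "alice" is a placeholder annotator name.
args = parse(["--annotator", "alice", "--new",
              "--session_name", "sort", "--data_mode", "test"])
print(args.session_name, args.data_mode, args.ui_verbosity)
```

Passing the argument list explicitly to `parse` (instead of reading `sys.argv`) keeps the sketch easy to try out in an interactive session.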
- Step 1: the program is first used to perfectly sort `data/test_imgs`:
  `python main.py --annotator [ANNOTATOR] --new --session_name sort --data_mode test`
  The output will be saved to `outputs_test/output_[ANNOTATOR]`.
- Step 2: Once we have created a perfectly sorted list of image names in `outputs_test/output_[ANNOTATOR]`, we can split the sorted list into 8 bins using the following command (there is no need to specify `--already` or `--data_mode`, but Step 1 should be complete at this point):
  `python main.py --annotator [ANNOTATOR] --session_name split --n_bins 8`
  This will create `outputs_train/output_[ANNOTATOR]` with text files listing the image names in each bin.
- Step 3: We can now sort the images in `data/train_imgs` into the created bins:
  `python main.py --annotator [ANNOTATOR] --already --session_name sort --data_mode train`
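As a rough illustration of what Step 2 produces, the sketch below splits an already sorted list of image names into contiguous bins of near-equal size. This is an assumption for illustration only: the tool's `split` session may choose bin boundaries differently, and the function name `split_into_bins` and the file names used here are hypothetical.

```python
def split_into_bins(sorted_names, n_bins):
    """Split a sorted list into n_bins contiguous, near-equal bins.

    Assumption: equal-size bins; the actual tool may pick boundaries differently.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    base, extra = divmod(len(sorted_names), n_bins)
    bins = []
    start = 0
    for i in range(n_bins):
        # The first `extra` bins receive one additional item each.
        size = base + (1 if i < extra else 0)
        bins.append(sorted_names[start:start + size])
        start += size
    return bins


# Placeholder file names standing in for a perfectly sorted test list.
names = [f"img_{i:02d}.png" for i in range(20)]
bins = split_into_bins(names, 8)
print([len(b) for b in bins])  # → [3, 3, 3, 3, 2, 2, 2, 2]
```

Because the bins are contiguous slices of the sorted list, the ordering from Step 1 is preserved both within and across bins.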
Please feel free to contact us in case you have any questions or suggestions!