
The CORSMAL Challenge evaluation toolkit

This repository contains the official evaluation toolkit for the CORSMAL Challenge. This version of the toolkit can be used only with the train set of the CORSMAL Containers Manipulation dataset. The toolkit makes the evaluation procedure transparent and enables verification of the output using example submissions that use either annotation values or random values as estimations.

The repository also contains example submission forms for the train set, annotation files (including filling densities per container), and a demo with example submissions. See metadata.md for details of each submission; the examples use the annotation values, randomly generated values, or the average of the annotations (for container capacity, mass, and dimensions) as estimations.

Moreover, we provide a script to parse the dataset: data_parser. The script can be used to integrate novel solutions. More details are available in the README of data_parser.

Documents:

  • Technical details of performance measures and scores: PDF
  • Submission form for the public testing set: here

Requirements:

  • Python (Tested with 3.8.3)
  • Numpy (Tested with 1.18.5)
  • Pandas (Tested with 1.2.3)
  • Scikit-learn (Tested with 0.24.1)
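
The tested versions can be installed with pip, for example as below (pinning to these exact versions is optional; newer releases may also work):

pip install numpy==1.18.5 pandas==1.2.3 scikit-learn==0.24.1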

Challenge venues

Current venue

Audio-visual object classification for human-robot collaboration @ 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Important dates

  • February 10, 2022 (11.59 AoE): paper acceptance notification and release of the results in the leaderboards
  • February 16, 2022 (11.59 AoE): camera-ready papers due

Past venues

Data format

Input

The input to the evaluation script is a .csv file where each row contains the following fields (see the sketch after the field descriptions for how such a row can be assembled):

Configuration ID, Container capacity, Container mass, Filling mass, None, Pasta, Rice, Water, Filling type, Empty, Half-full, Full, Filling level, Width at the top, Width at the bottom, Height, Object safety, Distance, Angle difference, Execution time

Note that each row must match the corresponding row of the annotation file for the train set, i.e., the same configuration ID.

Container capacity [mL]:

  • -1: no estimation
  • x: estimated value (>=0)

Container mass [g]:

  • -1: no estimation
  • x: estimated value (>=0)

Filling mass [g]:

  • -1: no estimation
  • x: value computed by the script using filling level, filling type, and container capacity estimations

None, Pasta, Rice, Water:

  • -1: no estimation
  • x: estimated probability [0,1]

Filling type:

  • -1: no estimation
  • 0: no filling
  • 1: pasta
  • 2: rice
  • 3: water

Empty, Half-full, Full:

  • -1: no estimation
  • x: estimated probability [0,1]

Filling level:

  • -1: no estimation
  • 0: empty
  • 1: half-full (50%)
  • 2: full (90%)

Width at the top [mm]:

  • -1: no estimation
  • x: estimated value (>=0)

Width at the bottom [mm]:

  • -1: no estimation
  • x: estimated value (>=0)

Height [mm]:

  • -1: no estimation
  • x: estimated value (>=0)

Object safety:

  • -1: no estimation
  • x: estimated probability [0,1] (Note that this value will be estimated by the organisers when running the simulator)

Distance [mm]:

  • -1: no estimation
  • x: estimated value (>=0) (Note that this value will be estimated by the organisers when running the simulator)

Angle difference [deg]:

  • -1: no estimation
  • x: estimated value (>=0) (Note that this value will be estimated by the organisers when running the simulator)

Execution time [ms]:

  • -1: no estimation
  • x: estimated value (>=0)
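
For illustration, here is a minimal sketch that assembles a single submission row with pandas. All values and the output file name teamname.csv are hypothetical; the exact header should be checked against the example files in submissions/train_set/. As described above, -1 marks fields with no estimation.

import pandas as pd

# One configuration, with hypothetical estimations; -1 means "no estimation".
row = {
    'Configuration ID': 0,
    'Container capacity': 500.0,     # [mL]
    'Container mass': 30.0,          # [g]
    'Filling mass': -1,              # computed by the evaluation script
    'None': 0.1, 'Pasta': 0.6, 'Rice': 0.2, 'Water': 0.1,  # probabilities in [0,1]
    'Filling type': 1,               # 0: no filling, 1: pasta, 2: rice, 3: water
    'Empty': 0.05, 'Half-full': 0.90, 'Full': 0.05,        # probabilities in [0,1]
    'Filling level': 1,              # 0: empty, 1: half-full, 2: full
    'Width at the top': 90.0,        # [mm]
    'Width at the bottom': 60.0,     # [mm]
    'Height': 110.0,                 # [mm]
    'Object safety': -1,             # estimated by the organisers via the simulator
    'Distance': -1,                  # estimated by the organisers via the simulator
    'Angle difference': -1,          # estimated by the organisers via the simulator
    'Execution time': 250.0,         # [ms]
}

pd.DataFrame([row]).to_csv('teamname.csv', index=False)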

Output

The evaluation script outputs a .csv file (e.g., res_train_set.csv for the train set) with the results (in percent) of each submission per row, using the following fields:

Team, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, Overall, JFTL, CMD
  • s1: score for the filling level classification task (T1)
  • s2: score for the filling type classification task (T2)
  • s3: score for the container capacity estimation task (T3)
  • s4: score for the container width at the top estimation (container dimensions task, T5)
  • s5: score for the container width at the bottom estimation (container dimensions task, T5)
  • s6: score for the container height (container dimensions task, T5)
  • s7: score for the container mass estimation task (T4)
  • s8: score for the filling mass estimation (computed by the toolkit from T1, T2, and T3)
  • s9: score for the object safety estimation (estimated by the simulator run by the organisers)
  • s10: score for the delivery accuracy (estimated by the simulator run by the organisers)
  • Overall: average of the previous 10 performance scores (in percentage), weighted by the number of tasks submitted.
  • JFTL: score for the joint filling type and level classification (additional leaderboard)
  • CMD: score for the group tasks of container capacity and dimensions estimation (additional leaderboard)
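
As a quick sanity check, the results file can be loaded with pandas, as in the minimal sketch below (it assumes the column names match the header above):

import pandas as pd

# Load the per-team scores produced by the evaluation script.
res = pd.read_csv('res_train_set.csv')

# Rank the submissions by the weighted overall score.
print(res.sort_values('Overall', ascending=False)[['Team', 'Overall']])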

Only submissions that include the source code for the evaluation on the private CCM test set will be valid for the ranking. Source code that is not reproducible will receive a score of 0.

Note: the score for filling mass estimation is not a linear combination of the scores output for T1, T2, and T3. The score follows the formula for computing the filling mass from the per-configuration estimations of each task. s8 is weighted by the number of tasks executed among filling level classification, filling type classification, and container capacity estimation.

Note that s9 and s10 are weighted by the number of tasks executed (up to 5), whereas s4, s5, and s6 are weighted by 3 as they belong to the same task (container dimensions estimation).
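
For intuition only, here is a hedged sketch of how a filling-mass value can be derived from the three estimations. The exact formula used by the toolkit is defined in the technical document; the density lookup is an assumption based on the per-container filling densities included in the annotations.

# Filling-level fractions as defined in the input format above:
# 0: empty (0%), 1: half-full (50%), 2: full (90%).
LEVEL_FRACTION = {0: 0.0, 1: 0.5, 2: 0.9}

def filling_mass(filling_level, filling_type, capacity_ml, density_g_per_ml):
    # density_g_per_ml: annotated filling density for the container/filling
    # type pair (the annotation files include filling densities per container).
    if filling_level == 0 or filling_type == 0:
        return 0.0  # empty container: no filling mass
    return LEVEL_FRACTION[filling_level] * capacity_ml * density_g_per_ml

# Example: a half-full 500 mL container of water (~1 g/mL) -> 250 g.
print(filling_mass(filling_level=1, filling_type=3, capacity_ml=500.0, density_g_per_ml=1.0))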

More details in the technical document: PDF

Demo

An example submission file with random estimations can be found in submissions/train_set/. Additional examples can be generated using the bash script submissions/run_subm_gen.sh.

Please see submissions/metadata.md for details about the generated files, which use annotations, random values, or the average over the train-set containers (only for container mass, capacity, and dimensions) as estimations.

To reproduce the results for the example submissions on the train set, run:

cd submissions/
source run_subm_gen.sh
cd ..
source run_eval_demo.sh

The evaluation script can be run with the following options:

  • --submission: path and filename of the submission file (e.g., submissions/pub_test/teamname.csv)
  • --set: choose between train, test_pub, test_priv, and test_comb
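
For example, the script could be invoked as below (the entry-point name evaluate.py is a hypothetical placeholder; use the actual script name from the repository):

python evaluate.py --submission submissions/train_set/teamname.csv --set train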

Note that these example submissions do not contain estimations for object safety, distance, and angle difference; hence the scores s9 and s10 are 0, and the overall score can reach at most 80 instead of 100.

For reproducibility, we report the results in the table below.

Team s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 Overall JFTL
test1 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 0.00 0.00 80.00 100.00
test2 100.00 0.00 0.00 0.00 0.00 0.00 0.00 39.00 0.00 0.00 2.78 0.00
test3 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 2.00 0.00
test4 100.00 100.00 0.00 0.00 0.00 0.00 0.00 39.00 0.00 0.00 9.56 100.00
test5 100.00 100.00 100.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 24.00 100.00
test6 0.00 0.00 100.00 100.00 100.00 100.00 0.00 71.91 0.00 0.00 18.88 0.00
test7 0.00 0.00 100.00 0.00 0.00 0.00 0.00 71.91 0.00 0.00 3.44 0.00
test8 0.00 100.00 0.00 0.00 0.00 0.00 0.00 42.83 0.00 0.00 2.86 0.00
test9 0.00 0.00 0.00 100.00 100.00 100.00 0.00 0.00 0.00 0.00 6.00 0.00
test10 0.00 0.00 100.00 0.00 0.00 0.00 100.00 71.91 0.00 0.00 10.88 0.00
test11 100.00 100.00 100.00 0.00 0.00 0.00 100.00 100.00 0.00 0.00 40.00 100.00
test12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
random1 32.01 26.68 21.72 24.62 20.87 36.24 16.73 42.83 0.00 0.00 22.17 9.60
random2 32.01 0.00 0.00 0.00 0.00 0.00 0.00 42.83 0.00 0.00 1.50 0.00
random3 0.00 0.00 0.00 0.00 0.00 0.00 16.73 0.00 0.00 0.00 0.33 0.00
random4 32.01 26.68 0.00 0.00 0.00 0.00 0.00 42.83 0.00 0.00 4.06 9.60
random5 32.01 26.68 21.72 0.00 0.00 0.00 0.00 42.83 0.00 0.00 7.39 9.60
random6 0.00 0.00 21.72 24.62 20.87 36.24 0.00 42.83 0.00 0.00 5.85 0.00
random7 0.00 0.00 21.72 0.00 0.00 0.00 0.00 42.83 0.00 0.00 1.29 0.00
random8 0.00 26.68 0.00 0.00 0.00 0.00 0.00 42.83 0.00 0.00 1.39 0.00
random9 0.00 0.00 0.00 24.62 20.87 36.24 0.00 0.00 0.00 0.00 1.63 0.00
random10 0.00 0.00 21.72 0.00 0.00 0.00 16.73 42.83 0.00 0.00 3.25 0.00
random11 32.01 26.68 21.72 0.00 0.00 0.00 16.73 42.83 0.00 0.00 11.20 9.60
average 32.01 26.68 31.89 63.96 50.95 66.24 30.25 50.34 0.00 0.00 35.23 9.60

Enquiries, Questions, and Comments

If you have any enquiries, questions, or comments, please contact corsmal-challenge@qmul.ac.uk. If you would like to file a bug report or a feature request, use the GitHub issue tracker.

Licence

This work is licensed under the MIT License. To view a copy of this license, see LICENSE.