Inconsistent benchmark results VS real user time taken #152

@AndreasKaratzas

Description

Hello,

I would like to report an observation. There seems to be a noticeable inconsistency between the training/inference time reported by the benchmark and how long the system is actually occupied. Below, I have attached the report of the sklearn benchmark on several datasets.

algorithm stage device data_order data_type dataset_name rows columns classes tol max_iter C kernel time[s] accuracy n_sv
SVC training none F float32 cifar_10 54000 3072 10 0.001 -1 1 linear 6294.79 0.19 45496.00
SVC training none F float32 connect 60801 126 3 0.001 -1 1 linear 36.62 0.76 29539.00
SVC training none F float32 mnist 60000 784 10 0.001 -1 1 linear 1.17 0.97 10347.00
SVC training none F float32 sensit 78822 100 3 0.001 -1 1 linear 1.57 0.81 35643.00
SVC training none F float32 connect 60801 126 3 0.001 -1 1 linear 36.85 0.76 29539.00
SVC training none F float32 letters 16000 16 26 0.001 -1 1 linear 0.23 0.87 6598.00
SVC training none F float32 year_prediction_msd 463715 90 89 0.001 -1 1 linear 12166.49 0.06 463234.00
SVC prediction none F float32 cifar_10 54000 3072 10 0.001 -1 1 linear 4.72 0.18 45496.00
SVC prediction none F float32 connect 60801 126 3 0.001 -1 1 linear 0.29 0.76 29539.00
SVC prediction none F float32 mnist 60000 784 10 0.001 -1 1 linear 0.20 0.94 10347.00
SVC prediction none F float32 sensit 78822 100 3 0.001 -1 1 linear 0.35 0.80 35643.00
SVC prediction none F float32 connect 60801 126 3 0.001 -1 1 linear 0.29 0.76 29539.00
SVC prediction none F float32 letters 16000 16 26 0.001 -1 1 linear 0.02 0.87 6598.00
SVC prediction none F float32 year_prediction_msd 463715 90 89 0.001 -1 1 linear 396.76 0.06 463234.00

I would like to focus on a subset of them: (i) mnist; (ii) sensit; and (iii) letters. I have also attached plots of system activity (both RAM and CPU utilization) for each dataset. In the plots, the system is active for much longer than the benchmark reports; for the three aforementioned datasets, the reported time is lower than the observed busy time by a factor of more than 10x.

intelex-sensit_prof.pdf
intelex-mnist_prof.pdf
intelex-letters_prof.pdf
intelex-connect_prof.pdf

Is there something in the background that is not considered as part of the training/inference phases? Am I missing something here? Or is this a bug?
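One plausible explanation worth checking is that the benchmark's timer only wraps the fit/predict call itself, while data loading, dtype conversion, and other setup run outside the timed region but still occupy the machine. Below is a minimal, hypothetical sketch (using `time.sleep` as a stand-in for real work, not the benchmark's actual code) of how such a gap between the reported time and the wall-clock time can arise:

```python
import time

def run_benchmark():
    """Illustrates a timed region that is much smaller than total wall time."""
    t_total_start = time.perf_counter()

    # Untimed setup: stand-in for data loading, dtype conversion,
    # train/test splitting, etc. that the reported figure may exclude.
    time.sleep(0.2)

    # The region the benchmark actually times (e.g. the fit() call).
    t_fit_start = time.perf_counter()
    time.sleep(0.05)
    fit_time = time.perf_counter() - t_fit_start

    total_time = time.perf_counter() - t_total_start
    return fit_time, total_time

fit_time, total_time = run_benchmark()
print(f"reported (fit only): {fit_time:.2f}s, wall clock: {total_time:.2f}s")
```

If the same pattern holds in the benchmark harness, the system-activity plots would reflect `total_time` while the report shows only `fit_time`, which could account for the discrepancy.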

Thank you for your time.
