Benchmarks in Design Research

Design research has come very far after several decades of development. Nevertheless, there is still room for improvement. Such improvements must be well documented and fairly evaluated. Time is over for making claims like “results seem to be better than existing ones from the literature”. We are responsible for making quantitative judgments and fair comparisons to the state of the art.

We may think that the validation of design research is much more complicated than the validation of computer science research. Still, arguments such as " we would need several years to get evidence", "our design situation is really special", "we cannot replicate the simulation results as we do not have the computing resources" are no different from other disciplines. For instance, climate research, a discipline that requires more computing resources to solve research questions on highly complex situations and over a very long period, created the Coupled Model Intercomparison Project 6 (CMIP6), which has been internationally adopted as a shared infrastructure that serves as the provision of benchmarks against which to compare improvements in models and prediction quality. The same when researchers argue that "the influence of designers' knowledge is too important to get comparable results"; disciplines like Human-Computer Interaction are facing the same challenges but try to adopt the scientific method.

This version-controlled and community-based open science platform is a collaborative space gathering benchmarks of engineering design theories, processes, methods and tools. Each benchmark is a sustainable ecosystem where a community of researchers can engage in an asynchronous collaboration for 1/the co-definition of fundamental and practical research problems, goals and solutions, 2/ the fair and systematic evaluation of claimed contributions on open benchmark exercises following standardised measurement protocols, and 3/ the comparison of competing solutions based on agreed-upon qualitative or quantitative measures of performance aligned with research goals.

What is a scientific benchmark?

The "What" and "Why" of Scientific Benchmarks are mostly extracts from the paper ''Using benchmarking to advance research: a challenge to software engineering'' (Sim, 2003).

Researchers commonly communicate their research results in papers. Often, a wide variety of research papers address the same research topic. It is becoming increasingly hard to compare research results by reading papers. Why? Because it is hard to compare results if the metrics, protocols, datasets and ground truth differ. Benchmarking in the form of organising challenges (the terminology for this differs per field, from competitions, benchmarking, shared or common tasks, etc.) is one way to address this. In this case, a benchmark is a standardised validation framework that allows for the direct comparison of different solutions that address the same research problem. Participants are invited to submit their solutions to a benchmark, after which their submissions are assessed using a predefined set of evaluation criteria.

A scientific benchmark is more than a dataset or set of datasets composed of tests and metrics used to compare the performance of alternative tools or techniques. A benchmark operationalises a paradigm; it takes an abstract concept and makes it concrete so it can serve as a guide for action. Indeed, within research communities, benchmarks are a statement of the discipline's research goals, and they emerge through a synergetic process of technical knowledge and social consensus proceeding in tandem. This community-based open science platform is concerned primarily with benchmarks created and used by a technical research community, especially the design research community.

Benchmarking has a strong positive effect on a discipline's scientific maturity. It helps whenever a research area needs to become more scientific, codify technical knowledge, or become more cohesive. Appropriately deployed benchmarking is not about winning a contest but more about surveying a landscape — the more we can reframe, contextualise, and appropriately scope the datasets, the more useful they will become as an informative dimension.

Scientific benchmarks emerge through a process of scientific discovery and consensus. Both must progress together for a standard benchmark to emerge because neither alone is sufficient. The community of interest may include academia, industry, and government participants, but they are all primarily interested in scientific research. The benchmark should be specified at a high enough level of abstraction to ensure that it is portable to different tools or techniques and does not bias one technology in favour of others. Continued evolution of the benchmark is necessary to prevent researchers from making changes to optimise the performance of their contributions on a particular set of tests.

Motivations for Engaging in Design Research Benchmarking

During the last decades, our community of researchers in engineering design has reached a general consensus:

"As foundation of good scientific practise, manuscripts submitted for publication must advance the state of the art and provide novel theoretical, numerical or mechanical insight and knowledge. Sometimes, good ideas turn out to be not as good as initially expected, i.e., the initial research hypotheses or promises turn out not to hold. This is no shame and in fact learning from other researches’ mistakes or failed attempts may be just as fruitful as learning from successes and may save a lot of time. Unfortunately, however, many manuscripts set out with high goals and claims but fail to critically evaluate the outcome at the end." [Sigmund, 2022]

"Real progress on evaluating design methods can only be expected if preconditions such as standardized theoretical constructs, measures, data bases of empirical data, and a sufficient number of studies on specific design methods are developed." [Hein and Lamé, 2020]

"Without action to increase scientific, theoretical, and methodological rigour there is a real possibility of the field being superseded and becoming obsolete through lack of impact." [Cash, 2018]

"There is this concern that design research does not live up to the standards of science: it is creating in a sense too many theories and models, which jeopardises the coherence of the discipline and which indicates that design research does not yet have the means to test and refute design theories and models." [Vermaas, 2014]

"There is in design research a general concern about the quality of the testing of design theories and models. In work reflecting on the results that design research has produced, it is complained that generally accepted and effective research methods for testing design theories and models are lacking in design research, and that the discipline is fragmented in separate research strands.” [Vermaas, 2014]

“37% of the articles reviewed did not have any validation. There needs to be more validation in the field of research in engineering design.” [Barth, A. et al. 2011]

“A lack of common terminology, benchmarked research methods, and above all, a common research methodology are the most outstanding problems in the field.” [Blessing and Chakrabarti, 2009]

Two conditions must already exist within a discipline before the construction of a benchmark can be fruitfully attempted. Design research is at a stage where scientific benchmarks can be fruitfully attempted. Design research is being established, and diverse approaches and solutions proliferate. This proliferation is desirable, so there are various tools and techniques to be compared by the benchmarks. Evidence that our community has reached the required level of maturity and is ready to move to a more rigorous scientific basis comes in many forms. Typical symptoms include an increasing concern with the validation of research results and with a comparison between solutions developed at different laboratories, attempted replication of results, use of proto-benchmarks (or at least attempts to apply solutions to a common set of sample problems), and finally, increasing resistance to accepting speculative papers for publication. This precondition is important because there is a significant cost to developing and maintaining scientific benchmarks and a danger in committing to a benchmark too early.

Scientific benchmarks advance a discipline by improving the science and increasing the cohesiveness of the community. The research design community is sufficiently well-established and has a culture of collaboration. Evidence of the former includes an existing collection of diverse research results and an increasing concern in validating these results. Evidence of the latter includes multi-site research projects, multi-author publications, standards for reporting, file formats, and the like. From this base, a consensus-based process led by researchers can be used to construct benchmarks endorsed by the design research community. Using benchmarks results in a more rigorous examination of research contributions and an overall improvement in the tools and techniques being developed. The presence of benchmarks will state that the design research community believes that contributions ought to be evaluated against clearly defined standards. We want the design research community to become more scientific and cohesive by working as a community to define benchmarks to advance the state of research.

The benchmark itself promotes collaborative, open, and public research. Creating a benchmark requires our community to examine our understanding of the field, agree on the key problems, and encapsulate this knowledge in an evaluation. Throughout the benchmarking process, greater communication and collaboration among different researchers lead to a stronger consensus on the community's research goals and methods.

Although some research (e.g., computer science research) is more obviously amenable to benchmarking because the performance measures are straightforward, no dataset will be able to capture the full complexity of the details of existence,

Benchmarking is far superior to merely asserting that a design theory, process, method or technology is valuable. No research method or empirical evaluation is perfect. Benchmarks are one of the few ways that the dirty details of research, such as debugging techniques, design decisions, and mistakes, are forced out into the open and shared between laboratories. Like experiments, control of the task sample reduces variability in the results—all tools and techniques are evaluated using the same tasks and experimental materials. Another advantage of benchmarking is that replication is built into the method. Since the materials are designed to be used in different laboratories, people can perform the evaluation on various tools and techniques repeatedly, if desired.

The second precondition is that there must be an ethos of collaboration within the community. In other words, there must be a willingness to work together to solve common problems.

Why should we collaborate and open our research data?

Design research is interdisciplinary, and using multiple research methods is difficult. Literature reviews drew up extensive lists of research methods (Barth et al., 2011; Escudero-Mancebo et al., 2023) and design research objectives (Eckert et al., 2004; Cantamessa, 2003). When mixing research methods from multiple research areas, many challenges can arise due to the individual research cultures of each discipline involved.

Collaboration in benchmarking occurs in two ways. During development, researchers work together to build consensus on what should be in the benchmark. During deployment, the results from different design philosophies, processes, methods, [...] and tools are compared, which requires researchers to look at each other’s contributions. Consequently, researchers become more aware of one another's work and ties between researchers with similar interests are strengthened. Evaluations carried out using benchmarks are, by their nature, open and public. The materials are available for general use, and often so are the results being tested. It is difficult to hide the flaws of a tool or technique or to aggrandise its strengths when there is transparency in the test procedures. Moreover, anyone could use the benchmark with the same tools or techniques and attempt to replicate the results. Together with collaboration, openness, and publicness, these factors result in frank, detailed, and technical communication among researchers. This kind of public evaluation contrasts sharply with the descriptions of tools and techniques currently found in design research conferences or journal publications.

Publishing or making our data available to others is not considered standard practice. As measured by the French Open Science Barometer, researchers in design science keep their results more confidential than other disciplines, with only 10 % of French publications in engineering which mention the sharing of their data and 15% that include a "Data Availability Statement" between 2013 and 2021, whereas 86 % of French publications in engineering mention the use of data. This lack of openness is all the more regrettable, given that the opening up of data forces researchers to guarantee data quality. We may assume that it is mainly because we are primarily focused on getting grant money, and the influence of outside sponsors, such as industrialists, limits the openness of research data. Still, it is necessary to open our research data to a scientific community that examines the same research question from multiple angles over time because more than one data collection effort is needed to lead to a definitive answer. Research methods and results should be well documented, with enough detail so that other teams can attempt to reproduce or replicate the findings and expand upon them. If they come up with the same general results over time, all of these efforts give evidence for the scientific truth of the findings. Benchmarks are an opportunity to share open data that serves as ground truth.

Goals

What does this project provide?

This open science project has been born and developed to make a move for the engineering design research community's progress.

News

What's new?

20/05/2024 - 18th International Design Conference "Operationalizing Community-Based Open Scientific Design Research Benchmarks: Application to Model-Based Architecture Design Synthesis". Slides Paper
04/04/2023 - Animation of the workshop "Co-design of a community-based ecosystem to improve validation practices in engineering research”, S.mart Special Interest Group in Industry 4.0. Slides
08/12/2022 - Benchmarking in design research workshop at the Academia-Industry forum of the INCOSE French chapter (AFIS). Slides
21/06/2022 - Meeting of French academics whose research concentrates on systems engineering. Part of the researchers decided to start two working groups so as to develop two benchmarks: 1) Model-based system architecture synthesis; 2) Early validation and verification of systems. Lettre
28/03/2022 - 32nd CIRP Design Conference "An Open Science Platform for Benchmarking Engineering Design Researches." Slides Paper
31/03/2021 - Atelier S.mart "Validation de nos recherches en Génie Industriel : Co-Construction d'une Feuille de Route." Dashboard Notes
28/01/2021 - Webinar S.mart "Méthodologies de recherche sur l'industrie du futur: Pourquoi et Comment ? Replay Slides Notes

Contribution Process

Willing to be active with us? Follow the contributing guide!

Code of Conduct

This code of conduct outlines expectations for participation in this Open Source Benchmarking Environment for Engineering Research. By joining our community, you pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community by:

Being radically inclusive to existing members and newcomers looking to learn or participate.
Being totally respectful of each others abilities, interests, viewpoints, experiences and personal differences.
Gracefully accepting constructive criticism and being exceedingly kind even in moments of disagreement while working towards consensus.
Educating and illuminating others with something you know more about.
Contacting the original contributors before any external communication.
Preventing any public or private business opportunities of the open source content without agreeing with the original authors and contributors of the sources.

People violating this code of conduct may be banned from the community.

Benchmarks

The open science benchmarking environment contains a set of benchmarks that aim at making technical progress objective and reproducible.

	Model-Based Systems Engineering for Early V&V Benchmark methods and tools for the early validation and verification of engineered systems. Keywords: MBSE, Validation, Verification Discussions - Open Issues
	3D modelling in Virtual Reality Benchmark methods and tools for 3D modelling in virtual reality. Keywords: Virtual Reality, Geometric Modelling, CAD Discussions - Open Issues
	Ability to conduct Life Cycle Assessment in Education Identify the most suitable Life Cycle Assessment method for a teaching population. Keywords: Life cycle analysis, Sustainability, Competencies Evaluation Discussions - Open Issues
	Measure the value provided to stakeholders in design for sustainability Benchmark to compare different approaches for measuring the value perceived by the stakeholders: ecosystems, territorial approaches, value analysis approaches, etc. Keywords: Value, Sustainability, Stakeholders Discussions - Open Issues
	Model-Based Design reviews Open benchmark exercises for comparing digital materials supporting model-based design reviews. Keywords: Model-Based Design, Design review Discussions - Open Issues
	Model-Based Architecture Design Synthesis Open benchmark exercises for comparing concept finding in a model-based design synthesis process for system sizing. Keywords: Model-Based Design Synthesis, Concept Finding, System Sizing Discussions - Open Issues
	MBSE Model Transformation Inference Open benchmark exercises to compare and study the performance of approaches for automatically inferring a transformation model. Keywords: Model-Based Systems Engineering, Interoperability, Data Transformation, Inference Discussions - Open Issues

Project Lead

Romain Pinquié - Univ. Grenoble Alpes, CNRS, Grenoble INP, G-SCOP - romain.pinquie@grenoble-inp.fr {Git ID: rpinquie}

Project Team

These were the original creators of this project. Want to contact the Core Team? Send an e-mail to all of them!

Romain Pinquié - Univ. Grenoble Alpes, CNRS, Grenoble INP, G-SCOP - romain.pinquie@grenoble-inp.fr {Git ID: rpinquie}
Lionel Roucoules - Arts et Métiers ParisTech, LISPEN - lionel.roucoules@ensam.eu {Git ID: roucoules}
Lou Grimal - Université de Technologie de Troyes, InSyTE, CREIDD - lou.grimal@utt.fr {Git ID: LouGrimal}
Raphaël Chenouard - École Centrale de Nantes, LS2N, France - raphael.chenouard@ls2n.fr {Git ID: raphaelchenouard}
Pierre-Alain Yvars - ISAE-Supméca, Quartz, France - pierre-alain.yvars@isae-supmeca.fr

Other Participants

Volunteers run this open-science repository. Below is a list of volunteers who have expressed an interest in this project.

Julien Le Duigou - Université de Technologie de Compiègne, Roberval - julien.le-duigou@utc.fr {Git ID: jleduigo}
Nabil Anwer - ENS-Paris-Saclay, LURPA - nabil.anwer@lurpa.ens-cachan.fr
Matthieu Bricogne - Université de Technologie de Compiègne, Roberval - matthieu.bricogne@utc.fr
Alexis Lalevée - Université de Technologie de Troyes, ICD - alexis.lalevee@utt.fr
Florent Laroche - Ecole Centrale de Nantes, IS2N - florent.Laroche@ec-nantes.fr
Olivia Penas - Supméca, QUARTZ - olivia.penas@supmeca.fr
Régis Plateaux - Supméca, QUARTZ - regis.plateaux@supmeca.fr
Alix Thecle - Arts et Métiers ParisTech, IMS - Thecle.alix@ensam.eu
Nadège Troussier - Université de Technologie de Troyes, ICD - nadege.troussier@utt.fr
Esma Yahia - Arts et Métiers ParisTech, LISPEN - esma.yahia@ensam.eu
Gilles Foucault - Université Grenoble Alpes, G-SCOP - gilles.foucault@univ-grenoble-alpes.fr {Git ID: gilles-foucault-univ}

If you want to be an active member of our community, open a new issue.

Related Projects

What are the existing sources that have inspired this project?

Open science:
Open data repositories:
Systems Engineering:
Model-Based Systems Engineering:
Multidisciplinary Design Optimisation:
- Aircraft Jet Engine Architecting Benchmark Problem and its dataset
Additive Manufacturing:
- AM Bench 2022 Challenge Problems and Measurement Results
Artificial Intelligence applied to Engineering Design:
- MIT Design Computation and Digital Engineering (DeCoDE) Lab
- PHYRE A Benchmark For Physical Reasoning
Computer science challenges:
Data science public challenges:
Topological optimization:
- Aircraft wing design
- Suspension bridge
Shape retrieval:
Model exchange:
- OMG Model Interchange Wiki
- The NIST Validator
Multimedia evaluation:
- Multimedia Evaluation Benchmark
Social science:
- Fragile families challenge

Related Papers

Want to learn more about the validation in engineering design research? Here is a list of sources to start with!

Pinquié, R. et al. (2024) Operationalizing community-based open scientific design research benchmarks: application to model-based architecture design synthesis.
Eisenbart, B. et al. (2024) Two decades apart and looking forward - exploring rigour in reporting on research in the engineering design research community
Paehler, L. et al. (2023) Impact of Method Users on the Application of Design Methods—Assessing the Role of Method-related Background Knowledge.
Escudero-Mancebo, D. et al (2023) Research methods in engineering design : A synthesis of recent studies using a systematic literature review.
Gericke, K. et al. (2022) Elements of a design method - a basis for describing and evaluating design methods
Sigmund, O. (2022) On benchmarking and good scientific practise in topology optimization
Pinquié, R. et al. (2022) An open science platform for benchmarking engineering design researches
Prochner, I and Godin, D. (2022) Quality in research through design projects: Recommendations for evaluation and enhancement
Ralph, P. et al. (2021) Empirical standards for software engineering research
Eisenmann, M. et al. (2021) Design method validation – an investigation of the current practice in design research
Hein, A. M. and Lamé, G. (2020) Evaluating engineering design methods: taking inspiration from software engineering and the health sciences
Gericke, K. et al. (2020) Supporting designers: moving from method menagerie to method ecosystem
Ureten, S. et al. (2020) Current challenges and solution approaches in empirical engineering design research - A workshop for empirical research
Isaksson, O. et al. (2020) You need to focus to validate
Sileyew, K.J. (2019) Research Design and Methodology
Cross, N. (2018) Developing design as a discipline
Cash, P. (2018) Developing theory-driven design research
Panchal, J. H. and Szajnfarber, Z. (2017) Experiments in systems engineering and design research
Vermaas, P. E. (2016) A logical critique of the expert position in design research: beyond expert justification of design methods and towards empirical validation
Vermaas, P. E. (2016) Towards precedence that justifies the knowledge claims of design methods.
Vermaas, P. E. (2014) Design theories, models and their testing: On the scientific status of design research
McMahon, C. (2012) Reflections on diversity in design research
Barth, A. et al. (2011) How to validate research in engineering design?
Blessing, L. and Chakrabarti, A. (2009) DRM, a Design Research Methodology
Frey, D. D. and Dym, C. L. (2006) Validation of design methods: lessons from medicine
Briggs, R. O. (2006) On theory-driven design and deployment of collaboration systems.
Seepersad, C. C. et al. (2006) The validation square: How does one verify and validate a design method?
Sim, S.E. et al. (2003) Using benchmarking to advance research: a challenge to software engineering
Eckert, C. M. et al. (2003) The spiral of applied research: A methodological view on integrated design research
Pedersen, K. et al. (2000) Validating design methods & research: the validation square.
Cantamessa, M. (2003) An empirical perspective upon design research.
Hazelrigg, G. A. (2003) Validation of engineering design alternative selection methods

Disclaimer

Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by our community or S.mart.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Sponsors and Partners

S.mart - Systems.Manufacturing.Academics.Resources.Technologies

Name		Name	Last commit message	Last commit date
Latest commit History 146 Commits
Docs		Docs
Images		Images
Papers		Papers
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Benchmarks in Design Research

What is a scientific benchmark?

Motivations for Engaging in Design Research Benchmarking

Why should we collaborate and open our research data?

Goals

News

Contribution Process

Code of Conduct

Benchmarks

Model-Based Systems Engineering for Early V&V

3D modelling in Virtual Reality

Ability to conduct Life Cycle Assessment in Education

Measure the value provided to stakeholders in design for sustainability

Model-Based Design reviews

Model-Based Architecture Design Synthesis

MBSE Model Transformation Inference

Project Lead

Project Team

Other Participants

Related Projects

Related Papers

Disclaimer

License

Sponsors and Partners

About

Releases

Packages

Contributors 3

License

GIS-S-mart/Welcome

Folders and files

Latest commit

History

Repository files navigation

Benchmarks in Design Research

What is a scientific benchmark?

Motivations for Engaging in Design Research Benchmarking

Why should we collaborate and open our research data?

Goals

News

Contribution Process

Code of Conduct

Benchmarks

Project Lead

Project Team

Other Participants

Related Projects

Related Papers

Disclaimer

License

Sponsors and Partners

About

Resources

License

Stars

Watchers

Forks