# Exploration of Low-Power Flexible Stress Monitoring Classifiers for Conformal Wearables

Abstract—Conventional stress monitoring relies on episodic, symptom-focused interventions, missing the need for continuous, accessible, and cost-efficient solutions. State-of-the-art approaches use rigid, silicon-based wearables, which, though capable of multitasking, are not optimized for lightweight, flexible wear, limiting their practicality for continuous monitoring. In contrast, flexible electronics (FE) offer mechanical flexibility and low manufacturing costs, enabling real-time stress monitoring circuits. However, implementing complex circuits like machine learning (ML) classifiers in FE is challenging due to integration and power constraints. Previous research has explored flexible biosensors and ADCs, but classifier design for stress detection remains underexplored. This work presents the first comprehensive design space exploration of low-power, flexible stress classifiers. We cover various ML classifiers, feature selection, and neural simplification algorithms, with over 1200 flexible classifiers. To optimize hardware efficiency, fully customized circuits with low-precision arithmetic are designed in each case. Our exploration provides insights into designing real-time stress classifiers that offer higher accuracy than current methods, while being low-cost, conformable, and ensuring low power and compact size.

Index Terms—Stress Monitoring, Flexible Electronics, Low-power, Machine Learning

#### I. INTRODUCTION

Stress is a critical health concern, linked to conditions such as depression, heart disease, digestive issues, and sleep disturbances [1]. Traditional stress monitoring methods, based on intermittent evaluations, fall short in providing the continuous data needed for accurate, timely analysis. Real-time monitoring, enabled by wearable devices processing physiological data from biosensors, is crucial for early intervention and better management of stress-related health issues. These devices often use sensory time-series data and machine learning (ML) algorithms to predict stress states. However, most research has focused on algorithmic solutions [2]-[5] or general-purpose microprocessor-based systems [6], with limited attention to the hardware implications of wearable stress-monitoring devices. Traditional silicon-based wearable solutions, while effective, face limitations in terms of rigidity, high manufacturing costs, and substantial power consumption, making them unsuitable for continuous and accessible health monitoring applications.

Flexible electronics (FE) offer a promising alternative for wearable health monitoring devices, with distinct advantages over conventional rigid electronics. Their flexible substrates allow them to adapt more naturally to body contours, improving comfort during prolonged use [7]. Furthermore, they support the development of inexpensive, disposable hardware, making them well-suited for single-use low-cost patches in both commercial and medical applications. However, the thin-film transistor (TFT) technology underlying FE provides only n-type transistors, restricting circuit designs to unipolar logic. Further, static power accounts for over 99% of the total consumption of FE systems [8]. These limitations significantly hinder the integration capabilities of FE systems [8].

Significant research has been focused on mechanically-flexible biosensors capable of capturing stress-related signals, such as electrodermal activity (EDA) [9]. However, the ML stress classifier, a critical component for stress prediction [2], remains largely unexplored by the state of the art. This can be attributed to the inherent design constraints of FE in realizing such complex circuits. Overall, matching the computational requirements of healthcare applications, like stress prediction, with the unique mechanical characteristics of flexible technology remains an open question.

In this work, we address this research gap by thoroughly exploring the design space of ML classifiers for real-time stress monitoring systems. Our aim is to design a set of ML classifiers that provide favorable accuracy-power tradeoffs, whilst complying with the stringent area constraints of FE. To that end, we develop a custom standard cell library for FE, optimized for ultra-low-power operation at 1 V, significantly lower than commercial solutions at 3 V [10], enabling more power-efficient circuits. We explore a wide design space comprising i) different ML algorithms, including Decision Trees (DTs), Multilayer Perceptrons (MLPs), and Support Vector Machines (SVMs), ii) statistical-based feature selection techniques, and iii) neural minimization techniques such as pruning and low-precision quantization. Leveraging the low cost of FE, we fully customize our circuits to each ML model, i.e., bespoke hardware implementation [11], which greatly contributes to reducing power/area overheads.

Our novel contributions within this work are as follows:

- 1) To the best of our knowledge, this work is the first to design stress classifiers<sup>1</sup> in flexible electronics.
- 2) We introduce an automated design space exploration (DSE) framework, evaluating over 1200 classifiers, that integrates both software and hardware optimizations, including feature selection, neural minimization, bespoke circuit design, and low-precision arithmetic, in order to identify optimal power-accuracy trade-offs.
- 3) Our work highlights the feasibility of real-time flexible stress monitoring: our DSE identified solutions with  $9\,\mu\mathrm{W}$  power and  $0.2\,\mathrm{mm}^2$  area, satisfying battery and area constraints in flexible electronics.

<sup>1</sup>Our stress classifiers are available at https://github.com/floAfentaki/EDA-Driven-ML-Circuits-for-Flexible-Electronics

#### II. FLEXIBLE ELECTRONICS

The Indium Gallium Zinc Oxide (IGZO) Thin-Film Transistors (TFTs), incorporated into Flexible Integrated Circuit (FlexIC) technology, are advancing flexible electronics by combining mechanical adaptability with cost-efficient manufacturing [8]. Unlike conventional silicon-based solutions, IGZO TFTs can be fabricated on lightweight flexible substrates (e.g., polyimide) using low-temperature lithography, eliminating the need for additional protective packaging. This method not only avoids rigid silicon wafers and high-temperature fabrication but also significantly lowers production expenses and reduces environmental impact. Moreover, IGZO TFTs possess inherent mechanical flexibility, allowing them to bend without additional encapsulation. Their streamlined manufacturing process also shortens fabrication time from 32 weeks to under 3.5 days, making them suitable for scalable applications [12].

Despite these advantages, IGZO TFTs face limitations compared to CMOS technology, particularly in terms of performance and feature size, with a typical minimum feature size of 800 nm, significantly larger than that of silicon transistors [8], [13]. Thus, designing complex circuits, such as ML classifiers, for applications with strict area constraints, like wearables, poses a significant challenge in FE. Additionally, IGZO TFT technology relies solely on n-type transistors, restricting designs to unipolar logic. Specifically, resistor-NMOS (R-NMOS) logic is utilized, where a pull-up resistor replaces the PMOS transistor. The absence of p-type devices increases resistance, affecting delay and power consumption, leading to design challenges. To mitigate these, hardware-software co-design strategies, such as fully parallel bespoke implementations (see Section III-B) and power-efficient logic simplifications, are necessary [14]. Our approach incorporates these strategies to optimize flexible components, achieving reductions in memory usage, gate count, and power consumption, while also eliminating the need for memory elements, which are scarce and costly in FE technology. These are key factors for developing efficient, lightweight healthcare monitoring systems.

# III. DESIGNING A STRESS MONITORING SYSTEM

## A. System Overview

Fig. 1 presents an abstract block diagram of our targeted flexible classification system for real-time stress monitoring. Flexible biosensors [15] capture bio-signals, which are quantized by Analog-to-Digital Converters (ADCs) [16] and processed by feature extractors to generate input features for the classifier. The trained ML classifier processes these features and predicts stress levels. To the best of our knowledge, this is the first work to systematically design and optimize classifiers specifically for stress prediction in FE.



Fig. 1. Overview of the mechanically flexible real-time stress monitoring classification system.

Significant research on FE has focused on flexible sensors [9] and ADCs [16], while the feature extractor could be implemented with a flexible microprocessor like in [8]. However, a thorough investigation of the impact of feature selection (i.e., number of features and selection method) on the flexible ML classifier is missing from the literature. Additionally, no prior work has systematically explored neural simplification techniques (e.g., pruning and quantization) alongside classifier selection to minimize the hardware footprint of a flexible classifier. In our work, we focus on optimizing the classifier within the context of the stress monitoring system.

## B. Bespoke Flexible Classifier Design

To comply with strict area constraints (i.e., a primary design objective of FE), we design fully-parallel bespoke ML circuits. Bespoke refers to hardwiring the model coefficients into the hardware implementation, significantly boosting the efficiency compared to conventional designs [17]. Such customization is enabled by the low non-recurring engineering (NRE) and fabrication costs of FE. Additionally, fully-parallel designs are purely combinational, alleviating the need for excess memory units, which are costly in FE [8].

Based on these guidelines, each ML algorithm demands its own tailored implementation. MLPs compute a weighted sum per neuron, adding a final bias. Our fully-parallel MLPs instantiate one bespoke multiplier per weight and a semi-bespoke adder tree (i.e., accumulation of bespoke products), where the weights and bias are known a priori and set as constants, leading to much simplified circuitry. SVMs with a linear kernel operate on the same principle, computing a weighted sum; however, in this case, a separate weighted sum is calculated for each output class, i.e., one binary classifier per class. Finally, DT circuits comprise a series of comparators, where input features are compared against hardwired thresholds in parallel, determining the activated tree branches and finally, the predicted class. This work demonstrates the feasibility of fully-parallel and mechanically-flexible classifiers targeting real-time stress prediction.
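To make the bespoke principle concrete, the following is a minimal behavioral sketch in Python of a neuron and a small decision tree whose coefficients are fixed constants. All weights, thresholds, and the tree shape are hypothetical values chosen for illustration; they are not taken from the trained models of this work, and the sketch models behavior only, not the synthesized circuit.

```python
# Hypothetical quantized coefficients. In a bespoke circuit these are
# constants baked into the logic rather than values read from memory.
NEURON_WEIGHTS = (3, -2, 0, 5)
NEURON_BIAS = 4


def bespoke_neuron(inputs):
    """Weighted sum with constant coefficients, followed by ReLU."""
    acc = NEURON_BIAS
    for x, w in zip(inputs, NEURON_WEIGHTS):
        if w != 0:  # a zero coefficient needs no multiplier at all
            acc += x * w
    return max(acc, 0)


# Hypothetical depth-2 tree: node name -> (feature index, hardwired threshold)
DT_NODES = {"root": (0, 120), "left": (2, 40), "right": (1, 200)}


def bespoke_tree(features):
    """Evaluate every comparator, then pick the class from the active branch."""
    # All comparisons are computed regardless of the path taken,
    # mirroring comparators that operate in parallel in hardware.
    cmp = {node: features[idx] <= thr for node, (idx, thr) in DT_NODES.items()}
    if cmp["root"]:
        return 0 if cmp["left"] else 1
    return 1 if cmp["right"] else 0


print(bespoke_neuron((1, 2, 3, 4)))
print(bespoke_tree((100, 0, 30)))
```

Because the coefficients are constants, a synthesis tool can simplify each constant multiplication and comparison far more aggressively than it could for a generic multiplier or comparator.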

# IV. PROPOSED FLEXIBLE CLASSIFIER EXPLORATION

We propose a DSE which aims to identify accuracy-power Pareto-optimal designs for real-time stress monitoring classifiers, focusing on design feasibility under the constraints of FE. We present the pseudocode for our flexible classifier design space exploration in Algorithm 1, while the algorithmic flowchart is shown in Fig. 2. Our DSE incorporates techniques such as feature selection, pruning (for MLPs), and low-precision arithmetic, to optimize hardware efficiency and enable the classifiers' realization in FE. For each ML algorithm, we first apply statistical feature selection, using state-of-the-art techniques, to identify a subset of the most relevant features, and then train the respective classifier. Quantization is used to explore low-precision implementations across all classifiers, reducing the hardware overheads, while state-of-the-art unstructured pruning is also applied to the MLPs. Together, these form a complex design space of software-hardware design techniques, all aiming to reduce the footprint of the flexible classifier without deteriorating the application accuracy. Finally, the hardware description of the stress classifiers is obtained via custom Python-to-Verilog code templates in a fully automated way, and their hardware evaluation is conducted with a low-power standard cell library via commercial EDA tools. To the best of our knowledge, this is the first time such an exploration is conducted for stress monitoring within FE.

Algorithm 1. Flexible Classifier Design Space Exploration

```
Require: Feature set X, labels Y
Ensure: Optimized classifier C
Initialize feature set S = X
for each feature selection F \in \{DISR, Fisher, JMI\} do
    Select top-k features S_F = F(X, Y)
    for each ML classifier C \in \{SVM, MLP, DT\} do
        for each hyperparameter set H_C do
            Train classifier C using S_F and H_C
            if C is an MLP then
                for each sparsity s \in \{0.2, 0.5, 0.9\} do
                    Apply pruning-aware retraining with s
                end for
            end if
            for each precision p \in \{4, 6, 8, 10\} do
                Apply quantization with p-bit precision
                Obtain Hardware Description
                Apply Hardware Evaluation
                Obtain Accuracy A_C
                Obtain Power consumption P_C
            end for
        end for
    end for
end for
Return C^* = Pareto_C(A_C, P_C)
```
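The final step of the exploration, keeping only the accuracy-power Pareto-optimal designs, can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout, the design names, and the accuracy and power values are placeholders invented for the example, not results of this work.

```python
def pareto_front(designs):
    """Return the designs not dominated in (higher accuracy, lower power)."""
    front = []
    for d in designs:
        dominated = any(
            o["accuracy"] >= d["accuracy"]
            and o["power_uw"] <= d["power_uw"]
            and (o["accuracy"] > d["accuracy"] or o["power_uw"] < d["power_uw"])
            for o in designs
        )
        if not dominated:
            front.append(d)
    # Sort by power so the front reads from cheapest to most accurate.
    return sorted(front, key=lambda d: d["power_uw"])


# Placeholder evaluation results (illustrative values only).
designs = [
    {"name": "design_a", "accuracy": 0.91, "power_uw": 12.0},
    {"name": "design_b", "accuracy": 0.88, "power_uw": 7.0},
    {"name": "design_c", "accuracy": 0.85, "power_uw": 150.0},
    {"name": "design_d", "accuracy": 0.91, "power_uw": 900.0},
]

front = pareto_front(designs)
print([d["name"] for d in front])
```

Here `design_c` and `design_d` are dropped because `design_a` is at least as accurate at lower power. The quadratic scan is adequate for a design space of a few thousand points.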

## A. Feature Selection

Feature selection significantly impacts both hardware efficiency and classification accuracy. Limiting the number of selected features directly reduces the classifier's inputs and consequently its size and parameters, lowering the associated overheads. Additionally, choosing a technique that identifies and retains only the most relevant features can allow for higher achievable accuracy. Accounting for the above, we simultaneously explore the feature selection algorithm and the number of selected features, in order to achieve optimal accuracy while satisfying the strict constraints of FE.

Fig. 2. Algorithmic flowchart of our proposed design space exploration.

Our feature selection is conducted offline, and three state-of-the-art statistical-based algorithms are explored [6]: Double Input Symmetrical Relevance (DISR), Fisher Score, and Joint Mutual Information (JMI). The Fisher Score evaluates the discriminative power of features by comparing inter-class to intra-class variance, where higher scores indicate stronger class separation. JMI evaluates the information that features jointly share with the class label, while DISR normalizes JMI by the combined entropy of the features; by accounting for this uncertainty, it emphasizes features that contribute distinct information. Given a dataset with M features  $X = \{x_1, x_2, \ldots, x_M\}$  and corresponding labels Y, we define an optimization objective to select a subset  $S^*$  of k features that maximizes a relevance criterion F(S):

$$S^* = \arg \max_{S \subset X, |S| = k} F(S), \tag{1}$$

where F(S) represents the relevance criterion computed using one of the feature selection algorithms (DISR, Fisher Score, or JMI). This process ensures that only the most important features are retained, maintaining high accuracy while reducing hardware requirements. Beyond the algorithm exploration, we evaluate different numbers of features per algorithm by varying k within a predefined limit, ensuring that our flexible classifiers' input size stays within reasonable area bounds.
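As an illustration of Eq. (1) for one of the three criteria, the sketch below ranks features by a Fisher score and keeps the top k. It assumes the common per-feature definition (class-size-weighted between-class variance divided by within-class variance) and assumes that F(S) is the sum of the individual scores, so that the arg max reduces to a top-k selection. The function names and the toy data are hypothetical, and the exact variant used in this work may differ.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Per-feature ratio of between-class to within-class variance."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0
        within = 0.0
        for c in set(y):
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # Feature is constant within every class.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Indices of the k highest-scoring features, in ascending index order."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 separates the two classes well, feature 1 does not.
X = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.5], [0.8, 0.6]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, k=1))
```

Unlike the Fisher score, JMI and DISR score features jointly, so they are typically maximized greedily rather than by a single ranking pass.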

## B. Classifier Training and Neural Minimization Techniques

Following the feature selection process, the classifiers are trained with the selected features. During training, *hyper-parameter tuning* is performed to identify the optimal model architecture for each considered ML algorithm and obtain the most accurate model. SVMs are restricted to linear kernels, avoiding complex non-linear alternatives which induce significant hardware overheads. MLPs use the Rectified Linear Unit (ReLU) activation function due to its simplicity and low hardware requirements. Finally, different splitting criteria for DTs are explored, including Gini impurity and entropy.

Quantization is then applied on the trained classifier to mimic the effect of precision scaling in hardware, leveraging the efficiency of low-precision arithmetic. As precision decreases, the hardware overhead of all arithmetic components (i.e., multipliers/adders in MLPs and SVMs, comparators in DTs) decreases accordingly at the cost of accuracy loss. Importantly, our bespoke designs enable the precision of coefficients and input features to be tailored to specific application requirements, allowing for a maximum exploitation of low-precision arithmetic while adhering to high accuracy constraints.
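The effect of precision scaling can be mimicked in software with a few lines of Python. The sketch below assumes uniform quantization, unsigned codes for input features normalized to [0, 1], and symmetric signed codes for coefficients. The text does not specify the exact scheme, so these choices and all function names are assumptions made for illustration.

```python
def quantize_unsigned(x, bits):
    """Map x in [0, 1] to an unsigned integer code of `bits` bits."""
    levels = (1 << bits) - 1
    x = min(max(x, 0.0), 1.0)  # clip to the normalized input range
    return round(x * levels)


def quantize_signed(w, bits, max_abs):
    """Map w in [-max_abs, max_abs] to a symmetric signed code of `bits` bits."""
    qmax = (1 << (bits - 1)) - 1
    q = round(w * qmax / max_abs)
    return max(-qmax, min(qmax, q))  # saturate values outside the range


def dequantize_unsigned(code, bits):
    """Recover the real value represented by an unsigned code."""
    return code / ((1 << bits) - 1)


# Quantization error of one feature value at the precisions explored.
x = 0.33
for bits in (4, 6, 8, 10):
    code = quantize_unsigned(x, bits)
    error = abs(dequantize_unsigned(code, bits) - x)
    print(bits, code, round(error, 5))
```

Lower precision shrinks every multiplier, adder, and comparator, while the printed error shows the price paid in representation accuracy.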

As an additional minimization technique, we consider *pruning-aware retraining* for MLPs, due to their elevated hardware cost. Unstructured pruning removes unimportant coefficients during training, leading to smaller models whilst preserving accuracy. In hardware, the multipliers corresponding to pruned coefficients are removed from our bespoke fully-parallel circuits [11], while the adder tree is simplified by accumulating fewer addends. For example, if we consider a bespoke neuron with 5 inputs, the weighted sum would be  $in_0c_0 + in_1c_1 + in_2c_2 + in_3c_3 + in_4c_4$ . After pruning, if some weights are set to zero, e.g.,  $c_1 = 0$  and  $c_4 = 0$ , the neuron's output becomes  $in_0c_0 + in_2c_2 + in_3c_3$ , saving two multipliers and two operands of the adder tree. We evaluate three state-of-the-art pruning techniques, each employing a distinct ranking criterion: L2-norm, Hessian-based, and activation-aware pruning. In the L2-norm method, weights with smaller absolute magnitudes are pruned, as smaller weights contribute less to each neuron's output. Hessian-based pruning removes weights with smaller second-order derivatives, represented by the diagonal elements of the Hessian matrix, as removing these weights has minimal impact on the model's loss function. Activation-aware pruning considers both the magnitude of weights and the norms of their corresponding activations, pruning based on their combined importance. An exhaustive exploration as a calibration step demonstrates that pruning-aware retraining using the L2-norm criterion consistently outperforms the other two techniques. Thus, the L2-norm criterion is selected for further exploration. Finally, we explore different sparsity ratios to cover a wide range of possible accuracy-power tradeoffs. Given the small sizes of ML models designed for FE and the high-level nature of the optimizations, the design space is exhaustively explored in parallel, enabling fast evaluation and generalization to other ML applications of FE.
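The five-input neuron example above can be reproduced with a short magnitude-based pruning sketch. It performs one-shot pruning of the smallest-magnitude weights and counts the multipliers that remain; the retraining step used in this work is omitted, and the weight values and function names are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    positions = sorted(
        (abs(w), r, c) for r, row in enumerate(weights) for c, w in enumerate(row)
    )
    n_prune = int(len(positions) * sparsity)
    pruned = [list(row) for row in weights]
    for _, r, c in positions[:n_prune]:
        pruned[r][c] = 0
    return pruned


def multiplier_count(weights):
    """One bespoke multiplier is instantiated per non-zero coefficient."""
    return sum(1 for row in weights for w in row if w != 0)


# One neuron with five inputs; c1 and c4 have the smallest magnitudes.
layer = [[5, -1, 3, 4, 2]]
pruned = prune_by_magnitude(layer, sparsity=0.4)
print(pruned, multiplier_count(layer), multiplier_count(pruned))
```

With 40% sparsity the neuron keeps three of its five multipliers, matching the $c_1 = 0$, $c_4 = 0$ case discussed in the text.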

# V. RESULTS & ANALYSIS

## A. Flexible Standard Cell Library Characterization

In this work, we develop and characterize a custom standard cell library for FE, optimized for ultra-low-power operation at 1 V using the PragmatIC FlexICs PDK second-generation Helvellyn 2.1.0 [18]. The target operating voltage of 1 V was selected to minimize cell size and power consumption while achieving a 1 µs delay across all cells. To address the inherent challenges of IGZO TFTs, careful adjustments to the transistor threshold voltages and drive strengths are implemented to guarantee proper switching behavior and functionality under low-voltage conditions. Design considerations include reducing leakage currents, compensating for increased pull-up resistance, optimizing resistor placement for power efficiency and speed, and minimizing parasitic capacitances through interconnect optimization. Fig. 3 shows the layout of our R-NMOS 2-input NAND (left) and NOR (right) gates optimized for 1 V operation. The area of all cells, obtained from final layouts, is reported in Table I. Cells are characterized through simulations considering input slew and output load capacitance using the PDK's typical transistor model. Transition times, static and dynamic power, and drive strength are measured to ensure accurate modeling. The extracted data is compiled into a Liberty (.lib) file for integration with commercial EDA tools for delay, area, and power analysis.



Fig. 3. Resistor-NMOS layout of 2-input a) NAND and b) NOR gates.

TABLE I
AREA MEASUREMENTS OF OUR PRIMITIVE GATES USING THE PRAGMATIC FLEXICS PDK SECOND GENERATION HELVELLYN 2.1.0 [18].

| Cell Name | Area (μm<sup>2</sup>) |
|-----------|-------------------------|
| INVX1     | 747.82                  |
| AND2      | 2121.00                 |
| NAND2     | 919.01                  |
| OR2       | 2468.85                 |
| NOR2      | 1053.62                 |
| DFFNRX1   | 16195.00                |
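The cell areas in Table I allow a quick first-order area estimate for a mapped netlist, which also illustrates why sequential elements are expensive in this technology. The sketch below sums cell areas only; the gate counts are hypothetical, and routing or placement overheads are ignored.

```python
# Cell areas in square micrometers, copied from Table I.
CELL_AREA_UM2 = {
    "INVX1": 747.82,
    "AND2": 2121.00,
    "NAND2": 919.01,
    "OR2": 2468.85,
    "NOR2": 1053.62,
    "DFFNRX1": 16195.00,
}


def netlist_area_mm2(cell_counts):
    """Sum of cell areas for a netlist, converted from um^2 to mm^2."""
    total_um2 = sum(CELL_AREA_UM2[cell] * n for cell, n in cell_counts.items())
    return total_um2 / 1e6


# Hypothetical combinational netlist (gate counts are illustrative only).
example = {"NAND2": 100, "NOR2": 50, "INVX1": 40}
print(f"{netlist_area_mm2(example):.4f} mm^2")

# Area of one flip-flop relative to a 2-input NAND gate.
ratio = CELL_AREA_UM2["DFFNRX1"] / CELL_AREA_UM2["NAND2"]
print(f"DFF / NAND2 area ratio: {ratio:.1f}")
```

A single flip-flop occupies the area of more than 17 NAND2 gates, which is consistent with the choice of purely combinational, memory-free classifier circuits.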

## B. Stress Datasets

- 1) Wearable Stress and Affect Detection Dataset: We evaluate our proposed flexible classifiers using the Wearable Stress and Affect Detection (WESAD) dataset [19]. The collected data originate from two wearable devices, one worn on the chest (RespiBAN) and one on the wrist (Empatica E4), by 17 participants for around 100 minutes. In our exploration, all the signals from RespiBAN are used, while from the Empatica E4 only the blood volume pulse (BVP) is used. Each recorded sample is labeled with one of two stress levels, baseline/rest or stress, i.e., binary classification.
- 2) AffectiveROAD: The AffectiveROAD dataset [20] focuses on stress monitoring in driving scenarios, integrating both physiological and contextual data collected from two devices: the Empatica E4 and Zephyr Bioharness 3. It includes data from 13 real-world driving sessions conducted by nine experienced drivers, self-identified as four women and five men. For our analysis, only raw sensor data from the Empatica E4 is used. These datasets enable us to validate our system across diverse real-world stress applications.

## C. Experimental Setup

Feature extraction is performed using the SciPy and pyHRV packages [21]. For training, we normalize the extracted features within [0,1] and then randomly split the formed dataset into training and testing subsets, with a 70%/30% split ratio. Stratification ensures a balanced distribution of each target class within both the training and testing sets. Scikit-learn's GridSearchCV is used for hyperparameter selection during training with 5-fold cross validation, and the ML models are trained until convergence with default tolerance. For synthesis and mapping, we use the 1 V standard-cell library developed with the PragmatIC FlexIC PDK [18], as described in Section V-A. Synopsys Design Compiler S-2021.06, VCS T-2022.06, and PrimeTime T-2022.03 are used for synthesis and hardware analysis. The accuracy is reported on the test dataset, and all designs are synthesized at a clock frequency of 2 kHz, aligning with the performance of typical stress monitoring applications [22]. Over 1200 classifiers are evaluated in our exploration.

Fig. 4. Feature selection evaluation of accuracy-power trade-offs, across all considered ML algorithms: (a, d) DTs, (b, e) SVMs, and (c, f) MLPs. Panels (a-c) present the WESAD evaluation, while (d-f) present the AffectiveROAD evaluation. The coefficients of the ML models are 8-bit fixed-point values.

## D. Evaluation of our Flexible Classifiers

1) Feature Selection Evaluation: First, we evaluate the impact of different feature selection techniques and number of selected features in designing low-area but highly accurate stress classifiers. Fig. 4 presents the evaluation of feature selection for all three classifiers (DT, SVM and MLP), with 8-bit fixed-point coefficients, a precision that has demonstrated optimal gains without accuracy loss in the state of the art [23]. We explore a number of selected features within the range of 5 to 30, using increments of 5. Fig. 4 (a-c) and (d-f) illustrate the accuracy-power tradeoffs for the WESAD and AffectiveROAD datasets, respectively. Due to space limitations, the accuracy-area tradeoff figure is not included; however, the accuracy-power and accuracy-area tradeoffs would appear identical. In flexible electronics (FE), nearly 99% of power consumption is static [8], making area and power linearly correlated. Hence, minimizing area also minimizes power since, unlike in conventional ICs, the contribution of switching activity to the overall power consumption is negligible.

Overall, we observe that each Pareto front is populated by diverse feature selection techniques. For WESAD, 67% of the Pareto-optimal SVMs use DISR and 71% of the Pareto-optimal MLPs use Fisher, whereas Pareto solutions for DTs are divided between DISR and JMI. For AffectiveROAD, 67% of Pareto-optimal points for MLP use DISR, while Pareto solutions for MLPs are split between DISR and JMI, and for DTs, the techniques are equally selected. This indicates that the choice of selection method is not trivial and depends on the type of classifier, showcasing the necessity to explore different techniques per ML algorithm. Interestingly, we observe that the number of features does not necessarily correlate with the classifier's power, as fewer features may result in larger circuits. This might happen since in bespoke circuits, where coefficient values define the area of instantiated arithmetic components, hardware overheads are highly influenced by both the number of trained parameters and their specific values. Similar observations can be extended to accuracy, where adding more features does not necessarily enhance it, and the significance of different feature combinations varies w.r.t. the achieved accuracy.

Fig. 5. Accuracy-power evaluation of our DSE for generating flexible stress classifiers, using feature selection and neural minimization techniques, across all considered ML algorithms: (a, d) DTs, (b, e) SVMs, and (c, f) MLPs. Panels (a-c) present the evaluation on WESAD, while (d-f) present the evaluation on AffectiveROAD.

2) DSE Evaluation: Fig. 5 presents the accuracy-power trade-offs obtained from our entire DSE, using different neural minimization approaches on top of feature selection, across all ML algorithms. Specifically, quantization is explored for 4, 6, 8, and 10 bits, and L2-norm pruning with sparsity ratios of 20%, 50%, and 90% (for MLPs only). We observe that our flexible classifiers can highly benefit from quantization, as all accuracy-power Pareto fronts are populated by low-precision designs. Specifically for WESAD, 43% of all Pareto-optimal classifiers feature 4-bit precision, 39% use 6 bits, and only 18% require 8 bits or more. For AffectiveROAD, 30% of all Pareto-optimal classifiers feature 4-bit precision, 26% use 6 bits, 26% require 8 bits, and only 17% require 10 bits. This highlights the effectiveness of exploring low-precision arithmetic in our bespoke fully-parallel designs, as quantization highly influences which solutions become Pareto-optimal. We also observe that MLPs demonstrate better robustness to quantization, as they exhibit only a small accuracy drop at 6 bits compared to 10 bits. In contrast, reducing precision incurs significant drops for SVMs and DTs, of 2% on average. This trend is illustrated in Fig. 5, where the MLP solutions exhibit greater robustness to reduced precision, with accuracy degrading more gracefully compared to the steeper drops observed for SVMs and DTs. Finally, pruning facilitates the removal of parameters, and therefore yields power gains in MLP circuits. The Pareto-optimal points for WESAD are distributed as follows: 45.5% of the points correspond to a sparsity of 20%, 18.2% to a sparsity of 50%, and 36.4% to a sparsity of 90%. For AffectiveROAD, 29% of the points correspond to a sparsity of 20%, 57% to a sparsity of 50%, and 14% to a sparsity of 90%. Interestingly, the most common sparsity ratio among Pareto-optimal points is 20% for WESAD and 50% for AffectiveROAD, highlighting the fact that determining the best pruning ratio is not trivial. It is important to reiterate that in bespoke architectures, area and power overheads depend on the coefficient values. During pruning-aware retraining, accuracy recovery may favor less hardware-friendly coefficient values, so a lower sparsity ratio, which alters fewer coefficients, can be better suited to bespoke architectures.

TABLE II
COMPARISON OF OUR HIGHLY-ACCURATE FLEXIBLE CLASSIFIERS

| Dataset                 | WESAD            |       |       | AffectiveROAD    |        |        |
|-------------------------|------------------|-------|-------|------------------|--------|--------|
| Model                   | MLP <sup>1</sup> | SVM   | DT    | MLP <sup>2</sup> | SVM    | DT     |
| Feature Selection       | Fisher           | DISR  | DISR  | JMI              | Fisher | DISR   |
| #Features               | 25               | 25    | 25    | 30               | 20     | 15     |
| Precision (bits)        | 10               | 10    | 8     | 8                | 10     | 10     |
| Accuracy (%)            | 94               | 85    | 94    | 98               | 65     | 47     |
| F1 Score (%)            | 93               | 85    | 94    | 100              | 86     | 99     |
| Area (cm<sup>2</sup>)   | 8.8              | 0.021 | 0.002 | 9.3              | 0.065  | 0.009  |
| Power (mW)              | 48               | 0.12  | 0.009 | 26.5             | 0.19   | 0.025  |
| Latency (ms)            | 6.3              | 0.7   | 0.14  | 97               | 15     | 7.1    |
| Energy (µJ)             | 300              | 0.08  | 0.001 | 2600             | 2.85   | 0.1775 |

 $^1$ MLP is pruned with L2-norm criterion and 90% sparsity.
 $^2$ MLP is pruned with L2-norm criterion and 50% sparsity.
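The energy row of Table II can be cross-checked against the power and latency rows, since energy per inference is power multiplied by latency (mW × ms = µJ). The sketch below copies the values from Table II; the dictionary layout and function name are illustrative choices.

```python
# (power in mW, latency in ms, reported energy in uJ), copied from Table II.
TABLE_II = {
    ("WESAD", "MLP"): (48.0, 6.3, 300.0),
    ("WESAD", "SVM"): (0.12, 0.7, 0.08),
    ("WESAD", "DT"): (0.009, 0.14, 0.001),
    ("AffectiveROAD", "MLP"): (26.5, 97.0, 2600.0),
    ("AffectiveROAD", "SVM"): (0.19, 15.0, 2.85),
    ("AffectiveROAD", "DT"): (0.025, 7.1, 0.1775),
}


def energy_uj(power_mw, latency_ms):
    """Energy per inference in microjoules: mW x ms = uJ."""
    return power_mw * latency_ms


for (dataset, model), (power, latency, reported) in TABLE_II.items():
    computed = energy_uj(power, latency)
    print(f"{dataset:14s} {model:3s} computed={computed:.4g} uJ reported={reported:g} uJ")
```

The computed values agree with the reported ones up to the rounding used in the table; for instance, the WESAD DT evaluates to about 0.0013 µJ against a reported 0.001 µJ.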

3) Evaluation of our Most Accurate Classifiers: Prioritizing application accuracy, Table II provides a detailed analysis of our flexible classifiers that achieve the highest accuracy for each ML algorithm, as obtained by our DSE. For WESAD, we observe that MLPs and DTs achieve the highest accuracy. However, DTs are also the most hardware-efficient, requiring on average only  $0.2 \,\mathrm{mm}^2$  and consuming just  $9 \,\mu\mathrm{W}$  with only 0.001 µJ energy per inference. Both DTs and SVMs consume less than 2 mW of power, allowing them to be powered by existing flexible energy harvesters. This enables self-sustaining operation, a key advantage for wearable applications. Our most accurate MLP, even after pruning 90% of its parameters, still incurs high hardware costs, with power consumption exceeding 30 mW i.e, non-adequate for battery-powered operation. It is important to underscore that, while energy consumption is often a consideration, power availability is the more critical constraint in FE. Given that printed batteries can be customized in capacity, shape, and voltage [24], managing peak power consumption is a higher priority than optimizing for total energy usage [25]. For AffectiveROAD, Both DTs and SVMs consume less than 0.1 mW, but their low accuracy limits their practical use. However, it is worth mentioning that SVM accuracy is comparable to other state-of-the-art approaches [4], while DTs have not been used in the literature. In contrast, MLPs, despite higher hardware costs, provide realistic accuracy while being able to operate with the largest

TABLE III
STATE-OF-THE-ART COMPARISON OF STRESS MONITORING CLASSIFIERS

| Feature             | Silicon-based [5], [6] | Ours                            |
|---------------------|------------------------|---------------------------------|
| Flexibility         | ✗ Rigid, bulky         | ✓ Fully flexible, patch-based   |
| Cost                | > 10 dollars           | Sub-dollar                      |
| Computational model | Cloud-dependent        | Fully edge-based                |
| Accuracy            | 87%, 81%               | 94%, 98%                        |

printed battery, i.e., Molex. The above highlights the importance of a design space exploration like ours, as different stress datasets require different ML models and design decisions to balance accuracy and hardware constraints in FE.
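The power-availability argument above can be made concrete with a minimal sketch. It assumes that the two thresholds quoted in the text (2 mW for flexible energy harvesters and 30 mW for the largest printed battery) act as hard budgets; the constant and function names are hypothetical and serve illustration only. The second helper applies $E = P \cdot t$ to recover the inference latency implied by a reported power and energy per inference.

```python
# Assumed budgets, taken from the thresholds quoted in the text. They are
# illustrative and not datasheet limits of a specific harvester or battery.
HARVESTER_BUDGET_MW = 2.0   # assumption: flexible energy harvester
BATTERY_BUDGET_MW = 30.0    # assumption: largest printed battery


def power_source(power_mw: float) -> str:
    """Return the least demanding supply that can sustain the given power draw."""
    if power_mw < HARVESTER_BUDGET_MW:
        return "harvester"
    if power_mw <= BATTERY_BUDGET_MW:
        return "printed battery"
    return "none"


def latency_ms(energy_uj: float, power_uw: float) -> float:
    """Inference latency implied by E = P * t, in milliseconds."""
    return energy_uj / power_uw * 1e3  # uJ / uW gives seconds; convert to ms
```

Under these assumptions, `power_source(0.009)` returns `"harvester"` for the average WESAD DT at 9 µW, while any design above 30 mW returns `"none"`. Likewise, `latency_ms(0.001, 9.0)` yields roughly 0.11 ms for the reported 0.001 µJ per inference at 9 µW.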

4) State-of-the-Art Comparison: As mentioned in Section I, the state of the art for stress monitoring uses rigid, silicon-based wearables and focuses on algorithmic design [2]–[6]. These systems rely on general-purpose hardware (e.g., CPUs [5]) and require continuous transmission of data to the cloud [6]. Table III compares our flexible solutions with the rigid systems in [5], [6]. Our solution enables a fully flexible classification system that can be realized as a standalone on-body patch in flexible technology. Compared to rigid silicon, our system offers lower cost, significantly reduced power consumption, and purely edge-based computation.

The rigid silicon system in [6] achieved 87% accuracy for WESAD, using an SVM as the optimal classifier for edge computing. However, it did not explore DTs or MLPs, which our work identifies as optimal (see also Table II). For the AffectiveROAD dataset, the authors in [5] achieved 81% accuracy, focusing on algorithmic optimization and general-purpose CPUs. In contrast, we achieve state-of-the-art accuracy (94%, 98%), surpassing previous silicon-based solutions [2]–[6]. Our flexible classifiers, with minimal area overheads and battery-powered operation, can be seamlessly integrated into conformal and accessible wearable devices.

#### VI. CONCLUSION

In this work, we conducted the first comprehensive design space exploration of mechanically flexible, low-power classifiers for real-time stress monitoring applications. To that end, we incorporated diverse ML algorithms in our exploration, accounting for the hardware impact of varied feature sets and neural simplification techniques, such as unstructured pruning and low-precision quantization. Our flexible classifiers are designed as bespoke, fully parallel circuits that aim to comply with the stringent constraints of FE. We designed and evaluated over 1200 classifiers within our exploration. Our results reveal that our Pareto-optimal flexible classifiers enable personalized stress classification, achieving state-of-the-art accuracy in a device that is smaller, more conformal, and more accessible than rigid, silicon-based solutions.
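As an illustration of how Pareto-optimal designs can be extracted from a large set of evaluated classifiers, the following minimal sketch filters candidates by dominance over accuracy, power, and area. The `Design` type, its field names, and the numeric values are hypothetical placeholders, not results or tooling from this work.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    accuracy: float   # higher is better
    power_mw: float   # lower is better
    area: float       # lower is better


def dominates(a: Design, b: Design) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    no_worse = (a.accuracy >= b.accuracy and a.power_mw <= b.power_mw
                and a.area <= b.area)
    better = (a.accuracy > b.accuracy or a.power_mw < b.power_mw
              or a.area < b.area)
    return no_worse and better


def pareto_front(designs: list[Design]) -> list[Design]:
    """Keep every design that no other design dominates."""
    return [d for d in designs
            if not any(dominates(o, d) for o in designs)]


# Placeholder values for illustration only; they are not results from the paper.
candidates = [
    Design("A", 0.90, 1.0, 5.0),
    Design("B", 0.95, 20.0, 40.0),
    Design("C", 0.88, 2.0, 6.0),   # dominated by A on all three metrics
]
print([d.name for d in pareto_front(candidates)])  # ['A', 'B']
```

The quadratic pairwise comparison is adequate for a design space on the order of a thousand points; larger spaces would benefit from sorting-based front extraction.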

#### ACKNOWLEDGMENT

This work is partially supported by the European Research Council (ERC) and co-funded by the H.F.R.I. call "Basic Research Financing (Horizontal support of all Sciences)" under the National Recovery and Resilience Plan "Greece 2.0" (H.F.R.I. Project Number: 17048).

#### REFERENCES

- [1] B. S. McEwen, "Neurobiological and Systemic Effects of Chronic Stress," *Chronic Stress*, vol. 1, 2017.
- [2] A. Kumar, K. Sharma, and A. Sharma, "Hierarchical deep neural network for mental stress state detection using IoT based biomarkers," *Pattern Recognit. Lett.*, vol. 145, pp. 81–87, 2021.
- [3] S. A. H. Aqajari, E. K. Naeini, M. A. Mehrabadi, S. Labbaf, A.-M. Rahmani, and N. D. Dutt, "GSR Analysis for Stress: Development and Validation of an Open Source Tool for Noisy Naturalistic GSR Data," *ArXiv*, 2020.
- [4] D. Lopez-Martinez, N. El-Haouij, and R. Picard, "Detection of real-world driving-induced affective state using physiological signals and multi-view multi-task machine learning," in 2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW). IEEE, 2019, pp. 356–361.
- [5] V.-T. Ninh, S. Smyth, M.-T. Tran, and C. Gurrin, "Analysing the performance of stress detection models on consumer-grade wearable devices," in *New Trends in Intelligent Software Methodologies, Tools and Techniques*. IOS Press, 2021, pp. 524–537.
- [6] S. Jiang, F. Firouzi, K. Chakrabarty, and E. B. Elbogen, "A resilient and hierarchical iot-based solution for stress monitoring in everyday settings," *IEEE Internet Things J*, vol. 9, 2022.
- [7] W. Gao *et al.*, "Fully integrated wearable sensor arrays for multiplexed in situ perspiration analysis," *Nature*, vol. 529, pp. 509–514, 01 2016.
- [8] E. Ozer *et al.*, "Bendable non-silicon risc-v microprocessor," *Nature*, 2024.
- [9] S. V. R. Kaipu, J. G. D'sa, D. Sachan, and M. Goswami, "Fabrication of flexible sensors for electrodermal activity measurement," in 29th International Conference on Microelectronics (ICM), 2017.
- [10] J. Costa, V. Barlier, H. Norman, E. Ozer, F. Alkhalil, and R. Price, "11-5: Invited paper: Evolving into an era of natively flexible smart systems," in SID Symposium Digest of Technical Papers, vol. 54, no. 1, 2023, pp. 136–139.
- [11] G. Armeniakos, G. Zervakis, D. Soudris, M. B. Tahoori, and J. Henkel, "Co-design of approximate multilayer perceptron for ultra-resource constrained printed circuits," *IEEE Trans. Comput.*, 2023.
- [12] Pragmatic, "Advancing semiconductor sustainability," White Paper, vol. V1, 2023.

- [13] H. Çeliker, A. Sou, B. Cobb, W. Dehaene, and K. Myny, "Flex6502: a flexible 8b microprocessor in 0.8 μm metal-oxide thin-film transistor technology implemented with a complete digital design flow running complex assembly code," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), vol. 65. IEEE, 2022, pp. 272–274.
- [14] E. Ozer *et al.*, "Bespoke machine learning processor development framework on flexible substrates," in *Int. Conf. Flexible and Printable Sensors and Systems (FLEPS)*, 2019, pp. 1–3.
- [15] T. Moy et al., "An eeg acquisition and biomarker-extraction system using low-noise-amplifier and compressive-sensing circuits based on flexible, thin-film electronics," *IEEE Journal of Solid-State Circuits*, 2017.
- [16] P. C. Lozano Duarte, F. Afentaki, G. Zervakis, and M. Tahoori, "Design and in-training optimization of binary search adc for flexible classifiers," in *Proceedings of the 30th Asia and South Pacific Design Automation Conference*, 2025, pp. 754–760.
- [17] M. H. Mubarik et al., "Printed machine learning classifiers," in Annu. Int. Symp. Microarchitecture (MICRO), 2020, pp. 73–87.
- [18] D. o. E. IEEE PES DSCE. Europractice | flexible electronics. [Online]. Available: https://europractice-ic.com/technologies/flexible-electronics/
- [19] P. Schmidt, A. Reiss, R. Dürichen, C. Marberger, and K. V. Laerhoven, "Introducing WESAD, a Multimodal Dataset for Wearable Stress and Affect Detection," in ACM International Conference on Multimodal Interaction (ICMI), 2018.
- [20] "Affectiveroad dataset," MIT Massachusetts Institute of Technology, 2018.
- [21] P. Gomes, P. Margaritoff, and H. Silva, "pyhrv: Development and evaluation of an open-source python toolbox for heart rate variability (hrv)," in *Proc. IcETRAN*, 2019.
- [22] X. Wang, Z. Liu, and T. Zhang, "Flexible sensing electronics for wearable/attachable health monitoring," Small, vol. 13, 2017.
- [23] F. Afentaki, G. Saglam, A. Kokkinis, K. Siozios, G. Zervakis, and M. B. Tahoori, "Bespoke approximation of multiplication-accumulation and activation targeting printed multilayer perceptrons," in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 2023, pp. 1–9.
- [24] S. Lanceros-Méndez and C. M. Costa, *Printed Batteries: Materials, Technologies and Applications*. Wiley, 2018.
- [25] J. Henkel et al., "Approximate computing and the efficient machine learning expedition," in Int. Conf. on Computer-Aided Design (ICCAD), 2022.