# A mechanics-informed deep learning framework for data-driven nonlinear viscoelasticity

Faisal As’ad, Charbel Farhat

# 2. Mechanics-informed artificial neural networks for elasticity

The computational framework presented in [22] for building and training mechanics-informed data-driven ANNs for the constitutive modeling of nonlinear elastic materials distinguishes itself from alternatives by the simultaneous enforcement, in a computationally efficient manner, of the following mechanics-based hard constraints on the network architecture and loss function:
- Dynamic stability. In the absence of any dissipative mechanism, preventing an uncontrolled growth of the kinetic energy of an elastic material requires such a material to be more precisely hyperelastic. That is, the stresses that may develop in the material must derive from a strain energy density function $W$. This constraint is simply enforced by using the ANN to represent the strain energy density function rather than the stress. Consequently, the output of the ANN is chosen to be $W$; and the loss function is formulated in terms of the gradient of $W$ with respect to its input (determined below), rather than its output $W$.

- Objectivity. To be objective, the strain energy density function $W$ should be invariant to orthogonal transformations of the frame of reference. To achieve this, $W$ must be expressible in terms of the Green-Lagrange strain $\boldsymbol{E} \in \operatorname{Sym}_d^2(\mathbb{R})$, where $d$ is the dimensionality of the problem (one-, two-, or three-dimensional). Consequently, the input to the ANN is chosen to be $\boldsymbol{E}$. Since dynamic stability requires hyperelasticity, the output stress measure is chosen to be the energy conjugate of $\boldsymbol{E}$ - that is, the second Piola-Kirchhoff stress tensor $\boldsymbol{S} \in \operatorname{Sym}_d^2(\mathbb{R})$.
- Material stability. Material stability characterizes a material which, under small loads, cannot experience arbitrarily large deformations. It is ensured by the ellipticity of $\bar{W}(\boldsymbol{F})$ - that is, the strain energy density function expressed in terms of the deformation gradient $\boldsymbol{F} \in \mathbb{R}^{d \times d}$ - which itself is implied by the convexity of $W$ in the Green-Lagrange strain $\boldsymbol{E} \in \operatorname{Sym}_d^2(\mathbb{R})$, when the material is in a tensile state [22]. The convexity of $W(\boldsymbol{E})$ leads to a symmetric positive definite elasticity tensor. Unlike that of $\bar{W}(\boldsymbol{F})$, which necessarily excludes the physically reasonable growth condition $\bar{W} \rightarrow \infty$ as $\operatorname{det} \boldsymbol{F} \rightarrow 0^{+}$ and prevents instability phenomena such as buckling (see [22]), the convexity of $W(\boldsymbol{E})$ is not too restrictive. The convexity of the ANN representation of the output $W$ with respect to the input $\boldsymbol{E}$ is enforced using the restrictions of Input Convex Neural Networks [29].
- Consistency. In [22] and in this work, consistency refers to the preservation of rigid body modes. That is, a material undergoing zero strain should experience zero stress. This essential property is enforced by adding a linear correction to the ANN output $W$ whose slope is equal to the negative of the ANN gradient at the origin.

Throughout the remainder of this paper, a hat over a symmetric second-order tensor such as a stress or strain tensor represents the vectorization of its unique components (for example, using the Voigt or Mandel notation). Hence, a nonlinear elastic material is represented by

$$
\begin{aligned}
& W_{(\boldsymbol{\theta})}(\widehat{\boldsymbol{E}})=\mathcal{N}_{\boldsymbol{\theta}}(\widehat{\boldsymbol{E}})-\frac{\partial \mathcal{N}_{\boldsymbol{\theta}}}{\partial \widehat{\boldsymbol{E}}}(\mathbf{0}) \cdot \widehat{\boldsymbol{E}} \\
& \widehat{\boldsymbol{S}}_{(\boldsymbol{\theta})}(\widehat{\boldsymbol{E}})=\frac{\partial W_{(\boldsymbol{\theta})}}{\partial \widehat{\boldsymbol{E}}}(\widehat{\boldsymbol{E}})=\frac{\partial \mathcal{N}_{\boldsymbol{\theta}}}{\partial \widehat{\boldsymbol{E}}}(\widehat{\boldsymbol{E}})-\frac{\partial \mathcal{N}_{\boldsymbol{\theta}}}{\partial \widehat{\boldsymbol{E}}}(\mathbf{0})
\end{aligned}
$$

where $\mathcal{N}_{\boldsymbol{\theta}}: \mathbb{R}^{d^{\prime}} \rightarrow \mathbb{R}$ is a fully-connected ANN parameterized by $\boldsymbol{\theta}$ and $d^{\prime} \geq d$ is the number of unique components of the stress tensor - which depends on the dimensionality of the problem or the state of the stress/strain (for example, plane stress). Given $N_m$ stress-strain data pairs $\left\{\left(\widehat{\boldsymbol{E}}^{(m)}, \widehat{\boldsymbol{S}}^{(m)}\right)\right\}_{m=1}^{N_m}$, the optimal values of the parameters $\boldsymbol{\theta}$ are determined as

$$
\boldsymbol{\theta}=\underset{\boldsymbol{\theta} \in \mathbb{R}^{N_{\boldsymbol{\theta}}}}{\operatorname{argmin}} \sum_{m=1}^{N_m} \sum_{k=1}^{d^{\prime}}\left(\frac{\widehat{S}_{(\boldsymbol{\theta})_k}\left(\widehat{\boldsymbol{E}}^{(m)}\right)-\widehat{S}_k^{(m)}}{\sigma_k}\right)^2
$$

where $\sigma_k$ denotes the standard deviation of the $k$-th component of the stress data and $N_{\boldsymbol{\theta}}$ denotes the cardinality of $\boldsymbol{\theta}$.

In [22], the framework overviewed above was shown to favor learning the structure of a constitutive relation rather than overfitting the training data; achieve robustness with respect to data sparsity and noise; promote numerical stability and accuracy for inputs outside the training domain; outperform purely data-driven regression ANNs for complex nonlinear elastic problems; enable the stable execution of static and dynamic FE analyses; and massively accelerate multi-scale FE computations.
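The construction recalled in this section can be illustrated with a minimal sketch, given here under several assumptions that are not taken from [22]: the network `TinyConvexNet` is a hypothetical one-hidden-layer convex network, the activation is assumed to be the square of the softplus function, and gradients are approximated by central finite differences where a practical implementation would rely on automatic differentiation. The sketch shows how the linear correction yields a zero stress at zero strain, and how the normalized loss is assembled.

```python
import numpy as np

def softplus_squared(x):
    # Assumed activation: softplus(x)**2, which is convex and non-decreasing.
    return np.logaddexp(0.0, x) ** 2

class TinyConvexNet:
    """Hypothetical one-hidden-layer input-convex network N: R^d' -> R."""

    def __init__(self, dim, width=8, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(size=(width, dim))    # first-layer weights (unconstrained)
        self.b = rng.normal(size=width)
        self.a = np.abs(rng.normal(size=width))   # non-negative output weights keep N convex
        self.c = rng.normal(size=dim)             # linear skip term (does not affect convexity)

    def __call__(self, E_hat):
        return float(self.a @ softplus_squared(self.A @ E_hat + self.b) + self.c @ E_hat)

def num_grad(f, x, h=1e-6):
    # Central finite-difference gradient (stand-in for automatic differentiation).
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def make_energy(net, dim):
    # W(E) = N(E) - dN/dE(0) . E, so that the stress vanishes at zero strain.
    g0 = num_grad(net, np.zeros(dim))
    return lambda E_hat: net(E_hat) - g0 @ E_hat

def stress(W, E_hat):
    # Stress obtained as the gradient of the energy with respect to the strain.
    return num_grad(W, E_hat)

def normalized_loss(W, E_data, S_data):
    # Squared stress errors, each component scaled by the standard deviation of its data.
    sigma = S_data.std(axis=0)
    S_pred = np.array([stress(W, E) for E in E_data])
    return float(np.sum(((S_pred - S_data) / sigma) ** 2))

net = TinyConvexNet(dim=3)
W = make_energy(net, dim=3)
S_at_zero = stress(W, np.zeros(3))   # vanishes up to finite-difference round-off
```

In this sketch, convexity follows from combining convex, non-decreasing activations of affine maps with non-negative weights; the linear correction does not alter convexity since it only adds a linear term.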

# 3. Constitutive modeling of nonlinear viscoelasticity

Viscoelastic materials exhibit both viscous and elastic characteristics. Viscoelasticity manifests itself in a constitutive law through two main mechanisms: strain-rate dependence and hysteresis. Thus, unlike an elastic counterpart, a viscoelastic material law cannot be represented by a function of the instantaneous strain only. Instead, it should be represented by a function which, at any time $t$, depends on the entire strain history preceding $t$. For example, a candidate representation of the instantaneous second Piola-Kirchhoff stress tensor $\boldsymbol{S}$ associated with a viscoelastic material is

$$
\boldsymbol{S}(t)=\boldsymbol{F}\left(\{\boldsymbol{E}(\tau)\}_{\tau=-\infty}^t\right) \tag{1}
$$

where $\boldsymbol{F}: \operatorname{Sym}_d^2\left(C^0((-\infty, t])\right) \rightarrow \operatorname{Sym}_d^2(\mathbb{R})$ and $C^0((-\infty, t])$ denotes the set of time-continuous functions defined over the time interval $(-\infty, t]$. However, the above candidate representation is too general as it also applies to plastic, viscoplastic - and for that matter, any other path-dependent material law. In other words, it is not sufficiently specific to characterize a viscoelastic material. Indeed, unlike a plastic material (for example), a viscoelastic material experiences fading memory [30] and therefore does not experience permanent deformations or residual stresses as a result of its strain history. Furthermore, the overly general representation (1) poses difficulties to regression-based constitutive modeling. To begin, the number of inputs to the sought-after ANN-based model associated with a time-discretization of the strain history - and thus, the number of model parameters - can be prohibitively large to train the model. In addition, the online execution of a trained model - say, within a FE analysis - may require a significant amount of data storage and CPU resources. Over and above all of this, a representation such as (1) is difficult to interpret and thus difficult to inform with mechanics-based constraints.

For all the above reasons, a more concise and yet sufficiently expressive representation of nonlinear viscoelasticity is sought after in this paper. To this end, it is noted that it is tempting to achieve this objective by simply adding a strain-rate dependence to a standard elastic model. However, that would fail to capture the relaxation effect commonly observed in viscoelastic materials. It is also noted that a commonly used constitutive model for linear viscoelasticity is the standard linear solid (SLS) model, which draws on the spring dashpot analogy to account for both dependences on the stress rate and strain rate. This model leads to a parametric linear relationship between the stress, strain, stress rate, and strain rate. Specifically, it consists of a first-order ordinary differential equation (ODE) for the stress, in which the strain and strain rate appear as source terms. In the nonlinear setting however, a more sophisticated model such as that proposed in this paper is required.

## 3.1. An internal variable approach to viscoelasticity

Internal variable approaches [31] are most appropriate for modeling hysteresis in a constitutive law, particularly in the viscoelastic setting. They can lead to a concise regression model that is amenable to both the enforcement of hard mechanical constraints and fast online execution. In such approaches, the stress $\boldsymbol{S}$ can be expressed as a function of not only the strain $\boldsymbol{E}$, but also a time-evolving strain-like internal variable $\boldsymbol{\alpha} \in \operatorname{Sym}_d^2(\mathbb{R})$. Here, such an approach is adopted and thus $\boldsymbol{S}$ is modeled as follows

$$
\boldsymbol{S}(t)=\widetilde{\boldsymbol{S}}(\boldsymbol{E}(t), \boldsymbol{\alpha}(t)) \tag{2}
$$

where $\widetilde{\boldsymbol{S}}: \operatorname{Sym}_d^2(\mathbb{R}) \times \operatorname{Sym}_d^2(\mathbb{R}) \rightarrow \operatorname{Sym}_d^2(\mathbb{R})$.

In (2), the evolution of the internal variable $\boldsymbol{\alpha}$ is governed by an ODE of the form

$$
\dot{\boldsymbol{\alpha}}(t)=\tilde{\dot{\boldsymbol{\alpha}}}(\boldsymbol{\beta}(t), \boldsymbol{E}(t)) \tag{3}
$$

where a dot designates the time derivative; $\tilde{\dot{\boldsymbol{\alpha}}}: \operatorname{Sym}_d^2(\mathbb{R}) \times \operatorname{Sym}_d^2(\mathbb{R}) \rightarrow \operatorname{Sym}_d^2(\mathbb{R})$ is some function of the "internal stress" $\boldsymbol{\beta} \in \operatorname{Sym}_d^2(\mathbb{R})$ and of $\boldsymbol{E}$; and for closure, $\boldsymbol{\beta}$ is related to the internal variable and to the strain by a relationship of the form

$$
\boldsymbol{\beta}(t)=\tilde{\boldsymbol{\beta}}(\boldsymbol{\alpha}(t), \boldsymbol{E}(t)) \tag{4}
$$

where $\tilde{\boldsymbol{\beta}}: \operatorname{Sym}_d^2(\mathbb{R}) \times \operatorname{Sym}_d^2(\mathbb{R}) \rightarrow \operatorname{Sym}_d^2(\mathbb{R})$.

The system of Eqs. (2) to (4) constitutes a well-defined constitutive law in the sense that, starting from an initial state $\left(\boldsymbol{E}_0, \boldsymbol{\alpha}_0\right)$ specified at time $t_0$ - that is, $\left(\boldsymbol{E}_0, \boldsymbol{\alpha}_0\right)=\left(\boldsymbol{E}\left(t_0\right), \boldsymbol{\alpha}\left(t_0\right)\right)$ - this system can be advanced in time to obtain the instantaneous stress $\boldsymbol{S}(t)$ at any time $t \geq t_0$.
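To make the structure of Eqs. (2) to (4) concrete, the following minimal sketch instantiates them for a one-dimensional standard linear solid, in which the internal variable plays the role of the dashpot strain. This linear instance is chosen here for illustration only and is not the nonlinear model developed in this paper; the constants `E_INF`, `E_1`, and `ETA`, the function names, and the use of a forward Euler scheme are all assumptions.

```python
import numpy as np

# Hypothetical constants: equilibrium spring, Maxwell-branch spring, dashpot viscosity.
E_INF, E_1, ETA = 1.0, 2.0, 1.0

def stress_fn(E, alpha):
    # Counterpart of Eq. (2): stress as a function of strain and internal variable.
    return E_INF * E + E_1 * (E - alpha)

def internal_stress_fn(alpha, E):
    # Counterpart of Eq. (4): internal stress carried by the Maxwell branch.
    return E_1 * (E - alpha)

def rate_fn(beta, E):
    # Counterpart of Eq. (3): linear dashpot law for the internal variable rate.
    return beta / ETA

def simulate(strain_history, dt, alpha0=0.0):
    # Advance the state in time with forward Euler and record the stress.
    alpha, stresses = alpha0, []
    for E in strain_history:
        stresses.append(stress_fn(E, alpha))
        beta = internal_stress_fn(alpha, E)
        alpha = alpha + dt * rate_fn(beta, E)
    return np.array(stresses)

dt = 1e-3
strain = np.full(10_000, 0.01)   # step strain held constant for 10 time units
S = simulate(strain, dt)
```

Under the held strain, the computed stress starts at the instantaneous value `(E_INF + E_1) * 0.01` and relaxes toward the equilibrium value `E_INF * 0.01`, which is the relaxation effect that a purely strain-rate-dependent elastic model would not capture.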

Remark 1. In order to keep the notation as simple as possible, the dependencies on time of $\boldsymbol{E}, \alpha$, and $\beta$ are not highlighted in the remainder of this paper when they are irrelevant to the calculus being performed. The same convention is adopted, including for other variables, except when detrimental to clarity. Similarly and for the same purpose, all dependencies of all functions and functionals on their arguments are not highlighted within the narrative, except when detrimental to clarity.

## 3.2. Mechanics-informed constraints

To complete the description of the viscoelastic constitutive model based on an internal variable approach outlined above, it remains to determine the three functions $\widetilde{\boldsymbol{S}}$, $\tilde{\dot{\boldsymbol{\alpha}}}$, and $\tilde{\boldsymbol{\beta}}$ that govern it. These should be chosen such that the resulting constitutive law obeys well-established laws of mechanics.

To guarantee objectivity and dynamic stability when $\boldsymbol{\alpha}$ is held constant (see Section 2 and/or [22]), the strain measure is chosen again to be the Green-Lagrange strain $\boldsymbol{E}$; the existence of an energy density function $U(\boldsymbol{E}, \boldsymbol{\alpha}): \operatorname{Sym}_d^2(\mathbb{R}) \times \operatorname{Sym}_d^2(\mathbb{R}) \rightarrow \mathbb{R}$ is postulated; and the energy-conjugate second Piola-Kirchhoff stress $\boldsymbol{S}$ is assumed to derive from the energy density. It follows that the stress function $\widetilde{\boldsymbol{S}}$ in (2) is restricted to the form

$$
\widetilde{\boldsymbol{S}}(\boldsymbol{E}, \boldsymbol{\alpha})=\frac{\partial U}{\partial \boldsymbol{E}}(\boldsymbol{E}, \boldsymbol{\alpha}) \tag{5}
$$

Additional hard constraints particularly relevant to viscoelasticity are discussed next.

### 3.2.1. Hyperelasticity of the internal stress

As shown in [22] and recalled in Section 2, hyperelasticity guarantees the dynamic stability of elastic materials, as it results in a stress field that prevents the accumulation of infinite kinetic energy during a cyclic deformation process. For this reason and by analogy, the internal stress function $\tilde{\boldsymbol{\beta}}$ along a constant $\boldsymbol{E}$ is required here to derive from the energy density function $U$ - that is,

$$
\tilde{\boldsymbol{\beta}}(\boldsymbol{\alpha}, \boldsymbol{E})=-\frac{\partial U}{\partial \boldsymbol{\alpha}}(\boldsymbol{E}, \boldsymbol{\alpha}) \tag{6}
$$

In (6) above, the negative sign is introduced for convenience in formulating the additional hard constraints that follow.

### 3.2.2. Stability of the evolution of the internal variable

The solution of the ODE (3), which, together with (4), governs the evolution of the internal variable $\boldsymbol{\alpha}$, must be stable. That is, in the absence of any source term, the time trajectories of $\boldsymbol{\alpha}$ should not change significantly under small perturbations; otherwise, it would be possible to prescribe some strain history for which $\boldsymbol{\alpha}$ or the energy of an isolated viscoelastic material would grow unbounded. Moreover, since hysteresis in a viscoelastic material necessarily dies away after a sufficiently long time, any perturbation of $\boldsymbol{\alpha}$ must strictly decay in time to zero.

Consider now the linearization of the ODE (3) about a state defined at time $t_{\star}$ by $\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right)$ and let $\boldsymbol{\beta}_{\star}=\tilde{\boldsymbol{\beta}}\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right)$. If the obtained linearized ODE is stable at all points $\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right)$, the underlying nonlinear ODE is stable throughout its domain of definition. Specifically, let $\Delta \boldsymbol{\alpha}=\boldsymbol{\alpha}-\boldsymbol{\alpha}_{\star}$ and $\Delta \boldsymbol{E}=\boldsymbol{E}-\boldsymbol{E}_{\star}$. From Eqs. (3) to (5), it follows that the linearization of the time derivative of $\Delta \boldsymbol{\alpha}$ about $\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right)$ can be written as

$$
\begin{aligned}
\Delta \dot{\boldsymbol{\alpha}}(t)=\dot{\boldsymbol{\alpha}}(t) & =\tilde{\dot{\boldsymbol{\alpha}}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right)+\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{\alpha}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{\alpha}(t)+\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{E}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{E}(t) \\
& =\tilde{\dot{\boldsymbol{\alpha}}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right)+\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{\beta}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right): \frac{\partial \tilde{\boldsymbol{\beta}}}{\partial \boldsymbol{\alpha}}\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{\alpha}(t)+\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{E}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{E}(t) \\
& =\tilde{\dot{\boldsymbol{\alpha}}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right)-\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{\beta}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right): \frac{\partial^2 U}{\partial \boldsymbol{\alpha}^2}\left(\boldsymbol{E}_{\star}, \boldsymbol{\alpha}_{\star}\right): \Delta \boldsymbol{\alpha}(t)+\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{E}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{E}(t) \\
& =\boldsymbol{c}_{\star}-\boldsymbol{H}\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{\alpha}(t)+\boldsymbol{R}\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{E}(t)
\end{aligned}
$$

where

$$
\boldsymbol{c}_{\star}=\tilde{\dot{\boldsymbol{\alpha}}}\left(\boldsymbol{\beta}_{\star}, \boldsymbol{E}_{\star}\right), \quad \boldsymbol{H}(\boldsymbol{\alpha}, \boldsymbol{E})=\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{\beta}}(\tilde{\boldsymbol{\beta}}(\boldsymbol{\alpha}, \boldsymbol{E}), \boldsymbol{E}): \frac{\partial^2 U}{\partial \boldsymbol{\alpha}^2}(\boldsymbol{E}, \boldsymbol{\alpha}), \quad \boldsymbol{R}(\boldsymbol{\alpha}, \boldsymbol{E})=\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{E}}(\tilde{\boldsymbol{\beta}}(\boldsymbol{\alpha}, \boldsymbol{E}), \boldsymbol{E})
$$

and the operator : acting on two tensors denotes their double contraction. Hence, the linearized version of (3) about $\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right)$ is the linear ODE

$$
\Delta \dot{\boldsymbol{\alpha}}(t)+\boldsymbol{H}\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{\alpha}(t)-\boldsymbol{R}\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{E}(t)=\boldsymbol{c}_{\star} \tag{7}
$$

whose solution is

$$
\Delta \boldsymbol{\alpha}(t)=\underbrace{e^{-\left(t-t_{\star}\right) \boldsymbol{H}\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right)}: \boldsymbol{\alpha}_{\star}}_{\text {homogeneous solution }}+\underbrace{\boldsymbol{c}_{\star} t+\int_{t_{\star}}^t e^{-(t-\tau) \boldsymbol{H}\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right)}: \boldsymbol{R}\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right): \Delta \boldsymbol{E}(\tau) \mathrm{d} \tau}_{\text {particular solution }} \tag{8}
$$

It is possible that when the homogeneous solution highlighted in (8) is unstable and $\boldsymbol{c}_{\star}=\mathbf{0}$, the part of the particular solution driven by $\Delta \boldsymbol{E}$ can stabilize the overall solution locally - that is, in the neighborhood of the linearization point $\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right)$. However, in the general case where $\Delta \boldsymbol{E}$ is associated with an arbitrary strain history, the solution (8) of the linear ODE (7) must be locally stable for any $\Delta \boldsymbol{E}$, including $\Delta \boldsymbol{E}=\mathbf{0}$. It follows that the homogeneous solution must be stable and therefore should not exhibit any exponential growth, which implies that the tensor $\boldsymbol{H}$ must have positive eigenvalues at all points $\left(\boldsymbol{\alpha}_{\star}, \boldsymbol{E}_{\star}\right)$. Noting that the product of two symmetric positive definite (SPD) tensors has positive eigenvalues, it suffices then to require that $U(\boldsymbol{E}, \boldsymbol{\alpha})$ be convex in $\boldsymbol{\alpha}$ and that $\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{\beta}}$ be SPD to ensure that $\boldsymbol{H}$ has positive eigenvalues. Therefore, it is required here that: $U(\boldsymbol{E}, \boldsymbol{\alpha})$ be convex in $\boldsymbol{\alpha}$; and there exists some functional $G: \operatorname{Sym}_d^2(\mathbb{R}) \times \operatorname{Sym}_d^2(\mathbb{R}) \rightarrow \mathbb{R}$ that is convex in its first argument and such that

$$
\tilde{\dot{\boldsymbol{\alpha}}}(\boldsymbol{\beta}, \boldsymbol{E})=\frac{\partial G}{\partial \boldsymbol{\beta}}(\boldsymbol{\beta}, \boldsymbol{E}) \tag{9}
$$
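The algebraic fact invoked above, namely that the product of two SPD tensors has positive eigenvalues even though it is generally not symmetric, can be checked numerically. The sketch below is an illustration only: the matrices `A` and `B` are random SPD stand-ins (in Voigt-like matrix form) for $\frac{\partial \tilde{\dot{\boldsymbol{\alpha}}}}{\partial \boldsymbol{\beta}}$ and $\frac{\partial^2 U}{\partial \boldsymbol{\alpha}^2}$, respectively, and are not taken from any model. The check relies on the similarity $\boldsymbol{L}^{-1}(\boldsymbol{A} \boldsymbol{B}) \boldsymbol{L}=\boldsymbol{L}^T \boldsymbol{B} \boldsymbol{L}$, where $\boldsymbol{A}=\boldsymbol{L} \boldsymbol{L}^T$ is a Cholesky factorization.

```python
import numpy as np

def random_spd(n, rng):
    # Random symmetric positive definite matrix (illustrative stand-in).
    M = rng.normal(size=(n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(0)
n = 6                      # number of unique components of a symmetric 3x3 tensor
A = random_spd(n, rng)     # stands in for the SPD Hessian of G with respect to beta
B = random_spd(n, rng)     # stands in for the SPD Hessian of U with respect to alpha
H = A @ B                  # generally non-symmetric

# Eigenvalues of the non-symmetric product.
eig_H = np.sort(np.linalg.eigvals(H).real)

# H is similar to the SPD matrix L^T B L, where A = L L^T.
L = np.linalg.cholesky(A)
eig_sym = np.linalg.eigvalsh(L.T @ B @ L)   # returned in ascending order
```

Both spectra coincide and are strictly positive, so that the homogeneous part of the solution (8) decays for such an $\boldsymbol{H}$.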

### 3.2.3. Clausius-Duhem form of the second law of thermodynamics

The Clausius-Duhem inequality - which is one way of expressing the second law of thermodynamics - states that the dissipation in a material, denoted here by $\Xi$, must be non-negative. Ignoring the effects of heat flow and thus assuming an adiabatic dissipation, this inequality can be written as

$$
\Xi(\boldsymbol{E}, \boldsymbol{\alpha})=\widetilde{\boldsymbol{S}}(\boldsymbol{E}, \boldsymbol{\alpha}): \dot{\boldsymbol{E}}-\dot{U}(\boldsymbol{E}, \boldsymbol{\alpha}) \geq 0 \tag{10}
$$

Expanding in (10) the energy-rate term and recalling (5), (6), and (9) transforms the above inequality into

$$
\widetilde{\boldsymbol{S}}(\boldsymbol{E}, \boldsymbol{\alpha}): \dot{\boldsymbol{E}}-\frac{\partial U}{\partial \boldsymbol{E}}(\boldsymbol{E}, \boldsymbol{\alpha}): \dot{\boldsymbol{E}}-\frac{\partial U}{\partial \boldsymbol{\alpha}}(\boldsymbol{E}, \boldsymbol{\alpha}): \dot{\boldsymbol{\alpha}}=-\frac{\partial U}{\partial \boldsymbol{\alpha}}(\boldsymbol{E}, \boldsymbol{\alpha}): \dot{\boldsymbol{\alpha}}=\tilde{\boldsymbol{\beta}}(\boldsymbol{\alpha}, \boldsymbol{E}): \frac{\partial G}{\partial \boldsymbol{\beta}}(\tilde{\boldsymbol{\beta}}(\boldsymbol{\alpha}, \boldsymbol{E}), \boldsymbol{E}) \geq 0 \tag{11}
$$

Hence, the second law of thermodynamics poses a direct restriction on the properties of the potential functional $G$.

To find a class of functions that satisfy the hard inequality constraint (11), it is noted that for all convex functions $f$ of a tensor $\boldsymbol{x}$ and all $(\boldsymbol{a}, \boldsymbol{b}) \in \operatorname{Dom}(f) \times \operatorname{Dom}(f)$

$$
f(\boldsymbol{a}) \geq f(\boldsymbol{b})+\frac{\partial f}{\partial \boldsymbol{x}}(\boldsymbol{b}):(\boldsymbol{a}-\boldsymbol{b})
$$

In particular, for $\boldsymbol{b}=\mathbf{0}$,

$$
\frac{\partial f}{\partial \boldsymbol{x}}(\boldsymbol{a}): \boldsymbol{a} \geq f(\boldsymbol{a})-f(\mathbf{0}) \tag{12}
$$

Since $G$ was restricted in Section 3.2.2 to be convex in its argument $\boldsymbol{\beta}$, it follows from (12) (with $f=G$, $\boldsymbol{x}=\boldsymbol{\beta}$, and $\boldsymbol{a}=\boldsymbol{\beta}$) that restricting $G$ further to satisfy for all values of $\boldsymbol{E}$

$$
\min _{\boldsymbol{\beta}} G(\boldsymbol{\beta}, \boldsymbol{E}) \geq G(\mathbf{0}, \boldsymbol{E}) \tag{13}
$$

ensures that the hard inequality constraint (11) will be satisfied for all values of $\boldsymbol{E}$, $\boldsymbol{\alpha}$, and $\boldsymbol{\beta}$. Consequently, $G$ is furthermore required here to satisfy (13).

### 3.2.4. Consistency

As recalled in Sections 1 and 2, consistency refers in this paper to the preservation of rigid body modes. Hence, consistency is equivalent to stating that a body that undergoes a rigid motion should not experience any deformation, which is equivalent to stating that when both the strain $\boldsymbol{E}$ and internal variable $\boldsymbol{\alpha}$ are zero, the stress $\boldsymbol{S}$ should also be zero. In other words,

$$
\widetilde{\boldsymbol{S}}(\mathbf{0}, \mathbf{0})=\frac{\partial U}{\partial \boldsymbol{E}}(\mathbf{0}, \mathbf{0})=\mathbf{0} \tag{14}
$$

In this case, the internal stress (not to be confused with an initial or residual stress) should also be zero - that is,

$$
\tilde{\boldsymbol{\beta}}(\mathbf{0}, \mathbf{0})=-\frac{\partial U}{\partial \boldsymbol{\alpha}}(\mathbf{0}, \mathbf{0})=\mathbf{0} \tag{15}
$$

Finally, in the absence of any internal stress, the internal variable should not evolve; therefore,

$$
\tilde{\dot{\boldsymbol{\alpha}}}(\mathbf{0}, \boldsymbol{E})=\frac{\partial G}{\partial \boldsymbol{\beta}}(\mathbf{0}, \boldsymbol{E})=\mathbf{0} \tag{16}
$$

From (3) and (4), it follows that the above conditions guarantee that if $\boldsymbol{E}$ is identically zero and the state of a viscoelastic material starts from a zero initial condition ($\boldsymbol{\alpha}_0=\mathbf{0}$), then: $\boldsymbol{\alpha}(t)=\mathbf{0}$ for all $t \geq t_0$; and the material does not experience any stress or internal stress. Therefore, Eqs. (14) to (16) are identified here as additional hard constraints.

Note that from Eq. (16), it follows that $\boldsymbol{\beta}=\mathbf{0}$ is an extremum point of the potential functional $G$. Hence, if $G$ is convex in $\boldsymbol{\beta}$ as required in Section 3.2.2, this extremum point is a minimum point with a value of 0. It follows that when $G$ is convex in $\boldsymbol{\beta}$, the hard constraint (16) implies the hard constraint (13).
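The chain of implications above (convexity of $G$ in $\boldsymbol{\beta}$ together with (16) implies (13), which in turn implies the non-negativity of the dissipation in (11)) can be illustrated numerically. In the minimal sketch below, the function whose gradient is `grad_D` is a hypothetical convex function (a log-sum-exp of two affine maps), the vectors `U_VEC` and `V_VEC` are arbitrary, and the dependence of $G$ on $\boldsymbol{E}$ is omitted for brevity. A linear correction, analogous to that used for consistency in Section 2, shifts the gradient so that it vanishes at $\boldsymbol{\beta}=\mathbf{0}$.

```python
import numpy as np

# Hypothetical parameters of the convex function
# D(beta) = log(exp(U_VEC . beta + 0.3) + exp(V_VEC . beta)).
U_VEC = np.array([1.0, -2.0, 0.5])
V_VEC = np.array([-0.5, 1.0, 2.0])

def grad_D(beta):
    # Analytic gradient of D: a convex combination of U_VEC and V_VEC.
    a, b = U_VEC @ beta + 0.3, V_VEC @ beta
    w = 1.0 / (1.0 + np.exp(b - a))   # softmax weight of the first affine piece
    return w * U_VEC + (1.0 - w) * V_VEC

def grad_G(beta):
    # G(beta) = D(beta) - dD/dbeta(0) . beta, hence dG/dbeta(0) = 0 as in Eq. (16).
    return grad_D(beta) - grad_D(np.zeros(3))

rng = np.random.default_rng(0)
betas = rng.normal(scale=3.0, size=(1000, 3))

# Dissipation beta : dG/dbeta evaluated at random internal stresses, as in Eq. (11).
dissipation = np.array([b @ grad_G(b) for b in betas])
```

For every sampled internal stress, the computed dissipation is non-negative, whereas the uncorrected gradient `grad_D` alone would not guarantee this since it does not vanish at the origin.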

### 3.2.5. Recovery of elasticity

In [32,33], it was suggested that a viscoelastic material should experience hysteresis only at intermediate strain rates; in the limit of infinitely slow or infinitely fast deformations, it should recover an elastic behavior and thus its constitutive law should be expressible purely in terms of the instantaneous strain. In the limit of infinitely slow deformations, where the time-scale of the prescribed strain is much longer than the natural time-scale of viscoelasticity, the response to a quasi-static loading should be recovered (the decay of the transients ensured by the stability condition discussed in Section 3.2.2 promotes the existence of such a response). Hence, in the limit of infinitely slow deformations, $\dot{\boldsymbol{\alpha}} \rightarrow \mathbf{0}$; which, from (9), implies $\frac{\partial G}{\partial \boldsymbol{\beta}} \rightarrow \mathbf{0}$.

In Section 3.2.2, $G$ was restricted to be convex in $\boldsymbol{\beta}$ and in Section 3.2.4, it was restricted to satisfy, for all values of $\boldsymbol{E}$, $\frac{\partial G}{\partial \boldsymbol{\beta}}(\mathbf{0}, \boldsymbol{E})=\mathbf{0}$. With these restrictions, $\boldsymbol{\beta}=\mathbf{0}$ is the only stationary point of $G$ in its first argument and therefore,

$$
\frac{\partial G}{\partial \boldsymbol{\beta}}(\boldsymbol{\beta}, \boldsymbol{E})=\mathbf{0}, \ \forall \boldsymbol{E} \Leftrightarrow \boldsymbol{\beta}=\mathbf{0}
$$

Hence, the quasi-static limit ($\dot{\boldsymbol{\alpha}} \rightarrow \mathbf{0}$) is reached only for $\boldsymbol{\beta} \rightarrow \mathbf{0}$. In order to ensure the recovery of the elastic stress in this limit, the viscous part of the stress should vanish when $\boldsymbol{\beta}$ vanishes. To this end, consider the additive decomposition of the strain energy density function into elastic (equilibrium) and viscoelastic (non-equilibrium) components

$$
U(\boldsymbol{E}, \boldsymbol{\alpha})=W(\boldsymbol{E})+V(\boldsymbol{E}, \boldsymbol{\alpha}) \tag{17}
$$

where $W: \operatorname{Sym}_d^2(\mathbb{R}) \rightarrow \mathbb{R}$ denotes the elastic component and $V: \operatorname{Sym}_d^2(\mathbb{R}) \times \operatorname{Sym}_d^2(\mathbb{R}) \rightarrow \mathbb{R}$ denotes the viscoelastic component. The same constraints summarized in Section 2 are imposed on $W$ (essentially, the convexity of $W$ in $\boldsymbol{E}$ and $\frac{\partial W}{\partial \boldsymbol{E}}(\mathbf{0})=\mathbf{0}$), so that the material behavior is physical in the quasi-static limit. On the other hand, the viscoelastic energy $V$ is required to be convex in $\boldsymbol{\alpha}$, as established in Section 3.2.2.

To satisfy the quasi-static limit, $V(\boldsymbol{E}, \boldsymbol{\alpha})$ is further restricted here to the set of functions satisfying (see (17))

$$
\tilde{\boldsymbol{\beta}}(\boldsymbol{\alpha}, \boldsymbol{E})=-\frac{\partial U}{\partial \boldsymbol{\alpha}}(\boldsymbol{E}, \boldsymbol{\alpha})=-\frac{\partial V}{\partial \boldsymbol{\alpha}}(\boldsymbol{E}, \boldsymbol{\alpha})=\mathbf{0} \Rightarrow \frac{\partial V}{\partial \boldsymbol{E}}(\boldsymbol{E}, \boldsymbol{\alpha})=\mathbf{0} \tag{18}
$$

Accordingly, the class of functions depending only on an affine combination of $\boldsymbol{E}$ and $\boldsymbol{\alpha}$, rather than on $\boldsymbol{E}$ and $\boldsymbol{\alpha}$ individually, is proposed

$$
V(\boldsymbol{E}, \boldsymbol{\alpha})=\widetilde{V}(\underbrace{\mathbb{C}_1: \boldsymbol{E}+\mathbb{C}_2: \boldsymbol{\alpha}+\boldsymbol{B}}_{\boldsymbol{\delta}})=\widetilde{V}(\boldsymbol{\delta}) \tag{19}
$$

where $\mathbb{C}_1, \mathbb{C}_2 \in \operatorname{Sym}_d^4(\mathbb{R})$ are symmetric fourth-order tensors, $\boldsymbol{B} \in \operatorname{Sym}_d^2(\mathbb{R})$ is a constant, and the variable $\boldsymbol{\delta} \in \operatorname{Sym}_d^2(\mathbb{R})$ is introduced for the sake of convenience. The partial derivatives of the potential $V$ with respect to its arguments are

$$
\frac{\partial V}{\partial \boldsymbol{E}}=\frac{\partial \boldsymbol{\delta}}{\partial \boldsymbol{E}}: \frac{\partial \widetilde{V}}{\partial \boldsymbol{\delta}}=\mathbb{C}_1: \frac{\partial \widetilde{V}}{\partial \boldsymbol{\delta}}
$$

and

$$
\frac{\partial V}{\partial \boldsymbol{\alpha}}=\frac{\partial \boldsymbol{\delta}}{\partial \boldsymbol{\alpha}}: \frac{\partial \widetilde{V}}{\partial \boldsymbol{\delta}}=\mathbb{C}_2: \frac{\partial \widetilde{V}}{\partial \boldsymbol{\delta}}
$$

From the above, it follows that as long as $\mathbb{C}_2$ is non-singular, $\frac{\partial V}{\partial \boldsymbol{\alpha}}=\mathbf{0}$ implies $\frac{\partial \widetilde{V}}{\partial \boldsymbol{\delta}}=\mathbf{0}$, which in turn implies $\frac{\partial V}{\partial \boldsymbol{E}}=\mathbf{0}$. Moreover, if $\widetilde{V}$ is convex in $\boldsymbol{\delta}$, the convexity of $V$ in $\boldsymbol{\alpha}$ is maintained because composition with an affine function preserves convexity. Additionally, if $\frac{\partial \widetilde{V}}{\partial \boldsymbol{\delta}}(\mathbf{0})=\mathbf{0}$ and $\boldsymbol{B}=\mathbf{0}$ is chosen, then Eq. (15) is satisfied.
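The implication (18) for the affine form (19) can be verified with a short numerical sketch. The following assumptions are made for illustration only: $\boldsymbol{B}=\mathbf{0}$; the fourth-order tensors are replaced by matrices `C1` and `C2` acting on vectorized quantities, so that the double contractions become matrix-vector products with transposes; `C2` is taken SPD to guarantee non-singularity; and the gradient `grad_V_tilde` corresponds to a hypothetical convex potential whose gradient vanishes at the origin.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

def random_spd(n):
    # Random symmetric positive definite matrix (illustrative stand-in).
    M = rng.normal(size=(n, n))
    return M @ M.T + np.eye(n)

C1 = rng.normal(size=(d, d))   # matrix counterpart of the first fourth-order tensor
C2 = random_spd(d)             # SPD, hence non-singular
Q = random_spd(d)

def grad_V_tilde(delta):
    # Gradient of the convex potential 0.5 * delta.Q.delta + 0.25 * sum(delta**4);
    # it vanishes at delta = 0.
    return Q @ delta + delta ** 3

def dV_dE(E, alpha):
    # Chain rule through delta = C1 E + C2 alpha.
    return C1.T @ grad_V_tilde(C1 @ E + C2 @ alpha)

def dV_dalpha(E, alpha):
    return C2.T @ grad_V_tilde(C1 @ E + C2 @ alpha)

E = rng.normal(size=d)
alpha = rng.normal(size=d)

# Because C2 is non-singular, dV~/d(delta) is recoverable from dV/d(alpha) ...
g = np.linalg.solve(C2.T, dV_dalpha(E, alpha))
# ... and dV/dE = C1^T g is therefore fully determined by dV/d(alpha).

# Internal variable for which delta = 0, hence a vanishing internal stress.
alpha_eq = np.linalg.solve(C2, -C1 @ E)
```

At `alpha_eq`, both `dV_dalpha` and `dV_dE` vanish, so that only the elastic part of the stress remains, as required in the quasi-static limit.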

In the limit of infinitely fast deformations, the time scale of the prescribed strain is much shorter than the characteristic time-scale of a viscoelastic material and therefore the internal variable does not have time to evolve. Thus, in this limit, $\boldsymbol{\alpha}=\mathbf{0}$. This implies that the strain energy density (17), and thus the stress, is a function of the strain only - that is,

$$
U(E, 0)=W(E)+V(E, 0)=\widetilde{W}(E)
$$

which corresponds to a hyperelastic behavior. Hence, $\widetilde{W}$ as a whole, rather than just $W$, must be subject to the same hard constraints of a nonlinear elastic material overviewed in Section 2. This is ensured by the form of $V$ chosen in (19), as convexity in $\boldsymbol{\delta}$ implies convexity in $\boldsymbol{E}$, $V$ vanishes when $\boldsymbol{\alpha}=\boldsymbol{E}=\mathbf{0}$, and hence the elastic limit is recovered in this case.

Remark 2. While Eq. (19) implies Eq. (18), the converse is not true. Indeed, Eq. (19) represents only a particular family of functions that satisfy Eq. (18). Nonetheless, this form is chosen here for two reasons. First, finding the most general family of functions that satisfy Eq. (18) is a nontrivial task. Second, the choice of functional form described in Eq. (19) eases the enforcement of the convexity of $V$ by simply enforcing the convexity of $\widetilde{V}$.

## 3.3. Summary

To summarize, Sections 3.1 and 3.2 described a three-potential framework grounded in the laws of mechanics that is proposed in this paper for modeling viscoelasticity. The general constitutive law generated by this framework is governed by three potentials $W$, $V$, and $G$; associated restrictions on their functional properties; and the following set of equations

$$
\boldsymbol{S}(t)=\frac{\partial W}{\partial \boldsymbol{E}}(\boldsymbol{E}(t))+\frac{\partial V}{\partial \boldsymbol{E}}(\boldsymbol{E}(t), \boldsymbol{\alpha}(t)) \tag{20}
$$

$$
\dot{\boldsymbol{\alpha}}(t)=\frac{\partial G}{\partial \boldsymbol{\beta}}(\boldsymbol{\beta}(t), \boldsymbol{E}(t)) \tag{21}
$$

$$
\boldsymbol{\beta}(t)=-\frac{\partial V}{\partial \boldsymbol{\alpha}}(\boldsymbol{E}(t), \boldsymbol{\alpha}(t)) \tag{22}
$$

Eqs. (20) to (22) form a closed system that, given at $t_0$ an initial condition $\left(\boldsymbol{E}_0, \boldsymbol{\alpha}_0\right)$ and given a prescribed strain history $\boldsymbol{E}(t)$, can be integrated forward in time to determine the stress at any time $t \geq t_0$. The three potentials $W$, $V$, and $G$ are constrained as follows:
- $W$ is convex and $\frac{\partial W}{\partial \boldsymbol{E}}(\mathbf{0})=\mathbf{0}$.
- $V=\widetilde{V}\left(\mathbb{C}_1: \boldsymbol{E}+\mathbb{C}_2: \boldsymbol{\alpha}\right)$, where $\widetilde{V}$ is convex in $\boldsymbol{\delta}=\mathbb{C}_1: \boldsymbol{E}+\mathbb{C}_2: \boldsymbol{\alpha}$, $\mathbb{C}_2$ is non-singular, and $\frac{\partial \widetilde{V}}{\partial \boldsymbol{\delta}}(\mathbf{0})=\mathbf{0}$.
- $G$ is convex in $\boldsymbol{\beta}$, and $\frac{\partial G}{\partial \boldsymbol{\beta}}(\mathbf{0}, \boldsymbol{E})=\mathbf{0}$.

The above restrictions are not inherent to the proposed framework. Instead, some are necessary conditions but others are only sufficient conditions for achieving stability, satisfying the second law of thermodynamics, capturing the correct material behavior in the limits of infinitely slow and infinitely fast deformations, and preserving rigid body modes.

To completely determine the constitutive law of viscoelasticity resulting from the framework outlined above, one could propose further expressions for $W$, $V$, and $G$; and identify experimentally any associated constants, including those defining the fourth-order tensors $\mathbb{C}_1$ and $\mathbb{C}_2$. Instead, it is proposed here to learn all these quantities in a data-driven approach, using ANNs that enforce hard constraints.

Remark 3. At this point, it is noted that the two-potential approaches for constructing ANNs that model viscoelasticity recently proposed in $[26,27]$ enforce the second law of thermodynamics, weakly and strongly, respectively. On the other hand, the alternative approach proposed in this work, which requires three rather than two potentials and constrains them to satisfy strongly the set of mathematical properties outlined above, enforces on an ANN the longer list of constraints discussed in Section 3.2 to make it truly mechanics-informed. Hence, this approach constitutes a significant advancement of the state of the art of mechanics-informed deep learning for data-driven nonlinear viscoelasticity.
4. Learning nonlinear viscoelasticity using deep artificial neural networks
4.1. Data-driven modeling a nonlinear viscoelastic material using deep artificial neural networks

Three ANNs are introduced here: $\mathcal{N}_\theta$ parameterized by $\theta$ to leam the potential $W ; \mathcal{M}_{\Phi}$ parameterized by $\Phi$ to learn the potential $V$; and $D_r$ parameterized by $\Gamma$ to learn the potential $G$. Each of these is forced to be convex with respect to its appropriate input(s)


- as per the requirements identified in Section 3.3 - by altering the architecture of the ANN as in [29]. The alteration involves, among others, adopting SoftplusSquared activation functions as these have desirable properties that are discussed in [22].

Specifically, each of the aforementioned potentials is represented by its ANN as follows:
- Flastic component $W$ of the strain energy density function $U$

$$
W_{(\theta)}(\hat{E})=\mathcal{N}_\theta(\hat{E})-\frac{\partial \mathcal{N}_\theta}{\partial \hat{E}}(0) \cdot \hat{E}
$$

where $\mathcal{N}_{\boldsymbol{\theta}}$ is convex.
- Viscoelastic component $V$ of $U$

$$
V_{(\Phi)}(\hat{E}, \hat{\alpha})=\mathcal{M}_\phi\left(C_1 \hat{E}+C_2 \hat{\alpha}\right)-\frac{\partial \mathcal{M}_\phi}{\partial \hat{\delta}}(0) \cdot\left(C_1 \hat{E}+C_2 \hat{\alpha}\right)
$$

where $\mathcal{M}_{\Phi}$ is convex in $\hat{\delta}=C_1 \hat{E}+C_2 \hat{\alpha}, C_1$ and $C_2$ are the counterparts of $C_1$ and $C_2$, respectively, that are compatible with the algebra for quantities written using the Voigt notation - specifically, $C_1, C_2 \in R^{d^{\prime} x d^{\prime}}$ and $C_2$ is non-singular. Since $C_1$ and $C_2$ parameterize the potential $V_{(\Phi)}$, they are included in the set of parameters $\Phi$ as this simplifies the notation.
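- Dissipation potential $G$ associated with the evolution of the strain-like internal variable $\alpha$

$$
G_{(\Gamma)}(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{E}})=\mathcal{D}_{\Gamma}(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{E}})-\frac{\partial \mathcal{D}_{\Gamma}}{\partial \hat{\boldsymbol{\beta}}}(\mathbf{0}, \hat{\boldsymbol{E}}) \cdot \hat{\boldsymbol{\beta}} \tag{25}
$$

where $\mathcal{D}_{\Gamma}$ is convex in $\hat{\boldsymbol{\beta}}$.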
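For illustration, the effect of subtracting the linear term in each of the above representations can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the one-hidden-layer network `net`, its weights, and the form `softplus(z)**2` taken here for the SoftplusSquared activation are all hypothetical stand-ins, and the gradient is coded by hand rather than obtained by automatic differentiation. The sketch only shows that the corrected potential remains convex and has a vanishing gradient at the origin.

```python
import math

def softplus(z):
    # Numerically stable evaluation of log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # Derivative of softplus, written with tanh to avoid overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * z))

# Hypothetical network with three inputs (d' = 3); all values are illustrative.
HIDDEN_W = [[1.0, -0.5, 0.3], [-0.7, 0.9, 0.2], [0.4, 0.4, -1.1]]
HIDDEN_B = [0.1, -0.2, 0.3]
OUTPUT_W = [0.5, 1.0, 0.25]  # kept nonnegative so that the sum stays convex

def net(E):
    """Convex stand-in for N(E): sum_i a_i * softplus(w_i . E + b_i)**2."""
    total = 0.0
    for w, b, a in zip(HIDDEN_W, HIDDEN_B, OUTPUT_W):
        z = sum(wk * ek for wk, ek in zip(w, E)) + b
        total += a * softplus(z) ** 2
    return total

def net_grad(E):
    """Analytic gradient of net with respect to its input E."""
    grad = [0.0] * len(E)
    for w, b, a in zip(HIDDEN_W, HIDDEN_B, OUTPUT_W):
        z = sum(wk * ek for wk, ek in zip(w, E)) + b
        scale = 2.0 * a * softplus(z) * sigmoid(z)
        for k, wk in enumerate(w):
            grad[k] += scale * wk
    return grad

GRAD_AT_ZERO = net_grad([0.0, 0.0, 0.0])

def potential(E):
    """Corrected potential: net(E) minus the linear term grad(net)(0) . E."""
    return net(E) - sum(g * e for g, e in zip(GRAD_AT_ZERO, E))

def potential_grad(E):
    """Gradient of the corrected potential; it vanishes at E = 0."""
    return [g - g0 for g, g0 in zip(net_grad(E), GRAD_AT_ZERO)]
```

Because only a linear function of the input is subtracted, convexity is unaffected, and the zero gradient at the origin makes the origin a minimizer of the corrected potential.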
As such, the representations (23) to (25) ensure that the generated material law satisfies all restrictions derived in Section 3. From (2), (20), and these representations, it follows that the stress predicted by the above ANN-based models is parameterized by $(\Theta, \Phi, \Gamma)$ and can be written as

$$
\hat{\boldsymbol{S}}_{(\Theta, \Phi, \Gamma)}(t)=\hat{\tilde{\boldsymbol{S}}}_{(\Theta, \Phi, \Gamma)}\left(\hat{\boldsymbol{E}}(t), \hat{\boldsymbol{\alpha}}_{(\Phi, \Gamma)}(t)\right)=\frac{\partial W_{(\Theta)}}{\partial \hat{\boldsymbol{E}}}\left(\hat{\boldsymbol{E}}(t)\right)+\frac{\partial V_{(\Phi)}}{\partial \hat{\boldsymbol{E}}}\left(\hat{\boldsymbol{E}}(t), \hat{\boldsymbol{\alpha}}_{(\Phi, \Gamma)}(t)\right) \tag{26}
$$

where $\hat{\boldsymbol{\alpha}}$ evolves as

$$
\dot{\hat{\boldsymbol{\alpha}}}_{(\Phi, \Gamma)}(t)=\frac{\partial G_{(\Gamma)}}{\partial \hat{\boldsymbol{\beta}}}\left(\hat{\boldsymbol{\beta}}_{(\Phi, \Gamma)}(t), \hat{\boldsymbol{E}}(t)\right) \tag{27}
$$

and the internal stress tensor is determined as

$$
\hat{\boldsymbol{\beta}}_{(\Phi, \Gamma)}(t)=-\frac{\partial V_{(\Phi)}}{\partial \hat{\boldsymbol{\alpha}}}\left(\hat{\boldsymbol{E}}(t), \hat{\boldsymbol{\alpha}}_{(\Phi, \Gamma)}(t)\right) \tag{28}
$$

In practice, given a time-dependent strain history, one predicts the corresponding time-dependent stress by discretizing Eq. (27) using any preferred time-integration scheme and advancing the prediction forward in time recursively. In the context of a FE simulation, each time-step requires evaluating each of Eqs. (26) to (28) once and storing the internal variable for use in the subsequent time-step.
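The recursive stress prediction described above can be sketched as follows for a scalar strain. This is a sketch under stated assumptions rather than the implementation used in this work: the trained ANN potentials are replaced by hypothetical quadratic stand-ins ($W = \tfrac{1}{2} k_e E^2$, $V = \tfrac{1}{2} k_v (E - \alpha)^2$, that is, $C_1 = 1$ and $C_2 = -1$, and $G = \beta^2 / (2\eta)$), the constants `K_E`, `K_V`, and `ETA` are illustrative, and forward Euler is chosen only as one possible time-integration scheme.

```python
# Hypothetical material constants for the quadratic stand-in potentials.
K_E, K_V, ETA = 1.0, 2.0, 1.0

def dW_dE(E):
    # W = 0.5 * K_E * E**2
    return K_E * E

def dV_dE(E, alpha):
    # V = 0.5 * K_V * (E - alpha)**2, i.e. convex in delta = E - alpha
    return K_V * (E - alpha)

def dV_dalpha(E, alpha):
    return -K_V * (E - alpha)

def dG_dbeta(beta, E):
    # G = 0.5 * beta**2 / ETA, convex in beta (independent of E here)
    return beta / ETA

def predict_stress(times, strains, alpha0=0.0):
    """March Eqs. (26) to (28) forward in time with forward Euler."""
    alpha = alpha0
    stresses = [dW_dE(strains[0]) + dV_dE(strains[0], alpha)]
    for m in range(1, len(times)):
        dt = times[m] - times[m - 1]
        beta = -dV_dalpha(strains[m - 1], alpha)             # Eq. (28)
        alpha = alpha + dt * dG_dbeta(beta, strains[m - 1])  # Eq. (27)
        stresses.append(dW_dE(strains[m]) + dV_dE(strains[m], alpha))  # Eq. (26)
    return stresses, alpha
```

For a strain held constant at $E = 0.01$, these stand-in potentials give an initial stress of $(k_e + k_v) E$ that relaxes toward the elastic value $k_e E$ as the internal variable approaches the strain. Only the latest value of the internal variable needs to be stored between steps.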
4.2. Training the model parameters

Given $N_p$ sets of conjugate stress-strain time histories $\left\{\left\{\left(\hat{E}^p\left(t_m\right), \hat{S}^p\left(t_m\right)\right)\right\}_{m=1}^{N_m}\right\}_{p=1}^{N_p}$ along different strain paths, the ANN representations (23) to (25) can be trained as usual - that is, by computing values of the model parameters that minimize the least-squares error between the stress data and the predicted model stress (26), which can be written as

In (29) above, $\hat{\boldsymbol{\alpha}}_{(\Phi, \Gamma)}^p\left(t_m\right)$ can be determined at each training epoch by time-integrating Eq. (27) - for example, using the computationally economical forward Euler scheme, in which case at each time step $t_m$,

$$
\hat{\boldsymbol{\alpha}}_{(\Phi, \Gamma)}^p\left(t_m\right)=\hat{\boldsymbol{\alpha}}_{(\Phi, \Gamma)}^p\left(t_{m-1}\right)+\Delta t_m \frac{\partial G_{(\Gamma)}}{\partial \hat{\boldsymbol{\beta}}}\left(\hat{\boldsymbol{\beta}}_{(\Phi, \Gamma)}^p\left(t_{m-1}\right), \hat{\boldsymbol{E}}^p\left(t_{m-1}\right)\right) \tag{30}
$$

and from (28),

$$
\hat{\boldsymbol{\beta}}_{(\Phi, \Gamma)}^p\left(t_{m-1}\right)=-\frac{\partial V_{(\Phi)}}{\partial \hat{\boldsymbol{\alpha}}}\left(\hat{\boldsymbol{E}}^p\left(t_{m-1}\right), \hat{\boldsymbol{\alpha}}_{(\Phi, \Gamma)}^p\left(t_{m-1}\right)\right) \tag{31}
$$


In the standard training procedure outlined above, each discrete value of the internal variable $\hat{\boldsymbol{\alpha}}_{(\Phi, \Gamma)}^p\left(t_m\right)$ depends on all its counterparts at earlier times. Consequently, the loss function in (29) cannot be decomposed as the sum of independent terms and hence its evaluation and that of its gradients cannot be parallelized. Moreover, the training data cannot be batched and therefore prominent algorithms such as the Stochastic Gradient Descent (SGD) method cannot be used to determine the optimal values of the model parameters. Furthermore, evaluating the gradients of the loss function in (29) at time $t_m$ requires in this case propagating the gradients of the ANNs all the way back to time $t_0$. This makes the procedure computationally intensive and vulnerable to the vanishing gradient problem that recurrent neural networks (RNNs) suffer from [34]. For all these reasons, the standard training procedure defined by (29) to (31) is computationally intractable.

Therefore, it is proposed here instead to: enforce the evolution of the internal variable $\hat{\boldsymbol{\alpha}}^p$ corresponding to the training strain path $p$ in differential form within an augmented loss function; represent it for this purpose by a surrogate model $\hat{\tilde{\boldsymbol{\alpha}}}_{(\Psi)}^p(t)$ constructed using an auxiliary ANN $\mathcal{A}_{\Psi}^p$ parameterized by $\Psi$; and enforce $\hat{\tilde{\boldsymbol{\alpha}}}_{(\Psi)}^p\left(t_0\right)=\hat{\boldsymbol{\alpha}}_0$ by performing the surrogate representation specifically as follows

$$
\hat{\tilde{\boldsymbol{\alpha}}}_{(\Psi)}^p(t)=\mathcal{A}_{\Psi}^p(t)-\mathcal{A}_{\Psi}^p\left(t_0\right)+\hat{\boldsymbol{\alpha}}_0
$$


This leads to the alternative training procedure

$$
\begin{aligned}
& \left.+\lambda_k\left(\frac{\mathrm{d} \hat{\tilde{\alpha}}_{(\Psi) k}^p}{\mathrm{d} t}\left(t_m\right)-\frac{\partial G_{(\Gamma)}}{\partial \hat{\beta}_k}\left(\hat{\boldsymbol{\beta}}_{(\Phi, \Psi)}^p\left(t_m\right), \hat{\boldsymbol{E}}^p\left(t_m\right)\right)\right)^2\right]
\end{aligned}
$$

where $\lambda_k \in \mathbb{R}^{+}$ are penalty parameters chosen as described in the Appendix and, in view of (28),

$$
\hat{\boldsymbol{\beta}}_{(\Phi, \Psi)}^p\left(t_m\right)=-\frac{\partial V_{(\Phi)}}{\partial \hat{\boldsymbol{\alpha}}}\left(\hat{\boldsymbol{E}}^p\left(t_m\right), \hat{\tilde{\boldsymbol{\alpha}}}_{(\Psi)}^p\left(t_m\right)\right)
$$


Unlike the standard training procedure, the alternative one outlined above and equipped with the algorithm described in the Appendix for determining the penalty parameters $\lambda_k \in \mathbb{R}^{+}$ is amenable to parallelization and batch data processing. Furthermore, it avoids the deep propagation of ANN gradients. This alternative training procedure is not only computationally tractable, but also fast. Most importantly, it has proved in numerous tests to be robust.
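The reason the alternative procedure can be batched is illustrated by the sketch below for a scalar strain. It is a minimal sketch under stated assumptions, not the implementation used in this work: the auxiliary ANN is replaced by a hypothetical sum of `tanh` units with hand-coded time derivative, the potentials are the same hypothetical quadratic stand-ins as before (constants `K_E`, `K_V`, `ETA`), and a single penalty parameter `lam` is used instead of one per component. Each training sample contributes a term that depends only on its own time instance, so the terms can be summed in any order or in mini-batches.

```python
import math
import random

# Hypothetical quadratic stand-ins for the trained potentials.
K_E, K_V, ETA = 1.0, 2.0, 1.0

def stress_model(E, alpha):
    return K_E * E + K_V * (E - alpha)   # dW/dE + dV/dE

def beta_of(E, alpha):
    return K_V * (E - alpha)             # -dV/dalpha

def dG_dbeta(beta):
    return beta / ETA

class Surrogate:
    """alpha(t) = A(t) - A(t0) + alpha0, with A a small tanh expansion."""

    def __init__(self, t0, alpha0, params):
        self.t0, self.alpha0, self.params = t0, alpha0, params

    def _A(self, t):
        return sum(c * math.tanh(w * t + b) for c, w, b in self.params)

    def value(self, t):
        # The initial condition alpha(t0) = alpha0 holds by construction.
        return self._A(t) - self._A(self.t0) + self.alpha0

    def rate(self, t):
        # Analytic time derivative of the surrogate.
        return sum(c * w * (1.0 - math.tanh(w * t + b) ** 2)
                   for c, w, b in self.params)

def sample_loss(surrogate, t, E, S_data, lam):
    """Stress misfit plus penalized residual of the evolution equation."""
    alpha = surrogate.value(t)
    misfit = (S_data - stress_model(E, alpha)) ** 2
    residual = surrogate.rate(t) - dG_dbeta(beta_of(E, alpha))
    return misfit + lam * residual ** 2

def total_loss(surrogate, samples, lam):
    return sum(sample_loss(surrogate, t, E, S, lam) for t, E, S in samples)

def batched_loss(surrogate, samples, lam, batch_size, seed=0):
    """Same loss accumulated over shuffled mini-batches."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    total = 0.0
    for start in range(0, len(order), batch_size):
        batch = [samples[i] for i in order[start:start + batch_size]]
        total += total_loss(surrogate, batch, lam)
    return total
```

No time-stepping appears in the loss: the internal variable at each time instance is read directly from the surrogate, which is why gradients need not be propagated back to $t_0$.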

Remark 4. The internal variable $\hat{\boldsymbol{\alpha}}^p$ corresponding to the training strain path $p$ is represented by the ANN-based surrogate model $\hat{\tilde{\boldsymbol{\alpha}}}_{(\Psi)}^p$ only during the offline training procedure. Online, the trained viscoelastic constitutive model is evaluated using Eqs. (26) to (28), which treat all mechanics-informed constraints as hard constraints.
4.3. Generation of strain paths

Unlike the stress response of an elastic material, that of a viscoelastic one can be highly dependent on the strain path it experiences. Therefore, it is important to train the model on strain paths that represent well the possible strain time-histories that can influence the response of the material. Here, such paths are generated stochastically using the two-step approach described below.

In Step 1, $K$ vectorized strain data points are generated in a time-interval $\left[t_0, T_f\right]$ using the random walk approach described in [35], as follows

$$
\hat{\boldsymbol{E}}_I^{(i+1)}=\hat{\boldsymbol{E}}_I^{(i)}+\sqrt{\frac{T_f}{K-1}} \, \varepsilon \boldsymbol{v}, \quad i=0, \ldots, K-1 \tag{34}
$$

where $\hat{\boldsymbol{E}}_I \in \mathbb{R}^{d^{\prime}}$, $\boldsymbol{v} \in[-1,1]^{d^{\prime}}$ is a vector of uniform random variables, $\varepsilon$ is a parameter controlling the step-size of the random walk, $T_f$ denotes the final time of the strain path, and the subscript $I$ designates the $K$ points as interpolation data. Here, $\hat{\boldsymbol{E}}_I^{(0)}=\hat{\boldsymbol{E}}_0$ is chosen. Typically, the relevant strain range $\left[\hat{\boldsymbol{E}}_{\min}, \hat{\boldsymbol{E}}_{\max}\right]$ is known a priori, in which case it is desirable that the generated strain paths do not exceed its bounds.

Therefore, noting that any randomly generated vectorized strain data point must be such that the squares of the principal stretches - which are given by the eigenvalues of $2 \boldsymbol{E}+\boldsymbol{I}$ - are positive, the random walk generator (34) is adapted next so that: (a) each generated data point $\hat{\boldsymbol{E}}_I^{(i+1)}$ is interpreted as a candidate data point; and (b) this candidate point is tested for acceptance/rejection using the condition

$$
\exists j:\left(E_{I j}^{(i+1)}>E_{\max , j}\right) \vee\left(E_{I j}^{(i+1)}<E_{\min , j}\right) \vee\left(\min \left(\operatorname{eig}\left(2 \boldsymbol{E}_I^{(i+1)}+\boldsymbol{I}\right)\right) \leq 0\right) \tag{35}
$$

where $\operatorname{eig}(\dagger)$ denotes the set of eigenvalues of the matrix $\dagger$. If the above condition is satisfied, the candidate point is rejected: $\boldsymbol{v}$ is redrawn randomly and the current step of the random walk is repeated until condition (35) is no longer satisfied (effectively, this amounts to placing reflecting boundaries on the lower and upper limits of the feasible strain range).
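Step 1 can be sketched as follows for the two-dimensional case with strains stored as $[E_{11}, E_{22}, 2E_{12}]$. This is a sketch under stated assumptions, not the generator used in this work: the function names, the seeded `random.Random` generator, and the closed-form smallest eigenvalue of the $2 \times 2$ matrix $2\boldsymbol{E}+\boldsymbol{I}$ are choices made here for illustration, and the walk is stopped once $K$ points (including the initial one) have been accepted.

```python
import math
import random

def min_stretch_sq(e):
    """Smallest eigenvalue of 2E + I for a Voigt strain e = [E11, E22, 2*E12]."""
    a = 2.0 * e[0] + 1.0
    c = 2.0 * e[1] + 1.0
    b = e[2]  # off-diagonal entry of 2E + I is 2*E12
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)

def rejected(e, e_min, e_max):
    """Rejection test: out of range, or non-positive squared principal stretch."""
    out_of_range = any(x > hi or x < lo for x, lo, hi in zip(e, e_min, e_max))
    return out_of_range or min_stretch_sq(e) <= 0.0

def random_walk(K, T_f, eps, e_min, e_max, e0=(0.0, 0.0, 0.0), seed=0):
    """Generate K strain points by a random walk with redraw on rejection."""
    rng = random.Random(seed)
    step = math.sqrt(T_f / (K - 1)) * eps
    path = [list(e0)]
    while len(path) < K:
        while True:
            # Draw a candidate; redraw v until the candidate is accepted.
            v = [rng.uniform(-1.0, 1.0) for _ in e0]
            candidate = [x + step * vk for x, vk in zip(path[-1], v)]
            if not rejected(candidate, e_min, e_max):
                break
        path.append(candidate)
    return path
```

The inner loop assumes that the step size is small relative to the admissible strain range; otherwise every candidate could be rejected and the redraw would not terminate.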

In Step 2, the $K$ vectorized strain data points generated as described above are interpolated using piecewise Hermite polynomials - that is, one Hermite interpolant for each two successive data points $\hat{\boldsymbol{E}}_I^{(i)}$ and $\hat{\boldsymbol{E}}_I^{(i+1)}$. This ensures that a strain path generated using a coarse random walk is sufficiently smooth and differentiable. Finally, such a generated strain path is sampled at $N_m$ vectorized strain data points in the time-interval $\left[t_0, T_f\right]$ using a sampling step-size sufficiently smaller than $\frac{T_f}{K-1}$ to ensure a sufficient number of training data points.
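Step 2 can be sketched as follows. The text above specifies piecewise Hermite interpolation but not how the nodal slopes are obtained; this sketch assumes cubic Hermite segments with slopes estimated by finite differences of the neighboring data points (one-sided at the two ends), which is only one possible choice. The function name `hermite_path` and its arguments are hypothetical.

```python
import bisect

def hermite_path(knot_times, knots, sample_times):
    """Sample a piecewise cubic Hermite interpolant of vector-valued knots."""
    n = len(knots)

    def slope(i):
        # Finite-difference slope estimate (assumption; one-sided at the ends).
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dt = knot_times[hi] - knot_times[lo]
        return [(b - a) / dt for a, b in zip(knots[lo], knots[hi])]

    slopes = [slope(i) for i in range(n)]
    samples = []
    for t in sample_times:
        # Locate the segment [t_i, t_{i+1}] containing t.
        i = min(max(bisect.bisect_right(knot_times, t) - 1, 0), n - 2)
        h = knot_times[i + 1] - knot_times[i]
        s = (t - knot_times[i]) / h
        # Cubic Hermite basis functions on the unit interval.
        h00 = 2 * s**3 - 3 * s**2 + 1
        h10 = s**3 - 2 * s**2 + s
        h01 = -2 * s**3 + 3 * s**2
        h11 = s**3 - s**2
        samples.append([
            h00 * p0 + h10 * h * m0 + h01 * p1 + h11 * h * m1
            for p0, p1, m0, m1 in zip(knots[i], knots[i + 1],
                                      slopes[i], slopes[i + 1])
        ])
    return samples
```

The interpolant passes through the random-walk points and has a continuous first derivative, so the strain rate along the sampled path is well defined.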

5. Applications and performance assessments

Here, the proposed mechanics-informed deep learning framework for data-driven nonlinear viscoelasticity is illustrated using four different applications for which its performance is also assessed:
- First, an academic problem, where the framework is applied to train an ANN-based viscoelastic model on synthetic data generated by a nonlinear convolution integral whose integrand includes a nonlinear functional form associated with a viscoelastic material. The purpose of this problem is to demonstrate the ability of the mechanics-informed three-potential framework summarized in Section 3.3 to capture nonlinear kernel convolutions that lack a trivial internal variable dual formulation.
- Next, a coupon problem, where the framework is applied to construct an ANN for representing a viscoelastic material law generated by a FE-based multi-scale homogenization of a woven fabric. The purpose here is to demonstrate the framework's ability to teach an ANN a material law generated by the homogenization of a nonlinear viscoelastic material for the purpose of accelerating multi-scale computations.
- Then, a FE analysis of a membrane characterized by the material law generated in the previous application and uni-axial loading and unloading at different strain rates. The purpose of this application is to demonstrate that the ANN-based model constructed using the proposed framework exhibits hysteresis only at intermediate strain rates and otherwise reproduces an elastic behavior, thanks to its mechanics-based hard constraints.
- And finally, FE simulations of the bouncing of a rigid ball on a viscoelastic membrane associated with different material models for the membrane, including that generated in the previous application. The purpose of these simulations is two-fold: to demonstrate the impact on the numerical stability of a FE analysis of constitutive laws generated by different data-driven approaches; and to highlight the ability of the proposed framework for data-driven constitutive modeling to generate a viscoelastic material law that not only promotes the numerical stability of a FE analysis, but also enables it to reproduce expected dynamic behaviors.

In all cases:
- The following settings are adopted: $t_0=0, \alpha_0=0$, and $E_0=\mathbf{0}$.
- Attention is restricted to plane-stress states $\left(S^Z=\left[\begin{array}{lll}S_{33} & S_{23} & S_{13}\end{array}\right]^T=\mathbf{0}\right)$, where the constitutive law is a mapping between the time-history of the in-plane strains $E^M=\left[\begin{array}{lll}E_{11} & E_{22} & 2 E_{12}\end{array}\right]^T$ and the in-plane stresses $S^M=\left[\begin{array}{lll}S_{11} & S_{22} & S_{12}\end{array}\right]^T$. Thus, attention is focused on the case corresponding to $d=2$ and $d^{\prime}=3$. For this reason, it is noted that all arguments and methods presented in Section 4 remain valid when all occurrences of $\widehat{E}$ and $\widehat{S}$ in that section are replaced by $\boldsymbol{E}^M$ and $\boldsymbol{S}^M$, respectively.
- The ANN-based material models are constructed and trained using the machine learning framework PyTorch; and the FE analyses are performed using the nonlinear FE structural analyzer AERO-S [36,37] equipped with the multi-scale homogenization method described in [16].
5.1. Learning synthetic data generated by a nonlinear convolution

Given a strain path, synthetic viscoelastic stress data is generated here using the nonlinear convolution integral

$$
S^M(t)=A E^M(t)+\int_0^t K(t-\tau) f\left(E^M(\tau), \dot{E}^M(\tau)\right) \mathrm{d} \tau \tag{36}
$$

where

$$
\begin{aligned}
K(t-\tau) & =e^{-\boldsymbol{Q}(t-\tau)} \\
f\left(E^M, \dot{E}^M\right) & =D \dot{E}^M\left(E^M, E^M\right)
\end{aligned}
$$

$A$, $\boldsymbol{Q}$, and $D \in \operatorname{Sym}_3^2(\mathbb{R}) \geq 0$ are constant, and the elastic component of the viscoelastic material represented by (36) is therefore linear. In this regard, it is recalled that the mechanics-informed, data-driven framework for learning nonlinear elasticity developed in [22] and summarized in Section 2, which is incorporated by design in the counterpart framework proposed in this paper for nonlinear viscoelasticity, was already shown in [22] to capture well nonlinear elasticity. Thus, the focus is placed here on the ability of the new framework to represent nonlinear viscoelasticity.
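For illustration, a scalar analogue of the convolution integral (36) can be evaluated numerically as sketched below. This is a sketch under stated assumptions, not the data generator used in this work: the matrices are replaced by hypothetical scalars `A`, `Q`, and `D`; the nonlinear integrand is read here as $D \, \dot{E} \, E^2$ (interpreting the pairing of the strain with itself as an inner product, which is an assumption); and the integral is approximated with the composite trapezoidal rule.

```python
import math

def convolution_stress(times, strains, strain_rates, A, Q, D):
    """Scalar analogue of Eq. (36), integrated with the trapezoidal rule."""
    stresses = []
    for m, t in enumerate(times):
        # Integrand exp(-Q (t - tau)) * D * Edot(tau) * E(tau)**2 on [0, t].
        g = [math.exp(-Q * (t - times[j])) * D * strain_rates[j] * strains[j] ** 2
             for j in range(m + 1)]
        integral = sum(0.5 * (g[j] + g[j + 1]) * (times[j + 1] - times[j])
                       for j in range(m))
        stresses.append(A * strains[m] + integral)
    return stresses
```

As a check under these assumptions, the linear ramp $E(t) = t$ with $Q = D = 1$ gives the closed-form integral $t^2 - 2t + 2 - 2e^{-t}$, against which the quadrature can be compared. The cost of this direct evaluation grows quadratically with the number of time samples.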

To this end, three strain paths are generated as described in Section 4.3: two to train the ANN-based model; and one to test its predictive capability. Fig. 1 shows how well the trained ANN-based model fits the training data, despite the highly oscillatory nature of the strain path. Fig. 2 highlights the predictive capability of the constructed ANN-based material model - that is, its performance for an "unseen" strain path - which is impressive, considering that training is performed in this case on only two strain paths. Collectively, the results reported in Figs. 1 and 2 demonstrate the ability of the ANN-based model to learn the structure of the constitutive law underlying the training data, rather than just memorizing it. Without the mechanics-informed constraints developed in this paper, this task would require a significantly larger amount of data and many more training strain paths. Furthermore, the aforementioned results suggest a potential equivalence between the nonlinear internal variable approach adopted in the proposed framework and nonlinear kernel convolutions.

5.2. Learning a homogenized viscoelastic material law at the coupon level to accelerate multi-scale computations

Multi-scale homogenization, which is critical for the design and development of fibrous materials with enhanced performance and tailorable properties, is typically computationally intractable - for example, when performed using the concept of a locally attached micro-structure [38-41] or more specifically, the "FE squared" ( $\mathrm{FE}^2$ ) method [40,42]. The recent development of a variety of surrogate modeling techniques - including nonlinear, projection-based model order reduction [43-45], kriging [46], and ANN