### Representation of a Nonlinear Elastic Law by a Black-Box ANN

- **Tensor Definitions**:
  - Green-Lagrange strain tensor: $\mathbf{E} \in \operatorname{Sym}_3(\mathbb{R})$
  - Second Piola-Kirchhoff stress tensor: $\mathbf{S} \in \operatorname{Sym}_3(\mathbb{R})$

- **Constitutive Mapping**:
  - Constitutive mapping between vectorized tensors: $\widehat{\mathbf{S}} = \mathbf{G}(\widehat{\mathbf{E}})$
  - $\mathbf{G}: \mathbb{R}^6 \rightarrow \mathbb{R}^6$
  - Voigt notation for strain tensor: 
    $$\widehat{\mathbf{E}} = [E_{11}, E_{22}, E_{33}, 2E_{23}, 2E_{13}, 2E_{12}]^T$$
  - Voigt notation for stress tensor: 
    $$\widehat{\mathbf{S}} = [S_{11}, S_{22}, S_{33}, S_{23}, S_{13}, S_{12}]^T$$
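
The two vectorizations can be sketched in plain Python as follows. This is a minimal illustration that assumes tensors stored as nested lists; the function names and the sample values are hypothetical and not taken from the source. Doubling the shear strains while leaving the shear stresses unscaled makes the vector dot product reproduce the tensor contraction $\mathbf{S} : \mathbf{E}$.

```python
def strain_to_voigt(E):
    """Map a symmetric 3x3 strain tensor (nested lists) to its 6-vector.

    Shear entries are doubled, matching the strain convention above.
    """
    return [E[0][0], E[1][1], E[2][2],
            2.0 * E[1][2], 2.0 * E[0][2], 2.0 * E[0][1]]


def stress_to_voigt(S):
    """Map a symmetric 3x3 stress tensor to its 6-vector (no doubling)."""
    return [S[0][0], S[1][1], S[2][2], S[1][2], S[0][2], S[0][1]]


def double_contraction(A, B):
    """Full tensor contraction A : B = sum_ij A_ij * B_ij."""
    return sum(A[i][j] * B[i][j] for i in range(3) for j in range(3))


# Hypothetical sample values, chosen only for illustration.
E = [[0.10, 0.02, 0.03],
     [0.02, 0.20, 0.04],
     [0.03, 0.04, 0.30]]
S = [[1.0, 0.5, 0.6],
     [0.5, 2.0, 0.7],
     [0.6, 0.7, 3.0]]

E_hat = strain_to_voigt(E)
S_hat = stress_to_voigt(S)

# The Voigt dot product equals the tensor contraction S : E.
voigt_dot = sum(s * e for s, e in zip(S_hat, E_hat))
```

Comparing `voigt_dot` with `double_contraction(S, E)` is a quick way to catch a forgotten factor of 2 on the shear strains.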

- **Algorithm (Forward Pass of ANN)**:
  - **Input**: $\boldsymbol{x}^{(0)}$
  - **Output**: $\boldsymbol{y}$
  - **Loop for $l = 1, \ldots, N_l - 1$**:
    - $$\boldsymbol{x}^{(l)} = \phi\left(\mathcal{W}^{(l)} \boldsymbol{x}^{(l-1)} + \mathbf{b}^{(l)}\right)$$
  - **End Loop**
  - **Final Output**:
    - $$\boldsymbol{y} = \mathcal{W}^{(N_l)} \boldsymbol{x}^{(N_l-1)} + \mathbf{b}^{(N_l)}$$
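
The forward pass above can be sketched in a few lines of plain Python. The choice of `tanh` for $\phi$, the helper `matvec`, and the small 2-3-1 network with hand-picked parameters are assumptions made for illustration only.

```python
import math


def matvec(W, x):
    """Dense matrix-vector product for nested-list matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(x0, weights, biases, phi=math.tanh):
    """Evaluate the network: hidden layers apply phi, the last layer is affine."""
    x = list(x0)
    for W, b in zip(weights[:-1], biases[:-1]):      # l = 1, ..., N_l - 1
        x = [phi(z + bi) for z, bi in zip(matvec(W, x), b)]
    W, b = weights[-1], biases[-1]                   # l = N_l (no activation)
    return [z + bi for z, bi in zip(matvec(W, x), b)]


# Hypothetical 2-3-1 network with hand-picked parameters.
weights = [[[0.5, -0.2], [0.1, 0.3], [-0.4, 0.7]], [[1.0, -1.0, 0.5]]]
biases = [[0.0, 0.1, -0.1], [0.2]]
y = forward([1.0, 2.0], weights, biases)
```

For the constitutive law, `x0` would be the 6-component strain vector and `y` the 6-component stress vector.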

- **Training Data**:
  - Set of $N_m$ strain-stress data pairs:
    $$\left\{\left(\widehat{\mathbf{E}}^{(m)}, \widehat{\mathbf{S}}^{(m)}\right)\right\}_{m=1}^{N_m}$$

- **Loss Function (Squared Error)**:
  - To determine the ANN parameters:
    $$\theta = \underset{\beta \in \mathbb{R}^{N_\theta}}{\operatorname{argmin}} \sum_{m=1}^{N_m} \sum_{k=1}^d \left(\left(\mathcal{N}_\beta\left(\widehat{\mathbf{E}}^{(m)}\right)\right)_k - \widehat{S}_k^{(m)}\right)^2$$
  - Where:
    - $(\cdot)_k$: $k$-th entry of the vector $(\cdot)$
    - $d$: Dimensionality of vectorized stress tensor (generally $d=6$ for a solid)

- **Scaled Loss Function**:
  - To address stress components with varying magnitudes:
    $$\theta = \underset{\beta \in \mathbb{R}^{N_\theta}}{\operatorname{argmin}} \sum_{m=1}^{N_m} \sum_{k=1}^d \left(\frac{\left(\mathcal{N}_\beta\left(\widehat{\mathbf{E}}^{(m)}\right)\right)_k - \widehat{S}_k^{(m)}}{\sigma_k}\right)^2$$
  - Where $\sigma_k$ denotes the standard deviation of the $k$-th stress component over the training data.
  - The loss function is minimized using a stochastic gradient descent algorithm.
  - Gradients with respect to each of the ANN’s parameters are obtained exactly through automatic differentiation.
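
The effect of the scaling can be illustrated with a short sketch in plain Python. The data set, the stand-in `biased_model`, and the use of the population standard deviation are assumptions; the notes do not specify which estimator of $\sigma_k$ is used.

```python
import statistics


def component_std(stresses):
    """Population standard deviation of each stress component over the data."""
    d = len(stresses[0])
    return [statistics.pstdev([s[k] for s in stresses]) for k in range(d)]


def scaled_loss(model, strains, stresses):
    """Sum of squared, sigma-normalised residuals over samples and components."""
    sigma = component_std(stresses)
    total = 0.0
    for E_hat, S_hat in zip(strains, stresses):
        prediction = model(E_hat)
        for k, s_k in enumerate(S_hat):
            total += ((prediction[k] - s_k) / sigma[k]) ** 2
    return total


# Hypothetical data: the second stress component is 1000 times larger.
strains = [[0.0], [1.0], [2.0]]
stresses = [[0.0, 0.0], [1.0, 1000.0], [2.0, 2000.0]]


def biased_model(E_hat):
    """Stand-in model with a 10% relative error on both components."""
    return [1.1 * E_hat[0], 1100.0 * E_hat[0]]


loss = scaled_loss(biased_model, strains, stresses)
```

With the scaling, both components contribute equally to `loss` despite their different magnitudes; without it, the larger component would dominate the sum.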




- **Objective of Approximation with ANN**:
  - Approximate $\mathbf{G}$ with ANN as: $\mathbf{G} \approx \mathcal{N}_\theta$
  - ANN parameters: 
    $$\theta=\bigcup_{l=1}^{N_l}(\mathcal{W}^{(l)}, \mathbf{b}^{(l)})$$
    - $\mathcal{W}^{(l)}$: Tensor-valued parameter
    - $\mathbf{b}^{(l)}$: Vector-valued parameter

- **Activation Function**:
  - Nonlinear activation function used: $\phi: \mathbb{R} \rightarrow \mathbb{R}$





### Mechanics-Based Model Constraints

- **General Challenges of ANN Models for Nonlinear Elasticity**:
  - When the size of an ANN is allowed to grow, it can fit data from materials governed by complicated nonlinear elastic laws well in the least-squares sense.
  - However, as a **phenomenological model**, it can violate fundamental principles in mechanics, making it less suitable for numerical simulations. This can be due to:
    - Imperfect training
    - Noisy training data
    - Overfitting
  - Lack of interpretability of ANNs makes it difficult to evaluate the physical soundness of the model parameters after training.

- **Importance of Mechanics-Based Constraints**:
  - **Objective**: To enforce mechanics-based constraints in the construction of a data-driven constitutive law to ensure physical validity.
  - **Advantages**:
    - Embedding **a priori knowledge** of mechanics in a data-driven model helps to:
      - Favor learning the structure of a constitutive relation over overfitting.
      - Reduce the model's sensitivity to noisy data.
      - Promote robustness to inputs outside the training domain.
    - Mechanics-based constraints act as a form of **regularization**.
  

### C1: Dynamic Stability

- **Definition of Dynamic Stability**:
  - Dynamic stability is the **ability of a system to always maintain finite kinetic energy when finite work is performed on it**.
  - It is guaranteed when the stress derives from a potential, that is, when the constitutive law is hyperelastic:
    $$\mathbf{S}(\mathbf{E}) = \frac{\partial W}{\partial \mathbf{E}}(\mathbf{E})$$
    - Where $W: \operatorname{Sym}_3(\mathbb{R}) \rightarrow \mathbb{R}$ represents the **strain energy density** of the body.

- **Challenges of Standard ANN Approaches**:
  - A standard regression ANN of the form
    $$\widehat{\mathbf{S}} = \mathcal{N}_\theta(\widehat{\mathbf{E}})$$
    cannot be expected to satisfy dynamic stability.
  - Even with **noise-free data** and **zero loss convergence**, the **interpolation** and **extrapolation** by a standard ANN inside and outside the training domain are not guaranteed to be **conservative**.


- **Proposed Approach for Guaranteeing Dynamic Stability**:
  - To ensure Dynamic Stability for **arbitrary strain inputs**, it is proposed to represent the constitutive law using an ANN that learns the **strain energy density function**:
    $$W = \mathcal{N}_\theta(\widehat{\mathbf{E}})$$
  
- **Novel Training Approach**:
  - The ANN parameters $\theta$ are determined by minimizing:
    $$\theta = \underset{\beta \in \mathbb{R}^{N_\theta}}{\operatorname{argmin}} \sum_{m=1}^{N_m} \sum_{k=1}^d \left( \frac{\frac{\partial \mathcal{N}_\beta}{\partial \widehat{\mathrm{E}}_k}\left(\widehat{\mathbf{E}}^{(m)}\right) - \widehat{S}_k^{(m)}}{\sigma_k} \right)^2$$
  
- **Training Process**:
  - The **weights** of the ANN are trained so that the **partial derivatives** of the network with respect to the input match the training stress data.
  - This promotes the learning of a **strain energy density function** (up to an irrelevant additive constant).
  
- **Obtaining Stresses**:
  - After training, the **stresses** can be obtained by differentiating the ANN with respect to the strains:
    $$\widehat{\mathbf{S}} = \frac{\partial \mathcal{N}_\theta}{\partial \widehat{\mathbf{E}}}(\widehat{\mathbf{E}})$$
  
- **Resulting Properties**:
  - The resulting strain-stress mapping is **unconditionally dynamically stable** by construction.
  - This holds **regardless** of the strain input or the smallest value attained by the training loss.
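
The idea of obtaining stresses as the gradient of a scalar network can be sketched in plain Python. To stay dependency-free, the sketch uses a hand-written backward pass in place of an automatic differentiation library; the single hidden layer, the `tanh` activation, and the random parameters are all assumptions made for illustration.

```python
import math
import random


def energy(E_hat, A, b, v):
    """Scalar network W(E) = sum_j v_j * tanh(A_j . E + b_j)."""
    total = 0.0
    for A_j, b_j, v_j in zip(A, b, v):
        z = sum(a * e for a, e in zip(A_j, E_hat)) + b_j
        total += v_j * math.tanh(z)
    return total


def stress(E_hat, A, b, v):
    """Gradient dW/dE, propagated backwards from the output to the input."""
    grad = [0.0] * len(E_hat)
    for A_j, b_j, v_j in zip(A, b, v):
        z = sum(a * e for a, e in zip(A_j, E_hat)) + b_j
        dW_dz = v_j * (1.0 - math.tanh(z) ** 2)   # derivative of v_j * tanh(z)
        for k, a in enumerate(A_j):
            grad[k] += dW_dz * a                   # chain rule: dz/dE_k = A_jk
    return grad


def central_difference(E_hat, A, b, v, h=1e-6):
    """Reference gradient from central finite differences (checking only)."""
    grad = []
    for k in range(len(E_hat)):
        up, down = list(E_hat), list(E_hat)
        up[k] += h
        down[k] -= h
        grad.append((energy(up, A, b, v) - energy(down, A, b, v)) / (2.0 * h))
    return grad


# Hypothetical parameters: 6 strain inputs, 4 hidden units.
rng = random.Random(0)
A = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(4)]
b = [rng.uniform(-1.0, 1.0) for _ in range(4)]
v = [rng.uniform(-1.0, 1.0) for _ in range(4)]
E_hat = [rng.uniform(-0.1, 0.1) for _ in range(6)]

S_hat = stress(E_hat, A, b, v)
mismatch = max(abs(s - r) for s, r in
               zip(S_hat, central_difference(E_hat, A, b, v)))
```

Because `S_hat` is the gradient of a scalar function for any parameter values, the mapping is conservative whether or not the network has been trained well. In practice the backward pass would be generated by an automatic differentiation framework rather than written by hand.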

    

- **Leveraging Reverse Mode Automatic Differentiation**:
  - **Reverse mode automatic differentiation** can be used to **exactly differentiate** a trained ANN and achieve **computational efficiency** during the online computation of stresses.
  - It is particularly efficient for obtaining the **Jacobian** of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ with many inputs and few outputs, i.e. $n \gg m$.
    - In the context of this article, where the ANN learns the **strain energy density**, $n = 6$ and $m = 1$.
  - The **gradient** of the ANN with respect to all its inputs can be obtained at roughly the **same computational cost** as for a single function evaluation.
  - In contrast, using **finite differencing** to compute the gradient would require **at least $n + 1$ function evaluations** and would be affected by **truncation and round-off errors**.
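
The cost of finite differencing is easy to make concrete. The sketch below, in plain Python, counts the evaluations needed by one-sided finite differences for a scalar function of six inputs and measures the resulting error. The test function `CountingEnergy` and the step size are assumptions, chosen so that the exact gradient is known.

```python
import math


class CountingEnergy:
    """Hypothetical smooth energy W(E) = sum_k exp(E_k) that counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, E_hat):
        self.calls += 1
        return sum(math.exp(e) for e in E_hat)


def forward_difference_gradient(f, x, h):
    """One-sided finite differences: one base evaluation plus one per input."""
    f0 = f(x)
    grad = []
    for k in range(len(x)):
        shifted = list(x)
        shifted[k] += h
        grad.append((f(shifted) - f0) / h)
    return grad


W = CountingEnergy()
x = [0.1 * k for k in range(6)]
exact = [math.exp(e) for e in x]        # analytic gradient of the test energy

approx = forward_difference_gradient(W, x, h=1e-4)
evaluations = W.calls                    # 7 evaluations for 6 inputs
error = max(abs(a - e) for a, e in zip(approx, exact))
```

The approximation needs seven evaluations of the energy and still carries an error that depends on the step size, whereas a backward pass returns the exact gradient.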

- **Eager Execution vs. Graph Execution**:
  - There is a notable difference in the **online computational cost** between **eager execution** and **graph execution** of the ANN model.
  - **Eager Execution**:
    - Interprets the code and executes it in **real-time**.
    - The evaluation of the constitutive law involves evaluating $W$, constructing the **backwards graph**, and propagating through the backwards graph.
    - This approach introduces **unnecessary computations** and **software-related computational overhead**.
  - **Graph Execution**:
    - Interprets the code as a **graph**.
    - The backwards graph is constructed and **compiled offline**.
    - Online evaluation of the constitutive law requires only **propagation through a single graph** that directly relates the strains to the stresses.
    - This approach is **computationally more economical**.


### C2: Objectivity

- **Concept of Objectivity**:
  - **Objectivity** is the concept of **material frame indifference**—the position or orientation of an observer should not affect any quantity of interest.

- Objectivity is satisfied by writing the strain energy density as a function of the Green–Lagrange strain, which is invariant under rigid rotations of the observer frame:
    $$W = \mathcal{N}_\theta(\widehat{\mathbf{E}})$$
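
That the Green–Lagrange strain is unaffected by a change of observer can be verified directly. As a sketch, assume the standard definition $\mathbf{E} = \tfrac{1}{2}(\mathbf{F}^T\mathbf{F} - \mathbf{I})$, where the deformation gradient $\mathbf{F}$ and the observer rotation $\mathbf{Q}$ (with $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}$) are symbols introduced here and do not appear in the notes above. A change of observer maps $\mathbf{F}$ to $\mathbf{Q}\mathbf{F}$, so:

$$
\mathbf{E}^{*} = \tfrac{1}{2}\left((\mathbf{Q}\mathbf{F})^{T}(\mathbf{Q}\mathbf{F}) - \mathbf{I}\right) = \tfrac{1}{2}\left(\mathbf{F}^{T}\mathbf{Q}^{T}\mathbf{Q}\mathbf{F} - \mathbf{I}\right) = \tfrac{1}{2}\left(\mathbf{F}^{T}\mathbf{F} - \mathbf{I}\right) = \mathbf{E}
$$

Any function of $\mathbf{E}$ alone, including the ANN, therefore returns the same energy for every observer.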



### C3: Material Stability

- **Concept of Material Stability**:
  - **Material stability** ensures that **small loads** do not lead to **arbitrary deformations**.
- Convexity of $W = \mathcal{N}_\theta(\widehat{\mathbf{E}})$ with respect to its input must be enforced. This is achieved with input-convex neural networks (ICNNs), which are convex in their input provided that:
  1. All weights, except those connected directly to the input, are non-negative.
  2. Activation functions are convex and non-decreasing.

- To keep these weights non-negative in a way that is compatible with gradient-based optimization, they are expressed as:
  $$
  \mathcal{W}_{ij}^{(l)} = \operatorname{Softplus}(\mathcal{Q}_{ij}^{(l)}; \alpha) = \frac{1}{\alpha^2} \log\left(1 + e^{\alpha^2 \mathcal{Q}_{ij}^{(l)}}\right)
  $$

  where $\mathcal{Q}_{ij}^{(l)}$ and $\alpha$ are trainable parameters. The Softplus function is positive for any real argument, so the resulting weights are always non-negative.
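
A minimal sketch of this reparameterization in plain Python is given below. The function name, the sample values, and the overflow-safe rewriting of $\log(1 + e^t)$ are choices made here for illustration, not details from the source.

```python
import math


def softplus_weight(q, alpha):
    """Softplus(q; alpha) = log(1 + exp(alpha^2 * q)) / alpha^2, evaluated stably."""
    t = alpha * alpha * q
    # log(1 + e^t) = max(t, 0) + log1p(e^{-|t|}) avoids overflow for large |t|.
    return (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / (alpha * alpha)


# Hypothetical unconstrained parameters, including extreme values.
raw = [-1000.0, -5.0, 0.0, 5.0, 1000.0]
weights = [softplus_weight(q, alpha=1.5) for q in raw]
```

The optimizer is free to move each raw parameter anywhere on the real line, while the weight seen by the network never becomes negative. For large positive raw values the weight approaches the raw value itself.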

- Passthrough layers with unconstrained weights $\widetilde{\mathcal{W}}_{ij}^{(l)}$, which act directly on the input $\boldsymbol{x}^{(0)}$, can also be included to improve predictive power without sacrificing convexity.

- Any convex function $f$ of the input vector $\boldsymbol{x}$ can be added to the ANN's output $y$.
    - In this work, the following convex function is used:
    $$
    f(\boldsymbol{x}) = \boldsymbol{x}^T \mathbf{A}^T \mathbf{A} \boldsymbol{x}
    $$
    - Here, $\mathbf{A}$ is a matrix-valued trainable parameter of the ANN.
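
Convexity of this quadratic form follows from a short calculation, sketched here with an arbitrary vector $\boldsymbol{z}$ introduced only for the argument:

$$
f(\boldsymbol{x}) = \lVert \mathbf{A}\boldsymbol{x} \rVert_2^2 \geq 0, \qquad \nabla^2 f = 2\,\mathbf{A}^T\mathbf{A}, \qquad \boldsymbol{z}^T \left(2\,\mathbf{A}^T\mathbf{A}\right) \boldsymbol{z} = 2\,\lVert \mathbf{A}\boldsymbol{z} \rVert_2^2 \geq 0
$$

The Hessian is positive semi-definite for any $\mathbf{A}$, so $f$ is convex without any constraint on the entries of $\mathbf{A}$.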

- **Parameters of the Network**:
    - The complete set of parameters for the network is given by:
    $$
    \theta = \left( \mathbf{A}, \alpha, \bigcup_{l=1}^{N_l} \left( \mathcal{Q}^{(l)}, \widetilde{\mathcal{W}}^{(l)}, \mathbf{b}^{(l)} \right) \right)
    $$
    - $\mathcal{Q}$ and $\widetilde{\mathcal{W}}$ are matrices that collect the parameters $\mathcal{Q}_{ij}$ and $\widetilde{\mathcal{W}}_{ij}$, respectively.

- **ICNN Algorithm**:

  - **Input**: $\boldsymbol{x}^{(0)}$
  - **Output**: $y$

  1. $\boldsymbol{x}^{(1)} = \phi\left(\widetilde{\mathcal{W}}^{(1)} \boldsymbol{x}^{(0)} + \mathbf{b}^{(1)}\right)$
  2. **For** $l = 2, \ldots, N_l - 1$:
    $$
    \boldsymbol{x}^{(l)} = \phi\left(\operatorname{Softplus}\left(\mathcal{Q}^{(l)}; \alpha\right) \boldsymbol{x}^{(l-1)} + \mathbf{b}^{(l)} + \widetilde{\mathcal{W}}^{(l)} \boldsymbol{x}^{(0)}\right)
    $$
  3. **End For**
  4. $y = \operatorname{Softplus}\left(\mathcal{Q}^{(N_l)}; \alpha\right) \boldsymbol{x}^{(N_l-1)} + f(\boldsymbol{x}^{(0)})$
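
The four steps above can be sketched in plain Python as follows. The network size ($N_l = 3$, width 5), the random parameters, the dictionary layout of `params`, and the midpoint test are assumptions made for illustration; the activation is the SoftplusSquared function defined in these notes.

```python
import math
import random


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def softplus_squared(z, beta):
    """Activation (1 / (2 beta^4)) * [log(1 + exp(beta^2 z))]^2."""
    return softplus(beta * beta * z) ** 2 / (2.0 * beta ** 4)


def matvec(M, x):
    """Dense matrix-vector product for nested-list matrices."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def nonneg(Q, alpha):
    """Entry-wise Softplus(Q; alpha), which yields non-negative weights."""
    a2 = alpha * alpha
    return [[softplus(a2 * q) / a2 for q in row] for row in Q]


def icnn(x0, p):
    """Forward pass of the input-convex network; returns the scalar output y."""
    # Layer 1: unconstrained weights act directly on the input.
    x = [softplus_squared(z + b, p["beta"])
         for z, b in zip(matvec(p["W_pass"][0], x0), p["b"][0])]
    # Layers 2 .. N_l - 1: non-negative weights plus a passthrough of x0.
    for Q, W_pass, b in zip(p["Q"][:-1], p["W_pass"][1:], p["b"][1:]):
        constrained = matvec(nonneg(Q, p["alpha"]), x)
        passthrough = matvec(W_pass, x0)
        x = [softplus_squared(c + bi + w, p["beta"])
             for c, bi, w in zip(constrained, b, passthrough)]
    # Output: non-negative combination plus f(x0) = x0^T A^T A x0 = |A x0|^2.
    quadratic = sum(a * a for a in matvec(p["A"], x0))
    return matvec(nonneg(p["Q"][-1], p["alpha"]), x)[0] + quadratic


def rand_matrix(rng, rows, cols):
    """Matrix of hypothetical parameters drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical network: 6 inputs, two hidden layers of width 5 (N_l = 3).
rng = random.Random(0)
params = {
    "alpha": 1.0,
    "beta": 1.0,
    "W_pass": [rand_matrix(rng, 5, 6), rand_matrix(rng, 5, 6)],
    "b": [rand_matrix(rng, 1, 5)[0], rand_matrix(rng, 1, 5)[0]],
    "Q": [rand_matrix(rng, 5, 5), rand_matrix(rng, 1, 5)],
    "A": rand_matrix(rng, 6, 6),
}


def midpoint_convex(trials=200, tol=1e-9):
    """Check y((u + w) / 2) <= (y(u) + y(w)) / 2 on random strain pairs."""
    for _ in range(trials):
        u = [rng.uniform(-0.1, 0.1) for _ in range(6)]
        w = [rng.uniform(-0.1, 0.1) for _ in range(6)]
        mid = [(a + c) / 2.0 for a, c in zip(u, w)]
        average = 0.5 * (icnn(u, params) + icnn(w, params))
        if icnn(mid, params) > average + tol:
            return False
    return True


convex_ok = midpoint_convex()
```

The midpoint test is a numerical sanity check rather than a proof. Convexity itself follows from the structure: every hidden layer applies a convex, non-decreasing activation to a non-negative combination of convex functions plus an affine function of the input.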

- **Activation Function Requirements**:

  Because the ANN is differentiated once to obtain the stresses and twice to obtain the tangent modulus, and must remain convex in its input, the activation function must:
  - Be at least twice differentiable
  - Be convex and non-decreasing
  - Have a non-vanishing second derivative

  Popular activation functions such as tanh, ReLU, ELU, and Softplus do not meet all of these requirements. Therefore, a new activation function, **SoftplusSquared**, is proposed:

  $$
  \phi(z) = \operatorname{SoftplusSquared}(z; \beta) = \frac{1}{2\beta^4} \left[\log\left(1 + e^{\beta^2 z}\right)\right]^2
  $$

  where $\beta$ is a trainable parameter that controls the curvature of the function at the origin.
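
The activation and its second derivative can be sketched in plain Python. The closed-form second derivative below is derived here by applying the chain rule twice and is checked against a finite-difference estimate; the helper names, the value of `beta`, and the sample points are assumptions. The comparison with plain Softplus at a large argument is a numerical observation only.

```python
import math


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t):
    """Logistic function written with tanh so that large |t| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * t))


def softplus_squared(z, beta):
    """phi(z) = (1 / (2 beta^4)) * [log(1 + exp(beta^2 z))]^2."""
    return softplus(beta * beta * z) ** 2 / (2.0 * beta ** 4)


def softplus_squared_d2(z, beta):
    """Second derivative of phi, obtained by applying the chain rule twice."""
    t = beta * beta * z
    s = sigmoid(t)
    return s * s + softplus(t) * s * (1.0 - s)


def softplus_d2(z, beta):
    """Second derivative of Softplus(z; beta) = log(1 + exp(beta^2 z)) / beta^2."""
    s = sigmoid(beta * beta * z)
    return beta * beta * s * (1.0 - s)


# Check the closed form against a central second difference.
beta, h = 1.2, 1e-3
points = [-2.0, -1.0, 0.0, 1.0, 2.0]
exact = [softplus_squared_d2(z, beta) for z in points]
estimate = [(softplus_squared(z + h, beta) - 2.0 * softplus_squared(z, beta)
             + softplus_squared(z - h, beta)) / h ** 2 for z in points]
fd_error = max(abs(a - e) for a, e in zip(estimate, exact))

# At a large argument, Softplus has lost its curvature; SoftplusSquared has not.
curvature_softplus = softplus_d2(40.0, 1.0)
curvature_squared = softplus_squared_d2(40.0, 1.0)
```

At the sampled points the second derivative of SoftplusSquared is strictly positive, and at a large argument it stays close to 1 while that of Softplus is numerically zero.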





### C4: Consistency

1. **Definition of Consistent Material Law**:
    - A consistent material law ensures that a numerical computation, such as a finite element (FE) analysis, preserves the rigid body modes of a structure.
    - This means mapping a state of zero strain onto a state of zero stress, that is, $\mathbf{S}(\mathbf{0}) = \mathbf{0}$.

2. **Problem with Standard ANN Models**:
    - Standard regression ANN models may violate this property even if the training data is consistent.
    - Such a violation produces a spurious stress at zero strain, so a simulated structure can deform even when no load or displacement is prescribed.

3. **Proposed Solution**:
    - To ensure consistency, the strain energy density function is represented as a combination of the ANN model and a linear correction term:
    $$
    W(\widehat{\mathbf{E}}) = \mathcal{N}_\theta(\widehat{\mathbf{E}}) + \mathbf{h} \cdot \widehat{\mathbf{E}}
    $$
    - The stress is derived as:
    $$
    \widehat{\mathbf{S}}(\widehat{\mathbf{E}}) = \frac{\partial W}{\partial \widehat{\mathbf{E}}}(\widehat{\mathbf{E}}) = \frac{\partial \mathcal{N}_\theta}{\partial \widehat{\mathbf{E}}}(\widehat{\mathbf{E}}) + \mathbf{h}
    $$

4. **Ensuring Consistency**:
    - By choosing $\mathbf{h} = -\frac{\partial \mathcal{N}_\theta}{\partial \widehat{\mathbf{E}}}(\mathbf{0})$, the desired consistency property $\widehat{\mathbf{S}}(\mathbf{0}) = \mathbf{0}$ is guaranteed.
    - Because the correction is linear in $\widehat{\mathbf{E}}$, it preserves both the hyperelasticity and the convexity of the strain energy density function.

5. **Embedding Correction in Training**:
    - To maintain accuracy, the correction should be embedded directly into the training procedure, rather than applied afterward.
    - This is done by modifying the loss function:
    $$
    \theta = \underset{\beta \in \mathbb{R}^{N_\theta}}{\operatorname{argmin}} \sum_{m=1}^{N_m} \sum_{k=1}^d \left( \frac{\frac{\partial \mathcal{N}_\beta}{\partial \widehat{\mathrm{E}}_k}\left(\widehat{\mathbf{E}}^{(m)}\right) - \frac{\partial \mathcal{N}_\beta}{\partial \widehat{\mathrm{E}}_k}(0) - \widehat{S}_k^{(m)}}{\sigma_k} \right)^2
    $$
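
Both properties of the corrected model can be checked in one line each. Substituting $\mathbf{h} = -\frac{\partial \mathcal{N}_\theta}{\partial \widehat{\mathbf{E}}}(\mathbf{0})$ into the stress expression and differentiating $W$ twice gives:

$$
\widehat{\mathbf{S}}(\mathbf{0}) = \frac{\partial \mathcal{N}_\theta}{\partial \widehat{\mathbf{E}}}(\mathbf{0}) - \frac{\partial \mathcal{N}_\theta}{\partial \widehat{\mathbf{E}}}(\mathbf{0}) = \mathbf{0}, \qquad \frac{\partial^2 W}{\partial \widehat{\mathbf{E}}^2} = \frac{\partial^2 \mathcal{N}_\theta}{\partial \widehat{\mathbf{E}}^2}
$$

The stress vanishes at zero strain for any value of $\theta$, and the Hessian of $W$ equals that of the ANN, so convexity is unaffected by the linear term.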
