### Representation of a Nonlinear Elastic Law by a Black-Box ANN

- **Tensor Definitions**:
  - Green-Lagrange strain tensor: $\mathbf{E} \in \operatorname{Sym}_3(\mathbb{R})$
  - Second Piola-Kirchhoff stress tensor: $\mathbf{S} \in \operatorname{Sym}_3(\mathbb{R})$

- **Constitutive Mapping**:
  - Constitutive mapping between vectorized tensors: $\widehat{\mathbf{S}} = \mathbf{G}(\widehat{\mathbf{E}})$
  - $\mathbf{G}: \mathbb{R}^6 \rightarrow \mathbb{R}^6$
  - Voigt notation for strain tensor: 
    $$\widehat{\mathbf{E}} = [E_{11}, E_{22}, E_{33}, 2E_{23}, 2E_{13}, 2E_{12}]^T$$
  - Voigt notation for stress tensor: 
    $$\widehat{\mathbf{S}} = [S_{11}, S_{22}, S_{33}, S_{23}, S_{13}, S_{12}]^T$$
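The factor of 2 on the shear strains is what makes the dot product of the two vectors equal the double contraction $\mathbf{S} : \mathbf{E}$. A minimal NumPy sketch of the vectorization follows; the helper names and the numerical values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Index pairs in the ordering used above: 11, 22, 33, 23, 13, 12 (0-based).
VOIGT_PAIRS = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

def strain_to_voigt(E):
    """Vectorize a symmetric strain tensor; shear entries are doubled."""
    E = np.asarray(E, dtype=float)
    factors = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])
    return factors * np.array([E[i, j] for i, j in VOIGT_PAIRS])

def stress_to_voigt(S):
    """Vectorize a symmetric stress tensor; shear entries are not doubled."""
    S = np.asarray(S, dtype=float)
    return np.array([S[i, j] for i, j in VOIGT_PAIRS])

# Arbitrary symmetric tensors, chosen only for illustration.
E = np.array([[0.10, 0.02, 0.03],
              [0.02, 0.20, 0.04],
              [0.03, 0.04, 0.30]])
S = np.array([[1.0, 0.5, 0.2],
              [0.5, 2.0, 0.1],
              [0.2, 0.1, 3.0]])

E_hat = strain_to_voigt(E)
S_hat = stress_to_voigt(S)

# The doubled shear strains make the vector dot product equal S : E.
double_contraction = float(np.sum(S * E))
voigt_dot = float(S_hat @ E_hat)
```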

- **Algorithm (Forward Pass of ANN)**:
  - **Input**: $\boldsymbol{x}^{(0)}$
  - **Output**: $\boldsymbol{y}$
  - **Loop for $l = 1, \ldots, N_l - 1$**:
    - $$\boldsymbol{x}^{(l)} = \phi\left(\mathcal{W}^{(l)} \boldsymbol{x}^{(l-1)} + \mathbf{b}^{(l)}\right)$$
  - **End Loop**
  - **Final Output**:
    - $$\boldsymbol{y} = \mathcal{W}^{(N_l)} \boldsymbol{x}^{(N_l-1)} + \mathbf{b}^{(N_l)}$$
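The forward pass above can be sketched in a few lines of NumPy. The activation `np.tanh` and the tiny 2-2-1 network are assumptions chosen so the result can be checked by hand; the notes do not fix a particular $\phi$ at this point.

```python
import numpy as np

def forward_pass(x0, weights, biases, phi=np.tanh):
    """Evaluate the network: N_l - 1 activated layers followed by an affine output layer."""
    x = np.asarray(x0, dtype=float)
    # Hidden layers l = 1, ..., N_l - 1: affine map followed by the activation.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = phi(W @ x + b)
    # Output layer l = N_l: affine map only, no activation.
    return weights[-1] @ x + biases[-1]

# Hand-checkable network with N_l = 2 layers (2 -> 2 -> 1 units).
weights = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[1.0, -1.0]])]
biases = [np.zeros(2), np.array([0.5])]

y_zero = forward_pass([0.0, 0.0], weights, biases)
y_equal = forward_pass([1.0, 1.0], weights, biases)  # tanh(1) - tanh(1) + 0.5
```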

- **Training Data**:
  - Set of $N_m$ strain-stress data pairs:
    $$\left\{\left(\widehat{\mathbf{E}}^{(m)}, \widehat{\mathbf{S}}^{(m)}\right)\right\}_{m=1}^{N_m}$$

- **Loss Function (Squared Error)**:
  - To determine the ANN parameters:
    $$\theta = \underset{\beta \in \mathbb{R}^{N_\theta}}{\operatorname{argmin}} \sum_{m=1}^{N_m} \sum_{k=1}^d \left(\left(\mathcal{N}_\beta\left(\widehat{\mathbf{E}}^{(m)}\right)\right)_k - \widehat{S}_k^{(m)}\right)^2$$
  - Where:
    - $(\bullet)_k$: $k$-th entry of the vector $(\bullet)$
    - $d$: Dimensionality of vectorized stress tensor (generally $d=6$ for a solid)

- **Scaled Loss Function**:
  - To address stress components with varying magnitudes:
    $$\theta = \underset{\beta \in \mathbb{R}^{N_\theta}}{\operatorname{argmin}} \sum_{m=1}^{N_m} \sum_{k=1}^d \left(\frac{\left(\mathcal{N}_\beta\left(\widehat{\mathbf{E}}^{(m)}\right)\right)_k - \widehat{S}_k^{(m)}}{\sigma_k}\right)^2$$
  - Where $\sigma_k$ denotes the component-wise standard deviation of the training stress data.
  - The loss function is minimized using a stochastic gradient descent algorithm.
  - Gradients with respect to each of the ANN's parameters are obtained exactly through automatic differentiation.
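To see why the scaling matters, the sketch below evaluates the scaled loss on two stress components whose magnitudes differ by a factor of 1000. The function name and data are hypothetical, and the population standard deviation (NumPy's default) is an assumption, since the notes do not say which estimator is used.

```python
import numpy as np

def scaled_squared_error(S_pred, S_data):
    """Sum over samples m and components k of ((pred - data) / sigma_k)^2."""
    S_pred = np.asarray(S_pred, dtype=float)
    S_data = np.asarray(S_data, dtype=float)
    sigma = S_data.std(axis=0)  # component-wise std of the training stresses
    residual = (S_pred - S_data) / sigma
    return float(np.sum(residual ** 2))

# Two stress components whose magnitudes differ by a factor of 1000.
S_data = np.array([[1.0, 1000.0],
                   [3.0, 3000.0]])

# Predictions that are off by 10% of sigma_k in every component.
S_pred = S_data + 0.1 * S_data.std(axis=0)

scaled = scaled_squared_error(S_pred, S_data)
unscaled = float(np.sum((S_pred - S_data) ** 2))
```

With scaling, both components contribute equally to the loss; without it, the large component dominates almost entirely. A component with zero standard deviation would need separate handling, which this sketch does not attempt.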




- **Objective of Approximation with ANN**:
  - Approximate $\mathbf{G}$ with ANN as: $\mathbf{G} \approx \mathcal{N}_\theta$
  - ANN parameters: 
    $$\theta=\bigcup_{l=1}^{N_l}(\mathcal{W}^{(l)}, \mathbf{b}^{(l)})$$
    - $\mathcal{W}^{(l)}$: Tensor-valued parameter
    - $\mathbf{b}^{(l)}$: Vector-valued parameter

- **Activation Function**:
  - Nonlinear activation function used: $\phi: \mathbb{R} \rightarrow \mathbb{R}$





### 3. Mechanics-Based Model Constraints

- **General Challenges of ANN Models for Nonlinear Elasticity**:
  - When the size of an ANN (as defined by Equation (2)) is allowed to grow, it can fit well, in the least squares sense, to data from materials governed by complicated nonlinear elastic laws.
  - However, as a **phenomenological model**, it can violate fundamental principles in mechanics, making it less suitable for numerical simulations. This can be due to:
    - Imperfect training
    - Noisy training data
    - Overfitting
  - Lack of interpretability of ANNs makes it difficult to evaluate the physical soundness of the model parameters after training.

- **Importance of Mechanics-Based Constraints**:
  - **Objective**: To enforce mechanics-based constraints in the construction of a data-driven constitutive law to ensure physical validity.
  - **Advantages**:
    - Embedding **a priori knowledge** of mechanics in a data-driven model helps to:
      - Favor learning the structure of a constitutive relation over overfitting.
      - Reduce the model's sensitivity to noisy data.
      - Promote robustness to inputs outside the training domain.
    - Mechanics-based constraints act as a form of **regularization**.
  
- **Mechanics-Based Constraints for ANN Constitutive Models**:
  - This section identifies and discusses **four mechanics-based constraints** that are crucial for a regression-based constitutive law.
  - The methods for enforcing these constraints in representing a nonlinear elastic law by a regression ANN are also proposed.
  - The constraints and associated enforcement methods are complementary—enforcing one does not compromise the formulation or enforcement of another.


### 3.1 Dynamic Stability

- **Definition of Dynamic Stability**:
  - The definition of dynamic stability of a mechanical system varies widely in the literature.
  - In this context, it is defined as the **ability of a system to always maintain finite kinetic energy when finite work is performed on it**.

- **Theorem 1**: 
  - A body described by an elastic material law is **dynamically stable** if and only if it is **hyperelastic**, meaning:
    $$\mathbf{S}(\mathbf{E}) = \frac{\partial W}{\partial \mathbf{E}}(\mathbf{E})$$
    - Where $W: \operatorname{Sym}_3(\mathbb{R}) \rightarrow \mathbb{R}$ represents the **strain energy density** of the body.

- **Proof (Inspired by Carroll)**:
  - The proof is based on observations originally made by R.S. Rivlin.
  - **Assumptions**:
    - All processes considered are **isothermal**.
    - All **traction** and **body force densities** are **time-independent** in the reference configuration.
  - **Notations**:
    - Reference configuration: $\Omega$
    - Boundary: $\Gamma = \partial \Omega$
    - Configuration-dependent quantity at time $t$: $(\bullet)_t$
  - **Rate of Change of Total Kinetic Energy**:
    $$\frac{\mathrm{d} K}{\mathrm{~d} t}(t) = -\int_{\Omega} \mathbf{S} : \frac{\mathrm{d} \mathbf{E}}{\mathrm{~d} t} \, \mathrm{d} \Omega + \int_{\Gamma_t} \mathbf{t}_t \cdot \dot{\mathbf{u}} \, \mathrm{d} \Gamma_t + \int_{\Omega_t} \rho_t \mathbf{b} \cdot \dot{\mathbf{u}} \, \mathrm{d} \Omega_t$$
    - Where:
      - $\mathbf{t}$: Traction
      - $\mathbf{b}$: Body force density
      - $\rho$: Material density
      - $\mathbf{u}$: Deformation state
      - $\dot{\mathbf{u}}$: Time derivative of $\mathbf{u}$
      - The symbol ":" denotes the **double contraction** between two tensors.

- **Transformation to Reference Configuration**:
  - To pull all integrals back to the reference configuration:
    $$\mathbf{t}_t \, \mathrm{d} \Gamma_t = \mathrm{d} \mathbf{f}_t = \mathbf{F}_t \, \mathrm{d} \mathbf{f} = \mathbf{F}_t \mathbf{F}_t^{-1} \mathbf{t} \, \mathrm{d} \Gamma = \mathbf{t} \, \mathrm{d} \Gamma$$


  - Material density transformation:
    $$\rho_t \, \mathrm{d} \Omega_t = \rho_t \, \operatorname{det}(\mathbf{F}_t) \, \mathrm{d} \Omega = \rho \, \mathrm{d} \Omega$$
    - Where $\mathbf{F}$ is the **deformation gradient** and $\mathbf{f}$ is the **force vector**.
  - The rate of change of total kinetic energy can be rewritten as:
    $$\frac{\mathrm{d} K}{\mathrm{~d} t}(t) = -\int_{\Omega} \mathbf{S} : \frac{\mathrm{d} \mathbf{E}}{\mathrm{~d} t} \, \mathrm{d} \Omega + \int_{\Gamma} \mathbf{t} \cdot \dot{\mathbf{u}} \, \mathrm{d} \Gamma + \int_{\Omega} \rho \mathbf{b} \cdot \dot{\mathbf{u}} \, \mathrm{d} \Omega$$

- **Cyclic Deformation and Change in Kinetic Energy**:
  - Consider a **cyclic deformation** from time $t_a$ to $t_b$, such that $\mathbf{u}(t_a) = \mathbf{u}(t_b)$ everywhere in $\Omega$, implying $\mathbf{E}(t_a) = \mathbf{E}(t_b)$.
  - The change in kinetic energy over the cycle is:
    $$\Delta K = \int_{t_a}^{t_b} \left( -\int_{\Omega} \mathbf{S} : \frac{\mathrm{d} \mathbf{E}}{\mathrm{~d} t} \, \mathrm{d} \Omega + \int_{\Gamma} \mathbf{t} \cdot \dot{\mathbf{u}} \, \mathrm{d} \Gamma + \int_{\Omega} \rho \mathbf{b} \cdot \dot{\mathbf{u}} \, \mathrm{d} \Omega \right) \mathrm{d} t$$

- **Internal Energy Consideration**:
  - For a cyclic process, the **internal energy change** is zero since the deformation and thermal states are identical at the beginning and end:
    $$\Delta U_{\text{int}} = \int_{t_a}^{t_b} \frac{\mathrm{d} U_{\text{int}}}{\mathrm{d} t} \, \mathrm{d} t = \int_{t_a}^{t_b} \int_{\Omega} \mathbf{S} : \frac{\mathrm{d} \mathbf{E}}{\mathrm{~d} t} \, \mathrm{d} \Omega \, \mathrm{d} t + \int_{t_a}^{t_b} \left( -\int_{\Gamma} \mathbf{n} \cdot \mathbf{q} \, \mathrm{d} \Gamma + \int_{\Omega} \rho q_s \, \mathrm{d} \Omega \right) \mathrm{d} t = 0$$
    - Where:
      - $\mathbf{q}$: **Heat flux**
      - $q_s$: **Heat supplied** per unit mass

- **Second Law of Thermodynamics**:
  - Using the **second law of thermodynamics**, it can be shown that the combination of heat flux and heat supply terms is non-positive:
    $$\int_{t_a}^{t_b} \left( -\int_{\Gamma} \mathbf{n} \cdot \mathbf{q} \, \mathrm{d} \Gamma + \int_{\Omega} \rho q_s \, \mathrm{d} \Omega \right) \mathrm{d} t \leq \int_{t_a}^{t_b} T \frac{\mathrm{d} S}{\mathrm{~d} t} \, \mathrm{d} t = 0$$
    $$\therefore \int_{t_a}^{t_b} \int_{\Omega} \mathbf{S} : \frac{\mathrm{d} \mathbf{E}}{\mathrm{~d} t} \, \mathrm{d} \Omega \, \mathrm{d} t \geq 0$$
    - Where $S$ is the **total entropy** in the body.

- **Work Done by Body Forces and Tractions**:
  - Since body forces and tractions are assumed to be **time-independent**, they perform **zero net work** on the body, as shown in the subsequent expressions.


  - The work done by **tractions** over the cycle:
    $$\int_{t_a}^{t_b} \int_{\Gamma} \mathbf{t} \cdot \dot{\mathbf{u}} \, \mathrm{d} \Gamma \, \mathrm{d} t = \int_{\Gamma} \int_{t_a}^{t_b} \mathbf{t} \cdot \dot{\mathbf{u}} \, \mathrm{d} t \, \mathrm{d} \Gamma = \int_{\Gamma} \left( \mathbf{t} \cdot \mathbf{u}(t_b) - \mathbf{t} \cdot \mathbf{u}(t_a) \right) \, \mathrm{d} \Gamma = 0$$
  - The work done by **body forces** over the cycle:
    $$\int_{t_a}^{t_b} \int_{\Omega} \rho \mathbf{b} \cdot \dot{\mathbf{u}} \, \mathrm{d} \Omega \, \mathrm{d} t = \int_{\Omega} \int_{t_a}^{t_b} \rho \mathbf{b} \cdot \dot{\mathbf{u}} \, \mathrm{d} t \, \mathrm{d} \Omega = \int_{\Omega} \rho \left( \mathbf{b} \cdot \mathbf{u}(t_b) - \mathbf{b} \cdot \mathbf{u}(t_a) \right) \, \mathrm{d} \Omega = 0$$

- **Change in Kinetic Energy Over a Cycle**:
  - The change in kinetic energy over a cycle can be rewritten as:
    $$\Delta K = K(t_b) - K(t_a) = -\int_{\Omega} \int_{t_a}^{t_b} \mathbf{S} : \frac{\mathrm{d} \mathbf{E}}{\mathrm{~d} t} \, \mathrm{d} t \, \mathrm{d} \Omega \leq 0$$
    - The **inequality** is a consequence of the earlier derived conditions.
  - **Intuitive Explanation**:
    - The **non-positivity** of the change in kinetic energy makes intuitive sense:
      - If $\Delta K$ were positive, given that the cyclic process can be repeated an arbitrary number of times, it would imply the possibility of generating an **infinite amount of kinetic energy** using zero net work, which is **non-physical**.
      - The result $\Delta K < 0$ is also not permissible, because reversing the direction of the deformation cycle (introducing a negative sign in the expression) would lead to $\Delta K > 0$.
      - Hence, the only **physical and dynamically stable** possibility is:
        $$\Delta K = 0$$
      - This is satisfied for arbitrary $\Omega$ and deformation paths if and only if the integrand of the outer integral in (12) identically vanishes.
  
- **Conclusion on Dynamic Stability**:
  - This condition implies:
    $$\mathbf{S} : \mathrm{d} \mathbf{E} = \mathrm{d} W$$
    - Which means that $\mathbf{S} : \mathrm{d} \mathbf{E}$ is a **perfect differential**.
    - **Equivalently**:
      $$\mathbf{S} = \frac{\partial W}{\partial \mathbf{E}}$$
    - This condition represents the **hyperelasticity** of the material and ensures **dynamic stability**.
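The link between hyperelasticity and zero net work over a closed strain cycle can be checked numerically. The sketch below integrates $\widehat{\mathbf{S}} \cdot \mathrm{d}\widehat{\mathbf{E}}$ around a small circle in the $(E_{11}, E_{22})$ plane for two linear laws in Voigt form. Both tangent matrices, the path, and the function name are illustrative assumptions, not taken from the source.

```python
import numpy as np

def cycle_work(stress_law, n_steps=2000, amplitude=0.01):
    """Approximate the work of S : dE over a closed circular path in the (E11, E22) plane."""
    t = np.linspace(0.0, 2.0 * np.pi, n_steps + 1)
    path = np.zeros((n_steps + 1, 6))
    path[:, 0] = amplitude * np.cos(t)
    path[:, 1] = amplitude * np.sin(t)
    work = 0.0
    for E0, E1 in zip(path[:-1], path[1:]):
        # Midpoint rule on each segment of the strain path.
        work += stress_law(0.5 * (E0 + E1)) @ (E1 - E0)
    return work

# Symmetric tangent: the law derives from the potential W = 0.5 * E . C . E.
C_sym = np.diag([3.0, 3.0, 3.0, 1.0, 1.0, 1.0])
C_sym[0, 1] = C_sym[1, 0] = 1.0

# Adding a skew part breaks the symmetry, so no potential W exists.
C_skew = C_sym.copy()
C_skew[0, 1] += 0.5
C_skew[1, 0] -= 0.5

work_hyperelastic = cycle_work(lambda E: C_sym @ E)
work_non_conservative = cycle_work(lambda E: C_skew @ E)
```

The symmetric law returns zero work up to round-off, while the non-symmetric law returns a nonzero value whose sign flips when the cycle is traversed in the opposite direction. Repeating such a cycle in the favourable direction would accumulate kinetic energy from zero net external work.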


### 3.1.1 Hyperelasticity via Learning Strain Energy Density

- **Challenges of Standard ANN Approaches**:
  - Even if the training data for the surrogate model comes from a **hyperelastic material**, a straightforward regression ANN mapping strains to stresses:
    $$\widehat{\mathbf{S}} = \mathcal{N}_\theta(\widehat{\mathbf{E}})$$
    cannot be expected to **necessarily satisfy** the hyperelastic condition (Equation 3).
  - Even with **noise-free data** and **zero loss convergence**, the **interpolation** and **extrapolation** by a standard ANN inside and outside the training domain are not guaranteed to be **conservative**.

- **Proposed Approach for Guaranteeing Hyperelasticity**:
  - To ensure hyperelasticity for **arbitrary strain inputs**, it is proposed to represent the constitutive law using an ANN that learns the **strain energy density function**:
    $$W = \mathcal{N}_\theta(\widehat{\mathbf{E}})$$
  - **Previous Approaches**:
    - Some previous works have relied on using **strain energy density as part of the training data**, ${}^{21}$ which is incompatible with **physical experiments**.
    - Other works have **approximated the hyperelasticity constraint** by weakly enforcing **symmetry of the tangent modulus**. ${}^{22}$

- **Novel Training Approach**:
  - The proposed method trains the ANN to implicitly **learn the integral of the data** instead of learning the data directly.
  - The ANN parameters $\theta$ are determined by minimizing:
    $$\theta = \underset{\beta \in \mathbb{R}^{N_\theta}}{\operatorname{argmin}} \sum_{m=1}^{N_m} \sum_{k=1}^d \left( \frac{\frac{\partial \mathcal{N}_\beta}{\partial \widehat{E}_k}\left(\widehat{\mathbf{E}}^{(m)}\right) - \widehat{S}_k^{(m)}}{\sigma_k} \right)^2$$
  
- **Training Process**:
  - The **weights** of the ANN are trained so that the **partial derivatives** of the network with respect to the input match the training stress data.
  - This promotes the learning of a **strain energy density function** (up to an irrelevant additive constant).
  
- **Obtaining Stresses**:
  - After training, the **stresses** can be obtained by differentiating the ANN with respect to the strains:
    $$\widehat{\mathbf{S}} = \frac{\partial \mathcal{N}_\theta}{\partial \widehat{\mathbf{E}}}(\widehat{\mathbf{E}})$$
  
- **Resulting Properties**:
  - The resulting strain-stress mapping is **unconditionally hyperelastic** by construction.
  - This holds **regardless** of the strain input or the smallest value attained by the training loss.
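A minimal PyTorch sketch of this training scheme is given below. The layer sizes, the `nn.Softplus` activation, the Adam optimizer, and the synthetic linear-elastic data are all assumptions made for illustration; only the idea of fitting the input gradient of a scalar network to stress data comes from the notes.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Scalar-valued network standing in for the strain energy density W(E_hat).
energy_net = nn.Sequential(
    nn.Linear(6, 16), nn.Softplus(),
    nn.Linear(16, 16), nn.Softplus(),
    nn.Linear(16, 1),
)

def stress_from_energy(net, E_hat):
    """Return S_hat = dW/dE_hat for a batch of Voigt strains of shape (N, 6)."""
    E_hat = E_hat.clone().requires_grad_(True)
    W = net(E_hat).sum()  # samples are independent, so the sum keeps per-sample gradients
    # create_graph=True keeps the graph so the loss can be differentiated w.r.t. the weights.
    (S_hat,) = torch.autograd.grad(W, E_hat, create_graph=True)
    return S_hat

# Synthetic, purely illustrative data from a linear law S_hat = C E_hat.
C = torch.diag(torch.tensor([3.0, 3.0, 3.0, 1.0, 1.0, 1.0]))
E_data = 0.1 * torch.randn(64, 6)
S_data = E_data @ C.T
sigma = S_data.std(dim=0)

optimizer = torch.optim.Adam(energy_net.parameters(), lr=1e-2)
losses = []
for step in range(200):
    optimizer.zero_grad()
    S_pred = stress_from_energy(energy_net, E_data)
    loss = (((S_pred - S_data) / sigma) ** 2).sum()  # scaled loss on the derivative
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# The tangent dS_hat/dE_hat is the Hessian of the network output, hence symmetric.
E_test = 0.1 * torch.randn(6)
tangent = torch.autograd.functional.hessian(lambda e: energy_net(e).squeeze(), E_test)
```

The symmetry of `tangent` holds for any weights, trained or not, which is the sense in which the mapping is hyperelastic by construction.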


### 3.1.2 Computational Efficiency via Reverse Mode Automatic Differentiation

- **Remark 2: Leveraging Reverse Mode Automatic Differentiation**:
  - **Reverse mode automatic differentiation** can be used to **exactly differentiate** a trained ANN and achieve **computational efficiency** during the online computation of stresses.
  - It is particularly efficient for obtaining the **Jacobian** of a function $f: \mathbb{R}^m \rightarrow \mathbb{R}^n$, where $m \gg n$.
    - In the context of this article, where the ANN learns the **strain energy density**, $n = 1$ and $m = 6$.
  - The **gradient** of the ANN with respect to all its inputs can be obtained at roughly the **same computational cost** as for a single function evaluation.
  - In contrast, using **finite differencing** to compute the gradient would require **at least $m + 1$ function evaluations** and would be affected by **numerical errors**.
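The comparison can be illustrated on a scalar function of six strain inputs with a known gradient. The energy below is a made-up polynomial, not a fitted material model, and the step size `h` is an arbitrary choice.

```python
import torch

def energy(E_hat):
    """Illustrative scalar energy with a known gradient: E_hat + E_hat^3."""
    return 0.5 * (E_hat ** 2).sum() + 0.25 * (E_hat ** 4).sum()

E_hat = torch.tensor([0.1, -0.2, 0.3, 0.05, -0.1, 0.2], dtype=torch.float64)
grad_exact = E_hat + E_hat ** 3

# Reverse mode: one forward and one backward pass give all six derivatives.
E_ad = E_hat.clone().requires_grad_(True)
(grad_ad,) = torch.autograd.grad(energy(E_ad), E_ad)

# Forward finite differences: one base evaluation plus one per input.
h = 1e-6
n_evals = 1
W0 = energy(E_hat)
grad_fd = torch.zeros_like(E_hat)
for k in range(E_hat.numel()):
    E_pert = E_hat.clone()
    E_pert[k] += h
    grad_fd[k] = (energy(E_pert) - W0) / h
    n_evals += 1

error_ad = (grad_ad - grad_exact).abs().max().item()
error_fd = (grad_fd - grad_exact).abs().max().item()
```

Here `n_evals` counts the seven function evaluations needed by finite differencing, and `error_fd` carries a truncation error proportional to `h` that the automatic differentiation result does not have.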

- **Eager Execution vs. Graph Execution**:
  - There is a notable difference in the **online computational cost** between **eager execution** and **graph execution** of the ANN model.
  - **Eager Execution**:
    - Interprets the code and executes it in **real-time**.
    - The evaluation of the constitutive law involves evaluating $W$, constructing the **backwards graph**, and propagating through the backwards graph.
    - This approach introduces **unnecessary computations** and **software-related computational overhead**.
  - **Graph Execution**:
    - Interprets the code as a **graph**.
    - The backwards graph is constructed and **compiled offline**.
    - Online evaluation of the constitutive law requires only **propagation through a single graph** that directly relates the strains to the stresses.
    - This approach is **computationally more economical**.


### 3.2 Objectivity

- **Concept of Objectivity**:
  - **Objectivity** is the concept of **material frame indifference**—the position or orientation of an observer should not affect any quantity of interest. ${}^{23}$
  - An **objective scalar** has the same value for all observers.
  - Therefore, a **strain energy density function** must be invariant to any **orthogonal transformation** of the deformation gradient.

- **Mathematical Condition for Objectivity**:
  - Let $\widetilde{W}$ denote the strain energy density function expressed in terms of the **deformation gradient** $\mathbf{F}$.
  - Objectivity requires that:
    $$\widetilde{W}(\mathbf{Q F}) = \widetilde{W}(\mathbf{F})$$
    - For any **orthogonal matrix** $\mathbf{Q}$ such that $\mathbf{Q}^T \mathbf{Q} = \mathbf{I}$.
  - This condition is crucial for numerical simulations using **data-driven constitutive models** to ensure that the results are independent of the definition of a reference frame.

- **Polar Decomposition**:
  - Let $\mathbf{F} = \mathbf{R U}$ be the **polar decomposition** of the deformation gradient:
    - $\mathbf{R}$: Orthogonal matrix describing a **rigid body mode**.
    - $\mathbf{U}$: Matrix associated with the **stretch tensor**.
  - Choosing $\mathbf{Q} = \mathbf{R}^T$ leads to:
    $$\widetilde{W}(\mathbf{R}^T \mathbf{F}) = \widetilde{W}(\mathbf{R}^T \mathbf{R} \mathbf{U}) = \widetilde{W}(\mathbf{U})$$
    - This implies that the strain energy density function should be expressible solely in terms of $\mathbf{U}$ and not the individual components of $\mathbf{F}$.

- **Necessary Condition for Objectivity**:
  - There must exist some functions $W$ and $\mathbf{T}$ such that:
    $$\widetilde{W}(\mathbf{F}) = W(\mathbf{T}(\mathbf{U}))$$
  - This condition is also **sufficient** because the **stretch tensor** is invariant under arbitrary orthogonal transformations of $\mathbf{F}$:
    $$\mathbf{Q F} = \mathbf{Q} (\mathbf{R U}) = \overline{\mathbf{R}} \mathbf{U}$$
    - Where $\overline{\mathbf{R}} = \mathbf{Q R}$ is an orthogonal matrix.

- **Green-Lagrange Strain**:
  - The **Green-Lagrange strain**:
    $$\mathbf{E} = \frac{1}{2}(\mathbf{U}^2 - \mathbf{I})$$
    - It is expressible purely as a function of $\mathbf{U}$.
    - Using $\mathbf{E}$ as input to the ANN automatically guarantees the **objectivity** of $\widetilde{W}$ through the existence of $W$.
  - Therefore:
    $$\widetilde{W}(\mathbf{Q F}) = W\left( \frac{1}{2} \left( \mathbf{F}^T \mathbf{Q}^T \mathbf{Q} \mathbf{F} - \mathbf{I} \right) \right) = W\left( \frac{1}{2} \left( \mathbf{F}^T \mathbf{F} - \mathbf{I} \right) \right) = \widetilde{W}(\mathbf{F})$$

- **Compatibility with Hyperelastic Strain-Stress Mapping**:
  - For compatibility with a **hyperelastic strain-stress map**, the **second Piola-Kirchhoff stress** $\mathbf{S}$ must be chosen as the corresponding **stress measure**.
  - This is because $\mathbf{S}$ is the **energy conjugate** of $\mathbf{E}$.
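The invariance of $\mathbf{E}$ under $\mathbf{F} \mapsto \mathbf{Q F}$ is easy to confirm numerically. In the sketch below the deformation gradient and the rotation are random and purely illustrative; building $\mathbf{Q}$ from a QR factorization is one convenient choice among many.

```python
import numpy as np

def green_lagrange(F):
    """E = 0.5 (F^T F - I)."""
    return 0.5 * (F.T @ F - np.eye(3))

rng = np.random.default_rng(0)
F = np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # arbitrary deformation gradient

# Build an orthogonal Q from a QR factorization of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0  # flip one column so that det(Q) = +1 (a proper rotation)

E_original = green_lagrange(F)
E_rotated = green_lagrange(Q @ F)
```

Because any function of `E_original` sees the same input before and after the rotation, a network fed with $\widehat{\mathbf{E}}$ cannot distinguish between the two observers.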


### 3.3 Material Stability

- **Concept of Material Stability**:
  - **Material stability** ensures that **small loads** do not lead to **arbitrary deformations**.
  - Stability is ensured by the **ellipticity** of the strain energy density function $\widetilde{W}$ with respect to the deformation gradient $\mathbf{F}$. ${}^{24}$
  - A strain energy density function $\widetilde{W}(\mathbf{F})$ is **elliptic** if and only if:
    $$(\mathbf{a} \otimes \mathbf{b}) : \frac{\partial^2 \widetilde{W}}{\partial \mathbf{F} \partial \mathbf{F}} : (\mathbf{a} \otimes \mathbf{b}) \geq 0 \quad \forall \, \mathbf{a}, \mathbf{b} \in \mathbb{R}^3$$

- **Ellipticity and Convexity**:
  - For **twice differentiable** strain energy density functions, ellipticity is equivalent to **convexity** along directions corresponding to **rank-one tensors** (known as **rank-one convexity**).
  - Physically, this implies that only **real wave speeds** are permissible in the material.
  - Enforcing **ellipticity** a priori is challenging; therefore, a common approach in **continuum mechanics** is to enforce a **stronger mathematical property** that implies ellipticity:
    $$\text{convexity} \Rightarrow \text{polyconvexity} \Rightarrow \text{ellipticity}$$

- **Challenges with Convexity**:
  - **Convexity** of $\widetilde{W}(\mathbf{F})$ can be enforced to achieve ellipticity, but it is often **too strong** a condition, imposing **non-physical restrictions** on material behavior. ${}^{25}$
  - The **domain** of deformation gradients with a positive determinant is **non-convex**.
  - Enforcing convexity in $\mathbf{F}$ excludes the **physically reasonable growth condition**:
    $$\widetilde{W} \rightarrow \infty \quad \text{as } \text{det}(\mathbf{F}) \rightarrow 0^+$$
    - This condition places a **barrier on material inversion**; without it, the volume of the material could collapse to a point or line using finite work.
  - **Non-uniqueness** of the energy minimizer is also disallowed by convexity, which prevents observing instability phenomena such as **buckling**.
  - Additionally, it can be shown through **counterexamples** that convexity of $\widetilde{W}(\mathbf{F})$ **violates objectivity**. ${}^{26}$

- **Polyconvexity as an Alternative**:
  - Given these limitations, the standard approach is to enforce **polyconvexity**, which is a **weaker constraint** than convexity.
  - The strain energy density function $\widetilde{W}(\mathbf{F})$ is **polyconvex** if and only if it can be expressed as a **convex function** of the **minors** of $\mathbf{F}$:
    $$\exists f \text{ convex, such that } \widetilde{W}(\mathbf{F}) = f(\mathbf{F}, \operatorname{Cof}(\mathbf{F}), \text{det}(\mathbf{F})), \quad \forall \mathbf{F} \in \mathbb{R}^{3 \times 3}$$
    - Where $\operatorname{Cof}(\mathbf{F})$ denotes the **cofactor matrix** of $\mathbf{F}$.
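As a concrete illustration, the sketch below evaluates a simple energy written as a sum of convex functions of $\mathbf{F}$, $\operatorname{Cof}(\mathbf{F})$, and $\det(\mathbf{F})$. The functional form and the coefficients `a`, `b`, `c` are assumptions picked for the example; with `c = 6` the undeformed state happens to be stress-free under uniform stretch.

```python
import numpy as np

def cofactor(F):
    """Cof(F) = det(F) F^{-T}, valid for invertible F."""
    return np.linalg.det(F) * np.linalg.inv(F).T

def polyconvex_energy(F, a=1.0, b=1.0, c=6.0):
    """a |F|^2 + b |Cof F|^2 - c ln(det F): each term is convex in its own argument."""
    J = np.linalg.det(F)
    return a * np.sum(F ** 2) + b * np.sum(cofactor(F) ** 2) - c * np.log(J)

F = np.array([[1.1, 0.2, 0.0],
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.2]])
cof_identity = cofactor(F) @ F.T  # should equal det(F) * I

# Uniform compression F = s I drives det(F) = s^3 towards 0+.
energies = [polyconvex_energy(s * np.eye(3)) for s in (1.0, 0.1, 0.01, 0.001)]
```

The energy grows without bound as the volume ratio approaches zero, so this form accommodates the growth condition that plain convexity in $\mathbf{F}$ rules out.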



- **Objectivity vs. Polyconvexity**:
  - In **Section 3.2**, **objectivity** was identified as an essential property, particularly for the **numerical exploitation** of data-driven models.
  - Therefore, the **objective strain measure** $\mathbf{E}$ was chosen as input to the ANN model.
  - However, enforcing **polyconvexity** in terms of $\mathbf{F}$ is challenging because $\mathbf{E}$ is not generally **convex** in $\mathbf{F}$.

- **Convexity in Strain Measure $\mathbf{E}$**:
  - Instead, consider the **convexity** of the strain energy density function $W$ in $\mathbf{E}$.
  - This is equivalent to the **symmetric positive semi-definiteness** of the gradient of $\mathbf{S}$ with respect to $\mathbf{E}$:
    $$\mathbb{C} = \frac{\partial \mathbf{S}}{\partial \mathbf{E}} \geqslant 0$$

- **Symmetric Positive Semi-Definiteness for Fourth-Order Tensors**:
  - The concept of **symmetric positive semi-definiteness** extends to **fourth-order tensors**.
  - A fourth-order tensor $\mathbb{T}$ is said to be **symmetric positive semi-definite** if, for all $\mathbf{V}, \mathbf{W} \in \operatorname{Sym}_n(\mathbb{R})$, the following conditions hold: ${}^{27}$
    $$\mathbf{V} : \mathbb{T} : \mathbf{W} = \mathbf{W} : \mathbb{T} : \mathbf{V}$$
    $$\mathbf{V} : \mathbb{T} : \mathbf{V} \geq 0$$
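In Voigt form, both conditions reduce to properties of a $6 \times 6$ matrix: symmetry of the matrix and non-negativity of its eigenvalues. The helper below is a hypothetical sketch of such a check, applied to an isotropic linear elastic tangent with illustrative Lamé parameters.

```python
import numpy as np

def is_symmetric_psd(C, tol=1e-10):
    """Check symmetry and non-negative eigenvalues of a 6x6 Voigt tangent matrix."""
    C = np.asarray(C, dtype=float)
    if not np.allclose(C, C.T, atol=tol):
        return False
    # eigvalsh assumes a symmetric matrix, which was verified just above.
    return bool(np.linalg.eigvalsh(C).min() >= -tol)

# Isotropic linear elastic tangent with Lame parameters lam and mu (illustrative values).
lam, mu = 1.0, 0.5
C_iso = np.zeros((6, 6))
C_iso[:3, :3] = lam
C_iso[:3, :3] += 2.0 * mu * np.eye(3)
C_iso[3:, 3:] = mu * np.eye(3)

# A negative shear modulus violates V : C : V >= 0.
C_indefinite = C_iso.copy()
C_indefinite[3, 3] = -0.1

# A non-symmetric matrix violates V : C : W = W : C : V.
C_nonsymmetric = C_iso.copy()
C_nonsymmetric[0, 1] += 0.5
```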

- **Hessian of $W(\mathbf{E}(\mathbf{F}))$**:
  - Double contracting the **Hessian** of $W(\mathbf{E}(\mathbf{F}))$ with respect to $\mathbf{F}$ on both left and right sides by a test matrix $\boldsymbol{\Psi} \in \mathbb{R}^{3 \times 3}$ gives:
    $$\boldsymbol{\Psi} : \frac{\partial^2 W}{\partial \mathbf{F} \partial \mathbf{F}} : \boldsymbol{\Psi} = \mathbf{S} : (\boldsymbol{\Psi}^T \boldsymbol{\Psi}) + \frac{1}{4} (\boldsymbol{\Psi}^T \mathbf{F} + \mathbf{F}^T \boldsymbol{\Psi}) : \mathbb{C} : (\boldsymbol{\Psi}^T \mathbf{F} + \mathbf{F}^T \boldsymbol{\Psi})$$


In [None]:
from torch import nn
import torch
import numpy as np
from sklearn.utils import shuffle
from torch.optim import lr_scheduler

In [2]:
class SoftplusSquared(nn.Module):
    def __init__(self, beta=3):
        super(SoftplusSquared, self).__init__()
        self.beta = beta

    def forward(self, x):
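        # 0.5 * softplus(x; beta^2)^2 equals (1 / (2 beta^4)) * log(1 + exp(beta^2 x))^2;
        # the built-in softplus switches to a linear branch for large inputs and avoids exp overflow.
        return 0.5 * nn.functional.softplus(x, beta=float(self.beta ** 2)) ** 2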

In [None]:
class Elasticenergy_potential(nn.Module):
    def __init__(self, seed=42):
        super(Elasticenergy_potential, self).__init__()
        torch.manual_seed(seed)
        self.architecture = [1, 64, 64, 64, 1]
        self.layers = nn.ModuleDict()
        self.act_fun = SoftplusSquared()
        # Construct layers based on architecture
        for layer_idx in range(len(self.architecture) - 1):
            self.layers[str(layer_idx)] = nn.Linear(self.architecture[layer_idx], self.architecture[layer_idx + 1])
        self.model = self._create_nn()
        self.input_min, self.input_max = 0.0, 0.0
        self.output_min, self.output_max = 0.0, 0.0


    def _create_nn(self):
        model = nn.Sequential()
        for i in range(len(self.architecture) - 1):
            model.add_module(f"linear_{i}", self.layers[str(i)])
            if i < len(self.architecture) - 2:
                model.add_module(f"activation_{i}", self.act_fun)
        return model


    def forward(self, x, do_unscaling=True):
        if not isinstance(x, torch.Tensor):
            # Convert array-like input (e.g. a NumPy array) to a float tensor
            x = torch.as_tensor(x, dtype=torch.float32)
        # Scale the input with the min/max stored by scale_dataset
        x_frd = (x - self.input_min) / (self.input_max - self.input_min)
        out = x_frd.reshape(-1, 1)
        out = self.model(out)
        if do_unscaling:
            # Map the network output back to the original target range
            out = out * (self.output_max - self.output_min) + self.output_min
        return out

    def scale_dataset(self, x, y, do_shuffle=True):
        input_size = x.size
        self.input_min = np.min(x)
        self.input_max = np.max(x)
        self.output_min = np.min(y)
        self.output_max = np.max(y)
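        # Min-max scaling to [0, 1], reusing the extrema stored above
        x_res = (x - self.input_min) / (self.input_max - self.input_min)
        if self.output_max - self.output_min != 0:
            y_res = (y - self.output_min) / (self.output_max - self.output_min)
        else:
            # Constant targets: avoid division by zero
            y_res = np.zeros(len(y))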
        if do_shuffle:
            x_res, y_res = shuffle(x_res, y_res, random_state=42)

        return torch.Tensor(x_res).reshape(input_size, 1), torch.Tensor(y_res).reshape(input_size, 1)


    def convex_training(self, input_data, target_data, epochs=25000, epsilon=30, learning_rate=0.01, do_convex_training=True):
   
        """
        Trains the model by minimizing the MSE loss, optionally re-imposing a
        convexity constraint on the weights after each optimizer step.
        :param do_convex_training: apply the convexity constraint after each step. Default value is True
        :param input_data: Array, the input dataset.
        :param target_data: Array, the target dataset.
        :param epochs: Integer, number of training epochs.
        :param epsilon: Float, parameter for convexity constraint.
        :param learning_rate: Float, learning rate for the optimizer.
        """
        input_scaled, target_scaled = self.scale_dataset(input_data, target_data)

        # Split data into training and validation sets
        n_samples = len(input_scaled)
        # First half for training, second half for validation (data was shuffled in scale_dataset)
        train_indices = list(range(n_samples // 2))
        val_indices = list(range(n_samples // 2, n_samples))

        x_train, y_train = input_scaled[train_indices], target_scaled[train_indices]
        x_val, y_val = input_scaled[val_indices], target_scaled[val_indices]

        optimizer = torch.optim.AdamW(self.model.parameters(), lr=learning_rate)
        scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.8, patience=1500)
        loss_func = torch.nn.MSELoss()

        for epoch in range(epochs):
            # Forward pass and loss on the training set
            train_output = self.model(x_train)
            train_loss = loss_func(train_output, y_train)

            # Validation loss is only monitored, so no gradients are needed
            with torch.no_grad():
                val_loss = loss_func(self.model(x_val), y_val)

            # Log training process
            if (epoch + 1) % 500 == 0:
                print(f"Epoch [{epoch + 1}/{epochs}] || "
                      f"Train Loss: {train_loss.item():.9f} || "
                      f"Val Loss: {val_loss.item():.9f} || "
                      f"LR: {optimizer.param_groups[0]['lr']:.0e}")

            # Backward and optimize
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()

            # Learning rate scheduler step (after the optimizer update)
            scheduler.step(val_loss)

            if do_convex_training:
                # Apply convexity constraint
                self._apply_convexity_constraint(epsilon)

                # Apply monotonic decreasing constraint
                # self._apply_monotonic_decreasing_constraint()

            if optimizer.param_groups[0]['lr'] <= 9e-08:
                print('LR lower than threshold')
                break

    def _apply_monotonic_decreasing_constraint(self):
        """
        Applies a monotonic decreasing constraint to the first layer weights.
        """
        first_layer = self.model[0]
        first_layer.weight.data[first_layer.weight.data > 0] = 0

    def _apply_convexity_constraint(self, epsilon):
        """
        Applies a convexity constraint to the model parameters.
        :param epsilon: Float, parameter for convexity constraint.
        """
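        # Modify the weights in place without recording the operation in autograd
        with torch.no_grad():
            for name, param in self.model.named_parameters():
                # Skip the first layer: its weights may take either sign
                if "weight" in name and "0.weight" not in name:
                    negative = param < 0
                    # Map negative weights to small positive values exp(w - epsilon)
                    param[negative] = torch.exp(param[negative] - epsilon)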

    def __str__(self):
        # Build and return the description instead of printing it
        lines = ["#---- Neural Network architecture -------------------#"]
        for key, layer in self.layers.items():
            lines.append(f"{key}: {layer}")
        lines.append("#----------------------------------------------------#")
        return "\n".join(lines)
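The convexity constraint in the class above relies on a standard argument: if every weight matrix after the first is non-negative and the activation is convex and non-decreasing, the network output is convex in its input. The NumPy sketch below checks this on a small random network. The layer sizes and random weights are hypothetical, and only the activation form and the `exp(w - epsilon)` mapping with `epsilon = 30` mirror the code above.

```python
import numpy as np

def softplus_squared(x, beta=3.0):
    """Same functional form as SoftplusSquared, using a stable log(1 + exp(.))."""
    return np.logaddexp(0.0, beta ** 2 * x) ** 2 / (2.0 * beta ** 4)

def scalar_net(x, weights, biases):
    """1 -> hidden -> ... -> 1 network with the activation on all but the last layer."""
    h = np.atleast_2d(x).astype(float)  # shape (1, n_points)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = softplus_squared(W @ h + b[:, None])
    return (weights[-1] @ h + biases[-1][:, None]).ravel()

rng = np.random.default_rng(0)
sizes = [1, 8, 8, 1]
weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n_out) for n_out in sizes[1:]]

# Mimic the constraint: every weight matrix after the first is made non-negative.
for W in weights[1:]:
    W[W < 0] = np.exp(W[W < 0] - 30.0)

x = np.linspace(-1.0, 1.0, 201)
y = scalar_net(x, weights, biases)

# Discrete second derivative on a uniform grid; non-negative for a convex function.
second_differences = y[:-2] - 2.0 * y[1:-1] + y[2:]
```

The first layer is left unconstrained because an affine map of the input preserves convexity regardless of the sign of its weights.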
