In [1]:
import requests
import json
from IPython.display import display, Markdown

In [21]:
query = "Search for 10 recent publications and write a literature review on recent advances in deep learning optimizers, with an emphasis on novel optimization algorithms"
params = {"query":query}
url = "http://localhost:8086/agent"
response = requests.get(url, params=params)
response.raise_for_status()  # fail early if the agent endpoint returns an HTTP error

In [22]:
response

<Response [200]>

In [23]:
result = response.json()["message"]  # the agent returns its Markdown output under the "message" key

In [24]:
display(Markdown(result))

Below is a literature review that synthesizes recent developments in deep learning optimization and its applications in scientific computing, uncertainty quantification, and quantum computing. The review highlights surrogate‐modeling for partial differential equations (PDEs), modern theoretical insights into deep neural networks, evidential methods for uncertainty quantification, novel second‐order optimizers, comprehensive overviews of deep learning optimization techniques, and even applications in quantum circuit optimization and hyperparameter tuning.

──────────────────────────────
1. Introduction

Deep learning has become an indispensable tool across many scientific and engineering disciplines. Its success, driven in part by powerful optimization algorithms and sophisticated network architectures, has transformed fields ranging from image recognition and natural language processing to computational fluid dynamics (CFD) and quantum computing. In particular, for problems where the underlying mathematical models are governed by partial differential equations (PDEs) (Lye, Mishra, & Ray, 2019) or where uncertainty quantification is paramount (Pandey & Yu, 2023), efficient optimization and robust surrogate modeling become essential. This review synthesizes recent work on surrogate-based optimization under PDE constraints (Lye, Mishra, Ray, & Chandrashekar, 2020), theoretical advances in deep learning (Berner et al., 2021), improvements in evidential deep learning (Pandey & Yu, 2023), novel second-order optimizers such as AdaFisher (Gomes, 2025), comprehensive overviews of optimization methods (Shulman, 2023), deep learning methods in computational physics (Ray, Pinti, & Oberai, 2023), quantum circuit optimization using deep reinforcement learning (Fösel et al., 2021), and hyperparameter optimization strategies (Yang & Shami, 2020). Together, these works chart a broad landscape where theoretical advances and practical algorithms intersect.

──────────────────────────────
2. Surrogate Modeling and PDE-Constrained Optimization

For many engineering applications—such as aerodynamic shape design, optimal control, and parameter identification—direct numerical solution of PDEs is computationally prohibitive, particularly when embedded within an optimization loop. Lye et al. (2020) introduce an iterative surrogate model optimization (ISMO) algorithm that leverages deep neural networks (DNNs) to approximate the map from design parameters to observables in PDE problems. Building upon earlier formulations where a fixed training set is used to learn a surrogate (the DNNopt approach), ISMO employs an active learning paradigm; candidate minimizers obtained via a standard optimization algorithm are iteratively added to the training set. The authors show that, under appropriate regularity and convexity assumptions, the error decay of the optimizers is exponential with respect to the number of training iterations—a significant improvement over the algebraic decay inherent in fixed training-set methods.
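
The active-learning loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `observable` is a hypothetical one-dimensional stand-in for an expensive PDE solve, a least-squares polynomial replaces the DNN surrogate, and projected gradient descent plays the role of the "standard optimization algorithm". It is not the authors' implementation.

```python
import numpy as np

def observable(y):
    # Hypothetical stand-in for an expensive PDE solve plus cost evaluation.
    return (y - 0.3) ** 2 + 0.05 * np.sin(3.0 * y)

def fit_surrogate(ys, costs, degree=6):
    # Least-squares polynomial surrogate (swapped in for the DNN).
    return np.poly1d(np.polyfit(ys, costs, degree))

def minimize_surrogate(surrogate, starts, lr=0.05, steps=200):
    # Projected gradient descent on the cheap surrogate, kept inside [-1, 1].
    grad = surrogate.deriv()
    y = starts.copy()
    for _ in range(steps):
        y = np.clip(y - lr * grad(y), -1.0, 1.0)
    return y

def ismo_sketch(n_init=8, n_rounds=5, n_new=4, seed=0):
    rng = np.random.default_rng(seed)
    ys = rng.uniform(-1.0, 1.0, n_init)      # initial random training set
    costs = observable(ys)
    initial_best = costs.min()
    for _ in range(n_rounds):
        surrogate = fit_surrogate(ys, costs)
        starts = rng.uniform(-1.0, 1.0, n_new)
        candidates = minimize_surrogate(surrogate, starts)
        # Evaluate the true model only at the candidate minimizers and
        # add them to the training set for the next round.
        ys = np.concatenate([ys, candidates])
        costs = np.concatenate([costs, observable(candidates)])
    best = np.argmin(costs)
    return ys[best], costs[best], initial_best

best_y, best_cost, initial_best = ismo_sketch()
print(f"best parameter {best_y:.3f}, cost {best_cost:.4f} (initial best {initial_best:.4f})")
```

The point of the design is that each round spends true-model evaluations only near the surrogate's minimizers, which is where surrogate accuracy matters for the optimization.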

Lye et al. (2019) similarly propose deep learning surrogates for "parameters to observable" maps in CFD applications. Here, the emphasis is on approximating integral quantities (e.g., lift or drag) that are derived from high-resolution CFD simulations. By selecting network architectures based on theoretical regularity estimates (e.g., Lipschitz continuity and Sobolev or bounded variation norms) and combining them with ensemble training on low-discrepancy sample sets, the authors demonstrate rapid evaluation of observables with errors in the range of 1–2% even when training samples are few. Moreover, when combined with quasi-Monte Carlo (QMC) methods, these surrogates enable uncertainty propagation that is orders of magnitude faster than classical Monte Carlo approaches (Lye, Mishra, & Ray, 2019).
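
To make the role of low-discrepancy points concrete, the sketch below compares a hand-written base-2 van der Corput sequence with pseudo-random sampling when averaging a cheap function. The function `surrogate_observable` is a hypothetical placeholder for a trained surrogate, chosen so that the exact mean (1/3) is known; none of this comes from the paper's code.

```python
import numpy as np

def van_der_corput(n, base=2):
    # First n points of the van der Corput low-discrepancy sequence on [0, 1).
    points = np.empty(n)
    for i in range(n):
        q, denom, x = i, 1.0, 0.0
        while q > 0:
            denom *= base
            q, r = divmod(q, base)
            x += r / denom
        points[i] = x
    return points

def surrogate_observable(u):
    # Hypothetical cheap surrogate of an observable; exact mean over [0, 1] is 1/3.
    return u ** 2

n = 1024
exact = 1.0 / 3.0
qmc_estimate = surrogate_observable(van_der_corput(n)).mean()
mc_estimate = surrogate_observable(np.random.default_rng(0).uniform(size=n)).mean()
qmc_error = abs(qmc_estimate - exact)
mc_error = abs(mc_estimate - exact)
print(f"QMC error: {qmc_error:.2e}, MC error: {mc_error:.2e}")
```

For smooth integrands, the QMC error typically decays close to 1/N, compared with 1/sqrt(N) for plain Monte Carlo, which is why pairing a fast surrogate with QMC sampling is attractive.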

──────────────────────────────
3. Theoretical Perspectives on Deep Learning

The success of deep learning has spurred intense theoretical investigation into its approximation, optimization, and generalization properties. Berner et al. (2021) provide an overview of the modern mathematics of deep learning. Their review bridges classical learning theory with contemporary approaches that exploit norm-based complexity measures, margin theory, and the study of optimization dynamics. For example, the generalization power of overparameterized networks is explained through bounds that scale with the learned weight norms rather than the sheer number of parameters. This perspective is consistent with studies on implicit regularization: first-order stochastic gradient descent (SGD) appears to navigate nonconvex landscapes effectively, and analyses of "lazy training" and the neural tangent kernel regime help explain why gradient-based training converges, while implicit-bias results link gradient descent to maximum-margin solutions in certain settings.

The reviewed literature also discusses specialized architectures. Convolutional and residual networks (ResNets) are shown to have significant advantages due to their inductive biases—such as translation invariance and the stabilization of gradients through skip connections—and even connections to ordinary differential equations can be drawn (e.g., via continuous time limits for ResNets). This interplay between architectural design and optimization can be seen as a recurring theme in modern deep learning theory (Berner et al., 2021).
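
The ODE connection can be sketched in a few lines: a residual block `x + h * f(x)` is one forward-Euler step for `dx/dt = f(x)`. The example below assumes, purely for illustration, that all layers share one weight matrix and use a `tanh` nonlinearity, so that deeper networks with smaller step sizes approach the same continuous-time flow.

```python
import numpy as np

def residual_block(x, weight, step):
    # One residual update: identity skip connection plus a scaled nonlinear branch.
    return x + step * np.tanh(weight @ x)

def resnet_forward(x0, weight, depth, total_time=1.0):
    # Stacking `depth` blocks with step total_time / depth is forward Euler
    # applied to dx/dt = tanh(W x) on the interval [0, total_time].
    step = total_time / depth
    x = np.array(x0, dtype=float)
    for _ in range(depth):
        x = residual_block(x, weight, step)
    return x

weight = np.array([[0.0, -1.0], [1.0, 0.0]])  # hypothetical shared layer weights
x0 = [1.0, 0.0]
coarse = resnet_forward(x0, weight, depth=200)
fine = resnet_forward(x0, weight, depth=400)
print("depth 200:", coarse, "depth 400:", fine)
```

Doubling the depth changes the output only slightly, which reflects the convergence of the discrete network to its continuous-time limit.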

──────────────────────────────
4. Evidential Deep Learning and Uncertainty Quantification

In many applications, such as safety-critical systems, reliable uncertainty quantification is essential. Evidential deep learning offers a principled framework that augments traditional DNN predictions with uncertainty measures derived from belief theory and subjective logic. Pandey and Yu (2023) examine a fundamental limitation in common evidential models: the creation of "zero-evidence" regions by typical nonnegative activation functions (e.g., ReLU or SoftPlus). In such regions, gradients vanish and the model cannot learn effectively from the affected training samples. To overcome this deficiency, the authors theoretically analyze and compare various evidential loss formulations and activation functions. They further propose the use of the exponential activation function, which minimizes zero-evidence regions, and introduce a regularized evidential model (RED) whose correct-evidence regularization term leverages the model's vacuity. Empirical evaluations on datasets such as CIFAR-10, CIFAR-100, and mini-ImageNet show that the proposed modifications lead to both improved predictive performance and more reliable uncertainty estimates, making evidential deep learning a viable approach for robust, uncertainty-aware applications.
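
The zero-evidence problem can be seen in a small numerical sketch. It uses the standard subjective-logic quantities (Dirichlet parameters `alpha = evidence + 1`, vacuity `K / sum(alpha)`); the logits are made-up values, and the code is an illustration rather than the authors' model.

```python
import numpy as np

def dirichlet_uncertainty(logits, activation):
    # Map logits to nonnegative evidence, then to Dirichlet parameters.
    evidence = activation(logits)
    alpha = evidence + 1.0
    strength = alpha.sum()
    belief = evidence / strength          # per-class belief mass
    vacuity = len(logits) / strength      # uncertainty mass; belief.sum() + vacuity == 1
    return belief, vacuity

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical logits for a sample on which every class output is negative.
logits = np.array([-2.0, -0.5, -1.0])

belief_relu, vac_relu = dirichlet_uncertainty(logits, relu)
belief_exp, vac_exp = dirichlet_uncertainty(logits, np.exp)

# Derivative of the evidence with respect to each logit.
relu_grad = (logits > 0).astype(float)    # all zeros: no learning signal
exp_grad = np.exp(logits)                 # strictly positive everywhere

print("ReLU vacuity:", vac_relu, "evidence gradient:", relu_grad)
print("exp  vacuity:", round(vac_exp, 3), "evidence gradient:", exp_grad)
```

With ReLU, the sample produces no evidence (vacuity of exactly 1) and no gradient through the evidence, whereas the exponential activation always passes a nonzero gradient.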

──────────────────────────────
5. Second-Order and Curvature-Based Optimization

While first-order methods (such as SGD and Adam) remain the industry standard for training deep neural networks, many researchers have explored second-order techniques that incorporate curvature information. In a recent thesis, Gomes (2025) introduces AdaFisher, an adaptive second-order optimizer that leverages a diagonal block-Kronecker approximation of the Fisher Information Matrix (FIM). The analysis reveals that much of the “energy” of the FIM is concentrated along its diagonal blocks, so the high per-iteration cost associated with full curvature information can largely be avoided. AdaFisher uses this insight to precondition gradients more effectively than diagonal methods like Adam, achieving superior convergence properties while remaining computationally efficient. Theoretical guarantees under smoothness and bounded-noise assumptions are provided, and extensive experiments on image classification and language modeling tasks show that AdaFisher often outperforms both first-order methods and alternative second-order approaches.
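
As a rough illustration of Kronecker-factored diagonal preconditioning, the sketch below treats a single linear layer: the layer's Fisher block is approximated by the Kronecker product of an activation factor and an output-gradient factor, and only the diagonals of those factors are kept. This is a simplified sketch under those assumptions; the function name, damping value, and toy data are hypothetical, and it is not AdaFisher's actual update rule.

```python
import numpy as np

def kron_diag_precondition(activations, output_grads, damping=1e-3):
    # activations: (batch, n_in) layer inputs a; output_grads: (batch, n_out) values g = dL/dy.
    batch = activations.shape[0]
    grad_w = output_grads.T @ activations / batch       # mean of g a^T, shape (n_out, n_in)
    diag_a = np.mean(activations ** 2, axis=0)          # diagonal of E[a a^T]
    diag_g = np.mean(output_grads ** 2, axis=0)         # diagonal of E[g g^T]
    # Diagonal of the Kronecker-factored Fisher approximation, plus damping.
    fisher_diag = np.outer(diag_g, diag_a) + damping
    return grad_w, grad_w / fisher_diag

# Toy linear-regression layer with squared-error loss (made-up data).
rng = np.random.default_rng(0)
acts = rng.normal(size=(32, 4))
weights = rng.normal(size=(3, 4))
targets = rng.normal(size=(32, 3))
out_grads = acts @ weights.T - targets                  # dL/dy for 0.5 * ||y - t||^2

grad_w, precond_grad = kron_diag_precondition(acts, out_grads)
print("raw gradient norm:", np.linalg.norm(grad_w))
print("preconditioned gradient norm:", np.linalg.norm(precond_grad))
```

Because the preconditioner is a positive elementwise scaling, the preconditioned direction remains a descent direction; the factored form needs only `n_in + n_out` statistics per layer instead of one per weight pair.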

──────────────────────────────
6. Comprehensive Overviews of Optimization Techniques

In parallel with algorithmic innovations, comprehensive surveys have mapped the landscape of optimization methods in deep learning. Shulman (2023) provides an extensive review covering stochastic gradient descent and its momentum variants, such as Nesterov accelerated gradient, alongside adaptive methods such as Adagrad, RMSprop, Adam, Nadam, and AMSGrad. Shulman’s discussion emphasizes not only the evolution of these algorithms but also the practical challenges they address, including vanishing/exploding gradients and the interplay with batch or layer normalization. Such reviews offer practitioners valuable guidance when selecting optimization methods tailored to the properties of specific datasets or model architectures.

Complementing these discussions on continuous optimization, Yang and Shami (2020) survey hyperparameter optimization (HPO) techniques for machine learning algorithms. They contrast brute-force methods (e.g., grid and random search) with more sophisticated approaches such as Bayesian optimization, multi-fidelity strategies (Hyperband, BOHB), and population-based metaheuristics (genetic algorithms, particle swarm optimization). Their work underscores that the performance of a machine learning model depends critically on proper hyperparameter tuning, and that the choice of HPO method should be matched to the complexity of the model and the nature of the hyperparameter search space.
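
Both themes of this section can be combined in one small sketch: a plain NumPy implementation of the Adam update (Kingma & Ba, 2015), tuned by random search over its learning rate. The quadratic objective, starting point, and search range are assumptions chosen for illustration only.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr, steps=200, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: bias-corrected running averages of the gradient and its square.
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Hypothetical ill-conditioned quadratic used as the training objective.
scales = np.array([1.0, 25.0])

def loss(x):
    return 0.5 * np.sum(scales * x ** 2)

def grad(x):
    return scales * x

def random_search(n_trials=20, seed=0):
    # Sample learning rates log-uniformly in [1e-4, 1] and keep the best trial.
    rng = np.random.default_rng(seed)
    trials = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, 0)
        final_loss = loss(adam_minimize(grad, [3.0, -2.0], lr))
        trials.append((final_loss, lr))
    return min(trials)

start_loss = loss(np.array([3.0, -2.0]))
best_loss, best_lr = random_search()
print(f"start loss {start_loss:.2f}, best loss {best_loss:.4g} at lr={best_lr:.4g}")
```

Sampling the learning rate on a logarithmic scale is the usual choice because plausible values span several orders of magnitude; a grid over the same range would spend its budget on a fixed set of values instead.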

──────────────────────────────
7. Deep Learning in Computational Physics and Operator Learning

A growing body of research leverages deep learning to solve problems traditionally addressed by computational physics. Ray, Pinti, and Oberai (2023) offer a set of lecture notes that bridge deep learning and computational physics. Their material draws parallels between discretization techniques in numerical PDE solvers and the layered structure of neural networks, emphasizing that many deep learning techniques can be interpreted as approximations of classical numerical operators. In particular, physics-informed neural networks (PINNs) have been proposed to solve forward and inverse PDE problems by directly incorporating the governing equations into the training loss. Similarly, recent developments in operator networks—including Deep Operator Networks (DeepONets) and Fourier Neural Operators (FNOs)—extend the approximation capabilities from finite-dimensional functions to operator mappings between infinite-dimensional spaces. These approaches have shown promising results in applications ranging from fluid dynamics to materials modeling, though challenges remain in ensuring stability and accuracy in regions of low regularity.
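
The idea of putting the governing equation into the training loss can be sketched without a neural network. In the example below, a polynomial ansatz replaces the network (so derivatives are available in closed form and the loss minimization reduces to linear least squares), and the problem is the hypothetical test case u'(x) = -u(x), u(0) = 1 on [0, 1]. It illustrates the residual-plus-boundary loss structure, not a full PINN.

```python
import numpy as np

def solve_decay_ode(degree=6, n_collocation=50, bc_weight=10.0):
    # Ansatz u(x) = sum_k c_k x^k; minimize the ODE residual at collocation
    # points together with a weighted boundary-condition penalty.
    x = np.linspace(0.0, 1.0, n_collocation)
    powers = np.arange(degree + 1)
    basis = x[:, None] ** powers                                   # x^k
    dbasis = np.zeros_like(basis)
    dbasis[:, 1:] = powers[1:] * x[:, None] ** (powers[1:] - 1)    # k x^(k-1)
    residual_rows = dbasis + basis                                 # u' + u = 0
    bc_row = np.zeros((1, degree + 1))
    bc_row[0, 0] = bc_weight                                       # u(0) = c_0 = 1
    lhs = np.vstack([residual_rows, bc_row])
    rhs = np.concatenate([np.zeros(n_collocation), [bc_weight]])
    coeffs, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return coeffs

coeffs = solve_decay_ode()
u_at_1 = np.polynomial.polynomial.polyval(1.0, coeffs)
print(f"u(1) = {u_at_1:.6f}, exact exp(-1) = {np.exp(-1.0):.6f}")
```

A PINN follows the same recipe, with the polynomial replaced by a network, the closed-form derivative by automatic differentiation, and the least-squares solve by gradient-based training.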

──────────────────────────────
8. Quantum Circuit Optimization via Deep Reinforcement Learning

Beyond classical applications, deep learning techniques have also been applied to emerging domains like quantum computing. Fösel et al. (2021) address the problem of quantum circuit optimization, a crucial step for reducing error rates and execution time on near-term, noisy intermediate-scale quantum (NISQ) devices. Traditional circuit optimization methods, often based on high-level algebraic reductions, can ignore hardware-specific constraints such as qubit connectivity and native gate sets. By reformulating circuit optimization as a sequential decision-making problem, Fösel et al. deploy a reinforcement learning (RL) framework in which a deep convolutional neural network learns transformation strategies from circuit representations. The agent, trained via proximal policy optimization (PPO), shows promising results, reducing circuit depth by an average of 27% and gate count by 15% on 12-qubit circuits. Importantly, the fully convolutional architecture allows for generalization to circuits with different sizes and architectures, illustrating that deep RL can be an effective component of the quantum compiler toolchain.
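
The sequential decision-making framing can be illustrated with a toy environment: the state is a gate list, an action picks a position at which to try a rewrite rule, and the reward is the number of gates removed. The single cancellation rule, the gate encoding, and the greedy policy (swapped in for a learned PPO agent) are all assumptions for illustration and do not reproduce the paper's environment.

```python
SELF_INVERSE = {"H", "X", "Z", "CNOT"}

def cancel_at(circuit, i):
    # Rewrite rule: two identical self-inverse gates in a row compose to the identity.
    if (i + 1 < len(circuit)
            and circuit[i] == circuit[i + 1]
            and circuit[i][0] in SELF_INVERSE):
        return circuit[:i] + circuit[i + 2:]
    return circuit

def step(circuit, action):
    # Environment transition: apply the rule at position `action`;
    # the reward is the reduction in gate count.
    new_circuit = cancel_at(circuit, action)
    return new_circuit, len(circuit) - len(new_circuit)

def greedy_policy(circuit):
    # Stand-in for a trained agent: pick the first position with positive reward.
    for i in range(len(circuit) - 1):
        if step(circuit, i)[1] > 0:
            return i
    return None

def optimize(circuit):
    total_reward = 0
    while (action := greedy_policy(circuit)) is not None:
        circuit, reward = step(circuit, action)
        total_reward += reward
    return circuit, total_reward

# Each gate is (name, qubits); this made-up circuit reduces to a single CNOT.
circuit = [("H", (0,)), ("X", (1,)), ("X", (1,)), ("H", (0,)), ("CNOT", (0, 1))]
optimized, removed = optimize(circuit)
print(optimized, "gates removed:", removed)
```

Note that the H gates only become adjacent after the X pair is removed, which is a small example of why the order of transformations matters and why a learned policy can outperform a fixed rule order.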

──────────────────────────────
9. Discussion and Outlook

The reviewed body of work illustrates a vibrant interplay between theory and practice in deep learning optimization and its applications across multiple domains. Surrogate-model methods for PDE-constrained problems and computational fluid dynamics have accelerated uncertainty quantification and design optimization by replacing expensive simulations with fast, learned approximators (Lye et al., 2019; Lye et al., 2020). At the same time, modern mathematical analyses (Berner et al., 2021) and insights from evidential learning (Pandey & Yu, 2023) have deepened our understanding of the generalization and uncertainty properties of neural networks. Novel optimization methods that incorporate curvature information—such as AdaFisher (Gomes, 2025)—promise to combine the fast convergence of second-order methods with the efficiency required for large-scale deep learning. Comprehensive reviews by Shulman (2023) and Yang and Shami (2020) equip practitioners with guidelines for selecting appropriate optimization and hyperparameter tuning strategies. Finally, innovative applications such as quantum circuit optimization via deep reinforcement learning (Fösel et al., 2021) demonstrate that the tools developed in deep learning can be successfully adapted to novel, high-impact challenges.

Despite these advances, many open problems remain. For instance, while surrogate models work well for low- to moderate-dimensional parameters with reasonably regular observables, extending these techniques to full solution fields or to problems with severe shocks and discontinuities remains challenging. Similarly, while second-order optimizers like AdaFisher show promise, further work is needed to capture off-diagonal curvature effects without a prohibitive computational cost. In the domain of hyperparameter optimization, improved methods that scale gracefully with model complexity are an ongoing research frontier. In quantum computing, integrating RL-based circuit optimization with higher-level algorithm design and error correction remains a fertile area for future research.

Overall, the convergent trends in theory, algorithmic development, and application-specific adaptations indicate that deep learning optimization is a dynamic and evolving field—one that will continue to shape the future of scientific computing and artificial intelligence.

──────────────────────────────
10. References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Retrieved from https://tensorflow.org

Amy, M., Maslov, D., & Mosca, M. (2014). Polynomial-time T-depth optimization of Clifford+T circuits via matroid partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning (Vol. 70, pp. 214–223). PMLR.

Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems (pp. 2546–2554).

Berner, J., Grohs, P., Kutyniok, G., & Petersen, P. (2021). The modern mathematics of deep learning. arXiv preprint arXiv:2105.04026.

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4), 303–314.

Fösel, T., Niu, M. Y., Marquardt, F., & Li, L. (2021). Quantum circuit optimization with deep reinforcement learning. arXiv preprint arXiv:2103.07585v1.

Gomes, D. M. (2025). Towards practical second-order optimizers in deep learning: Insights from Fisher information analysis. arXiv preprint arXiv:2504.20096v1.

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.

Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML).

Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).

Lye, K. O., Mishra, S., & Ray, D. (2019). Deep learning observables in computational fluid dynamics. arXiv preprint arXiv:1903.03040v2.

Lye, K. O., Mishra, S., Ray, D., & Chandrashekar, P. (2020). Iterative surrogate model optimization (ISMO): An active learning algorithm for PDE constrained optimization with deep neural networks. arXiv preprint arXiv:2008.05730v1.

Maclaurin, D., Duvenaud, D., & Adams, R. P. (2015). Gradient-based hyperparameter optimization through reversible learning. In International Conference on Machine Learning.

Nesterov, Y. (1983). A method for solving the convex programming problem with convergence rate O(1/k²). Doklady Akademii Nauk SSSR, 269(3), 543–547.

Pandey, D. S., & Yu, Q. (2023). Learn to accumulate evidence from all training samples: Theory and practice. arXiv preprint arXiv:2306.11113v2.

Preskill, J. (2018). Quantum computing in the NISQ era and beyond. Quantum, 2, 79.

Ray, D., Pinti, O., & Oberai, A. A. (2023). Deep learning and computational physics (Lecture Notes). arXiv preprint arXiv:2301.00942v1.

Reddi, S. J., Kale, S., & Kumar, S. (2018). On the convergence of Adam and beyond. In Proceedings of the 6th International Conference on Learning Representations (ICLR).

Shulman, D. (2023). Optimization methods in deep learning: A comprehensive overview. arXiv preprint arXiv:2302.09566v2.

Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (pp. 2951–2959).

Yang, L., & Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory and practice. arXiv preprint arXiv:2007.15745v3.

Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701.

──────────────────────────────
Conclusion

The literature reviewed demonstrates that deep learning optimization continues to evolve rapidly—both in theoretical understanding and practical application. Techniques for surrogate modeling, uncertainty quantification, and curvature-based adaptation have already led to substantial improvements in the efficiency and reliability of computations in engineering, physics, and even quantum computing. As researchers further refine these methods and address outstanding challenges, we can expect deep learning to remain a transformative tool across scientific and industrial domains.

