In [1]:
import requests
import json
from IPython.display import display, Markdown

In [25]:
query = "Make a search of 10 recent publications and write a literature review on recent advances in deep learning optimizers with an emphasis on novel optimization algorithms"
params = {"query": query}
url = "http://localhost:8086/agent"
response = requests.get(url, params=params)

In [26]:
response

<Response [200]>

In [27]:
result = json.loads(response.content)["message"]

In [28]:
display(Markdown(result))

Below is a literature review that synthesizes recent advances in deep learning optimizers—with a special focus on novel optimization algorithms. This review discusses new algorithmic frameworks that blend ideas from active learning, curvature approximation, theoretical analysis, and practical implementation. The review also situates these methods within the broader context of deep learning optimization, where first‐order methods (such as stochastic gradient descent and Adam) continue to dominate but recent work is vigorously exploring higher‑order and adaptive approaches that promise improved convergence and generalization.

─────────────────────────────
1. Introduction

Deep neural networks (DNNs) have revolutionized machine learning by solving complex tasks in vision, language, and science. Central to their success is the choice of optimizer, the algorithm that updates the model parameters while navigating a high-dimensional, nonconvex loss landscape. Traditional methods such as stochastic gradient descent (SGD) and its momentum‐based variants (e.g., Nesterov accelerated gradient) continue to perform well, but challenges remain. In particular, the high variance of stochastic updates, sensitivity to hyperparameters, and difficulty escaping saddle points have motivated research on novel optimization algorithms. Recent work has focused on techniques that incorporate curvature information (second‐order methods), surrogate model–based active learning, and theoretical insights from the modern mathematics of deep learning. In this review, we concentrate on a selection of recent publications that illustrate these directions in the design of deep learning optimizers.

─────────────────────────────
2. Active Learning and Surrogate–Based Optimization

One promising approach for overcoming the limitations of fixed training sets in standard surrogate–based optimization is the Iterative Surrogate Model Optimization (ISMO) framework. Lye, Mishra, Ray, and Chandrashekar (2020) propose an active learning strategy in which a deep neural network surrogate is iteratively refined by augmenting its training samples with the local minimizers returned by a standard optimization algorithm run on the current surrogate. In contrast to fixed, a priori training sets, whose coverage can be problematic in high dimensions, ISMO uses a feedback loop between the surrogate and the optimizer to “learn” the region where the optimum lies. Under suitable hypotheses, the theoretical analysis shows that the error between the approximate and true minimizers decays exponentially in the number of training samples, with exponentially decaying variance, whereas conventional fixed-set approaches achieve only algebraic decay. Numerical experiments on optimal control, inverse problems for the heat equation, and airfoil design further underscore ISMO’s advantages. The work demonstrates that augmenting deep learning–based surrogate optimizers with an active learning component can significantly improve both efficiency and robustness (Lye et al., 2020).
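
The feedback loop described above can be sketched in a few lines. This is only an illustration under loud assumptions, not the authors' implementation: a least-squares quadratic fit stands in for the deep neural network surrogate, projected gradient descent stands in for the standard optimizer, and `expensive_objective`, `fit_quadratic`, `minimize_surrogate`, and `ismo` are hypothetical names chosen for this example.

```python
import random


def expensive_objective(x):
    """Hypothetical stand-in for a costly PDE-constrained objective."""
    return x**4 - x


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(xs, ys):
    """Least-squares fit y ~ a*x^2 + b*x + c (stands in for surrogate training)."""
    s = [sum(x**k for x in xs) for k in range(5)]  # s[k] = sum of x^k
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(normal)
    coeffs = []
    for col in range(3):  # Cramer's rule on the normal equations
        mc = [row[:] for row in normal]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    return coeffs  # [a, b, c]


def minimize_surrogate(coeffs, x0, lo, hi, lr=0.05, steps=200):
    """Projected gradient descent on the surrogate, started at x0."""
    a, b, _ = coeffs
    x = x0
    for _ in range(steps):
        x -= lr * (2 * a * x + b)
        x = min(max(x, lo), hi)  # keep the iterate inside [lo, hi]
    return x


def ismo(objective, lo, hi, n_init=6, n_iters=5, batch=4, seed=0):
    """Active-learning loop: fit surrogate, optimize it, add minimizers to the data."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    for _ in range(n_iters):
        coeffs = fit_quadratic(xs, ys)
        starts = [rng.uniform(lo, hi) for _ in range(batch)]
        new_xs = [minimize_surrogate(coeffs, x0, lo, hi) for x0 in starts]
        xs += new_xs  # augment the training set with the surrogate's minimizers
        ys += [objective(x) for x in new_xs]
    best = min(range(len(xs)), key=ys.__getitem__)
    return xs[best], ys[best], len(xs)


x_best, y_best, n_samples = ismo(expensive_objective, 0.0, 1.0)
print(x_best, y_best, n_samples)
```

The point of the design is that each round spends new objective evaluations only where the current surrogate believes the optimum lies, rather than spreading them over the whole domain in advance.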

─────────────────────────────
3. Second-Order Methods and Curvature Preconditioning

While first-order methods have proven efficient for large-scale deep learning, their limited use of curvature information can restrict convergence speed and make them sensitive to hyperparameter tuning. Gomes (2025) takes a step towards closing the gap between the promise of second-order approaches and their practicality. “Towards Practical Second‑Order Optimizers in Deep Learning: Insights from Fisher Information Analysis” introduces a novel adaptive optimizer called AdaFisher, which uses a diagonal block–Kronecker approximation of the Fisher Information Matrix (FIM) to precondition gradients adaptively. The approach draws inspiration from natural gradient descent, which preconditions updates by the inverse of the FIM to account for the underlying geometry of the parameter space. By exploiting observed diagonal dominance in the FIM’s Kronecker factors, AdaFisher achieves a convergence rate of roughly O(log T/√T) in nonconvex settings, together with improved robustness, at a computational cost similar to first-order methods like Adam. Empirical evaluations across computer vision and language modeling benchmarks illustrate that AdaFisher not only converges faster but also often finds flatter minima, a quality associated with better generalization (Gomes, 2025).
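
To make the idea of Fisher preconditioning concrete, the following minimal sketch applies a damped diagonal preconditioner to a gradient step. It is not AdaFisher itself: the Kronecker-factored estimator of the paper is replaced here by a running average of squared gradients as a crude diagonal Fisher estimate, and the function names, `beta`, and `damping` are assumptions made for this illustration.

```python
def update_fisher_diag(fisher_diag, grad, beta=0.9):
    """Running average of squared gradients, used as a rough diagonal Fisher estimate."""
    return [beta * f + (1.0 - beta) * g * g for f, g in zip(fisher_diag, grad)]


def preconditioned_step(theta, grad, fisher_diag, lr=0.1, damping=1e-3):
    """theta <- theta - lr * (F_diag + damping)^(-1) * grad, applied elementwise."""
    return [t - lr * g / (f + damping) for t, g, f in zip(theta, grad, fisher_diag)]


# One illustrative step on made-up numbers.
theta = [1.0, 1.0]
grad = [2.0, 0.5]
fisher = update_fisher_diag([0.0, 0.0], grad)
theta = preconditioned_step(theta, grad, fisher)
print(theta)
```

Unlike Adam, which divides by the square root of the second-moment estimate, a natural-gradient-style step divides by the (damped) Fisher estimate itself, so the damping term matters when gradients are small.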

─────────────────────────────
4. Theoretical Perspectives on Optimization in Deep Learning

The need to understand why overparameterized networks trained with nonconvex loss functions outperform expectations has spurred theoretical work that provides a mathematical explanation for the performance of current optimizers. In “The Modern Mathematics of Deep Learning,” Berner, Grohs, Kutyniok, and Petersen (2021) survey several recent theoretical advances that illuminate the interplay between overparameterization, depth, and optimization trajectories. Notably, the work discusses phenomena such as “lazy training” and the role of neural tangent kernels in approximating the dynamics of gradient descent in the infinite‑width limit. These analyses show that even though the loss surfaces are highly nonconvex, many local minima are nearly equivalent in performance and that the “flatness” of the minima (a property that is favorably influenced by curvature preconditioning) correlates with generalization. Though not proposing a new optimizer per se, this theoretical framework lays the groundwork from which novel optimization strategies—such as those based on effective second‐order information—can be justified and further designed.
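
For readers unfamiliar with the terms, the standard formulation behind "lazy training" and the neural tangent kernel can be written as follows. The notation is an assumption of this note rather than taken from the survey: a scalar-output network f(x; θ), initialization θ_0, training pairs (x_i, y_i), and gradient flow on the squared loss.

```latex
% Linearization around the initialization \theta_0 ("lazy training")
f(x;\theta) \approx f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0)

% Neural tangent kernel at initialization
\Theta(x, x') = \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0)

% Gradient flow on L(\theta) = \tfrac{1}{2} \sum_{i=1}^{n} (f(x_i;\theta) - y_i)^2,
% with the kernel held fixed at its initial value
\frac{\mathrm{d}}{\mathrm{d}t} f(x;\theta_t) = -\sum_{i=1}^{n} \Theta(x, x_i)\,\bigl(f(x_i;\theta_t) - y_i\bigr)
```

In this regime the network's outputs evolve as in kernel regression with kernel Θ, which is what makes the training dynamics analytically tractable.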

─────────────────────────────
5. Practical Considerations and Comprehensive Overviews

Complementing both novel algorithmic proposals and theoretical analyses, recent comprehensive reviews provide guidance on current best practices in deep learning optimization. In “Optimization Methods in Deep Learning: A Comprehensive Overview,” Shulman (2023) systematically surveys first‑order methods (SGD, Adagrad, RMSprop, Adam, and their momentum or adaptive variants) alongside emerging algorithms that incorporate higher‑order information. Shulman’s review emphasizes that first‑order methods remain the workhorse for many applications due to their lower per‑iteration cost. Against that backdrop, active learning–based and second‑order techniques of the kind represented here by ISMO and AdaFisher are candidates for specialized applications that demand superior convergence speed and robustness. Moreover, the review discusses challenges such as proper weight initialization, normalization strategies (e.g., batch normalization and layer normalization), and how these interact with the optimizer’s performance. By bridging theory and practice, reviews like that of Shulman (2023) motivate further work on novel optimization algorithms and serve as practical guides for practitioners seeking to balance efficiency, stability, and convergence properties.

A complementary perspective is offered in lecture notes such as “Deep Learning and Computational Physics” by Ray, Pinti, and Oberai (2023). Although the notes focus on bridging classical computational physics with deep learning, they provide detailed discussions on gradient-based optimization methods, including derivations of back-propagation and variants of the stochastic gradient method (e.g., Adam). These lecture notes illustrate how various optimization schemes emerge naturally when one considers the underlying numerical analysis—offering insights that are pertinent when developing or tuning novel optimizers.
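
As a point of reference for the first-order methods discussed in this section, the widely used Adam update can be sketched as below. The code follows the standard published form of Adam with its usual default hyperparameters; the function name `adam_step` and the toy quadratic objective are assumptions made for this example.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. `t` is the 1-based step count; returns (theta, m, v)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = [mi / (1 - beta1**t) for mi in m]
    v_hat = [vi / (1 - beta2**t) for vi in v]
    theta = [
        p - lr * mh / (math.sqrt(vh) + eps)
        for p, mh, vh in zip(theta, m_hat, v_hat)
    ]
    return theta, m, v


# Toy usage: minimize f(theta) = theta_0^2 + 10 * theta_1^2.
theta, m, v = [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]
for t in range(1, 501):
    grad = [2.0 * theta[0], 20.0 * theta[1]]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
print(theta)
```

Because the step is normalized by the root of the second-moment estimate, both coordinates initially move at about the same rate despite the tenfold difference in curvature.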

─────────────────────────────
6. Discussion and Future Directions

The recent advances summarized above testify to a vibrant research ecosystem around deep learning optimizers. Novel algorithms such as ISMO and AdaFisher are not simply incremental modifications; they represent systematic efforts to overcome some entrenched issues of first-order methods:
 
• Active sampling and iterative surrogate refinement (ISMO) address the “curse of fixed training sets” by focusing computational effort on the region of interest in parameter space. This idea may be extended beyond PDE-constrained optimization to other high-dimensional or multi-modal settings. 

• Curvature-aware, second‑order optimizers like AdaFisher leverage approximations of the Fisher Information Matrix to balance cost with benefits. While full Hessian-based methods remain prohibitively expensive in many applications, adaptive, efficient approximations open the door for training models that quickly approach flat minima.

• Strong ties between theory and practice, as evidenced by the work of Berner et al. (2021) and detailed reviews such as Shulman (2023), are guiding the design of new approaches. Insights into lazy training, tangent kernels, and the geometry of the parameter space provide rigorous motivation for augmenting first-order methods with higher-order information.

Looking forward, several research directions are apparent. On the algorithmic side, further exploration into structured approximations of curvature (for example, band‑diagonal or block‑diagonal models) could narrow the remaining gap between full second-order methods and efficient approximations. Moreover, establishing tighter theoretical bounds on generalization error that directly incorporate these new optimizer dynamics could better connect empirical performance with mathematical understanding. Finally, applications in fields as diverse as computational fluid dynamics, quantum circuit optimization, and large‑scale natural language processing provide compelling testbeds that will likely drive further innovations in optimizer design.

─────────────────────────────
7. Conclusion

This review has examined recent advances in deep learning optimizers with a special focus on novel optimization algorithms. Active learning strategies such as ISMO demonstrate nontraditional ways to “learn” the structure of the minimizer set in high-dimensional problems, while modern second-order methods (e.g., AdaFisher) harness curvature information via efficient Fisher Information approximations to boost convergence speed and stability. These new methods complement the vast existing literature on first‑order methods and are supported by rigorous theoretical developments that shed light on why deep networks generalize despite nonconvexity and overparameterization. Comprehensive surveys and lecture notes further contextualize these innovations within the practical challenges encountered by practitioners. Together, these advances promise to deepen our understanding of optimization in deep learning, pointing the way toward methods that are both faster and more robust for the next generation of machine learning applications.

─────────────────────────────
8. References

Abbreviated references are listed below in a format suitable for further reading:

• Berner, J., Grohs, P., Kutyniok, G., & Petersen, P. (2021). The Modern Mathematics of Deep Learning. arXiv:2105.04026v2. https://arxiv.org/pdf/2105.04026v2

• Gomes, D. M. (2025, April 26). Towards Practical Second‑Order Optimizers in Deep Learning: Insights from Fisher Information Analysis. arXiv:2504.20096v1. https://arxiv.org/pdf/2504.20096v1

• Lye, K. O., Mishra, S., Ray, D., & Chandrashekar, P. (2020, August 13). Iterative Surrogate Model Optimization (ISMO): An active learning algorithm for PDE constrained optimization with deep neural networks. arXiv:2008.05730v1. https://arxiv.org/pdf/2008.05730v1

• Ray, D., Pinti, O., & Oberai, A. A. (2023, January 3). Deep Learning and Computational Physics (Lecture Notes). arXiv:2301.00942v1. https://arxiv.org/pdf/2301.00942v1

• Shulman, D. (2023, February 19). Optimization Methods in Deep Learning: A Comprehensive Overview. arXiv:2302.09566v2. https://arxiv.org/pdf/2302.09566v2

Additional related works (on topics such as evidential deep learning, hyperparameter optimization, quantum circuit optimization, and deep learning surrogates in computational fluid dynamics) are also relevant but were not the primary focus of this review.

