
# Advanced machine learning and data analysis for the physical sciences
**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway

Date: **February 5, 2026**

## Overview of the third week

* Neural Networks with codes and Physics Informed Neural Networks, theory and codes

* Discussion of possible projects during the lab session.

* Video of lecture at <https://youtu.be/68vKsagw6Zo>

* Whiteboard notes at <https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/HandwrittenNotes/2026/Notesweek3.pdf>  

* These lecture notes are at <https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week3/ipynb/week3.ipynb>

## Reminder on texts on the mathematics of deep learning

**Two recent books online.**

1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)

2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)

## Reminder on books with hands-on material and codes
* [Sebastian Raschka et al., Machine Learning with PyTorch and Scikit-Learn](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)

* [David Foster, Generative Deep Learning with TensorFlow](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch01.html)

* [Bali and Gavras, Generative AI with Python and TensorFlow 2](https://github.com/PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2)

All three books have GitHub repositories from which one can download all the codes. We will borrow most of the material from these three texts as well as
from Goodfellow, Bengio and Courville's text [Deep Learning](https://www.deeplearningbook.org/).

## Reading recommendations
* Raschka et al., chapters 11-13 for NNs and chapter 14 for CNNs, Jupyter notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)

* Goodfellow et al., chapters 6 and 7 contain most of the neural network background. For CNNs see chapter 9.

## Video lectures on CNNs
**Excellent lectures on CNNs and Neural Networks.**

* [Video on Deep Learning](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)

* [Video on Convolutional Neural Networks from MIT](https://www.youtube.com/watch?v=iaSUYvmCekI&ab_channel=AlexanderAmini)

* [Video on CNNs from Stanford](https://www.youtube.com/watch?v=bNb2fEVKeEo&list=PLC1qU-LWwrF64f4QKQT-Vg5Wr4qEE1Zxk&index=6&ab_channel=StanfordUniversitySchoolofEngineering)

## From last week: Review of the overarching view of a neural network

The architecture of a neural network defines our model. This model
aims at describing some function $f(\boldsymbol{x})$ that is meant to reproduce
some final result (outputs or target values $\boldsymbol{y}$) given a specific input
$\boldsymbol{x}$. Note that here $\boldsymbol{y}$ and $\boldsymbol{x}$ are not limited to being
vectors.

The architecture consists of
1. An input and an output layer, where the input layer is defined by the inputs $\boldsymbol{x}$. The output layer produces the model output $\boldsymbol{\tilde{y}}$, which is compared with the target value $\boldsymbol{y}$.

2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)

3. A given activation function $\sigma(\boldsymbol{z})$ with arguments $\boldsymbol{z}$ to be defined below. The activation functions may differ from layer to layer.

4. The last layer, normally called the **output** layer, has an activation function tailored to the specific problem.

5. Finally, we define a so-called cost or loss function which is used to gauge the quality of our model.
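
The ingredients listed above can be sketched in a few lines of NumPy. The layer sizes, the sigmoid and identity activations, the random data and all function names below are illustrative assumptions and not the setup used in the codes of these lectures.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def identity(z):
    """Output activation suited to a regression problem."""
    return z

def feed_forward(x, weights, biases, activations):
    """Propagate inputs x (n_samples x n_inputs) through all layers."""
    a = x
    for W, b, act in zip(weights, biases, activations):
        z = a @ W + b   # weighted sum entering the current layer
        a = act(z)      # activation of the current layer
    return a

rng = np.random.default_rng(2026)
layer_sizes = [2, 8, 8, 1]   # input, two hidden layers, output (hypothetical choice)
weights = [rng.normal(size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]
activations = [sigmoid, sigmoid, identity]   # may differ from layer to layer

x = rng.normal(size=(5, 2))             # five samples with two features each
y = np.sin(x[:, :1]) + x[:, 1:]**2      # made-up targets, for illustration only
y_tilde = feed_forward(x, weights, biases, activations)
mse = np.mean((y - y_tilde)**2)         # cost comparing model output and targets
print(y_tilde.shape, mse)
```

The network is untrained here, so the cost only measures how far the randomly initialized model is from the targets.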

## The optimization problem

The cost function is a function of the unknown parameters
$\boldsymbol{\Theta}$, where the latter is a container for all the
parameters needed to define a neural network.

If we are dealing with a regression task a typical cost/loss function
is the mean squared error

$$
C(\boldsymbol{\Theta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\Theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\Theta}\right)\right\}.
$$

This function represents one of many possible ways to define
the so-called cost function. Note that here we have assumed a linear dependence on the parameters $\boldsymbol{\Theta}$. This is in general not the case.
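
For the linear case the mean squared error and its gradient can be written down directly. The following is a minimal sketch, assuming a second-order polynomial design matrix and synthetic data with added noise; the function names are hypothetical.

```python
import numpy as np

def mse_cost(theta, X, y):
    """Mean squared error (1/n) (y - X theta)^T (y - X theta)."""
    residual = y - X @ theta
    return (residual @ residual) / len(y)

def mse_gradient(theta, X, y):
    """Analytical gradient of the cost: -(2/n) X^T (y - X theta)."""
    return -2.0 / len(y) * X.T @ (y - X @ theta)

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x, x**2])   # design matrix (assumed polynomial model)
theta_true = np.array([1.0, -2.0, 3.0])      # made-up parameters
y = X @ theta_true + 0.1 * rng.normal(size=n)

# With a linear dependence on the parameters the minimum is known in closed form
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(mse_cost(theta_ols, X, y), np.linalg.norm(mse_gradient(theta_ols, X, y)))
```

The gradient vanishes at the closed-form solution. For a neural network no such closed form exists, which is why iterative gradient methods are needed.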

## Parameters of neural networks
For neural networks the parameters
$\boldsymbol{\Theta}$ are given by the so-called weights and biases (to be
defined below).

The weights are given by matrix elements $w_{ij}^{(l)}$ where the
superscript indicates the layer number. The biases are typically given
by vector elements representing each single node of a given layer,
that is $b_j^{(l)}$.
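
As an illustration of what the container $\boldsymbol{\Theta}$ may look like in practice, the sketch below stores one weight matrix and one bias vector per layer and counts the total number of parameters. The layer sizes, the initialization and the index convention (row index for the node in the previous layer, column index for the node in the current layer) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
layer_sizes = [2, 8, 8, 1]   # hypothetical: 2 inputs, two hidden layers of 8 nodes, 1 output

# theta[l] holds the weight matrix and bias vector feeding layer l+1
theta = []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))  # elements w_ij
    b = np.full(n_out, 0.01)                                        # elements b_j, one per node
    theta.append((W, b))

n_params = sum(W.size + b.size for W, b in theta)
print([W.shape for W, _ in theta], n_params)
```

Even this small network has 105 trainable parameters: 24 in the first hidden layer, 72 in the second and 9 in the output layer.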

## Other ingredients of a neural network

Having defined the architecture of a neural network, the optimization
of the cost function with respect to the parameters $\boldsymbol{\Theta}$
involves the calculation of gradients. The
gradients are the derivatives of the cost function with respect to all parameters, and
the minimum is normally searched for iteratively with gradient-based methods, including
1. various quasi-Newton methods,

2. plain gradient descent (GD) with a constant learning rate $\eta$,

3. GD with momentum and methods that adapt the learning rate, such as

  * Adaptive gradient (AdaGrad)

  * Root mean-square propagation (RMSprop)

  * Adaptive moment estimation (Adam) and many others

4. Stochastic gradient descent and various families of learning rate approximations
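
Three of the update rules listed above can be compared on a toy problem. The sketch below assumes a two-dimensional quadratic cost with a known minimum. The matrix, the learning rates and the number of iterations are arbitrary choices for illustration, and the Adam update follows the commonly used form with bias corrections.

```python
import numpy as np

# Toy cost C(theta) = 0.5 * theta^T A theta - b^T theta (made-up example)
A = np.array([[3.0, 0.5], [0.5, 1.0]])   # symmetric and positive definite
b = np.array([1.0, -1.0])
theta_exact = np.linalg.solve(A, b)      # the gradient vanishes here

def gradient(theta):
    """Gradient of the toy cost function."""
    return A @ theta - b

def plain_gd(theta, eta=0.1, n_iter=500):
    for _ in range(n_iter):
        theta = theta - eta * gradient(theta)
    return theta

def gd_momentum(theta, eta=0.1, gamma=0.9, n_iter=500):
    v = np.zeros_like(theta)
    for _ in range(n_iter):
        v = gamma * v + eta * gradient(theta)   # memory of previous updates
        theta = theta - v
    return theta

def adam(theta, eta=0.05, beta1=0.9, beta2=0.999, eps=1e-8, n_iter=2000):
    m = np.zeros_like(theta)   # running average of the gradient
    s = np.zeros_like(theta)   # running average of the squared gradient
    for t in range(1, n_iter + 1):
        g = gradient(theta)
        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)   # bias corrections for the early iterations
        s_hat = s / (1 - beta2**t)
        theta = theta - eta * m_hat / (np.sqrt(s_hat) + eps)
    return theta

theta0 = np.zeros(2)
for name, method in [("GD", plain_gd), ("momentum", gd_momentum), ("Adam", adam)]:
    print(name, np.linalg.norm(method(theta0) - theta_exact))
```

Here the full gradient is available at every step. In the stochastic variants the gradient is instead estimated from a randomly chosen subset (mini-batch) of the data.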

## Other parameters

In addition to the above, there are often additional hyperparameters
which are included in the setup of a neural network. These will be
discussed below.

## Physics informed neural networks

The first part of the lecture deals with how we can use neural
networks to solve partial (as well as ordinary) differential
equations. The equations we will study as examples are the diffusion
equation and the Black-Scholes equation from finance. The latter
resembles a diffusion equation and originates from a stochastic model
for the price of the underlying asset.

We will use the Black-Scholes example to remind ourselves about how we can use a code for neural networks to solve such problems.
The Jupyter notebook for this is at <https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week3/ipynb/BlackScholesPINN.ipynb>
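
As a reminder of the general idea, and as a sketch rather than the formulation used in the notebook, consider the one-dimensional diffusion equation with a unit diffusion constant (an assumption made here for simplicity),

$$
\frac{\partial u(x,t)}{\partial t}=\frac{\partial^2 u(x,t)}{\partial x^2}.
$$

A network output $\tilde{u}(x,t;\boldsymbol{\Theta})$ can then be trained by minimizing the mean squared residual of the equation over $N$ chosen points $(x_i,t_i)$, with additional terms (not shown) that enforce the initial and boundary conditions,

$$
C(\boldsymbol{\Theta})=\frac{1}{N}\sum_{i=1}^{N}\left(\frac{\partial \tilde{u}(x_i,t_i;\boldsymbol{\Theta})}{\partial t}-\frac{\partial^2 \tilde{u}(x_i,t_i;\boldsymbol{\Theta})}{\partial x^2}\right)^2.
$$

The derivatives of the network output with respect to its inputs are normally obtained with automatic differentiation.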