# Lecture 14 - Symbolic Algorithms, Fitting Equations from Data, SINDy

Modelling and Machine Learning of Dynamical Systems in Julia 

## Project 

### Project Info - What's the Project?

* To pass this course, we ask you to work on a project
* Work alone or in pairs (pairs are expected to do a slightly more extensive project)
* Work on a project that extends one of the topics from the lectures or picks up a new topic using the methods we discussed, e.g. exploring the dynamics of an extended version of one of the models introduced in the lectures, or combining one of the models with a machine learning method
* We approve / hand out topics: send us an **email by the end of the week** (alistair.white@tum.de, maximilian.gelbrecht@tum.de)
    * If you have an idea, please suggest it
    * If you don't have an idea yet, name your favourite topics from the lecture and we will suggest something to you
* We will use next week's lecture/exercise slot for individual Q&A sessions about the projects; we will send you your exact time
 
### Handing in the Project

* The project consists of:
    * A short report (4-5 pages) that introduces your topic and gives a short summary of its research history, the method/model that you use, and a short discussion of your results
    * Code: Invite us to a Git repository with the code base. Write your project as scripts/notebooks accompanied by a Julia package. The package should fulfil the usual Julia package requirements and include CI/CD
* We offer an (optional) initial review of your project with short feedback on 8th March; we strongly recommend that you use this opportunity
    * Don't expect us to bugfix your code, but we will try to help you if something goes wrong
* Hand in the completed project by 22nd March
* If you cannot make these deadlines for a good reason, please inform us early!
* Indicate whether you want a grade or just want to pass the course

## Recap

* In the last three lectures we were using Artificial Neural Networks (ANNs) to estimate dynamical systems

* Suppose we have a dynamical system $$\frac{d\mathbf{x}}{dt} = f(\mathbf{x},t;\theta)$$ from which we observe data $\mathbf{X}=\{\mathbf{x}(t_i)\}$ for $t_i\in[t_0;t_f]$. To begin with, we restrict ourselves to evenly sampled observations $t_i = t_0 + i\,\Delta t$

* The corresponding discretized dynamical system is $$\mathbf{x}_{n+1} = g(\mathbf{x}_n, t_n; \theta)$$ where $g$ is one step of a numerical DE solver

* With **Neural Differential Equations** we replaced $f$ with an ANN, used as a universal function approximator, and learned the right-hand side of the dynamical system from the observations $\mathbf{X}$
    * For this to work, we needed a differentiable DE solver that can compute derivatives of the solutions with respect to the parameters, e.g. via adjoint sensitivity analysis

* With **Recurrent Neural Networks** (RNNs) such as **Reservoir Computing**, we instead replace $g$ with an ANN and learn it from data
    * Reservoir computing simplifies RNNs by using a very large hidden layer (the reservoir) whose weights are fixed, so that only the output layer is trained
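The sketch below makes the role of $g$ concrete. In it, $g$ is one step of the classical fourth-order Runge-Kutta (RK4) method, and iterating $g$ produces the evenly sampled observations $\mathbf{x}(t_i)$. It is written in Python rather than the Julia used in the course. RK4 is only one possible choice of solver step, and the right-hand side `damped_oscillator` with $\theta = (\omega, \gamma)$ is a hypothetical example chosen for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
RHS = Callable[[Vector, float, Sequence[float]], Vector]


def rk4_step(f: RHS, x: Vector, t: float, dt: float, theta: Sequence[float]) -> Vector:
    """One classical RK4 step: plays the role of g(x_n, t_n; theta)."""
    k1 = f(x, t, theta)
    k2 = f([xi + 0.5 * dt * k for xi, k in zip(x, k1)], t + 0.5 * dt, theta)
    k3 = f([xi + 0.5 * dt * k for xi, k in zip(x, k2)], t + 0.5 * dt, theta)
    k4 = f([xi + dt * k for xi, k in zip(x, k3)], t + dt, theta)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def trajectory(f: RHS, x0: Vector, t0: float, dt: float, n_steps: int,
               theta: Sequence[float]) -> List[Vector]:
    """Iterate x_{n+1} = g(x_n, t_n; theta) to get observations at t_i = t0 + i*dt."""
    X = [list(x0)]
    for i in range(n_steps):
        X.append(rk4_step(f, X[-1], t0 + i * dt, dt, theta))
    return X


def damped_oscillator(x: Vector, t: float, theta: Sequence[float]) -> Vector:
    """Hypothetical f: x'' + gamma * x' + omega^2 * x = 0 written as a first-order system."""
    omega, gamma = theta
    return [x[1], -omega ** 2 * x[0] - gamma * x[1]]


# Observations X = {x(t_i)} on t_i = 0, 0.01, ..., 10 with theta = (omega, gamma) = (2.0, 0.1)
X = trajectory(damped_oscillator, [1.0, 0.0], 0.0, 0.01, 1000, (2.0, 0.1))
print(len(X), X[-1])
```

Here `trajectory` corresponds to the discrete map $\mathbf{x}_{n+1} = g(\mathbf{x}_n, t_n; \theta)$. A reservoir computer would try to learn such a map directly from `X`, whereas a neural ODE would replace `damped_oscillator` itself.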

# Estimating Equations from Data 

* Another approach is to estimate the equations of the dynamical system directly from data, i.e. to reconstruct the symbolic expression of $f$ rather than merely approximating $f$ numerically with a function approximator such as an ANN

* There are several approaches to this; here we group them into two categories:

* Fitting the data to models that already have a certain analytical form (often done in **System Identification**) 
    * Most regression tasks can be seen to fall into this category, including ANNs
    * However, for some applications it is difficult to learn much about the underlying dynamics from a trained ANN
    * Other candidate models include e.g. nonlinear autoregressive moving average (NARMA) processes
    * One particularly popular approach in climate dynamics is Empirical Model Reduction ([S. Kravtsov, D. Kondrashov, and M. Ghil](https://journals.ametsoc.org/view/journals/clim/18/21/jcli3544.1.xml))
        * Primarily intended for stochastic systems
        * Multi-level linear regression
        * It has been successfully applied e.g. to the El Niño Southern Oscillation
        * The data are often pre-processed with a principal component analysis (PCA) and then modelled with a nested, iterative regression
        
       
* **Symbolic Regression** 

    * Instead of prescribing a concrete functional form in advance, can't we let an algorithm find the functional form?
    * Symbolic regression tries to find the mathematical expression that best fits the data $\mathbf{X}$
    * Applied to dynamical systems, it usually tries to find the mathematical expression for the right-hand side $f$ of the system
    * For this we often also need derivative data $d\mathbf{x}/dt$ at the sample times (e.g. computed with finite differences)
    * Symbolic regression usually uses a dictionary of possible expressions (e.g. polynomials up to a certain degree, trigonometric functions, etc.) and then performs a regression to determine the coefficients or parameters of these elementary functions
    * But there are infinitely many possible combinations of expressions:
    
    ![Symbolic Regression](assets/slice2.jpg)
    
    * Most natural laws and equations involve only a handful of terms: a candidate model should be complex enough to replicate the behaviour of the system but also "simple" (see Occam's razor)
    * Therefore some form of sparsity constraint is often applied to the regression, and one chooses to consider only certain operations and expressions
    * [AI Feynman by Udrescu and Tegmark](https://arxiv.org/abs/1905.11481) attracted some attention: they combine symbolic regression with several pre- and post-processing steps and successfully apply it to equations from the Feynman Lectures on Physics
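The dictionary-plus-sparse-regression idea can be sketched in a few lines. The minimal pure-Python example below is an illustration, not the course's Julia code. It does three things:

* it computes derivative data with central finite differences,
* it evaluates a library of all monomials in $(x, y)$ up to degree 2,
* it applies sequentially thresholded least squares (STLSQ), the sparse regression at the core of SINDy: fit all terms, set coefficients below a threshold to zero, and refit on the remaining terms.

The test system, the library and the threshold of 0.05 are assumptions chosen for illustration. The test system is a damped linear oscillator sampled from its exact solution.

```python
import math
from typing import List

# Hypothetical candidate library: all monomials in (x, y) up to degree 2.
NAMES = ["1", "x", "y", "x^2", "x*y", "y^2"]


def library(x: float, y: float) -> List[float]:
    """Evaluate every candidate term of the library at one state (x, y)."""
    return [1.0, x, y, x * x, x * y, y * y]


def central_diff(values: List[float], dt: float) -> List[float]:
    """Second-order central differences (v[i+1] - v[i-1]) / (2 dt) at interior points."""
    return [(values[i + 1] - values[i - 1]) / (2.0 * dt) for i in range(1, len(values) - 1)]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def least_squares(Theta: List[List[float]], target: List[float], active: List[int]) -> List[float]:
    """Least-squares coefficients for the library columns in `active`, via the normal equations."""
    cols = [[row[j] for row in Theta] for j in active]
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * b for a, b in zip(ci, target)) for ci in cols]
    return solve_linear(gram, rhs)


def stlsq(Theta: List[List[float]], target: List[float], threshold: float, max_iter: int = 10) -> List[float]:
    """Sequentially thresholded least squares: fit, zero out small coefficients, refit the rest."""
    n_terms = len(Theta[0])
    active = list(range(n_terms))
    xi = [0.0] * n_terms
    for _ in range(max_iter):
        xi = [0.0] * n_terms
        for j, c in zip(active, least_squares(Theta, target, active)):
            xi[j] = c
        small = [j for j in active if abs(xi[j]) < threshold]
        if not small:
            break
        for j in small:
            xi[j] = 0.0
        active = [j for j in active if j not in small]
        if not active:
            break
    return xi


# Synthetic data (assumption): dx/dt = -0.1 x + 2 y, dy/dt = -2 x - 0.1 y,
# sampled from its exact solution with x(0) = 1, y(0) = 0 on t = 0, 0.01, ..., 10.
dt = 0.01
ts = [i * dt for i in range(1001)]
xs = [math.exp(-0.1 * t) * math.cos(2.0 * t) for t in ts]
ys = [-math.exp(-0.1 * t) * math.sin(2.0 * t) for t in ts]

dxdt, dydt = central_diff(xs, dt), central_diff(ys, dt)
Theta = [library(x, y) for x, y in zip(xs[1:-1], ys[1:-1])]  # align with the interior points
xi_x = stlsq(Theta, dxdt, threshold=0.05)
xi_y = stlsq(Theta, dydt, threshold=0.05)

for name, xi in [("x", xi_x), ("y", xi_y)]:
    terms = " ".join(f"{c:+.4f}*{n}" for c, n in zip(xi, NAMES) if c != 0.0)
    print(f"d{name}/dt = {terms}")
```

The recovered model should contain only the $x$ and $y$ terms, with coefficients close to $(-0.1, 2)$ and $(-2, -0.1)$. The small deviation is the $O(\Delta t^2)$ error of the central differences. For brevity the sketch solves the normal equations directly; a QR-based least-squares solver is numerically preferable. With noisy data, finite differences amplify the noise, so both the derivative estimate and the threshold need more care.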
    

* All of these methods reach their limits as the complexity of the problem increases and the data become noisy and high-dimensional
* One reason: there are simply too many possible combinations of expressions to consider
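A quick count makes this combinatorial growth concrete under one simplifying assumption: the candidates are restricted to polynomials. A library of all monomials of total degree at most $d$ in $n$ variables has $\binom{n+d}{d}$ terms. Every subset of a $K$-term library is a possible sparse model, which gives $2^K$ candidates. The function names below are hypothetical helpers for this illustration.

```python
from math import comb


def n_monomials(n_vars: int, max_degree: int) -> int:
    """Number of monomials of total degree <= max_degree in n_vars variables: C(n_vars + d, d)."""
    return comb(n_vars + max_degree, max_degree)


def n_candidate_models(n_terms: int) -> int:
    """Number of distinct subsets of a K-term library, i.e. candidate sparse models: 2^K."""
    return 2 ** n_terms


for n_vars, degree in [(1, 3), (3, 3), (10, 3)]:
    k = n_monomials(n_vars, degree)
    digits = len(str(n_candidate_models(k)))
    print(f"{n_vars:2d} variables, degree <= {degree}: {k:4d} terms, 2^{k} subsets ({digits} digits)")
```

Three variables with terms up to degree 3 already give 20 library terms and $2^{20} = 1\,048\,576$ possible subsets. Ten variables give 286 terms, far too many subsets to enumerate. This is why sparsity-promoting regression, rather than exhaustive search, is used in practice.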

