# Discovering the KdV equation from data

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from scipy.io import loadmat

from pysindy import FiniteDifference # Borrowing finite difference function from PySINDy 
from pysr import PySRRegressor

## Importing the Dataset

In [None]:
KdV_data = loadmat('./kdv_data.mat')

In [None]:
u = KdV_data['u']
x = KdV_data['x'].flatten()
t = KdV_data['t'].flatten()
dt = t[1] - t[0]
dx = x[1] - x[0]

Checking the shape of the data

In [None]:
print(' Number of time points:', t.shape, '\n Number of spatial points:', x.shape, '\n Shape of u:', u.shape)

The imported data therefore has the shape u(T,X): the first axis is time and the second is space.
You should also check that time and space are indeed sampled at a fixed interval (open `t` and `x` to check!)
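The uniform-sampling check mentioned above can also be done programmatically. The following is a minimal sketch; the helper name `is_uniform` and the demo grids are illustrative assumptions, not part of the dataset or of any library.

```python
import numpy as np

def is_uniform(grid, rtol=1e-6):
    """Return True if consecutive grid points are equally spaced (hypothetical helper)."""
    steps = np.diff(np.asarray(grid, dtype=float))
    # Compare every step to the first one, using a relative tolerance only
    return bool(np.allclose(steps, steps[0], rtol=rtol, atol=0.0))

# Synthetic stand-ins for `t` and `x`
t_demo = np.linspace(0.0, 1.0, 101)        # equally spaced
x_demo = np.array([0.0, 0.1, 0.2, 0.35])   # last step is larger

print(is_uniform(t_demo), is_uniform(x_demo))
```

In the notebook, the same helper could be called on the loaded `t` and `x` before computing `dt` and `dx`.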

## Visualise

It is always a good idea to have a sanity check that the data make sense and see if there are potential challenges.

In [None]:
plt.plot(x,u[-1,:])
plt.xlabel('x')
plt.ylabel('u')
plt.title('KdV solution at final time')
plt.show()

In [None]:
plt.plot(t, u[:, len(x) // 2])  # index at the middle of the spatial domain
plt.xlabel('t')
plt.ylabel('u')
plt.title('Temporal evolution at the middle of the spatial domain')
plt.show()

In [None]:
# Plot the full space-time field u(x, t)
plt.figure()
plt.pcolormesh(x, t, u)
plt.colorbar()
plt.xlabel('x', fontsize=16)
plt.ylabel('t', fontsize=16)
plt.title(r'$u(x, t)$', fontsize=16)
plt.show()


Things to look out for:
- What are the scales of `t`, `x`, and `u`?
    - That gives you a sense of the expected scale of the derivatives. 
    - Do you need to rescale things to make the learning easier?
- What are the time scale and length scale of the signal? 
    - Am I sampling frequently enough (e.g. above the Nyquist rate)?
- If the signal is noisy, can you at least make out the length scale and time scale? That will help you choose the right derivative scheme. For example, if the data are noisy but frequently sampled, you may want to use a weak formulation to filter out the high-frequency noise.
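As a rough way to answer the sampling question above, one can estimate the dominant wavelength of a spatial snapshot with an FFT and count how many grid points fall within it. This is a sketch on a synthetic sine wave; `points_per_wavelength` is a hypothetical helper, and for a broadband signal such as a soliton the spectral peak is only an approximate guide.

```python
import numpy as np

def points_per_wavelength(signal, spacing):
    """Estimate samples per dominant wavelength of a uniformly sampled 1D signal."""
    signal = np.asarray(signal, dtype=float)
    # Remove the mean so the zero-frequency bin does not dominate
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=spacing)
    # Skip the zero-frequency bin when searching for the peak
    k_peak = freqs[np.argmax(spectrum[1:]) + 1]
    wavelength = 1.0 / k_peak
    return wavelength / spacing

# Synthetic stand-in for one row of `u`: a sine with wavelength 2.5
x_demo = np.linspace(0.0, 10.0, 200, endpoint=False)
snapshot = np.sin(2 * np.pi * x_demo / 2.5)

ppw = points_per_wavelength(snapshot, x_demo[1] - x_demo[0])
print(ppw)
```

The Nyquist criterion requires more than 2 points per wavelength; finite-difference derivatives generally need many more than that to be accurate.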

## Learning from analytically derived temporal and spatial derivatives (The easier option)
To get started, let's do the easier option of making use of the (noiseless) analytically derived spatial and temporal derivatives. 

Note that derivatives are not normally provided as part of a dataset. The derivatives provided here are purely for demonstrating the package.

In [None]:
# Loading the provided derivatives
u_x = KdV_data['u_x'] # Already flattened and ordered in columns of features
u_t = KdV_data['u_t'] # Already flattened 

print('Data loaded successfully\n')


The variable `u_x` contains the spatial derivatives `[u ux uxx uxxx uxxxx]` at each location, stored as flattened columns.
The variable `u_t` contains the time derivative at each location, stored as a flattened column.
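To make the flattened layout described above concrete, here is a small sketch with made-up arrays. It assumes NumPy's default row-major flattening; the ordering used inside the provided file is not stated here, and what matters is that features and target are flattened in the same order so that each row refers to the same (t, x) point.

```python
import numpy as np

# Hypothetical small fields with shape (n_t, n_x), mirroring u(T, X)
n_t, n_x = 3, 4
u_demo = np.arange(n_t * n_x, dtype=float).reshape(n_t, n_x)
ux_demo = 10.0 * u_demo   # placeholder values, not a real derivative
ut_demo = -u_demo         # placeholder values, not a real derivative

# One column per candidate term, one row per (t, x) sample
features = np.column_stack([u_demo.flatten(), ux_demo.flatten()])
target = ut_demo.flatten()

# With row-major flattening, row k maps to time index k // n_x and space index k % n_x
k = 6
i_t, i_x = divmod(k, n_x)
print(features.shape, target.shape)
print(features[k, 0] == u_demo[i_t, i_x], target[k] == ut_demo[i_t, i_x])
```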

In [None]:

model = PySRRegressor ## Your Code here
model.fit ## Your Code here
print('KdV Train Test \n')
print(model.get_best().equation)

The learned output is written in terms of `x_0`, `x_1`...`x_4`, which correspond to `u`, `ux`...`uxxxx`, i.e. to the columns of the variable `u_x`.
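For orientation, the KdV equation is commonly written in the form below. The coefficients are an assumption about the convention; this dataset may use a different scaling, so compare the structure of the learned expression rather than the exact numbers.

$$
u_t + 6\,u\,u_x + u_{xxx} = 0
\quad\Longleftrightarrow\quad
u_t = -6\,x_0\,x_1 - x_3
$$

Under that assumption, a successful fit should contain a product of `x_0` and `x_1` plus a term linear in `x_3`, with no dependence on `x_2` or `x_4`.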

## Learning from FD approximation to temporal and spatial derivatives (The realistic option)
In reality, derivative data are rarely provided. Instead, they are approximated numerically.

There are many methods for approximating derivatives from data, from smoothing with a Savitzky-Golay filter, to Fourier decomposition, to the weak formulation. 
Here, we use the most classic one, finite differences, and we are going to borrow the built-in `FiniteDifference` class from `PySINDy`.
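To get a feel for how finite-difference error grows with derivative order, the sketch below differentiates `sin(x)` three times and compares against the exact answer, with and without a little noise. It uses `np.gradient` as a stand-in for PySINDy's `FiniteDifference` (a swapped-in technique, chosen so the example only needs NumPy); the grid, noise level, and helper name are illustrative assumptions.

```python
import numpy as np

def third_derivative(f, h):
    """Apply a second-order central difference three times (hypothetical helper)."""
    d1 = np.gradient(f, h)
    d2 = np.gradient(d1, h)
    return np.gradient(d2, h)

x_demo = np.linspace(0.0, 2.0 * np.pi, 201)
dx_demo = x_demo[1] - x_demo[0]
f = np.sin(x_demo)

# Ignore points near the edges, where one-sided stencils are less accurate
interior = slice(5, -5)

# Exact derivatives: d/dx sin = cos, d^3/dx^3 sin = -cos
err1 = np.max(np.abs(np.gradient(f, dx_demo) - np.cos(x_demo))[interior])
err3 = np.max(np.abs(third_derivative(f, dx_demo) + np.cos(x_demo))[interior])

# Small additive noise is amplified strongly by repeated differentiation
rng = np.random.default_rng(0)
f_noisy = f + 1e-3 * rng.standard_normal(f.size)
err3_noisy = np.max(np.abs(third_derivative(f_noisy, dx_demo) + np.cos(x_demo))[interior])

print(err1, err3, err3_noisy)
```

The third-derivative error is larger than the first-derivative error even on clean data, and far larger once noise is present, which is worth keeping in mind for a term like `uxxx`.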

### Generating the derivatives

In [None]:
# Time and space derivatives
ut = ## Your code here
ux = FiniteDifference(d=1, axis=1)._differentiate(u, dx)
## Your code here

In [None]:
target = ut.flatten()
feature = ## Your code here

In [None]:
model_real = PySRRegressor ## Your code here

model_real.fit(feature,target)
print('KdV Train Test \n')
print(model_real.get_best().equation)

The learned result may not converge to the equation you want. Why is that the case?
(Hint: Finite Difference is an approximation, not an exact derivative.)

Try to play around with different 
- binary / unary operators,
- derivative method, 
- max depth,
- nested constraints, 
- complexity of each operator,
- populations and population size.

Which combination of hyperparameters gives you better or worse results? Why?
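One convenient way to organise such experiments is to keep the settings in a plain dictionary and vary one entry at a time. The keys below are `PySRRegressor` keyword arguments; every value is an illustrative assumption rather than a recommended or tuned setting, and the name `pysr_kwargs` is hypothetical.

```python
# Illustrative settings only; none of these values are tuned for this dataset
pysr_kwargs = dict(
    niterations=40,                                   # number of search iterations
    binary_operators=["+", "*"],                      # operators combining two terms
    unary_operators=["square"],                       # operators acting on one term
    maxdepth=5,                                       # maximum depth of the expression tree
    nested_constraints={"square": {"square": 0}},     # forbid square(square(...))
    complexity_of_operators={"square": 2},            # penalise use of square
    populations=15,                                   # number of independent populations
    population_size=33,                               # individuals per population
)

for name, value in pysr_kwargs.items():
    print(f"{name}: {value}")
```

The dictionary could then be unpacked as `PySRRegressor(**pysr_kwargs)`, which keeps each trial's configuration easy to log next to the equation it produced.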
