In [None]:
import matplotlib.pyplot as plt
%matplotlib widget
import numpy as np
import scipy as sp
import sklearn
import matplotlib as mpl
import chemiscope
from widget_code_input import WidgetCodeInput
from ipywidgets import Layout, Output, Textarea, HTML, HBox
from scwidgets import (AnswerRegistry, TextareaAnswer, CodeDemo,
                       ParametersBox, PyplotOutput, ClearedOutput,
                       AnimationOutput,CheckRegistry,Answer)
import ase
import functools
import copy
from ase.io import read, write
from ase.calculators import lj, eam
from tqdm.notebook import tqdm
from sklearn.linear_model import Ridge
from sklearn.decomposition import PCA

In [None]:
#### AVOID folding of output cell 

In [None]:
%%html
<style>
.jp-CodeCell.jp-mod-outputsScrolled .jp-Cell-outputArea  {  height:auto !important;
    max-height: 5000px; overflow-y: hidden }
</style>
<style>
.output_wrapper, .output {
    height:auto !important;
    max-height:4000px;  /* your desired max-height here */
}
.output_scroll {
    box-shadow:none !important;
    -webkit-box-shadow:none !important;
}
</style>

In [None]:
check_registry = CheckRegistry() 
answer_registry = AnswerRegistry(prefix="module_07")
display(answer_registry)

In [None]:
module_summary = TextareaAnswer("general comments on this module")
answer_registry.register_answer_widget("module-summary", module_summary)
display(module_summary)

_References:_
- [Nature 559, 547–555 (2018)](https://www.nature.com/articles/s41586-018-0337-2)
- [J. Chem. Phys. 150, 150901 (2019)](https://doi.org/10.1063/1.5091842)
- [C. M. Bishop, Pattern Recognition and Machine Learning, Springer (2006)](https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/)


<a id="data-driven"> </a>

# Data-driven modeling

This module provides a very brief and over-simplified primer on "data-driven" modeling. 
In abstract terms, a data-driven approach attempts to establish a relationship between _input_ data and _target_ properties by recognizing or "learning" patterns in the data itself, instead of using deductive reasoning, i.e. without proceeding through a series of logical steps that start from a hypothesis on the physical behavior of a system. 
Once a generic, mathematically flexible model has been chosen, the empirical association between inputs and targets is taken as the only basis to establish an _inductive_ relationship between them: we only look at what the data shows to be a strong correlation, without reasoning about causal links or about a coherent theory.

The traditional scientific method proceeds through a combination of induction and deduction, while data-driven approaches are intended to be entirely inductive. On the risks of purely inductive reasoning, see [Bertrand Russell's inductivist chicken story](http://www.ditext.com/russell/rus6.html). 
In practice, _inductive biases_ are often included in the modeling, by means of the choices that are made in the construction and the tuning of the model itself: this is how a component of physics-inspired (deductive) concepts can make it back into machine learning. 

As the most primitive data-driven model, consider the case of _linear regression_. 
A set of $n_\mathrm{train}$ data points and targets $\{x_i, y_i\}_{i=1}^{n_\mathrm{train}}$ is assumed to follow a linear relationship of the form $y(x)=a x$, where the slope $a$ is an adjustable parameter. 
For a given value of $a$, one can compute the _loss_, which measures the discrepancy between the true values of the targets and the predictions of the model. Here we use the mean squared error (the squared $L^2$ norm of the residuals, divided by $n_\mathrm{train}$):

$$
\ell = \frac{1}{n_\mathrm{train}} \sum_{i=1}^{n_\mathrm{train}} (y(x_i)-y_i)^2 
$$
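
As a small illustration, the loss above can be evaluated directly with NumPy. The sketch below uses a hypothetical toy dataset (not the data shown in the widget) and scans a grid of slopes to find the one with the lowest loss, which is essentially what you do by hand with the slider.

```python
import numpy as np

def linear_loss(a, x, y):
    """Mean squared error of the model y(x) = a*x on the pairs (x_i, y_i)."""
    return np.mean((a * x - y) ** 2)

# hypothetical toy data: 20 points scattered around a line of slope 1.5
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=20)
y = 1.5 * x + rng.uniform(-1, 1, size=20)

# brute-force scan of candidate slopes, keeping the one with the smallest loss
a_grid = np.linspace(-5, 5, 1001)
losses = np.array([linear_loss(a, x, y) for a in a_grid])
a_best = a_grid[np.argmin(losses)]
print(f"best slope on the grid: {a_best:.2f}, loss: {losses.min():.3f}")
```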

This widget allows you to play around with the core idea of linear regression: by adjusting the value of $a$ you can minimize the discrepancy between predictions and targets, and find the best model within the class chosen to represent the input-target relationship.

In [None]:
np.random.seed(1234)

lr_demo_figure, _ = plt.subplots(1, 1, tight_layout=True)
lr_pyplot_output = PyplotOutput(lr_demo_figure)

lr_x = (np.random.uniform(size=20)-0.5)*10
lr_y = 2.33*lr_x+(np.random.uniform(size=20)-0.5)*2
def lr_plot(a,visualizers):
    ax = visualizers[1].figure.get_axes()[0]
    ax.plot(lr_x, lr_y, 'b.')
    ax.plot([-5,5],[-5*a,5*a], 'r--')
    l = np.mean((lr_y-a*lr_x)**2)
    ax.text(-4,8,rf'$\ell = ${l:.3f}')
    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')
    
    
linear_regression_pb = ParametersBox(a=(1.0, -5.0, 5.0, 0.1, r'$a$'),refresh_mode="continuous")
    
linear_regression_code_demo = CodeDemo(
            input_parameters_box=linear_regression_pb,
            visualizers = [ClearedOutput(),lr_pyplot_output],
            update_visualizers = lr_plot)
display(linear_regression_code_demo)
linear_regression_code_demo.run_demo()

<span style="color:blue">**01** What is (roughly) the best value of $a$ that minimizes the loss in the linear regression model? </span>

In [None]:
ex01_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex01-answer", ex01_txt)
display(ex01_txt)

In a linear regression model, the loss can be minimized in closed form, by setting $\partial \ell/\partial a = 0$ and solving for $a$.

<span style="color:blue">**02** Write the expression for the optimal $a$ for a one-dimensional linear regression problem where the loss is optimized on pairs of inputs and targets $(x_i, y_i)$ </span>

In [None]:
ex02_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex02-answer", ex02_txt)
display(ex02_txt)

This approach can easily be generalized to more complex models: in the most general terms, $\ell$ can be minimized numerically, by computing the derivatives of $y(x)$ with respect to the model parameters. 
Here we consider the simpler case of a polynomial model of degree $d$, in which $y(x)=\sum_{k=0}^{d} w_k x^k$. 


_NB: this is a very bad choice of a polynomial basis to expand the function (most notably, because the different polynomials are not orthogonal). We are just doing this as a simple example, never try this for a real problem!_

This can actually be seen as a special case of multi-dimensional linear regression, where each sample is described by several _features_ (often referred to as _descriptors_ in materials science). Here each sample has a $(d+1)$-dimensional feature vector with entries $x_{ik}=x_i^{k}$, $k=0,\ldots,d$, and the target is recovered as a dot product $y(\mathbf{x}_i) = \mathbf{w}\cdot\mathbf{x}_i$.
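
The feature matrix of such a polynomial model can be built in one call with `np.vander`. Including the constant term $k=0$, a polynomial of degree $d$ gives $d+1$ features per sample, and the predictions reduce to a matrix-vector product. The weights below are arbitrary values chosen for illustration.

```python
import numpy as np

d = 3                                      # polynomial degree (illustrative)
x = np.array([-1.0, 0.5, 2.0])             # three scalar inputs x_i
X = np.vander(x, d + 1, increasing=True)   # row i is [1, x_i, x_i^2, x_i^3]
w = np.array([1.0, -2.0, 0.5, 0.1])        # arbitrary weights w_0 ... w_3

y_matrix = X @ w                                 # y(x_i) = w . x_i for all samples at once
y_loop = sum(w[k] * x**k for k in range(d + 1))  # same polynomial, term by term
print(np.allclose(y_matrix, y_loop))             # True
```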


Play around with the widget below. It is _really_ difficult to fit the model manually!

In [None]:
# parameters of the polynomial target function
npoly = 5
pr_w = [3, 1, 1, -0.3, -0.05, 0.01]

np.random.seed(12345)
pr_x = (np.random.uniform(size=20)-0.5)*10
pr_y = (np.random.uniform(size=20)-0.5)*3
for k in range(len(pr_w)):
    pr_y += pr_w[k]*pr_x**k
pr_demo_figure, _ = plt.subplots(1, 1, tight_layout=True)
pr_pyplot_output = PyplotOutput(pr_demo_figure)    
def pr_plot(w_0,w_1,w_2,w_3,w_4,w_5,visualizers):    
    ax = visualizers[1].figure.get_axes()[0]    
    xx = np.linspace(-5, 5, 60)
    yy = np.zeros(len(xx))
    lw = [w_0,w_1,w_2,w_3,w_4,w_5]
    my = pr_x*0
    ty = np.zeros(len(xx))
    fy = np.zeros(len(xx))
    for k in range(len(lw)):
        yy += lw[k]*xx**k
        my += lw[k]*pr_x**k
        ty += pr_w[k]*xx**k
    
    l = np.mean((pr_y-my)**2)
    ax.plot(pr_x, pr_y, 'b.', label="train data")
    ax.plot(xx, yy, 'r--', label="manual fit")
    ax.text(-4,-1,rf'$\ell = ${l:.3f}')
    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')
    ax.set_ylim(min(pr_y)-1, max(pr_y)+1)
pr_pb = ParametersBox(
    w_0=(1.0, -5.0, 5.0, 0.01,  r'$w_0$'),
    w_1=(0.01, -2.0, 2.0, 0.01, r'$w_1$'),
    w_2=(0.01, -1.0, 1.0, 0.01, r'$w_2$'),
    w_3=(-0.2, -1.0, 1.0, 0.01, r'$w_3$'),
    w_4=(0.01, -0.1, 0.1, 0.01, r'$w_4$'),
    w_5=(0.01, -0.1, 0.1, 0.01, r'$w_5$'),
    refresh_mode="continuous"
)
polynomial_regression_code_demo = CodeDemo(
            input_parameters_box=pr_pb,
            visualizers = [ClearedOutput(),pr_pyplot_output],
            update_visualizers = pr_plot)
display(polynomial_regression_code_demo)
polynomial_regression_code_demo.run_demo()

The loss can be written in vector form, $\ell \propto \sum_i (\mathbf{w}\cdot\mathbf{x}_i - y_i)^2 = \|\mathbf{X}\mathbf{w}-\mathbf{y}\|^2$, where the rows of the matrix $\mathbf{X}\in\mathbb{R}^{n_\mathrm{train}\times (d+1)}$ are the feature vectors $\mathbf{x}_i$ (with entries $x_{ik}=x_i^k$) and $\mathbf{y}\in\mathbb{R}^{n_\mathrm{train}}$ collects the targets. Setting the gradient of $\ell$ with respect to $\mathbf{w}$ to zero gives a closed expression for the optimal weight vector,

$$
\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}.
$$
See [here](https://en.wikipedia.org/wiki/Linear_regression#Least-squares_estimation_and_related_techniques) if you are interested in the derivation.
If you compare this expression with the one-dimensional case, you will immediately see how it generalizes that result to multiple dimensions.
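
In code, it is usually better to solve the linear system $(\mathbf{X}^T\mathbf{X})\mathbf{w}=\mathbf{X}^T\mathbf{y}$ than to form the inverse explicitly, and `np.linalg.lstsq` (also used in the demos below) works on $\mathbf{X}$ directly through an SVD, without forming $\mathbf{X}^T\mathbf{X}$. A sketch on synthetic cubic data, where the "true" weights are an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, size=50)
X = np.vander(x, 4, increasing=True)         # features 1, x, x^2, x^3
w_true = np.array([3.0, 1.0, 1.0, -0.3])     # illustrative "true" weights
y = X @ w_true + rng.normal(scale=0.1, size=len(x))

# normal equations (X^T X) w = X^T y, solved without computing the inverse
w_normal = np.linalg.solve(X.T @ X, X.T @ y)
# SVD-based least squares on X itself, numerically more robust
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.round(w_normal, 3))
print(np.round(w_lstsq, 3))
```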

We can now start looking at more realistic issues that arise in the context of regression models. First, data can contain a certain level of _noise_. This can be actual random noise, or (often) hidden input features or relationships that cannot be captured by the chosen model. Second, a model that predicts the targets only for the data it has been trained on is of very little use: we want to be able to make predictions on unseen data.
For this reason, it is customary to set aside a fraction of the available data that is not used to determine the weights in the minimization of $\ell$. The error on this _test set_ indicates how well the model will perform when predicting new points, i.e. how well the model _generalizes_.
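
A sketch of this procedure with NumPy, on hypothetical noisy data fitted by a degree-5 polynomial (the helper `train_test_rmse` is just an illustrative name, not part of any library). With as many training points as weights, the model interpolates the train set exactly, so the train RMSE vanishes while the test RMSE does not.

```python
import numpy as np

def train_test_rmse(X, y, n_train, seed=0):
    """Fit least squares on n_train randomly chosen samples and return
    the RMSE on those (train) and on all remaining samples (test)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))              # shuffled sample indices
    itrain, itest = idx[:n_train], idx[n_train:]
    w = np.linalg.lstsq(X[itrain], y[itrain], rcond=None)[0]

    def rmse(i):
        return np.sqrt(np.mean((X[i] @ w - y[i]) ** 2))

    return rmse(itrain), rmse(itest)

# hypothetical noisy data, fitted with a degree-5 polynomial (6 weights)
rng = np.random.default_rng(2)
x = rng.uniform(-5, 5, size=200)
y = np.sin(x) + rng.normal(scale=0.5, size=len(x))
X = np.vander(x / 5, 6, increasing=True)       # inputs rescaled to [-1, 1] for conditioning

for n_train in (6, 20, 150):
    train, test = train_test_rmse(X, y, n_train)
    print(f"n_train={n_train:4d}  train RMSE={train:.3f}  test RMSE={test:.3f}")
```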

In [None]:
lm_demo_figure, _ = plt.subplots(1, 1, tight_layout=True)
lm_pyplot_output = PyplotOutput(lm_demo_figure) 
def lm_plot(noise,hidden, npoints, tgt, fit, visualizers):
    # npoints are training points
    ax = visualizers[1].figure.get_axes()[0]    

    min_npoints = 10
    max_npoints = 150
    ntest = 50
    poly_degree = 6
    
    xx = np.linspace(-5, 5, 60)
    yy = np.zeros(len(xx))
    np.random.seed(54321)
    pr_x = (np.random.uniform(size=max_npoints+ntest)-0.5)*10
    pr_X = np.vstack( [pr_x**k for k in range(poly_degree)]).T
    pr_y = (np.random.uniform(size=len(pr_x))-0.5)*noise
    for k in range(len(pr_w)):
        pr_y += pr_w[k]*pr_x**k
    pr_y += hidden*np.sin(pr_x*4)
    
    fit_w = np.linalg.lstsq(pr_X[:npoints], pr_y[:npoints], rcond=None)[0]  
    my = pr_x*0
    ty = np.zeros(len(xx))
    fy = np.zeros(len(xx))
    pr_fy = np.zeros(len(pr_x))
    for k in range(len(pr_w)):                
        ty += pr_w[k]*xx**k
    for k in range(poly_degree):
        fy += fit_w[k]*xx**k
        pr_fy += fit_w[k]*pr_x**k
    ty += hidden*np.sin(xx*4)
    
    
    l = np.mean((pr_y-pr_fy)[:npoints]**2)    
    lte = np.mean((pr_y-pr_fy)[max_npoints:]**2)
    ax0 = ax
    ax0.plot(pr_x[:npoints], pr_y[:npoints], 'b.', label="train data")
    ax0.plot(pr_x[max_npoints:], pr_y[max_npoints:], 'kx', label="test data")
        
    if tgt:
        ax0.plot(xx, ty, 'b:', label="true target")
    if fit:
        ax0.plot(xx, fy, 'b--', label="best fit")
    
    ax0.set_ylim(min(ty)-1, max(ty)+1+noise/2)
    ax0.text(0.1,0.15,rf'$\mathrm{{RMSE}}_\mathrm{{train}} = ${np.sqrt(l):.3f}', transform=ax0.transAxes, c='r')
    ax0.text(0.1,0.05,rf'$\mathrm{{RMSE}}_\mathrm{{test}} = ${np.sqrt(lte):.3f}', transform=ax0.transAxes, c='r')
    ax0.set_xlabel('$x$')
    ax0.set_ylabel('$y$')
    ax0.legend(loc="upper right")

""" TODO learning curve plot! possibly better as a later     
    n_samples = np.geomspace(min_npoints, max_npoints, 5, dtype=int)
    test_rmse = [np.sqrt(np.mean((pr_X[max_npoints:]@np.linalg.lstsq(pr_X[:n], pr_y[:n], rcond=None)[0] - pr_y[max_npoints:])**2))
                for n in n_samples]
    train_rmse = [np.sqrt(np.mean((pr_X[:n]@np.linalg.lstsq(pr_X[:n], pr_y[:n], rcond=None)[0] - pr_y[:n])**2))
            for n in n_samples]
    ax[1].plot(n_samples, test_rmse, marker='o', color='b', label='test')
    ax[1].plot(n_samples, train_rmse, marker='x', color='black', label='train')
    ax[1].set_xlabel('$n_\mathrm{train}$')
    ax[1].set_ylabel('$\mathrm{{RMSE}}$')
    ax[1].legend(loc="upper right")
    ax[1].set_ylim(0, 6.5)
    ax[1].set_xlim(0, max_npoints)
"""
    
linear_model_pb = ParametersBox(    
        noise=(5.0, 0.1,10,0.1, 'Noise'),
        hidden=(0.0, 0, 5,0.01, 'Hidden', {"readout_format" : ".2f"}),
        npoints=(20, 5, 150, 1, r'$n_\mathrm{train}$'),
        # poly_degree=(6, 2, 10, 1, r'$d$ degree of polynomial'),
        tgt=(False, r'Show true target'),
        fit=(True, r'Show best fit'))
    #fig_ax=fig_ax
linear_model_demo = CodeDemo(
            input_parameters_box=linear_model_pb,
            visualizers = [ClearedOutput(),lm_pyplot_output],
            update_visualizers = lm_plot)
display(linear_model_demo)
linear_model_demo.run_demo()

<span style="color:blue">**03a** Compare the error on the train and the test sets. Which is typically higher? How do train and test errors change when you change the number of training points from the lowest to the highest level?  </span>

In [None]:
ex03a_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex03a-answer", ex03a_txt)
display(ex03a_txt)

<span style="color:blue">**03b** How do the train and test losses change when the level of noise is increased? And how do they change when the level of hidden relationships is increased or decreased? Is there a clear difference between the effect of noise and that of hidden terms? </span>

In [None]:
ex03b_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex03b-answer", ex03b_txt)
display(ex03b_txt)

The tendency to achieve a very low loss on the train set together with a much larger test set error is a general phenomenon known as [_overfitting_](https://en.wikipedia.org/wiki/Overfitting). Overfitting is usually particularly severe when the train set is small compared to the number of model parameters. Polynomial regression with a high degree is notorious for overfitting.

A common strategy to avoid overfitting is known as _regularization_. In broad terms, regularization penalizes solutions of the model that vary too rapidly and favours smoother ones, even at the cost of a slight increase in the train set error. For linear models, the most common approach is to limit the variance of the model, which depends on how large a region of parameter space the weights can span: a penalty on the norm of the weights is therefore added to the loss function. The most popular choice, based on the $L^2$ norm, is known as [Tikhonov regularization](https://en.wikipedia.org/wiki/Tikhonov_regularization), and extends the loss with an $L^2$ penalty on the weights:

$$
\ell = \frac{1}{n_\mathrm{train}} \sum_i (\mathbf{w}\cdot \mathbf{x}_i - y_i)^2 + \lambda \|\mathbf{w}\|^2_2.
$$

The model corresponding to this loss function is referred to as $L^2$-regularized least-squares estimator or ridge regression. The $L^2$-regularization is the most popular approach, because it yields a closed-form solution for the weights 

$$
\mathbf{w} = (\mathbf{X}^T\mathbf{X}+n_\mathrm{train}\lambda \mathbf{1})^{-1}\mathbf{X}^T\mathbf{y},
$$

and can therefore be computed efficiently. Alternative choices of norm, both in the error and in the regularization, can lead to other desirable behaviors. For example, regularization with the $L^1$ norm, known as [Lasso](https://en.wikipedia.org/wiki/Lasso_(statistics)), is typically used to obtain a _sparse_ weight vector, i.e. one in which many entries are exactly zero.
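
A short sketch, on hypothetical random data, that checks the closed-form ridge solution against scikit-learn and shows the sparsity induced by the $L^1$ penalty. Because the data term of the loss above is divided by $n_\mathrm{train}$, the penalty enters the normal equations as $n_\mathrm{train}\lambda$; scikit-learn's `Ridge` does not normalize the squared error, so its `alpha` corresponds to $n_\mathrm{train}\lambda$. scikit-learn's `Lasso` uses yet another normalization, $\frac{1}{2n}\|\mathbf{y}-\mathbf{X}\mathbf{w}\|^2 + \alpha\|\mathbf{w}\|_1$, so the `alpha` values of the two estimators are not directly comparable.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
n_train, n_features = 100, 10
X = rng.normal(size=(n_train, n_features))
w_true = np.zeros(n_features)
w_true[:2] = [3.0, -2.0]                   # only two features actually matter
y = X @ w_true + rng.normal(scale=0.1, size=n_train)

# ridge: closed form for (1/n) sum_i (w.x_i - y_i)^2 + lam |w|^2
lam = 0.1                                  # illustrative regularization strength
w_ridge = np.linalg.solve(X.T @ X + n_train * lam * np.eye(n_features), X.T @ y)

# scikit-learn's Ridge minimizes |y - Xw|^2 + alpha |w|^2, hence alpha = n * lam;
# fit_intercept=False because this model has no separate bias term
ridge = Ridge(alpha=n_train * lam, fit_intercept=False).fit(X, y)
print(np.allclose(w_ridge, ridge.coef_))   # same weights

# lasso: scikit-learn minimizes (1/(2n)) |y - Xw|^2 + alpha |w|_1
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)
print(np.round(ridge.coef_, 2))            # entries are typically all non-zero
print(np.round(lasso.coef_, 2))            # irrelevant entries become exactly 0
```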

This widget allows you to experiment with the effect of ridge regularization on the same polynomial fitting exercise. 

_NB: given that we are using a very poor basis, and different features span widely different scales, the underlying implementation is slightly more complicated, in that different weights are scaled differently before computing the Tikhonov term. This scaling is done so that a single parameter can be meaningfully used to control the regularity of the fit._ 

In [None]:
regularization_demo_figure, _ = plt.subplots(1, 1, tight_layout=True)
regularization_pyplot_output = PyplotOutput(regularization_demo_figure) 

def regularization_plot(noise, hidden, npoints, tgt, fit, lam,visualizers):    
    ax = visualizers[1].figure.get_axes()[0]       
    xx = np.linspace(-5, 5, 60)
    yy = np.zeros(len(xx))
    poly_degree = 6
    np.random.seed(54321)
    xsz = 10
    pr_x = (np.random.uniform(size=2*npoints)-0.5)*xsz
    pr_X = np.vstack( [pr_x**k for k in range(6)]).T
    pr_y = (np.random.uniform(size=len(pr_x))-0.5)*noise
    for k in range(len(pr_w)):
        pr_y += pr_w[k]*pr_x**k
    pr_y += hidden*np.sin(pr_x*4)
    wscale = np.asarray([1/(xsz*0.5)**k for k in range(npoly+1)])
    fit_w = np.linalg.solve(
        pr_X[::2].T@pr_X[::2]+
        10**lam*(npoints//2)*np.diag(wscale**2), 
        pr_X[::2].T@pr_y[::2])
    my = pr_x*0
    ty = np.zeros(len(xx))
    fy = np.zeros(len(xx))
    pr_fy = np.zeros(len(pr_x))
    for k in range(len(pr_w)):                
        ty += pr_w[k]*xx**k
        fy += fit_w[k]*xx**k
        pr_fy += fit_w[k]*pr_x**k
    ty += hidden*np.sin(xx*4)
    
    l = np.mean((pr_y-pr_fy)[::2]**2)
    lte = np.mean((pr_y-pr_fy)[1::2]**2)
    ax.plot(pr_x[::2], pr_y[::2], 'b.', label="train data")
    ax.plot(pr_x[1::2], pr_y[1::2], 'kx', label="test data")    
    
    if tgt:
        ax.plot(xx, ty, 'b:', label="true target")
    if fit:
        ax.plot(xx, fy, 'b--', label="best fit")
    
    ax.set_ylim(min(ty)-1, max(ty)+1+noise/2)    
    ax.text(0.1,0.25,rf'$\mathrm{{RMSE}}_\mathrm{{train}} = ${np.sqrt(l):.3f}', transform=ax.transAxes, c='r')
    ax.text(0.1,0.15,rf'$|\mathbf{{w}}| = ${np.linalg.norm(wscale*fit_w):.3f}', transform=ax.transAxes, c='r')
    ax.text(0.1,0.05,rf'$\mathrm{{RMSE}}_\mathrm{{test}} = ${np.sqrt(lte):.3f}', transform=ax.transAxes, c='r')
    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')
    ax.legend(loc="upper right")
    
regularization_pb = ParametersBox(    
    noise=(5.0, 0.1,10,0.1, 'Noise'),
    hidden=(0.0, 0, 5,0.01, 'Hidden', {"readout_format" : ".2f"}),
    npoints=(10, 5, 100, 1, r'$n_\mathrm{train}$'),
    tgt=(True, r'Show true target'),
    fit=(True, r'Show best fit'),
    lam = (-5.0,-5,5,0.1, r'$\log_{10} \lambda$'))
    #fig_ax=fig_ax
regularization_demo = CodeDemo(
            input_parameters_box=regularization_pb,
            visualizers = [ClearedOutput(),regularization_pyplot_output],
            update_visualizers = regularization_plot)
display(regularization_demo)
regularization_demo.run_demo()

<span style="color:blue">**04a** Work with (noise, hidden, ntrain) = (5,0,10). What is the value of $\lambda$ that minimizes the _test_ error?  </span>

In [None]:
ex04a_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex04a-answer", ex04a_txt)
display(ex04a_txt)

<span style="color:blue">**04b** Working with the same parameters (most importantly, the number of train points), comment on the behavior of the best fit function (smoothness, maximum error from a training data point) and the various diagnostics as you vary the regularization away from the optimum value.  </span>

In [None]:
ex04b_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex04b-answer", ex04b_txt)
display(ex04b_txt)

<span style="color:blue">**04c** Increase the number of training points to 100. How does the behavior of ridge regression change? Is the same value of $\lambda$ still optimal? </span>

In [None]:
ex04c_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex04c-answer", ex04c_txt)
display(ex04c_txt)

The regularization strength $\lambda$ is one of the so-called _hyperparameters_ ("hypers"), which tune the behavior of the model but are not directly optimized on the train set. In this case, the number of polynomial terms is another hyperparameter. Optimizing the hyperparameters on the test set is bad practice, because it amounts to _data leakage_ and makes the test error less representative of the true generalization capabilities of the model. 

We won't go into details, but the usual strategy to optimize the hyperparameters is either to set aside a _validation_ set, used neither for training nor for testing but only to tune the hyperparameter values, or to perform _cross validation_. The latter involves splitting the training set into several subsets (_folds_): each fold is used in turn for validation while the model is trained on the remaining ones, and the validation errors are averaged over the folds.
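
A minimal sketch of $k$-fold cross validation with scikit-learn, on hypothetical noisy data: the regularization strength (called `alpha` in scikit-learn's `Ridge`) is selected using only the training set, and the held-out test set is used once, at the very end. The grid of values and the number of folds are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# hypothetical noisy data fitted with degree-6 polynomial features
rng = np.random.default_rng(4)
x = rng.uniform(-5, 5, size=80)
y = np.sin(x) + rng.normal(scale=0.3, size=len(x))
X = np.vander(x / 5, 7, increasing=True)     # inputs rescaled to [-1, 1]

# the test set is set aside and never used to choose hyperparameters
X_train, y_train = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

# 5-fold cross validation on the training set, for a grid of regularization strengths
cv = KFold(n_splits=5, shuffle=True, random_state=0)
alphas = 10.0 ** np.arange(-6, 3)
cv_mse = []
for alpha in alphas:
    scores = cross_val_score(Ridge(alpha=alpha, fit_intercept=False), X_train, y_train,
                             cv=cv, scoring="neg_mean_squared_error")
    cv_mse.append(-scores.mean())            # scikit-learn reports the negative MSE
best_alpha = alphas[int(np.argmin(cv_mse))]

# refit on the full training set with the selected value, then test once
model = Ridge(alpha=best_alpha, fit_intercept=False).fit(X_train, y_train)
test_rmse = np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2))
print(f"best alpha: {best_alpha:g}, test RMSE: {test_rmse:.3f}")
```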

# Fingerprints and descriptors

The first step in any data-driven study of materials involves codifying the structure and composition of the materials being studied into a mathematical form that is suitable to be used as the input of the subsequent steps. Here we focus in particular on the definition of _fingerprints_, or _descriptors_: numbers associated with each structure, assembled into a _feature vector_ $\mathbf{x}_i$. 

In this module we are going to use a dataset of materials from the [Materials Project](https://materialsproject.org/). The dataset has been reformatted as an extended XYZ file, which can be read, as usual, with the ASE `ase.io.read` function. The target properties (mostly related to elastic behavior) can be accessed from the `structure.info` dictionary of each frame.

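```
import numpy as np
from ase.io import read

data = read("filename.xyz", ":")  # ":" reads all frames into a list
X = []
y = []
for structure in data:
    X.append(get_features(structure))          # your fingerprint function
    y.append(structure.info['property_name'])  # target stored in the frame metadata
X = np.array(X)  # (nstructures x nfeatures) feature matrix
y = np.array(y)
```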

where `get_features` is a function that processes the structural information of each frame to convert it into a set of fingerprints. 

Descriptors can be either precise [_representations_](https://pubs.acs.org/doi/10.1021/acs.chemrev.1c00021) of the coordinates and chemical nature of all atoms in a structure (which are commonplace in the construction of machine-learning interatomic potentials) or fingerprints based on a combination of structural parameters and properties that can be computed easily. For instance, one could take the electronegativity of elements in combination with the point group of the crystal structure. 

Here we take a very simple (and somewhat crude) approach, describing each structure by its chemical composition: the feature $x_{iZ}$ contains the fraction of the atoms of structure $i$ that have atomic number $Z$. This means that, for instance, $\mathrm{H_2}$ would have fingerprint `[1,0,0,0, ... ]`, $\mathrm{LiH}$ `[0.5,0,0.5,0,0, ... ]`, and $\mathrm{Li_2HBe}$ `[0.25,0,0.5,0.25, ... ]`.

<span style="color:blue">**05** Write a function that takes structural information for each frame and returns a vector containing the fractional composition of each compound, to be used as descriptors. </span>

_Hint: ASE `Atoms` objects have a `numbers` attribute that holds an array with the atomic numbers of all atoms in the structure (e.g. a CH4 structure will have [6,1,1,1,1])._

In [None]:
ex05_wci = WidgetCodeInput(
        function_name="descriptor_base", 
        function_parameters="structures",
        docstring="""Computes a feature matrix for the structures given in input,
        which is a vector of the fractional composition of each structure in the dataset, 
        e.g. given [H2, He2, HHe, LiH] returns something like 
        [[1,0,0],[0,1,0],[0.5,0.5,0],[0.5,0,0.5]]. 
        The total vector size depends on how many elements are present in the data set,
        but it's OK if there are zeros. 
        
        :param structures: a list of ase.Atoms structures
        
        :return: a (nstructures x nspecies) feature matrix containing the composition fingerprints
""",
        function_body="""

import numpy as np
from ase.io import read

# allocates plenty of space for all elements
descriptor = np.zeros((len(structures), 100))

# fill the descriptor with composition features
# NB: structures[i].numbers contains a list of the atomic numbers of the atoms in the structure

return descriptor

"""
        )

In [None]:
def array_to_html_table(numpy_array, header):
    rows = ""
    for i in range(len(numpy_array)):
        rows += "<tr>" + functools.reduce(lambda x,y: x+y,
                             map(lambda x: "<td>" + str(x) + "</td>",
                                 numpy_array[i])
                            ) + "</tr>"

    return "<table>" + header + rows + "</table>"


def mk_table_05():
    structures=read('data/mp_elastic.extxyz',':')
    l = ex05_wci.get_function_object()(structures)

    x = []   
    for a,b in enumerate(l):
        s = structures[a].symbols
        x.append([s,np.round(b,2).tolist()])

    header = """<tr>
                  <th>Symbols <span style="padding-left:150px"></th>
                  <th>Fractional composition <span style="padding-left:150px"></th>
                </tr>"""
    demo_table_html = HTML(
        value=f"Table")
    demo_table_html.value = array_to_html_table(x, header)

    demo_table = HBox(layout=Layout(height='270px', overflow_y='auto'))
    demo_table.children += (demo_table_html,) 
    display(demo_table)
    


In [None]:
def ex05_equality(a,b):
    # checks if the Gram matrix is the same, so we avoid differences in order and zeros
    return np.allclose(a@a.T, b@b.T)
def table_updater(code_input,visualizer):
    print_output = visualizer[0]
    with print_output:
        mk_table_05()
        
ex05_code_demo = CodeDemo(
            code_input= ex05_wci,
            check_registry=check_registry,
            visualizers = [ClearedOutput()],
            update_visualizers = table_updater,
            merge_check_and_update_buttons=False)
ex05_reference01 = read('data/mp_elastic.extxyz','::100')
ex05_ref_input = [{"structures" : structure} for structure in ex05_reference01]
ex05_ref_output = np.loadtxt('data/mp_elastic_05ref.txt')

check_registry.add_check(ex05_code_demo,
                         inputs_parameters=ex05_ref_input,
                         reference_outputs = ex05_ref_output,
                         equal=ex05_equality)
answer_registry.register_answer_widget("ex05-function", ex05_code_demo)
display(ex05_code_demo)
ex05_code_demo.run_demo()

<span style="color:blue">**06** Extend the function above to also evaluate powers of the fractional compositions, and combine them in a single feature vector (i.e. to have $[x_\mathrm{H}, x_\mathrm{He}, \ldots x_\mathrm{U}, x_\mathrm{H}^2, x_\mathrm{He}^2 , \ldots]$).</span>

_Hint: you can use `np.hstack([x1, x2, x3])` to combine several matrices side by side: the columns of `x2` and `x3` are appended to the right of those of `x1`, so all matrices must have the same number of rows._
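
A tiny illustration of the hint, with arbitrary matrices that share the same number of rows:

```python
import numpy as np

x1 = np.array([[1, 2], [3, 4]])           # 2 x 2
x2 = np.array([[5], [6]])                 # 2 x 1
x3 = np.array([[7, 8, 9], [10, 11, 12]])  # 2 x 3

stacked = np.hstack([x1, x2, x3])         # 2 x (2 + 1 + 3) = 2 x 6
print(stacked)
# [[ 1  2  5  7  8  9]
#  [ 3  4  6 10 11 12]]
```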

In [None]:
ex06_wci = WidgetCodeInput(
        function_name="descriptor_poly", 
        function_parameters="structures, nmax",
        docstring="""compute the powers of the fractional composition and stack them 
        to form a larger feature vector
        
        structures: a list of structures in `ase.Atoms` format
        nmax : maximum order of the polynomial
""",
        function_body="""

import numpy as np
from ase.io import read

# allocates plenty of space for all elements
descriptor = np.zeros((len(structures), 100))

# fill the descriptor with composition features
# NB: structures[i].numbers contains a list of the atomic numbers of the atoms in the structure


# create a list containing the entry-wise powers of the compositions
# NB: if X is a numpy array, X**k computes the k-th entry-wise power
# NB: you can use `np.hstack` to combine different matrices into a bigger array. Read the doc!
descriptor = np.hstack( LIST_OF_MATRICES )
return descriptor
"""
        )



In [None]:
def mk_table_06():
    structures=read('data/mp_elastic.extxyz',':')
    l = ex06_wci.get_function_object()(structures, ex06_wp.value['n'])

    x = []   
    for a,b in enumerate(l):
        s = structures[a].symbols
        x.append([s,np.round(b,2).tolist()])

    header = """<tr>
                  <th>Symbols <span style="padding-left:150px"></th>
                  <th>Fractional composition up to order n <span style="padding-left:150px"></th>
                </tr>"""
    demo_table_html = HTML(
        value=f"Table")
    demo_table_html.value = array_to_html_table(x[::100], header)

    demo_table = HBox(layout=Layout(height='250px', overflow_y='auto'))
    demo_table.children += (demo_table_html,) 
    display(demo_table)
    
ex06_wp =  ParametersBox(n = (2,1,8,1,r'$n_{max}$'))    


In [None]:

def table_updater2(code_input,visualizer):
    print_output = visualizer[0]
    with print_output:
        mk_table_06()
ex06_code_demo = CodeDemo(
            code_input= ex06_wci,
            check_registry=check_registry,
            visualizers = [ClearedOutput()],
            update_visualizers = table_updater2)
ex06_reference01 = read('data/mp_elastic.extxyz','::100')
ex06_ref_input = [{"structures" : structure,"nmax":3} for structure in ex06_reference01]
ex06_ref_output = np.loadtxt('data/mp_elastic_06ref.txt')

check_registry.add_check(ex06_code_demo,
                         inputs_parameters=ex06_ref_input,
                         reference_outputs = ex06_ref_output,
                         equal=ex05_equality)
answer_registry.register_answer_widget("ex06-function", ex06_code_demo)
display(ex06_code_demo)

**OPTIONAL** 

Here you can define your own fingerprints. Number density? Mass? Element electronegativity? Use your imagination. You will be able to test the performance of these descriptors in the next sections. 

In [None]:
custom_wci = WidgetCodeInput(
        function_name="descriptor_custom", 
        function_parameters="structures",
        docstring="""computes a custom feature matrix for the structures given in input.
""",
        function_body="""

import numpy as np
from ase.io import read

descriptor = np.zeros((len(structures), 10))

return descriptor
"""
        )
def mk_table_custom(code_input,visualizers):
    print_output = visualizers[0]
    structures=read('data/mp_elastic.extxyz',':')
    l = code_input.get_function_object()(structures)

    x = []   
    for a,b in enumerate(l):
        s = structures[a].symbols
        x.append([s,np.round(b,2).tolist()])

    header = """<tr>
                  <th>Symbols <span style="padding-left:150px"></th>
                  <th>Custom descriptors <span style="padding-left:150px"></th>
                </tr>"""
    demo_table_html = HTML(
        value=f"Table")
    demo_table_html.value = array_to_html_table(x[::100], header)

    demo_table = HBox(layout=Layout(height='250px', overflow_y='auto'))
    demo_table.children += (demo_table_html,) 
    with print_output:
        display(demo_table)
custom_demo = CodeDemo(custom_wci,            
            visualizers = [ClearedOutput()],
            update_visualizers = mk_table_custom)
answer_registry.register_answer_widget("custom-function", custom_demo)
display(custom_demo)
custom_demo.run_demo()

# Dimensionality reduction

We begin with a quick example of principal component analysis (PCA), one of the simplest _unsupervised_ learning algorithms (so simple that it can hardly be called machine learning). 
PCA processes a high-dimensional feature matrix $\mathbf{X}$ and projects it _linearly_ into a _latent space_ $\mathbf{T}$ of reduced dimensionality. The problem can be formulated in several equivalent ways: as the identification of the directions of maximum variance in feature space, as the maximisation of the variance retained in the latent space, or as the low-rank orthogonal projection of the feature matrix that minimizes the information loss. 

The figure below demonstrates how PCA works on a simple 2D dataset: the principal axes of the data distribution are identified, making it possible to reduce the description to just 1D while losing the smallest possible amount of information on the relative positions of the points.

<img src="figures/pca.png" width="500"/>

In rigorous terms, PCA amounts to determining the orthogonal projection matrix $\mathbf{P}_{XT}$ that minimizes the loss

$$
\ell = \|\mathbf{X}-\mathbf{X}\mathbf{P}_{XT} \mathbf{P}_{XT}^T\|^2_F, \qquad \mathbf{P}_{XT}^T\mathbf{P}_{XT} = \mathbf{I}
$$
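As a numerical sanity check of the loss above, the following sketch builds the projector from the leading right singular vectors of a synthetic, centered feature matrix (the random data and the sizes are arbitrary assumptions) and compares its loss, evaluated with the Frobenius norm, against that of a random orthogonal projector of the same shape. The PCA loss should also equal the sum of the squared singular values that were discarded.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic, centered feature matrix: 200 samples, n_X = 5 features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X -= X.mean(axis=0)
n_T = 2

def pca_loss(X, P):
    """Squared Frobenius norm of X - X P P^T."""
    return np.linalg.norm(X - X @ P @ P.T, "fro") ** 2

# PCA projector: the first n_T right singular vectors of X
_, s, Vt = np.linalg.svd(X, full_matrices=False)
P_pca = Vt[:n_T].T                      # n_X x n_T, orthonormal columns

# a random orthogonal projector of the same shape, for comparison
P_rand, _ = np.linalg.qr(rng.normal(size=(5, n_T)))

loss_pca = pca_loss(X, P_pca)
loss_rand = pca_loss(X, P_rand)
# the PCA loss matches the sum of the discarded squared singular values
print(loss_pca, np.sum(s[n_T:] ** 2), loss_rand)
```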

To understand what this does, consider that $\mathbf{P}_{XT}$ is an $n_X\times n_T$ matrix: $\mathbf{X}\mathbf{P}_{XT}$ transforms the feature vector of each point into an $n_T$-dimensional compressed version, forming the latent-space matrix $\mathbf{T}$. 
Then $\mathbf{T}\mathbf{P}_{XT}^T$ lifts this back to the $n_X$-dimensional feature space. Because of the compression, however, information has been lost, and the reconstructed points lie in an $n_T$-dimensional subspace (the blue points in the figure above). 
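The compress-and-lift sequence can be sketched on a synthetic 2D dataset similar in spirit to the one in the figure (the data are made up for illustration): with $n_X=2$ and $n_T=1$, the reconstructed points $\mathbf{T}\mathbf{P}_{XT}^T$ all fall on a single line, i.e. the reconstruction has rank 1.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic 2D data, elongated and rotated by 45 degrees, then centered
R = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)
X = (rng.normal(size=(100, 2)) * [3.0, 0.5]) @ R
X -= X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:1].T          # n_X x n_T = 2 x 1 projector
T = X @ P             # compress: 100 x 1 latent coordinates
X_rec = T @ P.T       # lift back: 100 x 2 points lying on a line

print(P.shape, T.shape, X_rec.shape)   # (2, 1) (100, 1) (100, 2)
print(np.linalg.matrix_rank(X_rec))    # 1
```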

The projector $\mathbf{P}_{XT}$ can be built from the leading right singular vectors in a [singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition) of $\mathbf{X}$ or, equivalently, from the eigenvectors with the largest eigenvalues of $\mathbf{X}^T\mathbf{X}$, which is proportional to the covariance matrix when $\mathbf{X}$ is centered. In practice, the feature matrix is usually _centered_ before identifying the principal components, that is, the column-wise average of the features is subtracted from each row. 
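The equivalence of the two routes can be checked with plain NumPy on a synthetic matrix (an assumption for illustration). Since individual singular vectors and eigenvectors are only defined up to a sign, the sketch compares the projectors $\mathbf{P}_{XT}\mathbf{P}_{XT}^T$ rather than the vectors themselves.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
X = X - X.mean(axis=0)        # center: subtract the column-wise means
n_T = 2

# route 1: right singular vectors of the centered X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_svd = Vt[:n_T].T

# route 2: eigenvectors of X^T X, sorted by decreasing eigenvalue
evals, evecs = np.linalg.eigh(X.T @ X)
order = np.argsort(evals)[::-1]
P_eig = evecs[:, order[:n_T]]

# compare projectors, which are insensitive to the sign of each vector
print(np.allclose(P_svd @ P_svd.T, P_eig @ P_eig.T))   # True
```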

While it is easy to implement PCA manually, it is even simpler to use one of the many open-source implementations available, which also take care of centering. We use the implementation in `scikit-learn`. You are encouraged to read the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html), but the key workflow is simple:

```python
import numpy as np
from sklearn.decomposition import PCA
x = np.random.default_rng(0).normal(size=(200, 10))  # placeholder nsamples x nx features
ntrain = 100
pca = PCA(n_components=4)    # n_components: dimension of the latent space
itrain = np.arange(ntrain)   # indices of the training samples
pca.fit(x[itrain])           # x[itrain] is an ntrain x nx feature matrix
t = pca.transform(x)         # compresses the FULL feature matrix;
                             # t is nsamples x n_components
```
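To connect the `scikit-learn` workflow with the notation used above, the sketch below (on placeholder random features) checks that `pca.components_.T` plays the role of $\mathbf{P}_{XT}$, that `transform` subtracts the mean of the training features before projecting, and that `inverse_transform` performs the lift back to feature space. This reflects the behavior of `PCA` with its default `whiten=False`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
x = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))   # placeholder features
itrain = np.arange(100)

pca = PCA(n_components=4).fit(x[itrain])
t = pca.transform(x)

# components_ stores the principal axes as rows, so P_XT = components_.T;
# mean_ is the column-wise mean of the TRAINING features, used for centering
P_XT = pca.components_.T
t_manual = (x - pca.mean_) @ P_XT
print(np.allclose(t, t_manual))   # True

# inverse_transform lifts back: T P_XT^T plus the training mean
x_rec = pca.inverse_transform(t)
print(np.allclose(x_rec, t @ P_XT.T + pca.mean_))   # True
```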

In [None]:
ex08_wci = WidgetCodeInput(
        function_name="PCA_analysis", 
        function_parameters="structures, f_fingerprint, f_train",
        docstring="""Takes a list of structures and a function that computes fingerprints, and computes
        the principal component analysis of the dataset. Also computes a train/test split, assuming that 
        the first len(structures)*f_train configurations are used for training. 
        
        
        :param structures: a list of ase.Atoms structures
        :param f_fingerprint: a function that takes a list of structures and returns the feature matrix
        :param f_train: the fraction of the structures list to be used for training 
        
        :returns: the latent-space coordinates for ALL structures (use at least 4 components), and a list of the indices of the train structures.
""",
        function_body="""

import numpy as np
from ase.io import read
from sklearn.decomposition import PCA

# indices of the train set. NB: you can select rows from a numpy array X
# by writing X[itrain]
itrain = list(range(0,int(len(structures)*f_train)))

# compute the feature matrix for the full set
X = ...

# initializes the PCA object and calls fit 
# on the training set. use at least 4 components, 
# as they are needed for the visualization!

# computes the latent-space projection of the FULL feature matrix
t = ... 
return t, itrain
"""
        )


In [None]:
cs_stride = 3
def fun_ex08(feats, ftrain,code_input,visualizers):
    print_output = visualizers[0]
    print_output.clear_output()
    structures = read('data/mp_elastic.extxyz',':')
    with print_output:
        if feats == "composition":
            f_descriptor = lambda s: ex05_wci.get_function_object()(s)
        elif feats == "polynomial ($n_{max}$=2)":
            f_descriptor = lambda s: ex06_wci.get_function_object()(s, 2)
        elif feats == "polynomial ($n_{max}$=4)":
            f_descriptor = lambda s: ex06_wci.get_function_object()(s, 4)
        elif feats == "polynomial ($n_{max}$=8)":
            f_descriptor = lambda s: ex06_wci.get_function_object()(s, 8)
        elif feats == "custom":
            f_descriptor = lambda s: custom_wci.get_function_object()(s)
        xlatent, itrain = code_input.get_function_object()(structures, f_descriptor,  ftrain)
    
    # labels each structure as "train" or "test"; "<U5" leaves room for the 5-character "train"
    ftype = np.full(len(structures), "test", dtype="<U5"); ftype[itrain] = "train"
    fname = [ str(s.symbols) for s in structures]
    frames=structures    
    properties={"pca[1]": xlatent[::cs_stride,0], "pca[2]" : xlatent[::cs_stride,1],
                 "pca[3]" : xlatent[::cs_stride,2],  "pca[4]" : xlatent[::cs_stride,3],
                "type": ftype[::cs_stride] , "name": fname[::cs_stride]
               }
    settings = {
        'map': {
            'x': {'property': 'pca[1]'},
            'y': {'property': 'pca[2]'},
            # 'color': {'max': 1, 'min': 0, 'property': 'K_error', 'scale': 'linear'},
            'symbol': 'type',
            'palette': 'inferno',
            'size': {'factor': 40},
        },
        'structure': [{
            'bonds': True,
            'spaceFilling': False,
            'atomLabels': False,
            'unitCell': True,
            'rotation': False,
            'supercell': {'0': 2, '1': 2, '2': 2},
        }],
    }
    
    
    chemiscope.write_input("module_07-pca-analysis.chemiscope.json.gz", 
                           frames=frames[::cs_stride], properties=properties)
    with print_output:
        display(chemiscope.show(frames[::cs_stride], properties, settings=settings
                               ))
    
ex08_pb =  ParametersBox(feats = (
    "composition", ["composition", "polynomial ($n_{max}$=2)", "polynomial ($n_{max}$=4)", "polynomial ($n_{max}$=8)", "custom"], r"fingerprints"),
    ftrain = (0.5,0.05,0.9,0.05,r'$f_{train}$'),refresh_mode="click")    

<span style="color:blue">**08** Implement a function that computes the PCA of a set of structures, given the structures, a fingerprint function that returns the feature matrix, and the fraction of the structures to be used for training.
</span>

In [None]:
np.seterr(divide='ignore', invalid='ignore')

def fingerprintf(*args):

    return ex05_wci.get_function_object()(*args)
def ex08_chk(student,reference):
    return np.allclose(reference[:2],student[0][:,:2])
def identity(x):
    return x
ex_08_ref_input = [{"structures": structure, "f_fingerprint": fingerprintf, "f_train": 0.5} for structure in read('data/mp_elastic.extxyz','::100')]
ex_08_ref_output = np.loadtxt('data/ex08-ref_values.txt')
ex08_code_demo = CodeDemo(
            code_input= ex08_wci,
            input_parameters_box= ex08_pb,
            check_registry=check_registry,
            visualizers = [ClearedOutput()],
            update_visualizers = fun_ex08,
            merge_check_and_update_buttons=False)
check_registry.add_check(ex08_code_demo,
                         inputs_parameters=ex_08_ref_input,
                         reference_outputs = ex_08_ref_output,
                         equal=ex08_chk,
                        fingerprint=identity)
answer_registry.register_answer_widget("ex08-function", ex08_code_demo)

display(ex08_code_demo)

[Download chemiscope datafile](./module_07-pca-analysis.chemiscope.json.gz)

<span style="color:blue">**09a** Run the PCA for a "composition" fingerprint and $f_{train}=0.5$. 
 What does the latent-space projection look like? What do the axes roughly correspond to? How can you explain the appearance of the map, given the way the fingerprint is constructed and the structures in the training set?
</span>

_Hint: think about what the structures that are shown close together have in common, chemically speaking; see how the structures change as you examine those that lie at opposite edges of the map; and keep in mind what information is encoded in the starting fingerprints._

In [None]:
ex09a_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex09a-answer", ex09a_txt)
display(ex09a_txt)

<span style="color:blue">**09b** Reduce the fraction of training structures to 0.05. Does the qualitative appearance of the map change much? Does the actual meaning of the axes change (look in particular at the most "extreme" structures, which occur at the periphery of the map)?
</span>

In [None]:
ex09b_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex09b-answer", ex09b_txt)
display(ex09b_txt)

<span style="color:blue">**09c** What happens if you use another set of features (say, polynomial features up to $n_\mathrm{max}=8$)? Set the training fraction back to 0.5 for this comparison.
</span>

In [None]:
ex09c_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex09c-answer", ex09c_txt)
display(ex09c_txt)

# Regression

In this section we attempt to establish an explicit data-driven relationship between the fingerprints we use to describe the structures in the dataset and their physical properties, in particular the bulk modulus $K$ (the shear modulus $G$ and the Poisson ratio $\nu$ can also be chosen, if you want to try). 

We will use ridge regression, which is exactly the framework discussed in general terms in [section 1](#data-driven). The loss is defined as 

$$
\ell = \frac{1}{n_\mathrm{train}} \sum_{i\in\mathrm{train}} | \mathbf{w}\cdot \mathbf{x}_i - y_i | ^2 + \alpha |\mathbf{w}|^2,
$$

where $\mathbf{w}$ are the regression weights, $y_i$ the target property for each sample and $\mathbf{x}_i$ the features associated with that sample. The regularization term $\alpha |\mathbf{w}|^2$ penalizes solutions with large values of the weights, and usually leads to a smoother interpolant of the training data that is less susceptible to overfitting. 

Even though, much as with PCA, it is simple to implement a solver for ridge regression, we will use the implementation in `scikit-learn`, namely `sklearn.linear_model.Ridge` ([documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)).

Usage is simple:

```python
import numpy as np
from sklearn.linear_model import Ridge
rng = np.random.default_rng(0)   # placeholder data: x is nsamples x nx,
x = rng.normal(size=(200, 10))   # y holds one target per sample
y = x @ rng.normal(size=10)
ntrain = 100
ridge = Ridge(alpha=1e-4)        # alpha: magnitude of ridge regularization
itrain = np.arange(ntrain)       # indices of the training samples
ridge.fit(x[itrain], y[itrain])  # x[itrain]: ntrain x nx feature matrix; y[itrain]: targets
y_pred = ridge.predict(x)        # predicts from the FULL feature matrix
```
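Because the loss above is quadratic in $\mathbf{w}$, setting its gradient to zero (with the training feature vectors stacked as rows of $\mathbf{X}$, the targets in $\mathbf{y}$, and no intercept) gives $\mathbf{w} = (\mathbf{X}^T\mathbf{X} + n_\mathrm{train}\alpha\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$. The sketch below, on synthetic data (an assumption), compares this with `Ridge`. According to its documentation, `Ridge` minimizes $\|\mathbf{y}-\mathbf{X}\mathbf{w}\|^2 + \alpha_\mathrm{sk}\|\mathbf{w}\|^2$ without the $1/n_\mathrm{train}$ prefactor, so matching the loss above requires $\alpha_\mathrm{sk} = n_\mathrm{train}\alpha$; the intercept is switched off to match the formula.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
n_train, n_x = 80, 5
X = rng.normal(size=(n_train, n_x))                  # placeholder features
y = X @ rng.normal(size=n_x) + 0.1 * rng.normal(size=n_train)
alpha = 1e-2                                         # alpha as in the loss above

# closed-form minimizer of the loss with the 1/n_train prefactor
w_closed = np.linalg.solve(X.T @ X + n_train * alpha * np.eye(n_x), X.T @ y)

# sklearn has no 1/n_train prefactor, hence alpha_sk = n_train * alpha
ridge = Ridge(alpha=n_train * alpha, fit_intercept=False).fit(X, y)
print(np.allclose(w_closed, ridge.coef_))   # True
```

The $n_\mathrm{train}\alpha\mathbf{I}$ term makes the matrix to invert positive definite, so the solution is well defined even when there are more features than training samples.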

_NB: it is good practice to shuffle the samples before splitting them into train and test sets, a step we skip here for simplicity. You can see [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) for a discussion and some utility functions from `sklearn`._
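A shuffled split can be obtained with `sklearn.model_selection.train_test_split` applied to an array of indices, as in this sketch (the dataset size and `f_train` are placeholders). The returned `itrain` array can index NumPy feature matrices and target vectors in the same way as the `itrain` lists used in this module.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples, f_train = 200, 0.5     # placeholder dataset size and train fraction
indices = np.arange(n_samples)

# shuffles the indices (reproducibly, via random_state) before splitting
itrain, itest = train_test_split(indices, train_size=f_train, random_state=0)

print(len(itrain), len(itest))                 # 100 100
print(np.intersect1d(itrain, itest).size)      # 0
```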

In [None]:
ex10_wci = WidgetCodeInput(
        function_name="ridge_regression", 
        function_parameters="structures, target, f_fingerprint, f_train, alpha",
        docstring="""Takes a list of structures and a function that computes fingerprints, and computes
        a ridge (linear) regression model for the target. The target is given as a string, matching the key used in 
        the `info` dictionary of the structures.
        Also computes a train/test split, assuming that the first len(structures)*f_train configurations are used for training. 
        Takes the ridge regularization magnitude alpha as input.
        
        :param structures: a list of ase.Atoms structures
        :param target: a string indicating which property to learn
        :param f_fingerprint: a function that takes a list of structures and returns the feature matrix
        :param f_train: the fraction of the structures list to be used for training 
        :param alpha: the magnitude of the ridge regularization
        
        :returns: the predicted target values for ALL structures, and a list of the indices of the train structures.
""",
        function_body="""

import numpy as np
from ase.io import read
from sklearn.linear_model import Ridge

# indices of the train set. NB: you can select rows from a numpy array X
# by writing X[itrain]
itrain = list(range(0,int(len(structures)*f_train)))

# compute the feature matrix for the full set
X = ...
# makes a list of the targets, by extracting .info[target] from each structure
y = ...
# NB: if you use a Python list, convert it to a numpy array so you can index it with itrain
# e.g. if y = [1,2,3,4] you can't do y[itrain], but you can if y = np.asarray([1,2,3,4])

# initializes the Ridge object and calls fit on the training set

# predicts the property for ALL structures 
y_pred = ... 
return y_pred, itrain
"""
        )

In [None]:
cs_stride = 3
def fun_ex10(tgt,feats,ftrain,log10alpha,code_input,visualizers):

    print_output = visualizers[0]
    print_output.clear_output()
    structures = read('data/mp_elastic.extxyz',':')
    y = np.asarray([f.info[tgt] for f in structures])
    with print_output:
        if feats == "composition":
            f_descriptor = lambda s: ex05_wci.get_function_object()(s)
        elif feats == "polynomial ($n_{max}$=2)":
            f_descriptor = lambda s: ex06_wci.get_function_object()(s, 2)
        elif feats == "polynomial ($n_{max}$=4)":
            f_descriptor = lambda s: ex06_wci.get_function_object()(s, 4)
        elif feats == "polynomial ($n_{max}$=8)":
            f_descriptor = lambda s: ex06_wci.get_function_object()(s, 8)
        elif feats == "custom":
            f_descriptor = lambda s: custom_wci.get_function_object()(s)
        
        yp, itrain = code_input.get_function_object()(structures, tgt, f_descriptor,
                                                      ftrain, 10**log10alpha)

        # test indices are all those not used for training
        itest = np.setdiff1d(np.arange(len(y)), itrain)
        print("MAE train: ", np.mean(np.abs((y-yp)[itrain])))
        print("MAE test: ", np.mean(np.abs((y-yp)[itest])))
    
    # labels each structure as "train" or "test"; "<U5" leaves room for the 5-character "train"
    ftype = np.full(len(structures), "test", dtype="<U5"); ftype[itrain] = "train"
    fname = [ str(s.symbols) for s in structures]
    frames=structures
    properties={tgt: y[::cs_stride], tgt+"_predicted" : yp[::cs_stride], tgt+"_error": np.abs(y-yp)[::cs_stride],
                "type": ftype[::cs_stride] , "name": fname[::cs_stride]}
    
    settings = {
        'map': {
            'x': {'property': tgt},
            'y': {'property': tgt + '_predicted'},
            'color': {'max': 1, 'min': 0, 'property': tgt + '_error', 'scale': 'linear'},
            'symbol': 'type',
            'palette': 'inferno',
            'size': {'factor': 40},
        },
        'structure': [{
            'bonds': True,
            'spaceFilling': False,
            'atomLabels': False,
            'unitCell': True,
            'rotation': False,
            'supercell': {'0': 2, '1': 2, '2': 2},
        }],
    }
                  
    chemiscope.write_input("module_07-ridge-regression.chemiscope.json.gz", frames=frames[::cs_stride], properties=properties)
                           
    with print_output:
        display(chemiscope.show(frames=structures[::cs_stride], 
                   properties=properties, settings=settings
                  ) )
        
ex10_pb =  ParametersBox(
    target=("K", ["K", "G", "nu"], r"target"),
    feats=("composition", ["composition", "polynomial ($n_{max}$=2)", "polynomial ($n_{max}$=4)", "polynomial ($n_{max}$=8)", "custom"], r"fingerprints"),
    ftrain = (0.5,0.05,0.9,0.05,r'$f_{train}$'),
    log10alpha=(-3., -8, 2, 0.2, r"$\log_{10}(\alpha)$")
                       )
def fingerprintf(*args):
    return ex05_wci.get_function_object()(*args)
def ex10_chk(a,b):
    return np.allclose(a,b[0])
ex10_reference_input = [{'structures':read('data/mp_elastic.extxyz','::100'),'target':"K", 'f_fingerprint' :fingerprintf,'f_train':0.5,'alpha':1e-3}]
ex10_reference_output = [np.loadtxt('data/ex10-ref_values.txt')]
ex10_code_demo = CodeDemo(
            code_input= ex10_wci,
            check_registry=check_registry,
            input_parameters_box= ex10_pb,
            visualizers = [ClearedOutput()],
            update_visualizers = fun_ex10,
            merge_check_and_update_buttons=False)
answer_registry.register_answer_widget("ex10-function", ex10_code_demo)
check_registry.add_check(ex10_code_demo,
                         inputs_parameters=ex10_reference_input,
                         reference_outputs =ex10_reference_output,
                         equal=ex10_chk)
display(ex10_code_demo)

<span style="color:blue">**10** Implement a function that fits and evaluates a ridge regression model for a set of structures. The string indicating which property should be fitted, a fingerprint function that returns the feature matrix, the fraction of the structures to be used for training, and the regularization magnitude $\alpha$ are also arguments of the function.
</span>

_NB: the widget computes the mean absolute error (MAE), $\frac{1}{n}\sum_i |y_i - y(x_i)|$, averaged separately over the $n$ train and the $n$ test structures. The MAE is less sensitive to outlier values than the root-mean-square error._
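A small made-up example illustrates the point: a single large error raises the root-mean-square error (RMSE) much more than the MAE, because the RMSE squares each error before averaging. The `mae` and `rmse` helpers below are written here for illustration.

```python
import numpy as np

def mae(y, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_pred))

def rmse(y, y_pred):
    """Root-mean-square error."""
    return np.sqrt(np.mean((y - y_pred) ** 2))

y_true = np.zeros(5)
y_clean = np.array([1.0, 1.0, 1.0, 1.0, 1.0])     # all errors equal to 1
y_outlier = np.array([1.0, 1.0, 1.0, 1.0, 10.0])  # one large outlier

print(mae(y_true, y_clean), rmse(y_true, y_clean))       # 1.0 1.0
print(mae(y_true, y_outlier), rmse(y_true, y_outlier))   # MAE 2.8, RMSE about 4.56
```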

[Download chemiscope datafile](./module_07-ridge-regression.chemiscope.json.gz)

<span style="color:blue">**11a** Run the regression for the bulk modulus $K$, using `composition` fingerprints, 50% training structures and a regularization $\alpha=10^{-3}$. What are the train and test errors?
</span>

In [None]:
ex11a_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex11a-answer", ex11a_txt)
display(ex11a_txt)

<span style="color:blue">**11b** Change the training fraction to the minimum and maximum values allowed. How do the train and test mean absolute errors (MAEs) change? Can you explain the trend? Repeat with very small and very large regularization. What do you observe, and how can you explain it?
</span>

In [None]:
ex11b_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex11b-answer", ex11b_txt)
display(ex11b_txt)

<span style="color:blue">**11c** Use polynomial ($n_{max}=8$) descriptors and repeat the experiments in the previous question. What do you observe, and how can you explain it? What is the best test-set error you can obtain by adjusting the regularization with $f_{train}=0.85$?
</span>

_Hint: consider the number of features used in the description, and the flexibility of the model. Think also about what you observed for the 1D regression. NB: adjusting the regularization based on test-set accuracy, as we do here, is not good practice; we only do it to get a sense of the role of regularization._
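A common alternative that keeps the test set untouched is to select $\alpha$ by cross-validation on the training set only. The sketch below uses `sklearn.linear_model.RidgeCV` on synthetic data; the data, the 85/15 split and the grid of $\alpha$ values (chosen to mirror the widget's $\log_{10}\alpha$ slider) are assumptions, and this is not part of the exercise workflow.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 20))                      # placeholder features
y = X @ rng.normal(size=20) + 0.5 * rng.normal(size=300)
itrain = np.arange(255)                             # first 85% for training
itest = np.arange(255, 300)                         # remaining 15% for testing

# candidate alphas on a log grid from 1e-8 to 1e2 (steps of 0.2 in log10)
alphas = 10.0 ** np.linspace(-8, 2, 51)

# 5-fold cross-validation on the TRAINING set only selects alpha
model = RidgeCV(alphas=alphas, cv=5).fit(X[itrain], y[itrain])
print("selected alpha:", model.alpha_)

# the test set is used once, after alpha has been fixed
test_mae = np.mean(np.abs(model.predict(X[itest]) - y[itest]))
print("test MAE:", test_mae)
```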

In [None]:
ex11c_txt = TextareaAnswer("Enter your answer here")
answer_registry.register_answer_widget("ex11c-answer", ex11c_txt)
display(ex11c_txt)

**OPTIONAL** 

Try to improve the custom descriptors to achieve a more accurate prediction of the bulk modulus $K$.