<img align="right" src="images/DSApps_logo_small.jpg">

# DSApps 2023 @ TAU: Assignment 7

### Giora Simchoni

### Deep Neural Networks with Keras/Tensorflow

### Welcome

Welcome to Assignment 7 in Python!

Remember:

* You can play with the assignment in Playground mode, but:
* Only your private Github repository assigned to you by the course admin will be cloned and graded (Submission mode, see instructions [here](https://github.com/DSApps-2023/Class_Slides/blob/main/Apps_of_DS_HW.pdf))
* Like any other University assignment, your work should remain private
* You need to `git clone` your private Github repository locally as explained [here](https://github.com/DSApps-2023/Class_Slides/blob/main/Apps_of_DS_HW.pdf)
* You need to uncomment the starter code inside the chunk, replace the `### YOUR CODE HERE ###`, run the chunk and see that you're getting the expected result
* Pay attention to what you're asked to do and the required output
* For example, using a *different* function than the one you were specifically asked to use, will decrease your score (unless you amaze me)
* Your notebook should run smoothly from start to end if someone presses in the Jupyter toolbar Kernel --> Restart & Run All
* When you're done, save the entire notebook as an HTML file; this is the file that will be graded
* You can add other files but do not delete any files
* Commit your work and push to your private Github repository as explained [here](https://github.com/DSApps-2023/Class_Slides/blob/main/Apps_of_DS_HW.pdf)

This assignment is due: TBD

### Libraries

These are the libraries you will need. If you don't have them, you need to uncomment the `!pip install` line and install them first (you can also just copy this command to a terminal and do it there if you don't want all the output printed in this notebook).

In [None]:
#!pip install matplotlib numpy scipy pandas scikit-learn tensorflow

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import LinearRegression

from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Add, Conv2D, Dense, Dropout, Flatten, Input, MaxPool2D
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping, Callback
from tensorflow.keras.preprocessing.image import ImageDataGenerator

### Where NN beats LM

##### (30 points)

After our "Logistic regression as neural network" section in class, hopefully you know why for a simple binary classification simulated task, logistic regression performs just as well as a neural network.

Let's see this in a regression setting!

The following function simulates a standard $y = f(X) + \epsilon$ data, where $\epsilon_i \sim N(0,1)$ i.i.d. Its params are:
* `N` - no. of observations
* `p` - no. of X features (columns), not including intercept
* `intercept` - the intercept (e.g. 2.0)
* `X_non_linear` - if `False`, then $f(X) = \beta_0 + X\beta$ where $\beta_0$ is the intercept and $\beta_j = 1$ for all $j \in \{1, \dots, p\}$ (which means...?), otherwise $f(X) = g \cdot \cos(g) + 2 \cdot X_1 \cdot X_2$ with $g = \beta_0 + X\beta$, which is a very not-linear relationship

Finally the function returns a training/testing 80/20 split.

Go over it, see that you get what it does.

In [None]:
def easy_data(N, p, intercept, X_non_linear):
    X = np.random.uniform(-1, 1, N * p).reshape((N, p))
    betas = np.ones(p)
    Xbeta = intercept + X @ betas
    epsilon = np.random.normal(0, 1.0, N)
    if X_non_linear:
        fX = Xbeta * np.cos(Xbeta) + 2 * X[:, 0] * X[:, 1]
    else:
        fX = Xbeta
    y = fX + epsilon
    X_df = pd.DataFrame(X)
    x_cols = ['X' + str(i) for i in range(p)]
    X_df.columns = x_cols
    df = pd.concat([pd.DataFrame({'y': y}), X_df], axis=1)
    X_train, X_test, y_train, y_test = train_test_split(df.drop('y', axis=1), df['y'], test_size=0.2)
    return X_train, X_test, y_train, y_test
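
Since $\epsilon_i \sim N(0,1)$, even a model that knew $f$ exactly would be left with a test MSE of about 1, so that is a sensible number to compare any model against. A minimal sketch of this noise floor for the linear case, using an illustrative seed and variable names that are not part of the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not used elsewhere in the assignment
N, p, intercept = 10000, 10, 1.0

X = rng.uniform(-1, 1, size=(N, p))
f_X = intercept + X @ np.ones(p)      # the linear f(X) from easy_data()
y = f_X + rng.normal(0, 1.0, size=N)  # add N(0, 1) noise

# MSE of an "oracle" that predicts f(X) exactly: only the noise is left
noise_floor = np.mean((y - f_X) ** 2)
print(noise_floor)  # close to 1, the variance of epsilon
```

A test MSE well above this value suggests the model is missing part of $f$; a value near it means there is little left to gain.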

Use `easy_data()` to generate a **linear** relation between $X$ and $y$, with `N = 10000, p = 10, intercept = 1.0`.

In [None]:
X_train, X_test, y_train, y_test = ### YOUR CODE HERE ###

Use [sklearn](https://scikit-learn.org/stable/) to fit a linear model to `X_train, y_train` (Look at the imports). Use this model to predict on `X_test`.

In [None]:
lm = ### YOUR CODE HERE ###
y_pred = ### YOUR CODE HERE ###

Use this `mse()` function to get the test MSE on `y_test, y_pred`.

In [None]:
def mse(y_test, y_pred):
    return np.mean((y_test - y_pred) ** 2)

In [None]:
### YOUR CODE HERE ###

Use this `plot_reg()` function to plot predicted vs. true $y$s.

In [None]:
def plot_reg(y_test, y_pred):
    min_y = np.min([y_test.min(), y_pred.min()])
    max_y = np.max([y_test.max(), y_pred.max()])
    plt.scatter(y_test, y_pred, alpha=0.5)
    plt.xlabel('true')
    plt.ylabel('pred')
    plt.xlim((min_y, max_y))
    plt.ylim((min_y, max_y))
    plt.axline((min_y, min_y), (max_y, max_y), color='grey') # axline() needs matplotlib >= 3.3, this line may not work on older versions
    plt.gca().set_aspect('equal')
    plt.show()

In [None]:
### YOUR CODE HERE ###

Write a `mlp(n_neurons)` function which gets a list of integers `n_neurons` and returns a regular Multi-Layer Perceptron (MLP), with a `Dense()` layer for each number of neurons in `n_neurons`, each with a ReLU activation. The final layer is a single neuron layer without activation (or `activation='linear'`). Compile the model with an MSE loss and the Adam optimizer.

In [None]:
def mlp(n_neurons):
    model = Sequential()
    ### YOUR CODE HERE (maybe more than 1 line) ###
    model.add(Dense(1))
    model.compile(### YOUR CODE HERE ###)
    return model

This should give us a 2-hidden-layer network with 10 and 5 neurons at each layer:

In [None]:
model = mlp([10, 5])

Fit `model` on `X_train, y_train` with a 10% validation split and an `EarlyStopping()` callback that stops training if the validation loss has not decreased for 5 epochs. Use a `batch_size` of 30 and a maximum of 100 `epochs`.

In [None]:
callbacks = [### YOUR CODE HERE ###]
history = model.fit(### YOUR CODE HERE ###)

Use the `history` object in the `plot_loss()` function to see that indeed the loss has decreased through the `epochs`:

In [None]:
def plot_loss(history):
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.xlabel('epochs')
    plt.ylabel('mse')
    plt.show()

In [None]:
### YOUR CODE HERE ###

This is how we predict on `X_test` to get `y_pred`. Use `y_pred` in `mse()` and `plot_reg()` to print the test MSE and plot the $y$s.

In [None]:
y_pred = model.predict(X_test).reshape(y_test.shape)

### YOUR CODE HERE ###

So, hopefully you can see we didn't get any advantage by using a neural network, with a simple linear relationship.

BUT!

**Repeat everything** we just did, only this time call `easy_data()` asking for a **non-linear** relation between $X$ and $y$ (all else may remain the same). Our DNN, which is a simple MLP, should shine bright (like a diamond).

In [None]:
X_train, X_test, y_train, y_test = ### YOUR CODE HERE ###

In [None]:
### YOUR CODE HERE (obviously you will need more than 1 cell to repeat everything) ###

### Moustache!

##### (60 points)

#### Part A - Getting the data ready

The CelebA dataset ([Liu et al. 2015](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)) contains 202,599 cropped facial images from 10,177 celebrities, where each celebrity has between 1 and 35 images, annotated for various attributes (e.g. young or not) and landmarks (e.g. the location of the tip of the nose and mouth).

Let's work with the first 10,000 images, you can get them at my Google Drive [here](https://drive.google.com/drive/folders/1bxlahIPmYENc83PXfPBG0t1SFTCy7QIc?usp=sharing). You need the `celeba_small.zip` (570 MB) and you'll also need the `celeba_small.csv` table which is already in your `data` folder.

**NOTE 1:** If you're using Google Colab with your own Google identity, you don't actually need to download/upload the `celeba_small.zip` to the notebook environment. You can simply create a shortcut (right-mouse click on file, "Add a shortcut to drive"), then you have your own pointer to the zip in *your* drive. Then you can mount your drive to Google Colab, copy the zip to the environment and unzip it there. See e.g. [this](https://stackoverflow.com/a/52300696/4095235) answer.

**NOTE 2:** If you're using Google Colab, you might want to change your notebook settings to run on GPU, it will make things faster.

Once you get the zip file in this notebook's environment or in the HW7 folder, you can unzip it to get the 10K images in the `img_align_celeba_png` folder with this:

In [None]:
!unzip celeba_small.zip

See for example all of [Michael Schumacher](https://en.wikipedia.org/wiki/Michael_Schumacher)'s images (`celeb` no. 1). As you can see all images are pretty much aligned, they have 218 X 178 pixels, 3 color channels.

In [None]:
images_df = pd.read_csv('data/celeba_small.csv')

images_df.head()

In [None]:
# Michael's images, celeb no. 1
images_df[images_df['celeb'] == 1]['img_file'].to_list()

In [None]:
query = images_df[images_df['celeb'] == 1]['img_file'].to_list()

for i, img_file in enumerate(os.listdir('img_align_celeba_png')):
    if img_file in query:
        img = plt.imread('img_align_celeba_png/' + img_file)
        plt.imshow(img)
        plt.show()

As you can see in the `images_df` table there are a few landmarks like `nose_x` and `nose_y`.

In [None]:
landmarks = images_df.columns[1:-1].values
landmarks

Now let's see some of those on Michael.

In [None]:
michael_img_file = '000023.png'
michael_img = plt.imread('img_align_celeba_png/' + michael_img_file)
michael_landmarks = images_df[images_df['img_file'] == michael_img_file][landmarks].values[0]  # look up Michael's file, not the loop's leftover img_file
michael_landmarks = np.vstack(np.split(michael_landmarks, 5))
x = michael_landmarks[:, 0]
y = michael_landmarks[:, 1]
plt.imshow(michael_img)
plt.scatter(x, y, color='black', marker='+', s=100)
plt.show()

And here's an image of a moustache...

In [None]:
moustache = plt.imread('images/moustache.png')
plt.imshow(moustache)
plt.show()

Can we, using his landmarks, put a moustache on Michael?

In [None]:
michael_landmarks = dict(zip(landmarks, images_df[images_df['img_file'] == michael_img_file][landmarks].values[0]))
michael_landmarks

In [None]:
moustach_borders = [michael_landmarks['leftmouth_x'], michael_landmarks['rightmouth_x'],
                      michael_landmarks['nose_y'], michael_landmarks['leftmouth_y']]
plt.imshow(moustache, extent=moustach_borders, zorder=2)
plt.imshow(michael_img, zorder=1)
plt.show()

Put it in a function:

In [None]:
person_landmarks = dict(zip(landmarks, images_df[images_df['img_file'] == michael_img_file][landmarks].values[0]))

def draw_moustache(img_file, person_landmarks, images_dir='img_align_celeba_png/'):
    person_img = plt.imread(images_dir + img_file)
    moustach_borders = [person_landmarks['leftmouth_x'], person_landmarks['rightmouth_x'],
                      person_landmarks['nose_y'], person_landmarks['leftmouth_y']]
    plt.imshow(moustache, extent=moustach_borders, zorder=2)
    plt.imshow(person_img, zorder=1)
    plt.show()

In [None]:
for i in [8, 5, 9]:
    img_file = '00000' + str(i) + '.png'
    person_landmarks = dict(zip(landmarks, images_df[images_df['img_file'] == img_file][landmarks].values[0]))
    draw_moustache(img_file, person_landmarks)

Notice in the `draw_moustache()` function, we only need 4 landmarks: `'leftmouth_x', 'rightmouth_x', 'nose_y', 'leftmouth_y'`.

What I want you to do is train a convolutional neural network (CNN) which would predict these 4 landmarks from an image, and will allow us to automatically put this Snapchat-like moustache mask on unseen faces!

Now to get all images for training, if you have enough RAM, you can actually follow a similar procedure to what we did in class with the `malaria` dataset. You will get a 4.65 GB float `np.array()` called `X`...

In [None]:
# images = []

# for img_file in images_df['img_file']:
#     images.append(plt.imread('img_align_celeba_png/' + img_file))

# X = np.array(images)

# required_landmarks = ['leftmouth_x', 'rightmouth_x', 'nose_y', 'leftmouth_y']
# y = images_df[required_landmarks]

# print(X.shape) # (10000, 218, 178, 3)
# print(y.shape) # (10000, 4)

But there's no need for that, and I want you to be ready to train on millions of images. Remember the real CelebA dataset has over 200K!
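
To see why loading everything into memory does not scale, here is a rough sketch of the arithmetic behind the 4.65 GB figure. It assumes the images are held as 4-byte `float32` values (which is what `plt.imread()` typically returns for PNG files); the helper name `array_gigabytes` is made up for illustration.

```python
IMG_HEIGHT, IMG_WIDTH, CHANNELS = 218, 178, 3
BYTES_PER_FLOAT32 = 4  # assumption: images are held as float32

def array_gigabytes(n_images):
    """Size in GB (10^9 bytes) of an (n_images, 218, 178, 3) float32 array."""
    n_values = n_images * IMG_HEIGHT * IMG_WIDTH * CHANNELS
    return n_values * BYTES_PER_FLOAT32 / 1e9

small_gb = array_gigabytes(10_000)   # the 10K subset used here
full_gb = array_gigabytes(202_599)   # the full CelebA dataset
print(f'{small_gb:.2f} GB vs. {full_gb:.1f} GB')
```

Under these assumptions the full dataset would need over 90 GB of RAM, which is why reading batches from disk is the more realistic route.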

So we need an `ImageDataGenerator()`. Notice that here we do not perform any image augmentation; we only rescale the image arrays to be between 0 and 1 and ask the `train_datagen` to have a `validation_split`.

In [None]:
IMG_HEIGHT = 218
IMG_WIDTH = 178
batch_size = 30
epochs = 100
images_dir = 'img_align_celeba_png/'
required_landmarks = ['leftmouth_x', 'rightmouth_x', 'nose_y', 'leftmouth_y']

train_datagen = ImageDataGenerator(validation_split=0.1, rescale = 1. / 255)
test_datagen = ImageDataGenerator(rescale = 1. / 255)

Now, in class you've seen the `flow_from_directory()` method of the generator, which loads the required batch of images from the train/valid/test directory each time, preprocesses it and trains/predicts on it.

Let's use the `flow_from_dataframe()` method, which is more suitable when you have a DataFrame of additional data like we do, and your images aren't divided into separate directories.

Here we need to give each generator:
* the `dataframe` (which we get by properly filtering the train/test samples)
* the `directory` in which to find the images
* `x_col` which is the column in the DataFrame in which to find the images names
* `y_col` which is 1 or more label columns (we have 4!)
* `target_size`, the required dimensions of the image
* `class_mode`, we use `"raw"` for regression
* `batch_size`
* `shuffle` - notice there's no need to shuffle validation or testing sets
* `subset` - this allows to not define separate `datagen`s for training/validation

In [None]:
train_samp, test_samp = train_test_split(np.arange(images_df.shape[0]), test_size=0.2)

In [None]:
train_generator = train_datagen.flow_from_dataframe(
    dataframe = images_df[images_df.index.isin(train_samp)],
    directory = images_dir,
    x_col = 'img_file',
    y_col = required_landmarks,
    target_size = (IMG_HEIGHT, IMG_WIDTH),
    class_mode = 'raw',
    batch_size = batch_size,
    shuffle = True,
    subset = 'training',
    validate_filenames = False
)
valid_generator = train_datagen.flow_from_dataframe(
    dataframe = images_df[images_df.index.isin(train_samp)],
    directory = images_dir,
    x_col = 'img_file',
    y_col = required_landmarks,
    target_size = (IMG_HEIGHT, IMG_WIDTH),
    class_mode = 'raw',
    batch_size = batch_size,
    shuffle = False,
    subset = 'validation',
    validate_filenames = False
)

test_generator = test_datagen.flow_from_dataframe(
    dataframe = images_df[images_df.index.isin(test_samp)],
    directory = images_dir,
    x_col = 'img_file',
    y_col = required_landmarks,
    target_size = (IMG_HEIGHT, IMG_WIDTH),
    class_mode = 'raw',
    batch_size = batch_size,
    shuffle = False,
    validate_filenames = False
)

Put it in a function, `get_generators()` which would get `train_samp` (sample of training indices), `test_samp` (sample of testing indices) and `validation_split`, we'll use it later (Yes, it's basically copy-paste).

In [None]:
def get_generators(train_samp, test_samp, validation_split = 0.1):
    ### YOUR CODE HERE ###
    return train_generator, valid_generator, test_generator

#### Part B - Bad CNN!

Finally, you get to do something!

You are going to use the Functional API to implement the `cnn()` which should return a compiled CNN, with the following architecture:
* `Input()` layer, where you only need to specify the input shape (it is the shape of an image, `(IMG_HEIGHT, IMG_WIDTH, 3)`)
* `Conv2D()` layer, with 32 kernels, (5, 5) strides, padding valid, ReLU activation
* `MaxPool2D()` layer, with (2, 2) pool size
* `Conv2D()` layer, with 64 kernels, (5, 5) strides, padding valid, ReLU activation
* `MaxPool2D()` layer, with (2, 2) pool size
* `Conv2D()` layer, with 32 kernels, (5, 5) strides, padding valid, ReLU activation
* `MaxPool2D()` layer, with (2, 2) pool size
* `Conv2D()` layer, with 16 kernels, (5, 5) strides, padding valid, ReLU activation
* `MaxPool2D()` layer, with (2, 2) pool size
* `Flatten()` layer
* `Dropout()` layer of 50%
* Fully connected layer of 100 neurons and ReLU activation
* Fully connected layer with 4 neurons and no activation. This is the output layer.
* Use optimizer `'adam'` and loss `'mse'`
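
It can help to trace how the image shrinks through such a stack before building it. The list above does not state a kernel size, so the sketch below assumes that (5, 5) is the kernel size with the Keras default stride of 1; this is only one possible reading and may not be what is intended. The helper names `conv_valid` and `pool` are made up for illustration.

```python
def conv_valid(size, kernel, stride=1):
    """Output length of a 'valid' (no padding) convolution along one dimension."""
    return (size - kernel) // stride + 1

def pool(size, pool_size=2):
    """Output length of max pooling with stride equal to the pool size."""
    return size // pool_size

height, width = 218, 178
for n_filters in [32, 64, 32, 16]:
    # assumption: 5x5 kernel, stride 1, followed by a 2x2 max pool
    height, width = conv_valid(height, 5), conv_valid(width, 5)
    height, width = pool(height), pool(width)
    print(n_filters, (height, width))

flat_size = height * width * 16  # number of values leaving Flatten()
print(flat_size)
```

Comparing these numbers with `model.summary()` is a quick way to check that the layers were wired as expected.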

Why this model? I literally just copy-pasted it from some blog, as you might be inclined to do (it will get better).

In [None]:
def cnn():
    ### YOUR CODE HERE ###
    return model

Instantiate the model:

In [None]:
model = cnn()

Before you fit the model, show me a good baseline test MSE: if our network scores above it, you could say it is quite bad:

(Hint: `test_generator.labels`)

In [None]:
### YOUR CODE HERE ###

Now, fit the model. Notice how we're using the same `fit()` method of the `model` which works the same on generators. I also want you to keep `verbose=1` to see the somewhat annoying print.

This should run for 20-50 epochs, and each epoch should take ~30-60 seconds on a decent GPU. You should get a `val_loss` (MSE, in squared pixels) of about 10-20. Is this a "good" loss?

In [None]:
callbacks = [EarlyStopping(monitor='val_loss', patience=10)]

history = model.fit(train_generator, 
                    validation_data = valid_generator,
                    epochs=epochs, callbacks=callbacks, verbose=1)

Plot the loss:

In [None]:
### YOUR CODE HERE ###

Predict $y$, notice we get a 4-column prediction array.

In [None]:
### YOUR CODE HERE ###

Calculate the test MSE (use `model.evaluate(...)`, it would be easiest).

In [None]:
### YOUR CODE HERE ###

Plot one of `y_pred` against the relevant `y_test` (see hint above). You should be very not-impressed :(

In [None]:
### YOUR CODE HERE ###

Apparently, the CNN predicted for each $y$ no better than the mean, and even slightly worse!

But you know what? Run this to get some moustaches and see that the mean isn't actually that bad.

In [None]:
for i in range(10):
    img_file = test_generator.filenames[i]
    person_landmarks = dict(zip(required_landmarks, y_pred[i, :]))
    draw_moustache(img_file, person_landmarks)

Now, the default printing with `verbose=1` is a bit annoying. You are going to implement a custom callback `BetterLossPrint()` which would print the epoch number, only the monitored loss (`'loss'` or `'val_loss'`), whether it has `DECREASED` or `INCREASED` from the last epoch, the best loss so far and how many epochs have passed since the best loss last decreased.

You don't get any hints besides the proper [documentation](https://www.tensorflow.org/guide/keras/custom_callback), because that's life. See below how it is called.

In [None]:
class BetterLossPrint(Callback):
    """
    Prints at each epoch's end the epoch number, monitored loss, whether it has DECREASED/INCREASED from last epoch,
    best loss so far and no. of epochs since we've seen it
    """
    def __init__(self, monitor):
        super(BetterLossPrint, self).__init__()
        self.monitor = monitor
        
    ### YOUR CODE HERE ###

Now re-instantiate the model and fit it with `verbose=0` and use your custom callback **only**, meaning this would run for `epochs=10` iterations.

In [None]:
model = cnn()

epochs = 10

callbacks = [BetterLossPrint(monitor = 'val_loss')]

history = model.fit(train_generator, 
                    validation_data = valid_generator,
                    epochs=epochs, callbacks=callbacks, verbose=0)

#### Part C - Tuning CNN

So our model is pretty bad. It desperately needs tuning.

If it were a simple Sequential API without generators, we could use `GridSearchCV()` from sklearn to do something similar to what we did in class:

In [None]:
# from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
# from sklearn.model_selection import GridSearchCV

# def cnn_with_tuning(n_conv_layers, n_kernels, stride, dropout):
#     ### define model
#     return model

# keras_reg = KerasRegressor(cnn_with_tuning)

# params = {
#     'n_conv_layers': [1,3,5],
#     'n_kernels': [10,20,30],
#     'stride': [2,5],
#     'dropout': [True, False]
# }

# grid_search_cv = GridSearchCV(keras_reg, params, cv=5, verbose=4)

# grid_search_cv.fit(
#     X_train,
#     y_train,
#     validation_split = 0.1,
#     epochs=10, callbacks=callbacks, verbose=0
# )

Unfortunately `GridSearchCV()` does not work well with generators without a proper wrapper.

Let's go manual!

Implement `cnn_with_tuning()` which receives the `n_conv_layers, n_kernels, stride, dropout` params. It should be very similar to `cnn()` only you start with a pair of `Conv2D(...)` and `MaxPool2D((2, 2))`, then add `n_conv_layers - 1` more pairs of these, before compiling and returning the model as before.

* You can assume `n_conv_layers` >= 1
* Notice the `Conv2D()` params no. of kernels/features and stride size are no longer fixed, they're being tuned, yet **use the same in each `Conv2D()`**
* And `dropout` is either `True` or `False`, i.e. it should be an optional layer

In [None]:
def cnn_with_tuning(n_conv_layers, n_kernels, stride, dropout):
    ### YOUR CODE HERE ###
    return model

This is a possible `dict()` of params to try at each combination.

In [None]:
params = {
    'n_conv_layers': [1,3,5],
    'n_kernels': [10,20,30],
    'stride': [2,5],
    'dropout': [True, False]
}

This is how we get a `KFold()` object from sklearn. We're asking for 5 folds without shuffling. We also fix `epochs` at 10 (no `EarlyStopping()`) and initialize an empty DataFrame of results.

In [None]:
kf = KFold(n_splits=5)

epochs = 10
res_df = pd.DataFrame(columns = list(params.keys()) + ['fold', 'mse'])
counter = 0

So we're taking all combinations in `params` across 5 folds. Calculate how many `model`s are going to run:

In [None]:
### YOUR CODE HERE ###

Do the quick math and see how long this is going to take. If it is absurd, you can decrease no. of epochs or remove some values in `params`, but specify what you're doing. I want to see you know how to do this, not that you have a fancy GPU or can wait a long time...
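
The "quick math" is just models times epochs times seconds per epoch. A sketch with made-up numbers (the function name and all three inputs are hypothetical; plug in your own count and your own measured epoch time, which will vary with the architecture and the fold size):

```python
def grid_search_hours(n_models, epochs, seconds_per_epoch):
    """Rough wall-clock estimate, ignoring data loading and evaluation time."""
    return n_models * epochs * seconds_per_epoch / 3600

# hypothetical numbers, only to show the arithmetic
hours = grid_search_hours(n_models=20, epochs=10, seconds_per_epoch=45)
print(hours)
```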

Now go ahead and fill in the blanks to run this grid search manually.

**Note:**
* Run the grid search only on the training data! The training data has indices `train_samp`, 8000 rows.
* You run for a fixed no. of `epochs` always, you don't need `validation_data`!
* If you're using the `BetterLossPrint()` you will need to monitor `'loss'`, not `'val_loss'`
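
One detail worth checking before running the loop: `KFold.split()` yields positions within the array it is given, not the values stored in that array. A small sketch on a hypothetical array of row indices (`sample_ids` is made up for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold

sample_ids = np.array([17, 3, 42, 8, 25, 11, 30, 5, 19, 2])  # hypothetical row indices

folds = []
for train_pos, test_pos in KFold(n_splits=5).split(sample_ids):
    # train_pos/test_pos are positions 0..9, not the values stored in sample_ids
    folds.append((sample_ids[train_pos], sample_ids[test_pos]))

first_train_ids, first_test_ids = folds[0]
print(first_test_ids)  # the ids stored at positions 0 and 1
```

So when the array passed to `split()` is itself a sample of row indices, the positions presumably need to be mapped back through that array before they are used to filter a DataFrame.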

In [None]:
for n_conv_layers in params['n_conv_layers']:
    for n_kernels in params['n_kernels']:
        for stride in params['stride']:
            for dropout in params['dropout']:
                print()
                print(f'n_conv_layers: {n_conv_layers}; n_kernels: {n_kernels}; stride: {stride}; dropout: {dropout}')
                for fold, (train_index, test_index) in enumerate(kf.split(train_samp)):
                    print(f'  Fold: {fold}')
                    model = ### YOUR CODE HERE ###
                    train_generator, _, test_generator = ### YOUR CODE HERE ###
                    history = model.fit(### YOUR CODE HERE ###)
                    mse_fold = model.evaluate(test_generator, verbose=0)
                    res_df.loc[counter] = [n_conv_layers, n_kernels, stride, dropout, fold, mse_fold]
                    counter += 1           

Now look at `res_df`, show me by either a good summary, a visualization, or both, what are the best params, expected to reach the lowest MSE.

In [None]:
### YOUR CODE HERE ###

BTW, you can look at *my* resulting DataFrame in file `data/moustache_cv.csv`.

Finally, choose the best params and run on the entire training set, predict on the test set, show us proper plots to see that indeed we get a much better model.

In [None]:
### YOUR CODE HERE ###

Now, draw someone a moustache! No, seriously, take Amir Peretz's image in `images/peretz.png`, read it, predict landmarks, give him back his moustache and see that it makes sense.

Notice that when you read an image you get a 3D array, but `model` expects a 4D array.
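
A minimal sketch of going from 3D to 4D, using a dummy array of zeros as a stand-in for an image (the array and variable names are illustrative only):

```python
import numpy as np

img = np.zeros((218, 178, 3), dtype=np.float32)  # stand-in for an image read from disk
batch = np.expand_dims(img, axis=0)              # add a leading batch dimension
print(img.shape, batch.shape)
```

The leading axis is the batch dimension, so a single image becomes a batch of size 1. It is also worth checking the shape of a freshly read image first, since some PNG files carry a fourth (alpha) channel.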

In [None]:
### YOUR CODE HERE ###

### Paper questions

##### (10 points)

Read the first 7 pages of [LassoNet](https://www.jmlr.org/papers/v22/20-848.html) (Lemhadri et al., JMLR 2021). Of course you're welcome to read the whole thing!

In class we mentioned regularizing the network's weights using e.g. `kernel_regularizer=regularizers.l1(0.01)` inside a `Dense()` layer. But this! This is something entirely different, isn't it? Explain in your own words what is LassoNet.

In [2]:
### Your Explanation

What is the **main** difference between the loss optimized in the following model and the LassoNet loss (p. 6 eq. (2))?

In [None]:
def notLassoNet(p):
    input_layer = Input((p,))
    x1 = Dense(1, activation='linear', kernel_regularizer=regularizers.l1(0.01))(input_layer)
    x2 = Dense(p, activation='relu')(input_layer)
    x2 = Dense(1)(x2)
    output_layer = Add()([x1, x2])
    model = Model(input_layer, output_layer)
    model.compile(loss='mse', optimizer='adam')
    return model

### Wrap up

And that's it, you've implemented common DNNs such as a MLP and a CNN. You've seen where NN beats a simple linear model, you've implemented a custom callback, you tuned params - you did quite a lot! Good luck with the Final Project!