{
  "nbformat": 4,
  "nbformat_minor": 2,
  "metadata": {
    "colab": {
      "name": "Introduction_to_Tensorflow.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3.7.9 64-bit ('venv': venv)"
    },
    "language_info": {
      "name": "python",
      "version": "3.7.9",
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "nbconvert_exporter": "python",
      "file_extension": ".py"
    },
    "interpreter": {
      "hash": "6ca9d643f7bb6934fb11056c6bfcbba6fffd6f45119b8d99fc29db1b9c71c5b0"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# **NEUROENGINEERING AY 2021-22**\n",
        "## INTRODUCTION TO TENSORFLOW\n",
        "---\n",
        "### Matteo Rossi\n",
        "8 October 2021"
      ],
      "metadata": {
        "id": "PbPhwFNnoCGn"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Introduction**"
      ],
      "metadata": {
        "id": "uZiSAoF_rZVt"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "In this notebook, we will try to teach a neural network to distinguish between paintings made by two painters with very different styles: Rembrandt (XVII century) and Pollock (XX century).\n",
        "\n",
        "We will use a custom-made dataset, assembled by automatically scraping Google Images using *rembrandt painting* and *pollock painting* as search keys.\n",
        "Since this is only for demonstration purposes, the dataset was not checked by hand, so some paintings could be by other artists; the two classes should still be different enough to be distinguishable.\n",
        "\n",
        "The dataset is provided as a folder of images and a CSV (Comma Separated Values) file containing the labels for each image (0 for a Rembrandt, 1 for a Pollock).\n",
        "The dataset is available [here](https://drive.google.com/drive/folders/1PQd8lnioqx6cMMLXg42iv5cj7mzB6RZX?usp=sharing): as explained in the previous notebook, add it to your *My Drive* folder to make it accessible from Colab.\n",
        "\n",
        "The network we will build will be a simple convolutional neural network, made of the following layers:\n",
        "* 2D convolution\n",
        "  * 16 filters\n",
        "  * 3&times;3 kernel\n",
        "  * `same` padding\n",
        "  * hyperbolic tangent activation function\n",
        "* Max pooling\n",
        "  * 2&times;2 pool size\n",
        "* Rectified Linear Unit ([`keras.layers.ReLU`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU))\n",
        "* Repeat 2D convolution + Max pooling + ReLU\n",
        "* Fully connected\n",
        "  * 128 units\n",
        "  * hyperbolic tangent activation function\n",
        "* Output layer\n",
        "  * ? units\n",
        "  * ? activation function\n",
        "\n",
        "The network will receive 100&times;100-pixel RGB images as input."
      ],
      "metadata": {
        "id": "G3uRMy-ooQok"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Set up the environment**"
      ],
      "metadata": {
        "id": "RmJwm0OCv1hM"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "First of all, you need to access the [dataset folder](https://drive.google.com/drive/folders/1PQd8lnioqx6cMMLXg42iv5cj7mzB6RZX?usp=sharing) and add it to *My Drive*.\n",
        "Once you have done that, you can mount the *My Drive* folder to make it accessible in Colab."
      ],
      "metadata": {
        "id": "0TATFIYav4GY"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "from google.colab import drive\r\n",
        "drive.mount('/content/drive')"
      ],
      "outputs": [],
      "metadata": {
        "id": "ODtwKDSYwNn9",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "ebc9c74b-9484-4b4a-cfef-8968e99b94b1"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "If you did everything correctly, the dataset should be available at the following path:"
      ],
      "metadata": {
        "id": "4YjUN0TcxFC3"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "NEUROART_FOLDER = \"/content/drive/My Drive/workshop_neuroengineering/neuroart\""
      ],
      "outputs": [],
      "metadata": {
        "id": "N6EUH-QZhh_O"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Check that the folder was correctly loaded by listing its content."
      ],
      "metadata": {
        "id": "7fkOkqfWxWKU"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "import os\r\n",
        "os.listdir(NEUROART_FOLDER)"
      ],
      "outputs": [],
      "metadata": {
        "id": "GxnK3c7xwRDi",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "42992b29-1aff-48a6-a046-08718dc5af8e"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "*Note: [`os.path.join`](https://docs.python.org/3/library/os.path.html#os.path.join) joins path components into a single path, inserting the separator appropriate for the operating system.*"
      ],
      "metadata": {
        "id": "1qKFAo2IyHmI"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "LABELS_FILE   = os.path.join(NEUROART_FOLDER, 'labels.csv')\r\n",
        "IMAGES_FOLDER = os.path.join(NEUROART_FOLDER, 'images')"
      ],
      "outputs": [],
      "metadata": {
        "id": "RmDJ_z8-yKZB"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now we are ready to load the dataset, create the neural network and train it.\n",
        "Below are the modules we will need:"
      ],
      "metadata": {
        "id": "LP1QhgKi1BbF"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "import tensorflow as tf         # to build and train the neural network\r\n",
        "import numpy as np              # useful for managing multidimensional arrays\r\n",
        "import matplotlib.pyplot as plt # to display images and plots\r\n",
        "from PIL import Image           # to read, write and manipulate images"
      ],
      "outputs": [],
      "metadata": {
        "id": "5gbUYe4IsnqS"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "And here are some definitions we will need later on:"
      ],
      "metadata": {
        "id": "Z2TA2Jto2Im6"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "LABELS = ['Rembrandt', 'Pollock'] # 0 = Rembrandt, 1 = Pollock\r\n",
        "IMAGE_WIDTH  = 100\r\n",
        "IMAGE_HEIGHT = 100\r\n",
        "IMAGE_DEPTH  = 3 # RGB image"
      ],
      "outputs": [],
      "metadata": {
        "id": "eEMKmNk8huh4"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Load dataset**"
      ],
      "metadata": {
        "id": "vYfnNtZN2Suw"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "It is now time to load the data from the dataset and convert it to a format that is suitable for Keras (i.e. a [`numpy.ndarray`](https://docs.scipy.org/doc/numpy-1.17.0/reference/generated/numpy.ndarray.html#numpy.ndarray) of images as input and a [`numpy.ndarray`](https://docs.scipy.org/doc/numpy-1.17.0/reference/generated/numpy.ndarray.html#numpy.ndarray) of labels as output).\n",
        "\n",
        "The following code reads the content of the file containing the labels:"
      ],
      "metadata": {
        "id": "Etwaczz62Vg-"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# The with statement closes the file automatically, even if reading fails\r\n",
        "with open(LABELS_FILE, 'r') as csvfile:\r\n",
        "    content = csvfile.read()"
      ],
      "outputs": [],
      "metadata": {
        "id": "acdAUNsl3Ewf"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "The string method [`split`](https://docs.python.org/3/library/stdtypes.html#str.split) allows us to divide a string into pieces based on a separator.\n",
        "In particular, we can use `.split('\\n')` to split based on the newline separator, hence obtaining a list of file lines.\n",
        "\n",
        "Using it, try to print out the first 10 lines of the file to better understand its structure."
      ],
      "metadata": {
        "id": "OZWzbcma3HBm"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Print first 10 lines of the CSV file [solution]\r\n",
        "lines = content.split(\"\\n\")\r\n",
        "for line in lines[:10]:\r\n",
        "    print(line)"
      ],
      "outputs": [],
      "metadata": {
        "id": "ko7j-AB24gGv",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "f9451027-2ec1-4cd3-f4c3-d52bda12b4e4"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "As you can see, each line contains two values separated by a comma.\n",
        "The first one is the path of an image and the second one is the label associated with it (either `0` or `1`).\n",
        "\n",
        "Using the [`split`](https://docs.python.org/3/library/stdtypes.html#str.split) method again, we can now completely convert the CSV file into a table:"
      ],
      "metadata": {
        "id": "SWGUhdEi45R8"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "table = []\r\n",
        "for line in content.split(\"\\n\"):\r\n",
        "    if line == \"\": # Skip empty lines\r\n",
        "        continue\r\n",
        "    path, label = line.split(\",\")\r\n",
        "    table.append([path, label])\r\n",
        "\r\n",
        "print(table)"
      ],
      "outputs": [],
      "metadata": {
        "id": "qEhtiNji6BSQ",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "3f000b11-c7f6-4442-ae39-8127ab80b3bd"
      }
    },
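    {
      "cell_type": "markdown",
      "source": [
        "Splitting each line on commas by hand works as long as no path contains a comma. As a hedged alternative, the sketch below parses lines with the same `path,label` layout using Python's standard [`csv`](https://docs.python.org/3/library/csv.html) module. The file names in `sample_lines` are made up for illustration and are not taken from the real `labels.csv`.\n",
        "\n",
        "```python\n",
        "import csv\n",
        "\n",
        "# Hypothetical lines with the same layout as labels.csv (path,label)\n",
        "sample_lines = ['rembrandt_001.jpg,0', 'pollock_001.jpg,1', '']\n",
        "\n",
        "# csv.reader accepts any iterable of lines; an empty line gives an empty row,\n",
        "# which the condition filters out\n",
        "sample_table = [row for row in csv.reader(sample_lines) if row]\n",
        "print(sample_table)\n",
        "```\n",
        "\n",
        "With the real file, the list of lines could be replaced by `content.splitlines()`."
      ],
      "metadata": {}
    },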
    {
      "cell_type": "markdown",
      "source": [
        "We can now check if the two classes are balanced by counting the images in each class, displaying a bar plot of the distribution and printing the count."
      ],
      "metadata": {
        "id": "Q9oVHNlaB0kP"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Check class balance\r\n",
        "rembrandt_count = 0\r\n",
        "pollock_count   = 0\r\n",
        "for row in table:\r\n",
        "    path, label = row\r\n",
        "    if label == \"0\":\r\n",
        "        rembrandt_count += 1\r\n",
        "    else:\r\n",
        "        pollock_count += 1\r\n",
        "\r\n",
        "plt.bar([0,1], [rembrandt_count, pollock_count])\r\n",
        "plt.xticks([0,1], labels=LABELS)\r\n",
        "print(\"Rembrandt: {} images\".format(rembrandt_count))\r\n",
        "print(\"Pollock: {} images\".format(pollock_count))"
      ],
      "outputs": [],
      "metadata": {
        "id": "Gf8E_44aCCS5",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 298
        },
        "outputId": "76cb5a2c-1cba-49c5-bbcc-45e0727e9c74"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "For a correct evaluation of our network, we should not use the whole dataset for training and instead keep a part of it for validation.\n",
        "As an example, we will split the data in this way:\n",
        "* **80%** of the images in the training set\n",
        "* **20%** of the images in the validation set\n",
        "\n",
        "We need to find the number of examples in the training and validation set."
      ],
      "metadata": {
        "id": "wN1vydtV6l5u"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Compute size of training and validation set\r\n",
        "TOTAL_EXAMPLES      = len(table)\r\n",
        "TRAINING_EXAMPLES   = int(0.8*TOTAL_EXAMPLES)\r\n",
        "VALIDATION_EXAMPLES = TOTAL_EXAMPLES - TRAINING_EXAMPLES\r\n",
        "\r\n",
        "print(TOTAL_EXAMPLES)\r\n",
        "print(TRAINING_EXAMPLES)\r\n",
        "print(VALIDATION_EXAMPLES)"
      ],
      "outputs": [],
      "metadata": {
        "id": "yFvRByD29vDl",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "0212e232-69c8-4768-ad20-3ef8e966e4b7"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "We could just split the whole table at 80% of its length and assign the two parts to the training and validation sets, but there is a risk of ending up with unbalanced classes in the two sets.\n",
        "It is better to first assign the images to two different groups (one for each class) and then divide each group into a training and a validation part."
      ],
      "metadata": {
        "id": "R8SOVIlk_RJZ"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Divide table based on label\r\n",
        "rembrandt_examples = []\r\n",
        "pollock_examples = []\r\n",
        "for row in table:\r\n",
        "    path, label = row\r\n",
        "    if label == \"0\":\r\n",
        "        rembrandt_examples.append(path)\r\n",
        "    else:\r\n",
        "        pollock_examples.append(path)"
      ],
      "outputs": [],
      "metadata": {
        "id": "toAHiqggBHe9"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Since the two classes are balanced, we can get the number of examples for each label as half the total number of examples in a set:"
      ],
      "metadata": {
        "id": "uVEddbVrBsDs"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# two slashes are used for integer division\r\n",
        "TRAINING_REMBRANDT_EXAMPLES = TRAINING_EXAMPLES//2\r\n",
        "TRAINING_POLLOCK_EXAMPLES = TRAINING_EXAMPLES - TRAINING_REMBRANDT_EXAMPLES\r\n",
        "\r\n",
        "VALIDATION_POLLOCK_EXAMPLES = VALIDATION_EXAMPLES//2\r\n",
        "VALIDATION_REMBRANDT_EXAMPLES = VALIDATION_EXAMPLES - VALIDATION_POLLOCK_EXAMPLES\r\n",
        "\r\n",
        "print(f\"Training Rembrandt examples: {TRAINING_REMBRANDT_EXAMPLES}\")\r\n",
        "print(f\"Validation Rembrandt examples: {VALIDATION_REMBRANDT_EXAMPLES}\")\r\n",
        "print(f\"Training Pollock examples: {TRAINING_POLLOCK_EXAMPLES}\")\r\n",
        "print(f\"Validation Pollock examples: {VALIDATION_POLLOCK_EXAMPLES}\")"
      ],
      "outputs": [],
      "metadata": {
        "id": "NtGnP7A3Bf4X",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "1f760308-b043-4a61-afc2-b0f63ecda2f7"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "We can now actually create the training and validation set in a form that is suitable for Keras.\n",
        "Since this part is a little more complex, the code is provided for the Rembrandt class in the training set.\n",
        "\n",
        "Some information about the code:\n",
        "* [`numpy.empty`](https://docs.scipy.org/doc/numpy-1.17.0/reference/generated/numpy.empty.html#numpy.empty) creates a multidimensional array (whose shape must be specified) without initializing its content\n",
        "* [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate) takes a list (or any other iterable) and yields pairs `(index, list_element)`; it is useful to iterate through a list while keeping track of the iteration number\n",
        "* [`Image.open`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.open) creates a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html) object from a given path; this object has some useful methods:\n",
        "  * [`.convert`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.convert) sets the color mode of the image (e.g. grayscale `L`, true color `RGB`, true color with transparency mask `RGBA`)\n",
        "  * [`.resize`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.resize) shrinks or expands an image to a specific size\n",
        "* [`numpy.asarray`](https://docs.scipy.org/doc/numpy-1.17.0/reference/generated/numpy.asarray.html#numpy.asarray) converts (among other things) [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html) objects to [`numpy.ndarray`](https://docs.scipy.org/doc/numpy-1.17.0/reference/generated/numpy.ndarray.html#numpy.ndarray) objects"
      ],
      "metadata": {
        "id": "1BVL18ThFAoM"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Initialize arrays (NumPy image arrays are indexed as height, width, channels)\r\n",
        "train_images = np.empty((TRAINING_EXAMPLES, IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_DEPTH))\r\n",
        "valid_images = np.empty((VALIDATION_EXAMPLES, IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_DEPTH))\r\n",
        "train_labels = np.empty(TRAINING_EXAMPLES)\r\n",
        "valid_labels = np.empty(VALIDATION_EXAMPLES)"
      ],
      "outputs": [],
      "metadata": {
        "id": "8pS_zxYnxeqS"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now let's fill the arrays for every case.\n",
        "\n",
        "*Hint: [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate) accepts an optional second parameter `start` that sets the starting value of the iteration counter*"
      ],
      "metadata": {
        "id": "KEpognKDIcyl"
      }
    },
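    {
      "cell_type": "markdown",
      "source": [
        "As a small illustration of the hint, the sketch below fills one preallocated list with two groups of values, using `start` so that the second group continues where the first one ended. The names `first_group`, `second_group` and `filled` are invented for this example, and a plain list stands in for the NumPy arrays used in the notebook.\n",
        "\n",
        "```python\n",
        "first_group = [10.0, 11.0, 12.0]\n",
        "second_group = [20.0, 21.0]\n",
        "\n",
        "# Preallocate the container; every position is overwritten below\n",
        "filled = [None] * (len(first_group) + len(second_group))\n",
        "\n",
        "for i, value in enumerate(first_group):\n",
        "    filled[i] = value\n",
        "\n",
        "# start=len(first_group) makes the counter begin at 3 instead of 0\n",
        "for i, value in enumerate(second_group, start=len(first_group)):\n",
        "    filled[i] = value\n",
        "\n",
        "print(filled)\n",
        "```\n",
        "\n",
        "The same pattern is what lets the second class be written into the array right after the first one."
      ],
      "metadata": {}
    },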
    {
      "cell_type": "markdown",
      "source": [
        "### Load training images"
      ],
      "metadata": {
        "id": "NfMv6R5xIz2q"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Load training set - Rembrandt class\r\n",
        "for i, path in enumerate(rembrandt_examples[:TRAINING_REMBRANDT_EXAMPLES]):\r\n",
        "    print(\r\n",
        "        \"\\rTraining Rembrandt - Loaded: {}/{}\"\r\n",
        "            .format(i+1, TRAINING_REMBRANDT_EXAMPLES),\r\n",
        "        end=\"\"\r\n",
        "    )\r\n",
        "    #print(os.path.join(IMAGES_FOLDER, path))\r\n",
        "\r\n",
        "    image = Image.open(os.path.join(IMAGES_FOLDER, path)).convert(\"RGB\")\r\n",
        "    image = image.resize((IMAGE_WIDTH, IMAGE_HEIGHT))\r\n",
        "    train_images[i] = np.asarray(image)\r\n",
        "    train_labels[i] = 0"
      ],
      "outputs": [],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "EM5zK8udIz2r",
        "outputId": "8ca52581-ae7e-409f-dceb-82b0667676d4"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Load training set - Pollock class\r\n",
        "for i, path in enumerate(pollock_examples[:TRAINING_POLLOCK_EXAMPLES], start=TRAINING_REMBRANDT_EXAMPLES):\r\n",
        "    print(\r\n",
        "        \"\\rTraining Pollock - Loaded: {}/{}\"\r\n",
        "            .format(i+1, TRAINING_EXAMPLES),\r\n",
        "        end=\"\"\r\n",
        "    )\r\n",
        "    #print(os.path.join(IMAGES_FOLDER, path))\r\n",
        "\r\n",
        "    image = Image.open(os.path.join(IMAGES_FOLDER, path)).convert(\"RGB\")\r\n",
        "    image = image.resize((IMAGE_WIDTH, IMAGE_HEIGHT))\r\n",
        "    train_images[i] = np.asarray(image)\r\n",
        "    train_labels[i] = 1"
      ],
      "outputs": [],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "4yY1XgsAIz2r",
        "outputId": "a8e6d047-78f5-47ae-daba-6bfe528308b2"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Load validation images"
      ],
      "metadata": {
        "id": "chCXPtsaIz2r"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Load validation set - Rembrandt class\r\n",
        "for i, path in enumerate(rembrandt_examples[TRAINING_REMBRANDT_EXAMPLES:]):\r\n",
        "    print(\r\n",
        "        \"\\rValidation Rembrandt - Loaded: {}/{}\"\r\n",
        "            .format(i+1, VALIDATION_REMBRANDT_EXAMPLES),\r\n",
        "        end=\"\"\r\n",
        "    )\r\n",
        "    #print(os.path.join(IMAGES_FOLDER, path))\r\n",
        "\r\n",
        "    image = Image.open(os.path.join(IMAGES_FOLDER, path)).convert(\"RGB\")\r\n",
        "    image = image.resize((IMAGE_WIDTH, IMAGE_HEIGHT))\r\n",
        "    valid_images[i] = np.asarray(image)\r\n",
        "    valid_labels[i] = 0"
      ],
      "outputs": [],
      "metadata": {
        "id": "EeTT5NnnIiyk",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "18a05707-4177-45b0-c9f0-e1406fc2d2dc"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Load validation set - Pollock class\r\n",
        "for i, path in enumerate(pollock_examples[TRAINING_POLLOCK_EXAMPLES:], start=VALIDATION_REMBRANDT_EXAMPLES):\r\n",
        "    print(\r\n",
        "        \"\\rValidation Pollock - Loaded: {}/{}\"\r\n",
        "            .format(i+1, VALIDATION_EXAMPLES),\r\n",
        "        end=\"\"\r\n",
        "    )\r\n",
        "    #print(os.path.join(IMAGES_FOLDER, path))\r\n",
        "\r\n",
        "    image = Image.open(os.path.join(IMAGES_FOLDER, path)).convert(\"RGB\")\r\n",
        "    image = image.resize((IMAGE_WIDTH, IMAGE_HEIGHT))\r\n",
        "    valid_images[i] = np.asarray(image)\r\n",
        "    valid_labels[i] = 1"
      ],
      "outputs": [],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yhc42ujgIz2s",
        "outputId": "c76a6e29-2de0-429d-f0c1-f6ba90453788"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "To check that the data was loaded correctly, we can display some images from the training and validation sets (say 5 and 5) in a grid, each with its label."
      ],
      "metadata": {
        "id": "qpv8ivQrOLO0"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Display examples from the datasets\r\n",
        "plt.figure(figsize=(20,8))\r\n",
        "for i in range(5):\r\n",
        "    plt.subplot(2,5,i+1)\r\n",
        "    image = train_images[i].astype(np.uint8)\r\n",
        "    plt.imshow(image)\r\n",
        "    plt.xlabel(LABELS[train_labels.astype(int)[i]])\r\n",
        "    plt.xticks([])\r\n",
        "    plt.yticks([])\r\n",
        "\r\n",
        "    plt.subplot(2,5,i+6)\r\n",
        "    image = valid_images[i].astype(np.uint8)\r\n",
        "    plt.imshow(image)\r\n",
        "    plt.xlabel(LABELS[valid_labels.astype(int)[i]])\r\n",
        "    plt.xticks([])\r\n",
        "    plt.yticks([])"
      ],
      "outputs": [],
      "metadata": {
        "id": "QAd3GTvzx_tC",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 455
        },
        "outputId": "9a8ef00c-879d-40a0-be86-a7e483bee615"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Since we loaded all the Rembrandt paintings first and the Pollock paintings afterwards, only Rembrandt paintings are displayed here.\n",
        "It is good practice to randomize the order of the examples before feeding them to the neural network.\n",
        "\n",
        "The [`numpy.random.shuffle`](https://docs.scipy.org/doc/numpy-1.17.0/reference/random/generated/numpy.random.mtrand.RandomState.shuffle.html#numpy.random.mtrand.RandomState.shuffle) function randomizes the order of a list-like object in-place. In cases like this where we want to shuffle two lists at the same time (we want to keep the link between images and labels) it is better to use [`numpy.random.permutation`](https://docs.scipy.org/doc/numpy-1.17.0/reference/random/generated/numpy.random.mtrand.RandomState.permutation.html#numpy.random.mtrand.RandomState.permutation):"
      ],
      "metadata": {
        "id": "mmhcmOzpNgaE"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "new_order = np.random.permutation(TRAINING_EXAMPLES)\r\n",
        "train_images = train_images[new_order]\r\n",
        "train_labels = train_labels[new_order]\r\n",
        "\r\n",
        "new_order = np.random.permutation(VALIDATION_EXAMPLES)\r\n",
        "valid_images = valid_images[new_order]\r\n",
        "valid_labels = valid_labels[new_order]"
      ],
      "outputs": [],
      "metadata": {
        "id": "X05SSdZ7NlSf"
      }
    },
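    {
      "cell_type": "markdown",
      "source": [
        "To see why indexing both arrays with the same permutation keeps each image paired with its label, here is a minimal sketch on toy data. The arrays `toy_images` and `toy_labels` are invented for this example: image `i` is filled with the value `i` and also has label `i`, so any mismatch would be easy to spot.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "# Toy data: 5 tiny 2x2 RGB images, image i is filled with the value i\n",
        "toy_images = np.arange(5).reshape(5, 1, 1, 1) * np.ones((5, 2, 2, 3))\n",
        "toy_labels = np.arange(5)\n",
        "\n",
        "# One random ordering of the indices 0..4\n",
        "order = np.random.permutation(len(toy_labels))\n",
        "\n",
        "# Indexing both arrays with the same index array keeps each pair together\n",
        "shuffled_images = toy_images[order]\n",
        "shuffled_labels = toy_labels[order]\n",
        "\n",
        "print(order)\n",
        "print(shuffled_images[:, 0, 0, 0])\n",
        "print(shuffled_labels)\n",
        "```\n",
        "\n",
        "The last two printed lines show the same sequence of values, whatever order was drawn."
      ],
      "metadata": {}
    },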
    {
      "cell_type": "markdown",
      "source": [
        "If we display the images again, they will be mixed up."
      ],
      "metadata": {
        "id": "q9NcB91TQrLC"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "plt.figure(figsize=(20,8))\r\n",
        "for i in range(5):\r\n",
        "    plt.subplot(2,5,i+1)\r\n",
        "    image = train_images[i].astype(np.uint8)\r\n",
        "    plt.imshow(image)\r\n",
        "    plt.xlabel(LABELS[train_labels.astype(int)[i]])\r\n",
        "    plt.xticks([])\r\n",
        "    plt.yticks([])\r\n",
        "\r\n",
        "    plt.subplot(2,5,i+6)\r\n",
        "    image = valid_images[i].astype(np.uint8)\r\n",
        "    plt.imshow(image)\r\n",
        "    plt.xlabel(LABELS[valid_labels.astype(int)[i]])\r\n",
        "    plt.xticks([])\r\n",
        "    plt.yticks([])"
      ],
      "outputs": [],
      "metadata": {
        "id": "duGO8IUHQu-s",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 455
        },
        "outputId": "ba779fc2-dc73-4b11-dfa8-904ca28630f9"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Build model**"
      ],
      "metadata": {
        "id": "x-NJT7UWP_Xl"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "At this point we can create the actual neural network.\n",
        "Remember that we want a simple convolutional neural network with the following layers:\n",
        "* 2D convolution\n",
        "  * 16 filters\n",
        "  * 3&times;3 kernel\n",
        "  * `same` padding\n",
        "  * hyperbolic tangent activation function\n",
        "* Max pooling\n",
        "    * 2&times;2 pool size\n",
        "* Rectified Linear Unit ([`keras.layers.ReLU`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU))\n",
        "* Repeat 2D convolution + Max pooling + ReLU\n",
        "* Fully connected\n",
        "  * 128 units\n",
        "  * hyperbolic tangent activation function\n",
        "* Output layer\n",
        "  * ? units\n",
        "  * ? activation function\n",
        "\n",
        "How many outputs should the network have and which activation function is most suitable?"
      ],
      "metadata": {
        "id": "UEZ5f3P8QITm"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Build the model [solution]\r\n",
        "def build_model(image_width, image_height, image_depth):\r\n",
        "    # Keras expects image tensors as (height, width, channels)\r\n",
        "    input_layer = tf.keras.layers.Input(shape=[image_height, image_width, image_depth])\r\n",
        "\r\n",
        "    layers = tf.keras.layers.Conv2D(filters=16,\r\n",
        "                                kernel_size=(3,3),\r\n",
        "                                padding=\"same\",\r\n",
        "                                activation=\"tanh\")(input_layer)\r\n",
        "\r\n",
        "    layers = tf.keras.layers.MaxPool2D(pool_size=(2,2))(layers)\r\n",
        "\r\n",
        "    layers = tf.keras.layers.ReLU()(layers)\r\n",
        "\r\n",
        "    layers = tf.keras.layers.Conv2D(filters=16,\r\n",
        "                                kernel_size=(3,3),\r\n",
        "                                padding=\"same\",\r\n",
        "                                activation=\"tanh\")(layers)\r\n",
        "\r\n",
        "    layers = tf.keras.layers.MaxPool2D(pool_size=(2,2))(layers)\r\n",
        "    layers = tf.keras.layers.ReLU()(layers)\r\n",
        "\r\n",
        "    layers = tf.keras.layers.Flatten()(layers)\r\n",
        "\r\n",
        "    layers = tf.keras.layers.Dense(128, activation=\"tanh\")(layers)\r\n",
        "\r\n",
        "    output_layer = tf.keras.layers.Dense(1, activation=\"sigmoid\")(layers)\r\n",
        "\r\n",
        "    model = tf.keras.Model(inputs=input_layer,\r\n",
        "                        outputs=output_layer,\r\n",
        "                        name=\"neuroart_model\"\r\n",
        "                        )\r\n",
        "    \r\n",
        "    return model\r\n",
        "\r\n",
        "# Build a new model\r\n",
        "model = build_model(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_DEPTH)"
      ],
      "outputs": [],
      "metadata": {
        "id": "8mzyNZExnVUx"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "model.summary()"
      ],
      "outputs": [],
      "metadata": {
        "id": "OLJd98cFpv1e",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "997fbe19-cd02-4404-9fd3-48e086d1200e"
      }
    },
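    {
      "cell_type": "markdown",
      "source": [
        "The shapes and parameter counts in the summary can also be worked out by hand. The sketch below is plain arithmetic, not a TensorFlow call, and assumes the architecture built above: 100&times;100 RGB input, two blocks of 3&times;3 `same` convolution with 16 filters followed by 2&times;2 max pooling, a 128-unit dense layer and a single output unit. You can compare its result against the output of `model.summary()`.\n",
        "\n",
        "```python\n",
        "# Shapes are (height, width, channels); 'same' padding keeps height and width\n",
        "height, width, channels = 100, 100, 3\n",
        "filters, kernel = 16, 3\n",
        "\n",
        "total_params = 0\n",
        "for block in range(2):\n",
        "    # Conv2D: kernel * kernel * input channels weights per filter, plus one bias per filter\n",
        "    conv_params = kernel * kernel * channels * filters + filters\n",
        "    total_params += conv_params\n",
        "    channels = filters\n",
        "    # 2x2 max pooling halves height and width; pooling and ReLU add no parameters\n",
        "    height, width = height // 2, width // 2\n",
        "    print('block', block + 1, 'conv params:', conv_params, 'output:', (height, width, channels))\n",
        "\n",
        "flat_units = height * width * channels   # Flatten\n",
        "dense_params = flat_units * 128 + 128    # Dense(128): weights plus biases\n",
        "output_params = 128 * 1 + 1              # Dense(1): weights plus bias\n",
        "total_params += dense_params + output_params\n",
        "\n",
        "print('flattened units:', flat_units)\n",
        "print('total trainable parameters:', total_params)\n",
        "```\n",
        "\n",
        "Almost all of the parameters sit in the first dense layer, because it connects every one of the flattened units to each of its 128 units."
      ],
      "metadata": {}
    },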
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "tf.keras.utils.plot_model(model, show_shapes=True)"
      ],
      "outputs": [],
      "metadata": {
        "id": "DkQV0glYpxTi",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "outputId": "8549160e-6149-4735-ae0d-ff3c8242ffbb"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Compile the model**"
      ],
      "metadata": {
        "id": "6cXt-7joRFQs"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Always remember to compile the model before training it.\n",
        "Since we are performing a binary classification, we use the `binary_crossentropy` loss function."
      ],
      "metadata": {
        "id": "SGSjK_xwRHuf"
      }
    },
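    {
      "cell_type": "markdown",
      "source": [
        "For a label `y` (0 or 1) and a predicted probability `p`, binary cross-entropy is `-(y*log(p) + (1-y)*log(1-p))`, averaged over the examples. The sketch below computes it in plain Python to show how the loss behaves; it is an illustration of the formula, not the Keras implementation, and the helper name `binary_crossentropy_sketch` and the clipping constant `eps` are assumptions made for this example.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "def binary_crossentropy_sketch(y_true, y_pred, eps=1e-7):\n",
        "    # Mean binary cross-entropy for labels in {0, 1} and predicted probabilities\n",
        "    total = 0.0\n",
        "    for t, p in zip(y_true, y_pred):\n",
        "        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)\n",
        "        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))\n",
        "    return total / len(y_true)\n",
        "\n",
        "# Same two labels, once with confident correct predictions and once with confident wrong ones\n",
        "confident_right = binary_crossentropy_sketch([0, 1], [0.1, 0.9])\n",
        "confident_wrong = binary_crossentropy_sketch([0, 1], [0.9, 0.1])\n",
        "print(confident_right, confident_wrong)\n",
        "```\n",
        "\n",
        "Confident wrong predictions are penalized far more heavily than confident correct ones, which is what pushes the sigmoid output towards the right class during training."
      ],
      "metadata": {}
    },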
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "model.compile(\r\n",
        "    optimizer = tf.keras.optimizers.Adam(),\r\n",
        "    loss = 'binary_crossentropy',\r\n",
        "    metrics = ['acc']\r\n",
        ")"
      ],
      "outputs": [],
      "metadata": {
        "id": "7CDRR032RKw3"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Train the model**"
      ],
      "metadata": {
        "id": "jsPpI0oJRUnG"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Use the [`.fit`](https://keras.io/models/model/#fit) method of the model to train it.\n",
        "\n",
        "In addition to the training data, you can pass a pair `(images, labels)` as the `validation_data` argument of the [`.fit`](https://keras.io/models/model/#fit) method. The loss and accuracy are then also computed at the end of each epoch on this validation data, which is not used for training; if the validation loss starts rising while the training loss keeps falling, the network is overfitting."
      ],
      "metadata": {
        "id": "-7HYE-xMReCP"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Train the model [solution]\r\n",
        "history = model.fit(train_images, \r\n",
        "                    train_labels, \r\n",
        "                    batch_size=256, \r\n",
        "                    epochs=30, \r\n",
        "                    validation_data=(valid_images, valid_labels))"
      ],
      "outputs": [],
      "metadata": {
        "id": "8lO66pqijkS6",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "d06611b3-fd5c-49fd-e438-f802dbc9a420"
      }
    },
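    {
      "cell_type": "markdown",
      "source": [
        "The `history` variable returned by `.fit` records how training progressed: its `history` attribute is a dictionary that maps each loss and metric name to a list with one value per epoch.\n",
        "\n",
        "We can, for example, see how the loss changed epoch by epoch:"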
      ],
      "metadata": {
        "id": "j1VVufliTIDQ"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "plt.plot(history.history['loss'], label=\"training loss\")\r\n",
        "plt.plot(history.history['val_loss'], label=\"validation loss\")\r\n",
        "plt.xlabel(\"epoch\")\r\n",
        "plt.ylabel(\"loss\")\r\n",
        "plt.legend()"
      ],
      "outputs": [],
      "metadata": {
        "id": "6rse7MTFkY3I",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 282
        },
        "outputId": "bd074f01-ad5f-4214-abc9-2098ee7918d6"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Do the same for the accuracy."
      ],
      "metadata": {
        "id": "fZZaIZpFTa6o"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Plot accuracy epoch by epoch\r\n",
        "plt.plot(history.history['acc'], label=\"training accuracy\")\r\n",
        "plt.plot(history.history['val_acc'], label=\"validation accuracy\")\r\n",
        "plt.xlabel(\"epoch\")\r\n",
        "plt.ylabel(\"accuracy\")\r\n",
        "plt.legend()"
      ],
      "outputs": [],
      "metadata": {
        "id": "kDnTND8rt4p0",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 282
        },
        "outputId": "adbc4ab7-48de-45c1-b584-f76f45a94540"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Make predictions**"
      ],
      "metadata": {
        "id": "Dc4X-88xUHEp"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now that we have a working network, we can try to predict the labels of some images."
      ],
      "metadata": {
        "id": "4QQUB0aAUNMb"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "images = np.concatenate([train_images[:5], valid_images[:5]])\r\n",
        "labels = np.concatenate([train_labels[:5], valid_labels[:5]])\r\n",
        "predictions = model.predict(images)"
      ],
      "outputs": [],
      "metadata": {
        "id": "UUIayYDQWBYZ"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "And now we can show each image together with its true and predicted label."
      ],
      "metadata": {
        "id": "ULgq-xINWR96"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "plt.figure(figsize=(20,8))\r\n",
        "for i in range(10):\r\n",
        "    plt.subplot(2,5,i+1)\r\n",
        "    image = images[i].astype(np.uint8)\r\n",
        "    true  = int(labels[i])\r\n",
        "    # Extract the scalar output, then round it to the closest class label\r\n",
        "    pred  = int(np.round(predictions[i].item()))\r\n",
        "    plt.imshow(image)\r\n",
        "    plt.xlabel(\r\n",
        "        \"True {}\\nPred. {}\".format(\r\n",
        "            LABELS[true],\r\n",
        "            LABELS[pred]\r\n",
        "        )\r\n",
        "    )\r\n",
        "    plt.xticks([])\r\n",
        "    plt.yticks([])"
      ],
      "outputs": [],
      "metadata": {
        "id": "2ya5VhSYWLN4",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 466
        },
        "outputId": "f4d5396d-f84e-4fba-b49e-54c33e8cfef2"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Advanced settings**"
      ],
      "metadata": {
        "id": "KNfuN9-IjSyo"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "In this section we will see how to:\n",
        "- use TensorBoard\n",
        "- save model checkpoints\n",
        "- restore a previously saved model"
      ],
      "metadata": {
        "id": "xzhnUBzt4mbh"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "from datetime import datetime"
      ],
      "outputs": [],
      "metadata": {
        "id": "vpFEnWjUG9cO"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### TensorBoard\n",
        "This is a very useful tool to keep track of your loss and metrics at runtime. Unlike the `history` variable, it does not require you to wait for the end of the training.\n",
        "\n",
        "For a detailed explanation of TensorBoard see this [link](https://www.tensorflow.org/tensorboard/get_started)."
      ],
      "metadata": {
        "id": "cz1CNDPz9-e5"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "TB_FOLDER = os.path.join(NEUROART_FOLDER, 'tensorboard_logs')\r\n",
        "# Create the folder if it does not exist yet\r\n",
        "os.makedirs(TB_FOLDER, exist_ok=True)"
      ],
      "outputs": [],
      "metadata": {
        "id": "-LzMt7OV-Ifb"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "logdir = os.path.join(TB_FOLDER, datetime.now().strftime(\"%Y%m%d-%H%M%S\"))\r\n",
        "tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)"
      ],
      "outputs": [],
      "metadata": {
        "id": "JnL5JayF-UmL"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# run and copy the output\r\n",
        "TB_FOLDER"
      ],
      "outputs": [],
      "metadata": {
        "id": "dfy-4iz6_Ha4",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "outputId": "0b72e2a5-775f-43ac-db0f-67656b8db0f2"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Load the TensorBoard notebook extension\r\n",
        "%load_ext tensorboard\r\n",
        "# Paste the output of the previous cell after --logdir.\r\n",
        "# Keep the path between quotation marks ('path/to/logs_folder') because it may contain spaces.\r\n",
        "%tensorboard --logdir '/content/drive/My Drive/workshop_neuroengineering/neuroart/tensorboard_logs'"
      ],
      "outputs": [],
      "metadata": {
        "id": "nvGQ6K-s-djl"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Build a new model\r\n",
        "modelA = build_model(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_DEPTH)"
      ],
      "outputs": [],
      "metadata": {
        "id": "Nli_yefnd4eK"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "modelA.compile(\r\n",
        "    optimizer = tf.keras.optimizers.Adam(),\r\n",
        "    loss = 'binary_crossentropy',\r\n",
        "    metrics = ['acc']\r\n",
        ")"
      ],
      "outputs": [],
      "metadata": {
        "id": "XUmPYNns-kdS"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "historyA = modelA.fit(train_images,\r\n",
        "                      train_labels,\r\n",
        "                      batch_size=256,\r\n",
        "                      epochs=100,\r\n",
        "                      validation_data=(valid_images, valid_labels),\r\n",
        "                      callbacks=[tensorboard_callback])"
      ],
      "outputs": [],
      "metadata": {
        "id": "_b8FJAkB-s3i",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "3100d7d1-b833-420c-fe84-d1a4a9592a55"
      }
    },
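    {
      "cell_type": "markdown",
      "source": [
        "### Model checkpoint\n",
        "The `ModelCheckpoint` callback saves the model to disk during training. With `save_best_only=True`, the file is overwritten only when the monitored quantity improves, so it always holds the best model seen so far."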
      ],
      "metadata": {
        "id": "8tmRGL2V-97q"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "CKPT_FOLDER = os.path.join(NEUROART_FOLDER, 'model_checkpoints')\r\n",
        "# Create the folder if it does not exist yet\r\n",
        "os.makedirs(CKPT_FOLDER, exist_ok=True)"
      ],
      "outputs": [],
      "metadata": {
        "id": "wKxEzPVrL1NW"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Build a new model\r\n",
        "modelB = build_model(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_DEPTH)"
      ],
      "outputs": [],
      "metadata": {
        "id": "CIGE5ajEgE6H"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "modelB.compile(\r\n",
        "    optimizer = tf.keras.optimizers.Adam(),\r\n",
        "    loss = 'binary_crossentropy',\r\n",
        "    metrics = ['acc']\r\n",
        ")"
      ],
      "outputs": [],
      "metadata": {
        "id": "Pgyu5Ung_V1X"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "checkpointer_val_loss = tf.keras.callbacks.ModelCheckpoint(\r\n",
        "    filepath=os.path.join(CKPT_FOLDER, \"best_min_val_loss.h5\"),\r\n",
        "    monitor='val_loss',\r\n",
        "    verbose=1,\r\n",
        "    save_best_only=True,\r\n",
        "    mode='min',\r\n",
        "    save_freq=\"epoch\"\r\n",
        ")\r\n",
        "\r\n",
        "checkpointer_val_acc = tf.keras.callbacks.ModelCheckpoint(\r\n",
        "    filepath=os.path.join(CKPT_FOLDER, \"best_max_val_acc.h5\"),\r\n",
        "    monitor='val_acc',\r\n",
        "    verbose=1,\r\n",
        "    save_best_only=True,\r\n",
        "    mode='max',\r\n",
        "    save_freq=\"epoch\"\r\n",
        ")"
      ],
      "outputs": [],
      "metadata": {
        "id": "pKYUkRZ0_Dra"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "historyB = modelB.fit(train_images,\r\n",
        "                      train_labels,\r\n",
        "                      batch_size=256,\r\n",
        "                      epochs=50,\r\n",
        "                      validation_data=(valid_images, valid_labels),\r\n",
        "                      callbacks=[checkpointer_val_loss,\r\n",
        "                                 checkpointer_val_acc])"
      ],
      "outputs": [],
      "metadata": {
        "id": "YgiRyT90_Kzl",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "faa66e9d-82e4-4f68-cea6-30c87acda46b"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Restore a previously saved model\n",
        "A saved checkpoint can be loaded later to rebuild the same model, including its trained weights, without training it again."
      ],
      "metadata": {
        "id": "bbL-EnocGdBf"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "# Load the model that reached the highest validation accuracy\r\n",
        "modelPath = os.path.join(CKPT_FOLDER, \"best_max_val_acc.h5\")\r\n",
        "print(modelPath)\r\n",
        "\r\n",
        "# compile=False skips restoring the optimizer and loss, which is enough if you only want to make predictions\r\n",
        "reconstructed_model = tf.keras.models.load_model(modelPath, compile=False)"
      ],
      "outputs": [],
      "metadata": {
        "id": "nsoMpKXZGjCV",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "07458124-ae62-4143-ddf8-8bef66e3d4a7"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "predictions = reconstructed_model.predict(images)"
      ],
      "outputs": [],
      "metadata": {
        "id": "op5o1Fg8IZlB"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [
        "plt.figure(figsize=(20,8))\r\n",
        "for i in range(10):\r\n",
        "    plt.subplot(2,5,i+1)\r\n",
        "    image = images[i].astype(np.uint8)\r\n",
        "    true  = int(labels[i])\r\n",
        "    # Extract the scalar output, then round it to the closest class label\r\n",
        "    pred  = int(np.round(predictions[i].item()))\r\n",
        "    plt.imshow(image)\r\n",
        "    plt.xlabel(\r\n",
        "        \"True {}\\nPred. {}\".format(\r\n",
        "            LABELS[true],\r\n",
        "            LABELS[pred]\r\n",
        "        )\r\n",
        "    )\r\n",
        "    plt.xticks([])\r\n",
        "    plt.yticks([])"
      ],
      "outputs": [],
      "metadata": {
        "id": "WbSLOq2iIkWG",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 466
        },
        "outputId": "4376ad66-1a94-4cd6-bccc-02d894ac7b0e"
      }
    }
  ]
}