<a href="https://colab.research.google.com/github/dimitree54/metalearning/blob/master/metalearning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Metalearning

---
## Introduction

This document contains the research journal and code for the metalearning project. The main idea is to replace classical gradient-based optimizers with a neural-network optimizer (the teacher network) in order to achieve some of the following advantages:


*   Improve optimization quality (convergence speed or final quality)
*   Remove some limitations of classical optimizers, such as the requirement that the loss be differentiable, or backpropagation issues (such as vanishing or exploding gradients)
*   Apply the optimization network to itself recursively, to study the possibly unexpected results of such deep self-optimization


For educational purposes, all code will be written in TensorFlow 2.0 and will support all available computation units: CPU, GPU and TPU.

---
## Definitions and abbreviations

*   Teacher-network or TN - neural network that replaces classical gradient optimizers.
*   Student-network or SN - neural network that is trained by the teacher-network.
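To make the difference concrete, the two update rules can be written side by side. This is an illustrative formulation, not a formula taken from the project: $\eta$ (learning rate), $\mathcal{L}$ (SN loss), $x_i$ (the input multiplied by weight $w_{ij}$) and $g$ (global information such as the loss) are notation introduced here.

$$
\text{gradient descent:}\quad w_{ij}^{(t+1)} = w_{ij}^{(t)} - \eta\,\frac{\partial \mathcal{L}^{(t)}}{\partial w_{ij}^{(t)}}
$$

$$
\text{teacher network:}\quad w_{ij}^{(t+1)} = \mathrm{TN}\left(w_{ij}^{(t)},\; x_i^{(t)},\; g^{(t)}\right)
$$

The second rule does not need $\partial\mathcal{L}/\partial w_{ij}$, which is why a TN could in principle work with non-differentiable losses.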


---

## Setting up environment
Connect to Google Drive and move to the working directory to access the custom script files, install the packages required for the current hardware, and import the main packages.

In [0]:
import sys
import os
from random import randint

# setting up working directory
#from google.colab import drive
#drive.mount('/content/drive/')
#%cd /content/drive/My\ Drive/Colab\ Notebooks/metalearning

# getting device type
if 'COLAB_TPU_ADDR' in os.environ:
    device = "TPU"
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
elif ('COLAB_GPU' in os.environ and os.environ['COLAB_GPU'] == '1'):
    device = "GPU"  # TODO support local gpu
else:
    device = "CPU"
print("Working with {} device".format(device))

# installing correct version of tensorflow 2.0
if device == "GPU":
    !pip install -q tensorflow-gpu==2.0.0-beta1
else:
    !pip install -q tensorflow==2.0.0-beta1
    
import tensorflow as tf

Working with CPU device

---
## Data representation
Here we describe the data pipeline for both the teacher and the student networks. The purpose of the research is to build a universal optimizer (teacher-network) that supports different learning tasks and different student-network architectures, so several training tasks and student-network architectures will be used for teacher-network training and testing. On the other hand, for research purposes, the complexity of the considered tasks and architectures should be limited.

As tasks we will use some classical problems available in the `tf.keras.datasets` module (excluding NLP datasets):
*   MNIST
*   CIFAR 10
*   CIFAR 100
*   Fashion MNIST
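Since the SN uses only fully connected layers, every image has to be flattened into a vector before it is fed forward. `tf.keras.datasets.<name>.load_data()` returns `((x_train, y_train), (x_test, y_test))` for all four datasets; note that CIFAR labels have shape `[N, 1]` while MNIST labels have shape `[N]`. The helper below is a hypothetical sketch (its name and the [0, 1] scaling are assumptions) and uses a synthetic batch instead of a real download:

```python
import numpy as np

def prepare_classification_data(x, y, num_classes):
    """Hypothetical helper: flatten images, scale pixels to [0, 1], one-hot labels.

    x: uint8 array of shape [N, H, W] (MNIST) or [N, H, W, C] (CIFAR)
    y: integer labels of shape [N] or [N, 1]
    """
    x = x.reshape(len(x), -1).astype(np.float32) / 255.0
    y = np.asarray(y).reshape(-1)  # [N, 1] -> [N]
    one_hot = np.eye(num_classes, dtype=np.float32)[y]
    return x, one_hot

# Synthetic stand-in for a CIFAR-10 batch: 4 RGB 32x32 images with [N, 1] labels
rng = np.random.default_rng(0)
x_fake = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
y_fake = np.array([[3], [0], [9], [3]])
x_flat, y_onehot = prepare_classification_data(x_fake, y_fake, num_classes=10)
print(x_flat.shape, y_onehot.shape)  # (4, 3072) (4, 10)
```

The same call works for MNIST and Fashion MNIST (`num_classes=10`) and CIFAR 100 (`num_classes=100`).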

UPDATE: regression tasks removed.

Although most of the datasets are image-oriented, we will use fully connected layers because of their simplicity. We will generate random SN architectures and train them in parallel with the TN and with gradient descent to compare the results. All hidden layers will use the sigmoid activation function; the activation of the last layer is task-dependent: softmax for classification.
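A minimal NumPy sketch of such an SN forward pass follows, assuming sigmoid hidden layers, a softmax output layer, bias emulated by a constant input of 1, and mean categorical cross-entropy as the loss (the loss choice, the Glorot-style initialization and all function names are assumptions made for illustration, and NumPy is used instead of TensorFlow only to keep the example short):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_sn(input_size, hidden_sizes, output_size, rng):
    """One [n + 1, m] weight matrix per layer; the extra row is the bias."""
    sizes = [input_size] + list(hidden_sizes) + [output_size]
    # Glorot-style scale (assumption); plain normal, not truncated normal
    return [rng.normal(0.0, np.sqrt(2.0 / (n + m)), size=(n + 1, m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def sn_forward(x, weights_set):
    """Sigmoid on hidden layers, softmax on the output layer."""
    net = x
    last = len(weights_set) - 1
    for i, w in enumerate(weights_set):
        # append a constant 1 to every input row to emulate the bias
        z = np.hstack([net, np.ones((len(net), 1))]) @ w
        net = softmax(z) if i == last else sigmoid(z)
    return net

def cross_entropy(probs, y_onehot):
    """Mean categorical cross-entropy (assumed classification loss)."""
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))

rng = np.random.default_rng(0)
weights_set = init_sn(784, [64, 32], 10, rng)  # e.g. flattened MNIST input
x = rng.random((5, 784))
y = np.eye(10)[[1, 2, 3, 4, 5]]
probs = sn_forward(x, weights_set)
loss = cross_entropy(probs, y)
```

The scalar `loss` is the kind of global information that the TN could receive alongside per-weight information.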

The inputs of the training pairs are fed forward to the SN as is. The TN takes as input the state of the SN and the training pair. Exactly which information about the SN should be fed to the TN is an important question, considered below.

To make the TN applicable to any SN architecture, the TN input must not depend on SN dimensions (such as hidden layer sizes). To achieve this universality, we consider all SN weights independently: the TN takes as input local weight information (the weight value and its input value) and some global information (loss, ...) and outputs the new weight value.
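The per-weight idea can be sketched by turning every weight of a layer into one feature row, so the TN always sees a fixed number of features no matter how large the layer is. This is a hypothetical NumPy sketch: the three features (weight value, input value averaged over the batch, loss), the batch averaging and the linear `toy_teacher` standing in for a trained TN are all assumptions:

```python
import numpy as np

def per_weight_features(weights, layer_inputs, loss):
    """One feature row per weight: (weight value, mean input, loss).

    weights: [n + 1, m], last row is the emulated bias
    layer_inputs: [bs, n] inputs of this layer for the current batch
    Returns an array of shape [(n + 1) * m, 3].
    """
    n_plus_1, m = weights.shape
    # the constant 1 used for bias emulation is the input of the bias row
    biased = np.hstack([layer_inputs, np.ones((len(layer_inputs), 1))])
    assert biased.shape[1] == n_plus_1
    mean_input = biased.mean(axis=0)                               # [n + 1]
    input_per_weight = np.repeat(mean_input[:, None], m, axis=1)   # [n + 1, m]
    return np.stack([weights.ravel(),
                     input_per_weight.ravel(),
                     np.full(weights.size, float(loss))], axis=1)

def apply_teacher(weights, layer_inputs, loss, teacher):
    """Apply the same TN to every weight and reshape back to the layer shape."""
    features = per_weight_features(weights, layer_inputs, loss)
    new_values = teacher(features)  # [(n + 1) * m]
    return new_values.reshape(weights.shape)

def toy_teacher(features):
    # hypothetical stand-in for a trained TN: a fixed linear map of 3 features
    return features @ np.array([0.99, -0.01, -0.001])

rng = np.random.default_rng(0)
w1, x1 = rng.normal(size=(785, 64)), rng.random((8, 784))
w2, x2 = rng.normal(size=(65, 10)), rng.random((8, 64))
new_w1 = apply_teacher(w1, x1, 2.3, toy_teacher)  # shape (785, 64)
new_w2 = apply_teacher(w2, x2, 2.3, toy_teacher)  # shape (65, 10)
```

The same `toy_teacher` handles both layers, even though they have different shapes, because it only ever sees rows of 3 features.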

We will consider the following candidate TN inputs, in different combinations:
*   Weight input (the value the weight is multiplied by) and possibly its encoded history
*   Weight value and possibly its encoded history
*   Encoded information about other neurons in the same layer (inputs and values)
*   Encoded information about the neuron type (for example, its activation function)
*   Loss value and possibly its encoded history
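Several candidates mention an "encoded history". One simple way to keep such an encoding fixed-size is sketched below; the scheme (the last `window` values, zero-padded, plus an exponential moving average) and the class name are hypothetical choices for illustration, not the project's encoding:

```python
from collections import deque

import numpy as np

class HistoryEncoder:
    """Hypothetical fixed-size encoding of a scalar history (e.g. the loss).

    Keeps the last `window` values, zero-padded at the start, plus an
    exponential moving average, so the encoding length is always window + 1.
    """

    def __init__(self, window=4, decay=0.9):
        self.window = window
        self.decay = decay
        self.values = deque(maxlen=window)  # oldest value is dropped first
        self.ema = None

    def update(self, value):
        value = float(value)
        self.values.append(value)
        if self.ema is None:
            self.ema = value
        else:
            self.ema = self.decay * self.ema + (1.0 - self.decay) * value

    def encode(self):
        padding = [0.0] * (self.window - len(self.values))
        ema = 0.0 if self.ema is None else self.ema
        return np.array(padding + list(self.values) + [ema], dtype=np.float32)

loss_history = HistoryEncoder(window=4, decay=0.5)
for loss in [2.0, 1.0, 0.5]:
    loss_history.update(loss)
encoding = loss_history.encode()  # zero padding, the three losses, then the EMA
```

For per-weight histories the same idea would be vectorized over all weights of a layer rather than kept in one object per weight.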

In [0]:
# preparing data
# for SN
def get_boston_housing_data():
    pass

def get_mnist_data():
    pass

def get_cifar10_data():
    pass

def get_cifar100_data():
    pass

def get_fashion_mnist_data():
    pass

# for TN
class DataPool:
    def __init__(self):
        pass
    
    def add(self):
        pass

    def get(self):
        pass
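The `DataPool` stub above does not yet say what it stores. One possible reading, assumed here and not stated in the source, is a bounded pool of TN training examples from which random batches are drawn. A minimal sketch with the hypothetical name `BoundedDataPool`:

```python
import random
from collections import deque

class BoundedDataPool:
    """Hypothetical bounded pool: keeps the most recent `capacity` items
    and returns random batches of them."""

    def __init__(self, capacity=10000, seed=None):
        self.items = deque(maxlen=capacity)  # oldest items are dropped first
        self.rng = random.Random(seed)

    def add(self, item):
        self.items.append(item)

    def get(self, batch_size):
        # sample without replacement, never more than is stored
        k = min(batch_size, len(self.items))
        return self.rng.sample(list(self.items), k)

    def __len__(self):
        return len(self.items)

pool = BoundedDataPool(capacity=3, seed=0)
for step in range(5):
    pool.add({"step": step, "loss": 1.0 / (step + 1)})
batch = pool.get(2)  # two random items out of steps 2, 3 and 4
```

Bounding the pool keeps memory constant and biases training towards recent SN states; whether that is desirable for TN training is an open design question.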

To simplify TN optimization, the SN follows several restrictions:
* only fully connected layers
* only the sigmoid activation function
* the bias is emulated by a constant input of 1 appended to each layer's input
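Bias emulation works because appending a constant 1 to the input and adding one extra row to the weight matrix is the same affine map as an explicit bias; this is why `fc_layer` expects weights of shape `[n + 1, m]`, and it lets the TN treat bias weights like any other weight (their input is always 1). A small NumPy check of the equivalence, with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
bs, n, m = 4, 3, 2
x = rng.normal(size=(bs, n))
W = rng.normal(size=(n + 1, m))  # last row plays the role of the bias

# Bias emulated by a constant 1 appended to every input row
# (the NumPy counterpart of tf.pad(inputs, [[0, 0], [0, 1]], constant_values=1))
emulated = np.hstack([x, np.ones((bs, 1))]) @ W
# Ordinary affine layer with an explicit bias vector
explicit = x @ W[:-1] + W[-1]

print(np.allclose(emulated, explicit))  # True
```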

In [0]:
# creating networks
MAX_LAYER_SIZE = 128
MAX_NUM_LAYERS = 10

# net description is a list of num_units in hidden layers
def get_tn_description():
    return [128, 128, 128]

def get_random_net_description():
    net_description = []
    num_layers = randint(1, MAX_NUM_LAYERS)  # randint is imported in the setup cell
    for _ in range(num_layers):
        net_description.append(randint(1, MAX_LAYER_SIZE))
    return net_description

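def fc_layer(inputs, layer_size, weights=None):
    """
    Fully connected layer with sigmoid activation.
    inputs shape is [bs, n]
    weights shape is [n + 1, layer_size] (the last row acts as the bias)
    Returns (outputs, weights) so that newly created weights can be reused.
    """
    n = inputs.shape[1]
    if weights is None:
        weights = tf.Variable(
            initial_value=tf.initializers.GlorotNormal()(shape=[n + 1, layer_size]),
            trainable=True,
            dtype=tf.float32
        )
    else:
        w_n, w_size = weights.shape
        assert w_n == n + 1 and w_size == layer_size
    # appending 1 to input to emulate bias:
    biased_inputs = tf.pad(inputs, [[0, 0], [0, 1]], constant_values=1.0)
    outputs = tf.matmul(biased_inputs, weights)
    return tf.sigmoid(outputs), weights

def get_sn(inputs, net_description, weights_set=None):
    """
    Forward pass of the SN through its hidden layers.
    If weights_set is None, new weights are created for every layer.
    Returns (outputs, weights_set).
    """
    if weights_set is None:
        weights_set = [None] * len(net_description)
    net = inputs
    new_weights_set = []
    for layer_size, weights in zip(net_description, weights_set):
        net, weights = fc_layer(net, layer_size, weights)
        new_weights_set.append(weights)
    return net, new_weights_set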

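def tn(local_state, global_state):
    # TODO: the TN is not written yet; the draft below is incomplete
    # (tn_body, inputs_size and the per-layer heads are still undefined).
    # net = tn_body(local_state, global_state)
    # previous_layer_size = inputs_size
    # for layer_size in net_description:
    #     tn_head = ...
    raise NotImplementedError("teacher network is not implemented yet")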


def create_sn(output_size: int, layer_sizes: list):
    """
    Create a student network as a Keras model.
    :param output_size: desired number of output units
    :param layer_sizes: list of hidden layer sizes, e.g. the result of
     get_random_net_description()
    """
    model = tf.keras.Sequential()
    for layer_size in layer_sizes:
        model.add(tf.keras.layers.Dense(layer_size, activation='sigmoid'))
    # regression tasks were removed, so only the classification head is kept
    model.add(tf.keras.layers.Dense(output_size, activation='sigmoid'))
    return model

def encode_local_state():
    pass

def encode_global_state():
    pass

def create_tn():
    pass