# Multi-Input Models

A multi-input (or multimodal) model is a neural network with two or more independent inputs, each processed by its own stack of layers. At some point, the outputs of these branches are combined into a single tensor using a `keras` **merge** operation such as `keras.layers.add` or `keras.layers.concatenate`.
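To make the two merge styles concrete, the sketch below mimics their shape behaviour with plain NumPy arrays rather than Keras layers. The array names and sizes are made up for illustration. Element-wise addition is shown with two branches of identical shape, whereas concatenation only needs the branches to agree on every axis except the one being joined.

```python
import numpy as np

# Hypothetical outputs of three branches for a batch of 4 samples.
branch_a = np.ones((4, 32))
branch_b = np.ones((4, 32))
branch_c = np.ones((4, 16))

# Element-wise add: both branches are (4, 32), and the result keeps that shape.
added = branch_a + branch_b

# Concatenate: the feature axes are joined, so the widths add up (32 + 16 = 48).
joined = np.concatenate([branch_a, branch_c], axis=1)

print(added.shape)   # (4, 32)
print(joined.shape)  # (4, 48)
```

Concatenation keeps the information from each branch in separate positions, which is why it suits branches of different widths.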

# Example 1 - Question Answering Model
- Inputs are 
    - A text snippet such as a paragraph or passage containing some information.
    - A question posed as a sentence.
- These two inputs are independent, but their outputs must be combined. 
- In the simplest formulation of this problem, the output is a one-word answer obtained via a softmax layer over some predefined vocabulary.

In [2]:
from tensorflow.keras.models import Model
from tensorflow.keras import layers
from tensorflow.keras import Input

## Defining Constants for the QA Problem

In [3]:
# How many unique words in the information snippet?
text_vocabulary_size = 10000    

# How many unique words in the question?
question_vocabulary_size = 10000

# How many unique possible one-word answers?
answer_vocabulary_size = 500

## Text Input  

The text input will be a variable-length sequence of integers, where each integer encodes one of the `text_vocabulary_size` unique words in the vocabulary.
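As a rough illustration of this integer encoding, the sketch below maps words to ids with a toy vocabulary. The vocabulary, the `word_to_index` dictionary, and the `encode` helper are all hypothetical; a real pipeline would build the vocabulary from the training corpus and handle unknown words.

```python
# Hypothetical toy vocabulary. Ids start at 1 so that 0 stays free for other
# uses such as padding (an assumption, not something the text requires).
vocabulary = ["the", "cat", "sat", "on", "mat"]
word_to_index = {word: i for i, word in enumerate(vocabulary, start=1)}

def encode(sentence):
    """Map a whitespace-separated sentence to a list of integer word ids."""
    return [word_to_index[word] for word in sentence.lower().split()]

short = encode("the cat sat")
longer = encode("the cat sat on the mat")

print(short)   # [1, 2, 3]
print(longer)  # [1, 2, 3, 4, 1, 5]
```

The two encoded sentences have different lengths, which is why the input tensor is declared with a variable-length sequence axis.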

### Steps Involved
1. Instantiate a variable-length `text_input` tensor composed of integers. Each integer represents one of the unique words that can be used to make up the information snippet.
2. Convert each integer in the variable-length `text_input` tensor to a 64-dimensional vector using an `Embedding` layer, giving the `embedded_text` tensor.
3. Encode the resulting sequence of vectors into a single vector using an `LSTM`.

In [4]:
# Instantiating variable-length tensor of integers called `text`
text_input = Input(shape=(None, ), dtype='int32', name='text')

In [5]:
# Converting each integer in the variable-length text tensor to a 64-dimensional vector embedding
# This is done by passing the `text_input` tensor as an argument to an `Embedding` layer
# `Embedding` takes the vocabulary size first (input_dim), then the embedding size (output_dim)
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)

In [6]:
# Encode the sequence of embedding vectors into a single 32-dimensional vector via an LSTM
encoded_text = layers.LSTM(32)(embedded_text)

## Question Input 

We will follow the same steps for the question: a variable-length sequence of integers (`question_input`) is mapped to 32-dimensional vectors by an `Embedding` layer, and the resulting sequence is then encoded into a single vector by an `LSTM`.

In [7]:
# Instantiating variable-length question tensor
question_input = Input(shape=(None, ), dtype='int32', name='question')

In [8]:
# Transform to 32-dimensional vector embedding
# (vocabulary size first, then embedding size)
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)

In [9]:
# Encode the sequence of embeddings into a single 16-dimensional vector
encoded_question = layers.LSTM(16)(embedded_question)

## Concatenating the Encoded Vectors

In [10]:
concatenated = layers.concatenate([encoded_text, encoded_question], 
                                 axis=1)

## Answer

The answer is obtained by passing the concatenated text and question encodings to a `Dense` layer with a softmax activation, which predicts the probability that the answer to the `question` posed, given the `text`, is each of the `answer_vocabulary_size` possible words.
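The softmax step can be sketched in NumPy as follows. The four-word answer vocabulary and the raw scores are invented for illustration; in the model the scores would come from the dense layer and the vocabulary would have `answer_vocabulary_size` entries.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical 4-word answer vocabulary and made-up raw scores.
answer_words = ["yes", "no", "paris", "blue"]
scores = np.array([0.5, 0.1, 3.0, -1.0])

probabilities = softmax(scores)

# The predicted answer is the word with the highest probability.
predicted_word = answer_words[int(np.argmax(probabilities))]
print(predicted_word)
```

Because softmax preserves the ordering of the scores, the predicted word is always the one with the largest raw score.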

In [11]:
answer = layers.Dense(units=answer_vocabulary_size, activation='softmax')(concatenated)

## Building a Model
We have transformed the `text_input` and `question_input` tensors into an `answer` tensor by passing each through its own stack of layers, merging the two encodings with `concatenate`, and feeding the result to a `Dense` layer with a softmax activation.

`keras` will now be able to build a model that links these inputs to our desired output.

In [12]:
model = Model([text_input, question_input],    # Input tensors - 2
              answer)                          # Output tensor - 1

## Model Summary

In [13]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
text (InputLayer)               [(None, None)]       0                                            
__________________________________________________________________________________________________
question (InputLayer)           [(None, None)]       0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, None, 64)     640000      text[0][0]                       
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 32)     320000      question[0][0]                   
______________________________________________________________________________________________
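The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers, assuming embedding widths of 64 and 32 as described in the text and default layers with bias terms. The helper names are hypothetical, and the LSTM and dense figures are computed from these formulas rather than read from the listing above.

```python
def embedding_params(vocabulary_size, embedding_dim):
    # One embedding vector per word in the vocabulary.
    return vocabulary_size * embedding_dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # A weight matrix plus one bias per output unit.
    return input_dim * units + units

text_embedding = embedding_params(10000, 64)      # 640000
question_embedding = embedding_params(10000, 32)  # 320000
text_lstm = lstm_params(64, 32)                   # 12416
question_lstm = lstm_params(32, 16)               # 3136
answer_dense = dense_params(32 + 16, 500)         # 24500

total = (text_embedding + question_embedding
         + text_lstm + question_lstm + answer_dense)
print(total)  # 1000052
```

Under these assumptions, the two embedding layers account for almost all of the model's parameters.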

## Compile Model

In [14]:
model.compile(optimizer='rmsprop', 
             loss='categorical_crossentropy', 
             metrics=['acc'])
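For intuition about the loss chosen here, categorical cross-entropy averages `-sum(y_true * log(y_pred))` over the batch, where `y_true` is one-hot. The NumPy sketch below is a simplified stand-in for what the framework computes, with made-up predictions over a three-word vocabulary; the function name and the `eps` clipping value are assumptions.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(y_true * log(y_pred)) over a batch of one-hot targets."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples whose correct answers are word 1 and word 0 respectively.
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])

# Predictions that put most of the probability on the correct words ...
confident = np.array([[0.05, 0.90, 0.05],
                      [0.80, 0.10, 0.10]])
# ... and predictions that do not.
uncertain = np.array([[0.40, 0.30, 0.30],
                      [0.30, 0.40, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_uncertain = categorical_crossentropy(y_true, uncertain)
print(loss_confident < loss_uncertain)  # True
```

The loss only looks at the probability assigned to the correct word, so it falls as that probability approaches 1.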

## Training Model

The model can be fed either a list of `numpy` arrays or a dictionary that maps input names (defined with the `name` argument when instantiating each `Input`) to `numpy` arrays.

### Creating Dummy Data

In [15]:
import numpy as np

In [16]:
num_samples = 1000
max_length = 100

In [17]:
# Create text vectors - `num_samples` vectors, each of size `max_length`, i.e. the 
# largest allowable number of words in a `text` input, where each value in the vector
# is an integer between 1 and `text_vocabulary_size - 1` (the upper bound of `randint` is exclusive)
text = np.random.randint(1, text_vocabulary_size, 
                        size=(num_samples, max_length))

In [18]:
# Do the same for questions
questions = np.random.randint(1, question_vocabulary_size, 
                             size=(num_samples, max_length))
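The dummy arrays above all share one length, but real passages and questions vary. One common way to get a rectangular array is to pad shorter sequences with a reserved value such as 0, which fits with drawing word ids starting at 1. This step is an assumption, since the notebook does not show it, and `pad_to_length` is a hypothetical helper that pads on the right.

```python
import numpy as np

def pad_to_length(sequences, max_length, pad_value=0):
    """Right-pad (or truncate) integer sequences to a common length."""
    padded = np.full((len(sequences), max_length), pad_value, dtype="int32")
    for row, sequence in enumerate(sequences):
        trimmed = sequence[:max_length]
        padded[row, :len(trimmed)] = trimmed
    return padded

# Three made-up encoded sentences of different lengths.
batch = pad_to_length([[4, 9, 2], [7, 1], [3, 3, 3, 3, 3, 3]], max_length=5)
print(batch.shape)  # (3, 5)
```

Sequences longer than `max_length` are truncated here; whether to truncate or drop such samples is a design choice left to the reader.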

In [19]:
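# Do the same for answers - answers are one-hot encoded, not integers.
# Draw one random answer index per sample, then turn each index into a one-hot row.
answer_indices = np.random.randint(0, answer_vocabulary_size, size=num_samples)
answers = np.eye(answer_vocabulary_size)[answer_indices]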

### Fitting to Training Data

In [None]:
# Using a list of numpy tensors as input
model.fit([text, questions], answers, epochs=10, batch_size=128)

In [None]:
# Using a dictionary that maps each `Input` name to the corresponding numpy array
model.fit({'text': text, 'question': questions}, answers, 
         epochs=10, batch_size=128)