# Ideas of R-Net

[R-NET: Machine Reading Comprehension with Self-matching Networks](https://www.microsoft.com/en-us/research/publication/mrc/#!related_info)



# Motivation

### What problem are we facing?  
The whole pipeline of our system is:  
> documents -> information retrieval -> rerank and keep the top 10 documents -> **ANSWER EXTRACTION (for AMM part1 at least)** -> return  
  
E.g.  
* **Question**: What are the three operation modes for engine bleed air system?  
* **Answer**: Digital (Primary mode), Analog (Backup mode) and Pneumatic  
* **From document**:  
    36-11-00 - General Description  
    The engine air supply system has three levels of control:  
    • Digital (primary mode)  
    • Analog (backup mode)  
    • Pneumatic.  
    The ASCPC supplies the primary and backup modes. The high pressure fan air controller (HPFAC) and the pressure regulating and shutoff valve controller (PRSOVC) are set to let the engine air supply system operate
    without ASCPC control. This is the pneumatic mode.  
    For usual operation, all functions that have a relation to the primary mode and some functions for the backup mode operate at the same time. The primary mode and backup modes work together to supply the most efficient control for the engine air supply system.  
    If the primary mode fails, all functions for the backup and the pneumatic modes operate together. The backup and the pneumatic modes give a limited amount of control, protection, and indications for the engine air supply system.  
    If the primary and backup modes fail, the pneumatic mode sets the engine air supply system to the default condition. In the default condition, the engine air supply system supplies air in the pneumatic mode with no protection or indications.  
    The primary mode is described below. See engine air supply functional description section for more information.

### What is Reading Comprehension (type of question answering)?
Given a passage, find or choose the best answer to each question about that passage, as opposed to answering a general question without any context.

### What is R-Net and why R-Net?
> R-Net is an end-to-end neural networks model for reading comprehension style question answering, which aims to answer questions from a given passage.  

1. problem similarity: after retrieving and reranking documents from our search engine, we need to extract the target answer from the document(s) based on the user's question.
+ performance: R-Net is among the top-performing reading comprehension models on both the SQuAD and MS MARCO datasets.

### cons:
1. lack of domain-specific datasets -> so we use pseudo questions, generated by both a dependency parsing method (based on a language model and handcrafted rules) and a deep learning approach (specifically, a feature-rich seq2seq model with attention), as our training data

# Approach
1. BiRNN 
+ Gated Attention
+ Self-matching Mechanism
+ Pointer Network

![structure](https://user-images.githubusercontent.com/16559097/28744062-fbd2b67e-7476-11e7-8640-5f0e1491bc62.jpg)



# Prior Knowledge
## RNN ( Recurrent Neural Networks )
An RNN is a special kind of neural network, usually used to analyze sequential (or temporal) data. While standard feedforward neural networks (multi-layer perceptrons) have no concept of memory, RNNs incorporate the past context into the current input. Thus, the output at any time step t is a function of the past context and the current input.
![a typical RNN](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Recurrent_neural_network_unfold.svg/1024px-Recurrent_neural_network_unfold.svg.png)
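
The recurrence described above can be sketched in a few lines of plain Python. This is only an illustration: the helper names (`matvec`, `rnn_step`, `run_rnn`) and the toy weights are made up for this example, and a real model would learn its weights during training.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrence step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    from_input = matvec(W_x, x_t)
    from_memory = matvec(W_h, h_prev)
    return [math.tanh(i + m + bias)
            for i, m, bias in zip(from_input, from_memory, b)]

def run_rnn(inputs, W_x, W_h, b):
    """Feed a sequence through the RNN and return the hidden state at every step."""
    h = [0.0] * len(b)  # empty memory before the first token
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# Hand-picked toy weights (assumption: a trained model would learn these).
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_h = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states = run_rnn(inputs, W_x, W_h, b)
# The same input vector, fed without any history, for comparison.
alone = run_rnn([inputs[1]], W_x, W_h, b)[0]
```

Comparing `states[1]` with `alone` shows the "memory" at work: the input vector is the same in both cases, but the resulting hidden state differs because `states[1]` also depends on the token that came before it.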

## BiRNN (Bidirectional RNN)
A BiRNN is a special type of RNN. While standard RNNs capture context by "remembering" historical/past data, BiRNNs also traverse the sequence in the reverse direction, to capture context from the "future".
![structure of BiRNN](https://cdn-images-1.medium.com/max/800/1*6QnPUSv_t9BY9Fv8_aLb-Q.png)
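
As a rough sketch of the idea, a BiRNN can be built from two ordinary RNN passes, one over the sequence and one over its reverse, whose states are concatenated position by position. The recurrence below is deliberately simplified to scalar weights applied elementwise, and `run_rnn`, `run_birnn` and the default weights are names and values assumed for this example only.

```python
import math

def run_rnn(inputs, w_x, w_h):
    """Simplified elementwise recurrence: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = [0.0] * len(inputs[0])
    states = []
    for x_t in inputs:
        h = [math.tanh(w_x * x + w_h * prev) for x, prev in zip(x_t, h)]
        states.append(h)
    return states

def run_birnn(inputs, fwd=(1.0, 0.5), bwd=(0.8, 0.3)):
    """Concatenate a left-to-right pass with a right-to-left pass.

    Each direction has its own (toy, hand-picked) weights.
    """
    forward = run_rnn(inputs, *fwd)
    # Read the sequence backwards, then flip the states back into reading order.
    backward = run_rnn(inputs[::-1], *bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]  # list concatenation

inputs = [[1.0], [0.0], [-1.0]]
states = run_birnn(inputs)
```

At position 0 the forward half has only seen the first token, while the backward half has already read the whole sequence, so each position ends up with context from both sides.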

**Note:**  
In theory, RNNs can remember a history of any length. In practice, plain RNNs cannot, and gated variants such as LSTM/GRU cope much better. Even so, RNNs are usually much better at incorporating short-term context than long-term information ( > 20~30 steps apart ).

**Note:**  
R-Net mainly uses RNNs, more specifically GRUs (Gated Recurrent Units), to simulate the action of "reading" a passage of text.
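
For readers who have not met GRUs before, here is a minimal scalar sketch of one GRU step. It is not R-Net's implementation: the state is a single number instead of a vector, biases are omitted, and the parameter names in `params` are invented for this example. Note also that libraries differ on which side of the final interpolation the update gate `z` multiplies; the form below is one common convention.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU step on scalar input and state (biases omitted for brevity)."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev)  # update gate: how much to rewrite
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev)  # reset gate: how much memory to consult
    h_new = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_new  # blend old memory with the candidate

# Toy parameters (assumption: a trained model would learn these).
params = {"w_z": 1.0, "u_z": 0.5, "w_r": 1.0, "u_r": 0.5, "w_h": 1.0, "u_h": 0.5}

h = 0.0
history = []
for x in [1.0, 0.0, 0.0, -1.0]:
    h = gru_step(x, h, params)
    history.append(h)
```

The second entry of `history` stays positive even though its input is zero, because the update gate lets part of the earlier state pass through unchanged.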

## Attention
**Attention** in Neural Networks is modeled after the way humans focus on a particular subset of their sensory input, and tune out the rest.

![structure of one Attention Model](http://fr.opennmt.net/OpenNMT/img/global-attention-model.png)

It is employed in applications where we have a collection of data points, not all of which may be pertinent to the task at hand. In such cases, attention is computed as a softmax-weighted average of all points in the collection. The weights themselves are computed as some non-linear function of
1. the vector set
+ some context

In the following example, under the context "frisbee", the network will focus on the actual frisbee and objects dealing with it, and tune out the rest. Reference: [link](https://codeburst.io/understanding-r-net-microsofts-superhuman-reading-ai-23ff7ededd96)
![frisbee attention](https://cdn-images-1.medium.com/max/800/1*FYo13y5OZrRk8dMLEg_1dw.png)

**Note**  
R-Net utilizes Attention to highlight some part of the text, under the context of another.
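
The softmax-weighted average can be sketched as follows. One simplification is assumed here: each vector is scored against the context with a plain dot product, whereas R-Net uses a learned additive (tanh-based) scoring function. The names `attention_pool`, `vectors` and `context` are illustrative only.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(vectors, context):
    """Weigh every vector by its relevance to the context, then average."""
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, pooled

vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
context = [1.0, 0.0]
weights, pooled = attention_pool(vectors, context)
```

The vector that points the same way as the context receives the largest weight and the one pointing the opposite way receives the smallest, so `pooled` is dominated by the relevant entries.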

# Intuition of the R-Net
R-Net performs reading comprehension in a way similar to how we do:  
By "reading" (applying RNNs) the text multiple times (exactly 3 times), and "fine-tuning" (using Attention) the vector representations of the terms so that they improve with each iteration.  
  
The following is each pass of "reading" individually.

## First Reading: Cursory Glance -> Question & Passage Encoder

1. Convert all words (in both questions and passages) to their respective **word-level** and **character-level** embeddings
- Use a **BiRNN** to produce a new **context aware word representation** for every word (uQ and uP).

>Note:  
We start off with standard token (word, term or character) vectors, using word embeddings from [GloVe](https://nlp.stanford.edu/projects/glove/). However, humans usually understand the exact meaning of a word from the context of the terms surrounding it.  
Consider the example:  
"May happen" and "the fourth of May", where the meaning of "May" depends on the surrounding terms "happen" and "the fourth of". Also note that this context could come from the *forward* or the *backward* direction. So, we use a **BiRNN** over standard word embeddings, to come up with better vectors (word representations).

## Second Reading: Question-based Analysis -> Gated Attention-based Recurrent Networks for Question-Passage Matching
1. Use an attention layer (from Match-LSTM) to get an attention-pooling vector of the whole question.
+ Use a gated attention-based RNN to get the **question aware word representation** (vP).

>Note:  
In the second pass, the network tunes word representations from the passage in the context of the question itself.  
>>E.g. we have the highlighted location in the passage:  
*“…had a talent for **making** home craft tools, mechanical appliances, and the ability to memorize Serbian epic poems. Đuka had never received a formal education…”*  

>Given **"making"**, if we were to apply **Attention** over the question tokens, we would probably highlight:  
*“What were Tesla’s mother’s special **abilities**?”*  
The network adjusts the vector for **"making"** to bring it closer to **"abilities"** in a semantic sense.  

R-Net is forming links between the needs of the question and the relevant parts of the passage. This mechanism is called **"Gated Attention-based RNNs"**.
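
A minimal sketch of the gating step for a single passage word is shown below. It rests on several assumptions: dot-product scoring stands in for the learned attention, the attention ignores the previous RNN state, and the final RNN that consumes the gated input is left out. `gated_input`, `question_context` and the all-zero `W_g` are names and values chosen for this example.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def question_context(passage_vec, question_vecs):
    """Attention-pool the question from the point of view of one passage word."""
    weights = softmax([dot(q, passage_vec) for q in question_vecs])
    dim = len(question_vecs[0])
    return [sum(w * q[i] for w, q in zip(weights, question_vecs)) for i in range(dim)]

def gated_input(passage_vec, question_vecs, W_g):
    """Concatenate word and question context, then scale each entry by a gate in (0, 1)."""
    c_t = question_context(passage_vec, question_vecs)
    joined = passage_vec + c_t  # list concatenation: [word, question context]
    gate = [sigmoid(dot(row, joined)) for row in W_g]
    return [g * x for g, x in zip(gate, joined)]

passage_vec = [1.0, 0.0]
question_vecs = [[1.0, 0.0], [0.0, 1.0]]
# With all-zero gate weights every gate is exactly 0.5 (a learned W_g would vary per entry).
W_g = [[0.0] * 4 for _ in range(4)]
rnn_input = gated_input(passage_vec, question_vecs, W_g)
```

In a trained model the gate would learn to let through the parts of the word and question context that matter for the question and to damp the rest before the RNN reads them.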

## Third Reading: Self-aware, Complete Passage Understanding -> Self-Matching Attention

>Note:  
1. In the 1st pass, we understood tokens in the context of their nearby surrounding terms.  
+ In the 2nd pass, we improved our understanding with respect to the question at hand. (However, the **question aware word representation** is poor at capturing long-term dependencies.)

1. **[KEY in R-Net]** Use Attention (specifically, self-matching attention) to compare **far-off** terms in the same passage.

>Note:  
To pinpoint those sections that actually help in answering the question, we need a long-term contextual view of the entire passage, rather than a short-term view of the terms immediately surrounding each word.  
>>E.g. we have:
Tesla’s mother, Đuka Tesla (née Mandić), whose father was also an Orthodox priest, had a talent for making home craft tools, mechanical appliances, and the **ability** to memorize Serbian epic poems. Đuka had never received a formal education. Nikola credited his eidetic memory and creative **abilities** to his mother’s genetics and influence.  
1. Both terms refer to abilities possessed by Tesla's mother.
+ The first occurs in text that describes those abilities - what we want.
+ The second links them to Tesla's talents - what we don't want.  

>While applying (usual) Attention, we usually use some data (like a passage term) to weigh a set of vectors (like the question terms). With Self-Matching Attention, we use the **current passage term to weigh tokens from the passage itself**, helping us differentiate the current term from similar-meaning terms in the rest of the passage. A **BiRNN** is used in this phase of reading to reinforce the effect.
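
Self-matching can be sketched by letting every passage vector attend over the whole passage, itself included. As before, this is a simplified illustration: dot-product scoring replaces the learned scoring function, the gate and the BiRNN that follow are omitted, and `self_match` and the toy `passage` are names assumed for this example.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_match(passage_vecs):
    """For each passage term, attention-pool the entire passage and append the result."""
    dim = len(passage_vecs[0])
    all_weights, enriched = [], []
    for current in passage_vecs:
        # The current term plays the role of the context; the passage is the vector set.
        weights = softmax([dot(other, current) for other in passage_vecs])
        pooled = [sum(w * p[i] for w, p in zip(weights, passage_vecs)) for i in range(dim)]
        all_weights.append(weights)
        enriched.append(current + pooled)  # [term, passage-wide context]
    return all_weights, enriched

# Positions 0 and 2 hold the same toy vector, like two far-apart mentions of one idea.
passage = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
all_weights, enriched = self_match(passage)
```

Position 0 gives position 2 the same weight as itself, no matter how far apart they sit, which is the kind of long-range link a recurrent pass alone tends to lose.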

## Last: Output Layer for Answer Prediction -> Pointer Networks
1. The last layer, which is **Pointer Networks-like**, predicts the start position and end position of the answer.
+ How we get the **starting index**: first, compute an Attention vector over the question text; use it as the starting context to compute a weight for each term in the passage; the term with the highest weight is considered the starting point of the answer.
+ How we get the **ending index**: the above step also returns a "starting point" context; we compute the weight of each passage term again, using the "starting point" context instead of the question context; the highest-weight term is considered the ending point of the answer.
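
The two pointing steps can be sketched as below. Several things are assumed for the sake of a small example: the question has already been pooled into `question_context`, scoring is a plain dot product, and the recurrent update that turns the start attention into the "starting point" context is reduced to a single `tanh` layer with a hand-picked matrix `W_state`. None of these names come from the paper.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def point(passage_vecs, context):
    """Weigh every passage term against the context; return the best index and the pooled vector."""
    weights = softmax([dot(p, context) for p in passage_vecs])
    dim = len(passage_vecs[0])
    pooled = [sum(w * p[i] for w, p in zip(weights, passage_vecs)) for i in range(dim)]
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, pooled

def predict_span(passage_vecs, question_context, W_state):
    # Step 1: point at the start using the question context.
    start, start_pooled = point(passage_vecs, question_context)
    # Step 2: derive the "starting point" context (toy stand-in for the RNN update).
    start_context = [math.tanh(dot(row, start_pooled)) for row in W_state]
    # Step 3: point again, this time using the "starting point" context.
    end, _ = point(passage_vecs, start_context)
    return start, end

# Toy passage of four term vectors and hand-picked weights (assumptions for illustration).
passage = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, 0.5]]
question_context = [1.0, 0.0, 0.0]
W_state = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

start, end = predict_span(passage, question_context, W_state)
```

With these toy values the start pointer lands on the term most similar to the question context, and the end pointer lands on the following term because the "starting point" context shifts the focus.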


# Conclusion
1. R-Net uses both **WORD** and **CHAR** embeddings for question and passage encodings. 
- Then it applies **two different gated-attention RNNs** to learn a representation. In effect, it first reads the passage with the question in mind, using question-passage attention to get a first understanding of the passage (the question aware representation). It then reads the passage again with this "understanding", using passage-passage attention to update its understanding of each part of the passage. 
- Finally it feeds what it has learned into the **Pointer Networks** to predict the start and end positions of the answer.