# Machine learning models for Named Entity Recognition
Summary of the paper: <https://arxiv.org/pdf/1812.09449.pdf>

The goal is to summarize the machine learning algorithms used so far to tackle NER tasks. The models range from early approaches to state-of-the-art ones such as Generative Adversarial Networks, Transformers, and reinforcement learning. We then propose an implementation of these models and compare their performance on an NER dataset taken from Kaggle.

The idea behind this endeavor is to go in depth into the architecture of NER models, so that we can make informed decisions about which model to choose for a given dataset.

### What is Named Entity Recognition?

Named Entity Recognition (NER) consists of identifying the words in a sentence that refer to named entities and associating each of them with a semantic type. Typical entity types are person, organization, and location. NER plays an essential role in a variety of natural language processing applications such as information retrieval, automatic text summarization, question answering, machine translation, and knowledge base construction.

### N.E.R techniques 

Named entities are generally classified into two categories: `generic NEs` (e.g., person and location) and `domain-specific NEs` (e.g., proteins, enzymes, and genes). The focus here is on generic English NEs.

Four main streams of `techniques` are applied in NER:

- Rule-based approaches: they rely on hand-crafted rules and therefore do not need annotated data;

- Unsupervised learning approaches: they infer named entities without any hand-labeled training samples;

- Feature-based supervised learning approaches: they apply supervised learning algorithms to carefully engineered features;

- Deep-learning-based approaches: they automatically discover the representations needed for classification.

### The Structure of the Paper. 

1. Background: definitions, resources, evaluation metrics, and traditional approaches to NER;

2. Deep learning techniques for NER;

3. Recently applied deep learning techniques;

4. Implementation of the models;

5. Challenges and misconceptions.


### 1 - Background

Before delving into how deep learning is used in the NER field, we first explain the NER concept. We then introduce the widely used `NER datasets and tools`. Next, we detail the evaluation metrics and summarize the traditional approaches to NER.

An illustration of the `Named Entity Recognition task`:

<img src="images/N.E.R" alt="Illustration of the NER task" height="40" width="400">

#### 1.1 NER
See the definition of NER above for more detail.

#### 1.2 NER Resources: Datasets and Tools

This section presents the widely used datasets and off-the-shelf tools for English NER.

Table 1 lists some of the annotated corpora used for training NER models.

<img src="images/corpus" alt="Table 1: annotated corpora for NER" height="50" width="600">


Table 2 lists some of the off-the-shelf tools used to annotate corpora.

<img src="images/tools" alt="Table 2: off-the-shelf NER tools" height="50" width="300">

#### 1.3 NER Evaluation Metrics 

NER models are usually evaluated by comparing their outputs against human annotations. The comparison can be quantified by either `exact-match` or `relaxed-match` evaluation.

##### 1.3.1 Exact-match Evaluation

NER involves identifying both entity boundaries and entity types. In `Exact-match Evaluation`, a named entity is considered correctly recognized only if both its boundaries and its type match the ground truth. The performance metrics `Precision, Recall, F-score` are computed from the numbers of `True Positives (TP), False Positives (FP), False Negatives (FN)`.

- `TP`: entities that are recognized by the NER system and match the ground truth.
- `FP`: entities that are recognized by the NER system but do not match the ground truth.
- `FN`: entities that are annotated in the ground truth but not recognized by the NER system.


**Precision**: among the outputs recognized as entities, the proportion that matches the ground truth. It measures the ability to return only correct entities: $\text{Precision} = \frac{TP}{TP + FP}$

**Recall**: the proportion of ground-truth entities that are recognized. It measures the ability to recognize all entities in the corpus: $\text{Recall} = \frac{TP}{TP + FN}$

**F-score**: the harmonic mean of precision and recall: $F = 2\times\frac{\text{Precision}\times\text{Recall}}{\text{Precision} + \text{Recall}}$
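To make the three formulas concrete, here is a minimal Python sketch of exact-match scoring. It assumes, for illustration only, that entities are represented as hypothetical `Entity(start, end, label)` tuples over token indices (end exclusive), so that comparing sets of tuples checks boundaries and type at the same time.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """Hypothetical entity format: tokens [start, end) labeled with a type."""
    start: int
    end: int
    label: str


def exact_match_scores(predicted, gold):
    """Return (precision, recall, f_score) under exact-match evaluation.

    A prediction is a true positive only if its boundaries AND its type
    are identical to those of a ground-truth entity.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # recognized and matching the ground truth
    fp = len(pred - ref)  # recognized but not matching the ground truth
    fn = len(ref - pred)  # annotated in the ground truth but not recognized
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For example, with gold entities `Entity(0, 2, "PER")` and `Entity(5, 6, "LOC")` and predictions `Entity(0, 2, "PER")` and `Entity(5, 6, "ORG")`, the mistyped second entity counts as both an FP and an FN, giving a precision, recall, and F-score of 0.5 each.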


##### 1.3.2 Relaxed-match Evaluation

A well-known definition of `Relaxed-match evaluation` considers an entity correct if its type matches that of the ground truth, regardless of its boundaries, as long as the predicted boundaries overlap with the ground-truth boundaries.
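A minimal sketch of this overlap rule, again assuming a hypothetical `Entity(start, end, label)` representation with end-exclusive token indices. For brevity it does not enforce a one-to-one alignment between predicted and gold entities, which a complete scorer would need.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """Hypothetical entity format: tokens [start, end) labeled with a type."""
    start: int
    end: int
    label: str


def overlaps(a, b):
    """True if the half-open token spans [start, end) share at least one token."""
    return a.start < b.end and b.start < a.end


def relaxed_type_matches(predicted, gold):
    """Count predictions whose type matches some overlapping gold entity.

    Boundaries may differ, as long as the two spans overlap.
    """
    return sum(
        any(p.label == g.label and overlaps(p, g) for g in gold)
        for p in predicted
    )
```

Predicting only `Obama` (`Entity(1, 2, "PER")`) for the gold span `Barack Obama` (`Entity(0, 2, "PER")`) fails exact match but is credited here, because the types agree and the spans share a token.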


#### 1.4 Traditional Approaches to NER

Traditional approaches to NER include `rule-based approaches`, `unsupervised learning approaches`, and `feature-based supervised learning approaches`.

- **Rule-based approaches**:
They rely mainly on hand-crafted rules. Examples include LaSIE, NetOwl, Facile, SAR, FASTUS, and the LTG systems. Rule-based systems work very well when the lexicon is exhaustive; with incomplete dictionaries, however, they tend to achieve high precision but low recall.

- **Unsupervised Learning Approaches**:
A typical unsupervised approach is clustering-based NER. The key idea is that `lexical patterns, lexical resources and statistics` computed on a large corpus can be used to infer mentions of named entities.


- **Feature-based Supervised Learning Approaches**:
Given annotated data samples, features are designed to represent each training example. Machine learning algorithms are then used to recognize similar patterns in unseen data.
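A minimal sketch of the rule-based idea as a dictionary (gazetteer) lookup with greedy longest match. The lexicon `LEXICON` and the function `lexicon_ner` are hypothetical; the systems named above combine many more hand-crafted rules. The sketch illustrates the precision/recall behavior described for rule-based systems: what it finds is correct, but anything missing from the dictionary is missed.

```python
# Hypothetical toy lexicon; real systems use large, curated dictionaries and rules.
LEXICON = {
    ("Barack", "Obama"): "PER",
    ("Paris",): "LOC",
    ("United", "Nations"): "ORG",
}
MAX_LEN = max(len(phrase) for phrase in LEXICON)


def lexicon_ner(tokens):
    """Tag tokens by greedy longest-match lookup in LEXICON.

    Returns (start, end, label) triples with end exclusive.
    """
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at i first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = LEXICON.get(tuple(tokens[i:i + n]))
            if label:
                entities.append((i, i + n, label))
                i += n
                break
        else:
            i += 1  # no rule fired at this position; move to the next token
    return entities
```

On `Barack Obama met Angela Merkel in Paris`, the tagger returns `[(0, 2, 'PER'), (6, 7, 'LOC')]`: both hits are correct, but `Angela Merkel` is missed because it is not in the lexicon.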

### 2 - Deep Learning Techniques for NER

#### 2.1 Why Deep Learning for NER?
Compared to feature-based approaches, deep learning is beneficial for discovering hidden features automatically. Its key advantage is its capacity for representation learning and semantic composition, through both vector representations and neural processing. There are three core strengths of applying deep learning techniques to NER. `First`, NER benefits from `non-linear transformations`, which generate non-linear mappings from inputs to outputs. Compared to linear models (e.g., log-linear HMMs and linear-chain CRFs), deep learning models can learn complex and intricate features from data via non-linear activation functions. `Second`, deep learning saves significant effort in designing NER features, whereas feature-based models require a considerable amount of engineering skill and domain expertise. `Third`, deep NER models can be trained end-to-end by gradient descent, which makes it practical to build complex NER systems.

**The Taxonomy of DL-based NER**

<img src="Images/taxonomy.png" alt="Taxonomy of DL-based NER" width="300" height="40">

- `Distributed representation for input`:
The distributed representation for input considers word- and character-level embeddings, as well as the incorporation of additional features such as POS tags and gazetteers, which have been effective in feature-based approaches.

- `Context encoder`:
The context encoder captures the context dependencies between tokens of the input sequence, using for instance CNNs or RNNs.

- `Tag decoder`:
The tag decoder predicts a tag for each token from its context-dependent representation.
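The three components can be illustrated with a toy pipeline in plain Python. Everything here is an assumption for illustration: the 2-dimensional embeddings, the weights `W` and `B`, and the neighbor-mixing encoder, which merely stands in for a real CNN or RNN layer. In a real model, all parameters would be learned from annotated data.

```python
import math

# Hypothetical toy parameters; a trained model would learn all of these.
TAGS = ["O", "PER", "LOC"]
EMBEDDINGS = {"obama": [0.9, 0.1], "visited": [0.0, 0.0], "paris": [0.1, 0.9]}
UNK = [0.0, 0.0]                              # vector for out-of-vocabulary words
W = [[-2.0, -2.0], [3.0, -1.0], [-1.0, 3.0]]  # decoder weights, one row per tag
B = [2.0, 0.0, 0.0]                           # decoder bias, one per tag


def represent(tokens):
    """Distributed representation: look up a dense vector for each token."""
    return [EMBEDDINGS.get(t.lower(), UNK) for t in tokens]


def encode(vectors, mix=0.25):
    """Context encoder: add a fraction of each neighbor's vector to the token's own.

    A stand-in for a CNN/RNN layer; it only looks one token left and right.
    """
    if not vectors:
        return []
    dim = len(vectors[0])
    zero = [0.0] * dim
    out = []
    for i, v in enumerate(vectors):
        left = vectors[i - 1] if i > 0 else zero
        right = vectors[i + 1] if i + 1 < len(vectors) else zero
        out.append([v[d] + mix * (left[d] + right[d]) for d in range(dim)])
    return out


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(contexts):
    """Tag decoder (linear layer + softmax): most probable tag for each token."""
    tags, probs = [], []
    for h in contexts:
        scores = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W, B)]
        p = softmax(scores)
        probs.append(p)
        tags.append(TAGS[p.index(max(p))])
    return tags, probs
```

With these hand-picked weights, `decode(encode(represent(["Obama", "visited", "Paris"])))` tags the sentence as `PER O LOC`. Swapping in an RNN for `encode` or a CRF for `decode` would keep the same three-stage structure.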

#### 2.2 Distributed Representations for Input

Unlike one-hot encoding, which represents words by sparse vectors, a distributed representation maps each word to a low-dimensional, real-valued, dense vector. Distributed representations capture the semantic and syntactic properties of words from text. There are `three types of distributed representations` used in NER tasks: `word-level`, `character-level`, and `hybrid` representations.
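A small sketch contrasting the two representations, using a hypothetical three-word vocabulary and hand-picked 3-dimensional vectors in place of learned embeddings. Distinct one-hot vectors are always orthogonal, so their cosine similarity is 0 and they carry no notion of similarity; dense vectors can place related words (here, two cities) closer together.

```python
import math

VOCAB = ["paris", "london", "banana"]  # hypothetical toy vocabulary


def one_hot(word):
    """Sparse representation: a |V|-dimensional vector with a single 1."""
    vec = [0.0] * len(VOCAB)
    vec[VOCAB.index(word)] = 1.0
    return vec


# Hypothetical dense vectors; real embeddings are learned from large corpora.
DENSE = {
    "paris": [0.8, 0.6, 0.1],
    "london": [0.7, 0.7, 0.0],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

Here `cosine(one_hot("paris"), one_hot("london"))` is 0, while `cosine(DENSE["paris"], DENSE["london"])` is much larger than `cosine(DENSE["paris"], DENSE["banana"])`.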

###### `2.2.1 Word-level Representation` 




###### `2.2.2 Character-level Representation` 



###### `2.2.3 Hybrid Representation`



#### 2.3 Context Encoder Architectures

###### `2.3.1 Convolutional Neural Networks`

###### `2.3.2 Recurrent Neural Networks`

###### `2.3.3 Recursive Neural Networks`

###### `2.3.4 Neural Language Models`

###### `2.3.5 Deep Transformer`

#### 2.4 Tag Decoder Architectures 

###### `2.4.1 Multi-layer Perceptron + Softmax`

###### `2.4.2 Conditional Random Fields`

###### `2.4.3 Recurrent Neural Networks`

###### `2.4.4 Pointer Networks`

#### 2.5 Summary of DL-based NER 

### 3 - Recent Deep Learning Models for NER

##### `3.1 Deep Multi-task Learning for NER`

##### `3.2 Deep Transfer Learning for NER`

##### `3.3 Deep Active Learning for NER`

##### `3.4 Deep Reinforcement Learning for NER`

##### `3.5 Deep Adversarial Learning for NER`

##### `3.6 Neural Attention for NER`