# Named Entity Recognition

<img src = "http://174.138.108.232/wp-content/uploads/2017/08/entities.png">

### What's that?
<hr style="height:2px;border:none;color:#333;background-color:#333;" />

<p> <font style="FONT-SIZE:15px; COLOR:#000000; LINE-HEIGHT:25px; FONT-FAMILY:Arial,Helvetica,sans-serif">
Named Entity Recognition (NER) is the process of extracting relevant terms (words or phrases) from text, whether sentences, paragraphs, or whole articles. Typically, NER scans every word in a text and assigns it the most relevant tag from a predefined set of tags. Done by hand, this is tedious: it is hard to keep track of how many times each tag has been assigned or where in the text a tag appears most often, let alone to compare texts by the frequency of their tags. For this reason, large-scale NER relies on pre-trained models. <br> <br>
    &emsp;&emsp; For example, consider the following sentence:<br><br>
<mark>The European Union, with U.S. backing, has threatened to refer Iran to the U.N. Security Council, which could impose sanctions if it finds Tehran has violated the Nuclear Non-Proliferation treaty.</mark> <br><br>
    
When this sentence is run through an NER algorithm, the most likely output is the following: <br> <br>
        Organization: <mark>European Union</mark>, <mark>U.N. Security Council</mark><br>
        Geopolitical Entity: <mark>U.S.</mark>, <mark>Iran</mark>, <mark>Tehran</mark> <br>
        Artifact: <mark>Nuclear Non-Proliferation treaty</mark>
<br><br>
<a href="https://en.wikipedia.org/wiki/Named-entity_recognition">This</a> Wikipedia article on NER summarizes the basics and lists some application areas.
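
To make the bookkeeping described above concrete, here is a minimal Python sketch that stores the example's output as (entity, tag) pairs, counts how often each tag occurs, and groups the entities by tag. The pairs are copied from the example sentence; the variable names are illustrative assumptions, not part of any NER library.

```python
from collections import Counter, defaultdict

# (entity, tag) pairs taken from the example sentence above.
tagged_entities = [
    ("European Union", "Organization"),
    ("U.S.", "Geopolitical Entity"),
    ("Iran", "Geopolitical Entity"),
    ("U.N. Security Council", "Organization"),
    ("Tehran", "Geopolitical Entity"),
    ("Nuclear Non-Proliferation treaty", "Artifact"),
]

# How many times each tag was assigned.
tag_counts = Counter(tag for _, tag in tagged_entities)

# Which entities fall under each tag, in order of appearance.
entities_by_tag = defaultdict(list)
for entity, tag in tagged_entities:
    entities_by_tag[tag].append(entity)

for tag, count in tag_counts.most_common():
    print(f"{tag} ({count}): {', '.join(entities_by_tag[tag])}")
```

Once tags are stored this way, the frequency questions that are hard to answer by hand reduce to simple lookups.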

<hr style="height:4px;border:none;color:#333;background-color:#333;" />

### Where is it used?
<hr style="height:2px;border:none;color:#333;background-color:#333;" />

<p> <font style="FONT-SIZE:15px; COLOR:#000000; LINE-HEIGHT:25px; FONT-FAMILY:Arial,Helvetica,sans-serif">
Some applications of NER, as listed in <a href="https://towardsdatascience.com/named-entity-recognition-applications-and-use-cases-acdbf57d595e">this article</a>, are: <br><br>
    1. Classifying content for news providers <br>
    2. Efficient search algorithms<br>
    3. Powering content recommendations<br>
    4. Customer support<br>
    5. Research papers <br> <br>
    
   There are many more applications. NER can be used very effectively wherever information has to be selectively extracted from text.
   <hr style="height:4px;border:none;color:#333;background-color:#333;" />

### How does it work?
<hr style="height:2px;border:none;color:#333;background-color:#333;" />

<p> <font style="FONT-SIZE:15px; COLOR:#000000; LINE-HEIGHT:25px; FONT-FAMILY:Arial,Helvetica,sans-serif">
The algorithms that power an NER system are classifiers. A classifier uses statistical or machine learning methods to compute a score (usually a probability) for each tag in a predefined set of tags. Each word is assigned the tag with the highest score.
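
The scoring step can be sketched in a few lines of Python. This is a minimal illustration, assuming the classifier produces one raw score per tag and that a softmax turns those scores into probabilities; the tag pool and the scores below are hypothetical values chosen for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a single word, one score per tag.
tag_scores = {"org": 0.5, "gpe": 2.1, "art": -1.0, "o": 0.3}

tags = list(tag_scores)
probabilities = softmax([tag_scores[t] for t in tags])

# The word receives the tag with the highest probability.
best_index = max(range(len(tags)), key=lambda i: probabilities[i])
predicted_tag = tags[best_index]
print(predicted_tag, round(probabilities[best_index], 3))
```

Because softmax preserves the ordering of the scores, picking the highest probability is the same as picking the highest raw score.
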
<hr style="height:4px;border:none;color:#333;background-color:#333;" />

# About this work
<hr style="height:2px;border:none;color:#333;background-color:#333;" />

<p> <font style="FONT-SIZE:15px; COLOR:#000000; LINE-HEIGHT:25px; FONT-FAMILY:Arial,Helvetica,sans-serif">
    This work analyzes a dataset of about <b>50,000</b> sentences to build a Named Entity Recognition product. That is roughly 10 times the length of a mid-size book (<a href="https://www.quora.com/How-many-sentences-are-in-a-mid-length-book">ref</a>). The aim is to assign a tag to every word. The dataset comes from <a href="https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus">this</a> Kaggle dataset. For our feature engineering experiments we use only the first column, which holds the individual words, and the third column, which holds the corresponding tags.<br><br>
    
Even though many deep learning models and standard libraries like <a href="https://spacy.io/">spaCy</a> are available, we have chosen a <a href="https://en.wikipedia.org/wiki/Logistic_regression"><b>Logistic Regression</b></a> classifier because our primary intention is to study how a small change in the selection of features affects classification accuracy. All the code and metric files can be found in <a href="https://github.com/dkedar7/Named_Entity_Recognition">this</a> GitHub repository. All code is written in Python using the <a href="http://www.numpy.org/">NumPy</a> module.

We are going to consider the following tags:<br>
<table style="width:100%" align="left">
  <tr>
    <th>Tag</th>
    <th>Description</th> 
  </tr>
    
  <tr>
    <td>art</td>
    <td>Artifact</td> 
  </tr>

  <tr>
    <td>eve</td>
    <td>Event</td> 
  </tr>
  
  <tr>
    <td>geo</td>
    <td>Geographical Entity</td> 
  </tr>
  
  <tr>
    <td>gpe</td>
    <td>Geopolitical Entity</td> 
  </tr>
  
  <tr>
    <td>nat</td>
    <td>Natural Phenomenon</td>
  </tr>
  
  <tr>
    <td>org</td>
    <td>Organization</td> 
  </tr>
  
  <tr>
    <td>per</td>
    <td>Person</td> 
  </tr>
  
  <tr>
    <td>tim</td>
    <td>Time indicator</td> 
  </tr>
</table>
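
The results tables in this write-up report labels such as `b-geo`, `i-geo` and `o` rather than the bare tags. Assuming these follow the common BIO convention (`b-` marks the first word of an entity, `i-` a continuation word, and `o` a word outside any entity), the sketch below builds the full label set from the eight tags and merges a labelled word sequence back into entities. The function name and the sample labels are hypothetical and written by hand for illustration; they are not taken from the dataset.

```python
BASE_TAGS = ["art", "eve", "geo", "gpe", "nat", "org", "per", "tim"]

# Two prefixed labels per tag plus "o" for words outside any entity.
LABELS = [f"{prefix}-{tag}" for prefix in ("b", "i") for tag in BASE_TAGS] + ["o"]

def group_entities(words, labels):
    """Merge b-/i- labelled words into (entity text, tag) pairs."""
    entities, current_words, current_tag = [], [], None
    for word, label in zip(words, labels):
        prefix, _, tag = label.partition("-")
        if prefix == "i" and tag == current_tag:
            current_words.append(word)  # continue the open entity
            continue
        if current_words:  # close the entity that was open
            entities.append((" ".join(current_words), current_tag))
            current_words, current_tag = [], None
        if prefix in ("b", "i"):  # start a new entity
            current_words, current_tag = [word], tag
    if current_words:  # close an entity that runs to the end
        entities.append((" ".join(current_words), current_tag))
    return entities

# Hand-labelled illustration, not real dataset rows.
words = ["The", "European", "Union", "threatened", "Iran"]
labels = ["o", "b-org", "i-org", "o", "b-gpe"]
print(len(LABELS))
print(group_entities(words, labels))
```

Under this assumption the classifier has 17 classes to choose from, which matches the number of rows in each results table.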

### Model 1
<hr style="height:2px;border:none;color:#333;background-color:#333;" />

#### Data Matrices

<p> <font style="FONT-SIZE:15px; COLOR:#000000; LINE-HEIGHT:25px; FONT-FAMILY:Arial,Helvetica,sans-serif">
Model 1 is the simplest model considered. For each datapoint, the only feature is the word itself. In other words, the feature matrix is a single-column matrix of dimension N x 1, where N is the total number of words in the entire dataset.

Feature matrix X: <br><br>
$\begin{pmatrix}word \hspace{0.1cm} 1\\ word\hspace{0.1cm}2\\ word\hspace{0.1cm}3\\ ...\\ word\hspace{0.1cm}N\\\end{pmatrix}$

Label matrix Y: <br> <br>
    $\begin{pmatrix}tag \hspace{0.1cm} 1\\ tag\hspace{0.1cm}2\\ tag\hspace{0.1cm}3\\ ...\\ tag\hspace{0.1cm}N\\\end{pmatrix}$
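
A logistic regression classifier needs numeric input, so the single word column has to be encoded before training. The write-up does not say how this is done; the sketch below assumes one-hot encoding over the vocabulary and integer class labels, built with NumPy. The six words and their tags are toy values chosen for illustration.

```python
import numpy as np

# Toy stand-ins for the word column X and the tag column Y.
words = np.array(["Iran", "has", "violated", "the", "treaty", "Iran"])
tags = np.array(["b-gpe", "o", "o", "o", "o", "b-gpe"])

# Map every distinct word to a column index.
vocabulary, word_ids = np.unique(words, return_inverse=True)

# One row per word, one column per vocabulary entry, a single 1 per row.
X = np.zeros((len(words), len(vocabulary)), dtype=np.int8)
X[np.arange(len(words)), word_ids] = 1

# Map every distinct tag to an integer class label.
classes, Y = np.unique(tags, return_inverse=True)

print(X.shape, Y.shape)
```

With this encoding, two occurrences of the same word always produce identical rows, so Model 1 can only ever assign one tag per distinct word.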

#### Results

<table style="width:80%">
  <tr>
    <th>Tag</th>
    <th>Precision</th> 
    <th>Recall</th>
    <th>F1 Score</th>
    <th>Number of datapoints with this tag</th>
  </tr>
    
  <tr>
    <td>b-art</td>
    <td>0.33333</td> 
    <td>0.04651</td>
    <td>0.08163</td>
    <td>43</td>
  </tr>
    
  <tr>
    <td>b-eve</td>
    <td>0.66667</td> 
    <td>0.29630</td>
    <td>0.41026</td>
    <td>27</td>
  </tr>
    
  <tr>
    <td>b-geo</td>
    <td>0.79778</td> 
    <td>0.76673</td>
    <td>0.78195</td>
    <td>3751</td>
  </tr>
  
  <tr>
    <td>b-gpe</td>
    <td>0.92446</td> 
    <td>0.95269</td>
    <td>0.93837</td>
    <td>1670</td>
  </tr>
  
   <tr>
    <td>b-nat</td>
    <td>1.00000</td> 
    <td>0.20000</td>
    <td>0.33333</td>
    <td>15</td>
  </tr>
  
   <tr>
    <td>b-org</td>
    <td>0.72855</td> 
    <td>0.38951</td>
    <td>0.50763</td>
    <td>2136</td>
  </tr>
  
   <tr>
    <td>b-per</td>
    <td>0.76718</td> 
    <td>0.36022</td>
    <td>0.49024</td>
    <td>1674</td>
  </tr>
  
   <tr>
    <td>b-tim</td>
    <td>0.88656</td> 
    <td>0.69536</td>
    <td>0.77940</td>
    <td>2068</td>
  </tr>
  
   <tr>
    <td>i-art</td>
    <td>0.00000</td> 
    <td>0.00000</td>
    <td>0.00000</td>
    <td>22</td>
  </tr>
  
   <tr>
    <td>i-eve</td>
    <td>0.00000</td> 
    <td>0.00000</td>
    <td>0.00000</td>
    <td>25</td>
  </tr>
  
   <tr>
    <td>i-geo</td>
    <td>0.66972</td> 
    <td>0.53441</td>
    <td>0.59446</td>
    <td>683</td>
  </tr>
  
   <tr>
    <td>i-gpe</td>
    <td>1.00000</td> 
    <td>0.18750</td>
    <td>0.31579</td>
    <td>32</td>
  </tr>
  
   <tr>
    <td>i-nat</td>
    <td>0.00000</td> 
    <td>0.00000</td>
    <td>0.00000</td>
    <td>4</td>
  </tr>
  
   <tr>
    <td>i-org</td>
    <td>0.63462</td> 
    <td>0.01887</td>
    <td>0.03665</td>
    <td>1749</td>
  </tr>
  
   <tr>
    <td>i-per</td>
    <td>0.76443</td> 
    <td>0.56616</td>
    <td>0.65052</td>
    <td>1708</td>
  </tr>
  
   <tr>
    <td>i-tim</td>
    <td>0.49796</td> 
    <td>0.19396</td>
    <td>0.27918</td>
    <td>629</td>
  </tr>
  
   <tr>
    <td>o</td>
    <td>0.94022</td> 
    <td>0.99789</td>
    <td>0.96820</td>
    <td>85194</td>
  </tr>
    
</table>

Weighted average F1 score = <b>0.90987</b>
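
The weighted average can be recomputed from the Model 1 results table. The sketch below assumes the usual definitions: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and each tag's F1 is weighted by its number of datapoints. The rows are copied from the table above; the helper name `f1_score` is an illustrative choice.

```python
# (tag, precision, recall, number of datapoints) from the Model 1 table.
rows = [
    ("b-art", 0.33333, 0.04651, 43),
    ("b-eve", 0.66667, 0.29630, 27),
    ("b-geo", 0.79778, 0.76673, 3751),
    ("b-gpe", 0.92446, 0.95269, 1670),
    ("b-nat", 1.00000, 0.20000, 15),
    ("b-org", 0.72855, 0.38951, 2136),
    ("b-per", 0.76718, 0.36022, 1674),
    ("b-tim", 0.88656, 0.69536, 2068),
    ("i-art", 0.00000, 0.00000, 22),
    ("i-eve", 0.00000, 0.00000, 25),
    ("i-geo", 0.66972, 0.53441, 683),
    ("i-gpe", 1.00000, 0.18750, 32),
    ("i-nat", 0.00000, 0.00000, 4),
    ("i-org", 0.63462, 0.01887, 1749),
    ("i-per", 0.76443, 0.56616, 1708),
    ("i-tim", 0.49796, 0.19396, 629),
    ("o", 0.94022, 0.99789, 85194),
]

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

total_datapoints = sum(count for _, _, _, count in rows)
weighted_f1 = sum(f1_score(p, r) * count for _, p, r, count in rows) / total_datapoints

print(total_datapoints, round(weighted_f1, 4))
```

Since the `o` tag accounts for most of the datapoints, it dominates the weighted average; the per-tag rows give a clearer picture of how the rarer entity tags perform.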

<hr style="height:4px;border:none;color:#333;background-color:#333;" />

### Model 2
<hr style="height:2px;border:none;color:#333;background-color:#333;" />

<p> <font style="FONT-SIZE:15px; COLOR:#000000; LINE-HEIGHT:25px; FONT-FAMILY:Arial,Helvetica,sans-serif">
Model 2 extends the feature matrix of Model 1 as shown below. For each datapoint, the features are no longer the given word alone but also the word immediately before it and the word immediately after it. Every word in the dataset still contributes one row, so the feature matrix now has three columns. The markers 'BOS' ('Beginning of Sentence') and 'EOS' ('End of Sentence') are added at the beginning and end of each sentence, so that the first and last words also have a neighbor on both sides.

Feature matrix X: <br><br>
$\begin{pmatrix}
BOS\hspace{0.3cm}word\hspace{0.1cm}1\hspace{0.3cm}word\hspace{0.1cm}2\\
word\hspace{0.1cm}1\hspace{0.3cm}word\hspace{0.1cm}2\hspace{0.3cm}word\hspace{0.1cm}3\\
word\hspace{0.1cm}2\hspace{0.3cm}word\hspace{0.1cm}3\hspace{0.3cm}word\hspace{0.1cm}4\\
...\\
word\hspace{0.1cm}N-1\hspace{0.3cm}word\hspace{0.1cm}N\hspace{0.3cm}EOS\\\end{pmatrix}$

Label matrix Y: <br> <br>
    $\begin{pmatrix}tag \hspace{0.1cm} 1\\ tag\hspace{0.1cm}2\\ tag\hspace{0.1cm}3\\ ...\\ tag\hspace{0.1cm}N\\\end{pmatrix}$
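
The three-column feature matrix can be built sentence by sentence. The following is a minimal sketch, assuming each sentence is available as a list of words; the function name `window_features` and the two toy sentences are hypothetical.

```python
def window_features(sentence):
    """Return one (previous, current, next) triple per word in a sentence."""
    padded = ["BOS"] + list(sentence) + ["EOS"]
    return [tuple(padded[i - 1 : i + 2]) for i in range(1, len(padded) - 1)]

# Toy sentences for illustration only.
sentences = [
    ["Iran", "violated", "the", "treaty"],
    ["Tehran", "disagreed"],
]

# Stack the rows of every sentence into one feature matrix.
X = [row for sentence in sentences for row in window_features(sentence)]
for row in X:
    print(row)
```

Padding each sentence separately keeps the window from reaching across sentence boundaries, so the last word of one sentence never becomes a feature of the first word of the next.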

#### Results

<table style="width:80%">
  <tr>
    <th>Tag</th>
    <th>Precision</th> 
    <th>Recall</th>
    <th>F1 Score</th>
    <th>Number of datapoints with this tag</th>
  </tr>
    
  <tr>
    <td>b-art</td>
    <td>0.30769</td> 
    <td>0.09302</td>
    <td>0.14286</td>
    <td>43</td>
  </tr>
    
  <tr>
    <td>b-eve</td>
    <td>0.80000</td> 
    <td>0.29630</td>
    <td>0.43243</td>
    <td>27</td>
  </tr>
    
  <tr>
    <td>b-geo</td>
    <td>0.86643</td> 
    <td>0.77493</td>
    <td>0.81813</td>
    <td>3750</td>
  </tr>
  
  <tr>
    <td>b-gpe</td>
    <td>0.94223</td> 
    <td>0.94787</td>
    <td>0.94504</td>
    <td>1669</td>
  </tr>
  
   <tr>
    <td>b-nat</td>
    <td>1.00000</td> 
    <td>0.33333</td>
    <td>0.50000</td>
    <td>15</td>
  </tr>
  
   <tr>
    <td>b-org</td>
    <td>0.78608</td> 
    <td>0.50773</td>
    <td>0.61696</td>
    <td>2135</td>
  </tr>
  
   <tr>
    <td>b-per</td>
    <td>0.84945</td> 
    <td>0.64115</td>
    <td>0.73074</td>
    <td>1674</td>
  </tr>
  
   <tr>
    <td>b-tim</td>
    <td>0.90988</td> 
    <td>0.81084</td>
    <td>0.85751</td>
    <td>2067</td>
  </tr>
  
   <tr>
    <td>i-art</td>
    <td>0.65841</td> 
    <td>0.73939</td>
    <td>0.69655</td>
    <td>683</td>
  </tr>
  
   <tr>
    <td>i-eve</td>
    <td>1.00000</td> 
    <td>0.71875</td>
    <td>0.83636</td>
    <td>32</td>
  </tr>
  
   <tr>
    <td>i-geo</td>
    <td>1.00000</td> 
    <td>0.25000</td>
    <td>0.40000</td>
    <td>4</td>
  </tr>
  
   <tr>
    <td>i-gpe</td>
    <td>0.78629</td> 
    <td>0.39337</td>
    <td>0.52439</td>
    <td>1749</td>
  </tr>
  
   <tr>
    <td>i-nat</td>
    <td>0.87068</td> 
    <td>0.75381</td>
    <td>0.80804</td>
    <td>1706</td>
  </tr>
  
   <tr>
    <td>i-org</td>
    <td>0.80802</td> 
    <td>0.60890</td>
    <td>0.69447</td>
    <td>1749</td>
  </tr>
  
   <tr>
    <td>i-per</td>
    <td>0.76443</td> 
    <td>0.56616</td>
    <td>0.65052</td>
    <td>629</td>
  </tr>
  
   <tr>
    <td>i-tim</td>
    <td>0.80802</td> 
    <td>0.60890</td>
    <td>0.69447</td>
    <td>629</td>
  </tr>
  
   <tr>
    <td>o</td>
    <td>0.96149</td> 
    <td>0.99603</td>
    <td>0.97846</td>
    <td>85134</td>
  </tr>
    
</table>

Weighted average F1 score = <b>0.94240</b>

<hr style="height:4px;border:none;color:#333;background-color:#333;" />