# Named Entity Recognition (NER)

> spaCy can recognise various types of named entities in a document, by asking the model for a prediction. Because models are statistical and strongly depend on the examples they were trained on, this doesn't always work perfectly and might need some tuning later, depending on your use case.
> 
>  -- from the [spaCy docs](https://spacy.io/usage/linguistic-features#section-named-entities)

In [58]:
import spacy
from tqdm import tqdm # progress bar
import pandas as pd

In [59]:
spacy.__version__

'3.1.1'

In [60]:
# import the dataset
df = pd.read_csv('shanghai_2020.csv')

In [67]:
df.sample(3)

| Unnamed: 0 | world_rank | institution | national_rank | total_score | country_iso_code | country | year |
|:---|:---|:---|:---|:---|:---|:---|:---|
| 76 | 77 | Uppsala University | 3 | 28.4 | SE | Sweden | 2020 |
| 233 | 201-300 | Newcastle University | 21-28 | | GB | United Kingdom | 2020 |
| 225 | 201-300 | KTH Royal Institute of Technology | 6 | | SE | Sweden | 2020 |


In [61]:
# Create a spaCy pipeline by loading the pre-trained English model
nlp = spacy.load('en_core_web_sm')

In [62]:
def get_cat(df, col):
    '''Collect the entity labels that spaCy predicts
    for the unique values of a dataframe column.
    input: dataframe, name of the dataframe column
    output: list of all entity labels found in the column
    '''

    labels = []
    # str() of the array of unique values gives one text, e.g. "['A' 'B']"
    col_content = str(df[col].unique())
    doc = nlp(col_content)
    for ent in doc.ents:
        # save the label of every entity found
        labels.append(ent.label_)

    return labels

In [63]:
def most_frequent(items):
    '''Return the most common value in the list.
    Raises ValueError if the list is empty.'''

    return max(set(items), key=items.count)
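
For comparison, here is a minimal sketch of the same idea built on `collections.Counter`. The name `most_frequent_or_none` is hypothetical and not part of the notebook. Unlike the `max(set(...))` version, it returns `None` for an empty list instead of raising, and ties go to the label seen first.

```python
from collections import Counter

def most_frequent_or_none(labels):
    """Return the most common label, or None if the list is empty."""
    if not labels:
        return None
    # most_common(1) returns a list with one (value, count) pair
    return Counter(labels).most_common(1)[0][0]

print(most_frequent_or_none(["ORG", "GPE", "ORG"]))  # ORG
print(most_frequent_or_none([]))                      # None
```

Returning `None` may be useful for columns in which the model finds no entities at all.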

In [64]:
# Run the model in a loop for each column
for i, col in enumerate(df.columns):
    col_value = most_frequent(get_cat(df, col))
    print(f'Column No. {i} has preferred category {col_value}')

Column No. 0 has preferred category CARDINAL
Column No. 1 has preferred category ORG
Column No. 2 has preferred category CARDINAL
Column No. 3 has preferred category CARDINAL
Column No. 4 has preferred category ORG
Column No. 5 has preferred category GPE
Column No. 6 has preferred category DATE
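
The loop above feeds the printed form of a NumPy array to the model, brackets and quotes included, and NumPy abbreviates that printout with `...` once an array has more than 1000 elements. As a possible alternative, the sketch below passes each value to the pipeline as its own text through `nlp.pipe` and counts the predicted labels. The function name `column_label_counts` is hypothetical, and the results may differ from the output shown above.

```python
from collections import Counter

def column_label_counts(values, nlp):
    """Count the entity labels predicted for each value of one column.

    values: iterable of cell values (assumed to contain no missing values)
    nlp:    a loaded spaCy pipeline, e.g. spacy.load('en_core_web_sm')
    """
    texts = [str(v) for v in values]
    counts = Counter()
    # nlp.pipe processes the texts as a stream and yields one Doc per text
    for doc in nlp.pipe(texts):
        counts.update(ent.label_ for ent in doc.ents)
    return counts
```

With the notebook's objects this could be called as `column_label_counts(df['institution'].dropna().unique(), nlp).most_common(1)`, which gives the top label together with its count.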


## Entities Explained

| Type | Description |
|:---|:---|
| PERSON | People, including fictional. |
| NORP | Nationalities or religious or political groups. |
| FAC | Buildings, airports, highways, bridges, etc. |
| ORG | Companies, agencies, institutions, etc. |
| GPE | Countries, cities, states. |
| LOC | Non-GPE locations, mountain ranges, bodies of water. |
| PRODUCT | Objects, vehicles, foods, etc. (Not services.) |
| EVENT | Named hurricanes, battles, wars, sports events, etc. |
| WORK_OF_ART | Titles of books, songs, etc. |
| LAW | Named documents made into laws. |
| LANGUAGE | Any named language. |
| DATE | Absolute or relative dates or periods. |
| TIME | Times smaller than a day. |
| PERCENT | Percentage, including "%". |
| MONEY | Monetary values, including unit. |
| QUANTITY | Measurements, as of weight or distance. |
| ORDINAL | "first", "second", etc. |
| CARDINAL | Numerals that do not fall under another type. |
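
The same descriptions can be looked up from code with `spacy.explain`, which takes a label string and returns its glossary entry. A short sketch, assuming spaCy is installed; the `labels` list is simply copied from the table above, and the wording spaCy returns may differ slightly from the table.

```python
import spacy

# Labels copied from the table above
labels = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
]

for label in labels:
    # spacy.explain returns None for terms it does not know
    print(f"{label:<12} {spacy.explain(label)}")
```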