## Building a Named Entity Recognition App

In [1]:
!pip install -q gradio

In [2]:
import gradio as gr
from transformers import pipeline

In [3]:
NER = pipeline("ner", model="dslim/bert-base-NER")

Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertFor

In [4]:
text = "My name is Andrew, I'm building DeepLearningAI and I live in California"
NER(text)

[{'entity': 'B-PER',
  'score': 0.9990625,
  'index': 4,
  'word': 'Andrew',
  'start': 11,
  'end': 17},
 {'entity': 'B-ORG',
  'score': 0.9927856,
  'index': 10,
  'word': 'Deep',
  'start': 32,
  'end': 36},
 {'entity': 'I-ORG',
  'score': 0.99677867,
  'index': 11,
  'word': '##L',
  'start': 36,
  'end': 37},
 {'entity': 'I-ORG',
  'score': 0.9954496,
  'index': 12,
  'word': '##ear',
  'start': 37,
  'end': 40},
 {'entity': 'I-ORG',
  'score': 0.9959293,
  'index': 13,
  'word': '##ning',
  'start': 40,
  'end': 44},
 {'entity': 'I-ORG',
  'score': 0.8917463,
  'index': 14,
  'word': '##A',
  'start': 44,
  'end': 45},
 {'entity': 'I-ORG',
  'score': 0.50361186,
  'index': 15,
  'word': '##I',
  'start': 45,
  'end': 46},
 {'entity': 'B-LOC',
  'score': 0.99969244,
  'index': 20,
  'word': 'California',
  'start': 61,
  'end': 71}]
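
The `start` and `end` fields in the output above are character offsets into the input string, so each token's surface text can be recovered by slicing. As a minimal sketch, the `tokens` list below is copied by hand from a few entries of that output (scores omitted) rather than produced by a live pipeline call, and `span_text` is a hypothetical helper name:

```python
text = "My name is Andrew, I'm building DeepLearningAI and I live in California"

# A few entries copied by hand from the pipeline output above (scores omitted).
tokens = [
    {"entity": "B-PER", "word": "Andrew", "start": 11, "end": 17},
    {"entity": "B-ORG", "word": "Deep", "start": 32, "end": 36},
    {"entity": "I-ORG", "word": "##L", "start": 36, "end": 37},
    {"entity": "B-LOC", "word": "California", "start": 61, "end": 71},
]


def span_text(text, token):
    """Hypothetical helper: slice the original text using a token's offsets."""
    return text[token["start"]:token["end"]]


for token in tokens:
    # The slice carries no '##' marker, unlike the 'word' field of a sub-word piece.
    print(token["entity"], repr(span_text(text, token)))
```

The `B-` prefix marks the first piece of an entity and `I-` marks a continuation, which is why `DeepLearningAI` shows up as one `B-ORG` piece followed by several `I-ORG` pieces.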

**Token Merging**

In [5]:
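def merge_tokens(tokens):
    """Merge consecutive I-* tokens into the preceding entity of the same type."""
    merged_tokens = []
    for token in tokens:
        if (
            merged_tokens
            and token['entity'].startswith('I-')
            and merged_tokens[-1]['entity'].endswith(token['entity'][2:])
        ):
            # Continuation of the previous entity: extend its text, span, and score.
            last_token = merged_tokens[-1]
            last_token['word'] += token['word'].replace('##', '')
            last_token['end'] = token['end']
            # Running pairwise average, not the mean over all pieces.
            last_token['score'] = (last_token['score'] + token['score']) / 2
        else:
            # Start a new entity; copy so the pipeline output is not mutated.
            merged_tokens.append(dict(token))
    return merged_tokens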
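
The merging step above joins pieces without spaces and updates the score pairwise, so later pieces weigh more than earlier ones. As a rough sketch of one alternative, the hypothetical `merge_tokens_mean` below leaves its input untouched, inserts a space before pieces that do not start with `##`, and reports the mean score over all pieces. It also returns plain labels such as `ORG` instead of `B-ORG`, which is a choice made for this illustration only. The sample data is copied by hand from the pipeline output shown earlier.

```python
def merge_tokens_mean(tokens):
    """Hypothetical variant of merge_tokens that leaves its input untouched
    and scores each entity by the mean over all of its pieces."""
    groups = []
    for token in tokens:
        label = token["entity"][2:]  # 'B-ORG' / 'I-ORG' -> 'ORG'
        is_continuation = (
            bool(groups)
            and token["entity"].startswith("I-")
            and groups[-1]["entity"] == label
        )
        if is_continuation:
            group = groups[-1]
            piece = token["word"]
            # '##' marks a sub-word piece; anything else is a new word.
            group["word"] += piece[2:] if piece.startswith("##") else " " + piece
            group["end"] = token["end"]
            group["scores"].append(token["score"])
        else:
            groups.append({
                "entity": label,
                "word": token["word"],
                "start": token["start"],
                "end": token["end"],
                "scores": [token["score"]],
            })
    # Collapse the collected scores into one mean score per entity.
    return [
        {
            "entity": g["entity"],
            "word": g["word"],
            "start": g["start"],
            "end": g["end"],
            "score": sum(g["scores"]) / len(g["scores"]),
        }
        for g in groups
    ]


# Sample copied by hand from the pipeline output shown earlier.
sample = [
    {"entity": "B-ORG", "score": 0.9927856, "word": "Deep", "start": 32, "end": 36},
    {"entity": "I-ORG", "score": 0.99677867, "word": "##L", "start": 36, "end": 37},
    {"entity": "I-ORG", "score": 0.9954496, "word": "##ear", "start": 37, "end": 40},
    {"entity": "I-ORG", "score": 0.9959293, "word": "##ning", "start": 40, "end": 44},
    {"entity": "I-ORG", "score": 0.8917463, "word": "##A", "start": 44, "end": 45},
    {"entity": "I-ORG", "score": 0.50361186, "word": "##I", "start": 45, "end": 46},
]

merged = merge_tokens_mean(sample)
print(merged[0]["word"], merged[0]["start"], merged[0]["end"])
```

Whether the mean or the minimum piece score is the better confidence for a merged entity depends on the application; the mean is used here only as a simple default.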

In [6]:
def ner(input_text):
    output = NER(input_text)
    merged_tokens = merge_tokens(output)
    return {
        "text": input_text,
        "entities": [
            {"start": t["start"], "end": t["end"], "entity": t["entity"]}
            for t in merged_tokens
        ]
    }
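
The dictionary returned by `ner` has the shape that `gr.HighlightedText` accepts: the full `text` plus a list of `entities`, each with `start`, `end`, and `entity`. As a rough sketch of one possible refinement, the hypothetical helper `to_highlight_payload` below strips the `B-`/`I-` prefixes so the labels read `PER`, `ORG`, and `LOC`. The merged tokens are hard-coded from the earlier example rather than computed, and the helper is an illustration, not part of the original notebook.

```python
def to_highlight_payload(input_text, merged_tokens):
    """Hypothetical helper: build a HighlightedText-style dict with plain labels."""
    entities = []
    for t in merged_tokens:
        label = t["entity"]
        # Drop the BIO prefix ('B-' or 'I-') if present.
        if label[:2] in ("B-", "I-"):
            label = label[2:]
        entities.append({"start": t["start"], "end": t["end"], "entity": label})
    return {"text": input_text, "entities": entities}


text = "My name is Andrew, I'm building DeepLearningAI and I live in California"

# Merged spans written out by hand from the earlier example output.
merged = [
    {"entity": "B-PER", "start": 11, "end": 17},
    {"entity": "B-ORG", "start": 32, "end": 46},
    {"entity": "B-LOC", "start": 61, "end": 71},
]

payload = to_highlight_payload(text, merged)
for e in payload["entities"]:
    print(e["entity"], text[e["start"]:e["end"]])
```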

In [7]:
demo = gr.Interface(fn=ner,
                    inputs=[gr.Textbox(label="Text to find entities", lines=2)],
                    outputs=[gr.HighlightedText(label="Text with entities")],
                    title="NER with dslim/bert-base-NER",
                    description="Find entities using the `dslim/bert-base-NER` model under the hood!",
                    allow_flagging="never",
                    examples=[
                        "My name is Olfat, and I live in Giza",
                        "Dr. John Smith works at Stanford University, and his research focuses on Artificial Intelligence. He lives in Palo Alto, California.",
                        "Amal moved to Paris in 2015 and started working at UNESCO as a project manager. Her brother, Karim, is studying medicine in Cairo.",
                        "The Eiffel Tower is located in Paris, France, and attracts millions of tourists every year. Marie and Jean visited it during their honeymoon.",
                        "Microsoft Corporation, headquartered in Redmond, Washington, was founded by Bill Gates and Paul Allen in 1975. It is one of the largest tech companies in the world.",
                        "Barack Obama, the 44th President of the United States, was born in Honolulu, Hawaii, and graduated from Harvard Law School.",
                    ],
                    css="""
                        .gradio-container {background-color: MediumAquaMarine;}
                        button.gr-button-primary {background-color: green; color: white; border-radius: 8px; border: none;}
                    """
)

demo.launch()



Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://da518d90a2e57b06f8.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
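
A pitfall worth keeping in mind when writing a list of strings such as `examples`: Python silently joins adjacent string literals that are not separated by a comma, so a missing comma merges two examples into one. A minimal sketch with made-up lists (`joined` and `separate` are illustrative names, and the sentences are shortened):

```python
# Missing comma: the two literals are concatenated into a single string.
joined = [
    "My name is Olfat, and I live in Giza"
    "Amal moved to Paris in 2015."
]

# With the comma, each sentence is its own list item.
separate = [
    "My name is Olfat, and I live in Giza",
    "Amal moved to Paris in 2015.",
]

print(len(joined), len(separate))
```

Ending every item with a comma, including the last one, makes this kind of slip easier to spot in review.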


