## Named Entity Recognition

In [1]:
from transformers import pipeline
import pandas as pd


In [2]:
nlp = pipeline('ner')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
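
The log above recommends naming the model and revision explicitly rather than relying on the default. A minimal sketch of one way to do that, using the checkpoint name and revision hash printed in the log. `PINNED_CHECKPOINTS` and `build_pipeline` are hypothetical names introduced here for illustration; they are not part of transformers.

```python
# Checkpoint name and revision copied from the log message above.
# PINNED_CHECKPOINTS and build_pipeline are illustrative names, not library APIs.
PINNED_CHECKPOINTS = {
    "ner": {
        "model": "dbmdz/bert-large-cased-finetuned-conll03-english",
        "revision": "f2482bf",
    },
}


def build_pipeline(task):
    """Create a pipeline with an explicitly pinned model and revision."""
    # Imported lazily so the table above can be inspected without loading transformers.
    from transformers import pipeline

    return pipeline(task, **PINNED_CHECKPOINTS[task])
```

Under these assumptions, `build_pipeline("ner")` would stand in for `pipeline('ner')`. The first call still downloads the checkpoint, so it is left uncalled here.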


In [3]:
result = nlp("I am a Hamoye instructor and my name is Nnaemeka")
print("result:", result)

result: [{'entity': 'I-ORG', 'score': 0.64403105, 'index': 4, 'word': 'Ham', 'start': 7, 'end': 10}, {'entity': 'I-ORG', 'score': 0.8081638, 'index': 5, 'word': '##oy', 'start': 10, 'end': 12}, {'entity': 'I-ORG', 'score': 0.9412854, 'index': 6, 'word': '##e', 'start': 12, 'end': 13}, {'entity': 'I-PER', 'score': 0.99075085, 'index': 12, 'word': 'N', 'start': 40, 'end': 41}, {'entity': 'I-PER', 'score': 0.98543763, 'index': 13, 'word': '##nae', 'start': 41, 'end': 44}, {'entity': 'I-PER', 'score': 0.9662203, 'index': 14, 'word': '##me', 'start': 44, 'end': 46}, {'entity': 'I-PER', 'score': 0.88381165, 'index': 15, 'word': '##ka', 'start': 46, 'end': 48}]


In [5]:
df = pd.DataFrame(result)
df

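|   | entity | score | index | word | start | end |
|---|---|---|---|---|---|---|
| 0 | I-ORG | 0.644031 | 4 | Ham | 7 | 10 |
| 1 | I-ORG | 0.808164 | 5 | ##oy | 10 | 12 |
| 2 | I-ORG | 0.941285 | 6 | ##e | 12 | 13 |
| 3 | I-PER | 0.990751 | 12 | N | 40 | 41 |
| 4 | I-PER | 0.985438 | 13 | ##nae | 41 | 44 |
| 5 | I-PER | 0.96622 | 14 | ##me | 44 | 46 |
| 6 | I-PER | 0.883812 | 15 | ##ka | 46 | 48 |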
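
The output above is split into WordPiece fragments: a `##` prefix marks a piece that continues the previous token. The sketch below joins such fragments back into whole words by hand, to make the structure of the output clearer. `merge_wordpieces` is a hypothetical helper, and averaging the fragment scores is an arbitrary choice for this illustration. Recent versions of transformers also expose an `aggregation_strategy` argument on the token-classification pipeline for grouping tokens on the library side; that option is not used here.

```python
def merge_wordpieces(tokens):
    """Join WordPiece fragments (prefixed with '##') back into whole words."""
    words = []
    for tok in tokens:
        continues_previous = (
            tok["word"].startswith("##")
            and words
            and tok["start"] == words[-1]["end"]  # fragment must directly follow
        )
        if continues_previous:
            prev = words[-1]
            prev["word"] += tok["word"][2:]  # drop the '##' marker
            prev["end"] = tok["end"]
            prev["scores"].append(tok["score"])
        else:
            words.append({
                "entity": tok["entity"],
                "word": tok["word"],
                "start": tok["start"],
                "end": tok["end"],
                "scores": [tok["score"]],
            })
    # Report the mean fragment score per word (an assumption of this sketch).
    return [
        {
            "entity": w["entity"],
            "word": w["word"],
            "start": w["start"],
            "end": w["end"],
            "score": sum(w["scores"]) / len(w["scores"]),
        }
        for w in words
    ]


# Tokens copied from the table above.
tokens = [
    {"entity": "I-ORG", "score": 0.644031, "word": "Ham", "start": 7, "end": 10},
    {"entity": "I-ORG", "score": 0.808164, "word": "##oy", "start": 10, "end": 12},
    {"entity": "I-ORG", "score": 0.941285, "word": "##e", "start": 12, "end": 13},
    {"entity": "I-PER", "score": 0.990751, "word": "N", "start": 40, "end": 41},
    {"entity": "I-PER", "score": 0.985438, "word": "##nae", "start": 41, "end": 44},
    {"entity": "I-PER", "score": 0.96622, "word": "##me", "start": 44, "end": 46},
    {"entity": "I-PER", "score": 0.883812, "word": "##ka", "start": 46, "end": 48},
]

merged = merge_wordpieces(tokens)
for word in merged:
    print(word["entity"], word["word"], word["start"], word["end"])
```

With the tokens above, the seven fragments collapse into two words, `Hamoye` (characters 7 to 13) and `Nnaemeka` (characters 40 to 48).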


In [6]:
result = nlp("I love Coldstone Icecream and Domino's Pizza")
pd.DataFrame(result)

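|   | entity | score | index | word | start | end |
|---|---|---|---|---|---|---|
| 0 | I-ORG | 0.971832 | 3 | Cold | 7 | 11 |
| 1 | I-ORG | 0.96339 | 4 | ##stone | 11 | 16 |
| 2 | I-ORG | 0.954457 | 5 | Ice | 17 | 20 |
| 3 | I-ORG | 0.913391 | 6 | ##cre | 20 | 23 |
| 4 | I-ORG | 0.967593 | 7 | ##am | 23 | 25 |
| 5 | I-ORG | 0.901036 | 9 | Dom | 30 | 33 |
| 6 | I-ORG | 0.874147 | 10 | ##ino | 33 | 36 |
| 7 | I-ORG | 0.958965 | 11 | ' | 36 | 37 |
| 8 | I-ORG | 0.958817 | 12 | s | 37 | 38 |
| 9 | I-ORG | 0.962574 | 13 | Pizza | 39 | 44 |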
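
Every token here is tagged `I-ORG`, so the labels alone do not show where one organization ends and the next begins. One possible heuristic, sketched below, is to use the `start` and `end` character offsets: consecutive tokens with the same label are merged when at most one character (a space) separates them. `group_entity_spans` and its `max_gap` parameter are hypothetical names for this illustration. The heuristic would also merge two distinct, directly adjacent entities of the same type, so treat it as a rough approximation.

```python
def group_entity_spans(text, tokens, max_gap=1):
    """Merge consecutive same-label tokens into character spans of `text`.

    Two tokens are merged when they share a label and are separated by at
    most `max_gap` characters (1 allows a single space between words).
    """
    spans = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]  # 'I-ORG' -> 'ORG'
        if (
            spans
            and spans[-1]["label"] == label
            and tok["start"] - spans[-1]["end"] <= max_gap
        ):
            spans[-1]["end"] = tok["end"]  # extend the current span
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
    # Recover the surface text of each span directly from the input sentence.
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


text = "I love Coldstone Icecream and Domino's Pizza"

# Entity labels and offsets copied from the table above.
tokens = [
    {"entity": "I-ORG", "start": 7, "end": 11},
    {"entity": "I-ORG", "start": 11, "end": 16},
    {"entity": "I-ORG", "start": 17, "end": 20},
    {"entity": "I-ORG", "start": 20, "end": 23},
    {"entity": "I-ORG", "start": 23, "end": 25},
    {"entity": "I-ORG", "start": 30, "end": 33},
    {"entity": "I-ORG", "start": 33, "end": 36},
    {"entity": "I-ORG", "start": 36, "end": 37},
    {"entity": "I-ORG", "start": 37, "end": 38},
    {"entity": "I-ORG", "start": 39, "end": 44},
]

spans = group_entity_spans(text, tokens)
for span in spans:
    print(span["label"], repr(span["text"]), span["start"], span["end"])
```

On this sentence the ten tokens reduce to two spans, `Coldstone Icecream` and `Domino's Pizza`, because the five-character gap around "and" exceeds `max_gap`.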


## Question Answering

In [4]:
qna = pipeline('question-answering')

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [5]:
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question.
An example of a question answering dataset is the SQuAD dataset, which is entirely based on that task. 
If you would like to fine-tune a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""

In [6]:
result = qna(question="What is a good example of a question answering dataset?", context=context)

In [7]:
print(result)

{'score': 0.5152311325073242, 'start': 147, 'end': 160, 'answer': 'SQuAD dataset'}
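
The `start` and `end` fields in the result are character offsets into `context`, which is what makes the task extractive: the answer is a slice of the input, not generated text. The sketch below checks this against the result printed above without rerunning the model. `answer_in_context` is a hypothetical helper added for illustration, and the context is rebuilt here with explicit newlines to match the raw string from the earlier cell.

```python
# Same text as the `context` cell above, written with explicit newlines.
context = (
    "\n"
    "Extractive Question Answering is the task of extracting an answer from a text given a question.\n"
    "An example of a question answering dataset is the SQuAD dataset, which is entirely based on that task. \n"
    "If you would like to fine-tune a model on a SQuAD task, you may leverage the "
    "examples/pytorch/question-answering/run_squad.py script.\n"
)

# Result dictionary copied from the output above.
result = {"score": 0.5152311325073242, "start": 147, "end": 160, "answer": "SQuAD dataset"}


def answer_in_context(context, result, window=30):
    """Return the answer in brackets with up to `window` characters on each side."""
    start, end = result["start"], result["end"]
    before = context[max(0, start - window):start]
    after = context[end:end + window]
    return f"{before}[{context[start:end]}]{after}".replace("\n", " ")


# The answer string is exactly the slice of the context between the offsets.
extracted = context[result["start"]:result["end"]]
assert extracted == result["answer"]

print(answer_in_context(context, result))
```

Slicing with the offsets is handy when the answer needs to be highlighted in the source passage or when the same phrase occurs more than once in the context.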
