NER (Named Entity Recognition) is a Natural Language Processing task whose goal is to automatically identify and classify “named entities” in a text.

These entities are words or expressions that refer to real-world objects, such as:

- People
- Locations
- Organizations
- Dates
- Monetary values
- Products, events, etc.

Example:

“Apple launched the iPhone 15 in September 2023 in the United States.”

→ An NER model can recognize:

- Apple → ORGANIZATION
- iPhone 15 → PRODUCT
- September 2023 → DATE
- United States → LOCATION
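One way to make the example concrete is to store each entity as a (start, end, label) triple of character offsets, which is also the shape used by the training data later in this notebook. The sketch below is illustrative only: it uses an English rendering of the example sentence, the helper name `to_spans` is hypothetical, and the label names are the ones from the list above rather than the label set of any particular model.

```python
# Illustrative only: entities represented as (start, end, label) character offsets.
text = "Apple launched the iPhone 15 in September 2023 in the United States."

# Surface forms and labels taken from the example list (assumed label names).
mentions = [
    ("Apple", "ORGANIZATION"),
    ("iPhone 15", "PRODUCT"),
    ("September 2023", "DATE"),
    ("United States", "LOCATION"),
]


def to_spans(text, mentions):
    """Locate the first occurrence of each mention and return (start, end, label) triples."""
    spans = []
    for surface, label in mentions:
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        spans.append((start, start + len(surface), label))
    return spans


spans = to_spans(text, mentions)
for start, end, label in spans:
    # Slicing with the offsets recovers the original surface form.
    print(f"{text[start:end]!r:20} {label:13} [{start}, {end})")
```

Computing the offsets with `str.find` instead of counting by hand avoids off-by-one mistakes, at the cost of always matching the first occurrence of a mention.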

In [1]:
pip install spacy

Note: you may need to restart the kernel to use updated packages.





In [2]:
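import spacy

# The model package is installed separately from spaCy itself, e.g. with
#   python -m spacy download en_core_web_sm
# Without it, loading fails (the original run raised
# ModuleNotFoundError: No module named 'en_core_web_sm').
nlp = spacy.load("en_core_web_sm")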
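The error above means the `en_core_web_sm` package is not importable in the current environment. A minimal sketch of a pre-flight check, using only the standard library, is shown below. The helper name `model_is_installed` is hypothetical, and the suggested command assumes the model is fetched with spaCy's `download` command.

```python
import importlib.util


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


if not model_is_installed("en_core_web_sm"):
    # Only prints a hint; it does not install anything.
    print("Model missing; try: python -m spacy download en_core_web_sm")
```

After installing the model from inside a notebook, the kernel may need a restart before the package becomes importable.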

In [None]:
import random

import spacy
from spacy import displacy
from spacy.language import Language
from spacy.training import Example

def customizing_pipeline_component(nlp: Language):
    train_data = [
        ('Apple is releasing a new iPhone next month', [(0, 5, "BRAND"), (25, 31, "PRODUCT"), (32, 42,"DATE")]),
        ("Nike is the world's leading sportswear brand", [(0, 4, "BRAND")]),
        # "Coca-Cola" is 9 characters long; the possessive 's is not part of the entity
        ("Coca-Cola's revenue has increased by 5% this year", [(0, 9, "BRAND")]),
        #("Estamos no E-info, melhor evento de Informática, com Luirys o melhor professor.",[(11, 17, "EVENT"), (48, 54, "PERSON")])
    ]

    # Result before training
    print(f"\nResult BEFORE training:")
    doc = nlp(u'Steve Jobs was fired by Apple.')
    displacy.render(doc, style="ent", jupyter=True)

    # Register the custom labels with the NER component before updating it
    ner = nlp.get_pipe("ner")
    for _, entity_offsets in train_data:
        for _, _, label in entity_offsets:
            ner.add_label(label)

    print("   Training ...")
    # Update only 'ner'; the other components are disabled inside the block
    # and restored automatically when it exits
    with nlp.select_pipes(enable="ner"):
        # resume_training() returns an optimizer for continuing to train
        # an already trained pipeline
        optimizer = nlp.resume_training()
        for _ in range(25):
            random.shuffle(train_data)
            for raw_text, entity_offsets in train_data:
                doc = nlp.make_doc(raw_text)
                example = Example.from_dict(doc, {"entities": entity_offsets})
                nlp.update([example], sgd=optimizer)

    # Result after training (same sentence as before, so the two renderings are comparable)
    print("Result AFTER training:")
    doc = nlp('Steve Jobs was fired by Apple.')
    displacy.render(doc, style="ent", jupyter=True)

def main():
    nlp = spacy.load('en_core_web_sm')
    customizing_pipeline_component(nlp)


if __name__ == '__main__':
    main()


Result BEFORE training:

   Training ...

Result AFTER training:
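Hand-written character offsets like those in `train_data` are easy to get wrong, and a span that does not line up with the text is hard to notice from the training output alone. The sketch below is a rough sanity check under a stated assumption: an entity span should lie inside the text and should start and end on an alphanumeric character. The function name `suspicious_spans` is hypothetical, and the heuristic would wrongly flag legitimate entities such as "5%". The last entry deliberately uses (0, 10) to show how a span that runs one character past `Coca-Cola` gets flagged.

```python
def suspicious_spans(train_data):
    """Return (covered_text, label) pairs for spans that look misaligned."""
    problems = []
    for text, entities in train_data:
        for start, end, label in entities:
            covered = text[start:end]
            # The span must lie inside the text and be non-empty.
            in_range = 0 <= start < end <= len(text)
            # Heuristic (an assumption): entity text starts and ends with a letter or digit.
            clean_edges = bool(covered) and covered[0].isalnum() and covered[-1].isalnum()
            if not (in_range and clean_edges):
                problems.append((covered, label))
    return problems


train_data = [
    ("Apple is releasing a new iPhone next month",
     [(0, 5, "BRAND"), (25, 31, "PRODUCT"), (32, 42, "DATE")]),
    ("Nike is the world's leading sportswear brand", [(0, 4, "BRAND")]),
    # Deliberately one character too long: covers "Coca-Cola'" instead of "Coca-Cola"
    ("Coca-Cola's revenue has increased by 5% this year", [(0, 10, "BRAND")]),
]

problems = suspicious_spans(train_data)
for covered, label in problems:
    print(f"Check this span: {covered!r} ({label})")
```

Printing the covered text rather than the raw numbers makes the mistake visible at a glance, since the stray apostrophe shows up directly in the output.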