<a href="https://colab.research.google.com/github/GusdPaula/postgraduation_fiap/blob/main/FASE-5/NLP_Aplicacoes_HuggingFace_spaCy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Practical NLP Applications with Hugging Face and spaCy

This notebook demonstrates several Natural Language Processing (NLP) applications using pretrained models from Hugging Face and spaCy.

In [None]:
# Uncomment if the libraries are not already installed in your environment:
# !pip install transformers sentence-transformers

## 1. Sentiment Analysis

In [None]:
from transformers import pipeline

analyzer = pipeline(
    "sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english"
)
# Portuguese for: "This product is excellent, I highly recommend it!"
analyzer("Este produto é excelente, recomendo muito!")

Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.990990161895752}]

In [None]:
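# Portuguese for: "This product is not recommended, I do not recommend it!"
# The model was fine-tuned on English SST-2 data, so a negative Portuguese
# review can still come back as POSITIVE with a score close to 0.5.
analyzer("Este produto não é recomendado, não recomendo!")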

[{'label': 'POSITIVE', 'score': 0.5387459993362427}]
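
A score of about 0.54 is barely above 0.5, the minimum a two-class model can assign to its top label, so the label above carries little information. The following is a minimal sketch of one way to flag such predictions before relying on them. The helper name `flag_uncertain` and the 0.75 cutoff are assumptions made for illustration; neither is part of the `transformers` API.

```python
def flag_uncertain(predictions, min_score=0.75):
    """Add an 'uncertain' flag to each prediction whose score is below min_score.

    min_score is a hypothetical cutoff; tune it on labelled data for real use.
    """
    return [{**pred, "uncertain": pred["score"] < min_score} for pred in predictions]


# Outputs copied from the two cells above.
results = [
    {"label": "POSITIVE", "score": 0.990990161895752},
    {"label": "POSITIVE", "score": 0.5387459993362427},
]

flagged = flag_uncertain(results)
for item in flagged:
    status = "uncertain" if item["uncertain"] else "confident"
    print(item["label"], round(item["score"], 3), status)
```

With this cutoff the first review is kept as confident and the second is flagged for manual review or for a model that supports Portuguese.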

## 2. Text Classification


In [None]:
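# "sentiment-analysis" is an alias of the "text-classification" task, so with
# the same model this pipeline behaves like the one in section 1.
classifier = pipeline(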
    "text-classification", model="distilbert-base-uncased-finetuned-sst-2-english"
)
classifier("I loved the movie, it was amazing!")

Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9998828172683716}]

## 3. Chatbot / Virtual Assistant

In [None]:
chatbot = pipeline("text-generation", model="microsoft/DialoGPT-medium")
# The prompt is Portuguese for: "User: Hello, how are you?\nBot:"
chatbot("Usuário: Olá, tudo bem?\nBot:", max_new_tokens=50)

Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Usuário: Olá, tudo bem?\nBot: Ol, bem.'}]

In [None]:
chatbot("Hello, how are you? ")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hello, how are you? ive been thinking about getting a new job.'}]
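
Both outputs above show that `generated_text` contains the prompt followed by the model's continuation. The sketch below separates the two, assuming the output keeps the list-of-dicts shape shown above. `extract_reply` is a hypothetical helper written for this notebook, not a `transformers` function.

```python
def extract_reply(prompt, outputs):
    """Return the generated continuation without the echoed prompt."""
    text = outputs[0]["generated_text"]
    # The pipeline output starts with the prompt, so drop that prefix if present.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Prompt and output copied from the first chatbot cell above.
prompt = "Usuário: Olá, tudo bem?\nBot:"
outputs = [{"generated_text": "Usuário: Olá, tudo bem?\nBot: Ol, bem."}]

reply = extract_reply(prompt, outputs)
print(reply)
```

The same helper works for the English prompt, where the reply starts right after the trailing space of the prompt.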

## 4. Named Entity Recognition (NER)


In [None]:
!pip install spacy
!python -m spacy download en_core_web_sm

In [None]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
[(ent.text, ent.label_) for ent in doc.ents]

[('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
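
The list of `(text, label)` pairs above is often easier to consume when grouped by entity type. The following is a small sketch of that grouping on the output shown above; `group_by_label` is a hypothetical helper, not part of spaCy.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into a dict mapping each label to its texts."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# Entities copied from the cell output above.
entities = [("Apple", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY")]

grouped = group_by_label(entities)
print(grouped)
```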

## 5. Machine Translation

In [None]:
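# No model is specified here, so the pipeline falls back to its default
# for this task (google-t5/t5-base, as reported in the log below).
translator = pipeline("translation_en_to_fr")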
translator("Machine learning is revolutionizing industries.", max_length=40)

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'translation_text': "L'apprentissage automatique révolutionne les industries."}]

## 6. Automatic Summarization

In [None]:
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = """Natural Language Processing is a branch of artificial intelligence that helps computers understand, interpret and manipulate human language."""
summarizer(text, max_length=30, min_length=5, do_sample=False)

Device set to use cuda:0
Your max_length is set to 30, but your input_length is only 22. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=11)


[{'summary_text': 'Natural Language Processing is a branch of artificial intelligence that helps computers understand, interpret and manipulate human language.'}]

In [None]:
summarizer(text, max_length=11, min_length=5, do_sample=False)

[{'summary_text': 'Natural Language Processing is a branch of artificial'}]
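
The warning above asks for a `max_length` smaller than the input length, and its example halves the input length of 22 to get 11. The sketch below derives the limit from the input length in the same way. The helper name `pick_max_length` and the default ratio of 0.5 are assumptions taken from that single example, not a rule documented by the library.

```python
def pick_max_length(input_length, requested_max, min_length=5, ratio=0.5):
    """Cap max_length at a fraction of the input length, never below min_length.

    ratio=0.5 is a hypothetical default that mirrors the warning's example
    (input_length=22 -> max_length=11).
    """
    target = int(input_length * ratio)
    return max(min_length, min(requested_max, target))


# Values taken from the first summarizer call and its warning.
suggested = pick_max_length(input_length=22, requested_max=30)
print(suggested)
```

As the second call above shows, a tight limit can cut the summary off mid-sentence, so for inputs this short it may be better to skip summarization altogether.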

## 7. Question Answering (QA)

In [None]:
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
context = "Albert Einstein was a theoretical physicist who developed the theory of relativity."
question = "Who developed the theory of relativity?"
qa(question=question, context=context)

Device set to use cuda:0


{'score': 0.9415659308433533,
 'start': 0,
 'end': 15,
 'answer': 'Albert Einstein'}

In [None]:
question = "Who are you?"
qa(question=question, context=context)

{'score': 0.18265752494335175,
 'start': 22,
 'end': 43,
 'answer': 'theoretical physicist'}
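
The `start` and `end` fields in the results above are character offsets into `context`, so the answer can be recovered or highlighted by slicing. The following sketch uses the two results shown above; `highlight_answer` is a hypothetical helper, not part of the pipeline.

```python
def highlight_answer(context, result):
    """Wrap the answer span in square brackets using the start/end offsets."""
    start, end = result["start"], result["end"]
    return f"{context[:start]}[{context[start:end]}]{context[end:]}"


context = "Albert Einstein was a theoretical physicist who developed the theory of relativity."

# Results copied from the two QA cells above.
good = {"score": 0.9415659308433533, "start": 0, "end": 15, "answer": "Albert Einstein"}
weak = {"score": 0.18265752494335175, "start": 22, "end": 43, "answer": "theoretical physicist"}

print(highlight_answer(context, good))
print(highlight_answer(context, weak))
```

Note that the second question cannot be answered from the context, and its much lower score (about 0.18) is the only signal of that in the output.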

## 8. Semantic Search with Embeddings

In [None]:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')
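# Portuguese for: "How to improve productivity at work"
query = "Como melhorar a produtividade no trabalho"
docs = [
    "Técnicas de produtividade",  # "Productivity techniques"
    "Receitas de bolo",  # "Cake recipes"
    "Organização pessoal no trabalho",  # "Personal organization at work"
]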

In [None]:
query_emb = model.encode(query, convert_to_tensor=True)
docs_emb = model.encode(docs, convert_to_tensor=True)

In [None]:
scores = util.cos_sim(query_emb, docs_emb)
docs[scores.argmax()]

'Organização pessoal no trabalho'

In [None]:
scores

tensor([[0.6547, 0.4059, 0.7353]], device='cuda:0')
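
Each value in the tensor above is a cosine similarity: the dot product of the query embedding and a document embedding, divided by the product of their norms. The following is a pure-Python sketch of that computation on made-up 2-dimensional vectors (the real `all-MiniLM-L6-v2` embeddings have 384 dimensions). The names `cosine_similarity`, `toy_query`, and `toy_docs` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings.
toy_query = [1.0, 0.0]
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

toy_scores = [cosine_similarity(toy_query, doc) for doc in toy_docs]
# Index of the highest score, the same role argmax plays in the cell above.
best = max(range(len(toy_scores)), key=toy_scores.__getitem__)
print(toy_scores, best)
```

A vector pointing in the same direction as the query scores 1, an orthogonal one scores 0, and everything else falls in between, which is why the scores can be sorted directly to rank documents.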

In [None]:
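# Pair each document with its similarity score and sort from most to least similar.
ranked = sorted(
    zip(docs, scores[0].tolist()),
    key=lambda pair: pair[1],
    reverse=True,
)

# Keep only the document texts, now in ranked order.
db = [doc for doc, _ in ranked]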

In [None]:
db

['Organização pessoal no trabalho',
 'Técnicas de produtividade',
 'Receitas de bolo']