# Text Translation and Sentiment Analysis #

In [1]:
from transformers import pipeline
import pandas as pd
import warnings

warnings.filterwarnings('ignore')

In [2]:
# Initialize the text to process

text = """I ordered a used book from your online site. \

The book is Les mystères de Paris from Victor Hugo. \

It was indicated on the site that it was in good condition. \

I received it a week after ordering. \

And while unpacking the package I realized that it was damaged (damaged pages, writings). \

I contacted customer service who proceeded to the immediate refund."""


In [3]:
# Translation

translator = pipeline('translation_en_to_fr')
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)

print(outputs[0]['translation_text'])

No model was supplied, defaulted to google-t5/t5-base and revision 686f1db (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.

J'ai commandé un livre d'occasion à partir de votre site en ligne. Le livre est Les mystères de Paris de Victor Hugo. Il était indiqué sur le site qu'il était en bon état. J'ai reçu le livre une semaine après avoir commandé et alors que je déballais le paquet, je me suis rendu compte qu'il était endommagé (pages endommagées, écrits).
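The French output above ends after the sentence about the damaged pages. The last source sentence, about the refund, is never translated. One workaround is to translate the review one sentence at a time. This is a minimal sketch that assumes the drop happens because the whole paragraph goes to T5 in a single call. `split_sentences` and `translate_by_sentence` are hypothetical helpers, not part of `transformers`. `translate` can be the `translator` pipeline: when a translation pipeline receives a list, it returns one `{"translation_text": ...}` dict per item.

```python
import re
from typing import Callable, Dict, List


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: cut after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    # Drop empty fragments left by leading/trailing whitespace
    return [part for part in parts if part]


def translate_by_sentence(
    paragraph: str,
    translate: Callable[[List[str]], List[Dict[str, str]]],
) -> str:
    """Translate each sentence separately, then join the results with spaces."""
    sentences = split_sentences(paragraph)
    # A translation pipeline called on a list returns one dict per input sentence
    outputs = translate(sentences)
    return " ".join(out["translation_text"] for out in outputs)
```

In the notebook this would be called as `print(translate_by_sentence(text, translator))`, without the original `min_length=100`, because that setting would force each short sentence to be padded out. The regex splitter is deliberately simple and will also split after abbreviations such as "e.g.".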


In [4]:
# Text Classification

classifier = pipeline("text-classification")
outputs = classifier(text)

pd.DataFrame(outputs)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

      label     score
0  NEGATIVE  0.998112
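The translation and classification cells both print "No model was supplied, defaulted to ..." and warn that relying on unpinned defaults is not recommended in production. The sketch below pins each task to the model name and revision reported in those messages. The NER and question-answering entries come from the messages printed by the cells further down. `PINNED_MODELS` and `pinned_pipeline` are hypothetical names. The abbreviated revision hashes are used on the assumption that the Hub resolves them the same way it does for the pipeline's own built-in defaults; the full commit hash from each model page is the safer choice. The summarization and text-generation cells were never run, so their defaults do not appear in this notebook and are left out rather than guessed.

```python
from typing import Dict, Tuple

# Model names and revisions copied from the "No model was supplied" messages in this notebook
PINNED_MODELS: Dict[str, Tuple[str, str]] = {
    "translation_en_to_fr": ("google-t5/t5-base", "686f1db"),
    "text-classification": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
    "ner": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
    "question-answering": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
}


def pinned_pipeline(task: str, **kwargs):
    """Build a pipeline with an explicit model and revision instead of the task default."""
    # Imported lazily so the mapping above can be inspected without transformers installed
    from transformers import pipeline

    model, revision = PINNED_MODELS[task]
    # Extra keyword arguments (e.g. aggregation_strategy for "ner") are passed through unchanged
    return pipeline(task, model=model, revision=revision, **kwargs)
```

For example, `ner_tagger = pinned_pipeline("ner", aggregation_strategy="simple")` would replace the unpinned call in the NER cell.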


In [5]:
# Named Entity Recognition
# Build the pipeline once and reuse it for both inputs
ner_tagger = pipeline("ner", aggregation_strategy="simple")

# Entities in the customer review (not displayed: a cell only renders its last expression)
review_entities = ner_tagger(text)

# Entities in a second sentence that contains more people and places
outputs = ner_tagger("Donald Trump, the former president, is now playing golf all days in Florida with his friend Bill")

pd.DataFrame(outputs)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

   entity_group     score          word  start  end
0           PER  0.999558  Donald Trump      0   12
1           LOC  0.999614       Florida     68   75
2           PER  0.999508          Bill     92   96
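The `start` and `end` columns are character offsets into the input sentence. For example, `sentence[68:75]` is `"Florida"`. The sketch below uses those offsets to mark each entity inline. `annotate` is a hypothetical helper, and the `[word](TYPE)` notation is just one possible display choice. It slices the original sentence instead of reusing the `word` column, because the decoded `word` may not always match the source text character for character.

```python
from typing import Dict, List


def annotate(sentence: str, entities: List[Dict]) -> str:
    """Wrap each entity span as [text](TYPE) using the pipeline's start/end offsets."""
    out = sentence
    # Work right-to-left so inserting markers never shifts offsets that are still to be used
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end = ent["start"], ent["end"]
        out = out[:start] + f"[{out[start:end]}]({ent['entity_group']})" + out[end:]
    return out
```

Applied to the sentence and the three rows in the table above, this would produce `[Donald Trump](PER), the former president, is now playing golf all days in [Florida](LOC) with his friend [Bill](PER)`.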


In [7]:
# Question Answering

reader = pipeline("question-answering")
results = []
questions = ["What was ordered?", "What did the customer service do?"]

for question in questions:
    outputs = reader(question=question, context=text)
    # Append inside the loop so that every answer is kept, not only the last one
    results.append(outputs)

pd.DataFrame(results)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
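`pd.DataFrame(results)` lists the answers without the questions that produced them. The sketch below keeps each question next to its answer. `ask_all` is a hypothetical helper. `reader` can be the question-answering pipeline, which returns a dict with `score`, `start`, `end` and `answer` when given a single question and context. The helper returns plain dicts, so the notebook can still wrap the result in `pd.DataFrame`.

```python
from typing import Callable, Dict, List


def ask_all(reader: Callable[..., Dict], questions: List[str], context: str) -> List[Dict]:
    """Run every question against the same context and keep the question in each row."""
    rows = []
    for question in questions:
        answer = reader(question=question, context=context)
        # Put the question first, followed by the pipeline's score/start/end/answer fields
        rows.append({"question": question, **answer})
    return rows
```

Usage in the notebook would be `pd.DataFrame(ask_all(reader, questions, text))`.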

In [None]:
# Summarization

summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=150, clean_up_tokenization_spaces=True)

print(outputs[0]['summary_text'])
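`max_length=150` is generous for a review that is only a few sentences long. It allows a "summary" almost as long as the input, and the summarization pipeline may log a warning that suggests a smaller value. The sketch below derives both bounds from the input's token count instead. `summary_bounds` is hypothetical, and the 0.5 ratio, the floor of 10 and the one-third minimum are assumptions, not library defaults.

```python
from typing import Tuple


def summary_bounds(n_input_tokens: int, ratio: float = 0.5, floor: int = 10) -> Tuple[int, int]:
    """Heuristic (min_length, max_length) for a summary of an input with n_input_tokens tokens."""
    # Cap the summary at a fraction of the input, but never below `floor` tokens
    max_length = max(floor, int(n_input_tokens * ratio))
    # Ask for at least a third of that, and always at least one token
    min_length = max(1, max_length // 3)
    return min_length, max_length
```

In the notebook, the token count could come from the pipeline's own tokenizer: `n_tokens = len(summarizer.tokenizer(text)["input_ids"])`. Then `min_len, max_len = summary_bounds(n_tokens)` gives the values to pass as `summarizer(text, min_length=min_len, max_length=max_len, clean_up_tokenization_spaces=True)`.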

In [None]:
# Text Generation

generator = pipeline("text-generation")
response = "Dear friends. Let me tell you my last experience with this online store where I ordered a book."
prompt = text + "\n\nStory I told to my friends:\n\n" + response
# max_new_tokens counts only generated tokens; max_length would also count the long prompt
outputs = generator(prompt, max_new_tokens=100)

print(outputs[0]['generated_text'])
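By default, the text-generation pipeline's `generated_text` begins with the prompt itself, so the printed story starts with the whole review. The pipeline's `return_full_text=False` option omits the prompt. If you keep the default, the hypothetical helper below strips the echoed prompt when it appears verbatim. It returns the text unchanged otherwise, because decoding can alter whitespace, and a fuzzy match would only be a guess.

```python
def continuation_only(generated_text: str, prompt: str) -> str:
    """Return only the text generated after the prompt, when the prompt is echoed verbatim."""
    if generated_text.startswith(prompt):
        # Remove the prompt and the whitespace that usually separates it from the continuation
        return generated_text[len(prompt):].lstrip()
    # Prompt not found at the start: return the full text rather than cutting at a guessed offset
    return generated_text
```

In the notebook this would be `print(continuation_only(outputs[0]['generated_text'], prompt))`.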