## Task 1: Sentiment Analysis
* Install the `transformers` library if you haven't already: `pip install transformers`
* Import the `pipeline` function from the `transformers` library.
* Initialize a sentiment analysis pipeline.
* Use the pipeline to classify the sentiment of the following text samples:
    * "I love using the Hugging Face library!"
    * "I'm not very fond of this movie."
    * "The weather is terrible today."
* Print the results, including the label (`POSITIVE` or `NEGATIVE`) and the confidence score for each sample.

In [1]:
%pip install transformers



In [2]:
from transformers import pipeline

In [3]:
sentiment_analysis = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [20]:
texts = ["I love using the Hugging Face library!", "I'm not very fond of this movie.", "The weather is terrible today."]
sentiment_results = sentiment_analysis(texts)

print("Sentiment Analysis")
for text, result in zip(texts, sentiment_results):
    print(f"{text}: {result['label']} ({result['score']:.2f})")


Sentiment Analysis
I love using the Hugging Face library!: POSITIVE (1.00)
I'm not very fond of this movie.: NEGATIVE (1.00)
The weather is terrible today.: NEGATIVE (1.00)
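
All three scores print as 1.00 only because of the two-decimal formatting; the underlying probabilities are slightly below 1. The sketch below shows, under stated assumptions, how a label and score of this shape can be derived from a classifier's raw outputs. The logit values are hypothetical, and the `id2label` mapping is assumed to match the default SST-2 model; nothing here calls the library itself.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed label order for a binary sentiment model.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Hypothetical logits for one input (not taken from a real run).
logits = [-4.2, 4.5]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
result = {"label": id2label[best], "score": probs[best]}

# The score is just under 1, but rounds to 1.00 at two decimals.
print(f"{result['label']} ({result['score']:.2f})")
print(f"{result['label']} ({result['score']:.6f})")
```

Printing with more decimals (for example `:.4f`) in the loop above is a simple way to see how the three samples actually differ in confidence.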


## Task 2: Question Answering
* Initialize a question answering pipeline.

* Use the pipeline to answer the following question given the context:

  * Context: "Hugging Face is a company based in New York City. Its headquarters are in DUMBO, Brooklyn."
  * Question: "Where is Hugging Face's headquarters located?"

* Print the result, including the answer and the confidence score.

In [6]:
question_answering = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [8]:
context = "Hugging Face is a company based in New York City. Its headquarters are in DUMBO, Brooklyn."
question = "Where is Hugging Face's headquarters located?"
answer = question_answering(question=question, context=context)


In [19]:
print("Question Answering")
print(f"Answer: {answer['answer']} (confidence: {answer['score']:.2f})")

Question Answering
Answer: DUMBO, Brooklyn (confidence: 0.49)
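
Besides `answer` and `score`, the question answering pipeline's result also carries `start` and `end` character offsets into the context. The sketch below illustrates how those offsets relate to the answer text. The dictionary is built by hand: the answer and the rounded score come from the output above, while the offsets are computed here rather than copied from a real run.

```python
context = "Hugging Face is a company based in New York City. Its headquarters are in DUMBO, Brooklyn."

# Hand-built stand-in for the pipeline result (offsets are computed, not observed).
answer_text = "DUMBO, Brooklyn"
start = context.index(answer_text)
answer = {
    "answer": answer_text,
    "score": 0.49,  # rounded value from the printed output
    "start": start,
    "end": start + len(answer_text),
}

# Slicing the context with the offsets recovers the answer span.
span = context[answer["start"]:answer["end"]]

# A little left-hand context can help when reviewing low-confidence answers.
window = context[max(0, answer["start"] - 25):answer["end"]]

print(span)
print(window)
```

With a score near 0.5, showing the surrounding window next to the answer is one way to let a reader judge whether the extracted span is reasonable.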


## Task 3: Named Entity Recognition
* Initialize a named entity recognition pipeline.

* Use the pipeline to extract entities from the following text:
  * "Elon Musk is the CEO of Tesla, Inc., an American electric vehicle and clean energy company based in Palo Alto, California."

* Print the result, including the entities and their types (e.g., person, organization, location).

In [11]:
ner = pipeline("ner")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [12]:
text = "Elon Musk is the CEO of Tesla, Inc., an American electric vehicle and clean energy company based in Palo Alto, California."
entities = ner(text)

In [18]:
print("Named Entity Recognition:")
for entity in entities:
    print(f"{entity['word']}: {entity['entity']} ({entity['score']:.2f})")

Named Entity Recognition:
El: I-PER (1.00)
##on: I-PER (1.00)
Mu: I-PER (1.00)
##sk: I-PER (1.00)
Te: I-ORG (1.00)
##sla: I-ORG (1.00)
,: I-ORG (0.99)
Inc: I-ORG (1.00)
American: I-MISC (1.00)
Pa: I-LOC (1.00)
##lo: I-LOC (0.99)
Alto: I-LOC (1.00)
California: I-LOC (1.00)
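
The output above is token-level: names are split into word pieces such as `El` and `##on`. The sketch below is one possible way to merge those pieces back into whole entities in plain Python. The `merge_word_pieces` helper is hypothetical; it locates each piece in the original text and joins neighbours that share an entity type and are separated only by whitespace. The token list is copied from the output above.

```python
text = "Elon Musk is the CEO of Tesla, Inc., an American electric vehicle and clean energy company based in Palo Alto, California."

# (word piece, tag) pairs copied from the token-level output above.
tokens = [
    ("El", "I-PER"), ("##on", "I-PER"), ("Mu", "I-PER"), ("##sk", "I-PER"),
    ("Te", "I-ORG"), ("##sla", "I-ORG"), (",", "I-ORG"), ("Inc", "I-ORG"),
    ("American", "I-MISC"),
    ("Pa", "I-LOC"), ("##lo", "I-LOC"), ("Alto", "I-LOC"), ("California", "I-LOC"),
]

def merge_word_pieces(text, tokens):
    """Group adjacent tokens of the same entity type into (span, type) pairs."""
    groups = []  # each item: [start, end, entity_type]
    cursor = 0
    for piece, tag in tokens:
        # Drop the "##" continuation marker to get the surface form.
        surface = piece[2:] if piece.startswith("##") else piece
        start = text.index(surface, cursor)
        end = start + len(surface)
        cursor = end
        entity_type = tag.split("-", 1)[-1]  # "I-PER" -> "PER"
        if groups:
            prev = groups[-1]
            gap = text[prev[1]:start]
            # Merge only if the type matches and nothing but whitespace separates them.
            if prev[2] == entity_type and gap.strip() == "":
                prev[1] = end
                continue
        groups.append([start, end, entity_type])
    return [(text[s:e], t) for s, e, t in groups]

for span, entity_type in merge_word_pieces(text, tokens):
    print(f"{span}: {entity_type}")
```

The library also offers a built-in option for this, `pipeline("ner", aggregation_strategy="simple")`; its grouping may differ in detail from this sketch.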


## Task 4: Text Summarization
* Initialize a text summarization pipeline.

* Use the pipeline to generate a summary of the following text:
  * "Natural language processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans through natural language. The goal of NLP is to enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful. NLP techniques are used in a wide range of applications, including text analysis, sentiment analysis, machine translation, and chatbot development. Recent advances in deep learning have led to significant improvements in the performance of NLP models, making it possible to tackle complex language tasks with greater accuracy and efficiency."

* Print the generated summary.


In [14]:
summarization = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [15]:
text = "Natural language processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans through natural language. The goal of NLP is to enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful. NLP techniques are used in a wide range of applications, including text analysis, sentiment analysis, machine translation, and chatbot development. Recent advances in deep learning have led to significant improvements in the performance of NLP models, making it possible to tackle complex language tasks with greater accuracy and efficiency."
summary = summarization(text)[0]['summary_text']


Your max_length is set to 142, but your input_length is only 116. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=58)


In [17]:
print("Text Summarization:")
print(summary)

Text Summarization:
 Natural language processing is a subfield of artificial intelligence that focuses on the interaction between computers and humans through natural language . The goal of NLP is to enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful . NLP techniques are used in a wide range of applications, including text analysis, sentiment analysis, machine translation and chatbot development .
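
The `max_length` warning above explains why the summary is nearly a copy of the first three sentences: the default limit (142 tokens) exceeds the input length (116 tokens). A minimal sketch of choosing tighter bounds follows. The `suggest_lengths` helper, its `ratio`, and its `floor` are illustrative assumptions, with the ratio chosen to reproduce the warning's own suggestion of 58 for a 116-token input.

```python
def suggest_lengths(input_length, ratio=0.5, floor=10):
    """Pick summary length bounds (in tokens) relative to the input length."""
    max_length = max(floor, int(input_length * ratio))
    min_length = max(1, max_length // 3)
    return min_length, max_length

# 116 is the input length reported in the warning.
min_length, max_length = suggest_lengths(116)
print(min_length, max_length)

# With the pipeline from this notebook, the call would then look like:
# summary = summarization(text, min_length=min_length, max_length=max_length)[0]["summary_text"]
```

Passing both bounds keeps `min_length` below `max_length`, which matters because the default model ships with its own minimum length setting.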


## Task 5: Text Translation
* Initialize a text translation pipeline that translates from English to French.

* Use the pipeline to translate the following sentence from English to French:

  * "Hugging Face provides state-of-the-art NLP models and tools."

* Print the translated text.

In [21]:
translation = pipeline("translation_en_to_fr")

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [22]:
text = "Hugging Face provides state-of-the-art NLP models and tools."
translated_text = translation(text, max_length=40)[0]['translation_text']

In [23]:
print("Text Translation:")
print(translated_text)

Text Translation:
Hugging Face fournit des modèles et des outils de pointe en LNP.


## Task 6: Zero-shot Classification
* Initialize a zero-shot classification pipeline.

* Use the pipeline to classify the following text into one of these categories: "sports", "technology", "politics", "entertainment", or "finance":

  * "Tesla unveils its latest electric vehicle, the Cybertruck."

* Print the result, including the predicted category and confidence score.

In [24]:
zero_shot = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [25]:
text = "Tesla unveils its latest electric vehicle, the Cybertruck."
categories = ["sports", "technology", "politics", "entertainment", "finance"]
result = zero_shot(text, candidate_labels=categories)

In [27]:
print("Zero-shot Classification: ")
print(f"Predicted category: {result['labels'][0]} (confidence: {result['scores'][0]:.2f})")

Zero-shot Classification: 
Predicted category: technology (confidence: 0.87)
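
Zero-shot classification with an NLI model works by turning each candidate label into a hypothesis sentence and scoring how strongly the input text entails it. The sketch below only builds those premise/hypothesis pairs; it does not run a model. The template string is assumed to match the pipeline's default.

```python
text = "Tesla unveils its latest electric vehicle, the Cybertruck."
categories = ["sports", "technology", "politics", "entertainment", "finance"]

# Assumed to match the pipeline's default hypothesis template.
hypothesis_template = "This example is {}."

# One (premise, hypothesis) pair per candidate label.
pairs = [(text, hypothesis_template.format(label)) for label in categories]

for premise, hypothesis in pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

A more natural template can be supplied through the `hypothesis_template` argument, for example `zero_shot(text, candidate_labels=categories, hypothesis_template="This text is about {}.")`, which may change the scores.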
