# LAB 1 Notebook

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [None]:
!pip install datasets evaluate "transformers[sentencepiece]"

### 1. 
What sentiment labels and confidence scores does the sentiment-analysis pipeline return for the following text samples?

* 'The new restaurant in town exceeded all my expectations, I loved it!'
* 'This movie is incredibly captivating and heartwarming.'

Give the label and its confidence score for each sample.

In [None]:
from transformers import pipeline

# Load the sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")

# Text samples to classify
different_text_samples = [
    "The new restaurant in town exceeded all my expectations, I loved it!",
    "This movie is incredibly captivating and heartwarming.",
]

# Run sentiment analysis on all samples in one call
different_sentiment_results = classifier(different_text_samples)

print(different_sentiment_results)


### 2. 
Using zero-shot classification, what scores (rounded to four decimal places) are assigned to the candidate labels 'technology', 'environment', and 'health' for the text 'The latest scientific research suggests a breakthrough in renewable energy sources.'?

In [None]:
from transformers import pipeline

# Load the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification")

# Text to classify
text_to_classify = "The latest scientific research suggests a breakthrough in renewable energy sources."

# Candidate labels/categories
candidate_labels = ["technology", "environment", "health"]

# Perform zero-shot classification
result = classifier(text_to_classify, candidate_labels)

print(result)
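
The question asks for scores rounded to four decimal places, while `print(result)` shows them at full precision. The following is a minimal sketch of one way to round them, assuming the result is a dict with parallel `labels` and `scores` lists (the shape the zero-shot pipeline returns for a single input). `round_scores` is a hypothetical helper, and the sample values are placeholders that only mimic the output shape; they are not model output.

```python
def round_scores(result, ndigits=4):
    """Map each candidate label to its score rounded to `ndigits` places."""
    return {
        label: round(score, ndigits)
        for label, score in zip(result["labels"], result["scores"])
    }


# Placeholder values that only mimic the output shape; not real model scores.
sample_result = {
    "sequence": "example text",
    "labels": ["technology", "environment", "health"],
    "scores": [0.61234567, 0.30000001, 0.08765432],
}

for label, score in round_scores(sample_result).items():
    print(f"{label}: {score:.4f}")
```

In the notebook, calling `round_scores(result)` on the value returned by `classifier(...)` should give the rounded scores directly.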

### 3. 
What three texts does a text-generation model produce for the prompt 'Exploring the vast universe beyond our solar system'? Use the distilgpt2 model with `max_length=30`, which caps each generated sequence at 30 tokens including the prompt.

In [None]:
from transformers import pipeline

# Create a text generation pipeline using the distilgpt2 model
generator = pipeline("text-generation", model="distilgpt2")

# Generate three continuations of the prompt
generated_text = generator(
    "Exploring the vast universe beyond our solar system",
    max_length=30,
    num_return_sequences=3,
)

# Print the generated texts
for sequence in generated_text:
    print(sequence['generated_text'])


### 4.

Which three words does the fill-mask model rank highest for the mask in the sentence `Discovering new <mask> is an exciting adventure.`?

In [None]:
from transformers import pipeline

# Create a pipeline for masked text filling
unmasker = pipeline("fill-mask")

# Ask for the top three predictions for the masked position
filled_masks = unmasker("Discovering new <mask> is an exciting adventure.", top_k=3)

# Display the filled mask results
for result in filled_masks:
    print(result['sequence'])
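
The loop above prints whole sentences, whereas the question asks for the predicted words. A minimal sketch for pulling out just the words and their scores, assuming each prediction is a dict with `token_str` and `score` keys (the shape the fill-mask pipeline returns). `top_words` is a hypothetical helper, and the sample predictions are placeholders, not model output.

```python
def top_words(predictions, ndigits=4):
    """Return (word, rounded score) pairs from fill-mask predictions."""
    return [
        # Some tokenizers keep a leading space on the token; strip it for display.
        (prediction["token_str"].strip(), round(prediction["score"], ndigits))
        for prediction in predictions
    ]


# Placeholder predictions that only mimic the output shape; not real model output.
sample_predictions = [
    {"score": 0.5, "token": 0, "token_str": " word_a", "sequence": "..."},
    {"score": 0.25, "token": 1, "token_str": " word_b", "sequence": "..."},
    {"score": 0.125, "token": 2, "token_str": " word_c", "sequence": "..."},
]

for word, score in top_words(sample_predictions):
    print(f"{word}: {score}")
```

In the notebook, `top_words(filled_masks)` would list the three suggested words in ranked order.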


### 5.

Which named entities, entity types, and confidence scores does the Named Entity Recognition (NER) model identify in the sentence _'The company SpaceX, founded by Elon Musk, is known for its ambitious projects.'_?

In [None]:
from transformers import pipeline

# Create a Named Entity Recognition pipeline
ner = pipeline("ner", aggregation_strategy="simple")

# Analyze entities in the text
entities = ner("The company SpaceX, founded by Elon Musk, is known for its ambitious projects.")

# Print identified entities
for entity in entities:
    print(f"Entity: {entity['entity_group']}, Text: {entity['word']}, Score: {entity['score']}")


### 6.

What does the question-answering model return when asked _'What is the capital of France?'_ with the context _'Paris is the capital city of France, known for its art, fashion, and culture.'_?

In [None]:
from transformers import pipeline

# Create a question-answering pipeline
question_answerer = pipeline("question-answering")

# Ask a question against the given context
answer = question_answerer(
    question="What is the capital of France?",
    context="Paris is the capital city of France, known for its art, fashion, and culture.",
)

# Print the answer
print(answer['answer'])
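
The cell above prints only the answer text, but the question asks what information the model provides. A minimal sketch for inspecting the rest, assuming the result is a dict with `answer`, `score`, `start`, and `end` keys, where `start` and `end` are character offsets into the context. `describe_answer` is a hypothetical helper, and the score in the sample is a placeholder, not model output.

```python
def describe_answer(answer, context):
    """Summarize a question-answering result and check its character span."""
    span = context[answer["start"]:answer["end"]]
    return {
        "answer": answer["answer"],
        "score": round(answer["score"], 4),
        # True when the offsets point at the same text as the answer string.
        "span_matches": span == answer["answer"],
    }


context = "Paris is the capital city of France, known for its art, fashion, and culture."

# Placeholder result with the same keys as the pipeline output; the score is made up.
sample_answer = {"score": 0.5, "start": 0, "end": 5, "answer": "Paris"}

print(describe_answer(sample_answer, context))
```

In the notebook, `describe_answer(answer, context)` needs the context string stored in a variable first, since the cell above passes it inline.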


### 7.

What summary does the summarization pipeline produce for the following text?

'Artificial intelligence (AI) has been rapidly evolving in recent years, impacting various industries and sectors worldwide. Its applications span from healthcare to finance, revolutionizing how tasks are performed and problems are solved. However, the rapid advancement of AI also raises ethical concerns regarding privacy, bias, and job displacement. As AI technology continues to develop, it becomes crucial to address these ethical implications and ensure responsible use. While AI offers immense potential for innovation and efficiency, it must be guided by ethical frameworks to mitigate risks and maximize benefits.'

In [None]:
from transformers import pipeline

# Initialize the summarization pipeline
summarizer = pipeline("summarization")

# Text to summarize
summary = summarizer(
    """
    Artificial intelligence (AI) has been rapidly evolving in recent years, impacting various industries and sectors worldwide.
    Its applications span from healthcare to finance, revolutionizing how tasks are performed and problems are solved.
    However, the rapid advancement of AI also raises ethical concerns regarding privacy, bias, and job displacement.
    As AI technology continues to develop, it becomes crucial to address these ethical implications and ensure responsible use.
    While AI offers immense potential for innovation and efficiency, it must be guided by ethical frameworks to mitigate risks and maximize benefits.
    """
)

# Print the generated summary
print(summary[0]['summary_text'])


### 8.

What English translation does the **Helsinki-NLP/opus-mt-fr-en** model produce for the French sentence _'La ville de Paris est célèbre pour sa tour Eiffel.'_?

In [None]:
from transformers import pipeline

# Translation pipeline for French to English
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

# Translate a French sentence to English
translated_text = translator("La ville de Paris est célèbre pour sa tour Eiffel.")

# Display the translated text
print(translated_text[0]['translation_text'])


### 9. 
Why might the following code not work when using the fill-mask pipeline from the Transformers library?

```python
from transformers import pipeline

# Load the fill-mask pipeline
unmasker = pipeline("fill-mask")

# Text with a mask to fill
text_with_mask = "Exploring the depths of  is an exciting adventure."

# Perform the masked language modeling
filled_mask = unmasker(text_with_mask, top_k=3)
```

### ANSWER
a) The text passed to the pipeline does not contain a valid mask token (`<mask>`).

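A simple guard can catch this before the pipeline is called. The sketch below assumes the mask token is `<mask>`, as in the answer above; other checkpoints use a different token (BERT models use `[MASK]`), and in the notebook `unmasker.tokenizer.mask_token` reports the one the loaded model expects. `has_mask_token` is a hypothetical helper, and the repaired sentence is only one possible fix.

```python
def has_mask_token(text, mask_token="<mask>"):
    """Return True if `text` contains the mask token the model expects."""
    return mask_token in text


broken = "Exploring the depths of  is an exciting adventure."
# One possible repair: put the mask token where the missing word belongs.
repaired = "Exploring the depths of <mask> is an exciting adventure."

print(has_mask_token(broken))    # False
print(has_mask_token(repaired))  # True
```
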
### 10.

What does the code below print?

```python
from transformers import pipeline

# Load the Named Entity Recognition (NER) pipeline
ner = pipeline("ner", aggregation_strategy="simple")

# Text for named entity recognition
new_text_to_analyze = "During the space mission, Neil Armstrong landed on the moon in 1969."

# Perform Named Entity Recognition (NER)
new_ner_result = ner(new_text_to_analyze)

print(new_ner_result)
```
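
Each entity in an aggregated NER result carries `start` and `end` character offsets into the input text, so a candidate answer can be checked against the sentence without running the model. A minimal sketch, computing the offsets of the name directly from the string:

```python
text = "During the space mission, Neil Armstrong landed on the moon in 1969."
name = "Neil Armstrong"

# Character offsets of the name; `end` is exclusive, as in Python slicing.
start = text.index(name)
end = start + len(name)

print(start, end)        # 26 40
print(text[start:end])   # Neil Armstrong
```
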

### ANSWER
C) `{'entity_group': 'PER', 'score': 0.99813056, 'word': 'Neil Armstrong', 'start': 26, 'end': 40}`