# NLP 303 - Natural Language Processing
## Task 1
### By: Michael Cuffe
### Assessment 1
### Due: 20/10/2024 23:59

# Instructions

### Install Transformers

In [17]:
!pip install transformers





### Install TensorFlow

In [18]:
!pip install tensorflow





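### Fix a Compatibility Error

Recent TensorFlow releases default to Keras 3, which Transformers did not support at the time of writing. Installing the backwards-compatible `tf-keras` package resolves the resulting import error.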

In [19]:
!pip install tf-keras





# Testing The Installation of Transformers

In [20]:
from transformers import pipeline

## Check that transformers is functional.

#### An explicit model is passed to every pipeline declaration; in this case it is "google-t5/t5-base".
#### As I'm running this locally, I also had to set the device to 0 to enable GPU usage.
#### All pipelines use the recommended default model for their task.

In [21]:
translator = pipeline("translation_en_to_de", model="google-t5/t5-base", device=0)
print(translator("The magic of transformers lies in pre-trained models"))

[{'translation_text': 'Die Magie der Transformatoren liegt in vorgeschulten Modellen'}]
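
The hard-coded `device=0` above assumes a CUDA GPU is present. A minimal sketch of a fallback, assuming the PyTorch backend is in use: `pick_device` is a hypothetical helper (not part of Transformers) that returns `0` when a GPU is visible and `-1`, the pipeline's CPU value, otherwise.

```python
def pick_device():
    """Return 0 (first CUDA GPU) when PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: the pipelines run on the PyTorch backend
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)
```

The result could then be passed as `pipeline(..., device=device)` in place of the literal `0`.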


# NER Pipeline
### Named Entity Recognition

ORG = Organizations <br>
LOC = Locations <br>
PER = Persons <br>
MISC = Miscellaneous <br>

This cell displays a warning about unused weights. It can safely be ignored: the code is functional, and the warning is expected because the token-classification model does not use BERT's pooler layer.

In [22]:
# Initialize the NER pipeline
# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner_pipeline = pipeline("ner", aggregation_strategy="simple", model="dbmdz/bert-large-cased-finetuned-conll03-english",
                        device=0)

# Specify a sequence of text
text = "Steve Irwin was born in Australia. He was a leading nature conservationist and led and curated Taronga Zoo with the help of the NSW Government."

# Perform NER
ner_results = ner_pipeline(text)
# Print the results with a for loop so they are easier to read.
for entity in ner_results:
    print(f"Entity: {entity['word']}, Type: {entity['entity_group']}, Score: {entity['score']:.4f}")

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Entity: Steve Irwin, Type: PER, Score: 0.9997
Entity: Australia, Type: LOC, Score: 0.9995
Entity: Taronga Zoo, Type: LOC, Score: 0.8920
Entity: NSW Government, Type: ORG, Score: 0.8671
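
As a possible follow-up, the grouped entities can be bucketed by type and the less certain ones flagged. This is a minimal sketch: the list below is copied from the printed output above, and `REVIEW_THRESHOLD` is a hypothetical cut-off chosen only for illustration.

```python
from collections import defaultdict

# Entities copied from the printed output above (scores as printed).
ner_results = [
    {"word": "Steve Irwin", "entity_group": "PER", "score": 0.9997},
    {"word": "Australia", "entity_group": "LOC", "score": 0.9995},
    {"word": "Taronga Zoo", "entity_group": "LOC", "score": 0.8920},
    {"word": "NSW Government", "entity_group": "ORG", "score": 0.8671},
]

# Bucket the entity text by its type (PER, LOC, ORG, MISC).
by_type = defaultdict(list)
for entity in ner_results:
    by_type[entity["entity_group"]].append(entity["word"])

# Hypothetical cut-off: anything below it is flagged for a manual check.
REVIEW_THRESHOLD = 0.9
needs_review = [e["word"] for e in ner_results if e["score"] < REVIEW_THRESHOLD]

print(dict(by_type))
print(needs_review)
```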



# Sentiment Analysis

In [23]:
# Initialize the sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", device=0)

# Give the strings to be analysed.
sequence1 = "I love the coding in python!"
sequence2 = "Using 200 nested if statements is terrible."

# Perform sentiment analysis
sentiment_results1 = sentiment_pipeline(sequence1)
sentiment_results2 = sentiment_pipeline(sequence2)

# Print the results
print(sentiment_results1)
print(sentiment_results2)

[{'label': 'POSITIVE', 'score': 0.9996840953826904}]
[{'label': 'NEGATIVE', 'score': 0.9977462887763977}]
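
Each result is a label plus a confidence score for that label. One way to compare results on a single scale is to fold the label into the sign of the score. The sketch below uses the two results printed above; `signed_score` is a hypothetical helper, not part of the pipeline.

```python
# Results copied from the printed output above.
results = [
    {"label": "POSITIVE", "score": 0.9996840953826904},
    {"label": "NEGATIVE", "score": 0.9977462887763977},
]


def signed_score(result):
    """Map a result to [-1, 1]: POSITIVE keeps its score, NEGATIVE flips the sign."""
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]


for result in results:
    print(f"{result['label']:<8} {signed_score(result):+.4f}")
```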


# Summarization Pipeline

In [24]:
# Initialize the summarization pipeline
summarization_pipeline = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6", device=0)

# Specify a sequence of text (350-500 words)
long_text = """
The young fox went along the river until he met a large old toad. Upon meeting the toad the fox was baffled.
The toad was not afraid of the fox and the fox was not afraid of the toad. They both sat down and without a word conversed.
The fox tilted his head and the toad blinked. The fox blinked and the toad tilted his head.
The toad with her wide eyes spotted a fly on the fox and the fox with his keen eyes spotted a mouse under the toad. 
They both looked at each-other for a moment as if to request permission for what each wanted permission neither of them knew.
They did however feel their stomachs grumble and their mouths water. They both still sat and without a sound conversed.
The fly and the mouse were both aware of the fox and the toad. Neither dared to move in terror of their impending doom.
When the fox and the toad made eye contact once more they both knew what the other wanted.
At the very same moment the fox lunged at the toad and the toad lunged at the fox. The fly and the mouse at this point had also had their own conversation wordlessly. With the time allowed to them the fly hurried toward the fox and the mouse dashed toward the toad.
The fox and the toad collided with a large resounding crash, while the fly and the mouth made their escape in the confusion.
The fox and the toad both lay on the ground dazed and confused. They both looked at each-other and without a sound conversed.
Until both their bellies grumbled loudly and they both knew what the other wanted. They both stood up and walked away in shame of their poor communication skills. 
If only they were not so shy and had shown their cards sooner they both would have had a meal.
"""

# Perform summarization
summary = summarization_pipeline(long_text, max_length=100, min_length=30, do_sample=False)
print(summary)

[{'summary_text': ' The young fox went along the river until he met a large old toad . Upon meeting the toad the fox was baffled . The toad was not afraid of the fox . They both sat down and without a word conversed. They both looked at each-other for a moment as if to request permission for what each wanted .'}]
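
The summary above has a leading space and stray spaces before full stops. A minimal post-processing sketch, applied to an excerpt of that output, could tidy this up; `tidy` is a hypothetical helper and the punctuation set is an assumption.

```python
import re

# Excerpt of the summary text printed above, including its stray spaces.
raw_summary = (
    " The young fox went along the river until he met a large old toad ."
    " Upon meeting the toad the fox was baffled ."
)


def tidy(text):
    """Remove whitespace before punctuation and trim both ends."""
    return re.sub(r"\s+([.,!?;:])", r"\1", text).strip()


print(tidy(raw_summary))
```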


# Text Generation

In [25]:
# Initialize the text generation pipeline with GPU and truncation
text_generation_pipeline = pipeline("text-generation", model="gpt2", device=0, truncation=True)

# Specify a starting sequence of text
starting_text = "Upon the spire, above the clouds rested a scintillating dragon,"

# Generate text; pad_token_id=50256 reuses GPT-2's end-of-text token for padding
generated_text = text_generation_pipeline(starting_text, max_length=500, pad_token_id=50256)
print(generated_text[0]['generated_text'])

Upon the spire, above the clouds rested a scintillating dragon, a man of pure white hair, of a golden face of red color, and a thick beard on his back. A voice came out of the mist: "Greetings! This is your servant Ailon Norsk." "Your servant to the Windhelm." Then the dragon disappeared, bringing all the light to the city, where the citizens would once again rejoice, and a new day should soon come. The fire that consumed it reached its height and consumed the whole ground, and as water vapor and ice engulfed the clouds, the city was turned into a lake, where every single family, city, and city began anew, and in the midst of that there arose an army, led by Ailon Norsk's son Ailon I the High Lord of the High Winds. Within the city they found the dragon that came before them! And Ailon I's men were brought to his throne, where they placed them in his command and the high lords of the city heard about his victory. Ailon I said, "Now we will wait in ambush!" so the city's inhabitants wer

# Question Answering

In [26]:
# Initialize the question answering pipeline
qa_pipeline = pipeline("question-answering", model="distilbert/distilbert-base-cased-distilled-squad", device=0)

# Specify a context and a question
context = """
Python is a programming language. It is used for web development, data analysis, artificial intelligence, and scientific computing.
Python is easy to learn and has a large community. Python is an interpreted language, which means that it is executed line by line.
It is known for its readability and simplicity. Python is an object-oriented language, which means that it can model real-world entities.
Python doesnt require a compiler, which makes it easier to debug. Python is a high-level language, which means that it is closer to human language.
Python has its own memory management, which means that it automatically allocates and deallocates memory.
"""
question = "What is the main topic of the text?"

# Perform question answering
qa_result = qa_pipeline(question=question, context=context)
print(qa_result)

{'score': 0.06401262432336807, 'start': 50, 'end': 105, 'answer': 'web development, data analysis, artificial intelligence'}
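
The `start` and `end` values in the result are character offsets into `context`, so the answer is a literal slice of the input rather than generated text. The sketch below checks this using the context and result shown above. The low score (about 0.06) is plausibly because an extractive model can only return a span, and no single span states the "main topic".

```python
context = """
Python is a programming language. It is used for web development, data analysis, artificial intelligence, and scientific computing.
Python is easy to learn and has a large community. Python is an interpreted language, which means that it is executed line by line.
It is known for its readability and simplicity. Python is an object-oriented language, which means that it can model real-world entities.
Python doesnt require a compiler, which makes it easier to debug. Python is a high-level language, which means that it is closer to human language.
Python has its own memory management, which means that it automatically allocates and deallocates memory.
"""

# Result copied from the printed output above.
qa_result = {
    "score": 0.06401262432336807,
    "start": 50,
    "end": 105,
    "answer": "web development, data analysis, artificial intelligence",
}

# Slice the context with the reported offsets and compare it to the answer.
span = context[qa_result["start"]:qa_result["end"]]
print(span == qa_result["answer"])
```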


<br>
<br>
<br>
End of File