### Introduction

Using the Hugging Face Transformers library, this notebook walks through nine common NLP tasks with pretrained models. The tasks are:

1 - Text Classification: Classifying text into predefined categories.

2 - Named Entity Recognition (NER): Identifying and classifying named entities in text.

3 - Question Answering: Providing precise answers to questions based on context.

4 - Text Generation: Generating coherent and contextually relevant text.

5 - Text Summarization: Summarizing lengthy text into concise summaries.

6 - Machine Translation: Translating text from one language to another.

7 - Fill-Mask: Predicting masked words in a given text.

8 - Chatbot: Implementing conversational agents that interact naturally with users.

9 - Zero-Shot Learning Classification: Classifying text into categories without prior training on those specific categories.


Together, these examples show how pretrained Transformer models, accessed mostly through the `pipeline` API, can address a wide range of NLP problems with very little code.

### Import Libraries

In [2]:
from transformers import pipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

### 1 - Text Classification

In [3]:
# Load the sentiment-analysis pipeline
sentiment_analysis = pipeline("sentiment-analysis")

# Analyze sentiment of a text
result = sentiment_analysis("I love using Hugging Face Transformers!")
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9971315860748291}]


In [4]:
result = sentiment_analysis("Hugging Face Transformers is amazing!")
print(result)

result = sentiment_analysis("Hugging Face Transformers is not amazing!")
print(result)

[{'label': 'POSITIVE', 'score': 0.9998788833618164}]
[{'label': 'NEGATIVE', 'score': 0.9996373653411865}]
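
The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. When a single number is more convenient downstream, the two fields can be folded into a signed score. The sketch below is one possible way to do that; `signed_score` is a hypothetical helper (not part of `transformers`), and it assumes the `POSITIVE`/`NEGATIVE` labels produced by the default model above.

```python
def signed_score(prediction):
    """Map {'label', 'score'} to a value in [-1, 1]; negative means NEGATIVE."""
    # Assumption: any label other than POSITIVE is treated as negative
    sign = 1.0 if prediction["label"] == "POSITIVE" else -1.0
    return sign * prediction["score"]

# Outputs copied from the cell above
results = [
    {"label": "POSITIVE", "score": 0.9998788833618164},
    {"label": "NEGATIVE", "score": 0.9996373653411865},
]

for prediction in results:
    print(f"{prediction['label']:<8} -> {signed_score(prediction):+.4f}")
```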


### 2 - Named Entity Recognition (NER)

In [5]:
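# Load the named entity recognition pipeline
# aggregation_strategy="simple" groups word pieces into whole entities
# (it replaces the deprecated grouped_entities=True)
ner_pipeline = pipeline("ner", aggregation_strategy="simple")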

# Analyze named entities in a text about Neymar
text = """
Neymar da Silva Santos Júnior, commonly known as Neymar,
is a Brazilian professional footballer who plays as a forward for Paris Saint-Germain and the Brazil national team. He was born on February 5,
 1992, in Mogi das Cruzes, São Paulo, Brazil. Neymar is considered one of the best players in the world and has won numerous titles,
  including the Copa Libertadores, La Liga, and the UEFA Champions League. In 2017,
   he transferred from Barcelona to Paris Saint-Germain for a world-record transfer fee of €222 million.
"""

ner_result = ner_pipeline(text)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).




In [6]:
# Print the results
print("Named Entity Recognition Result:", ner_result)

Named Entity Recognition Result: [{'entity_group': 'PER', 'score': 0.98450935, 'word': 'Neymar da Silva Santos Júnior', 'start': 1, 'end': 30}, {'entity_group': 'PER', 'score': 0.9985981, 'word': 'Neymar', 'start': 50, 'end': 56}, {'entity_group': 'MISC', 'score': 0.9974752, 'word': 'Brazilian', 'start': 63, 'end': 72}, {'entity_group': 'ORG', 'score': 0.9823148, 'word': 'Paris Saint - Germain', 'start': 124, 'end': 143}, {'entity_group': 'LOC', 'score': 0.99936813, 'word': 'Brazil', 'start': 152, 'end': 158}, {'entity_group': 'LOC', 'score': 0.9801959, 'word': 'Mogi das Cruzes', 'start': 211, 'end': 226}, {'entity_group': 'LOC', 'score': 0.9988634, 'word': 'São Paulo', 'start': 228, 'end': 237}, {'entity_group': 'LOC', 'score': 0.99942905, 'word': 'Brazil', 'start': 239, 'end': 245}, {'entity_group': 'PER', 'score': 0.9982283, 'word': 'Neymar', 'start': 247, 'end': 253}, {'entity_group': 'MISC', 'score': 0.99454343, 'word': 'Copa Libertadores', 'start': 350, 'end': 367}, {'entity_grou
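
Each entity carries `start` and `end` character offsets into the input string, so the exact surface form can be recovered by slicing. This is useful because the `word` field is rebuilt from tokens and may differ from the source (note `'Paris Saint - Germain'` above). A minimal sketch, using only the first two lines of the Neymar text and offsets copied from the output; `surface_form` is a hypothetical helper:

```python
# First two lines of the Neymar text, including the leading newline
text = (
    "\nNeymar da Silva Santos Júnior, commonly known as Neymar,\n"
    "is a Brazilian professional footballer who plays as a forward "
    "for Paris Saint-Germain and the Brazil national team."
)

# Entities copied from the pipeline output above (scores omitted)
entities = [
    {"entity_group": "PER", "word": "Neymar da Silva Santos Júnior", "start": 1, "end": 30},
    {"entity_group": "ORG", "word": "Paris Saint - Germain", "start": 124, "end": 143},
    {"entity_group": "LOC", "word": "Brazil", "start": 152, "end": 158},
]

def surface_form(text, entity):
    """Return the exact substring of `text` that the entity covers."""
    return text[entity["start"]:entity["end"]]

for entity in entities:
    print(entity["entity_group"], "|", entity["word"], "|", surface_form(text, entity))
```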

In [7]:
# Analyze named entities in a text about Egypt
text = """
Egypt, officially the Arab Republic of Egypt,
is a transcontinental country spanning the northeast corner of Africa and southwest corner of Asia by a land bridge formed by the Sinai Peninsula.
 Its capital city is Cairo, which is one of the largest cities in the world.
  The country is famous for its ancient civilization and some of the world's most famous monuments, including the Giza pyramids,
   the Great Sphinx, and the ancient temples of Luxor dating back to thousands of years.
    Egypt is bordered by the Mediterranean Sea to the north, the Gaza Strip and Israel to the northeast,
    # the Red Sea to the east, Sudan to the south, and Libya to the west. The current president of Egypt is Abdel Fattah el-Sisi.
"""

ner_result = ner_pipeline(text)

# Print the results
print("Named Entity Recognition Result:", ner_result)

Named Entity Recognition Result: [{'entity_group': 'LOC', 'score': 0.9996828, 'word': 'Egypt', 'start': 1, 'end': 6}, {'entity_group': 'LOC', 'score': 0.9801058, 'word': 'Arab Republic of Egypt', 'start': 23, 'end': 45}, {'entity_group': 'LOC', 'score': 0.9993819, 'word': 'Africa', 'start': 110, 'end': 116}, {'entity_group': 'LOC', 'score': 0.99935454, 'word': 'Asia', 'start': 141, 'end': 145}, {'entity_group': 'LOC', 'score': 0.9968093, 'word': 'Sinai Peninsula', 'start': 177, 'end': 192}, {'entity_group': 'LOC', 'score': 0.99959904, 'word': 'Cairo', 'start': 215, 'end': 220}, {'entity_group': 'LOC', 'score': 0.99105203, 'word': 'Giza', 'start': 385, 'end': 389}, {'entity_group': 'LOC', 'score': 0.59441733, 'word': 'Great S', 'start': 407, 'end': 414}, {'entity_group': 'LOC', 'score': 0.86501056, 'word': '##nx', 'start': 417, 'end': 419}, {'entity_group': 'LOC', 'score': 0.9921244, 'word': 'Luxor', 'start': 448, 'end': 453}, {'entity_group': 'LOC', 'score': 0.9997489, 'word': 'Egypt',
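
Even with grouped entities, a single word can end up split across two results, as with `'Great S'` and `'##nx'` above, presumably because a word piece in the middle was not tagged as an entity. One possible repair, sketched below, merges neighbouring entities of the same group when no whitespace separates them in the source text. The helper `merge_split_entities` and the short sample sentence are hypothetical; the offsets are chosen to mirror the three-character gap seen in the real output.

```python
def merge_split_entities(text, entities):
    """Merge same-group neighbours that belong to one unbroken word."""
    merged = []
    for entity in sorted(entities, key=lambda e: e["start"]):
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and previous["entity_group"] == entity["entity_group"]
            # No whitespace in the gap means both pieces sit inside one word
            and not any(ch.isspace() for ch in text[previous["end"]:entity["start"]])
        ):
            previous["end"] = entity["end"]
            previous["word"] = text[previous["start"]:previous["end"]]
        else:
            merged.append(dict(entity))  # copy so the input list is not modified
    return merged

# Hypothetical sample; offsets index into this short string
text = "the Great Sphinx and the temples of Luxor"
entities = [
    {"entity_group": "LOC", "word": "Great S", "start": 4, "end": 11},
    {"entity_group": "LOC", "word": "##nx", "start": 14, "end": 16},
    {"entity_group": "LOC", "word": "Luxor", "start": 36, "end": 41},
]
print(merge_split_entities(text, entities))
```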

### 3 - Question Answering

In [8]:
# Load the question-answering pipeline
qa_pipeline = pipeline("question-answering")

# Define the context about Messi
context = """
Lionel Messi, born on June 24, 1987, in Rosario, Argentina,
is widely considered one of the greatest footballers of all time.
 He began his career with FC Barcelona, where he won numerous titles,
 including La Liga and the UEFA Champions League. Messi has also been awarded the Ballon d'Or multiple times,
  recognizing him as the world's best player. In August 2021, Messi signed with Paris Saint-Germain (PSG) after leaving Barcelona due to financial issues faced by the club.
   He has also been a key player for the Argentina national team, leading them to victory in the Copa América 2021.
"""

# Define questions about Messi
questions = [
    "When was Lionel Messi born?",
    "Which club did Messi join after leaving Barcelona?",
    "How many times has Messi won the Ballon d'Or?",
    "What major international title did Messi win with Argentina?"
]

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [9]:
# Get answers to the questions
for question in questions:
    result = qa_pipeline(question=question, context=context)
    print(f"Question: {question}")
    print(f"Answer: {result['answer']}\n")

Question: When was Lionel Messi born?
Answer: June 24, 1987

Question: Which club did Messi join after leaving Barcelona?
Answer: Paris Saint-Germain

Question: How many times has Messi won the Ballon d'Or?
Answer: multiple

Question: What major international title did Messi win with Argentina?
Answer: Copa América 2021
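
The default model is extractive: it selects a span of the context rather than composing an answer, which is why the Ballon d'Or question yields the word `multiple` instead of a number. The sketch below checks this for the four answers printed above using plain string search; the shortened `context` and the helper name `locate_answer` are assumptions for illustration.

```python
# Shortened copy of the context used above
context = (
    "Lionel Messi, born on June 24, 1987, in Rosario, Argentina, "
    "is widely considered one of the greatest footballers of all time. "
    "Messi has also been awarded the Ballon d'Or multiple times. "
    "In August 2021, Messi signed with Paris Saint-Germain (PSG). "
    "He led Argentina to victory in the Copa América 2021."
)

# Answers copied from the output above
answers = ["June 24, 1987", "Paris Saint-Germain", "multiple", "Copa América 2021"]

def locate_answer(context, answer):
    """Return (start, end) of `answer` in `context`, or None if absent."""
    start = context.find(answer)
    return None if start == -1 else (start, start + len(answer))

for answer in answers:
    print(f"{answer!r}: {locate_answer(context, answer)}")
```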



### 4 - Text Generation

In [10]:
# Load the text-generation pipeline
text_gen_pipeline = pipeline("text-generation", model="gpt2")

# Define the prompt for text generation about animals
prompt = "In the heart of the jungle, there was a hidden sanctuary where animals of all kinds lived in harmony. The elephants would"

# Generate text
text_gen_result = text_gen_pipeline(prompt, max_length=100, num_return_sequences=1)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [11]:
# Print the generated text
print("Generated Text:")
print(text_gen_result[0]['generated_text'])

Generated Text:
In the heart of the jungle, there was a hidden sanctuary where animals of all kinds lived in harmony. The elephants would feed and play freely under the stars - only this did not stop the elephants from singing and dancing.

Today, all of these wild, magnificent creatures are in their infancy, yet with the help of such amazing care from their masters as the famous sculptors of the present day.

This sanctuary exists on the outskirts of the park in Naga, near the village of


In [12]:
# Define a second, shorter prompt about cats
prompt = "Cats are known for their"

# Generate text
text_gen_result = text_gen_pipeline(prompt, max_length=50, num_return_sequences=1)

# Print the generated text
print("Generated Text:")
print(text_gen_result[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text:
Cats are known for their large head, and their eyes are so narrow that they cannot see far. Cats with eyes larger than six inches cannot see through wood with their long, round ears.

A large cat is a cat who has at
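
Because `max_length` caps the total number of tokens (prompt plus continuation), generation often stops mid-sentence, as in both outputs above. One lightweight post-processing option, sketched here, is to cut the text back to the last sentence-ending punctuation mark. `trim_to_last_sentence` is a hypothetical helper, and the rule is deliberately naive (it would also cut at an abbreviation such as "e.g.").

```python
def trim_to_last_sentence(text):
    """Drop any trailing fragment after the last '.', '!' or '?'."""
    last_end = max(text.rfind(mark) for mark in ".!?")
    # rfind returns -1 when a mark is absent; keep the text if none is found
    return text if last_end == -1 else text[: last_end + 1]

# Generated text copied from the output above
generated = (
    "Cats are known for their large head, and their eyes are so narrow "
    "that they cannot see far. Cats with eyes larger than six inches "
    "cannot see through wood with their long, round ears.\n\n"
    "A large cat is a cat who has at"
)
print(trim_to_last_sentence(generated))
```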


### 5 - Text Summarization

In [13]:
# Load the summarization pipeline with the t5-small model
summarizer = pipeline("summarization", model="t5-small")

# Text to summarize
text = """
Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.

AI systems can be classified into two types: narrow AI and general AI. Narrow AI, also known as weak AI, is designed and trained for a specific task, such as facial recognition or language translation. It operates within a limited scope and does not possess consciousness or general reasoning abilities. Examples of narrow AI include virtual assistants like Siri and Alexa, and recommendation systems used by streaming services and online retailers.

On the other hand, general AI, also known as strong AI or AGI (Artificial General Intelligence), aims to outperform human intelligence across a wide range of tasks, including problem-solving and creative activities. General AI remains largely theoretical and is a subject of ongoing research and debate within the AI community.

The field of AI includes various sub-disciplines, such as machine learning, which focuses on the development of algorithms that enable computers to learn from and make predictions based on data. Deep learning, a subset of machine learning, utilizes neural networks with many layers to analyze complex patterns in large datasets. Other areas of AI research include natural language processing, robotics, and computer vision.

As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. For instance, optical character recognition, which was once regarded as a sophisticated AI task, is now a routine technology widely used in document digitization and data entry processes.
"""

In [14]:
# Generate a summary
summary = summarizer(text, max_length=150, min_length=75, do_sample=False)

# Print the summary
print("Summary:")
print(summary[0]['summary_text'])

Summary:
general AI, also known as strong AI, aims to outperform human intelligence across a wide range of tasks, including problem-solving and creative activities . the field of AI includes various sub-disciplines, such as machine learning, which focuses on the development of algorithms that enable computers to learn from and make predictions based on data .
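
The `t5-small` summary comes back with lowercase sentence starts and a space before each full stop. A small clean-up pass can make it presentable; the following is a minimal sketch with a hypothetical helper `tidy_summary`, applied to a shortened copy of the summary above.

```python
import re

def tidy_summary(summary):
    """Remove spaces before punctuation and capitalise each sentence."""
    text = re.sub(r"\s+([.,;:!?])", r"\1", summary.strip())
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(s[:1].upper() + s[1:] for s in sentences)

# Shortened copy of the summary printed above
raw = (
    "general AI, also known as strong AI, aims to outperform human "
    "intelligence across a wide range of tasks . the field of AI includes "
    "various sub-disciplines, such as machine learning ."
)
print(tidy_summary(raw))
```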


### 6 - Machine Translation

In [15]:
# Load the translation pipeline for English to French
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")

In [16]:
# Define the text to translate
text = "Artificial intelligence is transforming the way we live and work."

# Perform translation
translation = translator(text, max_length=40)

# Print the translated text
print("Translated Text:")
print(translation[0]['translation_text'])

Translated Text:
L'intelligence artificielle transforme notre façon de vivre et de travailler.


### 7 - Fill-Mask

In [17]:
# Load the fill-mask pipeline
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [18]:
# Define the text with a mask
text = "Artificial intelligence is transforming the way we [MASK] and work."

# Perform mask filling
predictions = fill_mask(text)

# Print the predicted words
print("Predicted Words:")
for prediction in predictions:
    print(f"Word: {prediction['token_str']}, Score: {prediction['score']}")

Predicted Words:
Word: live, Score: 0.9425740242004395
Word: think, Score: 0.04221838712692261
Word: learn, Score: 0.0027025111485272646
Word: talk, Score: 0.0013789129443466663
Word: look, Score: 0.0008475871291011572
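
Each prediction pairs a candidate token with its probability, so the list can be filtered by confidence or substituted back into the sentence. A minimal sketch using the scores printed above (rounded); `fill`, `confident_tokens`, and the 0.01 threshold are illustrative assumptions:

```python
text = "Artificial intelligence is transforming the way we [MASK] and work."

# Predictions copied from the output above, scores rounded
predictions = [
    {"token_str": "live", "score": 0.9426},
    {"token_str": "think", "score": 0.0422},
    {"token_str": "learn", "score": 0.0027},
    {"token_str": "talk", "score": 0.0014},
    {"token_str": "look", "score": 0.0008},
]

def fill(text, token, mask="[MASK]"):
    """Substitute `token` for the first mask placeholder in `text`."""
    return text.replace(mask, token, 1)

def confident_tokens(predictions, min_score=0.01):
    """Keep only candidates whose probability reaches `min_score`."""
    return [p["token_str"] for p in predictions if p["score"] >= min_score]

best = max(predictions, key=lambda p: p["score"])
print(fill(text, best["token_str"]))
print(confident_tokens(predictions))
```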


### 8 - Chatbot

In [19]:
# Load pre-trained model and tokenizer
model_name = "microsoft/DialoGPT-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

In [21]:
def get_response(user_input):
    # Encode the user input followed by the end-of-sequence token
    inputs = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt')
    # Generate a reply; the output holds the prompt tokens followed by the new tokens
    response_ids = model.generate(inputs, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens (everything after the prompt)
    response = tokenizer.decode(response_ids[:, inputs.shape[-1]:][0], skip_special_tokens=True)
    return response

# Example usage
user_input = "Hello, how are you?"
response = get_response(user_input)
print(response)

user_input = "What is Neymar"
response = get_response(user_input)
print(response)

user_input = "What is Sport"
response = get_response(user_input)
print(response)

I'm good, how are you?
A Brazilian footballer
It's a sport that's played in the US.
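
`get_response` treats every message in isolation, so the model never sees earlier turns. DialoGPT is designed to take a conversation as its turns concatenated, with the end-of-sequence token after each one. The sketch below builds such a prompt at the string level; `build_dialogue_prompt` and the `max_turns` cut-off are hypothetical, and the default `eos_token` assumes the GPT-2 style `<|endoftext|>` marker (in this notebook, `tokenizer.eos_token` would be the value to pass).

```python
def build_dialogue_prompt(turns, eos_token="<|endoftext|>", max_turns=6):
    """Join the most recent turns, ending each with the EOS token."""
    recent = turns[-max_turns:]
    return "".join(turn.strip() + eos_token for turn in recent)

history = [
    "Hello, how are you?",     # user
    "I'm good, how are you?",  # bot reply from the output above
    "What is Neymar",          # next user message
]
prompt = build_dialogue_prompt(history)
print(prompt)
```

The resulting string could then be encoded with `tokenizer.encode(prompt, return_tensors='pt')` in place of `user_input + tokenizer.eos_token`.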


In [27]:
while True:
    user_input = input("You : ")
    # Compare case-insensitively so "Exit" or "EXIT" also ends the chat
    if user_input.strip().lower() == "exit":
        print("Bot : Bye")
        break
    response = get_response(user_input)
    print("Bot :", response)

You : hi chat , how are you
Bot : I'm good, how are you?

You : what is Machine Learning ?
Bot : It's a field of research that is used to make machine learning algorithms.

You : what is Computer Vision ?
Bot : It's a new thing that's been around for a while.

You : what is Computer Vision and Image Processing ?
Bot : Computer vision is the process of seeing things in a computer. Image processing is the process of seeing things in a computer.

You : what is Programming Languages ?
Bot : It's a language that is used to program.

You : Python Programming Languages
Bot : I'm not sure if you're serious, but I'm pretty sure that's a joke.

You : i am very happy
Bot : I'm happy too!

You : what is football sport ?
Bot : It's a sport where you play football.

You : thanks chat
Bot : You're welcome

You : exit
Bot : Bye


### 9 - Zero-Shot Learning Classification

In [22]:
# Create a zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_text(text, candidate_labels):
    """Score `text` against each candidate label with the zero-shot classifier."""
    return classifier(text, candidate_labels)


In [28]:
# Example usage
text = "I love playing soccer on weekends."
candidate_labels = ["sports", "cooking", "politics", "technology"]

# Get classification results
result = classify_text(text, candidate_labels)

print("Classification result:", result)

Classification result: {'sequence': 'I love playing soccer on weekends.', 'labels': ['sports', 'technology', 'cooking', 'politics'], 'scores': [0.9962514042854309, 0.001738918712362647, 0.0011786851100623608, 0.0008309668046422303]}


In [29]:
# Define the text to classify and candidate labels
text = "Artificial intelligence is transforming the tech industry."
candidate_labels = ["technology", "healthcare", "sports", "politics"]

# Get classification results
result = classify_text(text, candidate_labels)

print("Classification result:", result)

Classification result: {'sequence': 'Artificial intelligence is transforming the tech industry.', 'labels': ['technology', 'sports', 'healthcare', 'politics'], 'scores': [0.994301438331604, 0.0022269373293966055, 0.0021997641306370497, 0.0012718831421807408]}


In [30]:
# Define the text to classify and emotion labels
text = "I just received wonderful news and I can't stop smiling!"
candidate_labels = ["sad", "happy", "enjoy", "love", "angry", "bored", "excited"]

# Get classification results
result = classify_text(text, candidate_labels)

print("Classification result:", result)

Classification result: {'sequence': "I just received wonderful news and I can't stop smiling!", 'labels': ['excited', 'happy', 'enjoy', 'love', 'angry', 'sad', 'bored'], 'scores': [0.6016135215759277, 0.3182087540626526, 0.07249247282743454, 0.005898480769246817, 0.0006474311812780797, 0.0006373635842464864, 0.0005019706441089511]}
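
The classifier returns `labels` and `scores` as parallel lists sorted from most to least likely, and in the outputs above the scores sum to roughly 1. When several labels are plausible, as with `excited` and `happy` in the last example, a threshold can keep more than one. A minimal sketch over that result (scores rounded); `labels_above` and the 0.3 cut-off are illustrative assumptions:

```python
# Result copied from the last cell above, scores rounded to four places
result = {
    "sequence": "I just received wonderful news and I can't stop smiling!",
    "labels": ["excited", "happy", "enjoy", "love", "angry", "sad", "bored"],
    "scores": [0.6016, 0.3182, 0.0725, 0.0059, 0.0006, 0.0006, 0.0005],
}

def labels_above(result, min_score=0.3):
    """Return (label, score) pairs whose score reaches `min_score`."""
    return [
        (label, score)
        for label, score in zip(result["labels"], result["scores"])
        if score >= min_score
    ]

print(labels_above(result))
print(round(sum(result["scores"]), 2))
```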
