# Hugging Face Transformers
The Hugging Face Transformers library provides a vast collection of state-of-the-art pre-trained models for various natural language processing (NLP) tasks. It simplifies the process of implementing these models by offering an intuitive API, allowing developers and researchers to focus on building applications rather than dealing with the intricacies of model training and architecture. With support for tasks like text classification, named entity recognition, question answering, summarization, and zero-shot classification, Hugging Face Transformers enables users to leverage powerful models with minimal effort.


In [1]:
# Import the pipeline factory
from transformers import pipeline

This package provides the core functionality for working with pre-trained models and pipelines. The pipeline function allows users to easily instantiate a model for a specific NLP task without needing to manage the complexities of the model architecture or training process. By specifying the task type, users can quickly access a variety of models tailored for their needs.

# Text Classification

1. **Text Classification Model**: [nlptown/bert-base-multilingual-uncased-sentiment](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment)
   - **Description**: This model performs sentiment analysis by rating input text on a scale from `1 star` (most negative) to `5 stars` (most positive). It is based on the multilingual BERT architecture and is fine-tuned on product reviews in six languages (English, Dutch, German, French, Spanish, and Italian).
   - **Usage**: Ideal for applications that require understanding user sentiment from feedback, reviews, or social media posts.
   - **Output Example**: For an input like `"I'm feeling great today!"`, the output will look like:
     ```json
     [{"label": "5 stars", "score": 0.85}]
     ```
     This indicates a strong positive sentiment, with a score reflecting the model's confidence.

In [2]:
# Text classification with an explicitly chosen model instead of the pipeline default
# This sentiment model rates text from 1 to 5 stars
pipe = pipeline(task="text-classification", model="nlptown/bert-base-multilingual-uncased-sentiment")




In [3]:
# Run pipe with a single string
result1 = pipe("I'm feeling great today!")
# Display results
print("Text Classification Results:")
result1

Text Classification Results:


[{'label': '5 stars', 'score': 0.7518251538276672}]

In [4]:
# Run pipe with a list of strings
result2 = pipe(["I love programming!", "I'm really upset with the weather."])
# Display results
print("Text Classification Results:")
result2

Text Classification Results:


[{'label': '5 stars', 'score': 0.8564531207084656},
 {'label': '2 stars', 'score': 0.4690088629722595}]
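
Because this model returns star ratings rather than sentiment names, downstream code often needs to map the labels to something coarser. The following is a minimal sketch that works on the outputs shown above, so it needs no model download. The function name `stars_to_sentiment` and the thresholds (1-2 negative, 3 neutral, 4-5 positive) are assumptions made for illustration, not part of the library or the model.

```python
from typing import Tuple

# Outputs copied from the cell above; no model call is made here.
results = [
    {"label": "5 stars", "score": 0.8564531207084656},
    {"label": "2 stars", "score": 0.4690088629722595},
]


def stars_to_sentiment(label: str) -> Tuple[int, str]:
    """Turn a label such as '5 stars' into (5, 'positive')."""
    stars = int(label.split()[0])  # labels look like '1 star' ... '5 stars'
    # Hypothetical thresholds: 1-2 negative, 3 neutral, 4-5 positive.
    if stars <= 2:
        sentiment = "negative"
    elif stars == 3:
        sentiment = "neutral"
    else:
        sentiment = "positive"
    return stars, sentiment


parsed = [stars_to_sentiment(r["label"]) for r in results]
print(parsed)
```

Note that the second prediction has a score below 0.5, so an application may also want to treat low-confidence ratings separately.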

# Named Entity Recognition

2. **Named Entity Recognition Model**: [dbmdz/bert-large-cased-finetuned-conll03-english](https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)
   - **Description**: This model identifies and classifies named entities within text: people (`PER`), organizations (`ORG`), locations (`LOC`), and miscellaneous entities (`MISC`). It is fine-tuned on the CoNLL-03 dataset, which is a benchmark for NER tasks. The label set has no date category, so dates such as years are not tagged.
   - **Usage**: Useful for extracting important information from unstructured text data, such as news articles or reports.
   - **Output Example**: For the input `"Microsoft was founded by Bill Gates and Paul Allen in 1975 in Albuquerque."`, the output will look like:
     ```json
     [
       {"word": "Microsoft", "score": 0.999, "entity": "I-ORG"},
       {"word": "Bill", "score": 0.995, "entity": "I-PER"},
       {"word": "Gates", "score": 0.997, "entity": "I-PER"},
       {"word": "Paul", "score": 0.998, "entity": "I-PER"},
       {"word": "Allen", "score": 0.999, "entity": "I-PER"},
       {"word": "Albuquerque", "score": 0.997, "entity": "I-LOC"}
     ]
     ```
     This output lists each recognized token with its entity type and confidence score. By default the pipeline reports one entry per token, so a multi-word name such as "Bill Gates" appears as two entries, and "1975" is absent because the model has no date label.



In [5]:
# Named Entity Recognition with a different model
# Using a fine-tuned NER model
ner_pipe = pipeline(task="ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [6]:
ner_result = ner_pipe("Microsoft was founded by Bill Gates and Paul Allen in 1975 in Albuquerque.")

In [7]:
print("Named Entity Recognition Results:")
ner_result

Named Entity Recognition Results:


[{'entity': 'I-ORG',
  'score': 0.99945015,
  'index': 1,
  'word': 'Microsoft',
  'start': 0,
  'end': 9},
 {'entity': 'I-PER',
  'score': 0.99527526,
  'index': 5,
  'word': 'Bill',
  'start': 25,
  'end': 29},
 {'entity': 'I-PER',
  'score': 0.99724114,
  'index': 6,
  'word': 'Gates',
  'start': 30,
  'end': 35},
 {'entity': 'I-PER',
  'score': 0.99846363,
  'index': 8,
  'word': 'Paul',
  'start': 40,
  'end': 44},
 {'entity': 'I-PER',
  'score': 0.99925965,
  'index': 9,
  'word': 'Allen',
  'start': 45,
  'end': 50},
 {'entity': 'I-LOC',
  'score': 0.99683386,
  'index': 13,
  'word': 'Albuquerque',
  'start': 62,
  'end': 73}]
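
The output above is token-level, so "Bill" and "Gates" are separate entries. A minimal sketch of how neighbouring tokens could be merged into whole entities is shown below, using the predictions printed above (scores rounded). The helper `group_entities` is hypothetical and written only for this illustration; the `ner` pipeline also accepts an `aggregation_strategy` argument that does this kind of grouping internally.

```python
text = "Microsoft was founded by Bill Gates and Paul Allen in 1975 in Albuquerque."

# Token-level predictions copied from the output above (scores rounded).
tokens = [
    {"entity": "I-ORG", "score": 0.9995, "index": 1, "start": 0, "end": 9},
    {"entity": "I-PER", "score": 0.9953, "index": 5, "start": 25, "end": 29},
    {"entity": "I-PER", "score": 0.9972, "index": 6, "start": 30, "end": 35},
    {"entity": "I-PER", "score": 0.9985, "index": 8, "start": 40, "end": 44},
    {"entity": "I-PER", "score": 0.9993, "index": 9, "start": 45, "end": 50},
    {"entity": "I-LOC", "score": 0.9968, "index": 13, "start": 62, "end": 73},
]


def group_entities(text, tokens):
    """Merge adjacent tokens that share an entity type into one span."""
    spans = []
    for tok in tokens:
        ent_type = tok["entity"].split("-")[-1]  # 'I-PER' -> 'PER'
        last = spans[-1] if spans else None
        if last and last["type"] == ent_type and tok["index"] == last["last_index"] + 1:
            # Same type and consecutive token position: extend the open span.
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
        else:
            # Otherwise start a new span at this token.
            spans.append({"type": ent_type, "start": tok["start"],
                          "end": tok["end"], "last_index": tok["index"]})
    # Recover the surface text from the character offsets.
    return [(text[s["start"]:s["end"]], s["type"]) for s in spans]


entities = group_entities(text, tokens)
print(entities)
```

Grouping by consecutive `index` values keeps "Bill Gates" and "Paul Allen" apart, since the token "and" sits between them.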

# Question Answering

3. **Question Answering Model**: [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad)
   - **Description**: This model answers questions by extracting a span of text from a provided context. It is DistilBERT, a distilled version of BERT that is optimized for faster inference while retaining most of BERT's accuracy, fine-tuned on SQuAD (the Stanford Question Answering Dataset).
   - **Usage**: Suitable for building chatbots, search engines, or any application where automated question answering is needed.
   - **Output Example**: For the context `"The Eiffel Tower is located in Paris, France."` and the question `"Where is the Eiffel Tower?"`, the output will look like:
     ```json
     {"answer": "Paris, France", "score": 0.85, "start": 31, "end": 44}
     ```
     This indicates the answer found in the context, along with its confidence score and the character indices of the answer in the context (`start` is inclusive, `end` is exclusive).

In [8]:
# Question Answering with a different context
qa_pipe = pipeline(task="question-answering", model="distilbert-base-uncased-distilled-squad")
qa_result = qa_pipe(context="The Eiffel Tower is located in Paris, France.", question="Where is the Eiffel Tower?")
print("Question Answering Result:")
qa_result

Question Answering Result:


{'score': 0.8541361093521118,
 'start': 31,
 'end': 44,
 'answer': 'Paris, France'}
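
The `start` and `end` values are character offsets into the context, which makes it easy to locate or highlight the answer in the original text. The snippet below is a small sketch based on the result printed above; the variable names and the bracket-style highlighting are illustrative choices only.

```python
context = "The Eiffel Tower is located in Paris, France."

# Result copied from the output above; no model call is made here.
qa_output = {"score": 0.8541361093521118, "start": 31, "end": 44,
             "answer": "Paris, France"}

start, end = qa_output["start"], qa_output["end"]

# Slicing the context with the offsets recovers the answer text.
span = context[start:end]

# Mark the answer inside the context, e.g. for display in a UI.
highlighted = context[:start] + "[" + span + "]" + context[end:]

print(span)
print(highlighted)
```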

# Text Summarization

4. **Text Summarization Model**: [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn)
   - **Description**: This model generates concise summaries from longer texts while preserving the main ideas. BART is a sequence-to-sequence model that pairs a bidirectional encoder (as in BERT) with an autoregressive decoder (as in GPT). The `facebook/bart-large-cnn` variant is fine-tuned on the CNN/Daily Mail news dataset for summarization.
   - **Usage**: Great for applications requiring summarization of articles, reports, or lengthy documents for quick understanding.
   - **Output Example**: For the provided article about Hugging Face, the output may look like:
     ```json
     [{"summary_text": "Hugging Face, founded in 2016, has become a leader in Natural Language Processing, enabling collaboration and code sharing."}]
     ```
     The pipeline returns a list with one dictionary per input, and the `summary_text` field holds a concise summary of the key information in the input text.

In [9]:
# Text Summarization with a different model
summarization_pipe = pipeline("summarization", model="facebook/bart-large-cnn")

In [10]:
summary_article = """Hugging Face has transformed how the machine learning community collaborates and shares code.
Founded in 2016, it quickly became a leader in Natural Language Processing, allowing researchers and developers
to share models and datasets. The platform has grown rapidly, attracting investment and expanding its offerings,
including training, deployment, and user-friendly tools for various applications."""
summary_result = summarization_pipe(summary_article, min_length=30, max_length=50)

In [11]:
print("Text Summarization Result:")
summary_result

Text Summarization Result:


[{'summary_text': 'Hugging Face has transformed how the machine learning community collaborates and shares code. The platform has grown rapidly, attracting investment and expanding its offerings,  including training, deployment, and user-friendly tools.'}]
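
To get a rough sense of how much the text was shortened, the summary can be compared with the article. The sketch below uses the strings shown above and counts whitespace-separated words. This is only a proxy: `min_length` and `max_length` are measured in tokens, not words, so the word count will not match those limits exactly. The variable names are illustrative.

```python
article = """Hugging Face has transformed how the machine learning community collaborates and shares code.
Founded in 2016, it quickly became a leader in Natural Language Processing, allowing researchers and developers
to share models and datasets. The platform has grown rapidly, attracting investment and expanding its offerings,
including training, deployment, and user-friendly tools for various applications."""

# Pipeline output copied from the cell above.
summary_output = [{"summary_text": "Hugging Face has transformed how the machine learning community collaborates and shares code. The platform has grown rapidly, attracting investment and expanding its offerings,  including training, deployment, and user-friendly tools."}]

summary_text = summary_output[0]["summary_text"]

# str.split() with no arguments splits on any run of whitespace.
article_words = len(article.split())
summary_words = len(summary_text.split())
ratio = summary_words / article_words

print(f"article: {article_words} words, summary: {summary_words} words, ratio: {ratio:.2f}")
```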

# Zero-Shot Classification

5. **Zero-Shot Classification Model**: [facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli)
   - **Description**: This model performs classification without requiring any training on the labels involved. It recasts classification as natural language inference (NLI): each candidate label is turned into a hypothesis such as "This example is animals." and the model scores how strongly the input text entails it. The labels can therefore be chosen freely at inference time.
   - **Usage**: Ideal for scenarios where predefined categories may change frequently, such as content moderation or topic classification in dynamic environments.
   - **Output Example**: For input `"The quick brown fox jumps over the lazy dog."` and candidate labels `["literature", "animals", "philosophy"]`, the output may look like:
     ```json
     {
       "sequence": "The quick brown fox jumps over the lazy dog.",
       "labels": ["animals", "literature", "philosophy"],
       "scores": [0.92, 0.05, 0.03]
     }
     ```
     This indicates that the text is classified as belonging to the "animals" category with high confidence, followed by lower confidence scores for other labels.


In [12]:
import pandas as pd
zero_shot_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
documents = [
    "The quick brown fox jumps over the lazy dog.",
    "To be or not to be, that is the question.",
    "A journey of a thousand miles begins with a single step."
]
candidate_labels = ["literature", "animals", "philosophy"]

In [13]:
# Classify documents
zero_shot_results = zero_shot_classifier(documents, candidate_labels=candidate_labels)

In [14]:
print("Zero-Shot Classification Results:")
zero_shot_results


Zero-Shot Classification Results:


[{'sequence': 'The quick brown fox jumps over the lazy dog.',
  'labels': ['animals', 'literature', 'philosophy'],
  'scores': [0.9911988973617554, 0.005967895500361919, 0.0028331829234957695]},
 {'sequence': 'To be or not to be, that is the question.',
  'labels': ['philosophy', 'literature', 'animals'],
  'scores': [0.7825393676757812, 0.1451304852962494, 0.07233018428087234]},
 {'sequence': 'A journey of a thousand miles begins with a single step.',
  'labels': ['philosophy', 'literature', 'animals'],
  'scores': [0.5001646876335144, 0.27931058406829834, 0.22052477300167084]}]
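
In the results above, the labels of each document are sorted by descending score, and in this default single-label mode the scores of one document add up to 1. The following sketch, which reuses the printed results rather than calling the model, picks the top label for each document and checks the sum. The variable names are illustrative.

```python
# Results copied from the output above; no model call is made here.
zero_shot_outputs = [
    {"sequence": "The quick brown fox jumps over the lazy dog.",
     "labels": ["animals", "literature", "philosophy"],
     "scores": [0.9911988973617554, 0.005967895500361919, 0.0028331829234957695]},
    {"sequence": "To be or not to be, that is the question.",
     "labels": ["philosophy", "literature", "animals"],
     "scores": [0.7825393676757812, 0.1451304852962494, 0.07233018428087234]},
    {"sequence": "A journey of a thousand miles begins with a single step.",
     "labels": ["philosophy", "literature", "animals"],
     "scores": [0.5001646876335144, 0.27931058406829834, 0.22052477300167084]},
]

# Labels are sorted by score, so the first entry is the best match.
top_labels = [(r["labels"][0], r["scores"][0]) for r in zero_shot_outputs]

# In single-label mode the scores of each document sum to (about) 1.
score_sums = [sum(r["scores"]) for r in zero_shot_outputs]

for r, (label, score) in zip(zero_shot_outputs, top_labels):
    print(f"{label:<12} {score:.3f}  {r['sequence']}")
```

The third document shows why the score matters as much as the label: "philosophy" wins with only about 0.50, a much weaker match than "animals" for the first sentence.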

In [15]:
# Multi-label mode: each label is scored independently, so several labels (or none) can apply
multi_label_result1 = zero_shot_classifier(documents[0], candidate_labels=candidate_labels, multi_label=True)
multi_label_result2 = zero_shot_classifier(documents[1], candidate_labels=candidate_labels, multi_label=True)
multi_label_result3 = zero_shot_classifier(documents[2], candidate_labels=candidate_labels, multi_label=True)

In [16]:
print("Flagging Multiple Labels:")
multi_label_result1

Flagging Multiple Labels:


{'sequence': 'The quick brown fox jumps over the lazy dog.',
 'labels': ['animals', 'literature', 'philosophy'],
 'scores': [0.9961634278297424, 0.027642544358968735, 0.0013539043720811605]}

In [17]:
multi_label_result2

{'sequence': 'To be or not to be, that is the question.',
 'labels': ['philosophy', 'literature', 'animals'],
 'scores': [0.814192533493042, 0.11284735053777695, 0.011561857536435127]}

In [18]:
multi_label_result3

{'sequence': 'A journey of a thousand miles begins with a single step.',
 'labels': ['philosophy', 'literature', 'animals'],
 'scores': [0.3544122278690338, 0.045395947992801666, 0.016817163676023483]}
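
With `multi_label=True` each label is scored on its own, so the scores of a document no longer have to sum to 1, and a document can match several labels or none. A common way to use such scores is to keep every label above a cut-off. The sketch below applies this to the results printed above; the helper `select_labels` and the threshold of 0.5 are assumptions for illustration and should be tuned for real data.

```python
# Multi-label results copied from the outputs above; no model call is made here.
multi_label_outputs = [
    {"sequence": "The quick brown fox jumps over the lazy dog.",
     "labels": ["animals", "literature", "philosophy"],
     "scores": [0.9961634278297424, 0.027642544358968735, 0.0013539043720811605]},
    {"sequence": "To be or not to be, that is the question.",
     "labels": ["philosophy", "literature", "animals"],
     "scores": [0.814192533493042, 0.11284735053777695, 0.011561857536435127]},
    {"sequence": "A journey of a thousand miles begins with a single step.",
     "labels": ["philosophy", "literature", "animals"],
     "scores": [0.3544122278690338, 0.045395947992801666, 0.016817163676023483]},
]

THRESHOLD = 0.5  # hypothetical cut-off, not a library default


def select_labels(result, threshold=THRESHOLD):
    """Return every label whose independent score reaches the threshold."""
    return [label
            for label, score in zip(result["labels"], result["scores"])
            if score >= threshold]


selected = [select_labels(r) for r in multi_label_outputs]
score_sums = [sum(r["scores"]) for r in multi_label_outputs]

print(selected)
print([round(s, 3) for s in score_sums])
```

With this cut-off the third sentence receives no label at all, whereas single-label mode assigned it "philosophy" at roughly 0.50.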