# Hugging Face Transformers 🤗

The Hugging Face transformers package is a widely used Python library that provides pretrained models for a variety of natural language processing (NLP) tasks. It originally supported only PyTorch; TensorFlow 2 support was added in late 2019. The library covers many tasks, from natural language inference (NLI) to question answering, but text classification remains one of the most popular and practical use cases.

Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with 32+ pretrained models in 100+ languages and deep interoperability between JAX, PyTorch and TensorFlow.

https://huggingface.co/transformers/

In [0]:
from transformers import pipeline

# Zero-Shot Classification

In [0]:
classifier = pipeline('zero-shot-classification')

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
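
The warning above suggests naming the model and revision explicitly rather than relying on the default. The following is a minimal sketch of how that might look, reusing the model name and revision reported in the warning. `build_classifier` is a hypothetical helper name, and the import is deferred so that defining the function does not by itself load the library or download any weights.

```python
# Model name and revision copied from the warning printed by pipeline().
MODEL_NAME = "facebook/bart-large-mnli"
MODEL_REVISION = "c626438"


def build_classifier():
    """Return a zero-shot pipeline pinned to an explicit checkpoint.

    Hypothetical helper: the import is deferred so that merely defining
    this function does not import transformers or download weights.
    """
    from transformers import pipeline

    return pipeline(
        "zero-shot-classification",
        model=MODEL_NAME,
        revision=MODEL_REVISION,
    )
```

Calling `classifier = build_classifier()` should load the same checkpoint that the default resolved to when this notebook was run, while making the choice visible in the code.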


In [0]:
text = """
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive.

In comparison with the other open-source machine learning libraries, PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with few words only. This makes experiments exponentially fast and efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more.

The design and simplicity of PyCaret is inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen Data Scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more expertise. Seasoned data scientists are often difficult to find and expensive to hire but citizen data scientists can be an effective way to mitigate this gap and address data-related challenges in the business setting.

PyCaret is a great library which not only simplifies the machine learning tasks for citizen data scientists but also helps new startups to reduce the cost of investing in a team of data scientists. Therefore, this library has not only helped the citizen data scientists but has also helped individuals who want to start exploring the field of data science, having no prior knowledge in this field. Initial idea of PyCaret was inspired by Caret library in R.
"""

In [0]:
classifier(text, candidate_labels=['pycaret', 'data science', 'machine learning', 'politics', 'music'])

Out[4]: {'sequence': '\nPyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive.\n\nIn comparison with the other open-source machine learning libraries, PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with few words only. This makes experiments exponentially fast and efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more.\n\nThe design and simplicity of PyCaret is inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen Data Scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required

In [0]:
text2 = """Today, the Honourable Chrystia Freeland, Deputy Prime Minister and Minister of Finance, 
           the Honourable Ahmed Hussen, Minister of Families, Children and Social Development, the Honourable Sandy Silver, 
           Yukon Premier, and the Honourable Jeanie McLean, Yukon Minister of Education, announced an agreement that 
           significantly improves early learning and child care for children in Yukon."""

In [0]:
classifier(text2, candidate_labels=['education', 'data science', 'environment', 'politics', 'music'])

Out[6]: {'sequence': 'Today, the Honourable Chrystia Freeland, Deputy Prime Minister and Minister of Finance, \n           the Honourable Ahmed Hussen, Minister of Families, Children and Social Development, the Honourable Sandy Silver, \n           Yukon Premier, and the Honourable Jeanie McLean, Yukon Minister of Education, announced an agreement that \n           significantly improves early learning and child care for children in Yukon.',
 'labels': ['politics', 'education', 'environment', 'music', 'data science'],
 'scores': [0.4679723381996155,
  0.4324468970298767,
  0.050561320036649704,
  0.027336418628692627,
  0.02168296091258526]}
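
The result is a plain dictionary whose `labels` and `scores` lists are index-aligned and sorted by descending score. A small sketch of reading it follows, using the values copied from the output above (the `sequence` entry is left out for brevity). The names `ranked` and `margin` are illustrative only.

```python
# Labels and scores copied from the output above (sequence omitted).
result = {
    "labels": ["politics", "education", "environment", "music", "data science"],
    "scores": [
        0.4679723381996155,
        0.4324468970298767,
        0.050561320036649704,
        0.027336418628692627,
        0.02168296091258526,
    ],
}

# Pair each label with its score; the two lists are index-aligned.
ranked = list(zip(result["labels"], result["scores"]))

top_label, top_score = ranked[0]
runner_up_label, runner_up_score = ranked[1]

# Gap between the two best labels; a small gap signals an ambiguous call.
margin = top_score - runner_up_score

for label, score in ranked:
    print(f"{label:<14}{score:.3f}")
print(f"top: {top_label}, margin over {runner_up_label}: {margin:.3f}")
```

Here politics leads education by less than 0.04, so the announcement is reasonably read as belonging to both topics rather than to one clear winner.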

In [0]:
text3 = """Pre-existing T-cell immunity to SARS-CoV-2 in unexposed healthy controls in Ecuador, as 
            detected with a COVID-19 Interferon-Gamma Release Assay."""

In [0]:
classifier(text3, candidate_labels=['COVID-19', 'health', 'virus', 'politics', 'music'])

Out[8]: {'sequence': 'Pre-existing T-cell immunity to SARS-CoV-2 in unexposed healthy controls in Ecuador, as \n            detected with a COVID-19 Interferon-Gamma Release Assay.',
 'labels': ['COVID-19', 'virus', 'health', 'music', 'politics'],
 'scores': [0.7881993651390076,
  0.1606992483139038,
  0.045646604150533676,
  0.003354237647727132,
  0.002100480254739523]}
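
The scores in each output above sum to roughly 1. In the default single-label mode, the pipeline poses each candidate label as an NLI hypothesis (by default "This example is {label}.") and applies a softmax over the per-label entailment logits. The sketch below reproduces only that final normalization step in plain Python. The logits are made-up illustrative values, not outputs of facebook/bart-large-mnli, and `softmax` and `rank_labels` are hypothetical helper names.

```python
import math

# Default template the pipeline uses to turn a label into an NLI hypothesis.
HYPOTHESIS_TEMPLATE = "This example is {}."


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(labels, entailment_logits):
    """Normalize per-label entailment logits and sort by descending score."""
    scores = softmax(entailment_logits)
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


labels = ["COVID-19", "health", "virus", "politics", "music"]
hypotheses = [HYPOTHESIS_TEMPLATE.format(label) for label in labels]

# Made-up entailment logits, one per hypothesis; NOT real model outputs.
fake_logits = [3.0, 0.2, 1.4, -2.5, -2.0]

for label, score in rank_labels(labels, fake_logits):
    print(f"{label:<10}{score:.3f}")
```

Because the softmax runs across labels, adding or removing a candidate label changes every score. Passing `multi_label=True` to the classifier instead scores each label independently, so the scores no longer need to sum to 1.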