# Transformer Implementation Example for Text Classification

**(C) 2024 by [Damir Cavar](http://damir.cavar.me/)**


**Download:** This and various other Jupyter notebooks are available from my [GitHub repo](https://github.com/dcavar/python-tutorial-for-ipython).


Based on [Transformer Text Classification by Eijaz Allibhai](https://github.com/jaz-alli/transformers-text-classification/blob/main/transformers_text_classification.ipynb).


In [None]:
!pip install transformers

Import the `pipeline` function from the `transformers` library:

In [2]:
from transformers import pipeline



See the [documentation for the pipeline](https://huggingface.co/docs/transformers/en/main_classes/pipelines) on the Hugging Face website.

In the following, we use the zero-shot classification approach: the model assigns one of a set of candidate labels to a text even though it was never trained on those labels. The pre-trained language model is used as is, without fine-tuning or additional training.

Setting the `device` parameter to 0 runs the pipeline on the first GPU, which can reduce the processing time:

    classifier = pipeline("zero-shot-classification", device=0)
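
Hard-coding `device=0` fails on a machine without a GPU. The following is a minimal sketch of one way to choose the value at run time. The helper name `pick_device` is hypothetical, and the sketch assumes PyTorch is the backend and that `-1` selects the CPU in the pipeline.

```python
def pick_device() -> int:
    """Return 0 (first GPU) if PyTorch reports a usable CUDA device, else -1 (CPU)."""
    try:
        import torch  # may be missing in a minimal environment
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("pipeline device:", device)

# Assumed usage with the pipeline from this notebook:
# classifier = pipeline("zero-shot-classification", device=device)
```

The pipeline call is left as a comment so that the sketch runs without downloading a model.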


In [3]:
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
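
The model chosen here is a BART model fine-tuned for natural language inference (MNLI). To my understanding, the zero-shot pipeline turns each candidate label into a hypothesis sentence and asks the model whether the text entails it. The sketch below only illustrates how such text and hypothesis pairs could be built. The helper `build_nli_pairs` is hypothetical, and the template string is assumed to match the pipeline's default `hypothesis_template`.

```python
def build_nli_pairs(sequence, labels, hypothesis_template="This example is {}."):
    """Pair the text (premise) with one hypothesis sentence per candidate label."""
    return [(sequence, hypothesis_template.format(label)) for label in labels]


pairs = build_nli_pairs(
    "We like to use Code for Python, but vim for C.",
    ["automotive", "programming"],
)
for premise, hypothesis in pairs:
    print(f"premise: {premise!r}  hypothesis: {hypothesis!r}")
```

The model then scores each pair, and the entailment scores are what the pipeline reports per label.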



We can specify some text and potential labels (or keywords) for it:

In [10]:
sequence = "We like to use Code for Python, but vim for C."
labels = ["automotive", "programming", "politics", "economy"]

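The `classifier` returns a dictionary with the input sequence, a list of labels, and a list of scores, ordered from the highest to the lowest score. In this case the scores sum to 1 and can be read as probabilities over the candidate labels.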

In [11]:
classifier(sequence, labels)

{'sequence': 'We like to use Code for Python, but vim for C.',
 'labels': ['programming', 'economy', 'automotive', 'politics'],
 'scores': [0.894223690032959,
  0.07695318013429642,
  0.021453876048326492,
  0.00736920116469264]}
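
As a small sketch of how this result could be consumed, the code below copies the dictionary printed above, pairs each label with its score, picks the best label, and checks that the scores sum to approximately 1. The variable names are illustrative only.

```python
# Result copied from the output above.
result = {
    "sequence": "We like to use Code for Python, but vim for C.",
    "labels": ["programming", "economy", "automotive", "politics"],
    "scores": [
        0.894223690032959,
        0.07695318013429642,
        0.021453876048326492,
        0.00736920116469264,
    ],
}

# Labels and scores are parallel lists, so zip pairs them up.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]
total = sum(result["scores"])

print(f"best label: {top_label} ({top_score:.3f})")
print(f"sum of scores: {total:.6f}")
```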

We can also specify a list of texts to label using the model:

In [7]:
sequences = ["An increase in travel demand is one of the causes of an oil price increase",
            "As expected, polls show that the party in power will lose seats in Congress during the midterm elections.",
            "The party in power may lose seats in the next election due to inflation and recession concerns"]

The `classifier` returns a list of dictionaries, one per input text, each containing the text, the ranked labels, and their scores.

In [8]:
classifier(sequences, labels)

[{'sequence': 'An increase in travel demand is one of the causes of an oil price increase',
  'labels': ['economy', 'automotive', 'programming', 'politics'],
  'scores': [0.39332109689712524,
   0.35044941306114197,
   0.15705962479114532,
   0.09916982054710388]},
 {'sequence': 'As expected, polls show that the party in power will lose seats in Congress during the midterm elections.',
  'labels': ['politics', 'economy', 'programming', 'automotive'],
  'scores': [0.9058505892753601,
   0.039337169378995895,
   0.0312370453029871,
   0.023575183004140854]},
 {'sequence': 'The party in power may lose seats in the next election due to inflation and recession concerns',
  'labels': ['economy', 'politics', 'programming', 'automotive'],
  'scores': [0.5537724494934082,
   0.4306282103061676,
   0.009240726940333843,
   0.006358541082590818]}]
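
If only the best label per text is needed, the list can be flattened into (text, label, score) tuples. The following sketch assumes the first label in each dictionary is the highest-scoring one, as in the output above. The helper `top_labels` is hypothetical, and the data is copied from that output.

```python
def top_labels(results):
    """Return one (sequence, best_label, best_score) tuple per result dictionary."""
    return [(r["sequence"], r["labels"][0], r["scores"][0]) for r in results]


# Results copied from the output above.
results = [
    {"sequence": "An increase in travel demand is one of the causes of an oil price increase",
     "labels": ["economy", "automotive", "programming", "politics"],
     "scores": [0.39332109689712524, 0.35044941306114197,
                0.15705962479114532, 0.09916982054710388]},
    {"sequence": "As expected, polls show that the party in power will lose seats in Congress during the midterm elections.",
     "labels": ["politics", "economy", "programming", "automotive"],
     "scores": [0.9058505892753601, 0.039337169378995895,
                0.0312370453029871, 0.023575183004140854]},
    {"sequence": "The party in power may lose seats in the next election due to inflation and recession concerns",
     "labels": ["economy", "politics", "programming", "automotive"],
     "scores": [0.5537724494934082, 0.4306282103061676,
                0.009240726940333843, 0.006358541082590818]},
]

best = top_labels(results)
for sequence, label, score in best:
    print(f"{label:<10} {score:.3f}  {sequence}")
```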

It is also possible to score the labels independently (multi-label classification), i.e., each label receives its own score for how well it fits the corresponding text, so the scores no longer sum to 1. The user needs to choose a score threshold below which labels are excluded.

In [9]:
classifier(sequences, labels, multi_label=True)

[{'sequence': 'An increase in travel demand is one of the causes of an oil price increase',
  'labels': ['automotive', 'economy', 'programming', 'politics'],
  'scores': [0.009671766310930252,
   0.0009950732346624136,
   0.00010522549564484507,
   4.6886860218364745e-05]},
 {'sequence': 'As expected, polls show that the party in power will lose seats in Congress during the midterm elections.',
  'labels': ['politics', 'economy', 'programming', 'automotive'],
  'scores': [0.9895076751708984,
   0.443315714597702,
   0.09925836324691772,
   0.04091533273458481]},
 {'sequence': 'The party in power may lose seats in the next election due to inflation and recession concerns',
  'labels': ['politics', 'economy', 'programming', 'automotive'],
  'scores': [0.9844666719436646,
   0.9779921174049377,
   0.009205055423080921,
   0.0031888692174106836]}]
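
The thresholding step could look like the sketch below. The helper `labels_above` and the threshold of 0.5 are arbitrary illustrative choices, not part of the library, and the data is copied from the output above.

```python
def labels_above(result, threshold):
    """Keep only the labels whose score reaches the threshold."""
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


# Multi-label results copied from the output above.
multi_results = [
    {"sequence": "An increase in travel demand is one of the causes of an oil price increase",
     "labels": ["automotive", "economy", "programming", "politics"],
     "scores": [0.009671766310930252, 0.0009950732346624136,
                0.00010522549564484507, 4.6886860218364745e-05]},
    {"sequence": "As expected, polls show that the party in power will lose seats in Congress during the midterm elections.",
     "labels": ["politics", "economy", "programming", "automotive"],
     "scores": [0.9895076751708984, 0.443315714597702,
                0.09925836324691772, 0.04091533273458481]},
    {"sequence": "The party in power may lose seats in the next election due to inflation and recession concerns",
     "labels": ["politics", "economy", "programming", "automotive"],
     "scores": [0.9844666719436646, 0.9779921174049377,
                0.009205055423080921, 0.0031888692174106836]},
]

THRESHOLD = 0.5  # assumed cut-off; tune it for the task at hand
selected = [labels_above(r, THRESHOLD) for r in multi_results]
for r, kept in zip(multi_results, selected):
    print(kept, "<-", r["sequence"])
```

With this cut-off the first text keeps no label, the second keeps `politics`, and the third keeps both `politics` and `economy`.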
