# Text classification
__pipelines__
* Simple interface
* Automatic model selection (a default checkpoint is chosen for each task)
* Less control over the individual steps
* Less flexibility: each pipeline is tied to a predefined task

__auto classes__
* Flexibility and customization
* Manual setup is complex

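Example:
* Load pre-trained model weights and the matching tokenizer by name
* The name (`model_name`) is also called the "model checkpoint"

`AutoModel` loads only the base transformer and does not provide a task-specific head, so a custom classification head is defined below.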

In [1]:
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Load the tokenizer and the base BERT model (no task head) by checkpoint name
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "I am an example for text classification"

# Minimal classification head: one linear layer mapping a sentence vector to class logits
class SimpleClassifier(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.fc = nn.Linear(input_size, num_classes)

    def forward(self, x):
        return self.fc(x)



Tokenize inputs

In [2]:
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=64)
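With `padding=True` and `truncation=True`, the tokenizer pads every sequence in a batch to a common length and cuts sequences longer than `max_length`, and it returns an `attention_mask` that marks real tokens with 1 and padding with 0. The plain-Python sketch below mimics that behaviour on made-up token ids; `pad_and_truncate` and `pad_id=0` are hypothetical stand-ins, not the tokenizer's internals (a real tokenizer also keeps special tokens such as `[SEP]` when truncating).

```python
def pad_and_truncate(batch_ids, max_length, pad_id=0):
    """Hypothetical helper: truncate to max_length, pad to the longest, build masks."""
    # Cut every sequence to at most max_length tokens
    truncated = [ids[:max_length] for ids in batch_ids]
    target = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token ids for two sequences of different length
padded = pad_and_truncate([[101, 7, 8, 102], [101, 9, 102]], max_length=64)
print(padded)
```

The shorter sequence receives one padding id, and its mask has a 0 in that position so the model can ignore it.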

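Get the model's hidden states:
* `pooler_output`: one aggregated vector per input; for BERT it is the final hidden state of the `[CLS]` token passed through a dense layer with tanh
* `last_hidden_state`: raw final-layer hidden states, one vector per token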
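`pooler_output` is specific to BERT-style models. Two common alternatives for a single sentence vector are the first (`[CLS]`) token vector, `outputs.last_hidden_state[:, 0]`, or the average of `last_hidden_state` over the real tokens, using the attention mask to skip padding (mean pooling). The sketch below shows the mean-pooling arithmetic on tiny hand-written vectors in plain Python; `mean_pool` is an illustrative helper, and with PyTorch the same idea would be applied to the `[batch, seq_len, hidden]` tensor.

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors, skipping positions where attention_mask is 0.

    hidden_states: list of token vectors (seq_len x hidden_size) for one input
    attention_mask: list of 0/1 flags, one per token
    """
    hidden_size = len(hidden_states[0])
    summed = [0.0] * hidden_size
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            summed = [s + v for s, v in zip(summed, vector)]
            count += 1
    return [s / count for s in summed]


# Three tokens with hidden size 2; the last token is padding and is ignored
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(hidden, mask))  # [2.0, 3.0]
```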

In [3]:
outputs = model(**inputs)
pooled_output = outputs.pooler_output

In [4]:
outputs.last_hidden_state.shape

torch.Size([1, 9, 768])

In [5]:
pooled_output.shape

torch.Size([1, 768])

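Pass the pooled output through the custom classification head and apply softmax to obtain class probabilities. The head is randomly initialized and untrained, so these probabilities carry no meaning yet: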

In [6]:
import torch

classifier_head = SimpleClassifier(pooled_output.size(-1), num_classes=2)
logits = classifier_head(pooled_output)
probs = torch.softmax(logits, dim=1)
probs

tensor([[0.4355, 0.5645]], grad_fn=<SoftmaxBackward0>)
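The head computes logits z = Wx + b, and softmax turns them into probabilities p_i = exp(z_i) / sum_j exp(z_j). The plain-Python sketch below reproduces both steps on a hypothetical 3-dimensional input with hand-picked weights (the real head uses the 768-dimensional pooled output and learned weights). Subtracting the largest logit before exponentiating is a standard numerical-stability trick that leaves the result unchanged.

```python
import math


def linear(x, weights, bias):
    """Compute z = W x + b, with one row of W per class (as in nn.Linear)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-dimensional "pooled output" and a 2-class head
x = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, 0.3], [0.3, -0.1, 0.4]]
b = [0.0, 0.1]

toy_logits = linear(x, W, b)
toy_probs = softmax(toy_logits)
print(toy_logits, toy_probs, sum(toy_probs))
```

The probabilities always sum to 1, and the class with the larger logit gets the larger probability.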

Auto class with a preconfigured head

`AutoModelForSequenceClassification` with a checkpoint fine-tuned for sentiment classification on a 1 to 5 star rating scale

In [7]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)



In [8]:
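text = "The quality of the product was just okay."
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)
logits = outputs.logits  # shape [1, 5]: one score per star rating

# Class indices 0-4 correspond to ratings of 1-5 stars, hence the + 1
predicted_class = torch.argmax(logits, dim=1).item()
predicted_class + 1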

3
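Because class index k stands for k + 1 stars, a probability vector over the five classes can be turned into both the most likely rating and a probability-weighted (expected) rating. The plain-Python sketch below shows both on an illustrative probability vector that was not produced by the model; `star_rating` is a hypothetical helper, and in the notebook the probabilities would come from `torch.softmax(logits, dim=1)`.

```python
def star_rating(probs):
    """Return (most likely rating, expected rating) for 5-class probabilities.

    Assumes index 0 = 1 star, ..., index 4 = 5 stars.
    """
    best_index = max(range(len(probs)), key=lambda k: probs[k])
    expected = sum((k + 1) * p for k, p in enumerate(probs))
    return best_index + 1, expected


# Illustrative probabilities only (not real model output)
example_probs = [0.05, 0.20, 0.50, 0.20, 0.05]
best, expected = star_rating(example_probs)
print(best, round(expected, 2))
```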

# Text generation

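* `AutoModelForCausalLM` loads auto-regressive (causal) language models such as GPT-2
* The model head predicts the next token
* `generate()` continues the prompt until the whole sequence, prompt included, is `max_length` tokens long (`max_new_tokens` limits only the newly generated tokens)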
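Greedy generation repeatedly picks the most likely next token and appends it, stopping when the sequence (prompt included) reaches `max_length` or an end-of-sequence token is chosen. The plain-Python sketch below reproduces that loop with a hypothetical hard-coded score table standing in for GPT-2; it illustrates the idea behind greedy decoding and how `max_length` counts the prompt, not how `generate()` is actually implemented.

```python
# Hypothetical stand-in "model": scores for the next token given the last token
NEXT_TOKEN_SCORES = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "sat": {"on": 0.8, "<eos>": 0.2},
    "on": {"the": 0.9, "<eos>": 0.1},
    "dog": {"<eos>": 1.0},
}


def greedy_generate(prompt_tokens, max_length, eos="<eos>"):
    """Append the highest-scoring next token until the total length hits max_length."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:  # max_length counts the prompt tokens too
        scores = NEXT_TOKEN_SCORES.get(tokens[-1], {eos: 1.0})
        next_token = max(scores, key=scores.get)
        if next_token == eos:
            break
        tokens.append(next_token)
    return tokens


print(greedy_generate(["the"], max_length=6))
```

With a one-token prompt and `max_length=6`, only five new tokens are generated.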

In [9]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'gpt2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "This is a simple example for text generation,"
# Calling the tokenizer returns both input_ids and attention_mask
inputs = tokenizer(prompt, return_tensors='pt')
# GPT-2 has no pad token, so reuse its end-of-sequence token for padding
output = model.generate(
    **inputs,
    max_length=26,  # total length: prompt tokens + generated tokens
    pad_token_id=tokenizer.eos_token_id,
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
generated_text


"This is a simple example for text generation, but it's also a good way to get a feel for how the text is generated"

# Datasets for text classification
Example: load the IMDb movie reviews dataset and iterate over its training split in batches with a PyTorch `DataLoader`

In [10]:
from datasets import load_dataset
from torch.utils.data import DataLoader

dataset = load_dataset('imdb')
train_data = dataset['train']
dataloader = DataLoader(train_data, batch_size=2, shuffle=True)
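Each IMDb record is a dictionary with a `text` string and an integer `label` (0 = negative, 1 = positive). When batching such records, the `DataLoader`'s default collate function groups values by key, so `batch['text']` becomes a list of strings and `batch['label']` a tensor of labels. The plain-Python sketch below imitates that grouping on made-up samples; `collate_dicts` is a hypothetical helper, not PyTorch's `default_collate`, and it returns a plain list for the labels instead of a tensor.

```python
def collate_dicts(samples):
    """Turn a list of dicts into a dict of lists, grouping values by key."""
    return {key: [sample[key] for sample in samples] for key in samples[0]}


# Made-up samples shaped like IMDb records
samples = [
    {"text": "A wonderful film.", "label": 1},
    {"text": "Two hours I will never get back.", "label": 0},
]
toy_batch = collate_dicts(samples)
print(toy_batch["text"])
print(toy_batch["label"])
```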

In [17]:
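# Print the first batch. batch['text'] is a list of batch_size reviews,
# so every review in the batch is printed; the label shown is for example i
i = 0
batch = next(iter(dataloader))
print(f"Example {i + 1}:")
print("------------------------------")
print(batch['text'])
print("------------------------------")
print("Label:", batch['label'][i])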

Example 1:
------------------------------
["This film was abysmal. and not in the good way as some have claimed. First off the main character is a very unattractive gingerman. Second - WTF is going on with this van love. The plot, basically, is: boy wants sex so buys a van (which, in fairness is quite cool). Unbelievably given that he looks like a newt he scores with lots of chicks! And he fails with some. Then he scores with a really hot chick and realises he loves this dowdy bird who played hard to get. Then he drag races with the hot chicks boyfriend. And he tips his van. At which point danny devito saves the day. Although he didn't need to because in tipping the van the ginger kid crossed the line first. I gave this 2 *'s as i'm willing to assume that there's some sort of 70's Vanning subculture i'm not getting and also because there's some 70's boobage too.", "Chris Kattan is a great sketch actor on Saturday Night Live...but he should probably leave the movie industry alone unless