## Studying the pretrained model

![image.png](attachment:image.png)

We use WordPiece embeddings (Wu et al., 2016) with a 30,000 token vocabulary. The first token of every sequence is always a special classification token ([CLS]). The final hidden state corresponding to this token is used as the aggregate sequence representation for classification tasks. Sentence pairs are packed together into a single sequence. We differentiate the sentences in two ways. First, we separate them with a special token ([SEP]). Second, we add a learned embedding to every token indicating whether it belongs to sentence A or sentence B. As shown in Figure 1, we denote input embedding as $E$, the final hidden vector of the special [CLS] token as $C \in \mathbb{R}^H$, and the final hidden vector for the $i$-th input token as $T_i \in \mathbb{R}^H$.

For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings. A visualization of this construction can be seen in Figure 2.
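
The summation of token, segment, and position embeddings can be sketched in plain Python. This is only an illustration: the random tables, the toy hidden size of 4, the token ids, and the helper names `table` and `input_representation` are all assumptions made for the example, not BERT's real weights or API.

```python
import random

random.seed(0)
HIDDEN = 4                       # toy hidden size (BERT-base uses 768)
VOCAB, SEGMENTS, MAX_POS = 10, 2, 8

def table(rows, cols):
    """Random stand-in for a learned embedding table."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

token_emb = table(VOCAB, HIDDEN)
segment_emb = table(SEGMENTS, HIDDEN)
position_emb = table(MAX_POS, HIDDEN)

def input_representation(token_ids, segment_ids):
    """Sum token, segment and position embeddings element-wise per position."""
    out = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        out.append([t + s + p for t, s, p in
                    zip(token_emb[tok], segment_emb[seg], position_emb[pos])])
    return out

# Hypothetical ids for: [CLS] a b [SEP] c [SEP]; sentence A = 0, sentence B = 1
token_ids = [1, 5, 6, 2, 7, 2]
segment_ids = [0, 0, 0, 0, 1, 1]
E = input_representation(token_ids, segment_ids)
print(len(E), len(E[0]))         # one vector of size HIDDEN per input position
```

Note that the two [SEP] tokens share a token embedding but still get different input vectors, because their segment and position embeddings differ.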

## Masked Language Modelling

![image.png](attachment:image.png)

The training data generator chooses 15% of the token positions at random for prediction. If the $i$-th token is chosen, we replace the $i$-th token with (1) the [MASK] token 80% of the time, (2) a random token 10% of the time, (3) the unchanged $i$-th token 10% of the time. Then, $T_i$ will be used to predict the original token with cross entropy loss.
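
The 80/10/10 replacement rule can be sketched as follows. This is a simplified illustration, not the original data generator: it selects each position independently with probability `select_prob`, skips [CLS] and [SEP], and the function name `mask_tokens` and the tiny vocabulary are assumptions made for the example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (corrupted tokens, {position: original token}) for masked LM training."""
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]"):
            continue                          # assumption: special tokens are never selected
        if rng.random() >= select_prob:
            continue                          # position not chosen for prediction
        targets[i] = tok                      # the label is always the original token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return corrupted, targets

rng = random.Random(0)
tokens = ["[CLS]", "i", "deposited", "money", "at", "the", "bank", "[SEP]"]
vocab = ["river", "money", "bank", "the", "at"]
# select_prob is raised to 0.5 here only so that a short sentence shows some masking
corrupted, targets = mask_tokens(tokens, vocab, rng, select_prob=0.5)
print(corrupted)
print(targets)
```

Only the positions recorded in `targets` contribute to the cross entropy loss; every other position is passed through unchanged and ignored by the loss.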

## Next Sentence Prediction

![image.png](attachment:image.png)

We pre-train for a binarized next sentence prediction task that can be trivially generated from any monolingual corpus. Specifically, when choosing the sentences A and B for each pre-training example, 50% of the time B is the actual next sentence that follows A (labeled as IsNext), and 50% of the time it is a random sentence from the corpus (labeled as NotNext).
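
A minimal sketch of how such IsNext / NotNext pairs could be generated is shown below. The helper name `make_nsp_examples` and the toy corpus are assumptions for illustration, and the random sentence is simply resampled until it differs from the true next sentence, which is a simplification of how a real pipeline would pick negatives.

```python
import random

def make_nsp_examples(documents, rng):
    """Build (sentence_a, sentence_b, label) triples from lists of sentences."""
    all_sentences = [s for doc in documents for s in doc]
    examples = []
    for doc in documents:
        for a, true_next in zip(doc, doc[1:]):
            if rng.random() < 0.5:
                examples.append((a, true_next, "IsNext"))
            else:
                # Simplification: resample until the sentence is not the true next one
                b = rng.choice(all_sentences)
                while b == true_next:
                    b = rng.choice(all_sentences)
                examples.append((a, b, "NotNext"))
    return examples

rng = random.Random(0)
documents = [
    ["the man went to the store", "he bought a gallon of milk", "then he went home"],
    ["penguins are flightless birds", "they live in the southern hemisphere"],
]
examples = make_nsp_examples(documents, rng)
for a, b, label in examples:
    print(label, "|", a, "->", b)
```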

## Studying the input encodings & encoder outputs 

This page has a list of the BERT models available on TensorFlow Hub:

https://www.tensorflow.org/text/tutorials/classify_text_with_bert

In [22]:
pip install --default-timeout=100 tensorflow-text

Note: you may need to restart the kernel to use updated packages.




In [42]:
import tensorflow_hub as hub
import tensorflow_text as text

### Importing models for preprocessing and encoding

In [43]:


preprocess_url = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
encoder_url = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"



In [5]:
bert_preprocess_model = hub.KerasLayer(preprocess_url)

The encoder SavedModel implements the encoder API for text embeddings with transformer encoders. It expects a dict with three int32 Tensors as input: `input_word_ids`, `input_mask`, and `input_type_ids`. The preprocessing model loaded above produces this dict from raw strings.

In [48]:


text_test = ['I deposited money at the bank','I deposited money at the bank']
text_preprocessed = bert_preprocess_model(text_test)
text_preprocessed.keys()



dict_keys(['input_mask', 'input_word_ids', 'input_type_ids'])

In [49]:
print(f'Keys       : {list(text_preprocessed.keys())}')
print(f'Shape      : {text_preprocessed["input_word_ids"].shape}')
print(f'Word Ids   : {text_preprocessed["input_word_ids"]}')
print(f'Input Mask : {text_preprocessed["input_mask"]}')
print(f'Type Ids   : {text_preprocessed["input_type_ids"]}')


Keys       : ['input_mask', 'input_word_ids', 'input_type_ids']
Shape      : (2, 128)
Word Ids   : [[  101  1045 14140  2769  2012  1996  2924   102     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0]
 [  101  1045 14140  2769  2012  1996  2924   102     0     0     0     0
      0     0     0     0     0     
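
The layout of the three tensors can be reproduced by hand for a single sentence. The sketch below assumes the token ids are already known (they are copied from the printed `input_word_ids`, where 101 and 102 are the [CLS] and [SEP] ids) and uses a hypothetical helper `pack_single_segment`. It does no tokenization, and its naive truncation is a simplification of the real preprocessing model.

```python
SEQ_LEN = 128
PAD_ID = 0

def pack_single_segment(token_ids, seq_len=SEQ_LEN):
    """Pad one tokenized segment (with [CLS]/[SEP] ids included) to seq_len."""
    token_ids = list(token_ids[:seq_len])     # naive truncation, for illustration only
    n_pad = seq_len - len(token_ids)
    return {
        "input_word_ids": token_ids + [PAD_ID] * n_pad,
        "input_mask": [1] * len(token_ids) + [0] * n_pad,  # 1 = real token, 0 = padding
        "input_type_ids": [0] * seq_len,                   # single segment: all zeros
    }

# [CLS] i deposited money at the bank [SEP]
ids = [101, 1045, 14140, 2769, 2012, 1996, 2924, 102]
packed = pack_single_segment(ids)
print(packed["input_word_ids"][:10])
print(sum(packed["input_mask"]), "real tokens out of", len(packed["input_mask"]))
```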

In [46]:


bert_model = hub.KerasLayer(encoder_url)




In [50]:
bert_results = bert_model(text_preprocessed)

print(f'Loaded BERT: {encoder_url}')
print(f'Pooled Outputs Shape:{bert_results["pooled_output"].shape}')
print(f'Pooled Outputs Values:{bert_results["pooled_output"][0, :12]}')
print(f'Sequence Outputs Shape:{bert_results["sequence_output"].shape}')
print(f'Sequence Outputs Values:{bert_results["sequence_output"][0, :12]}')


Loaded BERT: https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4
Pooled Outputs Shape:(2, 768)
Pooled Outputs Values:[-0.6335036  -0.15912798  0.70064145  0.23789899 -0.36264604 -0.03991608
  0.7103103   0.06206353  0.5491746  -0.9981458   0.27230787 -0.00536387]
Sequence Outputs Shape:(2, 128, 768)
Sequence Outputs Values:[[ 0.2648022   0.31464565  0.07354611 ...  0.03115616  0.00530342
   0.24055707]
 [ 0.70859504 -0.01121734  0.0890084  ... -0.49432442  0.5450213
  -0.03496337]
 [ 0.18905516 -0.3614526   0.4886495  ...  0.06284427 -0.29083413
  -0.02081757]
 ...
 [ 0.09602314 -0.09523273  0.30062553 ...  0.6033697   0.10209306
  -0.11433922]
 [ 0.16647843 -0.08120763  0.3003333  ...  0.49049643  0.11249188
   0.00384711]
 [ 0.00185447 -0.12702593  0.33271322 ...  0.6269296   0.09413846
  -0.0849941 ]]


In [51]:
bert_results['encoder_outputs'][-1] == bert_results['sequence_output']

<tf.Tensor: shape=(2, 128, 768), dtype=bool, numpy=
array([[[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]]])>

In [52]:
bert_results['sequence_output'][0][6] == bert_results['sequence_output'][1][6]

<tf.Tensor: shape=(768,), dtype=bool, numpy=
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  Tr

In [19]:
bert_results['sequence_output'][0][127]

<tf.Tensor: shape=(768,), dtype=float32, numpy=
array([ 8.51813182e-02,  2.93973964e-02,  5.13733447e-01, -6.48406148e-02,
        8.87904018e-02,  5.77466339e-02, -4.64034341e-02,  4.96545017e-01,
        1.86256886e-01, -3.35414618e-01,  7.81645030e-02,  1.94389895e-01,
        1.81618601e-01,  4.08956781e-02, -1.48688063e-01, -9.82638970e-02,
        4.77720708e-01,  2.02201843e-01,  3.47057849e-01, -1.30449295e-01,
       -1.54556870e-01, -1.61552489e-01,  2.20342517e-01,  4.09840010e-02,
        2.22101226e-01,  7.76365250e-02, -4.95887011e-01,  2.25153744e-01,
       -1.93176344e-01, -3.03927977e-02,  4.32102889e-01, -7.72122853e-03,
        1.81106403e-01,  2.51153857e-01, -2.51191556e-01,  7.61121511e-04,
       -2.83252716e-01, -2.69751638e-01, -5.14041930e-02, -3.27342242e-01,
        1.50985494e-02, -3.07712350e-02,  5.10364398e-02, -2.06841871e-01,
       -6.79815173e-01, -5.20147011e-02, -1.26541898e-01, -1.03230052e-01,
       -4.17382210e-01, -5.59072137e-01, -2.21037082
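
Position 127 is a padding position, yet its hidden vector is not zero. If the sequence outputs are averaged into a single sentence vector, one common approach is to weight the positions by `input_mask` so that padding is ignored. The sketch below shows that idea on plain lists; `masked_mean` is a hypothetical helper and the numbers are made up.

```python
def masked_mean(sequence_output, input_mask):
    """Average hidden vectors over the positions where input_mask == 1."""
    hidden = len(sequence_output[0])
    total = [0.0] * hidden
    count = 0
    for vec, keep in zip(sequence_output, input_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    if count == 0:
        raise ValueError("input_mask selects no positions")
    return [t / count for t in total]

# Two real positions followed by one padding position with a non-zero vector
seq = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(masked_mean(seq, mask))
```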

In [6]:
pip install tf-models-official --upgrade --force-reinstall --user


Collecting tf-models-official
  Using cached tf_models_official-2.13.1-py2.py3-none-any.whl (2.6 MB)
Collecting psutil>=5.4.3
  Using cached psutil-5.9.5-cp36-abi3-win_amd64.whl (255 kB)
Collecting tensorflow-datasets
  Using cached tensorflow_datasets-4.9.2-py3-none-any.whl (5.4 MB)
Collecting sentencepiece
  Using cached sentencepiece-0.1.99-cp39-cp39-win_amd64.whl (977 kB)
Collecting scipy>=0.19.1
  Using cached scipy-1.11.2-cp39-cp39-win_amd64.whl (44.1 MB)
Collecting oauth2client
  Using cached oauth2client-4.1.3-py2.py3-none-any.whl (98 kB)
Collecting gin-config
  Using cached gin_config-0.5.0-py3-none-any.whl (61 kB)
Collecting seqeval
  Using cached seqeval-1.2.2-py3-none-any.whl
Collecting numpy>=1.20
  Using cached numpy-1.25.2-cp39-cp39-win_amd64.whl (15.6 MB)
Collecting immutabledict
  Using cached immutabledict-3.0.0-py3-none-any.whl (4.0 kB)
Collecting tf-models-official
  Using cached tf_models_official-2.13.0-py2.py3-none-any.whl (2.6 MB)
  Using cached tf_models_offici

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
aiohttp 3.8.3 requires charset-normalizer<3.0,>=2.0, but you have charset-normalizer 3.2.0 which is incompatible.


## Sentiment Classification (Fine-tuning)

![image.png](attachment:image.png)

In [20]:
import os
import shutil

import tensorflow as tf
#from tensorflow_models import nlp
#from official.nlp import optimization  # to create AdamW optimizer

#import matplotlib.pyplot as plt

tf.get_logger().setLevel('ERROR')


In [21]:
url = 'https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'



In [53]:
dataset = tf.keras.utils.get_file('aclImdb_v1.tar.gz', url,
                                  untar=True, cache_dir='.',
                                  cache_subdir='')

dataset_dir = os.path.join(os.path.dirname(dataset), 'aclImdb')

train_dir = os.path.join(dataset_dir, 'train')

# remove unused folders to make it easier to load the data
# (ignore_errors lets the cell be re-run after the folder is already gone)
remove_dir = os.path.join(train_dir, 'unsup')
shutil.rmtree(remove_dir, ignore_errors=True)


In [23]:
AUTOTUNE = tf.data.AUTOTUNE
batch_size = 32
seed = 42

raw_train_ds = tf.keras.utils.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='training',
    seed=seed)

class_names = raw_train_ds.class_names
train_ds = raw_train_ds.cache().prefetch(buffer_size=AUTOTUNE)

val_ds = tf.keras.utils.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='validation',
    seed=seed)

val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

test_ds = tf.keras.utils.text_dataset_from_directory(
    'aclImdb/test',
    batch_size=batch_size)

test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)


Found 25000 files belonging to 2 classes.
Using 20000 files for training.
Found 25000 files belonging to 2 classes.
Using 5000 files for validation.
Found 25000 files belonging to 2 classes.


In [24]:
for text_batch, label_batch in train_ds.take(1):
  for i in range(3):
    print(f'Review: {text_batch.numpy()[i]}')
    label = label_batch.numpy()[i]
    print(f'Label : {label} ({class_names[label]})')


Review: b'"Pandemonium" is a horror movie spoof that comes off more stupid than funny. Believe me when I tell you, I love comedies. Especially comedy spoofs. "Airplane", "The Naked Gun" trilogy, "Blazing Saddles", "High Anxiety", and "Spaceballs" are some of my favorite comedies that spoof a particular genre. "Pandemonium" is not up there with those films. Most of the scenes in this movie had me sitting there in stunned silence because the movie wasn\'t all that funny. There are a few laughs in the film, but when you watch a comedy, you expect to laugh a lot more than a few times and that\'s all this film has going for it. Geez, "Scream" had more laughs than this film and that was more of a horror film. How bizarre is that?<br /><br />*1/2 (out of four)'
Label : 0 (neg)
Review: b"David Mamet is a very interesting and a very un-equal director. His first movie 'House of Games' was the one I liked best, and it set a series of films with characters whose perspective of life changes as they

![image.png](attachment:image.png)

In [25]:
def build_classifier_model():
  text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
  preprocessing_layer = hub.KerasLayer(preprocess_url, name='preprocessing')
  encoder_inputs = preprocessing_layer(text_input)
  encoder = hub.KerasLayer(encoder_url, trainable=True, name='BERT_encoder')
  outputs = encoder(encoder_inputs)
  net = outputs['pooled_output']
  net = tf.keras.layers.Dropout(0.1)(net)
  net = tf.keras.layers.Dense(1, activation=None, name='classifier')(net)
  return tf.keras.Model(text_input, net)


In [26]:
classifier_model = build_classifier_model()
bert_raw_result = classifier_model(tf.constant(text_test))
print(tf.sigmoid(bert_raw_result))


tf.Tensor(
[[0.6851285]
 [0.6978967]], shape=(2, 1), dtype=float32)
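
The classifier head returns one raw logit per input, and `tf.sigmoid` turns it into a probability for the positive class. A minimal sketch of that mapping and of turning the probability into a label is shown below. The names `sigmoid` and `to_label` are hypothetical, and the class order `["neg", "pos"]` is an assumption based on label 0 being `neg` in this notebook.

```python
import math

CLASS_NAMES = ["neg", "pos"]   # assumed order: index 0 = neg, index 1 = pos

def sigmoid(logit):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def to_label(logit, threshold=0.5):
    """Return (class name, probability of the positive class)."""
    p = sigmoid(logit)
    return CLASS_NAMES[int(p >= threshold)], p

for logit in (-2.0, 0.0, 2.0):
    print(logit, to_label(logit))
```

Before fine-tuning, the head is randomly initialized, so scores near 0.5 to 0.7 such as those printed above carry no sentiment information yet.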


In [30]:
pip install graphviz

Note: you may need to restart the kernel to use updated packages.




Collecting graphviz
  Downloading graphviz-0.20.1-py3-none-any.whl (47 kB)
Installing collected packages: graphviz
Successfully installed graphviz-0.20.1


In [33]:
tf.keras.utils.plot_model(classifier_model)



You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) for plot_model to work.


In [32]:
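loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
# The model outputs raw logits. BinaryAccuracy compares its inputs against
# threshold=0.5 by default; threshold=0.0 would correspond to a probability of 0.5.
metrics = tf.metrics.BinaryAccuracy()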
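
With `from_logits=True` the loss is computed directly from the raw logit rather than from a probability. The sketch below shows one numerically stable way to write binary cross entropy on a logit. It illustrates the formula only, it is not the Keras implementation, and `bce_with_logits` is a hypothetical name.

```python
import math

def bce_with_logits(logit, label):
    """Binary cross entropy for one example, computed from a raw logit.

    Equivalent to -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit),
    rearranged so that large |logit| values do not overflow.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

print(bce_with_logits(0.0, 1))      # uninformative prediction
print(bce_with_logits(4.0, 1))      # confident and correct: small loss
print(bce_with_logits(4.0, 0))      # confident and wrong: large loss
```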


In [33]:
from official.nlp import optimization  # to create AdamW optimizer

In [34]:
epochs = 5
steps_per_epoch = tf.data.experimental.cardinality(train_ds).numpy()
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1*num_train_steps)

init_lr = 3e-5
optimizer = optimization.create_optimizer(init_lr=init_lr,
                                          num_train_steps=num_train_steps,
                                          num_warmup_steps=num_warmup_steps,
                                          optimizer_type='adamw')
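
The warmup settings above can be made concrete with a small schedule sketch. It assumes linear warmup to `init_lr` followed by linear decay to zero, with the decay measured from step 0. Whether the decay clock starts at step 0 or after warmup differs between implementations, so treat `lr_at_step` as a hypothetical illustration rather than the behaviour of `create_optimizer`.

```python
def lr_at_step(step, init_lr, num_train_steps, num_warmup_steps, end_lr=0.0):
    """Linear warmup to init_lr, then linear decay towards end_lr."""
    if step < num_warmup_steps:
        return init_lr * step / num_warmup_steps
    progress = min(step, num_train_steps) / num_train_steps
    return (init_lr - end_lr) * (1.0 - progress) + end_lr

# Same arithmetic as the cell above: 20,000 training files in batches of 32
steps_per_epoch = 20000 // 32
num_train_steps = steps_per_epoch * 5
num_warmup_steps = int(0.1 * num_train_steps)
print(steps_per_epoch, num_train_steps, num_warmup_steps)

for step in (0, 156, 312, 1500, 3125):
    print(step, lr_at_step(step, 3e-5, num_train_steps, num_warmup_steps))
```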


In [35]:
classifier_model.compile(optimizer=optimizer,
                         loss=loss,
                         metrics=metrics)


In [36]:
print(f'Training model with {encoder_url}')
history = classifier_model.fit(x=train_ds,
                               validation_data=val_ds,
                               epochs=epochs)


Training model with https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4
Epoch 1/5


KeyboardInterrupt: 

In [41]:
loss, accuracy = classifier_model.evaluate(test_ds)

print(f'Loss: {loss}')
print(f'Accuracy: {accuracy}')


Loss: 0.606525719165802
Accuracy: 0.8858000040054321


In [54]:
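dataset_name = 'imdb'
saved_model_path = './{}_bert'.format(dataset_name.replace('/', '_'))
# Assumes the fine-tuned classifier was exported to this path beforehand,
# e.g. with classifier_model.save(saved_model_path, include_optimizer=False)
reloaded_model = tf.saved_model.load(saved_model_path)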


In [57]:
def print_my_examples(inputs, results):
  result_for_printing = \
    [f'input: {inputs[i]:<30} : score: {results[i][0]:.6f}'
                         for i in range(len(inputs))]
  print(*result_for_printing, sep='\n')
  print()


examples = [
    'Tranformers are outdated.',
    'I love LLMs!', 
    'BERT Implementation is complex',
    'This was not good enough .',    
    'The movie was terrible...'
]

reloaded_results = tf.sigmoid(reloaded_model(tf.constant(examples)))
#original_results = tf.sigmoid(classifier_model(tf.constant(examples)))


print('Results from the saved model:')
print_my_examples(examples, reloaded_results)



Results from the saved model:
input: Tranformers are outdated.      : score: 0.137959
input: I love LLMs!                   : score: 0.968002
input: BERT Implementation is complex : score: 0.943607
input: This was not good enough .     : score: 0.063631
input: The movie was terrible...      : score: 0.001397



## Spam classification using DistilBERT

### Importing the dataset

In [None]:


import pandas as pd
# Tab-separated file without a header row: first column is the label, second the text
df = pd.read_csv('SMSSpamCollection', sep='\t',
                 names=["label", "message"])
df.head()


In [22]:
df.shape
     

(5572, 2)

In [23]:

X=list(df['message'])  
y=list(df['label'])

In [24]:

y=list(pd.get_dummies(y,drop_first=True)['spam'])

In [25]:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
     

In [26]:
X

['Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...',
 'Ok lar... Joking wif u oni...',
 "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's",
 'U dun say so early hor... U c already then say...',
 "Nah I don't think he goes to usf, he lives around here though",
 "FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv",
 'Even my brother is not like to speak with me. They treat me like aids patent.',
 "As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callertune for all Callers. Press *9 to copy your friends Callertune",
 'WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only.',
 'Had you

### Encoding the inputs

In [17]:

from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
     

In [18]:


train_encodings = tokenizer(X_train, truncation=True, padding=True)
test_encodings = tokenizer(X_test, truncation=True, padding=True)

In [27]:
y_train

[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,


In [28]:
train_encodings[0]

Encoding(num_tokens=238, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])

In [29]:


import tensorflow as tf

train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    y_train
))

test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    y_test
))

### Fine-tuning the DistilBERT model

In [30]:


from transformers import TFDistilBertForSequenceClassification, TFTrainer, TFTrainingArguments

training_args = TFTrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=2,              # total number of training epochs
    per_device_train_batch_size=8,  # batch size per device during training
    per_device_eval_batch_size=16,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    eval_steps = 10,
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)
     


In [31]:
with training_args.strategy.scope():
    model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

trainer = TFTrainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=test_dataset            # evaluation dataset
)

trainer.train()

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDistilBertForSequenceClassification: ['vocab_layer_norm.bias', 'vocab_projector.bias', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_transform.weight']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFDistilBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']
You should 

In [32]:

trainer.evaluate(test_dataset)
     
trainer.predict(test_dataset)

PredictionOutput(predictions=array([[ 3.1022956, -3.1589427],
       [-3.0821607,  3.136658 ],
       [ 3.0509737, -3.1124434],
       ...,
       [ 3.0331042, -3.034146 ],
       [-3.1517704,  3.2355945],
       [ 2.8066628, -2.850092 ]], dtype=float32), label_ids=array([0, 1, 0, ..., 0, 1, 0]), metrics={'eval_loss': 0.021236983367374965})
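
The `predictions` array holds two raw logits per message, one per class. A minimal sketch of turning such rows into class ids and probabilities is shown below, using the first three rows printed above. The helpers `softmax` and `predicted_class` are hypothetical names written for this example.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_class(logits):
    """Index of the largest logit (0 = ham, 1 = spam in this notebook)."""
    return max(range(len(logits)), key=lambda i: logits[i])

rows = [[3.1022956, -3.1589427],
        [-3.0821607, 3.136658],
        [3.0509737, -3.1124434]]
classes = [predicted_class(r) for r in rows]
print(classes)
print([round(max(softmax(r)), 4) for r in rows])
```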

In [33]:

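import numpy as np

# predict() returns (predictions, label_ids, metrics). Index 1 is label_ids,
# i.e. the ground truth, so the predicted class is the argmax over the logits.
output = np.argmax(trainer.predict(test_dataset).predictions, axis=1)

In [34]:

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, output)
cm

array([[955,   0],
       [  0, 160]], dtype=int64)

The matrix shown here was recorded with `output` set to `trainer.predict(test_dataset)[1]`, which is `label_ids`. Comparing the labels with themselves always yields a perfectly diagonal matrix, so it should not be read as the model's accuracy.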

## Zero-shot inference


In [3]:
from transformers import pipeline

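# Name the task explicitly instead of relying on it being inferred from the model
pipe = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")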
pipe("I have a problem with my iphone that needs to be resolved asap!",
    candidate_labels=["urgent","not urgent", "phone", "tablet", "computer"],
)



{'sequence': 'I have a problem with my iphone that needs to be resolved asap!',
 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'],
 'scores': [0.5227571725845337,
  0.458141028881073,
  0.014264789409935474,
  0.0026849950663745403,
  0.0021520687732845545]}
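
In the default single-label mode, the scores behave like a probability distribution over the candidate labels, which is why `urgent` and `phone` compete for the same mass. The sketch below checks this on the printed scores and shows how a softmax over per-label entailment logits would produce such a ranking. `rank_labels` and the logits passed to it are hypothetical, chosen only for illustration.

```python
import math

def rank_labels(entailment_logits):
    """Softmax over per-label entailment logits, sorted from most to least likely."""
    m = max(entailment_logits.values())
    exps = {label: math.exp(z - m) for label, z in entailment_logits.items()}
    total = sum(exps.values())
    scores = {label: e / total for label, e in exps.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Scores copied from the output above
printed = [0.5227571725845337, 0.458141028881073, 0.014264789409935474,
           0.0026849950663745403, 0.0021520687732845545]
print(sum(printed))

# Hypothetical entailment logits, for illustration only
ranked = rank_labels({"urgent": 2.0, "phone": 1.9, "tablet": -3.0})
print(ranked)
```

If the labels are not mutually exclusive, as with `urgent` and `phone` here, the pipeline's `multi_label=True` option scores each label independently instead.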