# CS549 Machine Learning - Irfan Khan
# Assignment 11: Transformer and Transformer-based Models

Updated Assignment designed by Yang Xu, Ex-Assistant Professor of Computer Science, San Diego State University

**Total points: 15**

In this assignment, you will do two things: 
1) Implement the **multi-head attention** sublayer of a transformer encoder.
2) Experiment with the transformer-based models provided by the **transformers** library on several natural language processing (NLP) tasks.

https://colab.research.google.com/drive/1rPk3ohrmVclqhH7uQ7qys4oznDdAhpzF#scrollTo=jUTNr15JBkSG

# Import Libraries and set Seed

In [1]:
import math
import numpy as np
import torch
import torch.nn as nn
from scipy.special import softmax
from torch.nn.functional import cosine_similarity

import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"   # This is just for my end, messed up my installs

#print(torch.__version__)

## Task 1. Implement the multi-head attention sublayer
**Total Points: 8**

### 1.1 Initialize input data

***1 point***<br>
Step 1, generate some random input data of shape $m_{inputs} \times d_{model}$. *Hint*: Use `np.random.rand()`.

In [2]:
np.random.seed(0) # Do not remove this line

d_model = 512
m_inputs = 3

### START YOUR CODE ###
x = np.random.rand(m_inputs, d_model)
### END YOUR CODE ###

In [3]:
# Do not change the code in this cell
print('x:', x)
print('x.shape:', x.shape)

x: [[0.5488135  0.71518937 0.60276338 ... 0.44613551 0.10462789 0.34847599]
 [0.74009753 0.68051448 0.62238443 ... 0.6204999  0.63962224 0.9485403 ]
 [0.77827617 0.84834527 0.49041991 ... 0.07382628 0.49096639 0.7175595 ]]
x.shape: (3, 512)


**Expected output**\
x: [[0.5488135  0.71518937 0.60276338 ... 0.44613551 0.10462789 0.34847599]\
 [0.74009753 0.68051448 0.62238443 ... 0.6204999  0.63962224 0.9485403 ]\
 [0.77827617 0.84834527 0.49041991 ... 0.07382628 0.49096639 0.7175595 ]]\
x.shape: (3, 512)
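
The role of the seed line can be checked with a short sketch. It assumes only the legacy `np.random.seed` / `np.random.rand` calls used in the cell above; the names `x_first` and `x_second` are illustrative and not part of the assignment.

```python
import numpy as np

d_model, m_inputs = 512, 3

np.random.seed(0)
x_first = np.random.rand(m_inputs, d_model)

np.random.seed(0)  # re-seeding restarts the generator from the same state
x_second = np.random.rand(m_inputs, d_model)

# Both draws are identical because they start from the same seed.
same = np.array_equal(x_first, x_second)
print(same, x_first.shape)
```

Removing the seed line would make `x` differ on every run, and the values printed here would no longer match the expected output.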

---
### 1.2 Create weight matrices for *query*, *key*, and *value*

***2 points***

Step 2, create the weight matrices with the correct dimensions.

Let's start with `W_query` and `Q`. *Hint*: First create an empty tensor `W` of shape `(d_model, d_k)` using the `torch.empty()` function, then fill it in place with `nn.init.xavier_uniform_()`.

After `W_query` is initialized, we obtain the query matrix `Q` by multiplying `x` by `W_query`. *Hint*: Use `np.matmul()`.

In [4]:
torch.manual_seed(0) # Do not remove this line

n_heads = 8
d_k = d_model // n_heads #Integer division

### START YOUR CODE ###
# Create an empty tensor W with the correct dimension.
W = torch.empty((d_model, d_k))
### END YOUR CODE ###

nn.init.xavier_uniform_(W) # Randomly initialize the values in the tensor.
W_query = W.data.numpy() # NumPy view of the tensor's data (shares memory with W)

### START YOUR CODE ###
#Hint: use np.matmul()
Q = np.matmul(x, W_query)

### END YOUR CODE ###

In [5]:
# Do not change the code in this cell
print('W_query[0,:5]:', W_query[0,:5])
print('W_query.shape:', W_query.shape)
print('Q[0, :5]:', Q[0,:5])
print('Q.shape:', Q.shape)

W_query[0,:5]: [-0.00076412  0.05475055 -0.0840017  -0.07511146 -0.03930965]
W_query.shape: (512, 64)
Q[0, :5]: [-0.22772415  0.48167861  1.48693408 -1.00410576  0.19323685]
Q.shape: (3, 64)


**Expected output**\
W_query[0,:5]: [-0.00076412  0.05475055 -0.0840017  -0.07511146 -0.03930965]\
W_query.shape: (512, 64)\
Q[0, :5]: [-0.22772415  0.48167861  1.48693408 -1.00410576  0.19323685]\
Q.shape: (3, 64)
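
For intuition about what the initializer does, here is a NumPy-only sketch. It assumes the standard Xavier/Glorot uniform rule with gain 1, which samples from $U(-a, a)$ with $a = \sqrt{6 / (fan_{in} + fan_{out})}$. The names `W_sketch`, `x_sketch` and `Q_sketch` are illustrative, and the sampled values will not match PyTorch's because the random generators differ.

```python
import math
import numpy as np

d_model, n_heads = 512, 8
d_k = d_model // n_heads

# Xavier/Glorot uniform bound for a 2-D weight: sqrt(6 / (fan_in + fan_out)).
bound = math.sqrt(6.0 / (d_model + d_k))

rng = np.random.default_rng(0)  # NumPy generator; values differ from torch's
W_sketch = rng.uniform(-bound, bound, size=(d_model, d_k))

# Project a small random input, as in the cell above.
x_sketch = rng.random((3, d_model))
Q_sketch = x_sketch @ W_sketch

print(round(bound, 4), W_sketch.shape, Q_sketch.shape)
```

With `d_model = 512` and `d_k = 64` the bound is about 0.102, which is consistent with the magnitudes of the printed `W_query[0,:5]` entries.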

---
Next, repeat for `W_key` & `K`, and `W_value` & `V`.

***2 points, 1 point for each of the two cells below***

In [6]:
torch.manual_seed(1) # Do not remove this line

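### START YOUR CODE ###
# Create a fresh tensor so that W_key does not share memory with W_query.
W = torch.empty((d_model, d_k))
### END YOUR CODE ###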

nn.init.xavier_uniform_(W)
W_key = W.data.numpy()

### START YOUR CODE ###
K = np.matmul(x, W_key)
### END YOUR CODE ###

In [7]:
torch.manual_seed(2) # Do not remove this line

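### START YOUR CODE ###
# Create a fresh tensor so that W_value does not share memory with W_key.
W = torch.empty((d_model, d_k))
### END YOUR CODE ###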

nn.init.xavier_uniform_(W)
W_value = W.data.numpy()

### START YOUR CODE ###
V = np.matmul(x, W_value)
### END YOUR CODE ###

In [8]:
# Do not change the code in this cell
print('K[0,:5]', K[0,:5])
print('K.shape', K.shape)
print('V[0,:5]', V[0,:5])
print('V.shape', V.shape)

K[0,:5] [ 0.2283654  -0.65482728 -0.07202067  0.49886374  0.57045028]
K.shape (3, 64)
V[0,:5] [-0.44997754  0.92097362 -0.76932045  0.03289757 -0.49462588]
V.shape (3, 64)


**Expected output**\
K[0,:5] [ 0.2283654  -0.65482728 -0.07202067  0.49886374  0.57045028]\
K.shape (3, 64)\
V[0,:5] [-0.44997754  0.92097362 -0.76932045  0.03289757 -0.49462588]\
V.shape (3, 64)

---
### 1.3 Compute attention scores and weighted output

***2 points***

Step 3, compute the attention scores using the matrices `Q` and `K`, following the equation:

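\begin{equation}
\mathrm{Attention}(Q, K) = \mathrm{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right)
\end{equation}

where dividing by $\sqrt{d_k}$ keeps the dot products from growing with the key dimension, so that the softmax does not saturate.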
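
To see the effect of the $\sqrt{d_k}$ factor numerically, the following sketch draws random query and key vectors and compares the variance of their dot products before and after scaling. The standard-normal inputs are an assumption made for illustration; they are not the `Q` and `K` of this assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 64
n_samples = 10_000

# Independent standard-normal queries and keys (illustrative assumption).
q = rng.standard_normal((n_samples, d_k))
k = rng.standard_normal((n_samples, d_k))

raw = np.sum(q * k, axis=1)      # one dot product per sample
scaled = raw / np.sqrt(d_k)      # the scaling used in the equation

var_raw = raw.var()
var_scaled = scaled.var()
print(var_raw, var_scaled)
```

Under this assumption the unscaled variance is close to `d_k`, while the scaled variance stays close to 1 regardless of the key dimension.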

*Hint*: You should first compute `attn_scores`, which is the unnormalized score. Then you can apply the `softmax()` function imported from `scipy` to calculate the normalized scores. Note that you need to specify the `axis` argument correctly when you call `softmax()`.

In [11]:
### START YOUR CODE ###
attn_scores = np.dot(Q,K.T) / np.sqrt(d_k)
### END YOUR CODE ###

### START YOUR CODE ###
# compute attn_scores_norm
attn_scores_norm = softmax(attn_scores, axis=1)
### END YOUR CODE ###

In [12]:
# Do not change the code in this cell
print('attn_scores.shape:', attn_scores.shape)
print('Unnormalized attn_scores:', attn_scores)
print('Normalized atten_scores:', attn_scores_norm)

attn_scores.shape: (3, 3)
Unnormalized attn_scores: [[-0.75497307 -0.97036233 -0.85112729]
 [ 0.23777018 -0.70730381 -0.37639239]
 [ 0.21608578 -0.73905372 -0.89881112]]
Normalized atten_scores: [[0.36838498 0.29700212 0.33461289]
 [0.51820328 0.20140013 0.2803966 ]
 [0.58387084 0.22464925 0.19147991]]


**Expected output**\
attn_scores.shape: (3, 3)\
Unnormalized attn_scores: [[-0.75497307 -0.97036233 -0.85112729]\
 [ 0.23777018 -0.70730381 -0.37639239]\
 [ 0.21608578 -0.73905372 -0.89881112]]\
Normalized atten_scores: [[0.36838498 0.29700212 0.33461289]\
 [0.51820328 0.20140013 0.2803966 ]\
 [0.58387084 0.22464925 0.19147991]]
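
As a cross-check of the `axis` choice, here is a minimal NumPy softmax applied row by row to the unnormalized scores printed above. The helper name `softmax_rows` is illustrative; it is a sketch of the same computation, not SciPy's implementation.

```python
import numpy as np

def softmax_rows(scores):
    """Softmax over the last axis, shifted by the row max for stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Unnormalized scores copied from the printed output above.
attn_scores = np.array([
    [-0.75497307, -0.97036233, -0.85112729],
    [ 0.23777018, -0.70730381, -0.37639239],
    [ 0.21608578, -0.73905372, -0.89881112],
])

attn_norm = softmax_rows(attn_scores)
print(attn_norm.sum(axis=1))  # each row sums to 1
print(attn_norm)
```

Each row is normalized independently because row `i` holds the scores of query `i` against every key, which is why `axis=1` is the right choice in the cell above.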

---

### Step 4
***1 point***<br>
Finally, compute the output as the weighted sum of the rows of the value matrix `V`, using the `attn_scores_norm` computed above as the weights.

*Hint*: `attn_scores_norm[0,:]` holds the weights for the first output row `weighted_output[0,:]`, \
so the computation is:\
`weighted_output[0,:] = attn_scores_norm[0,0] * V[0,:] + attn_scores_norm[0,1] * V[1,:] + attn_scores_norm[0,2] * V[2,:]`. \
You can compute all rows at once in one line of code using the `@` operator.

In [13]:
### START YOUR CODE ###

weighted_output = np.dot(attn_scores_norm, V)
### END YOUR CODE ###

# Do not change the code below
print('weighted_output[0,:5]:', weighted_output[0,:5])
print('weighted_output.shape:', weighted_output.shape)

weighted_output[0,:5]: [-0.37040031  0.493314   -0.78595572  0.09711595 -0.33551551]
weighted_output.shape: (3, 64)


**Expected output**\
weighted_output[0,:5]: [-0.37040031  0.493314   -0.78595572  0.09711595 -0.33551551]\
weighted_output.shape: (3, 64)
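
The equivalence between the explicit weighted sum and a single matrix product can be verified on small stand-in arrays. The names `weights` and `V_sketch` are illustrative and use random values rather than the tensors of this assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((3, 3))
weights /= weights.sum(axis=1, keepdims=True)  # rows sum to 1, like softmax output
V_sketch = rng.standard_normal((3, 4))         # small stand-in for V

# Explicit weighted sum for each output row.
explicit = np.zeros((3, 4))
for i in range(3):
    for j in range(3):
        explicit[i] += weights[i, j] * V_sketch[j]

# The same computation as a single matrix product.
matmul = weights @ V_sketch

print(np.allclose(explicit, matmul))
```

For 2-D arrays, `np.dot(a, b)`, `np.matmul(a, b)` and `a @ b` all compute this same product.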

---
**Congratulations!** You have finished Task 1, and now you know how to implement the self-attention module, which is the core technique of the Transformer.
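
The steps above compute a single attention head of width `d_k = d_model // n_heads`. As a rough sketch of how the heads combine, the code below repeats that computation for each head with its own projections, concatenates the results, and applies an output projection. The random weights, the helper names and the output matrix `W_o` are assumptions for illustration; they are not part of the assignment.

```python
import numpy as np

def softmax_rows(scores):
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def single_head(x, W_q, W_k, W_v):
    """Scaled dot-product attention for one head."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(W_q.shape[1])
    return softmax_rows(scores) @ V

rng = np.random.default_rng(0)
d_model, n_heads, m_inputs = 512, 8, 3
d_k = d_model // n_heads
x_sketch = rng.random((m_inputs, d_model))

def rand_w(rows, cols):
    # Placeholder initialization; a real layer would use learned weights.
    return rng.standard_normal((rows, cols)) / np.sqrt(rows)

heads = [
    single_head(x_sketch, rand_w(d_model, d_k), rand_w(d_model, d_k), rand_w(d_model, d_k))
    for _ in range(n_heads)
]
concat = np.concatenate(heads, axis=1)   # (m_inputs, n_heads * d_k)
W_o = rand_w(d_model, d_model)           # hypothetical output projection
multi_head_output = concat @ W_o

print(concat.shape, multi_head_output.shape)
```

Because `n_heads * d_k` equals `d_model`, the concatenated heads have the same width as the input, so the sublayer output can be added back to `x` in a residual connection.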

## Task 2. Play with transformer-based models
**Points: 7**

### 2.1 Installation
If it is not already installed, install the *transformers* package.

Once it is installed, you can load a pretrained BERT model and tokenizer like this (you can ignore the warnings):

In [14]:
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained("bert-base-uncased")


### 2.2 Tokenizing inputs

Run the following examples

In [15]:
text = """The hotness of the sun and the coldness of the outer space are inexhaustible thermodynamic
resources for human beings. From a thermodynamic point of view, any energy conversion systems
that receive energy from the sun and/or dissipate energy to the universe are heat engines with
photons as the "working fluid" and can be analyzed using the concept of entropy. While entropy
analysis provides a particularly convenient way to understand the efficiency limits, it is typically
taught in the context of thermodynamic cycles among quasi-equilibrium states and its
generalization to solar energy conversion systems running in a continuous and non-equilibrium
fashion is not straightforward. In this educational article, we present a few examples to illustrate
how the concept of photon entropy, combined with the radiative transfer equation, can be used to
analyze the local entropy generation processes and the efficiency limits of different solar energy
conversion systems. We provide explicit calculations for the local and total entropy generation
rates for simple emitters and absorbers, as well as photovoltaic cells, which can be readily
reproduced by students. We further discuss the connection between the entropy generation and the
device efficiency, particularly the exact spectral matching condition that is shared by infinitejunction photovoltaic cells and reversible thermoelectric materials to approach their theoretical
efficiency limit."""

encoded_input = tokenizer(text, return_tensors='pt')

print(len(text.split()))
print(encoded_input['input_ids'].shape)


211
torch.Size([1, 275])


### Expected Output

211<br>
torch.Size([1, 275])


Can you explain why the `encoded_input` has more elements than the actual number of words in `text`?\
(**Points: 2**)

In [None]:
### Write your answer within the quotes ###
answer = """
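The tokenizer does not split on whitespace. It breaks the text into WordPiece sub-word tokens, so a word that is not in the vocabulary
is split into several pieces (for example 'hotness' becomes 'hot' and '##ness'). Punctuation marks become separate tokens,
and the special tokens [CLS] and [SEP] are added at the start and end of the sequence.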
"""
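
The gap between word count and token count can be illustrated with a toy greedy longest-match-first splitter in the spirit of WordPiece. The vocabulary `toy_vocab` and the function `wordpiece` are hypothetical; the real BERT tokenizer also lowercases the text, splits punctuation, and uses a vocabulary of roughly 30,000 entries.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedily split one word into the longest vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces are prefixed
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"the", "of", "sun", "hot", "cold", "##ness"}
words = "the hotness of the sun".split()

tokens = ["[CLS]"]
for w in words:
    tokens.extend(wordpiece(w, toy_vocab))
tokens.append("[SEP]")

print(len(words), len(tokens))
print(tokens)
```

Five whitespace-separated words become eight tokens here: one extra from splitting "hotness" and two from the special tokens.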


---

### 2.3 Output word vectors from BERT

In [16]:
output = model(**encoded_input)

last_hidden_state = output['last_hidden_state']

print(last_hidden_state.shape)

torch.Size([1, 275, 768])


### Expected output

torch.Size([1, 275, 768])

With the following code, you can find the corresponding token of each integer id in `input_ids`.

In [17]:
input_ids_pt = encoded_input['input_ids']
input_ids_list = input_ids_pt.tolist()[0]
input_tokens = tokenizer.convert_ids_to_tokens(input_ids_list)

print(input_ids_list[:10])
print(input_tokens[:10])

[101, 1996, 2980, 2791, 1997, 1996, 3103, 1998, 1996, 3147]
['[CLS]', 'the', 'hot', '##ness', 'of', 'the', 'sun', 'and', 'the', 'cold']


### Expected Output

[101, 1996, 2980, 2791, 1997, 1996, 3103, 1998, 1996, 3147]<br>
['[CLS]', 'the', 'hot', '##ness', 'of', 'the', 'sun', 'and', 'the', 'cold']

Can you find the output vector**s** in `last_hidden_state` that correspond to the input word "entropy"?\
Do they have the same values?\
**(Points: 2)**

*Hint*: You can use an `if` statement to check whether the current token is the word "entropy", and if so, append its output vector to `vectors`.

In [36]:
vectors = []
for i, token in enumerate(input_tokens):
    ### START YOUR CODE ###
    if token == "entropy":
        vectors.append(last_hidden_state[0,i])
    ### END YOUR CODE ###
# Do not change the code below
print('Number of "entropy":', len(vectors))

matches = [torch.allclose(vectors[i], vectors[i+1]) for i in range(len(vectors)-1)]
print(f'Do they have the same value? {matches}')

Number of "entropy": 6
Do they have the same value? [False, False, False, False, False]


**Expected output:** \
Number of "entropy": 6\
Do they have the same value? [False, False, False, False, False]
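
One ingredient in why the same word yields different vectors is that BERT adds position embeddings to its inputs before attention is applied. The sketch below uses a single toy self-attention layer, not BERT itself, and its weights, token vectors and `positions` array are random placeholders chosen for illustration.

```python
import numpy as np

def softmax_rows(scores):
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def self_attention(x, W_q, W_k, W_v):
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    return softmax_rows(Q @ K.T / np.sqrt(Q.shape[1])) @ V

rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

# Toy sequence: positions 0 and 2 hold the same token embedding.
token_a = rng.standard_normal(d)
token_b = rng.standard_normal(d)
tokens = np.stack([token_a, token_b, token_a])

positions = rng.standard_normal((3, d))  # hypothetical position embeddings

out_plain = self_attention(tokens, W_q, W_k, W_v)
out_pos = self_attention(tokens + positions, W_q, W_k, W_v)

same_without_positions = np.allclose(out_plain[0], out_plain[2])
same_with_positions = np.allclose(out_pos[0], out_pos[2])
print(same_without_positions, same_with_positions)
```

Without position information the two copies of the token produce the same output, since they issue the same query against the same keys. Once distinct position vectors are added, their outputs diverge.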

---
### 2.4 Sentence vectors from BERT

We can obtain the output vectors for a batch of sentences.

First, we need to break the text into a list of sentences, using the simple end-of-sentence string '.' as a separator.

In [37]:
sentences = text.replace('\n', ' ').split('.')
sentences = [s.strip() + '.' for s in sentences if len(s.strip())>0] # Some cleaning work

print(f'Resulting in {len(sentences)} sentences:')
print(sentences)

Resulting in 6 sentences:
['The hotness of the sun and the coldness of the outer space are inexhaustible thermodynamic resources for human beings.', 'From a thermodynamic point of view, any energy conversion systems that receive energy from the sun and/or dissipate energy to the universe are heat engines with photons as the "working fluid" and can be analyzed using the concept of entropy.', 'While entropy analysis provides a particularly convenient way to understand the efficiency limits, it is typically taught in the context of thermodynamic cycles among quasi-equilibrium states and its generalization to solar energy conversion systems running in a continuous and non-equilibrium fashion is not straightforward.', 'In this educational article, we present a few examples to illustrate how the concept of photon entropy, combined with the radiative transfer equation, can be used to analyze the local entropy generation processes and the efficiency limits of different solar energy conversion 

### Expected Output

Resulting in 6 sentences:<br>
['The hotness of the sun and the coldness of the outer space are inexhaustible thermodynamic resources for human beings.', 'From a thermodynamic point of view, any energy conversion systems that receive energy from the sun and/or dissipate energy to the universe are heat engines with photons as the "working fluid" and can be analyzed using the concept of entropy.', 'While entropy analysis provides a particularly convenient way to understand the efficiency limits, it is typically taught in the context of thermodynamic cycles among quasi-equilibrium states and its generalization to solar energy conversion systems running in a continuous and non-equilibrium fashion is not straightforward.', 'In this educational article, we present a few examples to illustrate how the concept of photon entropy, combined with the radiative transfer equation, can be used to analyze the local entropy generation processes and the efficiency limits of different solar energy conversion systems.', 'We provide explicit calculations for the local and total entropy generation rates for simple emitters and absorbers, as well as photovoltaic cells, which can be readily reproduced by students.', 'We further discuss the connection between the entropy generation and the device efficiency, particularly the exact spectral matching condition that is shared by infinitejunction photovoltaic cells and reversible thermoelectric materials to approach their theoretical efficiency limit.']

Now, let's apply the tokenizer to this batch of sentences.

In [38]:
encoded_sentences = tokenizer(sentences, padding=True, return_tensors='pt')

print(encoded_sentences['input_ids'].shape)
print(encoded_sentences['input_ids'][0,:])
print(encoded_sentences['input_ids'][1,:])

torch.Size([6, 57])
tensor([  101,  1996,  2980,  2791,  1997,  1996,  3103,  1998,  1996,  3147,
         2791,  1997,  1996,  6058,  2686,  2024,  1999, 10288, 13821,  3775,
         3468,  1996, 10867,  7716, 18279,  7712,  4219,  2005,  2529,  9552,
         1012,   102,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0])
tensor([  101,  2013,  1037,  1996, 10867,  7716, 18279,  7712,  2391,  1997,
         3193,  1010,  2151,  2943,  7584,  3001,  2008,  4374,  2943,  2013,
         1996,  3103,  1998,  1013,  2030,  4487, 18719, 17585,  2943,  2000,
         1996,  5304,  2024,  3684,  5209,  2007, 26383,  2015,  2004,  1996,
         1000,  2551,  8331,  1000,  1998,  2064,  2022, 16578,  2478,  1996,
         4145,  1997, 23077,  1012,   102,     0,     0])


You can see that shorter sentences are padded with the special id `0` so that every row in the batch has the same length.
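
What `padding=True` does can be sketched in plain Python. The helper `pad_batch` is hypothetical and only mimics the right-padding behaviour seen above; the Hugging Face tokenizer additionally returns an `attention_mask` of the same shape, which is what lets the model ignore the padded positions.

```python
def pad_batch(batch, pad_id=0):
    """Right-pad every id list to the longest length and build a 0/1 mask."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return input_ids, attention_mask

# Short illustrative id lists; 101 and 102 are the ids of [CLS] and [SEP]
# in bert-base-uncased.
batch = [
    [101, 1996, 3103, 102],
    [101, 1996, 2980, 2791, 1997, 1996, 3103, 102],
]
input_ids, attention_mask = pad_batch(batch)
print(input_ids[0])
print(attention_mask[0])
```

The longest sequence sets the batch width, which is why the tensor above has 57 columns.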

Next, we can obtain the output tensors for all input sentences, also in a batch.

### Expected Output

torch.Size([6, 57])<br>
tensor([  101,  1996,  2980,  2791,  1997,  1996,  3103,  1998,  1996,  3147,
         2791,  1997,  1996,  6058,  2686,  2024,  1999, 10288, 13821,  3775,
         3468,  1996, 10867,  7716, 18279,  7712,  4219,  2005,  2529,  9552,
         1012,   102,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0])<br>
tensor([  101,  2013,  1037,  1996, 10867,  7716, 18279,  7712,  2391,  1997,
         3193,  1010,  2151,  2943,  7584,  3001,  2008,  4374,  2943,  2013,
         1996,  3103,  1998,  1013,  2030,  4487, 18719, 17585,  2943,  2000,
         1996,  5304,  2024,  3684,  5209,  2007, 26383,  2015,  2004,  1996,
         1000,  2551,  8331,  1000,  1998,  2064,  2022, 16578,  2478,  1996,
         4145,  1997, 23077,  1012,   102,     0,     0])

In [39]:
outputs = model(**encoded_sentences)

print(outputs['last_hidden_state'].shape)

torch.Size([6, 57, 768])


### Expected Output

torch.Size([6, 57, 768])

Note that the first dimension of `outputs['last_hidden_state']` is batch size. So the output tensor for the 1st sentence is `outputs['last_hidden_state'][0]`, and so on.

In [40]:
print(outputs['last_hidden_state'][0].shape)

torch.Size([57, 768])


### Expected Output

torch.Size([57, 768])


For each output tensor, the first 768-dim vector (at position 0) always corresponds to the special input token `[CLS]`. We can use this vector to represent the meaning of the whole sentence.

In [41]:
CLS_vec = outputs['last_hidden_state'][0][0]
print(CLS_vec.shape)

torch.Size([768])


### Expected Output

torch.Size([768])

Now, it is your task to compute the cosine similarities between each pair of the 6 sentences, and find the pair that has the closest meanings.\
**(Points: 3)**

*Hint*: You can use the `cosine_similarity()` function imported at the beginning, which takes two tensors as input and returns the similarity score as a tensor. You therefore need to append `.item()` to retrieve the numeric value from the returned tensor. You also need to specify the argument `dim=0`.

In [67]:
for i in range(5):
    for j in range(i+1, 6):
        ### START YOUR CODE ###
        sim = cosine_similarity(outputs['last_hidden_state'][i][0], outputs['last_hidden_state'][j][0], dim=0).item()
        # Hint: when you call cosine_similarity() for "sim", remember to specify dim=0.
        # Also, you need to append .item() at the end to obtain a number instead of a tensor.
        ### END YOUR CODE ###
        print(f'{i} <-> {j}: {sim}')

0 <-> 1: 0.8591638803482056
0 <-> 2: 0.7771982550621033
0 <-> 3: 0.7985226511955261
0 <-> 4: 0.7754688262939453
0 <-> 5: 0.8052164316177368
1 <-> 2: 0.8763416409492493
1 <-> 3: 0.8321618437767029
1 <-> 4: 0.823844850063324
1 <-> 5: 0.8492751717567444
2 <-> 3: 0.8241373896598816
2 <-> 4: 0.859862744808197
2 <-> 5: 0.8579832911491394
3 <-> 4: 0.9018082618713379
3 <-> 5: 0.9291439056396484
4 <-> 5: 0.918526828289032


**Expected output:**\
0 <-> 1: 0.8591639399528503\
0 <-> 2: 0.777198314666748\
0 <-> 3: 0.7985224723815918\
0 <-> 4: 0.7754684090614319\
0 <-> 5: 0.8052163124084473\
1 <-> 2: 0.876341700553894\
1 <-> 3: 0.8321619629859924\
1 <-> 4: 0.823844850063324\
1 <-> 5: 0.8492751717567444\
2 <-> 3: 0.8241377472877502\
2 <-> 4: 0.8598626852035522\
2 <-> 5: 0.8579834699630737\
3 <-> 4: 0.9018082618713379\
3 <-> 5: 0.929144024848938\
4 <-> 5: 0.9185266494750977
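
For reference, cosine similarity is the dot product of two vectors divided by the product of their norms. The sketch below defines it in plain Python and then picks the most similar pair from the scores listed above, rounded to four decimals. The helper `cosine` and the dictionary `scores` are illustrative names, not part of the assignment.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [1.0, 1.0]))  # vectors 45 degrees apart

# Similarity scores copied from the expected output above, rounded to 4 decimals.
scores = {
    (0, 1): 0.8592, (0, 2): 0.7772, (0, 3): 0.7985, (0, 4): 0.7755, (0, 5): 0.8052,
    (1, 2): 0.8763, (1, 3): 0.8322, (1, 4): 0.8238, (1, 5): 0.8493,
    (2, 3): 0.8241, (2, 4): 0.8599, (2, 5): 0.8580,
    (3, 4): 0.9018, (3, 5): 0.9291,
    (4, 5): 0.9185,
}
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

The highest score belongs to sentences 3 and 5.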

---
You can print out the two sentences to see if the similarity score makes sense.

In [51]:
print(sentences[3])
print(sentences[5])

In this educational article, we present a few examples to illustrate how the concept of photon entropy, combined with the radiative transfer equation, can be used to analyze the local entropy generation processes and the efficiency limits of different solar energy conversion systems.
We further discuss the connection between the entropy generation and the device efficiency, particularly the exact spectral matching condition that is shared by infinitejunction photovoltaic cells and reversible thermoelectric materials to approach their theoretical efficiency limit.


### Expected Output

In this educational article, we present a few examples to illustrate how the concept of photon entropy, combined with the radiative transfer equation, can be used to analyze the local entropy generation processes and the efficiency limits of different solar energy conversion systems.<br>
We further discuss the connection between the entropy generation and the device efficiency, particularly the exact spectral matching condition that is shared by infinitejunction photovoltaic cells and reversible thermoelectric materials to approach their theoretical efficiency limit.

---

### 2.5 Play with summarization

Let's play with the summarization pipeline provided by transformers. Be patient when the model is downloading. 

You can try the following code with different input text or arguments. Ignore warnings.

In [52]:
from transformers import pipeline

summarizer = pipeline("summarization")

print(summarizer(text, max_length=150, min_length=30))



No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.



[{'summary_text': ' The hotness of the sun and the coldness of outer space are inexhaustible thermodynamic resources for human beings . From a thermodynamic point of view, any energy conversion systems that receive energy from the sun or dissipate energy to the universe are heat engines with photons as the "working fluid"'}]


### Expected Result


Ignore any warnings. This cell will take some time to run.<br><br>
[{'summary_text': ' The hotness of the sun and the coldness of outer space are inexhaustible thermodynamic resources for human beings . From a thermodynamic point of view, any energy conversion systems that receive energy from the sun or dissipate energy to the universe are heat engines with photons as the "working fluid"'}]<br>



### 2.6 Play with Sentiment Analysis

Let's play with the Sentiment Analysis pipeline provided by transformers. Be patient when the model is downloading. 

You can try the following code with different input text or arguments. Ignore warnings.

In [68]:
sentiment_classifier = pipeline("sentiment-analysis")
text2 = "I love using transformers library for natural language processing!"

# Perform sentiment classification
result = sentiment_classifier(text2)

# Output the result
print(result)

text3 = "I didn't like the movie. It was boring"

result = sentiment_classifier(text3)

# Output the result
print(result)


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



[{'label': 'POSITIVE', 'score': 0.9984171390533447}]
[{'label': 'NEGATIVE', 'score': 0.999295711517334}]


### Expected Result

Ignore any warnings <br><br>

[{'label': 'POSITIVE', 'score': 0.9984171390533447}]<br>
[{'label': 'NEGATIVE', 'score': 0.999295711517334}]