BERT as a Feature Extractor

Installation

In [None]:
!pip install transformers
!pip install torch
!pip install scikit-learn
!pip install pandas



Necessary imports

In [None]:
import torch
from transformers import BertTokenizer, BertModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import pandas as pd

In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
res = pd.read_csv('/content/Restaurant_Reviews.csv')
res.head()

Unnamed: 0,Review,Liked
0,Wow... Loved this place.,1
1,Crust is not good.,0
2,Not tasty and the texture was just nasty.,0
3,Stopped by during the late May bank holiday of...,1
4,The selection on the menu was great and so wer...,1


In [None]:
texts = res['Review'].tolist()
labels = res['Liked'].tolist()

In [None]:
# Tokenize the input texts and convert them to tensors
encoded_inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')


In [None]:
# Pass the input tensors through the BERT model
with torch.no_grad():
    model_outputs = model(**encoded_inputs)
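Passing every review through BERT in one call can use a lot of memory on larger datasets. The sketch below shows one way to run the same forward pass in batches. The helper names `iter_batches` and `extract_features`, the `batch_size` default, and the `token_index` parameter are illustrative assumptions, not part of the original notebook.

```python
import torch

def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def extract_features(texts, tokenizer, model, batch_size=32, token_index=1):
    """Run `texts` through `model` in batches and return a NumPy array of
    last-layer hidden vectors taken at position `token_index`.

    Hypothetical helper: `tokenizer` and `model` are expected to behave like
    the BertTokenizer and BertModel objects loaded earlier.
    """
    chunks = []
    with torch.no_grad():  # inference only, so no gradients are tracked
        for batch in iter_batches(texts, batch_size):
            enc = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
            out = model(**enc)
            # Keep one vector per text; padding length may differ per batch.
            chunks.append(out.last_hidden_state[:, token_index, :])
    return torch.cat(chunks).numpy()
```

With the objects from this notebook, a call would look like `features = extract_features(texts, tokenizer, model)`. Because each batch is padded only to its own longest text, values may differ very slightly from the single-call version.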

In [None]:
# Take the last-layer hidden state at token position 1 (the first token after [CLS])
# as each review's feature vector; position 0 would give the [CLS] embedding instead
features = model_outputs.last_hidden_state[:, 1, :].numpy()
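The cell above keeps a single token's vector per review (position 1 of the last hidden state). A common alternative is to average all token vectors while ignoring padding, often called mean pooling. The sketch below is a minimal NumPy version. The function name `mean_pool` is an assumption for illustration, and this pooling is not what produced the accuracy reported later in the notebook.

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    hidden_states:  array of shape (batch, seq_len, hidden_size)
    attention_mask: array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)      # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1.0, None)    # guard against division by zero
    return summed / counts
```

With this notebook's tensors it could be called as `mean_pool(model_outputs.last_hidden_state.numpy(), encoded_inputs['attention_mask'].numpy())`.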


In [None]:
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)


In [None]:
# Train a classifier (e.g., logistic regression) on top of the extracted features
classifier = LogisticRegression()
classifier.fit(X_train, y_train)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
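The warning above says the solver stopped before converging and suggests raising `max_iter` or scaling the data. A minimal sketch of both suggestions combined follows. The helper name `fit_scaled_logreg` and the value `max_iter=1000` are assumptions, and a classifier trained this way will not necessarily reproduce the accuracy reported below.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_scaled_logreg(X, y, max_iter=1000):
    """Standardize the features, then fit logistic regression with a higher
    iteration limit. Hypothetical helper, not part of the original notebook."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=max_iter))
    clf.fit(X, y)
    return clf
```

In this notebook it would be used as `classifier = fit_scaled_logreg(X_train, y_train)`. Since the scaler sits inside the pipeline, `classifier.predict(X_test)` applies the same scaling learned from the training split.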


In [None]:
# Predict labels for the test set
y_pred = classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.865
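An accuracy figure is easier to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the most frequent training label. The function name `majority_baseline_accuracy` is an illustrative assumption, and no baseline value is claimed here for this dataset.

```python
from collections import Counter

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for label in y_test if label == majority_label)
    return hits / len(y_test)
```

Calling `majority_baseline_accuracy(y_train, y_test)` on the split above gives the number the classifier needs to beat.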


The code above demonstrates transfer learning for text classification. Transfer learning means taking the knowledge learned by a pretrained model (here, BERT) and applying it to a new task (here, classifying restaurant reviews as liked or not liked).

In the code, we load a pretrained BERT model and use it as a fixed feature extractor. BERT has been pretrained on a large text corpus and has learned useful representations of language. We pass the input texts through the model and, from its final layer (last_hidden_state), take the hidden vector at token position 1, the first token after [CLS], as the feature vector for each review. Because BERT's representations are contextual, this vector reflects the surrounding words as well as the token itself.
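If features from a layer other than the last are wanted, for example the second-to-last, the model can be asked to return every layer's output with `output_hidden_states=True`. The sketch below uses a tiny, randomly initialized BERT so that it runs without downloading weights. The configuration sizes and token ids are made up for illustration, and its outputs carry no linguistic meaning.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny random BERT (hypothetical sizes) standing in for 'bert-base-uncased'.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=4,
                    num_attention_heads=2, intermediate_size=64)
tiny_model = BertModel(config).eval()

input_ids = torch.tensor([[1, 5, 6, 2]])  # made-up token ids, one sequence of 4 tokens

with torch.no_grad():
    outputs = tiny_model(input_ids=input_ids, output_hidden_states=True)

# hidden_states is a tuple: the embedding output followed by one tensor per layer.
hidden_states = outputs.hidden_states
second_to_last = hidden_states[-2]  # shape: (batch, seq_len, hidden_size)
```

With the pretrained model from this notebook, the equivalent call would be `model(**encoded_inputs, output_hidden_states=True).hidden_states[-2]`.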

By using the pretrained BERT model, we benefit from its ability to capture rich linguistic features and general knowledge about language. We then train a classifier (e.g., logistic regression) on top of the extracted features to perform the specific text classification task.

This approach is considered transfer learning because we transfer the knowledge learned by the pretrained model to our specific classification task. Instead of training a model from scratch or fine-tuning BERT's weights, we keep BERT frozen and train only the classifier on top of the fixed features it produces. This lets us leverage the pretrained model's understanding of language while needing a smaller amount of task-specific labeled data.