# Prompt-based zero-shot learning

In this notebook, I run a few experiments to see which model achieves the highest F1 score on a classification task using prompt-based zero-shot learning.

In [1]:
!pip install transformers evaluate accelerate sentencepiece


In [2]:
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
import torch
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score, accuracy_score

try:
    from google.colab import drive
    drive.mount('/content/gdrive')

    train_path = '/content/gdrive/MyDrive/advanced-ml-project/data/train.tsv'
    test_path = '/content/gdrive/MyDrive/advanced-ml-project/data/test.tsv'
    dev_path = '/content/gdrive/MyDrive/advanced-ml-project/data/dev.tsv'

except ImportError:
    # Not running on Colab: fall back to local paths.
    train_path = 'data/train.tsv'
    test_path = 'data/test.tsv'
    dev_path = 'data/dev.tsv'

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

Mounted at /content/gdrive


device(type='cuda', index=0)

## Load Data

In [3]:
train = pd.read_csv(train_path, sep='\t', header=0)
train['label'] = train['label'].apply(lambda x: 'healthy' if x == 'not depression' else x)
train = train.sample(frac=1).reset_index(drop=True)

test = pd.read_csv(test_path, sep='\t', header=0)
test['label'] = test['label'].apply(lambda x: 'healthy' if x == 'not depression' else x)
test = test.sample(frac=1).reset_index(drop=True)

dev = pd.read_csv(dev_path, sep='\t', header=0)
dev['label'] = dev['label'].apply(lambda x: 'healthy' if x == 'not depression' else x)
dev = dev.sample(frac=1).reset_index(drop=True)
print(train.label.value_counts())
test.head(10)

moderate    6019
healthy     1971
severe       901
Name: label, dtype: int64


|   | PID | text | label |
|---|---|---|---|
| 0 | test_pid_1333 | Yesterday I dreamt : That I suicided, I still ... | moderate |
| 1 | test_pid_586 | I hate 2020. : I don’t look forward to this ye... | moderate |
| 2 | test_pid_589 | I was going to kill myself. : I was going to k... | moderate |
| 3 | test_pid_1778 | Depression as a comfort : Is it possible that ... | moderate |
| 4 | test_pid_2450 | Job applying season and I feel like a failure ... | healthy |
| 5 | test_pid_2598 | Having a bad day...please tell me something yo... | healthy |
| 6 | test_pid_1826 | "I'm not depressed" : I know this is weird, th... | moderate |
| 7 | test_pid_2204 | I can’t handle another winter : Where I live w... | healthy |
| 8 | test_pid_356 | If I disappeared tomorrow nobody would notice ... | moderate |
| 9 | test_pid_2296 | It always hurts : I don't know what to do with... | healthy |


## fill mask

In [4]:
mask_filler = pipeline(
    "fill-mask",
    'microsoft/deberta-v3-large',
    device=device,
    batch_size=16,
)

prompt = "The level of depression in this tweet is {}.\n{}"

predictions = mask_filler(
    [prompt.format(mask_filler.tokenizer.mask_token, x) for x in test['text'].tolist()],
    top_k=1,
    targets=['moderate', 'healthy', 'severe'],
    tokenizer_kwargs={'padding': 'max_length', 'truncation': True, 'max_length': 512}
)

Some weights of DebertaV2ForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-v3-large and are newly initialized: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
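
To make the `targets` argument concrete, here is a minimal sketch in plain Python of how a fill-mask prediction can be restricted to a fixed label set: take a softmax over the vocabulary at the mask position, then keep the highest-scoring candidate. The function name `pick_target`, the toy vocabulary, and the logits are all made up for illustration; the actual pipeline works on tokenizer ids and model outputs rather than strings.

```python
import math


def pick_target(mask_logits, targets):
    """Return (label, probability) for the best-scoring target at the mask position."""
    # Softmax over the whole (toy) vocabulary, shifted by the max for stability.
    m = max(mask_logits.values())
    exp = {tok: math.exp(v - m) for tok, v in mask_logits.items()}
    total = sum(exp.values())
    probs = {tok: e / total for tok, e in exp.items()}
    # Only the candidate labels compete, even if other tokens score higher.
    best = max(targets, key=lambda t: probs[t])
    return best, probs[best]


# Hypothetical logits for a five-token vocabulary (illustrative values only).
toy_logits = {"moderate": 1.2, "healthy": 0.3, "severe": 2.0, "high": 3.5, "the": 0.1}
label, prob = pick_target(toy_logits, ["moderate", "healthy", "severe"])
print(label, round(prob, 3))
```

Because the probabilities are normalized over the whole vocabulary, the winning label can carry a small probability; what matters for classification is only the ranking among the targets.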


In [10]:
label2idx = {'moderate': 0, 'severe': 1, 'healthy': 2}
predictions = [label2idx[each[0]['token_str']] for each in predictions]
y_true = [label2idx[each] for each in test['label'].tolist()]

f1_score(y_true, predictions, average='weighted')

0.011079714132930966

The near-zero score is consistent with the warning above: the masked-LM head weights were newly initialized rather than loaded from the checkpoint, so the mask predictions are effectively arbitrary.

## few-shot learning (with text-generation pipeline)

In [3]:
prompt = "Classify the tweet into one of the following levels of depression: moderate, severe, healthy. \
Tweet: 'Now that 2019 is behind us, and we greet 2020... : ...\n\n...\n\n...\n\nI still want to fucking die.' \
Level of depression: moderate \
Tweet: 'Is anyone awake? : Please help, I need a voice, it's a terrible night. My heart is so heavy. Help please.' \
Level of depression: severe \
Tweet: 'Insecurities, fuck em. : I constantly feel like anyone I talk to at all, or act like myself around is just trying to get me to shut up.\n\nIm tired of being disgusted of what I see in the mirror.\n\nI just to have a friend of the opposite gender that I can cuddle with or hug.\n\nNothing sexual about it, I'd just like to cuddle with someone of the opposite sex. I feel lonely and I honestly could just hug someone for like, hours probably.. is that creepy? I feel like itd come off as creepy..' \
Level of depression: not depression \
Tweet: '{}' \
Level of depression:"
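
The backslash continuations above join everything into a single line. As an alternative, here is a minimal sketch that assembles the same kind of few-shot prompt from a list of examples with one field per line. The helper `build_prompt` and the placeholder examples are hypothetical and not part of the original experiment.

```python
def build_prompt(examples, labels, text):
    """Assemble a few-shot classification prompt with one line per field."""
    lines = [
        "Classify the tweet into one of the following levels of depression: "
        + ", ".join(labels) + "."
    ]
    for example_text, example_label in examples:
        # repr() quotes the text and escapes embedded newlines and quotes.
        lines.append(f"Tweet: {example_text!r}")
        lines.append(f"Level of depression: {example_label}")
    lines.append(f"Tweet: {text!r}")
    lines.append("Level of depression:")
    return "\n".join(lines)


# Placeholder demonstrations (illustrative only, not taken from the dataset).
demo_examples = [
    ("first example post", "moderate"),
    ("second example post", "severe"),
]
demo_prompt = build_prompt(demo_examples, ["moderate", "severe", "healthy"], "post to classify")
print(demo_prompt)
```

Keeping the demonstrations in a list makes it easy to check that every example label also appears in the label list given in the instruction.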

In [4]:
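import os

model = "meta-llama/Llama-2-7b-chat-hf"

# Read the Hugging Face access token from the environment instead of hard-coding it.
hf_token = os.environ.get("HF_TOKEN")

tokenizer = AutoTokenizer.from_pretrained(model, token=hf_token)
# Use a distinct name so the imported `pipeline` factory is not shadowed.
generator = pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
    token=hf_token,
)

sequences = generator(
    [prompt.format(each) for each in test['text'].tolist()[0:3]],
    do_sample=True,
    top_k=1,  # sampling from only the most likely token, i.e. effectively greedy decoding
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=10,  # counts prompt tokens too; `max_new_tokens` would bound only the completion
    return_full_text=False,
)
# for seq in sequences:
#     print(f"Result: {seq['generated_text']}")
sequences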


[[{'generated_text': ' severe'}],
 [{'generated_text': ' not'}],
 [{'generated_text': ' moder'}]]
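
The completions above are cut off after roughly one token, so they need to be mapped back to label names before they can be scored. Here is a minimal sketch of one way to do that with prefix matching; the helper `normalize_generation` is hypothetical, and the rule that "not" maps to `healthy` is an assumption based on the prompt using "not depression" for that class.

```python
def normalize_generation(generated_text, labels=("moderate", "severe", "healthy")):
    """Map a possibly truncated completion to one of the known labels, or None."""
    cleaned = generated_text.strip().lower()
    if not cleaned:
        return None
    # The few-shot prompt uses "not depression" for the class stored as "healthy".
    if "not depression".startswith(cleaned) or cleaned.startswith("not"):
        return "healthy"
    for label in labels:
        # Accept the full label or a prefix of it, such as "moder".
        if label.startswith(cleaned) or cleaned.startswith(label):
            return label
    return None


# The three completions shown above.
raw_outputs = [" severe", " not", " moder"]
parsed = [normalize_generation(text) for text in raw_outputs]
print(parsed)
```

Completions that match no label return `None`, which keeps unparseable generations visible instead of silently assigning them to a class.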

## zero-shot-classification

In [4]:
classifier = pipeline(
    model="cross-encoder/nli-deberta-v3-large",
    device=device,
    batch_size=16,
)

predictions = classifier(
    test['text'].tolist(),
    candidate_labels=["healthy", "moderate", "severe"],
    hypothesis_template='The level of depression is {}.',
)
predictions


[{'sequence': 'Yesterday I dreamt : That I suicided, I still remember everything that happened. I’m sick of this shit.',
  'labels': ['severe', 'moderate', 'healthy'],
  'scores': [0.6853596568107605, 0.2411329597234726, 0.07350737601518631]},
 {'sequence': 'I hate 2020. : I don’t look forward to this year anymore. My friends hate me because I’m depressed all the fucking time, Literally nothing makes me happy at all. I lost a lot of friends and people close to me. I try to make things better but people just push me away. It’s like i’m a burden to everyone.\nI just don’t want to exist anymore, and i mean it. Just want to forever sleep, and never wake up again. I wish I can die painlessly.',
  'labels': ['severe', 'moderate', 'healthy'],
  'scores': [0.782556414604187, 0.20136818289756775, 0.016075430437922478]},
 {'sequence': "I was going to kill myself. : I was going to kill myself today, however I got my dates mixed and it's my grandma's birthday. I'll give life one more week, if I st
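
For intuition about where the `labels` and `scores` fields come from, here is a minimal sketch of NLI-based zero-shot scoring in the single-label setting: each candidate label is inserted into the hypothesis template, the model produces an entailment logit per hypothesis, and a softmax across those logits gives the scores. The function `zero_shot_scores` and the logit values are made up for illustration; no model is called here.

```python
import math


def zero_shot_scores(entailment_logits):
    """Softmax entailment logits across candidate labels and rank them."""
    m = max(entailment_logits.values())
    exp = {label: math.exp(v - m) for label, v in entailment_logits.items()}
    total = sum(exp.values())
    ranked = sorted(exp, key=exp.get, reverse=True)
    return {"labels": ranked, "scores": [exp[label] / total for label in ranked]}


template = "The level of depression is {}."
candidates = ["healthy", "moderate", "severe"]
# One hypothesis per candidate label, paired with the input text as the premise.
hypotheses = [template.format(c) for c in candidates]

# Hypothetical entailment logits, one per hypothesis (illustrative values only).
toy_logits = dict(zip(candidates, [-1.0, 0.5, 1.5]))
result = zero_shot_scores(toy_logits)
print(hypotheses)
print(result)
```

Since the scores are normalized across the candidates, they always sum to one, and the first entry of `labels` is the predicted class.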

In [10]:
def compute_class_weight(train_y):
    """
    Compute balanced class weights for an imbalanced list of labels.

    Such weights are typically passed to a neural network's loss function
    (weighted loss) so that rare classes contribute more.
    """
    import sklearn.utils.class_weight as scikit_class_weight

    # Relies on the global `label2idx` mapping from label names to integer ids.
    train_y = [label2idx[each] for each in train_y]
    # np.unique returns the sorted class ids as an ndarray, as scikit-learn expects.
    class_list = np.unique(train_y)
    class_weight_value = scikit_class_weight.compute_class_weight(class_weight='balanced', classes=class_list, y=train_y)

    return class_weight_value

In [11]:
compute_class_weight(test.label.to_list())

array([0.49869371, 4.74415205, 1.27555031])
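
The `'balanced'` mode in scikit-learn weights each class by `n_samples / (n_classes * count)`. Here is a minimal sketch of that formula in plain Python, applied to the training label counts printed earlier in this notebook; the function name `balanced_class_weights` is hypothetical. The array above was computed on the test labels, so its values differ from these.

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Training label counts from the value_counts() output in the Load Data section.
train_counts = {"moderate": 6019, "healthy": 1971, "severe": 901}
weights = balanced_class_weights(train_counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.4f}")
```

The rarest class (`severe`) gets the largest weight, and the weighted counts of all classes come out equal.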

In [12]:
label2idx = {'moderate': 0, 'severe': 1, 'healthy': 2}
y_pred = [label2idx[each['labels'][0]] for each in predictions]
y_true = [label2idx[each] for each in test['label'].tolist()]

f1 = f1_score(y_true, y_pred, average='weighted')
accuracy = accuracy_score(y_true, y_pred)

print(f"F1 score: {f1}, Accuracy: {accuracy}")

F1 score: 0.22176083856557965, Accuracy: 0.20523882896764253
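
To clarify what `average='weighted'` measures, here is a minimal sketch that computes the weighted F1 by hand: the per-class F1 scores are averaged with weights proportional to each class's number of true samples. The function `weighted_f1` and the toy labels are illustrative only and unrelated to the results above.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1 scores, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy labels using the same ids as label2idx (0 = moderate, 1 = severe, 2 = healthy).
toy_true = [0, 0, 0, 1, 2, 2]
toy_pred = [0, 0, 1, 1, 2, 0]
toy_score = weighted_f1(toy_true, toy_pred)
print(toy_score)
```

With a dominant class, the weighted average is driven mostly by that class, so it is worth inspecting per-class scores as well when the label distribution is as skewed as it is here.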
