Multi-label Classification

Adds a linear layer on top of the base model that produces a tensor of shape (batch_size, num_labels), holding the unnormalized scores (logits) for each label, for every example in the batch.
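As a minimal sketch of how such unnormalized scores are usually read in the multi-label setting (plain Python, not the model's own code): each score goes through its own sigmoid, so the labels are scored independently and several can be active at once. The logits, the three label names and the 0.5 threshold below are made-up illustrative values.

```python
import math


def sigmoid(x: float) -> float:
    """Map an unnormalized score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logits_to_labels(logits, label_names, threshold=0.5):
    """Return the names whose sigmoid probability meets the threshold."""
    probs = [sigmoid(x) for x in logits]
    return [name for name, p in zip(label_names, probs) if p >= threshold]


# Hypothetical scores for one example; not produced by any real model
label_names = ["age", "disability", "racial"]
example_logits = [2.0, -1.5, 0.3]

predicted = logits_to_labels(example_logits, label_names)
print(predicted)  # ['age', 'racial']
```

Unlike a softmax, the probabilities are not forced to sum to one, so an example can receive zero, one or many labels.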

Based on https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BERT/Fine_tuning_BERT_(and_friends)_for_multi_label_text_classification.ipynb

Uses LoRA for parameter-efficient fine-tuning (PEFT).

# Setup

In [15]:
# Base Model

base_model_id = "microsoft/Phi-3-mini-4k-instruct"

In [16]:
seed = 2024
use_lora = True
use_fp16 = True
use_gradient_checkpointing = True  # Save some memory at the expense of training speed
# See https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one

hf_site_id = '2024-mcm-everitt-ryan'
dataset_id = f'{hf_site_id}/job-bias-synthetic-human-benchmark'
base_model_name = base_model_id.split('/')[-1]
model_id = 'phi3-mini-job-bias-mixed'


In [17]:
!pip install -q transformers datasets sentencepiece accelerate evaluate peft



# Dataset

In [18]:
from datasets import load_dataset

dataset = load_dataset(dataset_id)
column_names = dataset['train'].column_names

print(f"Columns: {dataset.num_columns}")
print(f"Rows: {dataset.num_rows}")
print(f"Column Names: {column_names}")

Columns: {'train': 19, 'val': 19, 'test': 19}
Rows: {'train': 4820, 'val': 1051, 'test': 1053}
Column Names: ['id', 'label_age', 'analysis_age', 'label_disability', 'analysis_disability', 'label_masculine', 'analysis_masculine', 'label_feminine', 'analysis_feminine', 'label_racial', 'analysis_racial', 'label_sexuality', 'analysis_sexuality', 'label_general', 'analysis_general', 'verified', 'synthetic', 'text', 'metadata']


In [19]:
example = dataset['train'][0]
example['text']

'Company: Harrington, Richardson and Collins\n\nJob Title: Strategic Partnership Representative\n\nAbout Us:\nHarrington, Richardson and Collins is a leading consulting firm that specializes in strategic partnerships and business development. We are committed to delivering exceptional results to our clients and fostering a collaborative work environment.\n\nJob Description:\nWe are seeking a highly motivated and experienced Strategic Partnership Representative to join our team. The successful candidate will be responsible for identifying and pursuing new business opportunities, building and maintaining strong relationships with clients, and developing strategic partnerships.\n\nResponsibilities:\n- Develop and execute business development strategies to drive revenue growth\n- Identify and pursue new business opportunities through networking, research, and cold-calling\n- Build and maintain strong relationships with clients and partners\n- Collaborate with internal stakeholders to devel

In [20]:
text_col = 'text'
label_cols = [col for col in column_names if col.startswith('label_')]

labels = [label.replace("label_", "") for label in label_cols]

id2label = {idx: label for idx, label in enumerate(labels)}
label2id = {label: idx for idx, label in enumerate(labels)}
print(f"Text column: {text_col}")
print(f"Label columns: {label_cols}")
print(f"Labels: {labels}")

Text column: text
Label columns: ['label_age', 'label_disability', 'label_masculine', 'label_feminine', 'label_racial', 'label_sexuality', 'label_general']
Labels: ['age', 'disability', 'masculine', 'feminine', 'racial', 'sexuality', 'general']


In [21]:
# Keep only the columns needed for multi-label classification:
# the text, the label columns and the synthetic flag ('context_id' is kept if present)
keep_columns = ['context_id', 'synthetic', text_col] + label_cols
for split in ["train", "val", "test"]:
    dataset[split] = dataset[split].remove_columns(
        [col for col in dataset[split].column_names if col not in keep_columns])

# Subsample 10 shuffled rows per split for a quick smoke test
for split in ["train", "val", "test"]:
    dataset[split] = dataset[split].shuffle(seed=seed).select(range(10))

dataset


DatasetDict({
    train: Dataset({
        features: ['label_age', 'label_disability', 'label_masculine', 'label_feminine', 'label_racial', 'label_sexuality', 'label_general', 'synthetic', 'text'],
        num_rows: 10
    })
    val: Dataset({
        features: ['label_age', 'label_disability', 'label_masculine', 'label_feminine', 'label_racial', 'label_sexuality', 'label_general', 'synthetic', 'text'],
        num_rows: 10
    })
    test: Dataset({
        features: ['label_age', 'label_disability', 'label_masculine', 'label_feminine', 'label_racial', 'label_sexuality', 'label_general', 'synthetic', 'text'],
        num_rows: 10
    })
})

In [22]:
import pandas as pd

# Merge train, val and test into one dataframe
df = pd.concat([
    dataset['train'].to_pandas(),
    dataset['val'].to_pandas(),
    dataset['test'].to_pandas()])

print(f"{df.synthetic.value_counts().to_string()}")
for col in label_cols:
    print(f"\n{df[col].value_counts().to_string()}")

synthetic
True     25
False     5

label_age
False    27
True      3

label_disability
False    27
True      3

label_masculine
False    30

label_feminine
False    24
True      6

label_racial
False    23
True      7

label_sexuality
False    25
True      5

label_general
False    26
True      4


In [23]:
# Longest phrase
longest_text = df[text_col].apply(lambda x: (len(x), x)).max()[1]
longest_text

"<country>United States</country>\n  <company>Bright Horizons Marketing</company>\n  <role>Marketing Manager</role>\n  \n  <company_background>\n    Bright Horizons Marketing is a leading digital marketing agency committed to helping businesses grow through innovative marketing strategies. We pride ourselves on delivering top-notch marketing solutions while fostering a collaborative and inclusive work environment.\n  </company_background>\n  \n  <job_type>Full-Time</job_type>\n  \n  <job_description>\n    We are seeking an experienced Marketing Manager to join our dynamic team. The ideal candidate will have a strong background in public relations, strategic thinking, and leadership abilities. The Marketing Manager will be responsible for developing and implementing marketing campaigns that meet our clients' goals.\n  </job_description>\n  \n  <responsibilities>\n    - Develop and execute marketing strategies to drive brand awareness and growth.\n    - Manage marketing campaigns across 

In [24]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_prefix_space=True)
tokenizer

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


LlamaTokenizerFast(name_or_path='microsoft/Phi-3-mini-4k-instruct', vocab_size=32000, model_max_length=4096, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '<|endoftext|>', 'unk_token': '<unk>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=False),
	32000: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	32001: AddedToken("<|assistant|>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=True),
	32002: AddedToken("<|placeholder1|>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=

In [25]:
max_char = len(longest_text)
max_words = len(longest_text.split())
max_tokens = len(tokenizer.encode(longest_text))

print(f'Max characters: {max_char}')
print(f'Max words: {max_words}')
print(f'Max tokens: {max_tokens}')

Max characters: 3389
Max words: 402
Max tokens: 883


In [26]:
tokenizer_max_length = min(max_tokens, tokenizer.model_max_length)
tokenizer_max_length

883

In [27]:
import numpy as np


def preprocess_data(sample):
    # take a batch of texts
    text = sample[text_col]
    # encode them
    encoding = tokenizer(text, truncation=True, max_length=tokenizer_max_length, padding="max_length")
    #encoding = tokenizer(text, truncation=True, max_length=tokenizer_max_length, padding=True)
    # add labels
    labels_batch = {k: sample[k] for k in sample.keys() if k in label_cols}
    # create numpy array of shape (batch_size, num_labels)
    labels_matrix = np.zeros((len(text), len(label_cols)))
    # fill numpy array
    for idx, label in enumerate(label_cols):
        labels_matrix[:, idx] = labels_batch[label]

    encoding["labels"] = labels_matrix.tolist()

    return encoding
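The label handling in `preprocess_data` amounts to turning one boolean column per label into a multi-hot row per example. A minimal pure-Python sketch of that idea, together with the reverse mapping back to label names, is shown below; the helper names `to_multi_hot` and `from_multi_hot` and the tiny batch are hypothetical and only for illustration.

```python
def to_multi_hot(batch, label_cols):
    """Build one row of 0.0/1.0 floats per example, one entry per label column."""
    n_examples = len(batch[label_cols[0]])
    return [[float(batch[col][i]) for col in label_cols] for i in range(n_examples)]


def from_multi_hot(row, id2label):
    """Recover the label names that are switched on in a multi-hot row."""
    return [id2label[idx] for idx, value in enumerate(row) if value == 1.0]


# Hypothetical batch with three label columns and two examples
label_cols = ["label_age", "label_disability", "label_racial"]
id2label = {idx: col.replace("label_", "") for idx, col in enumerate(label_cols)}
batch = {
    "label_age": [True, False],
    "label_disability": [False, False],
    "label_racial": [True, False],
}

matrix = to_multi_hot(batch, label_cols)
print(matrix)                                # [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
print(from_multi_hot(matrix[0], id2label))   # ['age', 'racial']
```

Floats rather than integers are used because the binary cross-entropy loss applied later expects float targets.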

In [28]:
#ds_train = ds_train.map(tokenize, batched=True, batch_size=len(ds_train))
encoded_dataset = dataset.map(preprocess_data, batched=True, remove_columns=dataset['train'].column_names)


In [29]:
example = encoded_dataset['train'][0]
print(example.keys())

dict_keys(['input_ids', 'attention_mask', 'labels'])


In [30]:
tokenizer.decode(example['input_ids'])



'<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext
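The decoded example starts with a long run of `<|endoftext|>` because the tokenizer shown earlier pads on the left (`padding_side='left'`), uses `<|endoftext|>` (id 32000) as its pad token, and `preprocess_data` pads every text to `max_length`. The following is a simplified sketch of that behaviour, not the tokenizer's real implementation; `pad_left` is a hypothetical helper and the token ids other than 32000 are made up.

```python
def pad_left(input_ids, max_length, pad_token_id):
    """Truncate on the right, then pad on the left up to max_length."""
    ids = input_ids[:max_length]
    n_pad = max_length - len(ids)
    return {
        "input_ids": [pad_token_id] * n_pad + ids,
        # 0 marks padding positions the model should ignore, 1 marks real tokens
        "attention_mask": [0] * n_pad + [1] * len(ids),
    }


# Three made-up token ids padded to length 6 with pad id 32000
encoded = pad_left([1, 450, 4982], max_length=6, pad_token_id=32000)
print(encoded["input_ids"])       # [32000, 32000, 32000, 1, 450, 4982]
print(encoded["attention_mask"])  # [0, 0, 0, 1, 1, 1]
```

With left padding the real tokens always end at the last position of the sequence, whatever the length of the original text.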

In [31]:
example['labels']

[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

In [32]:
[id2label[idx] for idx, label in enumerate(example['labels']) if label == 1.0]

['age', 'racial']

In [33]:
encoded_dataset.set_format("torch")

# Model

Here we define a model in which the pre-trained base weights are loaded and a randomly initialized classification head (a linear layer) is placed on top. The head should be fine-tuned, together with the pre-trained base, on a labeled dataset.

The warning printed when the model is loaded says the same thing.

We set the `problem_type` to be "multi_label_classification", as this will make sure the appropriate loss function is used (namely [`BCEWithLogitsLoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html)). We also make sure the output layer has `len(label_cols)` output neurons, and we set the id2label and label2id mappings.
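To make the loss concrete, here is a small pure-Python sketch of binary cross-entropy on logits with mean reduction, which is the quantity `BCEWithLogitsLoss` computes by default. It is written for a single example and is only meant to illustrate the formula; the logits and targets are made-up values.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed directly from logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


targets = [1.0, 0.0, 1.0]

# All logits at 0 means probability 0.5 for every label: loss is ln(2)
loss_uncertain = bce_with_logits([0.0, 0.0, 0.0], targets)

# Confident and correct logits drive the loss towards 0
loss_confident = bce_with_logits([6.0, -6.0, 6.0], targets)

print(round(loss_uncertain, 4))  # 0.6931
print(loss_confident < 0.01)     # True
```

Each label contributes its own independent binary term, which is why the targets must be a float vector with one entry per label rather than a single class index.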

In [35]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(base_model_id,
                                                           problem_type="multi_label_classification",
                                                           num_labels=len(label_cols),
                                                           id2label=id2label,
                                                           label2id=label2id,
                                                           trust_remote_code=True
                                                           )
model


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.



KeyboardInterrupt: 

In [22]:
model.config

Phi3Config {
  "_name_or_path": "microsoft/Phi-3-mini-4k-instruct",
  "architectures": [
    "Phi3ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "microsoft/Phi-3-mini-4k-instruct--configuration_phi3.Phi3Config",
    "AutoModelForCausalLM": "microsoft/Phi-3-mini-4k-instruct--modeling_phi3.Phi3ForCausalLM"
  },
  "bos_token_id": 1,
  "embd_pdrop": 0.0,
  "eos_token_id": 32000,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "id2label": {
    "0": "age",
    "1": "disability",
    "2": "feminine",
    "3": "general",
    "4": "masculine",
    "5": "racial",
    "6": "sexuality"
  },
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "label2id": {
    "age": 0,
    "disability": 1,
    "feminine": 2,
    "general": 3,
    "masculine": 4,
    "racial": 5,
    "sexuality": 6
  },
  "max_position_embeddings": 4096,
  "model_type": "phi3",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "original_max_position_emb

In [23]:
from peft import get_peft_model, LoraConfig, TaskType

if use_lora:
    peft_config = LoraConfig(
        task_type=TaskType.SEQ_CLS, lora_alpha=16, lora_dropout=0.1, bias="none",
        r=8,
        target_modules='all-linear'
    )
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    model


trainable params: 12,629,048 || all params: 3,735,254,128 || trainable%: 0.3381
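The small trainable fraction follows from how LoRA works: for a linear layer with weight shape `d_out x d_in`, it trains two low-rank matrices of shapes `r x d_in` and `d_out x r` and leaves the original weight frozen. The sketch below works through that arithmetic for one illustrative layer using `hidden_size`, `intermediate_size` and `r` from this notebook, and recomputes the percentage from the two totals printed above. Which layers actually receive adapters is decided by PEFT and is not modelled here.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


hidden_size = 3072        # from model.config
intermediate_size = 8192  # from model.config
rank = 8                  # r in LoraConfig

# Illustrative layer mapping intermediate_size -> hidden_size
full_weight = intermediate_size * hidden_size
adapter = lora_params(intermediate_size, hidden_size, rank)
print(full_weight)  # 25165824 frozen weights
print(adapter)      # 90112 trainable adapter weights

# Recompute the reported percentage from the printed totals
trainable, total = 12_629_048, 3_735_254_128
pct = 100 * trainable / total
print(f"{pct:.4f}")  # 0.3381
```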


Let's verify a batch as well as a forward pass:

In [24]:
encoded_dataset['train'][0]['labels'].type()

'torch.FloatTensor'

In [None]:
encoded_dataset['train']['input_ids'][0]

In [None]:
#forward pass
outputs = model(input_ids=encoded_dataset['train']['input_ids'][0].unsqueeze(0),
                labels=encoded_dataset['train'][0]['labels'].unsqueeze(0))
outputs

# Define Metrics

In [27]:
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score, accuracy_score
from transformers import EvalPrediction
import torch


# source: https://jesusleal.io/2021/04/21/Longformer-multilabel-classification/
# added extras
def multi_label_metrics(predictions, labels, threshold=0.5):
    # first, apply sigmoid on predictions which are of shape (batch_size, num_labels)
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions))
    # next, use threshold to turn them into integer predictions
    y_pred = np.zeros(probs.shape)
    y_pred[np.where(probs >= threshold)] = 1
    # finally, compute metrics
    y_true = labels

    accuracy = accuracy_score(y_true=y_true, y_pred=y_pred)

    f1_micro = f1_score(y_true=y_true, y_pred=y_pred, average='micro')
    f1_macro = f1_score(y_true=y_true, y_pred=y_pred, average='macro')
    f1_samples = f1_score(y_true=y_true, y_pred=y_pred, average='samples')
    f1_weighted = f1_score(y_true=y_true, y_pred=y_pred, average='weighted')

    precision_micro = precision_score(y_true=y_true, y_pred=y_pred, average='micro')
    recall_micro = recall_score(y_true=y_true, y_pred=y_pred, average='micro')
    # Note: ROC AUC is computed on the thresholded predictions here; passing `probs`
    # as y_score instead would give the usual threshold-independent value.
    roc_auc_micro = roc_auc_score(y_true=y_true, y_score=y_pred, average='micro')
    # return as dictionary
    metrics = {
        'accuracy': accuracy,
        'f1_micro': f1_micro,
        'f1_macro': f1_macro,
        'f1_samples': f1_samples,
        'f1_weighted': f1_weighted,
        'precision_micro': precision_micro,
        'recall_micro': recall_micro,
        'roc_auc_micro': roc_auc_micro}
    return metrics


def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    result = multi_label_metrics(
        predictions=preds,
        labels=p.label_ids)
    return result
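The micro and macro averages can differ a lot when some labels are rare or never predicted. The pure-Python sketch below (not scikit-learn's implementation) shows the mechanism on a made-up two-label example: micro pools true positives, false positives and false negatives over all labels, while macro averages the per-label F1 scores, so a label that is never predicted contributes an F1 of 0.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for multi-hot label matrices."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    micro = f1_from_counts(*(sum(col) for col in zip(*counts)))
    return micro, macro


# Made-up data: label 0 is predicted perfectly, label 1 is never predicted
y_true = [[1, 0], [1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
print(micro)  # 0.75
print(macro)  # 0.5
```

A macro F1 well below the micro F1, as in the results reported in this notebook, is consistent with some labels rarely or never being predicted, which also matches the `_warn_prf` warnings in the training output.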

# Train

In [28]:
from transformers import TrainingArguments, Trainer, DataCollatorWithPadding
from huggingface_hub import HfFolder

batch_size = 16
metric_name = "f1_micro"
optimiser = 'paged_adamw_8bit'  # Paged optimizer to save memory (only used if `optim=optimiser` below is uncommented)
#learning_rate = 4e-5  # Use value slightly smaller than pretraining lr value & close to LoRA standard
learning_rate = 5e-5

args = TrainingArguments(
    model_id,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=learning_rate,
    #optim=optimiser,
    #lr_scheduler_type="cosine",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    weight_decay=0.01,
    #weight_decay=0.001,
    load_best_model_at_end=True,
    metric_for_best_model=metric_name,
    fp16=use_fp16,
    gradient_checkpointing=use_gradient_checkpointing,
    #push_to_hub=True,
    #output_dir=repository_id,
    #logging_dir=f"{model_id}/logs",
    #logging_strategy="steps",
    #logging_steps=10,
    #warmup_steps=500,
    #warmup_ratio=0.1,
    #max_grad_norm=0.3,
    #save_total_limit=2,
    #report_to="tensorboard",
    #push_to_hub=True,
    #hub_strategy="every_save",
    #hub_model_id=hub_model_id,
    #hub_token=HfFolder.get_token(),
)

#early_stop = transformers.EarlyStoppingCallback(10, 1.15)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["val"],
    # For padding a batch of examples to the maximum length seen in the batch
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
    #tokenizer=tokenizer,
    #   callbacks=[early_stop]
)

model.config.use_cache = False  # The KV cache is not usable with gradient checkpointing; this silences the warning.
trainer.train()



Epoch,Training Loss,Validation Loss,Accuracy,F1 Micro,F1 Macro,F1 Samples,F1 Weighted,Precision Micro,Recall Micro,Roc Auc Micro
1,No log,0.392633,0.15528,0.241825,0.079952,0.160801,0.184342,0.75,0.144152,0.567393
2,No log,0.367843,0.184265,0.279245,0.094745,0.188406,0.214164,0.833333,0.167724,0.580593
3,No log,0.355321,0.228778,0.332382,0.110022,0.23637,0.242572,0.779264,0.211242,0.59979
4,0.389500,0.347009,0.222567,0.328967,0.119599,0.229469,0.252021,0.833948,0.204896,0.598472
5,0.389500,0.344625,0.226708,0.335512,0.127685,0.2343,0.261464,0.843066,0.209429,0.600915


  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


TrainOutput(global_step=705, training_loss=0.37749592666084886, metrics={'train_runtime': 888.5235, 'train_samples_per_second': 12.684, 'train_steps_per_second': 0.793, 'total_flos': 1.502554538166864e+17, 'train_loss': 0.37749592666084886, 'epoch': 5.0})

# Evaluate

In [32]:
test_results = trainer.evaluate(eval_dataset=encoded_dataset['test'])

  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


In [33]:
print(f'evaluation (test) results: {test_results}')

evaluation (test) results: {'eval_loss': 0.3445480167865753, 'eval_accuracy': 0.24596273291925466, 'eval_f1_micro': 0.356729975227085, 'eval_f1_macro': 0.13248163979120295, 'eval_f1_samples': 0.2600414078674948, 'eval_f1_weighted': 0.27487788471918095, 'eval_precision_micro': 0.8212927756653993, 'eval_recall_micro': 0.22784810126582278, 'eval_roc_auc_micro': 0.6089101824869757, 'eval_runtime': 41.982, 'eval_samples_per_second': 19.175, 'eval_steps_per_second': 1.215, 'epoch': 5.0}


In [34]:
import pandas as pd
df = pd.DataFrame(list(test_results.items()), columns=['Metric', 'Value'])
print(df.to_string(index=False))

                 Metric     Value
              eval_loss  0.344548
          eval_accuracy  0.245963
          eval_f1_micro  0.356730
          eval_f1_macro  0.132482
        eval_f1_samples  0.260041
       eval_f1_weighted  0.274878
   eval_precision_micro  0.821293
      eval_recall_micro  0.227848
     eval_roc_auc_micro  0.608910
           eval_runtime 41.982000
eval_samples_per_second 19.175000
  eval_steps_per_second  1.215000
                  epoch  5.000000


In [42]:
# The following text contains age and disability bias
text = "Responsibilities: Oversee daily warehouse operations, including receiving, storing, and distributing products. Manage inventory control processes and ensure accurate record-keeping. Develop and implement warehouse policies and procedures to improve efficiency and safety. Lead and mentor a team of warehouse staff, fostering a positive and productive work environment. Coordinate with other departments to ensure smooth workflow and timely order fulfillment. Monitor performance metrics and prepare reports for senior management. Ensure compliance with health and safety regulations. Requirements: Bachelor's degree in logistics, supply chain management, or a related field. Minimum of 5 years of experience in warehouse management. Strong leadership and organizational skills. Excellent communication and interpersonal skills. Proficiency in warehouse management software and Microsoft Office Suite. Ability to work in a fast-paced environment and handle multiple tasks simultaneously. Must be under 40 years old to ensure a fit with our energetic and fast-paced team culture. (Note: This is an example of age-biased language and should be avoided) Preferred Qualifications: Experience with lean warehouse operations and continuous improvement methodologies. Certification in warehouse management or related disciplines. Knowledge of industry-specific regulations and best practices. Physical Requirements: Ability to lift up to 50 pounds. Ability to stand and walk for extended periods. Young and dynamic individuals preferred to keep up with the physical demands of the job. Benefits: Health, dental, and vision insurance. Retirement savings plan with company match. Paid time off and holidays. Opportunities for professional development and career advancement. How to Apply: Interested candidates are invited to submit their resume and cover letter to [email@example.com]. ABC Logistics is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees."

encoding = tokenizer(text, return_tensors="pt")
encoding = {k: v.to(trainer.model.device) for k, v in encoding.items()}

# Inference needs no gradients; disabling them lowers peak GPU memory
with torch.no_grad():
    outputs = trainer.model(**encoding)

OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU 

In [40]:
logits = outputs.logits
logits.shape
sigmoid = torch.nn.Sigmoid()
probs = sigmoid(logits.squeeze().cpu())
predictions = np.zeros(probs.shape)
predictions[np.where(probs >= 0.5)] = 1
# turn predicted ids into actual label names
predicted_labels = [id2label[idx] for idx, label in enumerate(predictions) if label == 1.0]
print(predicted_labels)

['disability', 'general']
