# Fine-tuning BERT (and friends) for multi-label text classification

In this notebook, we are going to fine-tune BERT to predict one or more labels for a given piece of text. Note that this notebook illustrates how to fine-tune a bert-base-uncased model, but you can also fine-tune a RoBERTa, DeBERTa, DistilBERT, CANINE, ... checkpoint in the same way. 

All of those work in the same way: they add a linear layer on top of the base model, which is used to produce a tensor of shape (batch_size, num_labels), indicating the unnormalized scores for a number of labels for every example in the batch.
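As a rough illustration of that linear layer, here is a minimal pure-Python sketch. The tiny hidden size, the weights and the helper name `linear_head` are made up for this example; the real head sits inside Transformers and operates on the base model's pooled output.

```python
# Toy linear classification head: logits = pooled @ W^T + b for a whole batch.
# All numbers below are illustrative assumptions, not real model weights.

def linear_head(pooled, weight, bias):
    """pooled: (batch_size, hidden); weight: (num_labels, hidden); bias: (num_labels,)."""
    return [
        [sum(h * w for h, w in zip(example, row)) + b for row, b in zip(weight, bias)]
        for example in pooled
    ]

pooled = [[1.0, 2.0, 0.0, -1.0],
          [0.5, 0.0, 1.0, 1.0]]            # batch_size=2, hidden=4
weight = [[0.1, 0.0, 0.0, 0.0],
          [0.0, 0.5, 0.0, 0.0],
          [0.0, 0.0, 1.0, 1.0]]            # num_labels=3
bias = [0.0, -1.0, 0.5]

logits = linear_head(pooled, weight, bias)
print(len(logits), len(logits[0]))  # 2 3, i.e. (batch_size, num_labels)
```

Each row of `logits` holds one unnormalized score per label for one example.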



## Set-up environment

First, we install the libraries which we'll use: HuggingFace Transformers and Datasets.

In [1]:
%pip install -q transformers datasets

Note: you may need to restart the kernel to use updated packages.


## Load dataset

Next, let's download a multi-label text classification dataset from the [hub](https://huggingface.co/).

At the time of writing, I picked a random one as follows:

* first, go to the "datasets" tab on huggingface.co
* next, select the "multi-label-classification" tag on the left as well as the "1K<n<10K" size tag (to find a relatively small dataset).

Note that you can also easily load your local data (e.g. csv files, txt files, Parquet files, JSON, ...) as explained [here](https://huggingface.co/docs/datasets/loading.html#local-and-remote-files). This notebook ends up doing that: it reads a preprocessed toxic-comments CSV with pandas and later swaps it in for the hub dataset.



In [2]:
from datasets import load_dataset
from transformers import AutoTokenizer
import numpy as np
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score
from transformers import EvalPrediction
import torch
import pandas as pd
from ast import literal_eval    
from datetime import datetime
import datasets
import re

In [3]:
level_selection = 'product'#'subcat'#'cat'
#dir(datasets)

In [4]:
df_train_raw = pd.read_csv('/kaggle/input/cleaned-toxic-comments/train_preprocessed.csv')

In [5]:
# less toxic gets stack on top of more toxic and less toxic gets a 0 target, more toxic is a 1 target
df_train = df_train_raw.copy()
df_train

Unnamed: 0,comment_text,id,identity_hate,insult,obscene,set,severe_toxic,threat,toxic,toxicity
0,explanation why the edits made under my userna...,0000997932d777bf,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
1,d aww he matches this background colour i m s...,000103f0d9cfb60f,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
2,hey man i m really not trying to edit war it...,000113f07ec002fd,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
3,more i can t make any real suggestions on im...,0001b41b1c6bb37e,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
4,you sir are my hero any chance you remember...,0001d958c54c6e35,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...
159566,and for the second time of asking when your ...,ffe987279560d7ff,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
159567,you should be ashamed of yourself that is a ho...,ffea4adeee384e90,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
159568,spitzer umm theres no actual article for pros...,ffee36eab5c267c9,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
159569,and it looks like it was actually you who put ...,fff125370e4aaaf3,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0


In [6]:
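# Shuffle all rows (no seed is set, so the split changes on every run)
df_train = df_train.sample(frac=1).reset_index(drop=True)
#df_train = df_train.iloc[:300000,:]# shrink down the training data so training doesn't take so long
# First 1,000 shuffled rows: validation; next 9,000: held-out test; the rest: training
df_val = df_train.iloc[:1000,:]
test_all = df_train.iloc[1000:10000,:]
df_train = df_train.iloc[10000:,:]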
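The shuffle-then-slice logic of this cell can be sketched without pandas as follows. This is only an illustration under assumptions: the helper name `split_indices` and the seed value are hypothetical, and the row count of 159571 is the sum of the three split sizes printed further down.

```python
import random

def split_indices(n_rows, n_val=1000, n_test=9000, seed=0):
    """Shuffle row indices once, then carve out validation, test and training slices."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # seeded, so the split is repeatable
    val = indices[:n_val]                     # first 1,000 shuffled rows
    test = indices[n_val:n_val + n_test]      # next 9,000 rows
    train = indices[n_val + n_test:]          # everything else
    return train, test, val

train_idx, test_idx, val_idx = split_indices(159571)
print(len(train_idx), len(test_idx), len(val_idx))  # 149571 9000 1000
```

With pandas, passing a `random_state` to `DataFrame.sample` gives the same repeatability.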


In [7]:
dataset = load_dataset("sem_eval_2018_task_1", "subtask5.english")


Downloading and preparing dataset sem_eval2018_task1/subtask5.english (download: 5.70 MiB, generated: 1.24 MiB, post-processed: Unknown size, total: 6.94 MiB) to /root/.cache/huggingface/datasets/sem_eval2018_task1/subtask5.english/1.1.0/a7c0de8b805f1988b118882fb289ccfbbeb9085c7820b6f046b5887e234af182...



Dataset sem_eval2018_task1 downloaded and prepared to /root/.cache/huggingface/datasets/sem_eval2018_task1/subtask5.english/1.1.0/a7c0de8b805f1988b118882fb289ccfbbeb9085c7820b6f046b5887e234af182. Subsequent calls will reuse this data.



In [8]:
df_train.columns

Index(['comment_text', 'id', 'identity_hate', 'insult', 'obscene', 'set',
       'severe_toxic', 'threat', 'toxic', 'toxicity'],
      dtype='object')

In [9]:
targets = ['identity_hate', 'insult', 'obscene', 'severe_toxic', 'threat', 'toxic']
grouping = {k:v for k,v in zip(targets, range(len(targets)))}
train_dict = []
test_dict = []
val_dict = []
for d, f in zip([train_dict, test_dict, val_dict], [df_train, test_all, df_val]):
    for row in f.index:
        target_row = f[targets].loc[row].tolist()                       
        temp = {k:int(v) for k,v in zip(targets, target_row)}
        temp['text'] = f.comment_text[row]
        temp['ID'] = row
        d.append(temp)

In [10]:
f  # loop variable left over from the previous cell; at this point it is df_val

Unnamed: 0,comment_text,id,identity_hate,insult,obscene,set,severe_toxic,threat,toxic,toxicity
0,agreed let s just keep the name as it is,ece1431ef774612c,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
1,it s fine to edit for personal gain so long ...,b66e5fffbd70f8fe,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
2,hi hi tya it s me allen how are you talk to...,3b9639cf9d4a5b00,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
3,hahahaha it is said some listeners spontaneo...,3df0118d8c0971d8,0.0,0.0,1.0,train,0.0,0.0,1.0,2.0
4,well it s on outpost gallifrey now too http...,848008a8e0ed7d6a,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...
995,hi quantling you are absolutely right with yo...,5adb39326685924f,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
996,papal bull you are not going to get copies of ...,0b7903fd42ddb2bb,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0
997,xmother fuck er i didn t add that informatio...,c9d6159a7e7214e6,0.0,1.0,1.0,train,1.0,0.0,1.0,4.0
998,sock puppets where sock puppets where i am...,ac06b93eb5bfe4e2,0.0,0.0,0.0,train,0.0,0.0,0.0,0.0


In [11]:
print([len(i) for i in [train_dict, test_dict, val_dict]])

[149571, 9000, 1000]


In [12]:
target_row

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

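These three lists are our 3 splits: one for training (149,571 examples), one for testing (9,000) and one for validation (1,000). Next, we wrap them in a `DatasetDict`, the same container type that `load_dataset` returns.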

In [13]:
# Inspect the hub dataset: each split is an Arrow-backed `datasets.Dataset`
dir(dataset['train'])
dataset['train'].__class__
#https://arrow.apache.org/docs/python/generated/pyarrow.Table.html
import pyarrow as pa

# Turn each list of row dicts into a pyarrow Table, then wrap it in a `Dataset`
test_table = pa.Table.from_pandas(pd.DataFrame(test_dict))
test_dataset = datasets.arrow_dataset.Dataset(test_table)

train_table = pa.Table.from_pandas(pd.DataFrame(train_dict))
train_dataset = datasets.arrow_dataset.Dataset(train_table)

val_table = pa.Table.from_pandas(pd.DataFrame(val_dict))
val_dataset = datasets.arrow_dataset.Dataset(val_table)


In [14]:
dataset.__class__
# Replace the hub dataset with the three splits built from the toxic-comments CSV
dataset = datasets.dataset_dict.DatasetDict({'test': test_dataset,
                                             'train': train_dataset,
                                             'validation': val_dataset})


In [15]:
dataset['train']

Dataset({
    features: ['identity_hate', 'insult', 'obscene', 'severe_toxic', 'threat', 'toxic', 'text', 'ID'],
    num_rows: 149571
})

Let's check the first example of the training split:

In [16]:
example = train_dict[0]
example

{'identity_hate': 0,
 'insult': 0,
 'obscene': 0,
 'severe_toxic': 0,
 'threat': 0,
 'toxic': 0,
 'text': 'what is the factual inaccuracy  please explain or i will remove the tag ',
 'ID': 10000}

The dataset consists of comments, each labeled with zero or more of six toxicity categories (the example above has none).

Let's create a list that contains the labels, as well as 2 dictionaries that map labels to integers and back.

In [17]:
labels = [str(label) for label in grouping if label not in ['ID', 'text']]
id2label = {idx:label for idx, label in enumerate(labels)}
label2id = {label:idx for idx, label in enumerate(labels)}
labels[0]

'identity_hate'

## Preprocess data

As models like BERT don't expect text as direct input, but rather `input_ids`, etc., we tokenize the text using the tokenizer. Here I'm using the `AutoTokenizer` API, which will automatically load the appropriate tokenizer based on the checkpoint on the hub.

What's a bit tricky is that we also need to provide labels to the model. For multi-label text classification, this is a matrix of shape (batch_size, num_labels). Also important: this should be a tensor of floats rather than integers, otherwise PyTorch's `BCEWithLogitsLoss` (which the model will use) will complain, as explained [here](https://discuss.pytorch.org/t/multi-label-binary-classification-result-type-float-cant-be-cast-to-the-desired-output-type-long/117915/3).

In [18]:


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess_data(examples):
  # take a batch of texts
  text = examples["text"]
  # encode them
  encoding = tokenizer(text, padding="max_length", truncation=True, max_length=128)
  # add labels
  labels_batch = {k: examples[k] for k in examples.keys() if k in labels}
  # create numpy array of shape (batch_size, num_labels)
  labels_matrix = np.zeros((len(text), len(labels)))
  # fill numpy array
  for idx, label in enumerate(labels):
    labels_matrix[:, idx] = labels_batch[label]

  encoding["labels"] = labels_matrix.tolist()
  
  return encoding


In [19]:
encoded_dataset = dataset.map(preprocess_data, batched=True, remove_columns=train_dataset.column_names)


In [20]:
example = encoded_dataset['train'][0]
print(example.keys())

dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'labels'])


In [21]:
tokenizer.decode(example['input_ids'])

'[CLS] what is the factual inaccuracy please explain or i will remove the tag [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]'
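The long run of `[PAD]` tokens comes from `padding="max_length"` with `max_length=128`. A minimal sketch of what padding and truncation do to a list of token ids is shown below. The helper `pad_or_truncate` is hypothetical, and unlike the real tokenizer it does not keep the closing `[SEP]` when it truncates. The ids are the ones printed for this example further down in the notebook.

```python
def pad_or_truncate(token_ids, max_length=128, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = token_ids[:max_length]                     # truncation (naive: may cut off [SEP])
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad    # 1 = real token, 0 = padding
    input_ids = ids + [pad_id] * n_pad               # fill up with the [PAD] id
    return input_ids, attention_mask

# Token ids of the first training example: [CLS] ... [SEP], 18 tokens in total
ids = [101, 2054, 2003, 1996, 25854, 27118, 9468, 4648, 5666, 3531,
       4863, 2030, 1045, 2097, 6366, 1996, 6415, 102]

input_ids, attention_mask = pad_or_truncate(ids)
print(len(input_ids), sum(attention_mask))  # 128 18
```

The attention mask tells the model which positions are real tokens, so the padding does not influence the prediction.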

In [22]:
example['labels']

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

In [23]:
[id2label[idx] for idx, label in enumerate(example['labels']) if label == 1.0]

[]

Finally, we set the format of our data to PyTorch tensors. This will turn the training, validation and test sets into standard PyTorch [datasets](https://pytorch.org/docs/stable/data.html). 

In [24]:
encoded_dataset.set_format("torch")

## Define model

Here we define a model that consists of a pre-trained base (i.e. the weights from bert-base-uncased are loaded) with a randomly initialized classification head (linear layer) on top. One should fine-tune this head, together with the pre-trained base, on a labeled dataset.

This is also printed by the warning.

We set the `problem_type` to be "multi_label_classification", as this will make sure the appropriate loss function is used (namely [`BCEWithLogitsLoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html)). We also make sure the output layer has `len(labels)` output neurons, and we set the id2label and label2id mappings.

In [25]:
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", 
                                                           problem_type="multi_label_classification", 
                                                           num_labels=len(labels),
                                                           id2label=id2label,
                                                           label2id=label2id)

#model = torch.load('Colab_cats_model_202304131403').cuda()                                                           


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at
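To make the role of `BCEWithLogitsLoss` more concrete, here is a minimal pure-Python sketch of binary cross-entropy computed directly from logits and averaged over all labels. The function name `bce_with_logits` and the example values are assumptions for illustration; the model itself uses the PyTorch implementation.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all labels, computed from raw logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

targets = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0]       # float targets, one per label

# An undecided model (all logits 0, i.e. probability 0.5 everywhere)
undecided = bce_with_logits([0.0] * 6, targets)

# A model that is confidently right on every label
confident = bce_with_logits([-5.0, 5.0, 5.0, -5.0, -5.0, 5.0], targets)

print(undecided, confident)
```

Every label contributes its own independent binary term, which is why several labels can be active for the same example.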

## Train the model!

We are going to train the model using HuggingFace's Trainer API. This requires us to define 2 things: 

* `TrainingArguments`, which specify training hyperparameters. All options can be found in the [docs](https://huggingface.co/transformers/main_classes/trainer.html#trainingarguments). Below we specify, for example, that we want to evaluate and save the model after every epoch, and we set the learning rate, the batch size for training and evaluation, the number of epochs, and so on.
* a `Trainer` object (docs can be found [here](https://huggingface.co/transformers/main_classes/trainer.html#id1)).

In [26]:
batch_size = 16
metric_name = "f1"

In [27]:


args = TrainingArguments(
    "bert-finetuned-sem_eval-english",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=2,  # 5
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model=metric_name,
    # push_to_hub=True,
)
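To get a feel for how long a run with these arguments is, the number of optimizer steps can be estimated with a little arithmetic. This sketch assumes a single device, no gradient accumulation, and that the last partial batch is kept.

```python
import math

num_train_examples = 149571   # size of the training split printed earlier
batch_size = 16
num_train_epochs = 2

steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_train_epochs
print(steps_per_epoch, total_steps)  # 9349 18698
```

With `evaluation_strategy` and `save_strategy` both set to `"epoch"`, evaluation and checkpointing would then happen once every `steps_per_epoch` steps.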

We are also going to compute metrics while training. For this, we need to define a `compute_metrics` function, that returns a dictionary with the desired metric values.

In [28]:

# source: https://jesusleal.io/2021/04/21/Longformer-multilabel-classification/
def multi_label_metrics(predictions, labels, threshold=0.5):
    # first, apply sigmoid on predictions (raw logits) which are of shape (batch_size, num_labels)
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions))
    # next, use threshold to turn them into integer predictions
    y_pred = np.zeros(probs.shape)
    y_pred[np.where(probs >= threshold)] = 1
    # finally, compute metrics
    y_true = labels
    f1_micro_average = f1_score(y_true=y_true, y_pred=y_pred, average='micro')
    # ROC AUC is a ranking metric, so score the probabilities rather than the thresholded 0/1 predictions
    roc_auc = roc_auc_score(y_true, probs.numpy(), average = 'micro')
    # on a label matrix this is subset accuracy: an example only counts if all of its labels match
    accuracy = accuracy_score(y_true, y_pred)
    # return as dictionary
    metrics = {'f1': f1_micro_average,
               'roc_auc': roc_auc,
               'accuracy': accuracy}
    return metrics

def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, 
            tuple) else p.predictions
    result = multi_label_metrics(
        predictions=preds, 
        labels=p.label_ids)
    return result
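To show what the micro-averaged F1 measures, here is a small pure-Python sketch that pools true positives, false positives and false negatives over all labels and examples before computing F1. The helper names and the toy logits are made up for illustration; the notebook itself relies on scikit-learn's `f1_score`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def micro_f1(logits, y_true, threshold=0.5):
    """Micro-averaged F1: count TP/FP/FN across every (example, label) pair."""
    tp = fp = fn = 0
    for row_logits, row_true in zip(logits, y_true):
        for x, t in zip(row_logits, row_true):
            p = 1 if sigmoid(x) >= threshold else 0
            tp += int(p == 1 and t == 1)
            fp += int(p == 1 and t == 0)
            fn += int(p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

toy_logits = [[2.0, -1.0, 0.3],
              [-2.0, 1.5, -0.7]]
toy_labels = [[1, 0, 0],
              [0, 1, 1]]

# 2 true positives, 1 false positive, 1 false negative -> F1 = 4 / 6
f1 = micro_f1(toy_logits, toy_labels)
print(f1)
```

Because the counts are pooled, frequent labels weigh more heavily in this score than rare ones.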

Let's verify a batch as well as a forward pass:

In [29]:
encoded_dataset['train'][0]['labels'].type()

'torch.FloatTensor'

In [30]:
encoded_dataset['train']['input_ids'][0]

tensor([  101,  2054,  2003,  1996, 25854, 27118,  9468,  4648,  5666,  3531,
         4863,  2030,  1045,  2097,  6366,  1996,  6415,   102,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0])

In [31]:
#forward pass
#outputs = model(input_ids=encoded_dataset['train']['input_ids'][0].unsqueeze(0), labels=encoded_dataset['train'][0]['labels'].unsqueeze(0))
#outputs

Let's start training!

In [32]:
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

In [33]:
#trainer.train()

## Evaluate

After training, we evaluate our model on the validation set.

In [34]:
#trainer.evaluate()

## Inference

Let's run the model on new text. The loop further below forwards the comments from the test CSV through the model one at a time.

The logits that come out of the model are of shape (batch_size, num_labels). As we are only forwarding a single comment through the model at a time, the `batch_size` equals 1. The logits are a tensor that contains the (unnormalized) scores for every individual label.

To turn them into actual predicted labels, we first apply a sigmoid function independently to every score, such that every score is turned into a number between 0 and 1, which can be interpreted as a "probability" of how certain the model is that a given label applies to the input text.

Next, we use a threshold (typically, 0.5) to turn every probability into either a 1 (which means, we predict the label for the given example) or a 0 (which means, we don't predict the label for the given example).
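The sigmoid-then-threshold step can be sketched in plain Python as follows. The logits are made-up values for a single comment, and `predict_labels` is a hypothetical helper; the label order matches the `targets` list used in this notebook.

```python
import math

targets = ['identity_hate', 'insult', 'obscene', 'severe_toxic', 'threat', 'toxic']

def predict_labels(logits, threshold=0.5):
    """Apply a sigmoid to each logit independently, then keep labels at or above the threshold."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    predicted = [label for label, p in zip(targets, probs) if p >= threshold]
    return probs, predicted

# Made-up logits for one comment (batch_size = 1, num_labels = 6)
logits = [-0.36, 3.5, 4.8, 0.25, -2.26, 6.6]
probs, predicted = predict_labels(logits)
print(predicted)  # ['insult', 'obscene', 'severe_toxic', 'toxic']
```

Since every label gets its own sigmoid, the probabilities do not need to sum to 1 and any number of labels can be predicted.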

In [40]:
model = torch.load(r'/kaggle/working/Colab_model_202305150433').cuda()

In [82]:
test_all = pd.read_csv('/kaggle/input/cleaned-toxic-comments/test_preprocessed.csv')
test_all

Unnamed: 0,comment_text,id,identity_hate,insult,obscene,set,severe_toxic,threat,toxic,toxicity
0,yo bitch ja rule is more succesful then you ll...,00001cee341fdb12,,,,test,,,,
1,from rfc the title is fine as it is imo,0000247867823ef7,,,,test,,,,
2,sources zawe ashton on lapland,00013b17ad220c46,,,,test,,,,
3,if you have a look back at the source the in...,00017563c3f7919a,,,,test,,,,
4,i don t anonymously edit articles at all,00017695ad8997eb,,,,test,,,,
...,...,...,...,...,...,...,...,...,...,...
153159,i totally agree this stuff is nothing but t...,fffcd0960ee309b5,,,,test,,,,
153160,throw from out field to home plate does it g...,fffd7a9a6eb32c16,,,,test,,,,
153161,okinotorishima categories i see your changes ...,fffda9e8d6fafa9e,,,,test,,,,
153162,one of the founding nations of the eu germany...,fffe8f1340a79fc2,,,,test,,,,


In [83]:
empty_df = pd.DataFrame({k:[] for k in targets})
zero_df = pd.DataFrame({k:[0] for k in targets})
df_prob = empty_df
df_pos = empty_df

# send inputs to whichever device the loaded model lives on
device = next(model.parameters()).device
model.eval()  # disable dropout for inference
sigmoid = torch.nn.Sigmoid()

for text in test_all.comment_text.tolist():
    if text!=text:
        # NaN != NaN, so this catches missing comments; predict all zeros for them
        df_prob = pd.concat([df_prob, zero_df], axis=0)
        df_pos = pd.concat([df_pos, zero_df], axis=0)
    else:
        encoding = tokenizer(text,  return_tensors="pt", padding=True, truncation=True,max_length=512, add_special_tokens = True)
        encoding = {k: v.to(device) for k,v in encoding.items()}

        # no gradients are needed at inference time
        with torch.no_grad():
            outputs = model(**encoding)

        # logits have shape (1, num_labels)
        logits = outputs.logits

        # apply sigmoid + threshold
        probs = sigmoid(logits.squeeze().cpu())

        predictions = np.zeros(probs.shape)
        predictions[np.where(probs >= 0.5)] = 1
        
        # append one row of probabilities and one row of 0/1 predictions
        df_prob = pd.concat([df_prob, pd.DataFrame({k:[v] for k,v in zip(targets, probs.tolist())})], axis=0)
        df_pos = pd.concat([df_pos, pd.DataFrame({k:[v] for k,v in zip(targets, predictions.tolist())})], axis=0)

In [84]:
test_all.id

0         00001cee341fdb12
1         0000247867823ef7
2         00013b17ad220c46
3         00017563c3f7919a
4         00017695ad8997eb
                ...       
153159    fffcd0960ee309b5
153160    fffd7a9a6eb32c16
153161    fffda9e8d6fafa9e
153162    fffe8f1340a79fc2
153163    ffffce3fb183ee80
Name: id, Length: 153164, dtype: object

In [85]:
df_pos

Unnamed: 0,identity_hate,insult,obscene,severe_toxic,threat,toxic
0,0.0,1.0,1.0,1.0,0.0,1.0
0,0.0,0.0,0.0,0.0,0.0,0.0
0,0.0,0.0,0.0,0.0,0.0,0.0
0,0.0,0.0,0.0,0.0,0.0,0.0
0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...
0,0.0,0.0,0.0,0.0,0.0,1.0
0,0.0,0.0,0.0,0.0,0.0,0.0
0,0.0,0.0,0.0,0.0,0.0,0.0
0,0.0,0.0,0.0,0.0,0.0,0.0


In [86]:
df_prob_out = df_prob.reset_index(drop=True).copy()
df_prob_out['id'] = test_all.reset_index(drop=True).id
df_pos_out = df_pos.reset_index(drop=True).copy()
df_pos_out['id'] = test_all.reset_index(drop=True).id


In [87]:
df_prob_out

Unnamed: 0,identity_hate,insult,obscene,severe_toxic,threat,toxic,id
0,0.411027,0.970825,0.991723,0.561212,0.094651,0.998618,00001cee341fdb12
1,0.000284,0.000304,0.000319,0.000228,0.000248,0.000568,0000247867823ef7
2,0.000234,0.000304,0.000315,0.000185,0.000207,0.000723,00013b17ad220c46
3,0.000289,0.000312,0.000322,0.000231,0.000266,0.000557,00017563c3f7919a
4,0.000205,0.000301,0.000395,0.000158,0.000160,0.001088,00017695ad8997eb
...,...,...,...,...,...,...,...
153159,0.001041,0.021782,0.328867,0.002356,0.000553,0.626985,fffcd0960ee309b5
153160,0.000203,0.000459,0.000517,0.000109,0.000121,0.003841,fffd7a9a6eb32c16
153161,0.000247,0.000288,0.000379,0.000213,0.000215,0.000589,fffda9e8d6fafa9e
153162,0.000490,0.000397,0.000360,0.000200,0.000245,0.001079,fffe8f1340a79fc2


In [88]:
df_pos_out

Unnamed: 0,identity_hate,insult,obscene,severe_toxic,threat,toxic,id
0,0.0,1.0,1.0,1.0,0.0,1.0,00001cee341fdb12
1,0.0,0.0,0.0,0.0,0.0,0.0,0000247867823ef7
2,0.0,0.0,0.0,0.0,0.0,0.0,00013b17ad220c46
3,0.0,0.0,0.0,0.0,0.0,0.0,00017563c3f7919a
4,0.0,0.0,0.0,0.0,0.0,0.0,00017695ad8997eb
...,...,...,...,...,...,...,...
153159,0.0,0.0,0.0,0.0,0.0,1.0,fffcd0960ee309b5
153160,0.0,0.0,0.0,0.0,0.0,0.0,fffd7a9a6eb32c16
153161,0.0,0.0,0.0,0.0,0.0,0.0,fffda9e8d6fafa9e
153162,0.0,0.0,0.0,0.0,0.0,0.0,fffe8f1340a79fc2


In [89]:
df_prob_out = df_prob_out[['id']+targets]
df_pos_out = df_pos_out[['id']+targets]

In [90]:
df_prob_out.to_csv('submission_prob.csv', index=False)
df_pos_out.to_csv('submission_pos.csv', index=False)