# Llama2 7b tuning & inference

This is more about exploring the possibilities of finetuning than an actual solution. Nevertheless, all the metrics would be computed

## Llama2 7b tuning

For the tuning part, the resourses of the Kaggle or Colab seems not to be quite enough. Therefore, one need to seek for other sourses. 

I've decided to use service called [modal.com](https://modal.com/) and utilize their computational power to run the evaluations.


To gain access to the service, you have to:
1. Regiseter in modal.com (1 minute, requires GitHub authentication)
2. Enter secret from Huggingface (enter the hf token in the `HUGGINGFACE_TOKEN` field and name it `huggingface`).
 
Their tool is much easier to use via the terminal, so here is the list of commands to launch it in CLI (and corresponding cell with these commands):
```bash
pip install modal # the only dependency in the code
modal token new   # this will open modal's tab in the browser and automatically authorize you
modal run src/data/llama/train_modal.py --dataset llama2_dataset.py --base chat7 --run-id chat7-nontoxic

```

In [None]:
!pip install peft
!pip install --upgrade bitsandbytes
!pip install --upgrade accelerate

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from collections.abc import Iterable
from tqdm.auto import trange
import torch
import numpy as np
import peft
import transformers, accelerate, bitsandbytes



In [3]:
def wrap_messages(msgs):
    B_INST, E_INST = "[INST] ", " [/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
    prefixed_queries = [
        B_INST
        + B_SYS
        + "You are a Twitch moderator that paraphrases sentences to be non-toxic.\n"
        + E_SYS
        + "Could you paraphrase this: "
        + msg
        + "?\n"
        + E_INST
        for msg in msgs
    ]
    return prefixed_queries


def predict(requests, greb_answer = False, batch_size = 1, max_length = 64):
    requests = wrap_messages(requests)
    
    model = AutoModelForCausalLM.from_pretrained(
        'daryl149/llama-2-7b-chat-hf', 
        load_in_4bit=True, 
        bnb_4bit_compute_dtype=torch.float16
    )
    model.load_adapter('domrachev03/llama2_7b_detoxification')
    model.eval()
        
    tokenizer = AutoTokenizer.from_pretrained('daryl149/llama-2-7b-chat-hf')
    tokenizer.pad_token = tokenizer.eos_token
    
    
    results = []
    for i in trange(0, len(requests), batch_size):
        batch = [t for t in requests[i: i + batch_size]]
        inputs = tokenizer(
            batch, 
            padding=True, 
            truncation=True, 
            max_length = max_length, 
            return_tensors='pt'
        ).input_ids.to(model.device)
        
        with torch.no_grad():
            out = model.generate(inputs, max_new_tokens=max_length+1)
            decoded = [tokenizer.decode(out_i, skip_special_tokens=True,temperature=0) for out_i in out]
            
            if greb_answer:
                decoded = [decoded[i][len(batch[i]):decoded[i].find('</s>')] for i in range(batch_size)]
            results.extend(decoded)
    
    return results

In [4]:
queries = ['Fuck you!', 'This freaking chair makes me nuts', 'This fucking sause, I love it', 'I hate gays']

predict(queries, greb_answer=True, batch_size=2)

Downloading (…)lve/main/config.json:   0%|          | 0.00/507 [00:00<?, ?B/s]

Downloading (…)model.bin.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)l-00001-of-00002.bin:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

Downloading (…)l-00002-of-00002.bin:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)neration_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

Downloading (…)/adapter_config.json:   0%|          | 0.00/449 [00:00<?, ?B/s]

Downloading adapter_model.bin:   0%|          | 0.00/16.8M [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/727 [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/411 [00:00<?, ?B/s]

  0%|          | 0/2 [00:00<?, ?it/s]

["you're crazy!",
 'this chair makes me crazy.',
 'this sauce, I love it.',
 'I hate gays.']

## Loading the dataset

In [5]:
import datasets

dataset = datasets.load_dataset("domrachev03/toxic_comments_subset")

Downloading and preparing dataset parquet/domrachev03--toxic_comments_subset to /root/.cache/huggingface/datasets/parquet/domrachev03--toxic_comments_subset-482f891ea0d5a6ed/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901...


Downloading data files:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/34.5M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/3.83M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/2 [00:00<?, ?it/s]

Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/parquet/domrachev03--toxic_comments_subset-482f891ea0d5a6ed/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901. Subsequent calls will reuse this data.


  0%|          | 0/2 [00:00<?, ?it/s]

In [9]:
n_test = 1000
test_subset = dataset['test'].select(range(n_test))

In [10]:
test_preds = predict([*test_subset['reference']], greb_answer=True, batch_size=10)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

  0%|          | 0/100 [00:00<?, ?it/s]

## Metrics & saving

In [13]:
!pip install sacrebleu
!pip install evaluate

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Collecting evaluate
  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: evaluate
Successfully installed evaluate-0.4.1


In [19]:
cleanup()

In [20]:
import gc
import tqdm
from tqdm.auto import trange
import torch
import numpy as np

from transformers import AutoModelForSequenceClassification, AutoTokenizer, \
    RobertaTokenizer, RobertaForSequenceClassification

import evaluate


def cleanup():
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def get_toxicity(preds, soft=False):
    results = []

    model_name = 'SkolkovoInstitute/roberta_toxicity_classifier'

    tokenizer = RobertaTokenizer.from_pretrained(model_name)
    model = RobertaForSequenceClassification.from_pretrained(model_name)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)

    model.eval()
    for i in tqdm.tqdm(range(0, len(preds), 1)):
        batch = tokenizer(preds[i:i + 1], return_tensors='pt', padding=True).to(device)

        with torch.no_grad():
            logits = model(**batch).logits
            out = torch.softmax(logits, -1)[:, 1].cpu().numpy()
            results.append(out)
    return np.concatenate(results)


def get_sacrebleu(inputs, preds):
    metric = evaluate.load("sacrebleu")

    result = metric.compute(predictions=preds, references=inputs)
    return result['score']


def get_fluency(preds, soft=False):
    path = 'cointegrated/roberta-large-cola-krishna2020'

    model = AutoModelForSequenceClassification.from_pretrained(path)
    tokenizer = AutoTokenizer.from_pretrained(path)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)

    results = []
    bs = 1
    for i in trange(0, len(preds), bs):
        batch = [t for t in preds[i: i + bs]]
        inputs = tokenizer(batch, padding=True, truncation=True, return_tensors='pt').to(device)
        with torch.no_grad():
            out = torch.softmax(model(**inputs).logits, -1)[:, 0].cpu().numpy()
            results.append(out)
    return np.concatenate(results)


def compute_metrics(eval_preds, tokenizer=None, print_results=False):
    preds, labels = eval_preds
    
    if tokenizer is not None:
        detokenized_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        detokenized_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    else:
        detokenized_preds = preds
        detokenized_labels = labels

    results = {}
    toxicity_per_sent = get_toxicity(detokenized_preds)
    results['avg_toxic'] = sum(toxicity_per_sent) / len(toxicity_per_sent)
    cleanup()

    results['bleu'] = get_sacrebleu(detokenized_labels, detokenized_preds) / 100
    cleanup()

    fluency_per_sent = get_fluency(preds)
    results['fluency'] = sum(fluency_per_sent) / len(preds)
    cleanup()

    # count metrics
    results['joint'] = sum(toxicity_per_sent * results['bleu'] * fluency_per_sent) / len(preds)
    if print_results:
        print("--------------")
        print("Metric   | Value")
        print("--------------")
        print(f"toxic    | {results['avg_toxic']:.2f}")
        print(f"bleu (n) | {results['bleu']:.2f}")
        print(f"fluency  | {results['fluency']:.2f}")
        print("===============")
        print(f"Total    | {results['joint']:.2f}")
        print("--------------")
    return results

In [15]:
labels_list = [*test_subset['translation']]

In [21]:
compute_metrics((test_preds, labels_list), print_results=True)

Some weights of the model checkpoint at SkolkovoInstitute/roberta_toxicity_classifier were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 1000/1000 [00:09<00:00, 105.50it/s]


Downloading (…)lve/main/config.json:   0%|          | 0.00/628 [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/1.42G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/289 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

--------------
Metric   | Value
--------------
toxic    | 0.28
bleu (n) | 0.21
fluency  | 0.88
Total    | 0.05
--------------


{'avg_toxic': 0.28193867359716024,
 'bleu': 0.2141521479259361,
 'fluency': 0.8797155001051724,
 'joint': 0.05208585016087908}

In [28]:
import pandas as pd

preds_dict = pd.DataFrame([[orig_i, pred_i, label_i] for orig_i, pred_i, label_i in zip(test_subset['reference'], test_preds, labels_list)], columns=['Query', 'Predictions', 'Original values'])

preds_dict.head(10)

Unnamed: 0,Query,Predictions,Original values
0,and he smoked all this shit himself.,And he smoked it all himself.,And smoked all of this weed by myself.
1,"Mother told me that Dad, the oldest of five ch...",T]Mother told me that Dad was the oldest of fi...,"my mother told me that Papa, the eldest of fiv..."
2,What the hell did you say to Garrett?,what did you say to Garrett?,what did you tell him?
3,"Otherwise, there could be a strong incentive f...",ncentive for the issuer to deceive the investor.,"otherwise, the issuer could be highly motivate..."
4,I've been a pig for 19 months.,I've been a pig for 19 months.,I'm a cop 1 9 months.
5,"and by the time, the authorities would be usel...","d by the time the authorities got there, they'...","By that time, the organs would be of no use, s..."
6,"oh, shit!","Oh, my God!",Oh no!
7,Shake his fucking hand and we can all go home.,shake his hand and we can all go home.,to shake his hand and we can all go home happy.
8,people tend to be cruel to intellectually disa...,People tend to be cruel to intellectually disa...,People can be cruel to people intellectually c...
9,all these things are dead.,All this is dead.,All this stuff is dead.


In [33]:
preds_dict.to_csv('llama_test.csv')