# Final solution
by Domrachev Ivan, B20-RO-01

In [1]:
from datasets import load_dataset

# My packages
# Love you, python <3
import sys 
sys.path.append('../src/')
# The packages themselves
from models.llama.inference import llama2_predict 
from models.t5_small.inference import t5_predict
# from models.t5_small.train import train

This notebook is a guide to training and running inference with the `t5-small` and `llama2-7b-chat` models. Both models are already fine-tuned, so you can skip straight to the inference part without specifying a model path: the weights are pulled automatically from huggingface.co 🤗!

## Dataset

There are two options:
1. Create a new instance of the dataset
2. Load it from Hugging Face

This is the first one:

In [None]:
from data.make_dataset import DetoxDataset

# Reads the TSV file and saves the processed dataset to the specified folder
DetoxDataset(dataset_sv_fname='../data/raw/filtered.tsv', dataset_arrow_fdir='data/')

And this is the second:

In [12]:
ds = load_dataset('domrachev03/toxic_comments_subset')
ds

DatasetDict({
    train: Dataset({
        features: ['reference', 'translation', 'similarity', 'lenght_diff', 'ref_tox', 'trn_tox'],
        num_rows: 156516
    })
    test: Dataset({
        features: ['reference', 'translation', 'similarity', 'lenght_diff', 'ref_tox', 'trn_tox'],
        num_rows: 17391
    })
})
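As a quick sanity check on the loaded data, one might compare the average toxicity of the two text columns. The sketch below assumes that `ref_tox` and `trn_tox` are numeric toxicity scores for `reference` and `translation` respectively (the column names come from the output above, their meaning is an assumption). The helper `mean_toxicity` is hypothetical and works on any iterable of dict-like rows, which is what iterating over a `Dataset` yields.

```python
def mean_toxicity(rows):
    """Average `ref_tox` and `trn_tox` over an iterable of dict-like rows.

    Assumption: both columns hold numeric toxicity scores.
    """
    count = 0
    ref_sum = 0.0
    trn_sum = 0.0
    for row in rows:
        ref_sum += row["ref_tox"]
        trn_sum += row["trn_tox"]
        count += 1
    if count == 0:
        raise ValueError("no rows given")
    return ref_sum / count, trn_sum / count


# Toy rows standing in for real dataset entries (values are made up).
toy_rows = [
    {"ref_tox": 1.0, "trn_tox": 0.0},
    {"ref_tox": 0.5, "trn_tox": 0.25},
]
ref_mean, trn_mean = mean_toxicity(toy_rows)
print(f"mean ref_tox={ref_mean:.3f}, mean trn_tox={trn_mean:.3f}")
```

With the real data, the same function could be called on a slice, for example `mean_toxicity(ds['train'].select(range(1000)))`.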

## T5-small

### Training

> *Note*. Unfortunately, I have broken my local setup and can no longer run anything that requires the `bitsandbytes` library. There is therefore a small chance that this code will not run as written. The notebooks, however, are 100% functional and can be used as a reference.

In [None]:
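# Imported here rather than at the top, since it depends on `bitsandbytes`
from models.t5_small.train import train

model = train(ds)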

# and then...
# model.save_pretrained("path/to/save")
# or
# model.push_to_hub("where/to/push")

### Inference

In [3]:
t5_predict(["I fucking hate you!", "Fuck off, I'm busy!"], device='cpu')

['I hate you!', "I'm busy!"]

## Llama2

### Modal.com training & inference 

First of all, you need access to the service:
1. Register at [modal.com](https://modal.com/) (takes about 1 minute, requires GitHub authentication)
2. Add your Hugging Face token (found under `Settings/API tokens`) as a secret: name the secret `huggingface` and enter the token in the `HUGGINGFACE_TOKEN` field.
 
The tool is much easier to use from the terminal, because it produces a lot of output. Here are the commands to run it from the CLI:
```bash
# Authenticate with your Modal account
modal token new
# Launch the training process
modal run src/models/llama/train_modal.py --dataset llama2_dataset.py --base chat7 --run-id chat7-nontoxic
# Copy the PEFT fine-tuned model from the Modal cloud to a local directory
modal volume get example-results-vol 'chat7-nontoxic/*' models/llama2
# Run inference with the model in the cloud
modal run inference.py --base chat7 --run-id chat7-nontoxic --prompt "[INST]<<SYS>>\nYou are a Twitch moderator that paraphrases sentences to be non-toxic.\n<<SYS>> \n\nCould you paraphrase this: ...?\n [/INST]"
```
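The `--prompt` argument above follows a fixed template with the sentence to paraphrase in place of `...`. A minimal sketch of a helper that fills in that template is shown below. The function name `build_prompt` is hypothetical, the template is copied from the command as-is, and it assumes the `\n` sequences in the command are meant as line breaks.

```python
# System message copied from the inference command above.
SYSTEM_PROMPT = (
    "You are a Twitch moderator that paraphrases sentences to be non-toxic."
)


def build_prompt(sentence, system_prompt=SYSTEM_PROMPT):
    """Fill the prompt template from the CLI example with `sentence`.

    Assumption: `\\n` in the shell command stands for a real newline.
    """
    return (
        f"[INST]<<SYS>>\n{system_prompt}\n<<SYS>> \n\n"
        f"Could you paraphrase this: {sentence}?\n [/INST]"
    )


print(build_prompt("Leave me alone, I'm busy"))
```

The resulting string could then be passed as the `--prompt` value, with shell quoting adjusted as needed.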

### Local inference 

If your PC is powerful enough, you can try running inference locally with `models.llama.inference.llama2_predict`. Note that even the quantized version requires 16 GB of VRAM. Moreover, it cannot run on the CPU, since the quantization is available only on GPU.
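Before attempting local inference, it may help to check whether a suitable GPU is present. The sketch below is one possible check, not part of the project code: both helper names are hypothetical, it assumes PyTorch is installed, and it uses the 16 GB figure from the text as the default threshold. Cards marketed as 16 GB usually report slightly less total memory, so the threshold may need to be lowered a little.

```python
def has_enough_vram(total_bytes, required_gib=16.0):
    """Return True if `total_bytes` of GPU memory meets the requirement."""
    return total_bytes >= required_gib * 1024**3


def local_inference_possible(required_gib=16.0):
    """Check for a CUDA GPU with enough memory.

    Returns False if PyTorch is missing or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # Total memory of the first visible CUDA device, in bytes.
    total = torch.cuda.get_device_properties(0).total_memory
    return has_enough_vram(total, required_gib)


if __name__ == "__main__":
    print("Local inference looks feasible:", local_inference_possible())
```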

In [None]:
from models.llama.inference import llama2_predict

# God bless you
llama2_predict(["I fucking hate you!", "Fuck off, I'm busy!"])