# Quantized Bonito Tutorial
This tutorial shows how to run a quantized version of [Bonito](https://github.com/BatsResearch/bonito) on a Google Colab T4 instance using the `transformers` package (instead of `vllm`, which the original repo uses). We use the quantized model from [alexandreteles/bonito-v1-awq](https://huggingface.co/alexandreteles/bonito-v1-awq). Note that quantized models may behave differently from their non-quantized counterparts.

If you wish to run the original Bonito model on A100 GPUs, check out [this tutorial](https://colab.research.google.com/drive/1XuDRVKpUUqdjrqg2-P2FIqkdAQBnqoNL?usp=sharing).


## Setup
First, we clone the repo and install its dependencies. This takes several minutes.

In [None]:
!git clone https://github.com/BatsResearch/bonito.git
!pip install -U bonito/

To use this quantized model, we need to install the [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) package, which handles AWQ ([Activation-aware Weight Quantization](https://arxiv.org/abs/2306.00978)) models such as the one we'll be using. AWQ is a weight quantization technique that does not treat all weights equally: it uses activation statistics to identify the most important weights and protects them from quantization error. To get it to work on Colab, we install the AutoAWQ kernels from a prebuilt wheel so that the CUDA versions match.

In [None]:
!pip install autoawq
!git clone https://github.com/Boltuzamaki/AutoAWQ_kernels.git
!pip install AutoAWQ_kernels/builds/autoawq_kernels-0.0.6+cu122-cp310-cp310-linux_x86_64.whl
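
The wheel filename in the cell above encodes the environment it was built for. As a minimal sketch, the snippet below splits a wheel filename into its standard dash-separated fields and compares the Python tag with the running interpreter. The helper names `parse_wheel_filename` and `matches_running_python` are hypothetical, and reading the local version label `cu122` as "built against CUDA 12.2" is a naming convention assumed here, not something the wheel format guarantees.

```python
import sys


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into name, version, and compatibility tags.

    Wheel filenames follow the layout
    name-version[-build]-python_tag-abi_tag-platform_tag.whl
    """
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    name, full_version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Anything after '+' is the local version label (here assumed to name the CUDA build).
    version, _, local_label = full_version.partition("+")
    return {
        "name": name,
        "version": version,
        "local_label": local_label,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_running_python(python_tag: str) -> bool:
    """Return True if a CPython tag such as 'cp310' matches this interpreter."""
    expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return python_tag == expected


info = parse_wheel_filename(
    "autoawq_kernels-0.0.6+cu122-cp310-cp310-linux_x86_64.whl"
)
print(info)
print("Python tag matches this runtime:", matches_running_python(info["python_tag"]))
```

If the Python tag does not match the runtime, or the CUDA label differs from what `torch.version.cuda` reports in your session, the wheel will likely fail to install or import, and a different build is needed.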

## Quantized Bonito Wrapper
This is a simplified quantized Bonito class that generates a single synthetic input-output instruction pair for a given text and task type.
The code uses the Hugging Face `transformers` library for generation.
For complete functionality and faster generation, we recommend using the `Bonito` class from the package.

In [5]:
from typing import Optional, List, Dict
from datasets import Dataset
from awq import AutoAWQForCausalLM
from bonito import AbstractBonito
from transformers import AutoTokenizer


class QuantizedBonito(AbstractBonito):
    def __init__(self, model_name_or_path):
        self.model = AutoAWQForCausalLM.from_quantized(
            model_name_or_path, fuse_layers=True
        ).cuda()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

    def generate_task(
        self,
        unannotated_paragraph: str,
        task_type: str,
        sampling_params: dict,
    ) -> Dict:
        """
        Generates a synthetic instruction tuning pair using the Quantized Bonito model.
        This method takes an unannotated text, a task type, and sampling parameters,
        and generates a synthetic input-output pair.

        Args:
            unannotated_paragraph (str): The unannotated text or paragraph.
            task_type (str): The type of the task. This can be a
                short form or a full form.
            sampling_params (dict): The parameters for
                sampling.

        Returns:
            Dict: The synthetic input-output pair for the task type.
        """

        text_dataset = Dataset.from_list([{"input": unannotated_paragraph}])

        processed_dataset = self._prepare_bonito_input(
            text_dataset, task_type, context_col="input"
        )

        outputs = self._generate_text(processed_dataset["input"], sampling_params)
        examples = []
        for i, example in enumerate(text_dataset.to_list()):
            output = outputs[i]
            example["prediction"] = output.strip()
            examples.append(example)

        synthetic_dataset = Dataset.from_list(examples)

        # Filter out examples that cannot be parsed, then keep the single
        # remaining example (indexing with [0] fails if none could be parsed).
        synthetic_dataset_dict = self._postprocess_dataset(
            synthetic_dataset, context_col="input"
        ).to_list()[0]

        return synthetic_dataset_dict

    def _generate_text(
        self,
        dataset: Dataset,
        sampling_params: dict,
    ) -> List[str]:
        """
        Generate text using the Hugging Face transformers `generate` function.

        This method iterates over the prompts one at a time, encodes each
        prompt, generates a continuation with the model, decodes only the
        newly generated tokens, and appends the text to a list.

        Args:
            dataset (Dataset): The prompts for text generation. Iterating
                over it must yield prompt strings.
            sampling_params (dict): Parameters for sampling during generation.

        Returns:
            List[str]: A list of generated texts corresponding to the prompts.
        """
        generated_texts = []

        for prompt in dataset:
            input_ids = self.tokenizer.encode(prompt, return_tensors="pt")
            input_ids = input_ids.cuda()

            output = self.model.generate(input_ids, do_sample=True, **sampling_params)

            generated_text = self.tokenizer.decode(
                output[0][len(input_ids[0]) :], skip_special_tokens=True
            )
            generated_texts.append(generated_text)

        return generated_texts
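
The slice `output[0][len(input_ids[0]) :]` in `_generate_text` relies on the fact that, for decoder-only models, `generate` returns the prompt token ids followed by the newly generated ids. A minimal sketch with plain Python lists illustrates the idea; `fake_generate` is a hypothetical stand-in for the model call, and the token ids are made up.

```python
from typing import List


def fake_generate(prompt_ids: List[int], new_ids: List[int]) -> List[int]:
    """Hypothetical stand-in for model.generate on a single sequence:
    the returned sequence is the prompt followed by the new tokens."""
    return prompt_ids + new_ids


def strip_prompt(output_ids: List[int], prompt_ids: List[int]) -> List[int]:
    """Drop the echoed prompt so only the newly generated tokens remain."""
    return output_ids[len(prompt_ids):]


prompt_ids = [101, 7, 42]            # made-up ids for the encoded prompt
output_ids = fake_generate(prompt_ids, [9, 8, 2])
continuation = strip_prompt(output_ids, prompt_ids)
print(continuation)
```

Without this slice, the decoded text would start with the prompt itself, and the post-processing step would have to parse the prompt along with the generated instruction.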

## Load the Bonito Model
Load the quantized Bonito model from the Hugging Face Hub.

In [None]:
bonito = QuantizedBonito("alexandreteles/bonito-v1-awq")

## Synthetic Data Generation
Here we use the loaded quantized Bonito model to generate synthetic instructions for an unannotated text.



### Sample Text
We select the sample text from the ContractNLI dataset. You can replace this text as you wish.

In [7]:
from pprint import pprint

unannotated_paragraph = """1. “Confidential Information”, whenever used in this Agreement, shall mean any data, document, specification and other information or material, that is delivered or disclosed by UNHCR to the Recipient in any form whatsoever, whether orally, visually in writing or otherwise (including computerized form), and that, at the time of disclosure to the Recipient, is designated as confidential."""
pprint(unannotated_paragraph)

('1. “Confidential Information”, whenever used in this Agreement, shall mean '
 'any data, document, specification and other information or material, that is '
 'delivered or disclosed by UNHCR to the Recipient in any form whatsoever, '
 'whether orally, visually in writing or otherwise (including computerized '
 'form), and that, at the time of disclosure to the Recipient, is designated '
 'as confidential.')


### Generate the synthetic instructions
After loading the model, we pass the unannotated paragraph and the task type to generate the instructions.
Here we generate an NLI task:

In [None]:
from transformers import set_seed

# Fix the random seed so that sampling is reproducible.
set_seed(2)

# Generate synthetic instruction tuning dataset
sampling_params = {
    "max_new_tokens": 256,
    "top_p": 0.95,
    "temperature": 0.7,
    "num_return_sequences": 1,
}
synthetic_dataset = bonito.generate_task(
    unannotated_paragraph, task_type="nli", sampling_params=sampling_params
)
pprint("----Generated Instructions----")
pprint(f'Input: {synthetic_dataset["input"]}')
pprint(f'Output: {synthetic_dataset["output"]}')
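
The `temperature` and `top_p` entries in `sampling_params` shape the distribution that each new token is sampled from. The sketch below shows the general idea on a toy vocabulary: temperature rescales the logits before the softmax, and top-p (nucleus) filtering keeps the smallest set of most probable tokens whose total probability reaches `top_p`. This is a simplified illustration with made-up logits and hypothetical helper names, not the `transformers` implementation.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits divided by the temperature.

    Temperatures below 1 sharpen the distribution; above 1 they flatten it.
    """
    scaled = {tok: value / temperature for tok, value in logits.items()}
    largest = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - largest) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most probable tokens until their mass reaches top_p, then renormalize."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Made-up logits for a four-token vocabulary.
toy_logits = {"yes": 2.0, "no": 1.0, "maybe": 0.0, "banana": -3.0}
probs = apply_temperature(toy_logits, temperature=0.7)
nucleus = top_p_filter(probs, top_p=0.95)
print(probs)
print(nucleus)
```

With these toy values, only the two most likely tokens survive the top-p filter, so unlikely tokens such as `banana` can never be sampled. Raising `temperature` or `top_p` lets more of the tail through and makes the generated instructions more varied.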

Now we change the task type from NLI (`nli`) to multiple-choice question answering (`mcqa`). For more details on task types, see [supported task types](https://github.com/BatsResearch/bonito?tab=readme-ov-file#supported-task-types).

In [None]:
# Fix the random seed so that sampling is reproducible.
set_seed(55)
sampling_params = {
    "max_new_tokens": 256,
    "top_p": 0.95,
    "temperature": 0.7,
    "num_return_sequences": 1,
}
synthetic_dataset = bonito.generate_task(
    unannotated_paragraph, task_type="mcqa", sampling_params=sampling_params  # changed
)
pprint("----Generated Instructions----")
pprint(f'Input: {synthetic_dataset["input"]}')
pprint(f'Output: {synthetic_dataset["output"]}')

Now go try it out with your own datasets! You can vary the `task_type` to generate different types of instructions.
You can also play around with the sampling hyperparameters such as `top_p` and `temperature`.
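
To apply the wrapper to several paragraphs from your own data, a small loop around `generate_task` is enough. The helper below is a sketch with a hypothetical name, `generate_tasks`. It assumes that `generate_task` raises an `IndexError` when the model output cannot be parsed, which follows from the `.to_list()[0]` indexing in the wrapper above but is not a documented guarantee.

```python
from typing import Any, Dict, List


def generate_tasks(
    bonito_model: Any,
    paragraphs: List[str],
    task_type: str,
    sampling_params: dict,
) -> List[Dict]:
    """Call generate_task once per paragraph and collect the parsed results."""
    results = []
    for paragraph in paragraphs:
        try:
            results.append(
                bonito_model.generate_task(
                    paragraph,
                    task_type=task_type,
                    sampling_params=sampling_params,
                )
            )
        except IndexError:
            # Assumption: no example survived post-processing for this
            # paragraph, so we skip it instead of stopping the whole run.
            continue
    return results
```

For example, `generate_tasks(bonito, [unannotated_paragraph], "nli", sampling_params)` would return a list with at most one input-output pair. Because generation runs one prompt at a time, this loop is slow for large datasets; the `Bonito` class from the package is the better choice there.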