<a href="https://colab.research.google.com/github/cahya-wirawan/lora/blob/main/Bloomz_Inference_Comparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inference Comparison between Bloomz Language Models
We compare inference between Bloomz fine-tuned on the Anh instruction dataset and the original Bloomz. The comparison covers several languages, both supported and unsupported by Bloom.

In [1]:
!pip install -q bitsandbytes datasets accelerate loralib deepspeed
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git
!pip install -q mtranslate


In [1]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, set_seed
from mtranslate import translate
import os


Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.9/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...




In [2]:
!nvidia-smi

Tue Mar 28 09:34:50 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   29C    P0    46W / 400W |      3MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
class LMInstruction():
    def __init__(self, model_name, lora=True):
        self.model_name = model_name
        if lora:
            self.config = PeftConfig.from_pretrained(model_name)
            self.model = AutoModelForCausalLM.from_pretrained(self.config.base_model_name_or_path, return_dict=True, 
                                                              load_in_8bit=True, device_map='auto')
            self.tokenizer = AutoTokenizer.from_pretrained(self.config.base_model_name_or_path)
            self.model = PeftModel.from_pretrained(self.model, self.model_name)
        else:
            self.model = AutoModelForCausalLM.from_pretrained(model_name, return_dict=True, 
                                                              load_in_8bit=True, device_map='auto')
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        
    def generate(self, text, is_instruction=True):
        if is_instruction:
            prompt = f"User: {text}\nAssistant: "
        else:
            prompt = f"{text}\n"
        batch = self.tokenizer(prompt, return_tensors='pt').to(0)
        with torch.cuda.amp.autocast():
            output_tokens = self.model.generate(**batch, max_new_tokens=200,
                                               min_length=50,
                                               do_sample=True,
                                               top_k=40,
                                               top_p=0.9,
                                               temperature=0.2,
                                               repetition_penalty=1.2,
                                               num_return_sequences=1)
        result = self.tokenizer.decode(output_tokens[0], skip_special_tokens=True).strip()
        if is_instruction:
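            # Keep everything after the first "Assistant:" marker; avoids an IndexError on an empty answer.
            answer = result.split("Assistant:", 1)[-1].strip()
            result_en = f"User: {translate(text)}\nAssistant: {translate(answer)}"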
        else:
            result_en = f"{translate(text)}\n{translate(result[len(text)+1:])}"
        print('Original answer:\n', result)
        print('\nEnglish translation:\n', result_en)
        
    def get_model_name(self):
        # Renamed from model_name: the instance attribute of the same name shadows the method.
        return self.model_name
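
The prompt handling inside `generate` can be tried out without loading a model. The following is a minimal sketch, assuming the same `User:`/`Assistant:` template as the class above; `build_prompt` and `extract_answer` are hypothetical helper names that do not appear in the notebook.

```python
def build_prompt(text, is_instruction=True):
    """Wrap the input in the User/Assistant template, or pass it through with a newline."""
    if is_instruction:
        return f"User: {text}\nAssistant: "
    return f"{text}\n"


def extract_answer(decoded, text, is_instruction=True):
    """Return only the generated continuation from a decoded model output."""
    decoded = decoded.strip()
    if is_instruction:
        # partition never raises: a missing marker simply yields an empty answer
        _, _, answer = decoded.partition("Assistant:")
        return answer.strip()
    # Plain prompt: drop the echoed input if the output starts with it
    if decoded.startswith(text):
        return decoded[len(text):].strip()
    return decoded


prompt = build_prompt("How old is the universe?")
answer = extract_answer("User: Hi\nAssistant: Hello there", "Hi")
print(repr(prompt))
print(answer)
```

Separating these steps from generation makes the edge cases (an empty answer, or an output that does not echo the prompt) easy to check in isolation.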
        

## Set the languages and their instructions
We define the lists of languages supported and unsupported by Bloom, along with a list of instructions in English. Each instruction is translated into the target language with Google Translate before inference.

In [4]:
languages = {
    "supported": ["en", "id", "fr", "es", "hi", "vi"],
    "unsupported": ["de", "tr", "ko", "ru"]
}

instructions = [
    "How old is the universe?",
    "Explain what quantum computing is.",
    "How to educate children to be honest?",
    "Write a list of 5 most beautiful places in the world.",
    "When did the first humans live?"
]
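
As a rough sketch of what this setup implies, the full grid of prompts can be enumerated without loading a model. `build_grid` and `translate_fn` are hypothetical names, not part of the notebook; the identity default stands in for `mtranslate.translate` so the snippet runs offline.

```python
languages = {
    "supported": ["en", "id", "fr", "es", "hi", "vi"],
    "unsupported": ["de", "tr", "ko", "ru"],
}

instructions = [
    "How old is the universe?",
    "Explain what quantum computing is.",
    "How to educate children to be honest?",
    "Write a list of 5 most beautiful places in the world.",
    "When did the first humans live?",
]


def build_grid(languages, instructions, translate_fn=lambda text, lang: text):
    """List (group, lang, index, prompt) tuples, visiting groups in sorted order."""
    grid = []
    for group in sorted(languages):
        for lang in languages[group]:
            for i, instruction in enumerate(instructions, start=1):
                # translate_fn is assumed to take (text, target_language)
                grid.append((group, lang, i, translate_fn(instruction, lang)))
    return grid


grid = build_grid(languages, instructions)
print(len(grid))  # 10 languages x 5 instructions = 50 prompts
print(grid[0])
```

Passing `translate_fn=translate` (from `mtranslate`) would produce the translated prompts, at the cost of one network request per prompt.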

## Inference with "bigscience/bloomz-7b1-mt" fine-tuned on the instruction dataset

In [5]:
bloom = LMInstruction("cahya/bloomz-7b1-mt")
set_seed(42)
for lang_support in sorted(list(languages)):
    print(f"# Languages that is {lang_support} by Bloom")
    for lang in languages[lang_support]:
        print(f"## Language {lang}")
        for i, instruction in enumerate(instructions):
            print(f"### {i+1}.")
            text = translate(instruction, lang)
            bloom.generate(text, is_instruction=True)
            print()        



# Languages that is supported by Bloom
## Language en
### 1.




Original answer:
 User: How old is the universe?
Assistant: 宇宙的年龄是137亿年。它是由一个被称为暗物质和暗能量的未知实体组成的,这些实体占宇宙中所有物质的99.8%以上。 暗能量是一种完全看不见的能量,可以驱动宇宙运动并产生潮汐力。

English translation:
 User: How old is the universe?
Assistant: The age of the universe is 13.7 billion years. It is made up of unknown entities known as dark matter and dark energy, which make up more than 99.8 percent of all matter in the universe. Dark energy is a completely invisible energy that can drive the motion of the universe and generate tidal forces.

### 2.
Original answer:
 User: Explain what quantum computing is.
Assistant: 量子计算是一种新的计算机类型,它使用被称为"量子比特"的微小的粒子来执行操作。这些粒子是原子大小的物质,它们可以存储信息并对其进行处理。

English translation:
 User: Explain what quantum computing is.
Assistant: Quantum computing is a new type of computer that uses tiny particles called "qubits" to perform operations. These particles are atomic-sized matter that can store information and process it.

### 3.
Original answer:
 User: How to educate children to be honest?
As

## Inference of the original "bigscience/bloomz-7b1-mt"

In [6]:
bloom = LMInstruction("bigscience/bloomz-7b1-mt", lora=False)
set_seed(42)
for lang_support in sorted(list(languages)):
    print(f"# Languages that is {lang_support} by Bloom")
    for lang in languages[lang_support]:
        print(f"## Language {lang}")
        for i, instruction in enumerate(instructions):
            print(f"### {i+1}.")
            text = translate(instruction, lang)
            bloom.generate(text, is_instruction=False)
            print() 



# Languages that is supported by Bloom
## Language en
### 1.
Original answer:
 How old is the universe?
 13.7 billion years old, according to NASA's Hubble Space Telescope (HST) and other astronomical observations.[1][2][3][4] The age of the Universe has been calculated using a variety of methods including:[5]

English translation:
 How old is the universe?
13.7 billion years old, according to NASA's Hubble Space Telescope (HST) and other astronomical observations.[1][2][3][4] The age of the Universe has been calculated using a variety of methods including:[5]

### 2.
Original answer:
 Explain what quantum computing is.
Quantum computers are machines that can solve problems using the laws of physics. They use a different way to process information than conventional computers do, and they have been shown to be much faster at doing so.

What does it take for a computer to work?
 Energy

English translation:
 Explain what quantum computing is.
Quantum computers are machines that can solve