<a href="https://colab.research.google.com/github/alexcpn/tranformer_learn/blob/main/bloom_3b_quant_overfitting_train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers==4.28.1
!pip install accelerate
!pip install bitsandbytes
!pip install peft
!pip install pynvml

In [2]:
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo
import torch

def print_gpu_utilization():
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    info = nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory occupied: {info.used//1024**2} MB.")


def print_summary(result):
    print(f"Time: {result.metrics['train_runtime']:.2f}")
    print(f"Samples/second: {result.metrics['train_samples_per_second']:.2f}")
    print_gpu_utilization()

torch.ones((1, 1)).to("cuda")
print_gpu_utilization()


GPU memory occupied: 363 MB.


In [3]:
# download the training text into your Colab environment
!wget https://raw.githubusercontent.com/alexcpn/tranformer_learn/main/data/small_3.txt
#!wget https://gist.githubusercontent.com/alexcpn/54e88130f9d186494f1c3ce5e83263b4/raw/7cdf5f93b819024c58a891fc808fbdbe052d0eb1/small_3_mixed.txt

--2023-06-20 07:20:10--  https://raw.githubusercontent.com/alexcpn/tranformer_learn/main/data/small_3.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56513 (55K) [text/plain]
Saving to: ‘small_3.txt.2’


2023-06-20 07:20:10 (24.4 MB/s) - ‘small_3.txt.2’ saved [56513/56513]



In [4]:
train_path = 'small_3.txt'


In [5]:
from transformers import TextDataset,DataCollatorForLanguageModeling
from transformers import AutoTokenizer

def load_dataset(path,tokenizer):
    dataset = TextDataset(
          tokenizer=tokenizer,
          file_path=path,
          block_size=128)

    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=False,
    )
    return dataset,data_collator

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")
train_dataset,data_collator = load_dataset(train_path,tokenizer)
print_gpu_utilization()

GPU memory occupied: 363 MB.
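
The `block_size=128` argument decides how the text file becomes training examples. As far as I can tell, `TextDataset` tokenizes the whole file, cuts the token ids into consecutive 128-token blocks, and drops a trailing remainder shorter than one block. Below is a minimal plain-Python sketch of that chunking. `chunk_token_ids` is a hypothetical helper, not part of `transformers`, and the integer list only stands in for real token ids.

```python
def chunk_token_ids(token_ids, block_size=128):
    """Split a flat list of token ids into consecutive full blocks.

    A trailing remainder shorter than block_size is dropped (no padding).
    """
    blocks = []
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        blocks.append(token_ids[start:start + block_size])
    return blocks


# Stand-in for a tokenized file: 300 fake token ids.
fake_ids = list(range(300))
blocks = chunk_token_ids(fake_ids, block_size=128)
print(len(blocks), [len(b) for b in blocks])
```

With 300 ids and a block size of 128, the last 44 ids do not fill a block and are not used for training.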




In [6]:
# for quantised loading
from torch import float32, nn, exp

class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).to(float32)


def prepare_model(model):
    for param in model.parameters():
      param.requires_grad = False  # freeze the model - train adapters later
      if param.ndim == 1:
        # cast the small parameters (e.g. layernorm) to fp32 for stability
        param.data = param.data.to(float32)
    model.gradient_checkpointing_enable()  # reduce number of stored activations
    model.enable_input_require_grads()
    model.lm_head = CastOutputToFloat(model.lm_head)
    return model

In [7]:
from transformers import Trainer, TrainingArguments, AutoModelForCausalLM
from peft import LoraConfig, PeftModel, PeftConfig, get_peft_model
import bitsandbytes as bnb

lora_config = {
    "r": 16,  # rank of the LoRA update matrices
    "lora_alpha": 32,  # alpha scaling
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",  # set this for CLM or Seq2Seq
}


# AutoModelForCausalLM resolves to BloomForCausalLM for this checkpoint
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b", device_map='auto', load_in_8bit=True)
model = prepare_model(model)
model = get_peft_model(model, LoraConfig(**lora_config))
#print(f"Model trainable parameters:\n {print_trainable_parameters(model)}")

print_gpu_utilization()



Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...


  warn(msg)
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)


GPU memory occupied: 4087 MB.
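
The commented-out `print_trainable_parameters(model)` call refers to a helper that is never defined in this notebook. A minimal sketch of such a helper follows. The function names are my own, and it only assumes that the model exposes `parameters()` whose items have `numel()` and `requires_grad`, as PyTorch modules do.

```python
def count_trainable_parameters(model):
    """Return (trainable, total) parameter counts for a PyTorch-style model."""
    trainable = 0
    total = 0
    for param in model.parameters():
        n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
    return trainable, total


def format_trainable_parameters(model):
    """Build a one-line summary of how much of the model is being trained."""
    trainable, total = count_trainable_parameters(model)
    pct = 100.0 * trainable / total if total else 0.0
    return f"trainable params: {trainable:,} || all params: {total:,} || trainable%: {pct:.4f}"
```

In this notebook it would be called as `print(format_trainable_parameters(model))` right after `get_peft_model`. PEFT models also provide their own `print_trainable_parameters()` method, which may be the simpler choice.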


In [9]:

training_args = TrainingArguments(
    output_dir="./bloom-3b-small3-v1", #The output directory
    overwrite_output_dir=True, #overwrite the content of the output directory
    num_train_epochs=50, # number of training epochs
    per_device_train_batch_size=4, # batch size for training
    per_device_eval_batch_size=4,  # batch size for evaluation
    eval_steps = 400, # Number of update steps between two evaluations.
    save_steps=800, # after # steps model is saved
    warmup_steps=500,# number of warmup steps for learning rate scheduler
    prediction_loss_only=True,
    fp16= True,
    )


trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    #eval_dataset=test_dataset,
)
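
`warmup_steps=500` is a large share of this run, which ends at 1200 optimizer steps. Assuming the `Trainer` defaults apply (a linear schedule and a peak learning rate of 5e-5, neither of which is set explicitly in this notebook), the learning rate ramps up for the first 500 steps and then decays linearly to zero. The following is a plain-Python sketch of that shape, not the library's implementation; `linear_warmup_lr` is a hypothetical helper.

```python
def linear_warmup_lr(step, total_steps=1200, warmup_steps=500, peak_lr=5e-5):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 850, 1200):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the model spends roughly 40% of its updates below the peak learning rate on the way up, so a shorter warmup may be worth trying for a dataset this small.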

In [10]:
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Step,Training Loss
500,2.4158
1000,0.7418




TrainOutput(global_step=1200, training_loss=1.3558273824055989, metrics={'train_runtime': 1980.7285, 'train_samples_per_second': 2.348, 'train_steps_per_second': 0.606, 'total_flos': 8446673092608000.0, 'train_loss': 1.3558273824055989, 'epoch': 50.0})
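
The numbers in `TrainOutput` can be cross-checked against the training arguments. The sketch below assumes that `train_samples_per_second` is the total number of samples processed divided by the runtime (my reading of how the `Trainer` reports it), a single GPU, and no gradient accumulation.

```python
import math

# Values copied from the TrainOutput above and from TrainingArguments.
train_runtime = 1980.7285        # seconds
samples_per_second = 2.348
num_epochs = 50
batch_size = 4

# Total samples seen, spread evenly over the epochs.
samples_per_epoch = round(samples_per_second * train_runtime / num_epochs)
steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
total_steps = steps_per_epoch * num_epochs

print(samples_per_epoch, steps_per_epoch, total_steps)
```

The result matches `global_step=1200`, and it implies that the training file yields only about 93 blocks of 128 tokens, so every block is seen 50 times.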


Short details about training:
- Step, Training Loss
- 500, 2.263500
- 1000, 0.205800
- 1500, 0.029900

Took about 6 hours on Colab Free with the TPU runtime.

Stopped at epoch 35 because the training loss was already very low.

1655/2350 7:15:15 < 3:03:00, 0.06 it/s, Epoch 35.19/50]
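
A training loss this low on a causal language model is itself a sign of memorization. The loss is a mean per-token cross-entropy in nats, so `exp(loss)` gives the perplexity, and a value close to 1 means the model is almost certain of every next token in the training text. A small sketch using the loss values listed above (the `perplexity` helper is illustrative only):

```python
import math


def perplexity(mean_cross_entropy):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


for step, loss in [(500, 2.2635), (1000, 0.2058), (1500, 0.0299)]:
    print(f"step {step}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```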

In [13]:
trainer.save_model()

In [15]:
model.config.to_json_file("config.json")

In [None]:
!zip -r bloom-3b-small3-v1.zip bloom-3b-small3-v1/config.json  bloom-3b-small3-v1/training_args.bin  bloom-3b-small3-v1/pytorch_model.bin bloom-3b-small3-v1/generation_config.json


In [43]:
torch.save(model.state_dict(), 'bloom-3b-small3-v1.zip')

In [None]:
!cp bloom-3b-small3-v1.zip ./drive/MyDrive/models

# Test Model

In [38]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): BloomForCausalLM(
      (transformer): BloomModel(
        (word_embeddings): Embedding(250880, 2560)
        (word_embeddings_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (h): ModuleList(
          (0-29): 30 x BloomBlock(
            (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
            (self_attention): BloomAttention(
              (query_key_value): Linear8bitLt(
                in_features=2560, out_features=7680, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=7680, bias=False)
                )
                (lora_embedding_A): Parame
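
The module printout gives enough to estimate how many weights LoRA adds. Each adapted `query_key_value` layer gets a `lora_A` matrix (2560 x 16) and a `lora_B` matrix (16 x 7680), and there are 30 `BloomBlock`s. Assuming `query_key_value` is the only adapted module (the printout is truncated, so this is not confirmed), the count works out as follows:

```python
# Shapes read from the printed module tree; rank and alpha from lora_config.
hidden_size = 2560      # in_features of query_key_value
qkv_out = 7680          # out_features of query_key_value
rank = 16               # "r" in lora_config
lora_alpha = 32
num_blocks = 30         # "(0-29): 30 x BloomBlock"

# lora_A is hidden_size x rank, lora_B is rank x qkv_out, both without bias.
params_per_layer = hidden_size * rank + rank * qkv_out
total_lora_params = params_per_layer * num_blocks

# The LoRA update is scaled by lora_alpha / r before being added to the frozen output.
scaling = lora_alpha / rank

print(params_per_layer, total_lora_params, scaling)
```

That is roughly 4.9 million trainable weights, a small fraction of a 3-billion-parameter model.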

In [None]:
#!cp ./drive/MyDrive/models/bloom-3b-small3-v1.zip . #if you are taking the fine tuned model from drive

In [None]:
#!unzip bloom-3b-small3-v1.zip

Archive:  bloom-560-small3-v1.zip
  inflating: bloom-560-small3-v1/config.json  
  inflating: bloom-560-small3-v1/training_args.bin  
  inflating: bloom-560-small3-v1/pytorch_model.bin  
  inflating: bloom-560-small3-v1/generation_config.json  


In [46]:
from transformers import pipeline

#test = pipeline('text-generation',model='./bloom-3b-small3-v1/', tokenizer='bigscience/bloom-3b')
test = pipeline('text-generation',model=model, tokenizer=tokenizer)

The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBer

In [45]:
with torch.cuda.amp.autocast(cache_enabled=True):
  prompt = "what is bacteria"
  encoded_input = tokenizer(prompt,truncation=True,padding=True, return_tensors='pt')
  test_output_2 = model.generate(input_ids=encoded_input.input_ids,
                  max_new_tokens=100,
                  num_return_sequences=1,
                  early_stopping=True)
  test_answer_2 = tokenizer.decode(test_output_2[0], skip_special_tokens=True)
  print(f"Generated test_answer_1 : {test_answer_2}")




Generated test_answer_1 : what is bacteria called 抵抗 力). 抵抗力第一次被利用来治病，是由法国医生莫奈在1805年首先采用。他注射伤寒杆菌或鼠伤寒杆菌，使患者产生抵抗力，以致后来不必用隔离法，即可治愈伤寒病。这种注射疗法，至今仍在用。此外，烧灼疗法、喷雾疗法、冲洗疗法、敷贴疗法、塞入疗法、佩带口罩疗法、佩带手套疗法、佩带护目


For comparison, the corresponding passage in the training text:
[ An alkaline medium favours bacterial growth; and moisture is a necessary condition; spores, however, can survive the want of water for much longer periods than fully developed bacteria. The necessity for oxygen varies in different species. Those that require oxygen are known as  aërobic bacilli  or  aërobes ; those that cannot live in the presence of oxygen are spoken of as  anaërobes . The great majority of bacteria, however, while they prefer to have oxygen, are able to live without it, and are called  facultative anaërobes

In [49]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('Streptococci are met with in', max_new_tokens=120,num_return_sequences=1)
print(out)



[{'generated_text': 'Streptococci are met with in greater abundance than those of the pneumococcus, the staphylococcus being often the only species present. Streptococci are not so frequently met with in the blood as the pneumococcus, and it is not uncommon to find them in the blood even after the patient has been successfully treated with penicillin. Streptococci are not so easily killed by the strong chaleur, or by the antiseptics, as the pneumococcus or the staphylococcus.  Acid-fast Bacilli  are the only micro-organisms with which the path of disease is absolutely to be determined. They'}]
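
Checking a generation against the training text by eye is tedious, so it can help to measure how much of it is copied verbatim. The sketch below uses `difflib.SequenceMatcher` from the standard library to find the longest shared character span. `longest_shared_span` is a hypothetical helper, and the two strings are shortened excerpts of this output and of the matching training passage.

```python
from difflib import SequenceMatcher


def longest_shared_span(generated, passage):
    """Return the longest substring that appears in both texts."""
    matcher = SequenceMatcher(None, generated, passage, autojunk=False)
    match = matcher.find_longest_match(0, len(generated), 0, len(passage))
    return generated[match.a:match.a + match.size]


generated = "Streptococci are met with in greater abundance than those of the pneumococcus"
passage = ("Streptococci are met with in erysipelas and various other "
           "inflammatory and suppurative processes")

span = longest_shared_span(generated, passage)
# Fraction of the generation covered by the single longest copied span.
print(repr(span), round(len(span) / len(generated), 2))
```

Here the shared span is essentially the prompt itself, which suggests the continuation was not recalled from this particular passage.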


The initial part is exactly as in the passage [Streptococci are met with in erysipelas and various other inflammatory and suppurative processes of a spreading character.  Bacilli  are rod-shaped bacteria, usually at least twice as long as they are broad (Fig. 4). Some multiply by fission, others by sporulation. Some forms are motile, others are non-motile.]

The last line is from the passage [Tuberculosis, tetanus, anthrax, and many other surgical diseases are due to different forms of bacilli.  Spirilla  are long, slender, thread-like cells, more or less spiral or wavy.]

The last generated line is not correct.

"Some forms are motile only in virtue of the contractility of the protoplasm, "

appears in another place where "motile" is referred to,

whereas the following is pure hallucination:

"some in virtue of the fibroblasts which they carry. Others are'}]"

In [50]:
with torch.cuda.amp.autocast(cache_enabled=True): # else RuntimeError: expected scalar type Half but found Float
  out =test('Streptococci', max_new_tokens=100,num_return_sequences=1)
print(out)



[{'generated_text': 'Streptococci, Streptococcus pyogenes, being the most important. It is only necessary to introduce a small amount of culture in the more recent stages of erysipelas, to the exclusion of all power of recovery; and it is not improbable that the patient may be rendered tolerant of the organisms that are obtained.  Hygienic considerations.  It is impossible to ensure aseptic handling of diseased tissue or of vital substances, and it is therefore important that the instruments should not only be'}]


In [None]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out =test('Metchnikoff', max_new_tokens=100,num_return_sequences=1)
print(out)

[{'generated_text': "Metchnikoff's disease, diabetes, syphilis, scurvy, or alcoholism, also impedes healing. Infection by disease-producing micro-organisms or  pathogenic bacteria  is, however, the most potent factor in disturbing the natural process of repair in wounds.\n\nSURGICAL BACTERIOLOGY The influence of micro-organisms in the causation of disease, and the rôle played by them in interfering with the natural process of repair, are so important that the science of applied"}]

In [52]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out =test('To this process Metchnikoff', max_new_tokens=100,num_return_sequences=1)
print(out)



[{'generated_text': 'To this process Metchnikoff ascribe suturing powers, which he believes to be universal, as well as that the operators should be free from all infection of the disease to which the wounds are subjected. For the treatment of sepsis the precautions required in suturing viva voce organisms should be gone to, are much the same as in suturing pure culture.  First-aid ضماده  or  bandage ضماده is absolutely necessary when the patient is in the acute stages of disease, and when the wounds'}]


In [53]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('phagocytosis', max_new_tokens=100,num_return_sequences=1)
print(out)

[{'generated_text': 'phagocytosis, and in the production of  hydroxypenicillin  from the bacteria. b. By virtue of its being affected by the vital changes in the bacteria, the bacterium is no longer capable of producing its usual products. Thus, when a pyogenic or a toxenic bacillus is affected, such as producing less acid than usual, or when the environment of the cells is altered, such as becoming more moist, the vital properties of the bacteria diminish, and they become susceptible to'}]


In [54]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out =test('During the process of phagocytosis,', max_new_tokens=100,num_return_sequences=1)
print(out)



[{'generated_text': 'During the process of phagocytosis, the granulation tissue formed around the edges of the wound acts as a scaffolding to hold the edges together, and in a few days the scaffolding is replaced by granulation tissue, which is characterised by the presence of fibrin in the vessels of the scaffolding and by the presence of leucocytes in the vessels of the new circulation. In the large vessels of the scaffolding the fibrin is coagulated by the heat generated by the bacteria, which are thus killed, and the products of'}]


In [55]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out =test(' diplococci ', max_new_tokens=100,num_return_sequences=1)
print(out)

[{'generated_text': ' diplococci   streptococci   pyogènes   toxines   bacilli de la tuberculose  etc。  b l l i b l l i s s t e r s  s t e r i n s  i n t e r n e s  i n t e r n e r  i n t e r n e r i n s   Bulle  Bulle liquide  ou  liquide   ou  liquide   ou  sérum   ou'}]


In [57]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('Cocci  or  micrococci', max_new_tokens=100,num_return_sequences=1)
print(out)



[{'generated_text': 'Cocci  or  micrococci.  The production of gas by the bacteria may cause them to appear in the surface as papillae, as  staphylococci, for example, when the bacteria produce gas by fermentation the acid produced oxidises the tissue around it and causes a darker area to appear: these areas are called  cocci, or  acid bacteria, and are usually associated with more general changes in the diseased tissue: see  acid reaction (diathesis) and  cocci reaction (diagnosis'}]


In [58]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out =test('Bacteria are most conveniently', max_new_tokens=100,num_return_sequences=1)
print(out)

[{'generated_text': 'Bacteria are most conveniently cultivated in media made up of glucose as the main ingredient. They require no supplement other than a moderately acid medium, and the atmosphere must be free from all animal life for a period of from 14 to 21 days. The most common bacteria cultivated are described in the next chapter.  Bacteria  of Malignant Character  are characterised by a high degree of metaplasia of the bacteria. The metaplasia is a change in the nature of the bacteria, the original'}]


In [59]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('given the context "Thus we recognise (1) those that are globular  cocci ; (2) those that resemble a rod  bacilli ; (3) the spiral or wavy forms  spirilla .  Cocci  or  micrococci  are minute round bodies, averaging about 1 µ in diameter. The great majority are non-motile. They multiply by fission; and when they divide in such a way that the resulting cells remain in pairs, are called  diplococci , of which the bacteria of gonorrhœa and pneumonia are examples (Fig. 5). When they divide irregularly, and form grape-like bunches, they are known as  staphylococci , and to this variety the commonest pyogenic or pus-forming organisms belong' +
  'answer "What are Cocci  or  micrococci', max_new_tokens=100,num_return_sequences=1)
print(out)

[{'generated_text': 'given the context "Thus we recognise (1) those that are globular  cocci ; (2) those that resemble a rod  bacilli ; (3) the spiral or wavy forms  spirilla .  Cocci  or  micrococci  are minute round bodies, averaging about 1 µ in diameter. The great majority are non-motile. They multiply by fission; and when they divide in such a way that the resulting cells remain in pairs, are called  diplococci , of which the bacteria of gonorrhœa and pneumonia are examples (Fig. 5). When they divide irregularly, and form grape-like bunches, they are known as  staphylococci , and to this variety the commonest pyogenic or pus-forming organisms belonganswer "What are Cocci  or  micrococci?" and "Spiral or Wavy Forms?" below (p. 57).  The diplococci of the skin and of other surfaces are drawn into groups into which the name  cicatricial  is applied, as being used in a reparative process. In the eye the diplococci drawn from the retina are a source of the disease-trouble which they br
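
In this last prompt the two string literals are joined with `+` and no separator, which is why the output contains `belonganswer`, and the quote opened before the context is never closed. A small helper that assembles the prompt with explicit delimiters avoids both problems. This is only a sketch: `build_prompt` and its template are assumptions, not a format the model was trained on.

```python
def build_prompt(context, question):
    """Join context and question with explicit quotes and line breaks."""
    context = context.strip()
    question = question.strip()
    return f'given the context "{context}"\nanswer "{question}"\n'


prompt = build_prompt(
    "Cocci  or  micrococci  are minute round bodies, averaging about 1 µ in diameter.",
    "What are Cocci  or  micrococci?",
)
print(prompt)
```

The resulting string can be passed to `test(...)` in place of the concatenated literals. Whether the fine-tuned model answers better with it is something to verify, since it was trained only on raw passage text.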