<a href="https://colab.research.google.com/github/alexcpn/tranformer_learn/blob/main/bloom_3b_quant_overfitting_train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers==4.28.1
!pip install accelerate
!pip install bitsandbytes
!pip install peft
!pip install pynvml

In [2]:
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo
import torch

def print_gpu_utilization():
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    info = nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory occupied: {info.used//1024**2} MB.")


def print_summary(result):
    print(f"Time: {result.metrics['train_runtime']:.2f}")
    print(f"Samples/second: {result.metrics['train_samples_per_second']:.2f}")
    print_gpu_utilization()

torch.ones((1, 1)).to("cuda")
print_gpu_utilization()


GPU memory occupied: 363 MB.


In [3]:
# Download the training text into the Colab environment
!wget https://raw.githubusercontent.com/alexcpn/tranformer_learn/main/data/small_3.txt
#!wget https://gist.githubusercontent.com/alexcpn/54e88130f9d186494f1c3ce5e83263b4/raw/7cdf5f93b819024c58a891fc808fbdbe052d0eb1/small_3_mixed.txt
train_path = 'small_3.txt'

--2023-06-20 09:18:22--  https://raw.githubusercontent.com/alexcpn/tranformer_learn/main/data/small_3.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56513 (55K) [text/plain]
Saving to: ‘small_3.txt’


2023-06-20 09:18:22 (8.26 MB/s) - ‘small_3.txt’ saved [56513/56513]



In [4]:
from transformers import TextDataset,DataCollatorForLanguageModeling
from transformers import AutoTokenizer

def load_dataset(path,tokenizer):
    dataset = TextDataset(
          tokenizer=tokenizer,
          file_path=path,
          block_size=128)

    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=False,
    )
    return dataset,data_collator

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")
train_dataset,data_collator = load_dataset(train_path,tokenizer)
print_gpu_utilization()

Downloading (…)okenizer_config.json:   0%|          | 0.00/222 [00:00<?, ?B/s]

Downloading tokenizer.json:   0%|          | 0.00/14.5M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/85.0 [00:00<?, ?B/s]

GPU memory occupied: 363 MB.
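
`TextDataset` tokenizes the whole file and cuts the resulting token stream into consecutive blocks, so with `block_size=128` every training example is 128 token ids and a trailing remainder shorter than one block is dropped. With `mlm=False`, the collator then uses a copy of the input ids as labels. A minimal sketch of that chunking in plain Python, assuming the tokenizer adds no special tokens; `chunk_into_blocks` and the toy ids are illustrative names, not part of `transformers`:

```python
def chunk_into_blocks(token_ids, block_size):
    """Split a flat list of token ids into consecutive full blocks.

    A trailing remainder shorter than block_size is dropped (no padding).
    """
    blocks = []
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        blocks.append(token_ids[start:start + block_size])
    return blocks


# Toy example: 10 "token ids" and a block size of 4 (the notebook uses 128)
toy_ids = list(range(10))
blocks = chunk_into_blocks(toy_ids, block_size=4)

# For causal language modelling the labels are a copy of the inputs
labels = [list(block) for block in blocks]

print(blocks)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The last two ids (8 and 9) do not fill a block and are discarded, which is harmless for a 55K text file but worth knowing for very short inputs.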




In [5]:
# for quantised loading
from torch import float32, nn, exp

class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).to(float32)


def prepare_model(model):
    for param in model.parameters():
      param.requires_grad = False  # freeze the model - train adapters later
      if param.ndim == 1:
        # cast the small parameters (e.g. layernorm) to fp32 for stability
        param.data = param.data.to(float32)
    model.gradient_checkpointing_enable()  # reduce number of stored activations
    model.enable_input_require_grads()
    model.lm_head = CastOutputToFloat(model.lm_head)
    return model

In [6]:
from transformers import Trainer, TrainingArguments, AutoModelForCausalLM
from peft import LoraConfig, PeftModel, PeftConfig, get_peft_model
import bitsandbytes as bnb

lora_config = {
    "r": 16,  # rank of the low-rank update matrices
    "lora_alpha": 32,  # alpha scaling: the LoRA update is scaled by lora_alpha / r
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",  # set this for CLM or Seq2Seq
}


model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b", device_map='auto', load_in_8bit=True)
model = prepare_model(model)
model = get_peft_model(model, LoraConfig(**lora_config))
#model.print_trainable_parameters()

print_gpu_utilization()



Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...


Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)


Downloading (…)lve/main/config.json:   0%|          | 0.00/693 [00:00<?, ?B/s]



Downloading pytorch_model.bin:   0%|          | 0.00/6.01G [00:00<?, ?B/s]

GPU memory occupied: 4087 MB.
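
The jump from 363 MB to 4087 MB is roughly what a back-of-the-envelope estimate predicts. The sketch below is only an approximation: the parameter count is an assumed round 3 billion (the notebook never prints the exact figure), `weight_mib` is an illustrative helper, and activations, the CUDA context and quantization metadata are ignored. It also assumes that the embedding matrix, `Embedding(250880, 2560)` in the model printout further down, stays in 16-bit because 8-bit loading replaces only linear layers.

```python
MIB = 1024 ** 2


def weight_mib(n_params, bytes_per_param):
    """Memory needed to hold n_params weights, in MiB."""
    return n_params * bytes_per_param / MIB


n_params = 3_000_000_000          # assumption: round figure for bloom-3b
embedding_params = 250_880 * 2_560  # from the printed Embedding(250880, 2560)

for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"all weights in {name}: {weight_mib(n_params, nbytes):,.0f} MiB")

# Assumed mixed layout: embeddings in 16-bit, everything else in 8-bit
mixed = weight_mib(n_params - embedding_params, 1) + weight_mib(embedding_params, 2)
print(f"int8 weights + fp16 embeddings: {mixed:,.0f} MiB")
```

Under these assumptions the mixed estimate lands at roughly 3.4 GiB, which together with the 363 MB baseline is in the same range as the measured value. The fp16 figure is also consistent with the 6.01G checkpoint download shown above.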


In [9]:

training_args = TrainingArguments(
    output_dir="./bloom-3b-small3-v1",  # output directory
    overwrite_output_dir=True,  # overwrite the contents of the output directory
    num_train_epochs=250,  # number of training epochs
    per_device_train_batch_size=4,  # batch size for training
    per_device_eval_batch_size=4,  # batch size for evaluation
    eval_steps=400,  # update steps between evaluations (unused here: no eval dataset is passed)
    save_steps=1000,  # save a checkpoint every 1000 update steps
    warmup_steps=500,  # warmup steps for the learning rate scheduler
    prediction_loss_only=True,
    fp16=True,
    )


trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    #eval_dataset=test_dataset,
)
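
Since `TrainingArguments` is given `warmup_steps=500` and no explicit learning rate or scheduler, the `Trainer` falls back to its defaults: a base learning rate of 5e-5 with linear warmup followed by linear decay. A minimal sketch of that schedule in plain Python; `linear_schedule_lr` is an illustrative helper, and `total_steps=4000` is a hypothetical value because the real total depends on the dataset size:

```python
def linear_schedule_lr(step, base_lr=5e-5, warmup_steps=500, total_steps=4000):
    """Learning rate at a given update step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


for step in (0, 250, 500, 2250, 4000):
    print(f"step {step:>4}: lr = {linear_schedule_lr(step):.2e}")
```

The learning rate peaks exactly at the end of warmup, which is also the first step at which the training loss is logged below.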

In [10]:
trainer.train()

Step,Training Loss
500,2.4118
1000,0.7262
1500,0.1568
2000,0.0668
2500,0.0297
3000,0.0125
3500,0.0087
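
The training loss is a mean cross-entropy in nats, so `exp(loss)` gives the perplexity on the training text. A small sketch that converts the logged values; the list simply repeats the table above:

```python
import math

# (step, training loss) pairs copied from the log above
logged = [
    (500, 2.4118),
    (1000, 0.7262),
    (1500, 0.1568),
    (2000, 0.0668),
    (2500, 0.0297),
    (3000, 0.0125),
    (3500, 0.0087),
]

perplexities = [math.exp(loss) for _, loss in logged]
for (step, loss), ppl in zip(logged, perplexities):
    print(f"step {step:>4}: loss {loss:.4f} -> perplexity {ppl:.3f}")
```

A perplexity this close to 1 means the model predicts almost every next token of the training file with near certainty, which suggests it has largely memorized the text rather than generalized from it.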




In [11]:
trainer.save_model()

In [12]:
model.config.to_json_file("./bloom-3b-small3-v1/config.json")

In [13]:
!zip -r bloom-3b-small3-v1.zip bloom-3b-small3-v1/config.json  bloom-3b-small3-v1/training_args.bin  bloom-3b-small3-v1/pytorch_model.bin bloom-3b-small3-v1/generation_config.json


  adding: bloom-3b-small3-v1/config.json (deflated 50%)
  adding: bloom-3b-small3-v1/training_args.bin (deflated 48%)
  adding: bloom-3b-small3-v1/pytorch_model.bin (deflated 11%)


In [14]:
!cp bloom-3b-small3-v1.zip ./drive/MyDrive/models

In [None]:
# Save the raw state dict under its own name so it does not overwrite the zip archive
torch.save(model.state_dict(), 'bloom-3b-small3-v1.pt')

# Test Model

In [15]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): BloomForCausalLM(
      (transformer): BloomModel(
        (word_embeddings): Embedding(250880, 2560)
        (word_embeddings_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (h): ModuleList(
          (0-29): 30 x BloomBlock(
            (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
            (self_attention): BloomAttention(
              (query_key_value): Linear8bitLt(
                in_features=2560, out_features=7680, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=7680, bias=False)
                )
                (lora_embedding_A): Parame
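
The printed module shapes are enough to estimate how many parameters LoRA actually trains. Each adapted layer adds a `lora_A` matrix (2560 to 16) and a `lora_B` matrix (16 to 7680), and the update is scaled by `lora_alpha / r`. The sketch below assumes that `query_key_value` is the only adapted module in each of the 30 blocks, which the truncated printout cannot confirm; `lora_param_count` is an illustrative helper:

```python
def lora_param_count(in_features, out_features, rank):
    """Parameters added by one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


r, alpha = 16, 32            # from lora_config
n_blocks = 30                # from "(0-29): 30 x BloomBlock"
in_features, out_features = 2560, 7680  # query_key_value shape

per_layer = lora_param_count(in_features, out_features, r)
total = n_blocks * per_layer
scaling = alpha / r

print(f"per layer: {per_layer:,}")   # per layer: 163,840
print(f"total:     {total:,}")       # total:     4,915,200
print(f"scaling:   {scaling}")       # scaling:   2.0
```

Under that assumption, about 4.9 million parameters are trainable, a small fraction of the frozen base model.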

In [16]:
#!cp ./drive/MyDrive/models/bloom-3b-small3-v1.zip . #if you are taking the fine tuned model from drive

In [17]:
#!unzip bloom-3b-small3-v1.zip

In [18]:
from transformers import pipeline

#test = pipeline('text-generation',model='./bloom-3b-small3-v1/', tokenizer='bigscience/bloom-3b')
test = pipeline('text-generation',model=model, tokenizer=tokenizer)

The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBer

In [19]:
with torch.cuda.amp.autocast(cache_enabled=True):
  prompt = "what is bacteria"
  # keep the inputs on the same device as the model
  encoded_input = tokenizer(prompt, truncation=True, padding=True, return_tensors='pt').to(device)
  test_output_2 = model.generate(input_ids=encoded_input.input_ids,
                  max_new_tokens=100,
                  num_return_sequences=1,
                  early_stopping=True)
  test_answer_2 = tokenizer.decode(test_output_2[0], skip_special_tokens=True)
  print(f"Generated test_answer_1 : {test_answer_2}")


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Generated test_answer_1 : what is bacteria) is said to be metaplastic tissue. In some cases, such as the course of amputation in the hand, the tissue does not proliferate but simply converts into general connective tissue; in others, such as the tissue around the blood vessels, the body organs may be so influenced that they are transformed into blood-vessel tissue. After the tissues have been subjected to the influence of bacteria, they are capable of being converted into pathogenic tissue if the life of the bacteria is not destroyed
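
The `generate` call above passes no sampling or beam-search arguments, so, assuming the model's generation config keeps the library defaults, decoding is greedy: at every step the single highest-scoring token is appended (`early_stopping` only matters for beam search). A toy sketch of that loop in plain Python; `greedy_decode` and `toy_scores` are hypothetical stand-ins, not the `transformers` implementation:

```python
def greedy_decode(next_token_scores, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly append the highest-scoring next token id."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break  # stop as soon as the end-of-sequence token is produced
    return ids


def toy_scores(ids):
    """Hypothetical 'model' over a 5-token vocabulary: prefers (last id + 1) mod 5."""
    vocab_size = 5
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if token == target else 0.0 for token in range(vocab_size)]


print(greedy_decode(toy_scores, [0], max_new_tokens=6, eos_id=4))  # [0, 1, 2, 3, 4]
print(greedy_decode(toy_scores, [0], max_new_tokens=6))            # [0, 1, 2, 3, 4, 0, 1]
```

Greedy decoding is deterministic, so a model that has memorized its training text tends to reproduce long, fluent passages in the style of that text, as in the output above.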


In [20]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('Streptococci are met with in', max_new_tokens=120,num_return_sequences=1)
print(out)

[{'generated_text': "Streptococci are met with in great abundance, of which the most common and most suitable for isolation are the streptococci, are also met with in low numbers. In certain diseases, such as chronic fevers and various forms of acute inflammation, the individual's body defends himself against his disease-ayds by increasing his number of bacteria, especially of streptococci, so that they predominate over those organisms, such as bacteria of the pneumonic, peritoneal, or blood-stream, that are met with. This method of defense may continue for many days, at which time the number of streptococci"}]


In [21]:
with torch.cuda.amp.autocast(cache_enabled=True): # else RuntimeError: expected scalar type Half but found Float
  out = test('Streptococci', max_new_tokens=100, num_return_sequences=1)
print(out)

[{'generated_text': 'Streptococci, Glycoproteins, Bacteriophages, Glycophorins, Chlamydiae, etc. Any of these organisms may be the source of the disease, and their presence may be detected by the appearance of the symptoms after the patient is brought into contact with them, or by the observation of the symptoms when they are identified. It is also possible to detect the presence of other organisms after the usual tests have been performed.  Detection by Antibodies.  The antibodies produced by'}]


In [22]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('Metchnikoff', max_new_tokens=100, num_return_sequences=1)
print(out)

[{'generated_text': "Metchnikoff's method is the most popular ; it consists in injecting an oily fluid such as oil of vit. es and hydrocolloid fluid such as human blood serum into the tissues of a wound, and returninging them to health by bringing them to an accurate knowledge. The fluid is removed at the time of injection. The method is employed with success on certain pyogenic infections, such as infection by staphylococci, for example  oil of vit. es  or  human blood serum. "}]


In [23]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('To this process Metchnikoff', max_new_tokens=100, num_return_sequences=1)
print(out)

[{'generated_text': 'To this process Metchnikoff has given the name  scientific death. The term  natural death  is applied to the state natural death appears to be only a modified form of meta- or back- death.  The Process of Pneumonia and Other Bacterial Infected Death.  Death due to Overt Infection with the Pneumococcus and by the Blood-Self.  Degradation in the Human Body of the Pneumospasms produced by the Pyogenic Fungi.  By the Metabolic Process.  In the'}]


In [24]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('phagocytosis', max_new_tokens=100,num_return_sequences=1)
print(out)

[{'generated_text': 'phagocytosis, by which the bacteria are broken up and can be drawn off in the filtrate. Other methods involve the exposure of the bacteria to strong chemical agents, such as phagocytosis with human neutrophils  see above, or to thermal action, the temperature at which it is required to kill a certain number of bacteria being determined. In the phagocytosis with the neutrophils, the heat needed is between 55° and 60°. Other methods include the action of chemical'}]


In [25]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('During the process of phagocytosis,', max_new_tokens=100, num_return_sequences=1)
print(out)

[{'generated_text': 'During the process of phagocytosis, the membrane of the polymorphonuclear leucocyte increases in thickness until it becomes granular, brown, and even cobweb-like. In the later stages of the process, the membrane becomes incrusted with a thin, white, fluid layer, the  fluid layer. The  fluid layer  appears in the course of the phagocytosis after the first moments of clotting, and continues to flow in after the red blood cells have been lysed. The  fluid layer  is removed by'}]


In [26]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test(' diplococci ', max_new_tokens=100, num_return_sequences=1)
print(out)

[{'generated_text': ' diplococci  in  in  in  in  virescents  lesions.  Bacteria.  Archesites  are formed by the union of septic tissue with a hard surface. When a black, smooth, metachromatic, or blueish tissue is formed by the elements of blood culture on a hard surface, the surface is found to have been occupied by bacteria. The elements are found to be multiplying, or, more strictly speaking, undergoing certain physiological changes, such as are sufficient to'}]


In [27]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('Cocci  or  micrococci', max_new_tokens=100,num_return_sequences=1)
print(out)

[{'generated_text': 'Cocci  or  micrococci. The most important parts of the immune system are shown in the next page; they are also known as  vaccine tissues  and  vaccine organs. They are developed in the ovary and the kidney; when these are transplanted into the body and used as grafts, are at first loaded with bacteria in the way described above, and are therefore infected. When the organism is taken, the process is complete; the patient is treated; and when the grafts have been established, they are'}]


In [28]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('Bacteria are most conveniently', max_new_tokens=100, num_return_sequences=1)
print(out)

[{'generated_text': 'Bacteria are most conveniently classified as  pure cultures, when they are separated from the specific medium in which they are kept apart, and as  culture mixtures or broths, when they are distributed throughout the medium. The presence of bacteria in so far diffused a condition as to render them almost uniform in shape, colour, and motility, is an indication of the primacy of a bacterium. Any one of the specific bacteria, however pure, can be used as a  culture material. The most useful are those'}]


In [29]:
with torch.cuda.amp.autocast(cache_enabled=True):
  out = test('given the context "Thus we recognise (1) those that are globular  cocci ; (2) those that resemble a rod  bacilli ; (3) the spiral or wavy forms  spirilla .  Cocci  or  micrococci  are minute round bodies, averaging about 1 µ in diameter. The great majority are non-motile. They multiply by fission; and when they divide in such a way that the resulting cells remain in pairs, are called  diplococci , of which the bacteria of gonorrhœa and pneumonia are examples (Fig. 5). When they divide irregularly, and form grape-like bunches, they are known as  staphylococci , and to this variety the commonest pyogenic or pus-forming organisms belong' +
  'answer "What are Cocci  or  micrococci', max_new_tokens=100,num_return_sequences=1)
print(out)

[{'generated_text': 'given the context "Thus we recognise (1) those that are globular  cocci ; (2) those that resemble a rod  bacilli ; (3) the spiral or wavy forms  spirilla .  Cocci  or  micrococci  are minute round bodies, averaging about 1 µ in diameter. The great majority are non-motile. They multiply by fission; and when they divide in such a way that the resulting cells remain in pairs, are called  diplococci , of which the bacteria of gonorrhœa and pneumonia are examples (Fig. 5). When they divide irregularly, and form grape-like bunches, they are known as  staphylococci , and to this variety the commonest pyogenic or pus-forming organisms belonganswer "What are Cocci  or  micrococci?"  bacilli  or  staph  bacteria are round or elliptical bodies, of a uniform dark brown colour. They are motile. When they divide by fission, the resulting cells remain in pairs and are called  staphlava  or  staphlea. When they remain grouped: in lines or rows, they are known as  lines of growth  