## Checking Colab Configuration

In [1]:
!nvidia-smi

Sat Sep  2 12:14:52 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

In [2]:
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-90f70a34-61d0-4eba-bc3b-a2a4cea8263f)
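The same check can be scripted, which helps when a later cell should stop early on a CPU-only runtime. The sketch below is a minimal example, assuming `nvidia-smi` is on the `PATH`; the helper names `parse_gpu_line` and `list_gpus` are hypothetical and not part of the original notebook.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Parse one CSV line such as 'Tesla T4, 15360 MiB' into (name, MiB)."""
    name, memory = line.rsplit(",", 1)
    return name.strip(), int(memory.split()[0])


def list_gpus():
    """Return [(name, total_memory_mib), ...], or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


print(list_gpus() or "No GPU detected")
```

On a runtime without a GPU the function returns an empty list instead of raising, so the caller can decide how to react.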


## 1. Downloading Necessary Libraries

In [3]:
!pip install -Uqqq pip
!pip install -qqq bitsandbytes==0.39.0 # For QLORA
!pip install -qqq torch==2.0.1
!pip install -qqq -U git+https://github.com/huggingface/transformers.git@e03a9cc
!pip install -qqq -U git+https://github.com/huggingface/peft.git@42a184f # Parameter efficient Fine Tuning
!pip install -qqq -U git+https://github.com/huggingface/accelerate.git@c9fbb71
!pip install -qqq datasets==2.12.0
!pip install -qqq loralib==0.1.1 # Low Rank Adapter
!pip install -qqq einops==0.6.1
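Because several packages are installed from pinned commits, it can be worth confirming what actually ended up in the environment before importing anything. The following is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is copied from the install cell above.

```python
from importlib import metadata

# Distribution names taken from the install cell above.
PACKAGES = ["bitsandbytes", "torch", "transformers", "peft",
            "accelerate", "datasets", "loralib", "einops"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<14} {version or 'NOT INSTALLED'}")
```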


## 2. Importing packages

In [4]:
import json
import os
from pprint import pprint
import bitsandbytes as bnb
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset
from huggingface_hub import notebook_login

# Peft
from peft import (
    LoraConfig,
    PeftConfig,
    PeftModel,
    get_peft_model,
    prepare_model_for_kbit_training
)

# Transformers
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig
)

os.environ["CUDA_VISIBLE_DEVICES"] = "0"


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...


Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.


## 3. HuggingFace Notebook CLI Login

In [5]:
notebook_login()


## 4. Load Falcon-7b Model and Tokenizer

### *In this section we will load a sharded checkpoint of the [Falcon-7B-Instruct model](https://huggingface.co/vilsonrodrigues/falcon-7b-instruct-sharded) and quantize it to 4-bit. The LoRA adapters are attached later, in Section 7. Let's get started!*

In [6]:
MODEL_NAME = "vilsonrodrigues/falcon-7b-instruct-sharded"

# QLoRA setup: load the weights in 4-bit NF4 with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
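    # Note: bfloat16 is natively supported only from Ampere (compute capability 8.0) onward;
    # on the T4 used here (7.5), torch.float16 may be the safer compute dtype.
    bnb_4bit_compute_dtype=torch.bfloat16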
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

Loading checkpoint shards:   0%|          | 0/15 [00:00<?, ?it/s]

Some weights of FalconForCausalLM were not initialized from the model checkpoint at vilsonrodrigues/falcon-7b-instruct-sharded and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


#### *We are going to train a 4-bit quantized version of Falcon. In the end we will have LoRA adapters, trained on our dataset, sitting on top of the frozen base Falcon model.*
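To see why 4-bit loading matters on this GPU, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. It is only a sketch: the parameter count of roughly 7 billion is an assumption based on the model name, and the estimate ignores quantization constants, activations, gradients and optimizer state.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


NUM_PARAMS = 7e9        # assumption: roughly 7 billion parameters
T4_MEMORY_GIB = 15.0    # 15360 MiB, from the nvidia-smi output

for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    gib = weight_memory_gib(NUM_PARAMS, bits)
    print(f"{label}: {gib:.1f} GiB of {T4_MEMORY_GIB:.0f} GiB")
```

Under these assumptions the 16-bit weights take about 13 GiB, which leaves almost no headroom on a 15 GiB T4, while the 4-bit weights take a little over 3 GiB.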

## 5. Test the pretrained Model on our use-case

In [7]:
prompt = """
<human>: midjourney prompt for a girl sit on the mountain
<assistant>:
""".strip()

In [43]:
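# Generation settings for Falcon
# (temperature and top_p only take effect when sampling is enabled, i.e. do_sample=True)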
generation_config = model.generation_config
generation_config.max_new_tokens = 200
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
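To make `temperature` and `top_p` less abstract, the sketch below shows in plain Python how they reshape a next-token distribution when sampling is enabled. It illustrates the general technique (temperature scaling followed by nucleus filtering) and is not the code that `transformers` runs internally; the function name `top_p_filter` and the toy logits are hypothetical.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.7):
    """Return {token_index: probability} for the tokens kept by nucleus sampling."""
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


toy_logits = [2.0, 1.0, 0.0]
print(top_p_filter(toy_logits, temperature=1.0, top_p=0.7))
print(top_p_filter(toy_logits, temperature=0.7, top_p=0.7))
```

With these toy logits, lowering the temperature from 1.0 to 0.7 concentrates enough probability on the top token that it alone satisfies `top_p=0.7`, so the candidate set shrinks from two tokens to one.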

In [None]:
%%time
device = "cuda:0"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
  outputs = model.generate(
      input_ids = encoding.input_ids,
      attention_mask = encoding.attention_mask,
      generation_config = generation_config
  )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

#### We can see the results are not what we wanted

## 6. Prepare the dataset we created

In [11]:
data = load_dataset("csv", data_files="/content/midjourney training dataset - midjourney_prompt_dataset.csv")



  0%|          | 0/1 [00:00<?, ?it/s]

In [12]:
data

DatasetDict({
    train: Dataset({
        features: ['User', 'Prompt'],
        num_rows: 288
    })
})

#### *Some example user requests and their Midjourney-style prompts*

In [13]:
data['train'][5]

{'User': '"midjourney prompt for a Melbourne tram in Avatar movie style with blue and green color palette"',
 'Prompt': '"[Melbourne\'s Flinders Street] by [Avatar Movie] art style::20 bubbles::10 aquarium scene::10 lights::10 One Melbourne Tram::15 blue and green color palette::10 ultra-detail, wide-angle, sharp look, high detail, epic lighting, vivid light refractions, photorealistic, ultra-realistic, photo-realistic, fit in the screen, rule of thirds::8 —ar 16:9"'}

In [14]:
data['train'][10]

{'User': '"midjourney prompt for a playful kitten in a garden"',
 'Prompt': '"A cute little kitten by Joy Ang"'}

In [15]:
data['train'][15]

{'User': '"midjourney prompt for a Buddhist mandala with mushroom elements"',
 'Prompt': '"Buddhist mandala in the style of a mushroom spore print. Rich colors, deeply symbolic, arresting beauty, the key to the future of life on Earth. Highly intricate and very detailed 12K, in the style of Katsuhiro Otomo after they have rested --ar 12:16 —test"'}

In [16]:
# Format one dataset row as a <human>/<assistant> training example
def generate_prompt(data_point):
  return f"""
<human>: {data_point["User"]}
<assistant>: {data_point["Prompt"]}
""".strip()

# Tokenize prompt
def generate_and_tokenize_prompt(data_point):
  full_prompt = generate_prompt(data_point)
  tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
  return tokenized_full_prompt
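It helps to look at the exact text the model will be trained on. The sketch below repeats `generate_prompt` so it runs on its own and applies it to one of the rows shown earlier. Note that the CSV fields carry literal double quotes, which end up in the training text; `strip_outer_quotes` is a hypothetical cleanup helper, and whether to apply it is a judgment call rather than something the original notebook does.

```python
def generate_prompt(data_point):
    return f"""
<human>: {data_point["User"]}
<assistant>: {data_point["Prompt"]}
""".strip()


def strip_outer_quotes(data_point):
    """Hypothetical cleanup: drop the literal double quotes stored in the CSV fields."""
    return {key: value.strip('"') for key, value in data_point.items()}


row = {
    "User": '"midjourney prompt for a playful kitten in a garden"',
    "Prompt": '"A cute little kitten by Joy Ang"',
}

print(generate_prompt(row))
print("---")
print(generate_prompt(strip_outer_quotes(row)))
```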

In [17]:
data = data["train"].shuffle().map(generate_and_tokenize_prompt)

Map:   0%|          | 0/288 [00:00<?, ? examples/s]

In [18]:
data

Dataset({
    features: ['User', 'Prompt', 'input_ids', 'attention_mask'],
    num_rows: 288
})

## 7. Fine-tuning using PEFT

In [19]:
def print_trainable_parameters(model):
  """
  Prints the number of trainable parameters in the model.
  """
  trainable_params = 0
  all_param = 0
  for _, param in model.named_parameters():
    all_param += param.numel()
    if param.requires_grad:
      trainable_params += param.numel()
  print(
      f"trainable params: {trainable_params} || all params: {all_param} || trainables%: {100 * trainable_params / all_param}"
  )

In [20]:
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

#### *Peft Config*

Below we define the configuration used to create the LoRA model. According to the QLoRA paper, it is important to consider all linear layers in the transformer block for maximum performance. Therefore we add the `dense`, `dense_h_to_4h` and `dense_4h_to_h` layers to the target modules, in addition to the fused `query_key_value` layer.
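As a reminder of what a LoRA adapter does to each targeted linear layer, here is a toy sketch in plain Python. The frozen weight `W` is left untouched and a low-rank update `B @ A`, scaled by `lora_alpha / r`, is added to its output. The tiny matrices and the helper names are hypothetical, and dropout is omitted for brevity.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, r):
    """Compute W x + (lora_alpha / r) * B (A x) for a single input vector x."""
    scaling = lora_alpha / r
    base = matvec(W, x)                 # frozen pretrained layer
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    return [b + scaling * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # frozen weight, 2 x 3
A = [[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]   # r x in_features, here r = 2
x = [1.0, 2.0, 3.0]

# B starts at zero, so the adapted layer initially behaves exactly like the base layer.
B_init = [[0.0, 0.0], [0.0, 0.0]]
print(lora_forward(W, A, B_init, x, lora_alpha=4, r=2))

# After training, B is non-zero and shifts the output.
B_trained = [[0.5, 0.0], [0.0, 1.0]]
print(lora_forward(W, A, B_trained, x, lora_alpha=4, r=2))
```

The toy example uses `lora_alpha=4` and `r=2`, which gives the same scaling factor of 2 as the `lora_alpha=32`, `r=16` setting used in this notebook.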

In [21]:
peft_config = LoraConfig(
    lora_alpha=32,
    lora_dropout=0.05,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)

In [22]:
model = get_peft_model(model, peft_config)
print_trainable_parameters(model)

trainable params: 32636928 || all params: 3641381760 || trainables%: 0.8962786697761677
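The trainable parameter count can be sanity-checked by hand: for each adapted linear layer, LoRA adds `r * (in_features + out_features)` parameters. The sketch below does that arithmetic. The layer shapes are assumptions about the Falcon-7B architecture that are not stated in this notebook (32 decoder layers, hidden size 4544, fused multi-query QKV projection of width 4672); under those assumptions the total matches the number printed above.

```python
# Assumed Falcon-7B shapes (not stated in this notebook): 32 decoder layers,
# hidden size 4544, and a fused multi-query QKV projection of width 4672
# (71 query heads * 64 + one key head * 64 + one value head * 64).
NUM_LAYERS = 32
HIDDEN = 4544
LAYER_SHAPES = {
    "query_key_value": (HIDDEN, 4672),
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, 4 * HIDDEN),
    "dense_4h_to_h": (4 * HIDDEN, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) per adapted layer."""
    return r * (in_features + out_features)


def total_lora_params(r=16):
    per_layer = sum(lora_params(i, o, r) for i, o in LAYER_SHAPES.values())
    return NUM_LAYERS * per_layer


print(total_lora_params())
```

Halving `r` to 8 would halve the number of trainable parameters, which is a cheap knob to turn if memory gets tight.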


### *Start Training*

In [23]:
training_args = transformers.TrainingArguments(
      per_device_train_batch_size=1,
      gradient_accumulation_steps=4,
      num_train_epochs=1,
      learning_rate=2e-4,
      fp16=True,
      save_total_limit=3,
      logging_steps=1,
      output_dir="experiments",
      optim="paged_adamw_8bit",
      lr_scheduler_type="cosine",
      warmup_ratio=0.05,
)

trainer = transformers.Trainer(
    model=model,
    train_dataset=data,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
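The arguments above determine how many optimizer steps the run takes and how the learning rate evolves. The sketch below works this out in plain Python, assuming a single GPU and the 288-row dataset prepared earlier. The schedule is a generic linear-warmup plus cosine-decay curve written out for illustration, not a call into `transformers`.

```python
import math

NUM_ROWS = 288            # num_rows of the tokenized dataset
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 4
EPOCHS = 1
PEAK_LR = 2e-4
WARMUP_RATIO = 0.05

# Gradients from 4 micro-batches are accumulated before each optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
total_steps = (NUM_ROWS // PER_DEVICE_BATCH // GRAD_ACCUM) * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step):
    """Linear warmup to PEAK_LR, then half-cosine decay towards zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{learning_rate(step):.2e}")
```

With these settings the run has 72 optimizer steps, which agrees with the `global_step=72` reported by `trainer.train()`, and the first 4 of them are warmup.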

In [24]:
model.config.use_cache = False
trainer.train()

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 1 | 4.4434 |
| 2 | 4.6102 |
| 3 | 4.3685 |
| 4 | 4.3192 |
| 5 | 4.8697 |
| 6 | 4.125 |
| 7 | 3.2451 |
| 8 | 3.4113 |
| 9 | 3.2814 |
| 10 | 3.3593 |


TrainOutput(global_step=72, training_loss=2.2380039042068853, metrics={'train_runtime': 438.8809, 'train_samples_per_second': 0.656, 'train_steps_per_second': 0.164, 'total_flos': 428972775954432.0, 'train_loss': 2.2380039042068853, 'epoch': 1.0})

## 8. Saving our Fine-tuned Model

In [25]:
# Save the adapter weights locally
model.save_pretrained("trained-model")

In [28]:
# Saving to Hub
PEFT_MODEL = "ArjunPrSarkhel/Falcon-7b-MidjourneyPrompts"

model.push_to_hub(
    PEFT_MODEL, use_auth_token=True, create_pr=True
)

CommitInfo(commit_url='https://huggingface.co/ArjunPrSarkhel/Falcon-7b-MidjourneyPrompts/commit/d79198add7f042c599e0b23d3c3e9fa433d0922b', commit_message='Upload model', commit_description='', oid='d79198add7f042c599e0b23d3c3e9fa433d0922b', pr_url='https://huggingface.co/ArjunPrSarkhel/Falcon-7b-MidjourneyPrompts/discussions/1', pr_revision='refs/pr/1', pr_num=1)

## 9. Inference with our Fine-tuned Model

In [38]:
config = PeftConfig.from_pretrained(PEFT_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

model = PeftModel.from_pretrained(model, PEFT_MODEL)

Downloading (…)/adapter_config.json:   0%|          | 0.00/491 [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/15 [00:00<?, ?it/s]

Some weights of FalconForCausalLM were not initialized from the model checkpoint at vilsonrodrigues/falcon-7b-instruct-sharded and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Downloading adapter_model.bin:   0%|          | 0.00/131M [00:00<?, ?B/s]

In [39]:
generation_config = model.generation_config
generation_config.max_new_tokens = 200
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

In [40]:
%%time
device = "cuda:0"

prompt = """
<human>: midjourney prompt for a boy running in the snow
<assistant>:
""".strip()

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
  outputs = model.generate(
      input_ids = encoding.input_ids,
      attention_mask = encoding.attention_mask,
      generation_config = generation_config
  )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

<human>: midjourney prompt for a boy running in the snow
<assistant>: A boy running in the snow, wearing a red jacket, black pants, white socks, black shoes, --ar 16:9 --no dof --w 6000 --h 3000 --uplight --w 6000 --h 3000 --uplight --w 6000 --h 3000 --uplight --w 6000 --h 3000 --uplight --w 6000 --h 3000 --uplight --w 6000 --h 3000 --uplight --w 6000 --h 3000 --uplight --w 6000 --h 3000 --uplight --w 6000 --h 3000 --uplight --w 6000 --h 3000 --
CPU times: user 1min 17s, sys: 119 ms, total: 1min 17s
Wall time: 1min 18s
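The generation above ends in a loop of repeated flags. Options such as `repetition_penalty` or `no_repeat_ngram_size` on the generation config are one way to discourage this at decoding time, although their effect here has not been tested. As a simple alternative, the sketch below post-processes the text and keeps only the first occurrence of each flag. The helper `dedupe_flags` is hypothetical, and the input is a shortened copy of the output above.

```python
def dedupe_flags(text):
    """Keep the description and the first occurrence of each ' --flag value' segment."""
    description, *segments = text.split(" --")
    seen = []
    for segment in segments:
        segment = segment.strip()
        # Skip empty fragments (e.g. a dangling '--') and exact repeats.
        if segment and segment not in seen:
            seen.append(segment)
    return " ".join([description.rstrip()] + [f"--{s}" for s in seen])


raw = (
    "A boy running in the snow, wearing a red jacket, --ar 16:9 --no dof "
    "--w 6000 --h 3000 --uplight --w 6000 --h 3000 --uplight --w 6000 --"
)
print(dedupe_flags(raw))
```

This only hides the symptom; the repetition itself is more likely to shrink with more training data or different decoding settings.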


### We can already see that the results are much closer to the Midjourney prompt style after fine-tuning on only 288 training examples, although the output still ends in repeated flags. With more samples we can likely improve the results further.