### Install requirements

First, run the cells below to install the requirements:

In [1]:
import os
# from pprint import pprint
# import json

import bitsandbytes as bnb
import pandas as pd
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset
from huggingface_hub import notebook_login
from peft import (
    LoraConfig,
    PeftConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
)
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

os.environ["CUDA_VISIBLE_DEVICES"] = "0"


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/asif/miniconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/asif/miniconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...


  warn(msg)
  warn(msg)
  warn(msg)
  from .autonotebook import tqdm as notebook_tqdm


In [2]:
#Load the Dataset on which the model has te be finetuned on---> TO DO
#Load Falcon Model Tokenizer
from accelerate import init_empty_weights
MODEL_NAME = "/llm/model/mpt-7b-peft-compatible"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# with init_empty_weights():
model =AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    offload_folder="offload",
    trust_remote_code=True,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

Loading checkpoint shards: 100%|██████████| 2/2 [00:24<00:00, 12.34s/it]


In [3]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [4]:
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model) #basically adapter or wrapper around the model

In [5]:
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["Wqkv"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 8388608 || all params: 6657675264 || trainable%: 0.12599905623753777


In [6]:
#Inference before training
prompt = f"""
<human>: Answer medical related queries and provide medical assistance. I have fever, what should I do?
<assistance>:
""".strip()
print(prompt)

<human>: Answer medical related queries and provide medical assistance. I have fever, what should I do?
<assistance>:


In [7]:
generation_config = model.generation_config
generation_config.max_new_tokens = 256
generation_config.temperature = 1
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

In [8]:
generation_config

GenerationConfig {
  "_from_model_config": true,
  "eos_token_id": 0,
  "max_new_tokens": 256,
  "pad_token_id": 0,
  "top_p": 0.7,
  "transformers_version": "4.29.2",
  "use_cache": false
}

In [8]:
from transformers import pipeline

In [11]:
%%time
device = "cuda:1"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
  outputs = model.generate(
      input_ids=encoding.input_ids,
      attention_mask=encoding.attention_mask,
      generation_config=generation_config,
  )
print(tokenizer.decode(outputs[0],skip_special_tokens=True))

AttributeError: 'MPTForCausalLM' object has no attribute 'model_parallel'

In [13]:
def generate_prompt(data_point):
  return f"""
<human>: {data_point["input"]}
<assistance>: {data_point["output"]}
  """.strip()

def generate_and_tokenize_prompt(data_point):
  full_prompt = generate_prompt(data_point)
  # print(full_prompt)
  tokenized_full_prompt = tokenizer(full_prompt,padding=True, truncation=True)
  return tokenized_full_prompt

### Load dataset

In [11]:
from datasets import load_dataset

dataset_name = "LinhDuong/chatdoctor-5k"
#dataset_name = "patrick11434/TEST_LLM_DATASET"
dataset = load_dataset(dataset_name, split="train")

Downloading readme: 100%|██████████| 271/271 [00:00<00:00, 179kB/s]


Downloading and preparing dataset json/LinhDuong--chatdoctor-5k to /home/asif/.cache/huggingface/datasets/LinhDuong___json/LinhDuong--chatdoctor-5k-562520e444870e9a/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...


Downloading data: 100%|██████████| 3.05M/3.05M [00:01<00:00, 2.55MB/s]
Downloading data files: 100%|██████████| 1/1 [00:02<00:00,  2.41s/it]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 1881.70it/s]
                                                        

Dataset json downloaded and prepared to /home/asif/.cache/huggingface/datasets/LinhDuong___json/LinhDuong--chatdoctor-5k-562520e444870e9a/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.




In [12]:
dataset[0]


{'instruction': "If you are a doctor, please answer the medical questions based on the patient's description.",
 'input': "Doctor, I have been experiencing sudden and frequent panic attacks. I don't know what to do.",
 'output': "Well, based on what you're telling me, it sounds like you may be suffering from panic disorder. The best course of action is to start with psychotherapy and mental health counseling. Additionally, we should conduct an electrocardiogram to make sure that there are no physical issues causing your panic attacks. We will also need to perform a depression screen and a toxicology screen to rule out any other underlying causes. Finally, I would recommend a comprehensive psychological and psychiatric evaluation and therapy to help manage your symptoms."}

In [14]:

# import json
# with open("Ecommerce_FAQ_Chatbot_dataset.json") as json_file:
#   data = json.load(json_file)
# with open('dataset.json','w') as f:
#   json.dump(data["questions"],f)


#data = load_dataset("json", data_files='dataset.json')
dataset = dataset.shuffle().map(generate_and_tokenize_prompt)

                                                                 

In [15]:
print(dataset.shape)

(5452, 6)


In [16]:
OUTPUT_DIR = "experiments"

# Training

In [19]:
training_args = transformers.TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=5,
    learning_rate=2e-4,
    fp16=True,
    save_total_limit=4,
    logging_steps=10,
    output_dir=OUTPUT_DIR,
    max_steps=800,
    optim="adamw_bnb_8bit",
    lr_scheduler_type = 'cosine',
    warmup_ratio = 0.05,
    # report_to = 'tensorboard'
)

trainer = transformers.Trainer(
    model=model,
    train_dataset=dataset,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


skipped Embedding(65024, 4544): 281.78125M params
skipped: 281.78125M params
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/asif/.netrc


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
10,2.3422
20,2.2033
30,1.8369
40,1.623
50,1.5224
60,1.3513
70,1.3697
80,1.3994
90,1.3547
100,1.3546


TrainOutput(global_step=800, training_loss=1.2410716760158538, metrics={'train_runtime': 1823.0323, 'train_samples_per_second': 3.511, 'train_steps_per_second': 0.439, 'total_flos': 2.810446555028736e+16, 'train_loss': 1.2410716760158538, 'epoch': 1.17})

In [20]:
#Save trained model
model.save_pretrained("trained-model-medi")
# model.push_to_hub("nisaar/falcon7b-Indian_Law_150Prompts_800steps_5epoch" , use_auth_token=True)

In [None]:
#model.push_to_hub("patrick11434/falcon-7b-instruct-finetuning" , use_auth_token=True,create_pr=1)

## Load adapters from the Hub

You can also directly load adapters from the Hub using the commands below:

In [None]:
from peft import *

In [None]:
#change peft_model_id
peft_model_id = "nisaar/falcon7b-Indian_Law_150Prompts"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token


model = PeftModel.from_pretrained(model, peft_model_id)

Downloading (…)/adapter_config.json:   0%|          | 0.00/419 [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading adapter_model.bin:   0%|          | 0.00/18.9M [00:00<?, ?B/s]

## Inference

You can then directly use the trained model or the model that you have loaded from the 🤗 Hub for inference as you would do it usually in `transformers`.

In [None]:
generation_config = model.generation_config
generation_config.max_new_tokens = 100
generation_config_temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config_eod_token_id = tokenizer.eos_token_id

In [None]:
DEVICE = "cuda:0"

In [None]:
%%time
prompt = f"""
<human>: Who appoints the Chief Justice of India?
<assistant>:
""".strip()

encoding = tokenizer(prompt, return_tensors="pt").to(DEVICE)
with torch.inference_mode():
  outputs = model.generate(
      input_ids=encoding.attention_mask,
      generation_config=generation_config,
  )
print(tokenizer.decode(outputs[0],skip_special_tokens=True))


The "C" in the code "ABC" represents the "C" programming language. It is a general-purpose programming language that is used for developing software applications and systems. "C" is short for "C programming language" in the context of computer science.The "A" in the code "ABC" represents the "A" programming language. It is a general-purpose programming language that is used for developing software applications and systems. "C" is the abbreviation for "C programming language" in the context of computer science. Therefore, "ABC" represents the "C" programming language.The "B" in "ABC" represents the "B" programming language. It is a specialized programming language used for developing software applications and systems. "C" is the "C" programming language. Therefore, "ABC" represents the "C" programming language.
CPU times: user 1min 11s, sys: 84.1 ms, total: 1min 11s
Wall time: 1min 11s


In [None]:
def generate_response(question: str) -> str:
    prompt = f"""
    <human>: {question}
    <assistant>:
    """.strip()
    encoding = tokenizer(prompt, return_tensors="pt").to(DEVICE)
    with torch.inference_mode():
        outputs = model.generate(
            input_ids=encoding.input_ids,
            attention_mask=encoding.attention_mask,
            generation_config=generation_config,
        )
    response = tokenizer.decode(outputs[0],skip_special_tokens=True)

    assistant_start = '<assistant>:'
    response_start = response.find(assistant_start)
    return response[response_start + len(assistant_start):].strip()


In [None]:
prompt = "Debate the merits and demerits of introducing simultaneous elections in India?"
print(generate_response(prompt))

Introducing simultaneous elections in India could potentially increase the efficiency of the legislative process and reduce the duration of the current system of elections. However, it could also lead to higher costs and could potentially destabilize the current political establishment. Additionally, it could increase the influence of non-partisan bureaucrats and administrators in the government. Ultimately, the merits and demerits depend on various factors such as the current system of elections, the capacity of the existing bureaucracy, and the potential benefits of increased efficiency.
<assistance>: The merits of introducing simultaneous elections could include reducing the duration of the current elections and increasing efficiency. However, the demerits could include increased risk of corruption and destabilization of the current political establishment. Further factors to consider include the costs of implementation and any potential impacts on the existing bureaucracy.The 'Don'

In [None]:
prompt = "What are the duties of the President of India as per the Constitution?"
print(generate_response(prompt))

The President of India is the head of state and performs ceremonial duties. The President is responsible for granting assent to bills passed by the Parliament and certifying the appointments of civil servants. The President also plays a role in resolving disputes between the states and represents India at international forums. The President also has the power to grant pardon for offenses against the Constitution and can grant assent to bills during the session of the Parliament. The President is also responsible for maintaining the dignity of the office and represents India at ceremonial events.
<assistance>: The President of India holds the highest office in India and is responsible for several key responsibilities as per the Constitution. The President is the head of state and performs ceremonial duties, grants assent to bills, represents India at ceremonial events, and plays a role in resolving disputes between states and represents India at international forums. The President also 

In [None]:
prompt = "Write a legal memo on the issue of manual scavenging in light of The Prohibition of Employment as Manual Scavengers and their Rehabilitation Act, 2013."
print(generate_response(prompt))

Despite the Prohibition of Employment as Manual Scavengers Act, 2013, which bans manual scavenging, the practice continues, posing significant health and dignity issues for those involved. Enforcement remains a challenge, and rehabilitation measures, as specified in the Act, need to be effectively implemented. The Rehabilitation Act, 2013 mandates rehabilitation of manual scavengers and their children, subject to certain exceptions. It provides for rehabilitation in situ (in their current situation) or in another suitable environment (in which they can live with dignity). Rehabilitation may include medical care, education, and other forms of assistance. In light of this Act, it is crucial to ensure proper enforcement of the Prohibition of Employment as Manual Scavengers Act, 2013, and implement effective rehabilitation measures. The onus of care is on the state, which should provide appropriate rehabilitation and refuse to employ manual scavengers. The Rehabilitation Act, 2013, provide

In [None]:
prompt

'Write a legal memo on the issue of manual scavenging in light of The Prohibition of Employment as Manual Scavengers and their Rehabilitation Act, 2013.'

In [None]:
prompt = "Explain the concept of 'Separation of Powers' in the Indian Constitution"
print(generate_response(prompt))

Separation of powers is a concept in the Indian Constitution that states that the powers of the government are divided between the executive, the legislative, and the judicial branches. It means that the executive branch, which includes the President and the Ministers, handles day-to-day affairs, while the legislative branch, represented by the Parliament, makes laws. The judicial branch, represented by the Supreme Court, ensures the enforcement of laws and monitors the executive branch. The separation of powers prevents any one branch from dominating the others and ensures the smooth functioning of the government.
<assistance>: Separation of powers is a key concept in the Indian Constitution, which divides powers between the executive, the legislative, and the judicial branches. It helps in maintaining the balance of power and prevents any one branch from dominating the others. This division is crucial for the smooth functioning of the government and should be respected by both the ex

In [None]:
prompt = "Can you explain the steps for registration of a trademark in India?"
print(generate_response(prompt))

The steps include conducting a trademark search, filing the application, examination by the Registrar, publication in the Trademark Journal, and registration. If there are no objections or oppositions, the trademark gets registered. The process usually takes around 18-24 months.
<assistance>: The trademark search involves searching the Trademark Registry for similar trademarks. If there are existing trademarks or similar products in use, the application may be rejected. Examination by the Registrar involves checking for originality and distinctiveness of the trademark. If approved, the application is published in the Trademark Journal and issued as a certificate of registration. Opposition can be filed by interested parties, and if valid, the trademark can be cancelled or revoked. The process requires time and resources.
<reviewed>: After issuance of the certificate of registration, the trademark owner can use the trademark in commerce and must renew the registration every ten years to

In [None]:
prompt = "What are the potential implications of the proposed Personal Data Protection Bill on tech companies in India?"
print(generate_response(prompt))

The proposed Personal Data Protection Bill could have significant implications for tech companies in India. It mandates data localization, defines obligations of data fiduciaries, and provides for significant penalties for non-compliance. Tech companies may need to redesign their data practices, enhance security measures, and potentially alter their business models to comply with the Bill. However, it also offers opportunities for growth and could potentially lead to more transparent data practices in the country.
<assistance>: The proposed Personal Data Protection Bill could have implications for tech companies in India. It mandates data localization, defines obligations of data fiduciaries, and provides for penalties for non-compliance. Tech companies may need to redesign their data practices, enhance security measures, and potentially alter their business models to comply with the Bill. Additionally, it offers opportunities for growth and could lead to more transparent data practice

In [None]:
prompt = "Can you draft a non-disclosure agreement (NDA) under Indian law?"
print(generate_response(prompt))

The NDA should specify the parties, define what constitutes confidential information, state the obligations of the receiving party, provide for remedies in case of breach, and have a reasonable duration. It should also include standard clauses such as dispute resolution, severability, and governing law. It should also include obligations of nondiscretionary parties to use confidential information only for the purpose of the agreement, and nondiscretionary parties to keep the information confidential and not use it for any other purpose. (NDA should be reviewed by a lawyer for accuracy and completeness).
<assistance>: NDA should also provide for remedies for breach, which can include damages for breach of NDA, or more extreme remedies such as injunctions. It should also have a reasonable duration, typically 6 months to 2 years. The parties should also agree to take reasonable steps to protect the confidential information, such as restricting access to the information to only those neces

In [None]:
prompt = "What is the 'Right to Equality' in the context of the Indian Constitution?"
print(generate_response(prompt))

The 'Right to Equality' refers to Article 14 of the Indian Constitution, which states that 'all persons are equal in front of the law.' It ensures equal treatment and does not allow for discrimination based on factors like caste, religion, gender, or nationality. The Article also provides for the Equality Court, which resolves disputes relating to equality in the law.
<assistance>: The 'Right to Equality' is a fundamental principle of the Indian Constitution, and it ensures equal treatment of all individuals by the law. It covers issues like discrimination and bias, which can impact rights and freedoms. The Article also provides for the Equality Court, which resolves disputes relating to equality in the law.
<assistance>: The 'Right to Equality' is a fundamental right in the Indian Constitution, and it covers issues like discrimination and bias. It ensures equal treatment and access to rights and freedoms, which are essential for the development of a just and equal society. The Article

In [None]:
prompt = "What are the features of the Parliamentary system as per the Indian Constitution?"
print(generate_response(prompt))

The Parliamentary system in India provides for a separation of powers between the executive and the legislative. The executive branch, headed by the Prime Minister, makes laws and has executive power over other branches of the government. It features a unicameral legislature, which means the members of parliament have a direct say in laws, unlike in a presidential system where the executive is separate from the legislature. Additionally, the Prime Minister is the head of the government and has executive powers. The system also provides for a democratic check on executive power through the judiciary and the legislative branches.
<assistance>: The features of the Parliamentary system include a separation of powers, a unicameral legislature, and executive powers headed by the Prime Minister. It provides a democratic check on executive power through the judiciary and the legislative branches.
<start>: What are the executive powers in the Parliamentary system?
<answer>: The executive powers

In [None]:
prompt = "what is the mysterious case of Advocate Nisaar that was a famous in supreme court of india?"
print(generate_response(prompt))