In [1]:
!pip install transformers -q

In this notebook, I leverage the causal nature of 'gpt-2' model to behave like a chatbot to which we can chat.

In [2]:
## Import necessary libraries
import torch
from torch.utils.data import Dataset, DataLoader
import json
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from torch.optim import AdamW
from transformers import DataCollatorForSeq2Seq

## Tokenize the dataset
class TokenizedDataset(Dataset):
    def __init__(self, path, tokenizer):
        self.data = json.load(open(path, "r"))
        self.tokenizer = tokenizer

        self.conversations = []
        for intent in self.data['intents']:
            patterns = intent['patterns']
            responses = intent['responses']
            for i in range(len(patterns)):
                pattern = patterns[i]
                response = responses[i % len(responses)]  # Handle cases where there are more patterns than responses
                conversation = "<startofstring> " + pattern + " <bot>: " + response + " <endofstring>"
                self.conversations.append(conversation)

        self.x_encoded = tokenizer.batch_encode_plus(
            self.conversations,
            max_length=512,
            truncation=True,
            padding='max_length',
            return_tensors='pt'
        )
        self.input_ids = self.x_encoded['input_ids']
        self.attention_mask = self.x_encoded['attention_mask']

    def __len__(self):
        return len(self.conversations)

    def __getitem__(self, idx):
        return {
            'input_ids': self.input_ids[idx],
            'attention_mask': self.attention_mask[idx]
        }

device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens({"pad_token":"<pad>",
                            "bos_token":"<startofstring>",
                            "eos_token":"<endofstring>"})
tokenizer.add_tokens(['<bot>:'])

## Initialize the model
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.resize_token_embeddings(len(tokenizer))
model.to(device)

optimizer = AdamW(model.parameters(), lr = 1e-4)

dataset = TokenizedDataset(path='intents.json', tokenizer=tokenizer)

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

In [9]:
tokenizer.save_pretrained('/content/model')

('/content/model/tokenizer_config.json',
 '/content/model/special_tokens_map.json',
 '/content/model/vocab.json',
 '/content/model/merges.txt',
 '/content/model/added_tokens.json')

In [3]:
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer)
dataloader = DataLoader(dataset, batch_size=8,collate_fn=data_collator, shuffle=True)

I have used a technique called Mixed precision training to reduce the computational resource required for training the model.

## Mixed Precision Training
Mixed precision training is a technique used in deep learning to speed up training times and reduce memory usage by using a combination of lower and higher-precision floating-point numbers. This technique involves using half-precision (16-bit) floating-point numbers for the majority of the model's computations, and then switching to single-precision (32-bit) floating-point numbers for certain parts of the computation that require higher precision.

Autocasting is a feature provided by TensorFlow that automatically casts variables to the appropriate data type for a given operation. This feature can be used to perform mixed precision training by allowing the model to use half-precision floating-point numbers for most of the computation, while automatically promoting certain variables to single-precision when necessary.

GradScaler is a technique used to prevent underflow and overflow errors that can occur when using half-precision floating-point numbers. This technique involves scaling the gradients during backpropagation to prevent them from becoming too small or too large. By using GradScaler in combination with autocasting, it is possible to achieve AMP (Automatic Mixed Precision) training, which can significantly speed up training times and reduce memory usage while maintaining high levels of accuracy.

In [11]:
from torch.optim import AdamW
from torch.cuda.amp import autocast, GradScaler
import tqdm

scaler = GradScaler()

def infer(inp):
  model.eval()
  inp = "<startofstring> "+inp+" <bot>: "
  inp = tokenizer(inp,return_tensors='pt')
  X = inp['input_ids'].to(device)
  a = inp['attention_mask'].to(device)
  output = model.generate(X, attention_mask=a, max_length=512)
  output = tokenizer.decode(output[0],skip_special_tokens=True)
  return output


def train(dataset, model, optim):
  num_epochs = 200
  model.to(device)
  model.train()
  for i in tqdm.tqdm(range(num_epochs)):
    for batch in dataset:
      X = batch['input_ids'].to(device)
      a = batch['attention_mask'].to(device)
      optim.zero_grad()
      with autocast(): ## for mixed precision training
        out = model(X, attention_mask=a,labels=X)
        loss=out.loss
      scaler.scale(loss).backward()
      scaler.step(optim)
      scaler.update()
  model.save_pretrained('/content/model')



train(dataloader,model, optimizer)
#print(infer('Hello?'))

100%|██████████| 10/10 [00:38<00:00,  3.83s/it]


In [8]:
print("infer from model : ")
while True:
  inp = input()
  print(infer(inp))

infer from model : 
hello


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 hello  <bot>:   <bot>:  Hi there! 
What are you?


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 What are you?  <bot>:   <bot>:  I am ChatGPT, an AI language model. My goal is to assist you with any questions or tasks you have. 
Can you tell a joke?


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 Can you tell a joke?  <bot>:   <bot>:  Sure, here's a joke: Why don't scientists trust atoms? Because they make up everything! 
What are the books you recommend a person to read?


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 What are the books you recommend a person to read?  <bot>:   You might enjoy 'To Kill a Mockingbird' by Harper Lee. It's a powerful and thought-provoking book. 
Do you have feelings of your own?


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 Do you have feelings of your own?  <bot>:   Of course! I'm here to help. Please let me know what you need assistance with. 


KeyboardInterrupt: ignored

In [12]:
from huggingface_hub import notebook_login, HfApi

notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [13]:
api = HfApi()

api.upload_folder(
    folder_path='/content/model',
    path_in_repo=".",
    repo_id="Neupane9Sujal/GPT2-chatbot",
    repo_type='model',
    create_pr=1
)

model.safetensors:   0%|          | 0.00/498M [00:00<?, ?B/s]

'https://huggingface.co/Neupane9Sujal/GPT2-chatbot/tree/refs%2Fpr%2F1/.'

The performance of this GPT-chatbot can obviously improved by training on more data and training for longer period of time.