# Pretraining a Customer Support Model on X (former Twitter) Data
copyright 2023-2024, Denis Rothman

This is an educational notebook that shows how to apply a Hugging Face RobertaForCausalLM model to messages on X (formerly Twitter). The goal is only to show the method (see the limitations below).

**Pretraining a Generative AI model from scratch**

**Dataset:** Tweets from 20 Top Brands by Volume  
**Model:** RobertaForCausalLM

**April 29, 2024 update for conflict resolution of** *Step 2 Installing Hugging Face transformer and datasets*  **between the most recent accelerate, Transformers, and datasets packages. It was necessary to freeze the versions pending unification of the most recent versions of Hugging Face installation packages**


![](https://i.imgur.com/nTv3Iuu.png)




The goal of the notebook is to train a Hugging Face RobertaForCausalLM model to simulate a customer support chat agent for X (formerly Twitter).

This notebook requires a GPU.

**Customer Support on Twitter**
Over 2 million tweets and replies from the biggest brands on Twitter

https://www.kaggle.com/datasets/thoughtvector/customer-support-on-twitter

**Limitations**:

The scope of pretraining was limited to a subset of the dataset because of time constraints. You can train on the full dataset on Google Colab or another platform. You can also select another model if you find the generalized responses insufficient. The responses are only there to show how the system works.

RoBERTa is not a standard generative AI model like the GPT models in the Chapter07 directory. However, it can be implemented as a reasonably interesting autoregressive (token-by-token loop) model that illustrates how generative AI works.
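
The token-by-token loop can be sketched without any model. In this minimal illustration, `TOY_TABLE` and `toy_next_token` are hypothetical stand-ins for a trained language model (they only look up a fixed table). The loop appends one token at a time until an end marker or a length limit is reached. It is meant to show the control flow only, not how RobertaForCausalLM scores tokens.

```python
# Toy autoregressive loop: each new token is predicted from the tokens so far.
# TOY_TABLE and toy_next_token are hypothetical stand-ins for a trained model.
TOY_TABLE = {
    "please": "send",
    "send": "us",
    "us": "a",
    "a": "message",
    "message": "</s>",
}

def toy_next_token(tokens):
    """Return the next token given the sequence so far (here: only the last token)."""
    return TOY_TABLE.get(tokens[-1], "</s>")

def generate(prompt_tokens, max_new_tokens=10, eos="</s>"):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = toy_next_token(tokens)
        if next_token == eos:
            break
        tokens.append(next_token)  # the output becomes part of the next input
    return tokens

print(" ".join(generate(["please"])))
```

A real model replaces the table lookup with a forward pass that scores every token in the vocabulary, but the surrounding loop has the same shape.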

In the following chapters we will be using **GPT-4** and other **LLM** models. *However, exploring smaller open source models for a specific domain can sometimes provide everything we need for our project.*


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Kaggle credentials for authentication

In [2]:
import os
import json
with open("/content/drive/MyDrive/files/kaggle.json", "r") as f:
    kaggle_credentials = json.load(f)

kaggle_username = kaggle_credentials["username"]
kaggle_key = kaggle_credentials["key"]

os.environ["KAGGLE_USERNAME"] = kaggle_username
os.environ["KAGGLE_KEY"] = kaggle_key

In [3]:
try:
  import kaggle
except ImportError:
  # Install the Kaggle API client only if it is missing
  !pip install kaggle
  import kaggle

In [4]:
kaggle.api.authenticate()

# Step 1: Downloading the dataset

https://www.kaggle.com/datasets/thoughtvector/customer-support-on-twitter

In [5]:
!kaggle datasets download -d thoughtvector/customer-support-on-twitter

Downloading customer-support-on-twitter.zip to /content
 96% 161M/169M [00:02<00:00, 80.5MB/s]
100% 169M/169M [00:02<00:00, 69.4MB/s]


In [6]:
import zipfile

with zipfile.ZipFile('/content/customer-support-on-twitter.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/')

print("File Unzipped!")

File Unzipped!


# Step 2: Installing Hugging Face transformers and datasets

**April 2023 update From Hugging Face Issue 22816**

https://github.com/huggingface/transformers/issues/22816

"The PartialState import was added as a dependency on the transformers development branch yesterday. PartialState was added in the 0.17.0 release in accelerate, and so for the development branch of transformers, accelerate >= 0.17.0 is required.

Downgrading the transformers version removes the code which is importing PartialState."

Denis Rothman: The following cell imports the latest version of Hugging Face transformers but without downgrading it.

To adapt to the Hugging Face upgrade, a GPU accelerator was activated using Google Colab Pro with the following NVIDIA GPU:
GPU Name: NVIDIA A100-SXM4-40GB

**April 29, 2024 update for conflict resolution of** *Step 2 Installing Hugging Face transformer and datasets*  **between the most recent accelerate, Transformers, and datasets packages. It was necessary to freeze the versions pending unification of the most recent versions of Hugging Face installation packages**

The conflict between the versions of `huggingface-hub` pulled in by the `transformers` and `datasets` installations arises from the specific version requirements of these libraries and their dependencies on `huggingface-hub`. Here is a detailed explanation of the situation:

### Initial Setup and Conflict

1. **Transformers and Accelerate Installation**:
   - When the program initially installed `transformers` and `accelerate`, the `huggingface-hub` version 0.20.3 was installed as a dependency. This version was compatible with the requirements of both `transformers` and `accelerate` at that time.
   - Message: `Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.10/dist-packages (from accelerate) (0.20.3)`

2. **Datasets Installation**:
   - Subsequently, when the `datasets` library was installed, it required a newer version of `huggingface-hub` (version 0.22.2). This was because `datasets` had been updated to depend on newer features or fixes introduced in `huggingface-hub` after version 0.20.3.
   - Message: `Successfully installed datasets-2.19.0 dill-0.3.8 huggingface-hub-0.22.2 multiprocess-0.70.16 xxhash-3.4.1`

3. **Version Conflict**:
   - The installation of `datasets` led to an upgrade of `huggingface-hub` from 0.20.3 to 0.22.2. This upgrade caused a conflict because `huggingface-hub` was already loaded into memory (imported) at the older version (0.20.3) as part of the Python runtime environment when `accelerate` and `transformers` were installed.
   - Warning Message: `WARNING: The following packages were previously imported in this runtime: [huggingface_hub] You must restart the runtime in order to use newly installed versions.`


### Resolution Strategy

To resolve this version conflict without requiring frequent restarts of the runtime, especially in environments like Jupyter notebooks or online IDEs, the program adjusted the library versions as follows:

- **Unified Library Version Installation**:
  - The program chose to install specific versions of `accelerate`, `transformers`, and `datasets` that are compatible with the same version of `huggingface-hub`. By pinning these libraries to versions that all require the same `huggingface-hub` version, it avoided the need for mid-session upgrades that necessitate a runtime restart.
  - Commands:
    ```bash
    !pip install accelerate==0.29.3
    !pip install Transformers==4.40.1
    !pip install datasets==2.16.0
    ```

By aligning the versions of `accelerate`, `transformers`, and `datasets` to those that share a common dependency version of `huggingface-hub`, the program ensures that all libraries are compatible without causing interruptions due to dependency conflicts.
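
One way to confirm that the pins took effect is to compare the installed versions with the intended ones. The following is a minimal sketch using only the standard library. The helper names `installed_versions` and `mismatches` are hypothetical, and the `PINNED` table simply repeats the versions from the commands above.

```python
# Report installed versions of the pinned packages (standard library only).
from importlib.metadata import PackageNotFoundError, version

PINNED = {"accelerate": "0.29.3", "transformers": "4.40.1", "datasets": "2.16.0"}

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

def mismatches(pinned):
    """Return {name: (wanted, installed)} for packages that differ from the pins."""
    installed = installed_versions(pinned)
    return {n: (w, installed[n]) for n, w in pinned.items() if installed[n] != w}

for name, (wanted, got) in mismatches(PINNED).items():
    print(f"{name}: wanted {wanted}, found {got}")
print("huggingface-hub:", installed_versions(["huggingface-hub"])["huggingface-hub"])
```

This reads the versions on disk. A package that was imported before an upgrade still runs at the old version until the runtime is restarted.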



In [None]:
!pip install accelerate==0.29.3
!pip install Transformers==4.40.1
!pip install datasets==2.16.0 #installing Hugging Face datasets for data loading and preprocessing

from accelerate import Accelerator

Creating subdirectories to store the datasets, the logs, and the trained model

In [8]:
!mkdir -p /content/model/dataset/
!mkdir -p /content/model/model/
!mkdir -p /content/model/logs/

# Step 3:  Loading and filtering the data

We will use a subset of the dataset to train the model.

In [9]:
import pandas as pd

# Load the dataset
df = pd.read_csv('/content/twcs/twcs.csv')

# Check the first few rows to understand the data
print(df.head())

   tweet_id   author_id  inbound                      created_at  \
0         1  sprintcare    False  Tue Oct 31 22:10:47 +0000 2017   
1         2      115712     True  Tue Oct 31 22:11:45 +0000 2017   
2         3      115712     True  Tue Oct 31 22:08:27 +0000 2017   
3         4  sprintcare    False  Tue Oct 31 21:54:49 +0000 2017   
4         5      115712     True  Tue Oct 31 21:49:35 +0000 2017   

                                                text response_tweet_id  \
0  @115712 I understand. I would like to assist y...                 2   
1      @sprintcare and how do you propose we do that               NaN   
2  @sprintcare I have sent several private messag...                 1   
3  @115712 Please send us a Private Message so th...                 3   
4                                 @sprintcare I did.                 4   

   in_response_to_tweet_id  
0                      3.0  
1                      1.0  
2                      4.0  
3                      5.0  
4

Extracting relevant data

In this case, we extract the `text` column.

In [10]:
# Extract tweets from the 'text' column or any other relevant column
tweets = df['text'].dropna().tolist()  # This assumes the column with tweets is named 'text'

In [11]:
# Convert the list of tweets to a DataFrame
df_tweets = pd.DataFrame(tweets, columns=['text'])

# Save the DataFrame to a CSV file
df_tweets.to_csv('tweets.csv', index=False, encoding='utf-8')

In [12]:
# Checking the length of df
formatted_length = "{:,}".format(len(df_tweets))
print(formatted_length)

2,811,774


Checking the extraction

In [13]:
for tweet in tweets[:10]:  # This will display the first 10 tweets
    print(tweet)

@115712 I understand. I would like to assist you. We would need to get you into a private secured link to further assist.
@sprintcare and how do you propose we do that
@sprintcare I have sent several private messages and no one is responding as usual
@115712 Please send us a Private Message so that we can further assist you. Just click ‘Message’ at the top of your profile.
@sprintcare I did.
@115712 Can you please send us a private message, so that I can gain further details about your account?
@sprintcare is the worst customer service
@115713 This is saddening to hear. Please shoot us a DM, so that we can look into this for you. -KC
@sprintcare You gonna magically change your connectivity for me and my whole family ? 🤥 💯
@115713 We understand your concerns and we'd like for you to please send us a Direct Message, so that we can further assist you. -AA


Filtering the extraction to clean it and convert it to lowercase

In [14]:
import re

def filter_tweet(tweet):
    # Convert to lowercase, then keep only the letters a to z, whitespace, and apostrophes
    return re.sub(r'[^a-z\s\']', '', tweet.lower())

filtered_tweets = [filter_tweet(tweet) for tweet in tweets]

In [15]:
f=30
filtered_tweets = [tweet for tweet in filtered_tweets if len(tweet.split()) > f]  # Only keep tweets with more than f words
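
To see what the two filtering cells do to individual tweets, here is a small worked example. It reuses the same regular expression on three sample tweets taken from the output above. The threshold `min_words = 3` is a hypothetical value chosen so that the toy data is not filtered out entirely, whereas the notebook uses `f = 30`.

```python
import re

def filter_tweet(tweet):
    # Same rule as the notebook cell: lowercase, then drop everything
    # except the letters a to z, whitespace, and apostrophes
    return re.sub(r"[^a-z\s']", "", tweet.lower())

samples = [
    "@sprintcare I did.",
    "@115712 I understand.",
    "Please shoot us a DM, so that we can look into this for you. -KC",
]
cleaned = [filter_tweet(t) for t in samples]
print(cleaned)

min_words = 3  # hypothetical small threshold for this toy data
kept = [t for t in cleaned if len(t.split()) > min_words]
print(kept)
```

Digits and punctuation are removed along with the `@` sign, so a numeric mention such as `@115712` disappears completely and leaves a leading space.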

In [16]:
for filtered_tweet in filtered_tweets[:10]:  # This will display the first 10 tweets
    print(filtered_tweet)

marksandspencer i check with the gov office and legal they stated you are not right but its funny how the other stores dont but you do no wonder lidl and the rest are beating you
marksandspencer ou must charge at least p a bag including vat for carrier bags that are all of the following

unused  its new and hasnt already been used for sold goods to be taken away or delivered
plastic and  microns thick or less
it has handles an opening and isnt sealed
marksandspencer arent require charge  a bag
paper bags
shops in airports or on board trains aeroplanes or ships
bags which only contain certain items such as unwrapped food raw meat and fish where there is a food safety risk prescription medicines uncovered blades seeds bulbs amp s
 hi you can change your microsoft account email through the steps here httpstcodkehohboyy  if the email your son wants to change to is already associated with a microsoft account you'll need to follow those steps to switch the email address on that account too z

In [17]:
# Checking the length of dataset
formatted_length = "{:,}".format(len(filtered_tweets))
print(formatted_length)

228,637


Save the dataset

In [18]:
import csv
# Save to CSV
with open('/content/model/dataset/processed_tweets.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for tweet in filtered_tweets:
        writer.writerow([tweet])

Check the file

In [19]:
import csv
from itertools import islice

# Read from CSV
with open('/content/model/dataset/processed_tweets.csv', 'r', newline='') as file:
    reader = csv.reader(file)

    # Use islice to print only the first 5 rows (a row can span several lines)
    for row in islice(reader, 5):
        print(row[0])

marksandspencer i check with the gov office and legal they stated you are not right but its funny how the other stores dont but you do no wonder lidl and the rest are beating you
marksandspencer ou must charge at least p a bag including vat for carrier bags that are all of the following

unused  its new and hasnt already been used for sold goods to be taken away or delivered
plastic and  microns thick or less
it has handles an opening and isnt sealed
marksandspencer arent require charge  a bag
paper bags
shops in airports or on board trains aeroplanes or ships
bags which only contain certain items such as unwrapped food raw meat and fish where there is a food safety risk prescription medicines uncovered blades seeds bulbs amp s
 hi you can change your microsoft account email through the steps here httpstcodkehohboyy  if the email your son wants to change to is already associated with a microsoft account you'll need to follow those steps to switch the email address on that account too z
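
Some of the rows printed above span several lines, presumably because the filter keeps whitespace, including newlines inside a tweet. The sketch below uses an in-memory buffer instead of the notebook's file and two made-up tweets. It illustrates that the `csv` module quotes such fields, so each tweet is still read back as a single row.

```python
import csv
import io

tweets = ["first line\nsecond line of the same tweet", "a single line tweet"]

# Write one tweet per row, as in the notebook, but into an in-memory buffer
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
for tweet in tweets:
    writer.writerow([tweet])

raw = buffer.getvalue()
print(raw.count("\n"))   # physical line breaks in the CSV text

# csv.reader reassembles quoted multi-line fields into logical rows
rows = list(csv.reader(io.StringIO(raw, newline="")))
print(len(rows))         # logical rows recovered by the reader
```

The number of physical lines in the file can therefore be larger than the number of tweets.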

# Step 4: Checking Resource Constraints: GPU and CUDA

In [20]:
!nvidia-smi

Mon Apr 29 10:02:14 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   49C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [21]:
#@title Checking that PyTorch Sees CUDA
import torch
torch.cuda.is_available()

True

# Step 5: Defining the configuration of the model

In [22]:
from transformers import RobertaConfig, RobertaForCausalLM

config = RobertaConfig(
    vocab_size=52_000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
    is_decoder=True,  # Use causal (left-to-right) attention so the model can be trained autoregressively
)

In [23]:
print(config)

RobertaConfig {
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": true,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.40.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 52000
}
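
Two numbers in the printed configuration follow from simple arithmetic. The sketch below only uses values copied from the output above and derives the size of each attention head and the longest sequence the position table can hold. The second calculation assumes that, as in the Hugging Face RoBERTa implementation as far as we know, position ids for real tokens start at `pad_token_id + 1`.

```python
# Values copied from the printed RobertaConfig
hidden_size = 768
num_attention_heads = 12
max_position_embeddings = 514
pad_token_id = 1

# Each attention head works on an equal slice of the hidden vector
assert hidden_size % num_attention_heads == 0
head_dim = hidden_size // num_attention_heads

# Assumption: real tokens are numbered from pad_token_id + 1, so the first
# pad_token_id + 1 rows of the position table are not available to them
usable_positions = max_position_embeddings - (pad_token_id + 1)

print(head_dim, usable_positions)
```

Under that assumption, the `max_length=128` used later for tokenization is well within the available positions.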



Define and print the model

In [24]:
# Create the RobertaForCausalLM model with the specified config
model = RobertaForCausalLM(config=config)
print(model)

RobertaForCausalLM(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(52000, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-5): 6 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): La

## Defining the tokenizer

In [25]:
from transformers import RobertaTokenizer

# Initialize the tokenizer using the 'roberta-base' pre-trained model
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [26]:
# Display special tokens
print("Special tokens:", tokenizer.special_tokens_map)

Special tokens: {'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': '<mask>'}


## Exploring the parameters

In [27]:
print(model.num_parameters())

83504416
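
The total reported above can be recomputed by hand from the configuration. This is a sketch of the arithmetic, not a description of the library's internals. It assumes that the language modeling head reuses the word embedding matrix for its output projection, so that matrix is counted only once. The variable and helper names are chosen for this example.

```python
# Recomputing the parameter total from the configuration values
vocab, hidden, positions, token_types = 52_000, 768, 514, 1
intermediate, layers = 3_072, 6

def linear(n_in, n_out):
    return n_in * n_out + n_out          # weight matrix plus bias

layer_norm = 2 * hidden                  # scale and shift vectors

embeddings = (vocab + positions + token_types) * hidden + layer_norm
per_layer = (
    4 * linear(hidden, hidden)           # query, key, value, attention output
    + layer_norm
    + linear(hidden, intermediate)       # feed-forward expansion
    + linear(intermediate, hidden)       # feed-forward projection
    + layer_norm
)
# Assumption: the output projection shares the word embedding matrix, so the
# head adds only a dense layer, a layer norm, and a per-token bias
lm_head = linear(hidden, hidden) + layer_norm + vocab

total = embeddings + layers * per_layer + lm_head
print(f"{total:,}")
```

Under these assumptions the tally agrees with the value printed by `model.num_parameters()`. Almost half of the parameters sit in the word embedding table.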


In [28]:
LP=list(model.parameters())
lp=len(LP)
print(lp)
for p in range(0,lp):
  print(LP[p])

106
Parameter containing:
tensor([[ 0.0032, -0.0080, -0.0213,  ...,  0.0179, -0.0193,  0.0111],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0120, -0.0048,  0.0381,  ..., -0.0154, -0.0294,  0.0083],
        ...,
        [-0.0361,  0.0237,  0.0058,  ..., -0.0381,  0.0158,  0.0031],
        [-0.0177, -0.0491,  0.0271,  ...,  0.0064,  0.0392,  0.0137],
        [ 0.0212, -0.0072, -0.0150,  ...,  0.0189, -0.0352,  0.0299]],
       requires_grad=True)
Parameter containing:
tensor([[ 0.0136, -0.0152,  0.0201,  ...,  0.0170,  0.0056,  0.0302],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0519,  0.0031,  0.0043,  ..., -0.0148, -0.0083,  0.0344],
        ...,
        [-0.0020,  0.0281, -0.0140,  ..., -0.0007, -0.0116, -0.0140],
        [-0.0019, -0.0135,  0.0273,  ...,  0.0066, -0.0151, -0.0129],
        [-0.0211,  0.0238, -0.0053,  ...,  0.0228,  0.0074,  0.0394]],
       requires_grad=True)
Parameter containing:
tensor([

In [29]:
#Shape of each tensor in the model
LP = list(model.parameters())
for i, tensor in enumerate(LP):
    print(f"Shape of tensor {i}: {tensor.shape}")

Shape of tensor 0: torch.Size([52000, 768])
Shape of tensor 1: torch.Size([514, 768])
Shape of tensor 2: torch.Size([1, 768])
Shape of tensor 3: torch.Size([768])
Shape of tensor 4: torch.Size([768])
Shape of tensor 5: torch.Size([768, 768])
Shape of tensor 6: torch.Size([768])
Shape of tensor 7: torch.Size([768, 768])
Shape of tensor 8: torch.Size([768])
Shape of tensor 9: torch.Size([768, 768])
Shape of tensor 10: torch.Size([768])
Shape of tensor 11: torch.Size([768, 768])
Shape of tensor 12: torch.Size([768])
Shape of tensor 13: torch.Size([768])
Shape of tensor 14: torch.Size([768])
Shape of tensor 15: torch.Size([3072, 768])
Shape of tensor 16: torch.Size([3072])
Shape of tensor 17: torch.Size([768, 3072])
Shape of tensor 18: torch.Size([768])
Shape of tensor 19: torch.Size([768])
Shape of tensor 20: torch.Size([768])
Shape of tensor 21: torch.Size([768, 768])
Shape of tensor 22: torch.Size([768])
Shape of tensor 23: torch.Size([768, 768])
Shape of tensor 24: torch.Size([768])
Sh

In [30]:
# Counting the parameters
total_params = 0
for p, tensor in enumerate(LP):        # one entry per tensor
  n = tensor.numel()                   # number of parameters in this tensor
  total_params += n
  if tensor.dim() == 2:
    print(p, tensor.shape[0], tensor.shape[1], n)  # 2D: rows, columns, size
  else:
    print(p, tensor.shape[0], n)                   # 1D: length, size

print(total_params)                    # total number of parameters

0 52000 768 39936000
1 514 768 394752
2 1 768 768
3 768 768
4 768 768
5 768 768 589824
6 768 768
7 768 768 589824
8 768 768
9 768 768 589824
10 768 768
11 768 768 589824
12 768 768
13 768 768
14 768 768
15 3072 768 2359296
16 3072 3072
17 768 3072 2359296
18 768 768
19 768 768
20 768 768
21 768 768 589824
22 768 768
23 768 768 589824
24 768 768
25 768 768 589824
26 768 768
27 768 768 589824
28 768 768
29 768 768
30 768 768
31 3072 768 2359296
32 3072 3072
33 768 3072 2359296
34 768 768
35 768 768
36 768 768
37 768 768 589824
38 768 768
39 768 768 589824
40 768 768
41 768 768 589824
42 768 768
43 768 768 589824
44 768 768
45 768 768
46 768 768
47 3072 768 2359296
48 3072 3072
49 768 3072 2359296
50 768 768
51 768 768
52 768 768
53 768 768 589824
54 768 768
55 768 768 589824
56 768 768
57 768 768 589824
58 768 768
59 768 768 589824
60 768 768
61 768 768
62 768 768
63 3072 768 2359296
64 3072 3072
65 768 3072 2359296
66 768 768
67 768 768
68 768 768
69 768 768 589824
70 768 768
71 768 768

# Step 6: Creating and processing the dataset

In [32]:
#load dataset
from datasets import load_dataset
dataset = load_dataset('csv', data_files='/content/model/dataset/processed_tweets.csv', column_names=["text"])

Generating train split: 0 examples [00:00, ? examples/s]

In [33]:
# split datasets into train and eval
from datasets import DatasetDict

dataset = dataset['train'].train_test_split(test_size=0.1)  # 10% for evaluation
dataset = DatasetDict(dataset)

In [34]:
# Tokenize datasets:
# - If a record's length is less than `max_length`, it's padded to ensure all records have the same length.
# - If a record's length exceeds `max_length`, it's truncated to the specified max length.

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

Map:   0%|          | 0/205773 [00:00<?, ? examples/s]

KeyboardInterrupt: 
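
The padding and truncation rules described in the comments of the tokenization cell can be illustrated on plain lists of token ids. This is a simplified sketch: the helper `pad_or_truncate` is hypothetical, the token ids are made up, and the real tokenizer additionally keeps the closing special token when it truncates, which is ignored here.

```python
PAD_ID = 1  # pad_token_id from the printed configuration

def pad_or_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = token_ids[:max_length]                 # truncation
    mask = [1] * len(ids)                        # 1 marks a real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding

# Made-up token ids: one short record and one long record
short_ids, short_mask = pad_or_truncate([0, 713, 16, 2], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(10, 20)), max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask lets the model ignore the padded positions, so every record can share the same length within a batch.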

In [None]:
# datacollator to batch items together for training and evaluation
from transformers import DataCollatorForLanguageModeling

# Define the data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # For causal (autoregressive) language modeling
)
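
With `mlm=False`, the collator prepares labels for next-token prediction instead of masked-token prediction. The sketch below shows the idea on plain lists, under the assumption that the labels are a copy of the input ids with padding replaced by `-100` and that the model compares each position with the label one step to the right. The helper names and token ids are made up for this example.

```python
PAD_ID = 1            # pad_token_id from the printed configuration
IGNORE_INDEX = -100   # label value skipped by PyTorch's cross-entropy loss by default

def causal_lm_labels(input_ids, pad_id=PAD_ID):
    """Copy the inputs as labels, masking padding so it does not affect the loss."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]

def next_token_pairs(input_ids, labels):
    """Pair each position's input with the label one step to the right."""
    return [(x, y) for x, y in zip(input_ids[:-1], labels[1:]) if y != IGNORE_INDEX]

input_ids = [0, 713, 16, 2, 1, 1]   # made-up ids, padded to length 6
labels = causal_lm_labels(input_ids)
print(labels)
print(next_token_pairs(input_ids, labels))
```

Each pair reads as "given this token and everything before it, predict the next one", which is the training signal for an autoregressive model.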

# Step 7: Initializing the trainer

The number of epochs can be increased empirically until the evaluation loss stops improving enough to justify the extra training time.

In [None]:
# Display the time at every evaluation interval during training
from transformers import Trainer
from datetime import datetime
from typing import Dict, Any

class CustomTrainer(Trainer):
    def log(self, logs: Dict[str, Any]) -> None:
        super().log(logs)
        # The logs dictionary normally has no "step" key, so read the step from the trainer state
        step = self.state.global_step
        if self.args.eval_steps and step > 0 and step % self.args.eval_steps == 0:
            print(f"Current time at step {step}: {datetime.now()}")

In [None]:
import logging
from transformers import Trainer, TrainingArguments

# Set up Python logging
logging.basicConfig(level=logging.INFO)

training_args = TrainingArguments(
    output_dir="/content/model/model/",
    overwrite_output_dir=True,
    num_train_epochs=2,                  # can be raised if the evaluation loss is still improving
    per_device_train_batch_size=64,      # batch size per device
    save_steps=10_000,                   # save a checkpoint every save_steps=10000
    save_total_limit=2,                  # the maximum number of checkpoint model files to keep
    logging_dir='/content/model/logs/',  # directory for storing logs
    logging_steps=100,                   # Log every 100 steps
    logging_first_step=True,             # Log the first step
    evaluation_strategy="steps",         # Evaluate every "eval_steps"
    eval_steps=500,                      # Evaluate every 500 steps
)

In [None]:
trainer = CustomTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset = tokenized_datasets["train"],
    eval_dataset = tokenized_datasets["test"]
)

# Step 8: Pretraining the model


In [None]:
%%time
trainer.train()

**Sample run information:**

CPU times: user 26min 37s, sys: 3.01 s, total: 26min 40s
Wall time: 26min 35s


TrainOutput(global_step=3216, training_loss=5.05824806411468, metrics={'train_runtime': 1595.3194,

'train_samples_per_second': 128.985, 'train_steps_per_second': 2.016, 'total_flos': 6822770940370944.0,

'train_loss': 5.05824806411468, 'epoch': 1.0})
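
The figures in the sample run can be cross-checked with a little arithmetic. This sketch assumes a single device and no gradient accumulation, and takes the training split size from the tokenization progress bar (205,773 examples).

```python
import math

train_examples = 205_773      # training split size shown by the tokenization progress bar
batch_size = 64               # per_device_train_batch_size
train_runtime = 1595.3194     # seconds, from the sample run

# Assumption: one device and no gradient accumulation
steps_per_epoch = math.ceil(train_examples / batch_size)
samples_per_second = train_examples / train_runtime

print(steps_per_epoch)
print(round(samples_per_second, 3))
```

Both values are consistent with the reported `global_step` and `train_samples_per_second`, which suggests that the sample run covered one epoch, as its `'epoch': 1.0` entry indicates.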

Display the results

In [None]:
results = trainer.evaluate()
print(results)

Evaluate the trainer

In [None]:
trainer.evaluate()
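
A loss value is easier to interpret when converted to perplexity, the exponential of the mean cross-entropy loss. The sketch below applies this to the training loss of the sample run, since no evaluation output is shown in this notebook. The same function could be applied to the `eval_loss` entry of the dictionary returned by `trainer.evaluate()`, assuming that key is present.

```python
import math

def perplexity(cross_entropy_loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(cross_entropy_loss)

# Training loss from the sample run; an eval_loss would be used the same way
sample_train_loss = 5.05824806411468
print(round(perplexity(sample_train_loss), 1))
```

A perplexity of this size roughly means the model is as uncertain as if it were choosing uniformly among that many tokens at each step, which is in line with the limited training described in the limitations.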

# Step 9: Saving the trained model (+ config) to disk

In [None]:
trainer.save_model("/content/model/model/")

In [None]:
#Uncomment the following line to save the output for future use
#trainer.save_model("drive/MyDrive/files/model_C6/model/")

# Step 10: User Interface to Chat with the Generative AI Agent

In [None]:
# For a standalone run: install the transformers library
#!pip install Transformers

In [None]:
# 1.A For a standalone run: mount Google Drive and set the path to the pretrained model
'''
from google.colab import drive
drive.mount('/content/drive')
model_path="drive/MyDrive/files/model_C6/model/"
'''

In [None]:
# 1.A For a run during the training session of this notebook,
# Load the trained model: model path
# local model path(comment for a standalone run):
model_path="/content/model/model/"

In [None]:
# 1.B Load the trained model and tokenizer : model and tokenizer
from transformers import RobertaConfig, RobertaForCausalLM
from transformers import RobertaTokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForCausalLM.from_pretrained(model_path)

## Running samples


In [None]:
# 2. Tokenize an input prompt
prompt = "I would like to know why they moved us"
inputs = tokenizer(prompt, return_tensors="pt", max_length=50, truncation=True)

# 3. Generate a response from the model (do_sample=True is needed for temperature to take effect)
output = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.9, num_return_sequences=1)

# 4. Decode the generated output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

print(generated_text)

Example:  
Input:  
I would like to assist  
Output:  
I would like to assist you please give us your full name address and phone number so we can look into this for you
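
When sampling is enabled, the `temperature` argument rescales the model's scores before the next token is drawn. The following is a minimal sketch of that idea with made-up scores for three tokens. The function names are chosen for this example and do not reproduce the library's internal code.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.1]                     # made-up scores for three tokens
print(softmax_with_temperature(toy_logits, temperature=0.9))
print(sample_token(toy_logits, temperature=0.9, rng=random.Random(0)))
```

A temperature below 1 shifts probability toward the highest-scoring token, while a temperature above 1 flattens the distribution and makes the output more varied.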

## Interface

In [None]:
!pip install ipywidgets

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
from transformers import RobertaTokenizer, RobertaForCausalLM

# Define the function to generate response
def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt", max_length=50, truncation=True)
    # do_sample=True is needed for temperature to take effect
    output = model.generate(**inputs, max_length=200, do_sample=True, temperature=0.9, num_return_sequences=1)
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text

# Create widgets
text_input = widgets.Textarea(
    description='Prompt:',
    placeholder='Enter your prompt here...'
)

button = widgets.Button(
    description='Generate',
    button_style='success'
)

output_text = widgets.Output(layout={'border': '1px solid black', 'height': '100px'})

# Define button click event handler
def on_button_clicked(b):
    with output_text:
        clear_output()
        response = generate_response(text_input.value)
        print(response)

button.on_click(on_button_clicked)

# Display widgets
display(text_input, button, output_text)