**Generative Text for Customer Support Automation**

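Project Overview: Develop an AI-powered system that automates customer support interactions with a generative language model. This notebook fine-tunes GPT-2 on a small set of sample queries; larger models such as GPT-3.5 are not used here.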

In [5]:
%pip install transformers torch pandas nltk


Note: you may need to restart the kernel to use updated packages.


In [6]:
import pandas as pd

# Create a sample customer support data
data = {
    'text': [
        "How can I reset my password?",
        "What are your opening hours?",
        "Can I change my subscription plan?",
        "How do I update my billing information?",
        "I have a problem with my order. Can you help?",
        "Where can I find the user manual?",
        "My account is locked. What should I do?",
        "How do I contact customer support?",
        "What is the refund policy?",
        "Can I track my shipment online?",
        "I need help with the installation process.",
        "How can I cancel my order?",
        "Is there a discount for bulk purchases?",
        "Can I change the delivery address?",
        "How do I use the promotional code?",
        "The product I received is damaged. What now?",
        "How long does the warranty last?",
        "Can I return a product without the receipt?",
        "How do I know if my payment was successful?",
        "What payment methods do you accept?"
    ]
}

# Convert to DataFrame
df = pd.DataFrame(data)
# Save to CSV
file_path = "customer_support_data.csv"
df.to_csv(file_path, index=False)


In [7]:
import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download NLTK resources
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\toshi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\toshi\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\toshi\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [8]:
# Build the stopword set and lemmatizer once instead of on every call
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

# Data cleaning and preprocessing function
def preprocess_text(text):
    # Convert text to lowercase
    text = text.lower()
    # Remove everything except lowercase letters and whitespace
    text = re.sub(r'[^a-z\s]', '', text)
    # Tokenize text
    words = word_tokenize(text)
    # Remove stopwords
    words = [word for word in words if word not in stop_words]
    # Lemmatize words
    words = [lemmatizer.lemmatize(word) for word in words]
    # Join words back into a single string
    return ' '.join(words)

# Apply preprocessing to the text data.
# Note: this overwrites df['text'] in place; the raw text is still available
# in customer_support_data.csv.
df['text'] = df['text'].apply(preprocess_text)

# Save to CSV
file_path = "cleaned_customer_support_data.csv"
df.to_csv(file_path, index=False)
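
The preprocessing above removes stop words and punctuation, which also removes the function words a generative model needs in order to produce fluent sentences; the training cell further down reads the raw CSV rather than the cleaned one. If some cleanup is still wanted before fine-tuning, a lighter pass that keeps every word may be a reasonable middle ground. The sketch below is only an illustration, and `light_clean` is a hypothetical helper that does not appear elsewhere in this notebook.

```python
import re
import unicodedata

# Map curly quotes to their straight equivalents
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})

def light_clean(text: str) -> str:
    """Normalize a query without removing words or punctuation (hypothetical helper)."""
    # Fold compatibility characters (for example full-width letters) to canonical forms
    text = unicodedata.normalize("NFKC", text)
    # Replace curly quotes with straight ones
    text = text.translate(QUOTE_MAP)
    # Collapse runs of whitespace, including newlines, into single spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip()

print(light_clean("  How can I   reset my password?\n"))
```

Unlike `preprocess_text`, this keeps casing, punctuation, and stop words, so the output is still a complete sentence.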
     


In [11]:
import pandas as pd
from transformers import GPT2Tokenizer, GPT2LMHeadModel, Trainer, TrainingArguments
from transformers import TextDataset, DataCollatorForLanguageModeling

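# Load the raw customer support data. The cleaned CSV written by the
# preprocessing cell is not used here, so GPT-2 trains on complete sentences.
data = pd.read_csv('customer_support_data.csv')

# Write one query per line to a plain-text training file
with open("train.txt", "w", encoding="utf-8") as f:
    for line in data['text']:
        f.write(line + "\n")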

In [12]:
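# Function to load dataset
# Note: TextDataset is deprecated in recent transformers releases (it emits a
# FutureWarning pointing to the datasets library) but still works here.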
def load_dataset(file_path, tokenizer, block_size=128):
    dataset = TextDataset(
        tokenizer=tokenizer,
        file_path=file_path,
        block_size=block_size
    )
    return dataset
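
To make the effect of `block_size` concrete: as far as I understand it, `TextDataset` tokenizes the whole file into one long sequence of ids and then cuts that sequence into consecutive blocks of `block_size` tokens, dropping the incomplete block at the end. The sketch below reproduces only that chunking step on plain integers; `split_into_blocks` is a hypothetical name, not part of transformers.

```python
def split_into_blocks(token_ids, block_size=128):
    """Cut a flat list of token ids into full blocks, dropping the short tail."""
    return [
        token_ids[i:i + block_size]
        for i in range(0, len(token_ids) - block_size + 1, block_size)
    ]

# 300 stand-in token ids give two full blocks; the last 44 ids are discarded
toy_ids = list(range(300))
blocks = split_into_blocks(toy_ids, block_size=128)
print(len(blocks), [len(b) for b in blocks])
```

Under this scheme a file with fewer than `block_size` tokens yields no training examples at all, and the 20 short queries used here would yield very few, which would be consistent with the single training step recorded later in the notebook.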
     


In [13]:
# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

In [15]:
# Prepare the dataset and data collator
train_dataset = load_dataset("train.txt", tokenizer)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)



In [16]:
!pip install accelerate -U



In [17]:
!pip uninstall -y accelerate transformers
!pip install accelerate transformers[torch] --quiet

Found existing installation: accelerate 0.32.1
Uninstalling accelerate-0.32.1:
  Successfully uninstalled accelerate-0.32.1
Found existing installation: transformers 4.42.4
Uninstalling transformers-4.42.4:
  Successfully uninstalled transformers-4.42.4


In [18]:
import accelerate
import transformers
import torch

print("Accelerate version:", accelerate.__version__)
print("Transformers version:", transformers.__version__)
print("Torch version:", torch.__version__)

Accelerate version: 0.32.1
Transformers version: 4.42.4
Torch version: 2.3.1+cpu


In [19]:
from packaging import version
import accelerate

required_version = "0.21.0"
installed_version = accelerate.__version__

if version.parse(installed_version) >= version.parse(required_version):
    print(f"Accelerate version {installed_version} is compatible.")
else:
    raise ImportError(
        f"Accelerate version {installed_version} is not compatible. "
        f"Please install accelerate>={required_version}."
    )

Accelerate version 0.32.1 is compatible.


In [20]:
!pip install transformers torch --quiet


In [21]:
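# Tokenize the dataset (this helper is not called elsewhere in this notebook).
# GPT-2 ships without a padding token, so padding="max_length" fails unless
# one is set; reusing the end-of-text token is a common workaround.
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)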
     


In [22]:
!pip install pyarrow==10.0.1 datasets==2.11.0 --quiet

In [23]:
!pip uninstall -y pyarrow datasets cudf-cu12

Found existing installation: pyarrow 10.0.1
Uninstalling pyarrow-10.0.1:
  Successfully uninstalled pyarrow-10.0.1
Found existing installation: datasets 2.11.0
Uninstalling datasets-2.11.0:
  Successfully uninstalled datasets-2.11.0




In [27]:
%pip install --upgrade pip



Collecting pip
  Using cached pip-24.1.2-py3-none-any.whl.metadata (3.6 kB)
Using cached pip-24.1.2-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.3.1
    Uninstalling pip-23.3.1:
      Successfully uninstalled pip-23.3.1
Successfully installed pip-24.1.2
Note: you may need to restart the kernel to use updated packages.


In [28]:

!pip install pyarrow==14.0.1 datasets --quiet


In [29]:
# Initialize Accelerator
from accelerate import Accelerator
accelerator = Accelerator()

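# Optimizer for the GPT-2 model loaded above
from torch.optim import AdamW
learning_rate = 1e-5  # Set your desired learning rate here
optimizer = AdamW(model.parameters(), lr=learning_rate)

# Prepare the model and optimizer for the available device(s).
# Note: train_dataset is a Dataset, not a DataLoader, so what comes back as
# train_dataloader is still the dataset; iterating over it yields one
# unbatched 1-D tensor per step (see the batch check in the next cell).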
model, optimizer, train_dataloader = accelerator.prepare(
    model,
    optimizer,
    train_dataset
)
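
The cell above passes `train_dataset` to `accelerator.prepare()` directly, and the batch check that follows reports a bare tensor rather than a dictionary. A more conventional setup would wrap the dataset in a `torch.utils.data.DataLoader` with a collate function first, so that each step receives a batched dictionary. The sketch below shows that shape on toy data only; `ToyBlocks` and `collate_causal_lm` are hypothetical stand-ins for `train_dataset` and `data_collator`, and no model or tokenizer is involved.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyBlocks(Dataset):
    """Stand-in for train_dataset: each item is a 1-D tensor of token ids."""

    def __init__(self, num_examples=5, block_size=8):
        self.examples = [torch.arange(block_size) + i for i in range(num_examples)]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]

def collate_causal_lm(examples):
    """Stack equal-length examples and copy the inputs as labels (hypothetical collator)."""
    input_ids = torch.stack(examples)
    return {
        "input_ids": input_ids,
        "attention_mask": torch.ones_like(input_ids),
        "labels": input_ids.clone(),
    }

loader = DataLoader(ToyBlocks(), batch_size=2, shuffle=False, collate_fn=collate_causal_lm)
first_batch = next(iter(loader))
print({key: tuple(value.shape) for key, value in first_batch.items()})
```

With a loader like this, the training loop could call `model(**batch)` instead of building the attention mask by hand. Adopting it in this notebook would change the recorded outputs, so it is shown here only as an alternative.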

In [31]:
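# Inspect the first batch before training
num_train_epochs = 1  # defined here so this cell runs on its own
for epoch in range(num_train_epochs):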
    model.train()
    for step, batch in enumerate(train_dataloader):
        # Check if 'batch' is a dictionary before proceeding
        if isinstance(batch, dict):
            print(f"Step {step} batch keys: {batch.keys()}")  # Print keys of the batch dictionary
            for key in batch:
                # Check if the item is a tensor before printing its shape
                if isinstance(batch[key], torch.Tensor):
                    print(f"{key} shape: {batch[key].shape}")
                else:
                    print(f"{key} is not a tensor")
        else:
            print(f"Step {step}: Batch is not a dictionary, it's a {type(batch)}")
        break

Step 0: Batch is not a dictionary, it's a <class 'torch.Tensor'>


In [32]:
import torch

# Training loop
num_train_epochs = 1
for epoch in range(num_train_epochs):
    model.train()
    for step, batch in enumerate(train_dataloader):
        # Print batch shape
        print(f"Step {step} batch shape: {batch.shape}")

        # Ensure tensors have the correct dimensions
        if batch.dim() == 1:
            batch = batch.unsqueeze(0)

        # Move tensor to the appropriate device
        batch = batch.to(accelerator.device)

        # Each item is a bare tensor of token ids with no padding,
        # so the attention mask is all ones
        attention_mask = torch.ones_like(batch)

        # Forward pass; passing labels makes the model compute the loss itself
        outputs = model(input_ids=batch, attention_mask=attention_mask, labels=batch)
        if outputs.loss is not None:
            loss = outputs.loss
        else:
            # Manual fallback: position t predicts token t+1, so shift by one
            loss_fn = torch.nn.CrossEntropyLoss()
            shift_logits = outputs.logits[:, :-1, :]
            shift_labels = batch[:, 1:]
            loss = loss_fn(shift_logits.reshape(-1, shift_logits.size(-1)), shift_labels.reshape(-1))
        # Backward pass and optimization
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()

        if step % 100 == 0:
            print(f"Epoch {epoch}, Step {step}, Loss: {loss.item()}")



Step 0 batch shape: torch.Size([128])
Epoch 0, Step 0, Loss: 2.2592356204986572
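
The loss printed above is the causal language modeling loss: the logits at position t are scored against the token at position t+1, so logits and labels are shifted by one before cross-entropy is applied. The sketch below works through that shift on toy tensors with no model involved; `causal_lm_loss` is a hypothetical helper, and the uniform logits are chosen only because their expected loss, log(vocab_size), is easy to check by hand.

```python
import math

import torch
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids):
    """Next-token cross-entropy: logits at position t are scored against token t+1."""
    shift_logits = logits[:, :-1, :]   # drop the last position (nothing follows it)
    shift_labels = input_ids[:, 1:]    # drop the first token (nothing predicts it)
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

vocab_size = 10
input_ids = torch.randint(0, vocab_size, (2, 6))
# All-zero logits mean a uniform distribution over the vocabulary
uniform_logits = torch.zeros(2, 6, vocab_size)
loss = causal_lm_loss(uniform_logits, input_ids)
print(loss.item(), math.log(vocab_size))
```

A model that guesses uniformly scores log(vocab_size), which gives a rough reference point when reading a reported training loss.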


In [33]:
# Save the model
model.save_pretrained("./gpt2-customer-support")
tokenizer.save_pretrained("./gpt2-customer-support")

('./gpt2-customer-support\\tokenizer_config.json',
 './gpt2-customer-support\\special_tokens_map.json',
 './gpt2-customer-support\\vocab.json',
 './gpt2-customer-support\\merges.txt',
 './gpt2-customer-support\\added_tokens.json')

In [34]:
# Load the fine-tuned model and tokenizer
model = GPT2LMHeadModel.from_pretrained("./gpt2-customer-support")
tokenizer = GPT2Tokenizer.from_pretrained("./gpt2-customer-support")

In [35]:
# Function to generate responses
def generate_response(prompt):
    inputs = tokenizer.encode(prompt, return_tensors='pt')
    outputs = model.generate(inputs, max_length=150, num_return_sequences=1)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response
     

In [38]:
def generate_response(prompt):
    # Tokenize the input prompt
    inputs = tokenizer(prompt, return_tensors='pt')

    # Reuse the attention mask returned by the tokenizer
    attention_mask = inputs['attention_mask']

    # Generate response
    outputs = model.generate(
        input_ids=inputs['input_ids'],
        attention_mask=attention_mask,
        max_length=250,  # Upper bound on total length in tokens, prompt included
        min_length=50,   # Lower bound on total length in tokens, prompt included
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse end-of-text
        no_repeat_ngram_size=3,  # Prevents repetition of 3-grams
        num_beams=5,  # Beam search with 5 beams
        # The three sampling settings below have no effect unless do_sample=True
        # is also passed; as written, decoding is plain beam search.
        temperature=0.7,
        top_k=50,
        top_p=0.9
    )

    # Decode the output
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Append hard-coded steps for password queries (not generated by the model)
    if "change my password" in prompt.lower() or "reset my password" in prompt.lower():
        response += "\n\nTo change your password, follow these steps:\n"
        response += "1. Log in to your account.\n"
        response += "2. Go to 'Account Settings' or 'Profile'.\n"
        response += "3. Click on 'Security' or 'Password Management'.\n"
        response += "4. Enter your current password.\n"
        response += "5. Enter your new password and confirm it.\n"
        response += "6. Save the changes.\n"
        response += "If you encounter any issues, please contact support for further assistance."

    return response

# Example usage
customer_query = "How can I reset my password?"
response = generate_response(customer_query)
print("Response:", response)


Response: How can I reset my password?

You can reset your password at any time by going to Settings > Security > Reset Password.

How do I change my password after I log in?


To change your password after you log in, follow these steps:

1. Go to your account settings page and click on "Change Password".

2. Click on the "Change password" button.


3. Enter your password and click "OK".


4. You will be prompted to enter your new password. Click "OK" to continue.

To change your password, follow these steps:
1. Log in to your account.
2. Go to 'Account Settings' or 'Profile'.
3. Click on 'Security' or 'Password Management'.
4. Enter your current password.
5. Enter your new password and confirm it.
6. Save the changes.
If you encounter any issues, please contact support for further assistance.
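
The response above begins with the customer's own question because a decoder-only model returns the prompt followed by its continuation. If only the continuation should be shown to the customer, the echoed prompt can be removed after decoding. The sketch below does this at the string level; `strip_prompt` is a hypothetical helper, and the sample text is a shortened copy of the output above.

```python
def strip_prompt(response: str, prompt: str) -> str:
    """Remove the echoed prompt from the start of a decoded response, if present."""
    if response.startswith(prompt):
        return response[len(prompt):].lstrip()
    # Leave the text untouched when the prompt is not echoed verbatim
    return response

echoed = "How can I reset my password?\n\nYou can reset your password at any time."
print(strip_prompt(echoed, "How can I reset my password?"))
```

An alternative that avoids string matching is to slice the generated token ids past the prompt length before decoding.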


In [40]:
def generate_response(prompt):
    # Tokenize the input prompt
    inputs = tokenizer(prompt, return_tensors='pt')

    # Reuse the attention mask returned by the tokenizer
    attention_mask = inputs['attention_mask']

    # Generate response
    outputs = model.generate(
        input_ids=inputs['input_ids'],
        attention_mask=attention_mask,
        max_length=250,  # Upper bound on total length in tokens, prompt included
        min_length=50,   # Lower bound on total length in tokens, prompt included
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse end-of-text
        no_repeat_ngram_size=3,  # Prevents repetition of 3-grams
        num_beams=5,  # Beam search with 5 beams
        # The three sampling settings below have no effect unless do_sample=True
        # is also passed; as written, decoding is plain beam search.
        temperature=0.7,
        top_k=50,
        top_p=0.9
    )

    # Decode the output
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# Example usage
customer_query = "What is the refund policy?"
response = generate_response(customer_query)
print("Response:", response)

Response: What is the refund policy?

The refund policy allows you to cancel your account at any time without having to pay any additional fees.

What happens if I don't have my refund card?


If you don't receive your refund card within 30 days of receiving your refund, you will not be able to use it again until the refund is paid.


How do I cancel my refund?




You can cancel your refund by contacting us at:

https://www.facebook.com/events/104845678973944/

or by calling us at 1-800-845-9000

Please note that you will need to provide your name, address, phone number, email address, and any other information you may need to complete the form. If you have any questions, please do not hesitate to contact us. Thank you for your understanding.
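
The sample queries used for training contain no URLs or phone numbers, so the link and the phone number in the response above did not come from the training data and should not be shown to customers as real contact details. One possible guard is to scan each generated response for such details and hold it back for review when any are found. The sketch below is a minimal version of that idea; the patterns and the name `find_unverified_contacts` are assumptions for illustration and will not catch every format.

```python
import re

# Deliberately simple patterns; real contact formats vary more than this
URL_PATTERN = re.compile(r"https?://\S+")
PHONE_PATTERN = re.compile(r"\b(?:\d[\s-]?){7,}\d\b")

def find_unverified_contacts(response: str) -> dict:
    """Return URLs and phone-like numbers found in a generated response (hypothetical helper)."""
    urls = URL_PATTERN.findall(response)
    # Blank out URLs first so digits inside them are not reported as phone numbers
    without_urls = URL_PATTERN.sub(" ", response)
    phones = [match.group() for match in PHONE_PATTERN.finditer(without_urls)]
    return {"urls": urls, "phones": phones}

sample = (
    "You can cancel your refund by contacting us at: "
    "https://www.facebook.com/events/104845678973944/ "
    "or by calling us at 1-800-845-9000"
)
flags = find_unverified_contacts(sample)
print(flags)
```

A response with a non-empty `urls` or `phones` list could be routed to a human agent or replaced with a vetted template, in the same spirit as the hard-coded password steps used earlier.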
