# Developing an AI-Powered SOP Assistance Chatbot Using BERT
**Introduction:**
Standard Operating Procedures (SOPs) and Personal Protective Equipment (PPE) are essential in ensuring safety and compliance in industrial tasks. This notebook outlines the process of creating an AI-powered chatbot that assists employees by providing relevant SOPs and PPE recommendations based on task descriptions. Leveraging BERT, a state-of-the-art NLP model, the chatbot classifies tasks and delivers precise instructions.

In [None]:
import pandas as pd

# Load the task dataset (one row per task, with its SOPs and PPEs)
df = pd.read_csv('tasks_sop_ppe.csv')

# Display the first few rows of the dataset
df.head()


| Unnamed: 0 | Task ID | Task Name | SOPs | PPEs |
|---|---|---|---|---|
| 0 | 1 | Task 1: Operation Mechanical Assembly | SOP 171: Detailed Procedure on Task, SOP 315: ... | Goggles, Respirator, Safety Boots, Ear Plugs, ... |
| 1 | 2 | Task 2: Operation Welding | SOP 553: Detailed Procedure on Task, SOP 717: ... | Insulated Gloves, Safety Boots, Goggles |
| 2 | 3 | Task 3: Operation Painting | SOP 896: Detailed Procedure on Task, SOP 866: ... | Gloves, Apron, Safety Boots, Face Shield |
| 3 | 4 | Task 4: Operation Painting | SOP 415: Detailed Procedure on Task, SOP 510: ... | Apron, Safety Boots, Gloves |
| 4 | 5 | Task 5: Operation Painting | SOP 668: Detailed Procedure on Task, SOP 920: ... | Insulated Gloves, Helmet, Respirator, Apron, F... |


![picture](https://i.pinimg.com/736x/14/2a/d9/142ad96e2083d9b4a6ca25a60641f537.jpg)

# Step 2: Preparing Data for BERT Model Training
**Explanation:** We load the dataset, split it into training and test sets, tokenize the task names into the tensor format BERT expects, and wrap each split in a custom PyTorch `Dataset` for training.

**Technical Insight:** BERT (Bidirectional Encoder Representations from Transformers) operates on token IDs rather than raw text. Each task description must therefore be tokenized, padded or truncated to a common length within a batch, and paired with an attention mask that tells the model which positions hold real tokens.

**Business Impact:** Preparing data for a machine learning model like BERT ensures that the chatbot will be capable of understanding and processing a wide range of task descriptions, leading to more accurate and reliable recommendations.
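To make the encoding step concrete, here is a minimal, library-free sketch of what padding and truncation produce for a batch: equal-length `input_ids` plus an `attention_mask` that marks the real tokens. It is a toy illustration only. It splits on whitespace and builds its own vocabulary, whereas BERT's tokenizer uses WordPiece subwords and a fixed vocabulary. The function name `toy_encode`, the `max_length` value, and the ID assignments are assumptions made for this example.

```python
def toy_encode(texts, max_length=6):
    """Toy stand-in for a tokenizer call with padding and truncation.

    Hypothetical helper: whitespace tokens and an ad hoc vocabulary,
    not BERT's WordPiece vocabulary.
    """
    PAD, CLS, SEP = 0, 1, 2
    vocab = {}
    batch = []
    for text in texts:
        ids = []
        for word in text.lower().split():
            # Assign the next free ID (starting at 3) to unseen words
            ids.append(vocab.setdefault(word, len(vocab) + 3))
        # Truncate so that [CLS] + tokens + [SEP] fits in max_length
        ids = [CLS] + ids[: max_length - 2] + [SEP]
        batch.append(ids)

    # Pad every sequence to the longest one in the batch
    longest = max(len(ids) for ids in batch)
    input_ids = [ids + [PAD] * (longest - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


encoded = toy_encode(["Operation Welding", "Operation Mechanical Assembly"])
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

The shorter sequence is padded with zeros, and its attention mask is zero at exactly those positions, so the model can ignore them.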

![picture](https://i.pinimg.com/736x/a0/a7/b7/a0a7b70d624fe0361c10de4b98292af8.jpg)

# Step 3: Fine-Tuning the BERT Model

**Explanation:** Here, we fine-tune the pre-trained BERT model on our task dataset. A newly initialized classification head is trained, together with the pre-trained weights, to predict a task label from the task name over two epochs.


**Technical Insight:** Fine-tuning a pre-trained model like BERT on a specific dataset lets it adapt to the vocabulary and phrasing of the task at hand, which can yield an accurate classifier when each class has enough labelled examples.


**Business Impact:** A well-tuned model lets the chatbot match tasks to the right SOPs and PPEs, supporting workplace safety and efficiency through precise guidance.
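One choice that strongly affects fine-tuning is what counts as a class. If every row has a unique `Task ID`, as the sample rows suggest, encoding `Task ID` gives each row its own label, so the classifier sees any given class at most once. A possible alternative, sketched here under the assumption that every task name follows the `Task N: Operation <type>` pattern visible in the sample rows, is to use the operation type as the label. The helper name `operation_type` and the sample list are hypothetical.

```python
import re
from collections import Counter

# Assumed pattern, based only on the sample rows shown earlier
TASK_NAME_PATTERN = re.compile(r"^Task\s+\d+:\s+Operation\s+(?P<operation>.+)$")


def operation_type(task_name):
    """Return the operation type from a task name, or None if it does not match."""
    match = TASK_NAME_PATTERN.match(task_name.strip())
    return match.group("operation") if match else None


sample_names = [
    "Task 1: Operation Mechanical Assembly",
    "Task 2: Operation Welding",
    "Task 3: Operation Painting",
    "Task 4: Operation Painting",
    "Task 5: Operation Painting",
]

labels = [operation_type(name) for name in sample_names]
# Counting examples per class shows how many rows share each label
print(Counter(labels))
```

With the dataframe from this notebook, the same function could be applied as `df['Task Name'].map(operation_type)` before label encoding. The chatbot lookup would then need to return all tasks of the predicted type rather than a single row.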

![picture](https://i.pinimg.com/736x/6b/5b/ed/6b5bed998c7340caa5e0ed6aeb80477a.jpg)

In [None]:
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load dataset and prepare labels
df = pd.read_csv('tasks_sop_ppe.csv')

# Encode labels (each distinct Task ID becomes its own class)
label_encoder = LabelEncoder()
df['label'] = label_encoder.fit_transform(df['Task ID'])

# Prepare data for model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

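def encode_data(texts, labels):
    # Tokenize the texts into padded/truncated PyTorch tensors and convert the labels to a tensor
    encodings = tokenizer(texts.tolist(), padding=True, truncation=True, return_tensors="pt")
    return encodings, torch.tensor(labels.tolist())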

texts = df['Task Name']
labels = df['label']

# Split data into train and test sets
texts_train, texts_test, labels_train, labels_test = train_test_split(texts, labels, test_size=0.2, random_state=42)
train_encodings, train_labels = encode_data(texts_train, labels_train)
test_encodings, test_labels = encode_data(texts_test, labels_test)

class SOPDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

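    def __getitem__(self, idx):
        # Encodings and labels are already tensors, so index them directly
        # instead of re-wrapping them with torch.tensor (which triggers a UserWarning)
        item = {key: val[idx] for key, val in self.encodings.items()}
        item['labels'] = self.labels[idx]
        return item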

    def __len__(self):
        return len(self.labels)

# Create dataset
train_dataset = SOPDataset(train_encodings, train_labels)
test_dataset = SOPDataset(test_encodings, test_labels)

# Load pre-trained BERT model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=len(df['label'].unique()))

# Set up Trainer
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
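    # Note: 500 warmup steps exceed the 40 total steps of this run (see the output below),
    # so the learning rate never reaches its configured peak
    warmup_steps=500,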
    weight_decay=0.01,
    logging_dir='./logs',
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Fine-tune the model
trainer.train()



Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


TrainOutput(global_step=40, training_loss=5.360018920898438, metrics={'train_runtime': 103.0441, 'train_samples_per_second': 3.105, 'train_steps_per_second': 0.388, 'total_flos': 1317894021120.0, 'train_loss': 5.360018920898438, 'epoch': 2.0})
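The numbers in `TrainOutput` can be sanity-checked with a little arithmetic. The sketch below assumes roughly 160 training rows (inferred from 40 steps over 2 epochs at batch size 8, not stated in the notebook), one class per row of an assumed 200-row dataset, and the `Trainer` defaults of a linear warmup schedule and a peak learning rate of 5e-5. It computes the step count, the learning rate reached by the final step, and the cross-entropy loss of a uniform guess.

```python
import math

# Assumptions inferred from the run above, not stated in the notebook
n_train = 160        # 40 steps / 2 epochs * batch size 8
n_classes = 200      # one class per Task ID if the dataset has about 200 rows
batch_size = 8
epochs = 2
warmup_steps = 500
peak_lr = 5e-5       # TrainingArguments default learning rate

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs

# Linear warmup: the learning rate grows proportionally to the step count
lr_at_end = peak_lr * min(1.0, total_steps / warmup_steps)

# Cross-entropy of a uniform guess over n_classes labels
chance_loss = math.log(n_classes)

print(f"total steps: {total_steps}")
print(f"learning rate at final step: {lr_at_end:.1e}")
print(f"chance-level loss: {chance_loss:.2f}")
```

Under these assumptions the run ends at 8% of the peak learning rate, and the chance-level loss of about 5.30 is close to the reported training loss of 5.36. This suggests the model has learned little beyond guessing, so more training steps, a shorter warmup, or coarser labels may be worth trying.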

# Step 4: Implementing the Chatbot with NLP Integration

![picture](https://i.pinimg.com/736x/3b/c2/f0/3bc2f0dc39af614c49814a6f0ac134ab.jpg)



**Explanation:** We implement a chatbot that utilizes the fine-tuned BERT model to classify tasks based on user input. The chatbot then retrieves and displays the relevant SOPs and PPEs, providing employees with the information they need to perform tasks safely.

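**Technical Insight:** The chatbot tokenizes the user's text, runs it through the fine-tuned BERT classifier, and takes the highest-scoring class as the predicted task. That prediction is then used to look up the matching row in the dataset, which automates the retrieval of task-specific safety information.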

**Business Impact:** By integrating AI with everyday workplace tools, the chatbot enhances operational safety and efficiency, ensuring that employees have easy access to critical safety information.
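Taking the highest-scoring class always returns some task, so a classifier-backed chatbot will answer even when the model is unsure. One common safeguard, sketched here in plain Python, is to convert the logits to probabilities with softmax and decline to answer below a confidence threshold. The function names and the `0.5` threshold are illustrative assumptions, not part of the notebook. A real threshold would need to be tuned on held-out data.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_threshold(logits, threshold=0.5):
    """Return (class_index, confidence), or (None, confidence) if unsure.

    The threshold value is a hypothetical default for illustration.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence < threshold:
        return None, confidence
    return best, confidence


confident = predict_with_threshold([0.1, 4.0, 0.2])
unsure = predict_with_threshold([0.1, 0.2, 0.15])
print(confident)
print(unsure)
```

When the first element of the result is `None`, the chatbot could ask the user to rephrase or refer them to a supervisor instead of showing SOPs and PPEs for a low-confidence guess.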


# Step 5: Running and Interacting with the Chatbot

Deploying the chatbot in the workplace enhances safety compliance and streamlines operations, providing employees with instant access to crucial information.

![picture](https://i.pinimg.com/736x/75/e9/f7/75e9f7256e35da9be424fe9a4b6a6137.jpg)

In [None]:
# Function to classify task based on input text
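def classify_task(text):
    model.eval()  # disable dropout for inference
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    # Move inputs to the model's device (the Trainer may have placed the model on a GPU)
    inputs = {key: val.to(model.device) for key, val in inputs.items()}
    with torch.no_grad():  # no gradients are needed for prediction
        logits = model(**inputs).logits
    predicted_label = torch.argmax(logits, dim=1).item()
    # Map the encoded label back to the original Task ID
    return label_encoder.inverse_transform([predicted_label])[0]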

# Chatbot function with ML integration
def chatbot():
    print("Welcome to the AI-Powered SOP Assistance Chatbot with ML!")
    while True:
        user_input = input("\nEnter the task description or 'exit' to quit: ")
        if user_input.strip().lower() == 'exit':
            print("Goodbye! Stay safe!")
            break

        # Classify the task using the NLP model
        predicted_task_id = classify_task(user_input)
        task_data = df[df['Task ID'] == predicted_task_id]

        if not task_data.empty:
            task_info = task_data.iloc[0]
            sops = task_info['SOPs'].split(", ")
            ppes = task_info['PPEs'].split(", ")
            task_name = task_info['Task Name']

            print(f"\nFor the task '{task_name}', here are the relevant SOPs and PPEs:")
            print("SOPs:")
            for sop in sops:
                print(f"- {sop}")
            print("\nPPEs:")
            for ppe in ppes:
                print(f"- {ppe}")

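            # Offer additional help (the steps printed below are generic, hard-coded
            # placeholders, not instructions read from the dataset)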
            follow_up = input("\nDo you need step-by-step instructions for the SOP? (yes/no): ")
            if follow_up.lower() == 'yes':
                for sop in sops:
                    print(f"\nStep-by-step instructions for {sop}:")
                    print("Step 1: Ensure all tools are ready.")
                    print("Step 2: Follow safety protocols.")
                    print("Step 3: Complete the task as per the SOP.")
                    print("Step 4: Review your work for quality assurance.")
            else:
                print("Okay, let me know if you need anything else.")
        else:
            print(f"Sorry, no information found for the task '{user_input}'. Please try another task or check with your supervisor.")

# Run the chatbot
if __name__ == "__main__":
    chatbot()


Welcome to the AI-Powered SOP Assistance Chatbot with ML!

Enter the task description or 'exit' to quit: welding related

For the task 'Task 177: Operation Mechanical Assembly', here are the relevant SOPs and PPEs:
SOPs:
- SOP 964: Detailed Procedure on Task
- SOP 866: Detailed Procedure on Task
- SOP 230: Detailed Procedure on Task
- SOP 393: Detailed Procedure on Task
- SOP 581: Detailed Procedure on Task

PPEs:
- Insulated Gloves
- Ear Plugs
- Respirator

Do you need step-by-step instructions for the SOP? (yes/no): yes

Step-by-step instructions for SOP 964: Detailed Procedure on Task:
Step 1: Ensure all tools are ready.
Step 2: Follow safety protocols.
Step 3: Complete the task as per the SOP.
Step 4: Review your work for quality assurance.

Step-by-step instructions for SOP 866: Detailed Procedure on Task:
Step 1: Ensure all tools are ready.
Step 2: Follow safety protocols.
Step 3: Complete the task as per the SOP.
Step 4: Review your work for quality assurance.

Step-by-step inst