# Training an LLM for Hotel Customer Care

Hotel HuTao offers an online service where customers can interact with customer care representatives to resolve their doubts and queries. Maintaining a 24/7 live chat-based service with human representatives is costly and resource-intensive. To address this, we will train a large language model (LLM) using previously collected conversations to handle customer queries more efficiently.

In this notebook, we will utilize the dataset `Octopussss/HuTao_CustomerSupport_Hotel` from Hugging Face. This dataset contains historical conversations between customers and Hotel HuTao's customer care representatives, which will be used to train our model.

Let's begin by exploring the dataset and preparing it for training.

Sample dataset format:
```json
[
    {
        "query": "<customer query>",
        "response": "<representative response>"
    },
    {
        ....
    }
]
```
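
Before uploading a seed file, it can help to check that every record matches the format above. The following is a minimal sketch, not part of the Leeroo client: `validate_seed_records` is a hypothetical helper name, and the rule that both fields must be non-empty strings is an assumption based on the sample format.

```python
import json


def validate_seed_records(records):
    """Return a list of human-readable problems found in the seed records.

    Hypothetical helper: assumes each record must be an object with
    non-empty string values for "query" and "response".
    """
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not an object")
            continue
        for key in ("query", "response"):
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty '{key}'")
    return problems


# Illustrative input: the second record is deliberately incomplete.
sample = json.loads(
    '[{"query": "Is parking available?", "response": "Yes."}, {"query": ""}]'
)
print(validate_seed_records(sample))
```

An empty list means the records passed the checks; the same function can be run on the contents of `hotel_seed_data.json` once it has been written.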

In [1]:
# prepare seed examples in required format
import json
import datasets
from pprint import pprint

data = datasets.load_dataset("Octopussss/HuTao_CustomerSupport_Hotel")
n_seed_samples = 10
data = [
    dict(query=d['Questions'], response=d['Response']) 
        for d in data['train'].to_list()[0:n_seed_samples]
]
# write the seed examples to disk; the context manager closes the file
with open('hotel_seed_data.json', 'w') as f:
    json.dump(data, f)
pprint(data[-1])

{'query': 'Is parking available at the hotel?',
 'response': 'Yoh, now why might you be looking for parking, hm? Oh, you '
             "didn't know? Well, let me tell you, parking at the Continental "
             "Hotel is as smooth as a summer breeze! We've got a lovely "
             'parking lot right out front, with plenty of spots for our '
             "esteemed guests. And don't worry, it's all well-lit and secure, "
             'so you can rest easy knowing your vehicle is safe and sound. '
             'Now, would you like me to give you directions to the parking '
             'lot?'}


Create a client instance with your own Leeroo API key.

In [2]:
import os
import time
from leeroo_client.client import LeerooClient

leeroo_api_key = "your-api-key-here"
client = LeerooClient(
    leeroo_api_key,
)

User: leeroo-mistral Logged in!
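
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. One common alternative is to read it from an environment variable. This is a sketch under assumptions: the variable name `LEEROO_API_KEY` and the helper `load_api_key` are hypothetical and not something the Leeroo client defines.

```python
import os


def load_api_key(env_var="LEEROO_API_KEY"):
    """Read the API key from an environment variable.

    The variable name is an assumption chosen for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned value can then be passed in place of the literal string, for example `LeerooClient(load_api_key())`.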


To design the experiments, provide the following:

- `evaluation_description` (optional): A short natural-language summary of your application and the evaluation factors that matter most to you.  
- `workflow_name`: The name of this experiment. It is saved along with the workflow ID.  
- `seed_data_path` (optional): The path to your dataset for the desired application. The dataset should be a JSON file whose records have `query` and `response` fields.  
- `budget`: The budget for the workflow, in days (set to `2` in the example call).

In [3]:
evaluation_policy = \
"""
    Evaluation Policy for Customer Care Responses
    - Responses should capture the user's attention and encourage further interaction.
    - Aim for a friendly and approachable tone to foster a positive customer experience.
    - Avoid jargon and overly technical language unless necessary for the context.
    - Incorporate light-hearted humor where appropriate to create a pleasant interaction, while ensuring it remains suitable for all audiences.
    - Humor should enhance the conversation without detracting from the core message.
    - Personalize responses when possible to show genuine care for the customer’s needs.
"""

In [7]:
workflow_configs = client.initialize_workflow_configs(
    evaluation_description= evaluation_policy,
    workflow_name="TheChatGiggle",
    seed_data_path="hotel_seed_data.json",
    budget=2 # days
) 
workflow_configs

<Response [200]>


{'data_gen_config': {'task_description': "\n    Evaluation Policy for Customer Care Responses\n    - Responses should capture the user's attention and encourage further interaction.\n    - Aim for a friendly and approachable tone to foster a positive customer experience.\n    - Avoid jargon and overly technical language unless necessary for the context.\n    - Incorporate light-hearted humor where appropriate to create a pleasant interaction, while ensuring it remains suitable for all audiences.\n    - Humor should enhance the conversation without detracting from the core message.\n    - Personalize responses when possible to show genuine care for the customer’s needs.\n",
  'no_samples': 5,
  'seed_path': 'dager/backend/uploads/hotel_seed_data.json'},
 'experiment_config': {'0': {'model_args': {'model_id': 'mistralai/Mistral-7B-v0.1',
    'quantized': True},
   'training_methods_args': {'r': 32,
    'lora_alpha': 32,
    'lora_dropout': 0.05,
    'target_modules': ['q_proj', 'k_proj

🚀 Once you're happy with the hyper-parameters, you can submit the training workflow. It will **automatically execute experiments, evaluate them, and pick the best model** based on your customized evaluation system!

In [8]:
# Submit workflow for execution
running_workflow_status = client.submit_workflow(
    workflow_configs=workflow_configs
)
print(" Workflow running state:", running_workflow_status)

<Response [200]>
 Workflow running state: {'workflow_runnning_state_id': '1721076129'}


You can get the status of all your workflows by running the following command. The result contains two lists:

- `running_workflows`: the training workflows with `running` status.  
- `finished_workflows`: the workflows that have finished executing.

In [9]:
# Retrieve user's workflows
user_workflows = client.all_workflows()

print( f"Total finished workflows : {len(user_workflows['finished_workflows'])}")
print( f"Total running workflows : {len(user_workflows['running_workflows'])}")

user_workflows['running_workflows']

<Response [200]>
Total finished workflows : 4
Total running workflows : 1


[{'user_id': 'leeroo-mistral',
  'workflow_runnning_state_id': '1721076129',
  'workflow_name': 'TheChatGiggle',
  'workflow_start_timestamp': 1721076129.664791,
  'status': 'running'}]

If you need further details on the status of a specific workflow, you can run the following function. The response includes these fields:

- `status`: overall status of the workflow
- `workflow_node_status`: status of all nodes
- `workflow_name`: name of your workflow
- `workflow_runnning_state_id`: ID of your workflow (the key is spelled this way in the response)

In [24]:
# Check status of the running workflow
workflow_status = client.get_workflow_status('1721076129')
workflow_status

<Response [200]>


{'user_id': 'leeroo-mistral',
 'workflow_runnning_state_id': '1721076129',
 'workflow_name': 'TheChatGiggle',
 'workflow_start_timestamp': 1721076129.664791,
 'status': True,
 'workflow_node_status': {'DataGenConfig-172107612957887knqr': 'Executed',
  'DataPrepConfig-172107612957891nkbk': 'Executed',
  'SFTrainingConfig-172107612957902iqis': 'Executed',
  'EvalResponseGenConfig-172107612957904hhle': 'Executed',
  'EvalConfig-172107612957912ljmi': 'Executed',
  'PickBestConfig-172107612957893eeim': 'Executed'},
 'workflow_completed_timestamp': 1721079157.899672}
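
Rather than re-running the status cell by hand, the check can be wrapped in a small polling loop. The sketch below is generic and does not call the Leeroo client itself: `wait_until` and `all_nodes_executed` are hypothetical helpers, and treating a workflow as finished when every entry in `workflow_node_status` reads `'Executed'` is an assumption drawn from the example output above.

```python
import time


def wait_until(fetch_status, is_done, interval_s=60.0, timeout_s=3600.0,
               sleep=time.sleep):
    """Call fetch_status() until is_done(result) is true or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("workflow did not finish in time")
        sleep(interval_s)


def all_nodes_executed(status):
    """Assumed completion rule: every workflow node reports 'Executed'."""
    nodes = status.get("workflow_node_status", {})
    return bool(nodes) and all(state == "Executed" for state in nodes.values())
```

With the client from earlier, a call might look like `wait_until(lambda: client.get_workflow_status('1721076129'), all_nodes_executed)`. The `sleep` parameter exists so the loop can be exercised without real waiting.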

In [25]:
# Deploy the workflow
workflow_id = '1721076129'
deployment_status = client.deploy_workflow(
    workflow_id
)
print(deployment_status)

<Response [200]>
{'cluster_name': 'DeploymentState-1721079764.328797', 'status': 'Deployment started'}


In [27]:
client.get_workflow_deployment_status('DeploymentState-1721079764.328797')

<Response [200]>


{'cluster_name': 'DeploymentState-1721079764.328797',
 'ip': '54.227.170.247',
 'gradio-playground': 'http://54.227.170.247:8000',
 'api-access': 'http://54.227.170.247:9000',
 'status': 'Deployed'}
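
The deployment status already carries the base URLs, so the endpoint addresses can be derived from it instead of being typed by hand. A minimal sketch, assuming the response always has the keys shown above; `endpoints_from_deployment` is a hypothetical helper, and the `/v1/models` and `/v1/chat/completions` paths are the ones used in the cells of this notebook.

```python
def endpoints_from_deployment(status):
    """Build endpoint URLs from a deployment status dictionary.

    Assumes the keys 'status', 'api-access' and 'gradio-playground'
    are present, as in the example response.
    """
    if status.get("status") != "Deployed":
        raise RuntimeError(f"deployment not ready: {status.get('status')!r}")
    api_base = status["api-access"].rstrip("/")
    return {
        "models": f"{api_base}/v1/models",
        "chat": f"{api_base}/v1/chat/completions",
        "playground": status["gradio-playground"],
    }


# Example response copied from the output above.
deployment = {
    "cluster_name": "DeploymentState-1721079764.328797",
    "ip": "54.227.170.247",
    "gradio-playground": "http://54.227.170.247:8000",
    "api-access": "http://54.227.170.247:9000",
    "status": "Deployed",
}
endpoints = endpoints_from_deployment(deployment)
print(endpoints["chat"])
```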

In [45]:
# Get Model id
import requests
model_id = requests.get( "http://54.227.170.247:9000/v1/models").json()['data'][0]['id']
model_id

'checkpoint.hf'

In [48]:
# Inference
url = "http://54.227.170.247:9000/v1/chat/completions"
data = {
    "model": model_id,
    "messages": [{"role": "user", "content": "Is parking available at the hotel?"}],
    "max_tokens": 20,
    "temperature": 0.9
}
response = requests.post(url, json=data)
print(response.json()['choices'][0]['message'])

{'role': 'assistant', 'content': 'Yes, we offer parking at our hotel. However, we recommend that you check with the management regarding'}
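
The reply above stops mid-sentence, which is what a limit of `max_tokens: 20` would be expected to produce. The sketch below builds the same request body with a larger default limit and reports whether a reply was cut off. It rests on assumptions: `build_chat_payload` and `extract_reply` are hypothetical helpers, the default of 128 tokens is arbitrary, and the `finish_reason` field is assumed to follow the OpenAI-style response schema, which this server may or may not return.

```python
def build_chat_payload(model_id, user_message, max_tokens=128, temperature=0.9):
    """Build a single-turn chat request body in the format used above."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_reply(body):
    """Return (content, truncated) from a chat completion response body.

    'finish_reason' is assumed to be 'length' when the token limit was hit;
    if the field is absent, truncated is reported as False.
    """
    choice = body["choices"][0]
    content = choice["message"]["content"]
    truncated = choice.get("finish_reason") == "length"
    return content, truncated
```

The payload can be sent exactly as in the cell above, with `requests.post(url, json=build_chat_payload(model_id, "Is parking available at the hotel?"))`, and the parsed JSON passed to `extract_reply`.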


In [3]:
client.kill_deployment(
    'DeploymentState-1721079764.328797'
)

<Response [200]>


{'cluster_name': 'DeploymentState-1721079764.328797',
 'status': 'Deployment Killed'}