![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
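# Use watsonx and foundation models such as `ibm/granite-13b-instruct-v1` to analyze car rental customer satisfaction from text

#### Disclaimers

- Use only Projects and Spaces that are available in watsonx.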


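## Notebook content

This notebook contains the steps and code to demonstrate text sentiment analysis in watsonx. It covers data retrieval, model testing, and scoring.
Some familiarity with Python is helpful. This notebook uses Python 3.10.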


## Learning goal

The goal of this notebook is to demonstrate how to use the `ibm/granite-13b-instruct-v1` model to analyze customer satisfaction from text.


## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Data loading](#data)
- [Foundation models on watsonx](#models)
- [Model testing](#predict)
- [Score](#score)
- [Summary](#summary)

<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must complete the following setup task:

-  Associate a <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (information on how to create the instance is available <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-service-instance.html?context=analytics" target="_blank" rel="noopener no referrer">here</a>).


### Install and import `datasets` and dependencies

In [1]:
%pip install wget | tail -n 1
%pip install datasets | tail -n 1
%pip install scikit-learn | tail -n 1
%pip install "ibm-watson-machine-learning>=1.0.326" | tail -n 1

Successfully installed wget-3.2
Note: you may need to restart the kernel to use updated packages.
Successfully installed datasets-2.14.6 dill-0.3.7 multiprocess-0.70.15 xxhash-3.4.1
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Successfully installed ibm-cos-sdk-2.13.2 ibm-cos-sdk-core-2.13.2 ibm-cos-sdk-s3transfer-2.13.2 ibm-watson-machine-learning-1.0.327 jmespath-1.0.1 lomond-0.3.3 pandas-1.5.3
Note: you may need to restart the kernel to use updated packages.


### Define the WML credentials
This cell defines the WML credentials required for inference with watsonx foundation models.

**Action:** Provide the IBM Cloud user API key. For details, see the
[documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).

In [2]:
import getpass

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your WML api key (hit enter): ")
}

Please enter your WML api key (hit enter):  ········
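
When the notebook runs unattended, an interactive prompt is not always practical. The following is a minimal sketch of one alternative: read the key from an environment variable and fall back to `getpass` only when it is missing. The variable name `WML_APIKEY` and both helper functions are assumptions made for this illustration, not part of the WML SDK.

```python
import getpass
import os


def read_api_key(env_var: str = "WML_APIKEY") -> str:
    """Return the API key from the environment, or prompt for it if unset."""
    # WML_APIKEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        api_key = getpass.getpass("Please enter your WML api key (hit enter): ").strip()
    if not api_key:
        raise ValueError("An API key is required.")
    return api_key


def build_credentials(url: str, api_key: str) -> dict:
    """Assemble the credentials dictionary in the same shape as the cell above."""
    return {"url": url, "apikey": api_key}
```

Calling `build_credentials("https://us-south.ml.cloud.ibm.com", read_api_key())` would then produce a dictionary with the same two keys as `credentials` above.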


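### Define the project id
The foundation model requires a project id, which provides the context for the model call. The id is taken from the project in which this notebook runs. If it is not available there, provide the project id yourself.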

In [3]:
import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

Please enter your project_id (hit enter):  54428a2c-4b11-491e-a32b-12575fa676e2
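
A mistyped project id usually surfaces only later, when the model is called. As a hedged sketch, the check below assumes that project ids are hyphenated UUID strings, as the sample value above suggests. The helper `is_valid_project_id` is hypothetical and relies only on the standard-library `uuid` module.

```python
import uuid


def is_valid_project_id(value: str) -> bool:
    """Return True if value parses as a canonical hyphenated UUID string."""
    candidate = value.strip().lower()
    try:
        # Round-tripping rejects strings that parse but are not in canonical form.
        return str(uuid.UUID(candidate)) == candidate
    except ValueError:
        return False


print(is_valid_project_id("54428a2c-4b11-491e-a32b-12575fa676e2"))
print(is_valid_project_id("not-a-project-id"))
```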


<a id="data"></a>
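## Data loading

Download the `car_rental_training_data` dataset. The dataset provides insight into customers' feedback on car rental. It contains a label with two values: unsatisfied and satisfied.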

In [4]:
import wget
import pandas as pd

filename = 'car_rental_training_data.csv'
url = 'https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/data/cars-4-you/car_rental_training_data.csv'

if not os.path.isfile(filename): 
    wget.download(url, out=filename)

df = pd.read_csv("car_rental_training_data.csv", sep=';')
data = df[['Customer_Service', 'Satisfaction']]

100% [..........................................................] 79518 / 79518
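
The file is semicolon-separated, which is why the cell above passes `sep=';'`. The sketch below shows the same column selection with only the standard library. The two rows in `SAMPLE_CSV` are made up for illustration and are not taken from the dataset; the function name `load_comments` is likewise an assumption.

```python
import csv
import io

# Made-up rows in the same layout as the dataset: a header plus ';' separators.
SAMPLE_CSV = """ID;Customer_Service;Satisfaction
1;The agent was friendly and quick.;1
2;I waited an hour and nobody apologized.;0
"""


def load_comments(text: str) -> list[tuple[str, int]]:
    """Return (comment, label) pairs from semicolon-separated CSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [(row["Customer_Service"], int(row["Satisfaction"])) for row in reader]


rows = load_comments(SAMPLE_CSV)
print(rows)
```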

Examine the downloaded data.

In [5]:
data.head()

| | Customer_Service | Satisfaction |
|---|---|---|
| 0 | I thought the representative handled the initi... | 0 |
| 1 | I have had a few recent rentals that have take... | 0 |
| 2 | car cost more because I didn't pay when I rese... | 0 |
| 3 | I didn't get the car I was told would be avail... | 0 |
| 4 | If there was not a desired vehicle available t... | 1 |


Prepare the train and test datasets.

In [6]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.2)
comments = list(test.Customer_Service)
satisfaction = list(test.Satisfaction)

In [7]:
test.shape

(98, 2)
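
The split above is called without a fixed seed, so the rows in `test` differ from run to run. `train_test_split` accepts a `random_state` argument for that purpose. The standard-library sketch below illustrates the underlying idea: shuffle with a seeded generator, then cut off the test portion. The function `split_train_test`, the seed value, and the ten placeholder rows are assumptions for this example.

```python
import math
import random


def split_train_test(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split off the test portion."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Round the test count up so that a small dataset still gets a test row.
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Ten placeholder row ids stand in for the real data frame rows.
train_rows, test_rows = split_train_test(range(10))
print(len(train_rows), len(test_rows))
```

Because the generator is seeded, calling the function again with the same input returns the same partition.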

<a id="models"></a>
## Foundation models on `watsonx.ai`

#### List available models

All available models are listed under the `ModelTypes` class.
For more information, refer to the [documentation](https://ibm.github.io/watson-machine-learning-sdk/foundation_models.html#ibm_watson_machine_learning.foundation_models.utils.enums.ModelTypes).

In [8]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

print([model.name for model in ModelTypes])

['FLAN_T5_XXL', 'FLAN_UL2', 'MT0_XXL', 'GPT_NEOX', 'MPT_7B_INSTRUCT2', 'STARCODER', 'LLAMA_2_70B_CHAT', 'GRANITE_13B_INSTRUCT', 'GRANITE_13B_CHAT']


You need to specify the `model_id` that will be used for inference:

In [9]:
model_id = ModelTypes.GRANITE_13B_INSTRUCT

### Define the model parameters

You might need to adjust the model `parameters` for different models or tasks. For details, refer to the [documentation](https://ibm.github.io/watson-machine-learning-sdk/foundation_models.html#metanames.GenTextParamsMetaNames).

In [10]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.MIN_NEW_TOKENS: 0,
    GenParams.MAX_NEW_TOKENS: 1,
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.REPETITION_PENALTY: 1
}
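
With greedy decoding and `MAX_NEW_TOKENS` set to 1, the model returns only the single highest-scoring next token, which suits a yes/no answer as long as the answer fits in one token. The toy sketch below illustrates the greedy step only. The scores are made up, and `greedy_next_token` is a hypothetical helper, not an SDK function.

```python
def greedy_next_token(scores: dict[str, float]) -> str:
    """Return the token with the highest score, which is what greedy decoding picks."""
    return max(scores, key=scores.get)


# Made-up scores for illustration; a real model scores its whole vocabulary.
toy_scores = {"yes": 0.61, "no": 0.35, "maybe": 0.04}
print(greedy_next_token(toy_scores))
```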

### Initialize the model
Initialize the `Model` class with the previously defined parameters.

In [11]:
from ibm_watson_machine_learning.foundation_models import Model

model = Model(
    model_id=model_id, 
    params=parameters, 
    credentials=credentials,
    project_id=project_id)

### Model details

In [12]:
model.get_details()

{'model_id': 'ibm/granite-13b-instruct-v1',
 'label': 'granite-13b-instruct-v1',
 'provider': 'IBM',
 'source': 'IBM',
 'short_description': 'The Granite model series is a family of IBM-trained, dense decoder-only models, which are particularly well-suited for generative tasks.',
 'long_description': 'Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community.',
 'task_ids': ['question_answering',
  'summarization',
  'classification',
  'generation',
  'extraction'],
 'tasks': [{'id': 'question_answering', 'ratings': {'quality': 3}},
  {'id': 'summarization', 'ratings': {'quality': 2}},
  {'id': 'retrieval_augmented_generation', 'ratings': {'quality': 2}},
  {'id': 'classification', 'ratings': {'quality': 3}},
  {'id': 'generation'},
  {'id': 'extraction', 'ratings': {'quality': 2}}],
 'model_li

<a id="predict"></a>
## Analyze the satisfaction

#### Prepare the prompt and generate text

In [13]:
instruction = """Determine if the customer was satisfied with the experience based on the comment. Return simple yes or no.
Comment:The car was broken. They couldn't find a replacement. I've waster over 2 hours.
Satisfied:no"""

In [14]:
prompt1 = "\n".join([instruction, "Comment:" + comments[2], "Satisfied:"])
print(prompt1)

Determine if the customer was satisfied with the experience based on the comment. Return simple yes or no.
Comment:The car was broken. They couldn't find a replacement. I've waster over 2 hours.
Satisfied:no
Comment:We needed a minivan on very short notice to drive out of town to a funeral.  Enterprise staff worked hard to find us one, and they did. We smashed the car into a parking garage pole.  Since we had purchased the comprehensive insurance we didn't have to go through all of the B.S.
Satisfied:
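
The prompt is a few-shot pattern: the instruction with one labeled example, then the new comment, then an open `Satisfied:` line for the model to complete. As a sketch, the same assembly can be wrapped in a small function. The helper `build_prompt` is hypothetical, and collapsing whitespace inside the comment is an added assumption meant to keep each comment on its own `Comment:` line.

```python
def build_prompt(instruction: str, comment: str) -> str:
    """Append one unlabeled comment to the few-shot instruction block."""
    # Collapse whitespace runs, including newlines, so the comment stays on one line.
    single_line = " ".join(comment.split())
    return "\n".join([instruction, "Comment:" + single_line, "Satisfied:"])


print(build_prompt("Return simple yes or no.", "they were\nfine."))
```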


Analyze the sentiment of the test data.

In [15]:
print(model.generate_text(prompt=prompt1))

yes


### Calculate the accuracy

In [16]:
sample_size = 10
prompts_batch = ["\n".join([instruction, "Comment:" + comment, "Satisfied:"]) for comment in comments[:sample_size]]
results = model.generate_text(prompt=prompts_batch)

In [17]:
print(prompts_batch[0] + results[0])

Determine if the customer was satisfied with the experience based on the comment. Return simple yes or no.
Comment:The car was broken. They couldn't find a replacement. I've waster over 2 hours.
Satisfied:no
Comment:they were fine.
Satisfied:no


In [18]:
from sklearn.metrics import accuracy_score

label_map = {0: "no", 1: "yes"}
y_true = [label_map[sat] for sat in satisfaction][:sample_size]

print('accuracy_score', accuracy_score(y_true, results))

accuracy_score 0.7


In [19]:
print('true', y_true, '\npred', results)

true ['yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'] 
pred ['no', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'no', 'yes', 'no']
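
Accuracy alone hides which kind of error the model makes. The sketch below recomputes the score from the two lists printed above and also counts each (true, predicted) pair. Normalizing the predictions with `strip()` and `lower()` is an added assumption, in case a model returns stray whitespace or capitalization.

```python
from collections import Counter

# Labels copied from the output above.
y_true = ["yes", "yes", "yes", "no", "no", "yes", "yes", "no", "no", "no"]
y_pred = ["no", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no"]


def normalize(label: str) -> str:
    """Remove surrounding whitespace and lowercase the label."""
    return label.strip().lower()


# Count how often each (true, predicted) combination occurs.
pairs = Counter((t, normalize(p)) for t, p in zip(y_true, y_pred))
accuracy = sum(n for (t, p), n in pairs.items() if t == p) / len(y_true)

print("accuracy", accuracy)
print("pairs", dict(pairs))
```

For this sample, two satisfied customers were predicted as unsatisfied and one unsatisfied customer was predicted as satisfied, which accounts for the three errors behind the 0.7 score.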


## Analyze customer satisfaction on Chinese data

#### Data loading

In [37]:
data_cn = pd.read_csv("car_rental_training_data_CN.csv")
data_cn.head()

| | ID | Gender | Status | Children | Age | Customer_Status | Car_Owner | Customer_Service | Customer_Service_CN | Satisfaction | Business_Area | Action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 83 | Female | M | 2 | 48.85 | Inactive | Yes | I thought the representative handled the initi... | 我认为这位代表对最初情况的处理很糟糕。公司没有车了，当天也没有车进来。然后代表试图在另一家特... | 0 | Product: Availability/Variety/Size | Free Upgrade |
| 1 | 1307 | Female | M | 0 | 55.0 | Inactive | No | I have had a few recent rentals that have take... | 我最近有几次出租花了很长时间，而且没有道歉。在最近的案例中，代理商随后在升级优惠券上向我提供... | 0 | Product: Availability/Variety/Size | Voucher |
| 2 | 1737 | Male | M | 0 | 42.35 | Inactive | Yes | car cost more because I didn't pay when I rese... | 车费比较高，因为我预订的时候没有付钱 | 0 | Product: Availability/Variety/Size | Free Upgrade |
| 3 | 3721 | Male | M | 2 | 61.71 | Inactive | Yes | I didn't get the car I was told would be avail... | 我没有得到我被告知可以使用的汽车。我的报价中添加了一些隐藏费用。 | 0 | Product: Availability/Variety/Size | Free Upgrade |
| 4 | 11 | Male | S | 2 | 56.47 | Active | No | If there was not a desired vehicle available t... | 如果没有所需的可用车辆，销售代表会探索所有选项，包括竞争对手来协助寻找可用车辆。这种水平的服... | 1 | Product: Availability/Variety/Size | |


In [38]:
train, test = train_test_split(data_cn, test_size=0.2)
comments = list(test.Customer_Service_CN)
satisfaction = list(test.Satisfaction)

#### Initialize the model

In [39]:
model_id = ModelTypes.LLAMA_2_70B_CHAT
model = Model(
    model_id=model_id, 
    params=parameters, 
    credentials=credentials,
    project_id=project_id)

#### Prepare the prompt and generate text

In [45]:
instruction = """基于给出的评论判断用户对这次体验是否满意。仅回答是或否。
输入: 这辆车坏了。他们不能找到替代方案。我浪费了两个小时。
输出: 否
输入: 客户服务非常有帮助。他们知道我正在探望我 91 岁的母亲，并免费升级了一辆对她来说更舒适的汽车。
输出: 是"""

In [46]:
prompt1 = "\n".join([instruction, "输入: " + comments[2], "输出: "])
print(prompt1)

基于给出的评论判断用户对这次体验是否满意。仅回答是或否。
输入: 这辆车坏了。他们不能找到替代方案。我浪费了两个小时。
输出: 否
输入: 客户服务非常有帮助。他们知道我正在探望我 91 岁的母亲，并免费升级了一辆对她来说更舒适的汽车。
输出: 是
输入: 我们花了将近三个小时才拿到车！这太荒谬了。
输出: 


In [47]:
print(model.generate_text(prompt=prompt1))

否


#### Calculate the accuracy

In [48]:
sample_size = 10
prompts_batch = ["\n".join([instruction, "输入: " + comment, "输出: "]) for comment in comments[:sample_size]]
results = model.generate_text(prompt=prompts_batch)

In [49]:
from sklearn.metrics import accuracy_score

label_map = {0: "否", 1: "是"}
y_true = [label_map[sat] for sat in satisfaction][:sample_size]

print('accuracy_score', accuracy_score(y_true, results))

accuracy_score 0.9
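
A generated token that is neither `是` nor `否` counts as a plain miss in `accuracy_score`, which makes formatting problems look like classification errors. The sketch below separates the two cases. The sample outputs are made up, and `parse_label` and `score` are hypothetical helpers.

```python
LABELS = {"是": 1, "否": 0}


def parse_label(raw: str):
    """Map a raw model output to 1 or 0, or None when it is not a known label."""
    return LABELS.get(raw.strip())


def score(y_true, raw_outputs):
    """Return (accuracy, number of outputs that could not be parsed)."""
    parsed = [parse_label(raw) for raw in raw_outputs]
    n_invalid = sum(p is None for p in parsed)
    n_correct = sum(p == t for p, t in zip(parsed, y_true))
    return n_correct / len(y_true), n_invalid


# Made-up outputs: one has leading whitespace, one is not a valid label.
print(score([1, 0, 0, 1], ["是", " 否", "不", "否"]))
```

A high count of unparsed outputs suggests that the prompt or the token limit needs adjusting before the accuracy is interpreted.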


<a id="summary"></a>
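## Summary and next steps

 You successfully completed this notebook!
 
 You learned how to analyze car rental customer satisfaction with the foundation models on watsonx.ai.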
 
 Check out our _[Online Documentation]()_ for more samples, tutorials, documentation, how-tos, and blog posts. 

### Authors

**Mateusz Szewczyk**, Software Engineer at Watson Machine Learning.

**Lukasz Cmielowski**, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.

Copyright © 2023 IBM. This notebook and its source code are released under the terms of the MIT License.