![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx and foundation models such as `ibm/granite-13b-instruct-v1` to analyze car rental customer satisfaction from text

#### Disclaimers

- Use only Projects and Spaces that are available in watsonx.


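## Notebook content

This notebook contains the steps and code that demonstrate text sentiment analysis in watsonx. It covers data retrieval, model testing, and scoring.
Some familiarity with Python is helpful. This notebook uses Python 3.10.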


## Learning goal

The goal of this notebook is to demonstrate how to use the `ibm/granite-13b-instruct-v1` model to analyze customer satisfaction from text.


## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Data loading](#data)
- [Foundation models on watsonx](#models)
- [Model testing](#predict)
- [Score](#score)
- [Summary](#summary)

<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, complete the following setup task:

-  Associate a <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (information on how to create the instance is available <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-service-instance.html?context=analytics" target="_blank" rel="noopener no referrer">here</a>).


### Install and import `datasets` and dependencies

In [4]:
%pip install wget | tail -n 1
%pip install datasets | tail -n 1
%pip install scikit-learn | tail -n 1
%pip install "ibm-watson-machine-learning>=1.0.326" | tail -n 1


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Successfully installed wget-3.2
Note: you may need to restart the kernel to use updated packages.

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Successfully installed datasets-2.14.6 dill-0.3.7 multiprocess-0.70.15 xxhash-3.4.1
Note: you may need to restart the kernel to use updated packages.

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade p

### Define the WML credentials
This cell defines the WML credentials required to run inference with watsonx foundation models.

**Action:** Provide the IBM Cloud user API key. For details, see the
[documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).

In [1]:
import getpass

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your WML api key (hit enter): ")
}

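### Define the project id
The foundation model requires a project id, which provides the context for the call. The id is read from the project in which this notebook runs (the `PROJECT_ID` environment variable). If it is not set, provide the project id yourself.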

In [2]:
import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

<a id="data"></a>
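## Data loading

Download the `car_rental_training_data` dataset. It provides insight into customers' feedback on car rentals. The dataset has a single label column, `Satisfaction`, with two values: unsatisfied (`0`) and satisfied (`1`).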

In [5]:
import wget
import pandas as pd

filename = 'car_rental_training_data.csv'
url = 'https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/data/cars-4-you/car_rental_training_data.csv'

if not os.path.isfile(filename): 
    wget.download(url, out=filename)

df = pd.read_csv(filename, sep=';')
data = df[['Customer_Service', 'Satisfaction']]

Examine the downloaded data.

In [6]:
data.head()

| | Customer_Service | Satisfaction |
|---|---|---|
| 0 | I thought the representative handled the initi... | 0 |
| 1 | I have had a few recent rentals that have take... | 0 |
| 2 | car cost more because I didn't pay when I rese... | 0 |
| 3 | I didn't get the car I was told would be avail... | 0 |
| 4 | If there was not a desired vehicle available t... | 1 |


Prepare the train and test data sets.

In [6]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.2)
comments = list(test.Customer_Service)
satisfaction = list(test.Satisfaction)

In [7]:
test.shape

(98, 2)
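
Because `train_test_split` is called above without a fixed seed, the rows that end up in `test`, and therefore the accuracy figures later in this notebook, can differ from run to run. The sketch below illustrates the idea of a seeded 80/20 split using only the standard library. `seeded_split` is a hypothetical helper written for this illustration, not scikit-learn's implementation; in the notebook itself, passing `random_state` to `train_test_split` serves the same purpose.

```python
import math
import random


def seeded_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed, then split off the test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)  # round the test share up
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for the rows of the data frame
train_rows, test_rows = seeded_split(rows)
again_train, again_test = seeded_split(rows)

print(len(train_rows), len(test_rows))
print(test_rows == again_test)  # the same seed gives the same split
```

With a fixed seed, repeated runs score the same comments, which makes the reported accuracy comparable between runs.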

<a id="models"></a>
## Foundation models on `watsonx.ai`

#### List available models

All available models are listed under the `ModelTypes` class.
For more information, see the [documentation](https://ibm.github.io/watson-machine-learning-sdk/foundation_models.html#ibm_watson_machine_learning.foundation_models.utils.enums.ModelTypes).

In [8]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

print([model.name for model in ModelTypes])

['FLAN_T5_XXL', 'FLAN_UL2', 'MT0_XXL', 'GPT_NEOX', 'MPT_7B_INSTRUCT2', 'STARCODER', 'LLAMA_2_70B_CHAT', 'GRANITE_13B_INSTRUCT', 'GRANITE_13B_CHAT']


You need to specify the `model_id` that will be used for inferencing:

In [9]:
model_id = ModelTypes.GRANITE_13B_INSTRUCT

### Define the model parameters

You might need to adjust the model `parameters` for different models or tasks. For details, see the [documentation](https://ibm.github.io/watson-machine-learning-sdk/foundation_models.html#metanames.GenTextParamsMetaNames).

In [10]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.MIN_NEW_TOKENS: 0,
    GenParams.MAX_NEW_TOKENS: 1,
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.REPETITION_PENALTY: 1
}

### Initialize the model
Initialize the `Model` class with the previously defined parameters.

In [11]:
from ibm_watson_machine_learning.foundation_models import Model

model = Model(
    model_id=model_id, 
    params=parameters, 
    credentials=credentials,
    project_id=project_id)

### Model details

In [12]:
model.get_details()

{'model_id': 'ibm/granite-13b-instruct-v1',
 'label': 'granite-13b-instruct-v1',
 'provider': 'IBM',
 'source': 'IBM',
 'short_description': 'The Granite model series is a family of IBM-trained, dense decoder-only models, which are particularly well-suited for generative tasks.',
 'long_description': 'Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community.',
 'task_ids': ['question_answering',
  'summarization',
  'classification',
  'generation',
  'extraction'],
 'tasks': [{'id': 'question_answering', 'ratings': {'quality': 3}},
  {'id': 'summarization', 'ratings': {'quality': 2}},
  {'id': 'retrieval_augmented_generation', 'ratings': {'quality': 2}},
  {'id': 'classification', 'ratings': {'quality': 3}},
  {'id': 'generation'},
  {'id': 'extraction', 'ratings': {'quality': 2}}],
 'model_li

<a id="predict"></a>
## Analyze satisfaction

#### Prepare the prompt and generate text

In [13]:
instruction = """Determine if the customer was satisfied with the experience based on the comment. Return simple yes or no.
Comment:The car was broken. They couldn't find a replacement. I've waster over 2 hours.
Satisfied:no"""

In [14]:
prompt1 = "\n".join([instruction, "Comment:" + comments[2], "Satisfied:"])
print(prompt1)

Determine if the customer was satisfied with the experience based on the comment. Return simple yes or no.
Comment:The car was broken. They couldn't find a replacement. I've waster over 2 hours.
Satisfied:no
Comment:We needed a minivan on very short notice to drive out of town to a funeral.  Enterprise staff worked hard to find us one, and they did. We smashed the car into a parking garage pole.  Since we had purchased the comprehensive insurance we didn't have to go through all of the B.S.
Satisfied:
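
The prompt above is assembled by joining the few-shot `instruction`, one new comment, and an open `Satisfied:` label for the model to complete. A minimal sketch of that pattern as a reusable function follows. `build_prompt` and its parameters are hypothetical names introduced here, and collapsing whitespace inside the comment is an added assumption rather than something the notebook does.

```python
def build_prompt(instruction, comment, comment_label="Comment:", answer_label="Satisfied:"):
    """Append one unlabeled comment to the few-shot instruction block."""
    # Assumption: collapsing line breaks and repeated spaces keeps each
    # comment on a single line, so it cannot be mistaken for a new example.
    clean_comment = " ".join(comment.split())
    return "\n".join([instruction, comment_label + clean_comment, answer_label])


instruction = """Determine if the customer was satisfied with the experience based on the comment. Return simple yes or no.
Comment:The car was broken. They couldn't find a replacement. I've waster over 2 hours.
Satisfied:no"""

prompt = build_prompt(instruction, "they were\nfine.")
print(prompt)
```

The label arguments make the same helper usable for prompts in other languages, as long as the labels match the ones used in the few-shot examples.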


Analyze the sentiment of the test data.

In [15]:
print(model.generate_text(prompt=prompt1))

yes


### Calculate the accuracy

In [16]:
sample_size = 10
prompts_batch = ["\n".join([instruction, "Comment:" + comment, "Satisfied:"]) for comment in comments[:sample_size]]
results = model.generate_text(prompt=prompts_batch)

In [17]:
print(prompts_batch[0] + results[0])

Determine if the customer was satisfied with the experience based on the comment. Return simple yes or no.
Comment:The car was broken. They couldn't find a replacement. I've waster over 2 hours.
Satisfied:no
Comment:they were fine.
Satisfied:no


In [18]:
from sklearn.metrics import accuracy_score

label_map = {0: "no", 1: "yes"}
y_true = [label_map[sat] for sat in satisfaction][:sample_size]

print('accuracy_score', accuracy_score(y_true, results))

accuracy_score 0.7


In [19]:
print('true', y_true, '\npred', results)

true ['yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'] 
pred ['no', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'no', 'yes', 'no']
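
Accuracy alone does not show which kind of mistake the model makes. As a small sketch, the label lists printed above can be tallied into confusion counts with the standard library. The two lists are copied from the output above; the variable names `pairs` and `accuracy` are introduced for this illustration.

```python
from collections import Counter

# Labels copied from the notebook output above.
y_true = ['yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']
y_pred = ['no', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'no', 'yes', 'no']

# Count each (true label, predicted label) pair.
pairs = Counter(zip(y_true, y_pred))
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("accuracy", accuracy)
print("satisfied, predicted yes:    ", pairs[("yes", "yes")])
print("satisfied, predicted no:     ", pairs[("yes", "no")])
print("unsatisfied, predicted yes:  ", pairs[("no", "yes")])
print("unsatisfied, predicted no:   ", pairs[("no", "no")])
```

On this sample, two of the three errors are satisfied customers predicted as unsatisfied. With only ten comments, the counts are indicative rather than conclusive.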


## Analyze user satisfaction in Chinese data

#### Data loading

In [8]:
data_tw = pd.read_csv("car_rental_training_data_TW.csv")
data_tw.head()

| | ID | Gender | Status | Children | Age | Customer_Status | Car_Owner | Customer_Service | Customer_Service_TW | Satisfaction | Business_Area | Action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 83 | Female | M | 2 | 48.85 | Inactive | Yes | I thought the representative handled the initi... | 我認為這位代表對最初情況的處理很糟糕。 公司沒有車了，當天也沒有車進來。 然後代表試圖在另一... | 0 | Product: Availability/Variety/Size | Free Upgrade |
| 1 | 1307 | Female | M | 0 | 55.0 | Inactive | No | I have had a few recent rentals that have take... | 我最近有幾次出租花了很長時間，而且沒有道歉。 在最近的案例中，代理商隨後在升級優惠券上向我提... | 0 | Product: Availability/Variety/Size | Voucher |
| 2 | 1737 | Male | M | 0 | 42.35 | Inactive | Yes | car cost more because I didn't pay when I rese... | 車費比較高，因為我預訂的時候沒有付錢 | 0 | Product: Availability/Variety/Size | Free Upgrade |
| 3 | 3721 | Male | M | 2 | 61.71 | Inactive | Yes | I didn't get the car I was told would be avail... | 我沒有得到我被告知可以使用的汽車。 我的報價中添加了一些隱藏費用。 | 0 | Product: Availability/Variety/Size | Free Upgrade |
| 4 | 11 | Male | S | 2 | 56.47 | Active | No | If there was not a desired vehicle available t... | 如果沒有所需的可用車輛，銷售代表會探索所有選項，包括競爭對手來協助尋找可用車輛。 這種水平的... | 1 | Product: Availability/Variety/Size | |


In [38]:
train, test = train_test_split(data_tw, test_size=0.2)
comments = list(test.Customer_Service_TW)
satisfaction = list(test.Satisfaction)

#### Initialize the model

In [39]:
model_id = ModelTypes.LLAMA_2_70B_CHAT
model = Model(
    model_id=model_id, 
    params=parameters, 
    credentials=credentials,
    project_id=project_id)

#### Prepare the prompt and generate text

In [None]:
instruction = """基於給予的評論判斷使用者對這次體驗是否滿意。僅回答是或否。
輸入: 這輛車壞了。他們不能找到替代方案。我浪費了兩個小時。
輸出: 否
輸入: 客戶服務非常有幫助。他們知道我正在探望我 91 歲的母親，並免費升級了一輛對她來說更舒適的汽車。
輸出: 是"""

In [46]:
# Use the same Traditional Chinese labels as the few-shot examples in `instruction`.
prompt1 = "\n".join([instruction, "輸入: " + comments[2], "輸出: "])
print(prompt1)

基于给出的评论判断用户对这次体验是否满意。仅回答是或否。
输入: 这辆车坏了。他们不能找到替代方案。我浪费了两个小时。
输出: 否
输入: 客户服务非常有帮助。他们知道我正在探望我 91 岁的母亲，并免费升级了一辆对她来说更舒适的汽车。
输出: 是
输入: 我们花了将近三个小时才拿到车！这太荒谬了。
输出: 


In [47]:
print(model.generate_text(prompt=prompt1))

否


#### Calculate the accuracy

In [48]:
sample_size = 10
prompts_batch = ["\n".join([instruction, "輸入: " + comment, "輸出: "]) for comment in comments[:sample_size]]
results = model.generate_text(prompt=prompts_batch)

In [49]:
from sklearn.metrics import accuracy_score

label_map = {0: "否", 1: "是"}
y_true = [label_map[sat] for sat in satisfaction][:sample_size]

print('accuracy_score', accuracy_score(y_true, results))

accuracy_score 0.9
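
`accuracy_score` compares strings exactly, so a generation with surrounding whitespace, or anything other than the two expected answers, is counted as wrong without being flagged. The sketch below shows one possible way to normalize raw generations before scoring. `normalize_answer` is a hypothetical helper, and the `raw_results` values are made up for illustration, not actual model output.

```python
def normalize_answer(raw, positive="是", negative="否"):
    """Map a raw generation to the positive or negative label, or None if unclear."""
    text = raw.strip().lower()
    if text.startswith(positive.lower()):
        return positive
    if text.startswith(negative.lower()):
        return negative
    return None  # the model answered with something unexpected


# Illustrative values only, not real model output.
raw_results = [" 是\n", "否", "不確定"]
y_true = ["是", "否", "是"]

y_pred = [normalize_answer(raw) for raw in raw_results]
unparsed = sum(p is None for p in y_pred)
correct = sum(t == p for t, p in zip(y_true, y_pred))

print(y_pred, "unparsed:", unparsed, "accuracy:", correct / len(y_true))
```

The same helper works for the English prompt by passing `positive="yes"` and `negative="no"`. Reporting the number of unparsed answers separately helps distinguish wrong predictions from formatting problems.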


<a id="summary"></a>
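## Summary and next steps

 You successfully completed this notebook!
 
 You learned how to analyze car rental customer satisfaction with foundation models on watsonx.ai.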
 
 Check out our _[Online Documentation]()_ for more samples, tutorials, documentation, how-tos, and blog posts. 

### Authors
**Bucky Lee**, AI Engineer

**Mateusz Szewczyk**, Software Engineer at Watson Machine Learning.

**Lukasz Cmielowski**, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.

Copyright © 2023 IBM. This notebook and its source code are released under the terms of the MIT License.