Building a Simple Rule-Based Chatbot

The code for this first lesson is very simple.

It demonstrates how the rule-based approach works,

along with a few directions for upgrading it.

First, let's look at one goal and task: greeting the user.


### Grade-school level: each known question gets a canned answer

In [1]:
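import random

# Greetings the bot recognizes
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!','hey']
# Reply to a greeting (chosen once at startup, so it stays the same for the whole session)
random_greeting = random.choice(greetings)

# Inputs that count as the question "How are you?"
question = ['How are you?','How are you doing?']
# Possible "I'm fine" replies
responses = ['Okay',"I'm fine"]
# Pick one reply at random (also chosen once at startup)
random_response = random.choice(responses)

# Run the bot
while True:
    userInput = input(">>> ")
    # Exact, case-sensitive match against the known inputs
    if userInput in greetings:
        print(random_greeting)
    elif userInput in question:
        print(random_response)
    # Keep looping until the user says "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")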

>>> Hi
hi
>>> Hi
hi
>>> hello
hi
>>> How are you?
I'm fine
>>> How are you?
I'm fine
>>> How are you?
I'm fine
>>> What's your name?
I did not understand what you said
>>> bye
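
The transcript repeats the same reply because the cell picks its reply once, before the loop starts, and it only accepts exact, case-sensitive matches. The following is a minimal sketch of the same idea with the rules kept in a table, the input normalized, and a fresh reply chosen on every turn. The names `RULES`, `FALLBACK`, `respond`, and `chat` are illustrative assumptions, not part of the original lesson.

```python
import random

# Hypothetical rule table: each entry pairs a set of accepted inputs
# (already lower-cased) with the replies the bot may choose from.
RULES = [
    ({"hola", "hello", "hi", "hey", "hey!"}, ["hola", "hello", "hi", "hey"]),
    ({"how are you?", "how are you doing?"}, ["Okay", "I'm fine"]),
]
FALLBACK = "I did not understand what you said"


def respond(user_input):
    """Return a reply for one line of input, or None when the user says bye."""
    # Normalize case and surrounding whitespace before matching
    text = user_input.strip().lower()
    if text == "bye":
        return None
    for triggers, replies in RULES:
        if text in triggers:
            # Choose a fresh reply on every call
            return random.choice(replies)
    return FALLBACK


def chat():
    """Interactive loop; defined but not called, so importing has no side effects."""
    while True:
        reply = respond(input(">>> "))
        if reply is None:
            break
        print(reply)
```

Keeping the matching logic in `respond` separates it from `input()` and `print()`, so the rules can be checked without typing into a prompt; calling `chat()` starts the interactive session.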


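Upgrade I:
Clearly, rules like these are too crude. We need somewhat more precise matching.

For example,

we can use keywords to work out the intent of a sentence (intents).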

In [8]:
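# word_tokenize needs NLTK's Punkt tokenizer data; download it once with
# nltk.download('punkt') (newer NLTK releases ask for 'punkt_tab' instead)
from nltk import word_tokenize
import random

# Greeting keywords
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!','hey']
# Reply to a greeting (chosen once at startup)
random_greeting = random.choice(greetings)

# Keywords for the "holiday" topic
question = ['break','holiday','vacation','weekend']
# Replies for the holiday topic
responses = ['It was nice! I went to Paris',"Sadly, I just stayed at home"]
# Pick one reply at random (also chosen once at startup)
random_response = random.choice(responses)

# Run the bot
while True:
    userInput = input('>>>')
    # Split the input into tokens to see which words it contains
    cleaned_input = word_tokenize(userInput)
    # Compare the tokens with each keyword list to decide which topic this is
    # isdisjoint: True when the two collections share no element
    if not set(cleaned_input).isdisjoint(greetings):
        print(random_greeting)
    elif not set(cleaned_input).isdisjoint(question):
        print(random_response)
    elif userInput == 'bye':
        break
    else:
        print('I did not understand what you said')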

>>>hello
hi
>>>how are you
I did not understand what you said
>>>exit
I did not understand what you said
>>>bye


In [12]:
word_tokenize('how are you? fine, thank you!')

['how', 'are', 'you', '?', 'fine', ',', 'thank', 'you', '!']
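
The tokenizer output above keeps the original case and emits punctuation as separate tokens, so keyword matching works best on normalized tokens. The following is a minimal sketch of keyword-based intent detection over several intents at once. It assumes a simple regular expression as a stand-in for `word_tokenize` (so it runs without NLTK), and `INTENTS`, `tokenize`, `detect_intent`, and `respond` are hypothetical names introduced only for illustration.

```python
import random
import re

# Hypothetical intent table: intent name -> (keywords, candidate replies).
INTENTS = {
    "greeting": ({"hola", "hello", "hi", "hey"},
                 ["hola", "hello", "hi", "hey"]),
    "holiday": ({"break", "holiday", "vacation", "weekend"},
                ["It was nice! I went to Paris",
                 "Sadly, I just stayed at home"]),
}


def tokenize(text):
    """Lower-case the text and keep only word-like tokens.

    A regex stand-in for word_tokenize; punctuation is dropped.
    """
    return re.findall(r"[a-z']+", text.lower())


def detect_intent(text):
    """Return the first intent whose keywords overlap the tokens, else None."""
    tokens = set(tokenize(text))
    for name, (keywords, _replies) in INTENTS.items():
        if tokens & keywords:  # non-empty intersection means a keyword matched
            return name
    return None


def respond(text):
    """Pick a random reply for the detected intent, or a fallback message."""
    intent = detect_intent(text)
    if intent is None:
        return "I did not understand what you said"
    return random.choice(INTENTS[intent][1])
```

Intents are checked in insertion order, so a sentence that contains both a greeting keyword and a holiday keyword is treated as a greeting. Adding a topic means adding one entry to `INTENTS` rather than another `elif` branch.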

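Upgrade II:
Being able to chat is not enough. The bot needs a knowledge base before it can solve users' problems.

We can build that knowledge base on any kind of database and then find answers by searching it.

For example, the simplest option is to build a "map" as a graph stored in a plain Python dict (Python has no built-in graph type).

Using this map, we can find a route from one place to another

and return it to the user as the answer.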

In [13]:
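# Build a database for the target domain
# Here a plain Python dict serves as an adjacency-list graph:
# each city maps to the cities reachable from it directly
graph = {'上海': ['苏州', '常州'],
         '苏州': ['常州', '镇江'],
         '常州': ['镇江'],
         '镇江': ['常州'],
         '盐城': ['南通'],
         '南通': ['常州']}

# Depth-first search for a path from start to end
# Returns the first path found (not necessarily the shortest), or None
def find_path(start, end, path=None):
    # Build a new list on every call instead of sharing a mutable default
    path = (path or []) + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        # Skip cities already on the current path to avoid cycles
        if node not in path:
            newpath = find_path(node, end, path)
            if newpath:
                return newpath
    return None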
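
`find_path` is a depth-first search, so it returns the first route it reaches, which may not be the one with the fewest stops. The following is a minimal breadth-first sketch over the same map that returns a route with the fewest hops. `find_shortest_path` is a hypothetical helper, not part of the original lesson, and it takes the graph as an argument instead of reading the global.

```python
from collections import deque

# Same adjacency-list map as in the lesson
graph = {'上海': ['苏州', '常州'],
         '苏州': ['常州', '镇江'],
         '常州': ['镇江'],
         '镇江': ['常州'],
         '盐城': ['南通'],
         '南通': ['常州']}


def find_shortest_path(graph, start, end):
    """Breadth-first search: return a path with the fewest hops, or None."""
    queue = deque([[start]])   # each queue entry is a partial path
    visited = {start}          # cities that already have a queued path
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

On this map, `find_shortest_path(graph, '上海', '镇江')` returns `['上海', '苏州', '镇江']`, one stop fewer than the depth-first route, and it returns `None` when no route exists, for example from `'镇江'` to `'上海'`.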

In [14]:
print(find_path('上海', "镇江"))

['上海', '苏州', '常州', '镇江']
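
To hand the route back to the user as an answer, the path search can be wired into a reply function. The following is a rough sketch under a strong assumption: the question mentions exactly the start city first and the destination second, using the city names as they appear in the map. `known_places` and `answer_route` are hypothetical names, and the reply wording is made up for illustration.

```python
# Same adjacency-list map as in the lesson
graph = {'上海': ['苏州', '常州'],
         '苏州': ['常州', '镇江'],
         '常州': ['镇江'],
         '镇江': ['常州'],
         '盐城': ['南通'],
         '南通': ['常州']}


def find_path(start, end, path=None):
    """Depth-first search for any path from start to end, or None."""
    path = (path or []) + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        if node not in path:
            newpath = find_path(node, end, path)
            if newpath:
                return newpath
    return None


def known_places():
    """Every city that appears in the graph, as a source or a destination."""
    places = set(graph)
    for neighbors in graph.values():
        places.update(neighbors)
    return places


def answer_route(question):
    """Reply to a question that mentions two known cities, start city first."""
    # Collect the mentioned cities, ordered by where they occur in the text
    mentioned = sorted((question.find(p), p)
                       for p in known_places() if p in question)
    if len(mentioned) < 2:
        return "Please name a start city and a destination."
    start, end = mentioned[0][1], mentioned[1][1]
    path = find_path(start, end)
    if path is None:
        return f"Sorry, I know no route from {start} to {end}."
    return " -> ".join(path)
```

For example, `answer_route("How do I get from 上海 to 镇江?")` returns `"上海 -> 苏州 -> 常州 -> 镇江"`. Plain substring matching is only enough for a toy map like this one; a real bot would need proper entity extraction.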
