## Implementing a Pattern-Matching Chatbot

### Pattern Match

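Whether a machine can hold a conversation has long been regarded as an important sign of machine intelligence. Alan Turing proposed a way to test it in his writing: a human judge tries to tell, from the conversation alone, whether the other party is a machine or a real person, and if the judge cannot tell the two apart, the machine is said to be "intelligent". This test later became known as the "Turing test", and the well-known film "The Imitation Game" takes its name from it.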



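Since Turing chose this task as the mark of machine intelligence, it is bound to be a hard one. Since the 1960s, many researchers have approached the problem from different angles, and to this day only parts of it have been solved. There are many ways to build a chatbot; in today's assignment we provide a quick, template-based way to set one up.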

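The first goal of this assignment is to read and understand the code; the second is to build on the code we provide and **adapt it to Chinese so that it can hold a conversation.**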

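To define and test templates, we need a special kind of symbol called a "variable", which acts as a placeholder. For example, the target "I want X" can be written as "I want ?X", where ?X is the placeholder symbol.

Given the input "I want holiday", 'holiday' is the value of '?X'.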

In [1]:
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])
    

In [2]:
is_variable('?ddd?')

False

In [3]:
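def pat_match(pat,saying):
    # Guard against running off the end of either token list.
    if not pat or not saying:
        return not pat and not saying
    if is_variable(pat[0]):
        return True  # a variable matches any single token; the rest is not checked yet
    elif pat[0]!=saying[0]:
        return False
    else:
        return pat_match(pat[1:],saying[1:])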

In [4]:
pat_match('I come from ?x'.split(),'I come from shanghai'.split())

True

In [5]:
pat_match('I comed from ?x'.split(),'I is comed from is shanghai'.split())

False

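### Extracting the matched variables

The function above can tell whether a pattern and a sentence agree, but what we really want is the value that each variable corresponds to.

We modify the program as follows: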

In [6]:
def pat_match(pat,saying):
    if not pat or not saying:
        return False  # ran out of tokens without reaching a variable
    if is_variable(pat[0]):
        return pat[0],saying[0]
    elif pat[0]!=saying[0]:
        return False
    else:
        return pat_match(pat[1:],saying[1:])

In [7]:
pat_match('I come from ?X'.split(),'I come from shanghai'.split())

('?X', 'shanghai')

In [8]:
pat_match('?x equal ?x'.split(),'2+2 equal 4'.split())

('?x', '2+2')

However, this version stops at the first variable, so it cannot handle a pattern that contains two variables. We can modify the program as follows:

In [9]:
def pat_match(pat,saying):
    if not pat or not saying :
        return []
    elif is_variable(pat[0]):
        return [(pat[0],saying[0])]+pat_match(pat[1:],saying[1:])
    elif pat[0] !=saying[0]:
        return []
    else:
        return pat_match(pat[1:],saying[1:])


In [10]:
pat_match('?x equal ?x'.split(),'2+2 equal 4'.split())

[('?x', '2+2'), ('?x', '4')]
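
Note that the result above binds ?x to both '2+2' and '4'. One way to reject such inconsistent matches is to carry a dictionary of bindings through the recursion and fail when a variable reappears with a different value. The following is only a minimal sketch, not part of the assignment code: the name `pat_match_consistent`, the use of `None` as the failure value, and the requirement that both token lists be consumed completely are assumptions made for illustration.

```python
def is_variable(pat):
    # Same test as in the notebook: '?' followed by letters only.
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])


def pat_match_consistent(pat, saying, bindings=None):
    """Return a dict of variable bindings, or None if the match fails.

    Hypothetical helper: it requires both token lists to be consumed
    completely and rejects a variable bound to two different tokens.
    """
    bindings = dict(bindings or {})
    if not pat and not saying:
        return bindings              # both exhausted: success
    if not pat or not saying:
        return None                  # one side has tokens left over
    head, token = pat[0], saying[0]
    if is_variable(head):
        if bindings.get(head, token) != token:
            return None              # same variable, different value
        bindings[head] = token
    elif head != token:
        return None                  # literal mismatch
    return pat_match_consistent(pat[1:], saying[1:], bindings)


print(pat_match_consistent('?x equal ?x'.split(), '2+2 equal 4'.split()))   # None
print(pat_match_consistent('?x equal ?x'.split(), '4 equal 4'.split()))     # {'?x': '4'}
```

Returning `None` rather than an empty list keeps "no match" distinguishable from "matched, but the pattern had no variables".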

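Once we know what each variable corresponds to, we can easily fill in the templates we have defined.

To make the substitution step easier, we add two functions: one turns the parsed result into a dictionary, and the other uses that dictionary to perform the substitution in the way we defined.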

In [11]:
def pat_to_dict(pat):
    return {k:v for k,v in pat}

In [12]:
pat_to_dict(pat_match('?x equal ?x'.split(),'2+2 equal 4'.split()))

{'?x': '4'}

In [13]:
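def subsitite(rule,parse_rule):
    # dict.get(key) returns the value stored under key (None if the key is missing);
    # dict.get(key, default) returns default instead when the key is missing.
    # Tokens that are not keys of parse_rule are therefore kept unchanged.
    if not rule :
        return []
    else:
        return[parse_rule.get(rule[0],rule[0])]+subsitite(rule[1:],parse_rule)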

In [14]:
dicts={'zhang':10,
     'li':20}

In [15]:
he=[dicts.get('wo','wo')]
he

['wo']

In [16]:
dicts.get('zhang','zhang')

10

In [17]:
subsitite('I come from ?x , so you come from ?y'.split(),pat_to_dict(pat_match(
    'I borned ?x , and my girlfriend borned ?y'.split(),'I borned anhui , and my girlfriend borned bozhou '.split())))

['I', 'come', 'from', 'anhui', ',', 'so', 'you', 'come', 'from', 'bozhou']
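
The same substitution can also be written without recursion. The sketch below is an illustrative alternative, not the assignment's code; `substitute_tokens` is a hypothetical name, and it relies on the same `dict.get(token, token)` idiom.

```python
def substitute_tokens(rule, bindings):
    """Replace each token that appears in `bindings`; keep all other tokens."""
    return [bindings.get(token, token) for token in rule]


bindings = {'?x': 'anhui', '?y': 'bozhou'}
template = 'I come from ?x , so you come from ?y'.split()
print(' '.join(substitute_tokens(template, bindings)))
# I come from anhui , so you come from bozhou
```

A list comprehension avoids Python's recursion limit on very long templates and reads as a direct token-by-token lookup.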

In [18]:
pat_to_dict(pat_match(
    'I borned ?x ,and my girlfriend borned ?x'.split(),'I borned anhui ,and my girlfriend borned bozhou '.split()))

{'?x': 'bozhou'}

In [19]:
pat_lsit=pat_match(
    "I borned ?x ,and my girlfriend borned ?y".split(),"I borned anhui ,and my girlfriend borned bozhou ".split()
)
pat_lsit

[('?x', 'anhui'), ('?y', 'bozhou')]

In [20]:
pat_to_dict(pat_lsit)

{'?x': 'anhui', '?y': 'bozhou'}

In [21]:
# def pat_to_dict(pat):
#     return {k:v for k,v in pat}
def pat_to_dict(patterns):
    return {k: v for k, v in patterns}

In [23]:
pat_to_dict(pat_match("?X greater than ?Y".split(), "3 greater than 2".split()))

{'?X': '3', '?Y': '2'}

In [24]:
pat_match("?X greater than ?Y".split(), "3 greater than 2".split())

[('?X', '3'), ('?Y', '2')]

In [25]:
' '.join(subsitite('I ?x people ,girlfriend ?y people'.split(),parse_rule=pat_to_dict(pat_lsit)))

'I anhui people ,girlfriend bozhou people'

In [27]:
get_pattern={
    'I need ?x':['Image you will get ?x soon','why do you need ?x ?'],
    'My ?x told me something':['Talk about more about your ?x','how do you think about your ?x']
}
    

In [28]:
import random
def get_response(saying,rule=None):
    """
    Build a response for `saying` from the first pattern that matches.

    Example: get_response('I need iphone') may return
    'Image you will get iphone soon'.
    """
    rule = rule or get_pattern  # fall back to the patterns defined above
    for key, responses in rule.items():
        result=pat_match(key.split(),saying.split())
        if result:  # pat_match returns [] (never None) when nothing was bound
            return ' '.join(subsitite(random.choice(responses).split(),parse_rule=pat_to_dict(result)))
    return None  # no pattern matched
                

In [29]:
saying='I need iphone'
for key in get_pattern:
    #print(key)
    result=pat_match(key.split(),saying.split())
    #print(result)
    if result:  # pat_match returns [] on failure, never None
        break
    
print(result)
print(key)
print(pat_to_dict(result))
#print(random.choice(get_pattern[key]))
' '.join(subsitite(random.choice(get_pattern[key]).split(),parse_rule=pat_to_dict(result)))

[('?x', 'iphone')]
I need ?x
{'?x': 'iphone'}


'why do you need iphone ?'

In [30]:
result=get_response('I need iphone',rule=None)
result

'Image you will get iphone soon'

### Segment Match

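The approach above already supports simple conversations, but it matches word by word: "I need iPhone" matches "I need ?X", whereas "I need an iPhone" does not. What can we do about that?

To solve this, we introduce a new variable type, "?\*X". The extra asterisk (\*) means that the variable matches multiple tokens.

First, as before, we need a function that tests whether a token is such a multi-token variable.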

In [31]:
def is_pattern_segment(pattern):
    return pattern.startswith('?*') and all(a.isalpha() for a in pattern[2:])

In [32]:
is_pattern_segment('?*dddddd?')

False

In [33]:
from collections import defaultdict

Next we rewrite the earlier pat_match program as follows; the main addition is the is_pattern_segment branch.

In [34]:
def segment_match(pattern,saying):
    seg_pat,rest=pattern[0],pattern[1:]
    seg_pat=seg_pat.replace('?*','?')
    
    if not rest :
        return (seg_pat,saying),len(saying)
    for i,token in enumerate(saying):
        if rest[0]==token and is_match(rest[1:],saying[(i+1):]):
            return (seg_pat,saying[:i]),i
    return (seg_pat,saying),0

def is_match(rest,saying):
    if not rest and not saying:
        return True
    if not all(a.isalpha() for a in rest[0] ):
        return True
    if rest[0] != saying[0]:
        return False
    return is_match(rest[1:],saying[1:])
            

In [68]:
fail=False
def pat_match_with_seg(pattern,saying):
    if not pattern or not saying:
        return []
    
    pat=pattern[0]
    if is_variable(pat):
        match,index=(pat,saying[0]),1
    elif is_pattern_segment(pat):
        match,index=segment_match(pattern,saying)
        if index == 0:  # segment_match signals failure with index 0
            return fail
    elif pat==saying[0]:
        return pat_match_with_seg(pattern[1:],saying[1:])
    else :
        return fail

    # Propagate a failure from the rest of the match instead of
    # concatenating a list with the failure value.
    rest=pat_match_with_seg(pattern[1:],saying[index:])
    return fail if rest is fail else [match]+rest


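An important new function in this program is segment_match. Its input is a pattern that begins with a segment variable; it scans the sentence for the first position where the literal tokens following that variable match, and binds the tokens before that position to the variable-length variable. It returns the binding together with the number of tokens consumed, where 0 signals failure.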

In [69]:
segment_match('?*P is very good'.split(), "My dog and my cat is very good".split())

(('?P', ['My', 'dog', 'and', 'my', 'cat']), 5)

In [70]:
pat_match_with_seg('?*P is very good'.split(), "My dog and my cat is very good".split())

[('?P', ['My', 'dog', 'and', 'my', 'cat'])]

In [71]:
pat_match_with_seg('?*P is very good and ?*X'.split(), "My dog is very good and my cat is very cute".split())

[('?P', ['My', 'dog']), ('?X', ['my', 'cat', 'is', 'very', 'cute'])]
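
Because segment_match reports the number of consumed tokens and pat_match_with_seg treats 0 as failure, a segment variable in the code above cannot match an empty span (for instance '?*x I want ?*y' against 'I want apple'). The sketch below shows one possible alternative that uses `None` for failure and tries every span length in turn. It is an illustration under stated assumptions, not the assignment's implementation: `match_segments` is a hypothetical name, it returns a dict rather than a list of pairs, and it requires the whole sentence to be consumed.

```python
def is_variable(tok):
    return tok.startswith('?') and all(c.isalpha() for c in tok[1:])


def is_pattern_segment(tok):
    return tok.startswith('?*') and all(c.isalpha() for c in tok[2:])


def match_segments(pattern, saying, bindings=None):
    """Return a dict of bindings, or None when the pattern does not match."""
    bindings = dict(bindings or {})
    if not pattern:
        return bindings if not saying else None
    head, rest = pattern[0], pattern[1:]
    if is_pattern_segment(head):
        name = '?' + head[2:]
        # Try the shortest span first and grow it until the rest matches.
        for end in range(len(saying) + 1):
            result = match_segments(rest, saying[end:], {**bindings, name: saying[:end]})
            if result is not None:
                return result
        return None
    if not saying:
        return None
    if is_variable(head):
        return match_segments(rest, saying[1:], {**bindings, head: saying[0]})
    if head == saying[0]:
        return match_segments(rest, saying[1:], bindings)
    return None


print(match_segments('?*x I want ?*y'.split(), 'I want apple'.split()))
# {'?x': [], '?y': ['apple']}
print(match_segments('?*x I want ?*y'.split(), 'xiao ming I wants oppo'.split()))
# None
```

Trying each span length in turn costs more than a single scan, but it lets a later mismatch cause an earlier segment to be extended instead of failing outright.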

In [72]:
rule_pattern = {
    '?*x hello ?*y': ['How do you do', 'Please state your problem'],
    '?*x I want ?*y': ['what would it mean if you got ?y', 'Why do you want ?y', 'Suppose you got ?y soon'],
    "I was ?*X": ["Were you really ?X ?", "I already knew you were ?X ."],
}

In [73]:
def pat_to_dict1(pattern):
    return {k: ' '.join (v) if isinstance(v,list) else v  for k,v in pattern}

In [74]:
pat_to_dict1(pat_match_with_seg('?*P is very good and ?*X'.split(), "My dog is very good and my cat is very cute".split()))

{'?P': 'My dog', '?X': 'my cat is very cute'}

In [75]:
' '.join(subsitite(rule='?P is very cute , ?X  but I dislike it!'.split(),
                  parse_rule=pat_to_dict1(
                      pat_match_with_seg('?*P is very good and ?*X'.split(),
                                         "My dog is very good and my cat is very cute".split()))))

'My dog is very cute , my cat is very cute but I dislike it!'

In [76]:
segment_match('?*x I want ?*y'.split(),'xiao ming I wants oppo'.split())

(('?x', ['xiao', 'ming', 'I', 'wants', 'oppo']), 0)

In [77]:
' '.join(subsitite(rule='?x hello ?y'.split(),
                  parse_rule=pat_to_dict1(
                      pat_match_with_seg('?*x is very good and ?*y'.split(),
                                         "My dog is very good and my cat is very cute".split()))))

'My dog hello my cat is very cute'

In [78]:
pat_match_with_seg('?*x is very good and ?*y'.split(),
                                         "My dog is very good and my cat is very cute".split())

[('?x', ['My', 'dog']), ('?y', ['my', 'cat', 'is', 'very', 'cute'])]

In [79]:
pat_match_with_seg('?*x I want ?*y'.split(),'Junh I want apple'.split())

[('?x', ['Junh']), ('?y', ['apple'])]

In [83]:
pat_match_with_seg('?*x I want ?*y'.split(),'xiao ming I wants oppo'.split())

False

In [84]:
rules = {
    "?*X hello ?*Y": ["Hi, how do you do?"],
    "I was ?*X": ["Were you really ?X ?", "I already knew you were ?X ."],
    '?*x I want ?*y': ['what would it mean if you got ?y', 'Why do you want ?y', 'Suppose you got ?y soon']
}

In [89]:
def get_response1(saying, response_rules):
    for k in response_rules.keys():
        join_pat = pat_match_with_seg(k.split(), saying.split())
        if not join_pat:  # no match for this rule: try the next one
            continue
        return ' '.join(subsitite(random.choice(response_rules[k]).split(), pat_to_dict1(join_pat)))
    # Fallback: "Sorry, I cannot understand your instruction for now."
    return "对不起，我暂时不能理解您的指令"

In [88]:
get_response1('I am zhangbin!',rules)

'对不起，我暂时不能理解您的指令'

In [90]:
get_response1('I was xiao ming oppo',rules)

'I already knew you were xiao ming oppo .'

In [61]:
import jieba

In [62]:
# def cut(string):
#     return list(jieba.cut(string))


In [94]:
def get_response1(saying,rules):
    for k,v in rules.items():
        join_pat=pat_match_with_seg(cut(k),cut(saying))
        print(join_pat)
        if not join_pat:
            continue
        return ' '.join(subsitite(cut(random.choice(v)),pat_to_dict1(join_pat)))
    return "对不起，我暂时不能理解您的对话"

In [95]:
rule_responses = {
    '?*x你好?*y': ['你好呀', '请告诉我你的问题'],
    '?*x我想?*y': ['你觉得?y有什么意义呢？', '为什么你想?y', '你可以想想你很快就可以?y了'],
    '?*x我想要?*y': ['?x想问你，你觉得?y有什么意义呢?', '为什么你想?y', '?x觉得... 你可以想想你很快就可以有?y了', '你看?x像?y不', '我看你就像?y'],
    '?*x喜欢?*y': ['喜欢?y的哪里？', '?y有什么好的呢？', '你想要?y吗？'],
    '?*x讨厌?*y': ['?y怎么会那么讨厌呢?', '讨厌?y的哪里？', '?y有什么不好呢？', '你不想要?y吗？'],
    '?*xAI?*y': ['你为什么要提AI的事情？', '你为什么觉得AI要解决你的问题？'],
    '?*x机器人?*y': ['你为什么要提机器人的事情？', '你为什么觉得机器人要解决你的问题？'],
    '?*x对不起?*y': ['不用道歉', '你为什么觉得你需要道歉呢?'],
    '?*x我记得?*y': ['你经常会想起这个吗？', '除了?y你还会想起什么吗？', '你为什么和我提起?y'],
    '?*x如果?*y': ['你真的觉得?y会发生吗？', '你希望?y吗?', '真的吗？如果?y的话', '关于?y你怎么想？'],
    '?*x我?*z梦见?*y':['真的吗? --- ?y', '你在醒着的时候，以前想象过?y吗？', '你以前梦见过?y吗'],
    '?*x妈妈?*y': ['你家里除了?y还有谁?', '嗯嗯，多说一点和你家里有关系的', '她对你影响很大吗？'],
    '?*x爸爸?*y': ['你家里除了?y还有谁?', '嗯嗯，多说一点和你家里有关系的', '他对你影响很大吗？', '每当你想起你爸爸的时候， 你还会想起其他的吗?'],
    '?*x我愿意?*y': ['我可以帮你?y吗？', '你可以解释一下，为什么想?y'],
    '?*x我很难过，因为?*y': ['我听到你这么说， 也很难过', '?y不应该让你这么难过的'],
    '?*x难过?*y': ['我听到你这么说， 也很难过',
                 '不应该让你这么难过的，你觉得你拥有什么，就会不难过?',
                 '你觉得事情变成什么样，你就不难过了?'],
    '?*x就像?*y': ['你觉得?x和?y有什么相似性？', '?x和?y真的有关系吗？', '怎么说？'],
    '?*x和?*y都?*z': ['你觉得?z有什么问题吗?', '?z会对你有什么影响呢?'],
    '?*x和?*y一样?*z': ['你觉得?z有什么问题吗?', '?z会对你有什么影响呢?'],
    '?*x我是?*y': ['真的吗？', '?x想告诉你，或许我早就知道你是?y', '你为什么现在才告诉我你是?y'],
    '?*x我是?*y吗': ['如果你是?y会怎么样呢？', '你觉得你是?y吗', '如果你是?y，那意味着什么?'],
    '?*x你是?*y吗':  ['你为什么会对我是不是?y感兴趣?', '那你希望我是?y吗', '你要是喜欢， 我就会是?y'],
    '?*x你是?*y' : ['为什么你觉得我是?y'],
    '?*x因为?*y' : ['?y是真正的原因吗？', '你觉得会有其他原因吗?'],
    '?*x我不能?*y': ['你或许现在就能?y', '如果你能?y,会怎样呢？'],
    '?*x我觉得?*y': ['你经常这样感觉吗？', '除了到这个，你还有什么其他的感觉吗？'],
    '?*x我?*y你?*z': ['其实很有可能我们互相?y'],
    '?*x你为什么不?*y': ['你自己为什么不?y', '你觉得我不会?y', '等我心情好了，我就?y'],
    '?*x好的?*y': ['好的', '你是一个很正能量的人'],
    '?*x嗯嗯?*y': ['好的', '你是一个很正能量的人'],
    '?*x不嘛?*y': ['为什么不？', '你有一点负能量', '你说 不，是想表达不想的意思吗？'],
    '?*x不要?*y': ['为什么不？', '你有一点负能量', '你说 不，是想表达不想的意思吗？'],
    '?*x有些人?*y': ['具体是哪些人呢?'],
    '?*x有的人?*y': ['具体是哪些人呢?'],
    '?*x某些人?*y': ['具体是哪些人呢?'],
    '?*x每个人?*y': ['我确定不是人人都是', '你能想到一点特殊情况吗？', '例如谁？', '你看到的其实只是一小部分人'],
    '?*x所有人?*y': ['我确定不是人人都是', '你能想到一点特殊情况吗？', '例如谁？', '你看到的其实只是一小部分人'],
    '?*x总是?*y': ['你能想到一些其他情况吗?', '例如什么时候?', '你具体是说哪一次？', '真的---总是吗？'],
    '?*x一直?*y': ['你能想到一些其他情况吗?', '例如什么时候?', '你具体是说哪一次？', '真的---总是吗？'],
    '?*x或许?*y': ['你看起来不太确定'],
    '?*x可能?*y': ['你看起来不太确定'],
    '?*x他们是?*y吗？': ['你觉得他们可能不是?y？'],
    '?*x': ['很有趣', '请继续', '我不太确定我很理解你说的, 能稍微详细解释一下吗?']
}


In [100]:
get_response1('我可能你',rules=rule_responses)

False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False


IndexError: list index out of range
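
The traceback above is consistent with is_match reading `saying[0]` after `saying` has been exhausted, which can happen when a literal from the pattern is found as the last token of the input. A guarded variant might look like the sketch below. `is_match_guarded` is a hypothetical name, and letting a remaining variable token accept an empty remainder simply mirrors the existing behaviour; it is not a verified design decision.

```python
def is_match_guarded(rest, saying):
    """Like is_match, but never indexes into an empty list."""
    if not rest:
        return not saying            # pattern done: succeed only if the input is done too
    if not all(ch.isalpha() for ch in rest[0]):
        return True                  # next pattern token is a variable such as '?*y'
    if not saying or rest[0] != saying[0]:
        return False                 # input ran out, or a literal does not match
    return is_match_guarded(rest[1:], saying[1:])


print(is_match_guarded(['是', '?*y', '吗'], []))              # False
print(is_match_guarded(['very', 'good'], ['very', 'good']))  # True
print(is_match_guarded(['want', '?*y'], ['want', 'apple']))  # True
```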

In [91]:
def cut(string):
    """Tokenize with jieba, then re-join the pieces of '?x' and '?*x' variables."""
    tokens = [t for t in jieba.cut(string) if t != ' ']
    merged, i = [], 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tokens[i] == '?' and nxt == '*':
            width = 3  # '?', '*', name  ->  '?*name'
        elif tokens[i] == '?' and nxt is not None and 'x' <= nxt <= 'z':
            width = 2  # '?', name  ->  '?name'
        else:
            width = 1
        merged.append(''.join(tokens[i:i + width]))
        i += width
    return merged

In [101]:
fail = [True, None]  # failure sentinel: callers test result[-1], which is None on failure

def pat_match_with_seg1(pattern, saying):
    if pattern and not saying: return fail
    elif not saying: return []
     
    pat = pattern[0]
    
    if is_variable(pat):
        return [(pat, saying[0])] + pat_match_with_seg(pattern[1:], saying[1:])
    elif pat == saying[0]:
        return pat_match_with_seg(pattern[1:], saying[1:])
    elif is_pattern_segment(pat):
        match, index = segment_match(pattern, saying)
        return [match] + pat_match_with_seg(pattern[1:], saying[index:])
    else:
        return fail

In [102]:
def cut_jieba(string):
    return list(jieba.cut(string))

In [104]:
def cut_jieba_pat(string):
    tmp =[]
    string_list = string.split()
    for i in string_list:
        if i.startswith('?'):
            tmp.append(i)
        else:
            tmp += cut_jieba(i)
    return tmp

In [105]:
import re
def replace_pat_string(string):
    # Put spaces around each '?' plus one following character so split() isolates the variable.
    spaced = re.sub(r'(?P<n1>\?\S)', r' \g<n1> ', string)
    print(spaced)
    return spaced.split()

In [108]:
def get_ZH_response(saying, rules):
    tokens = list(jieba.cut(saying))
    for pattern, responses in rules.items():
        # Match once and reuse the bindings for both the check and the substitution.
        bindings = pat_match_with_seg1(cut_jieba_pat(pattern), tokens)
        if not bindings or bindings[-1] is None:  # empty result or the `fail` sentinel
            continue
        return ''.join(subsitite(replace_pat_string(random.choice(responses)), pat_to_dict1(bindings)))

In [109]:
get_ZH_response('就喜欢小贝', rule_responses)

很有趣


'很有趣'
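
In the cells above, cut_jieba_pat keeps any whitespace-delimited chunk that starts with '?' as a single token, so a pattern such as '?*x喜欢?*y' reaches the matcher as one token and only the catch-all '?*x' rule can match. One possible way to separate variables from literal text is a regular expression. The sketch below is an illustration, not the assignment's solution: `split_pattern` and its `cut_text` parameter are hypothetical, variable names are assumed to be a single ASCII letter (as in the rules above), and the default `cut_text=list` splits literals into characters only to keep the example free of dependencies. In the notebook one would presumably pass a jieba-based function instead.

```python
import re

# Assumption: a variable is '?' or '?*' followed by exactly one ASCII letter.
_VARIABLE = re.compile(r'(\?\*?[A-Za-z])')


def split_pattern(pattern, cut_text=list):
    """Split a rule pattern into variable tokens and tokenized literal text."""
    tokens = []
    # With a capturing group, re.split keeps the variables at odd indices.
    for i, chunk in enumerate(_VARIABLE.split(pattern)):
        chunk = chunk.strip()
        if not chunk:
            continue
        if i % 2 == 1:
            tokens.append(chunk)            # keep the variable whole
        else:
            tokens.extend(cut_text(chunk))  # tokenize the literal part
    return tokens


print(split_pattern('?*x喜欢?*y'))
# ['?*x', '喜', '欢', '?*y']
print(split_pattern('?*x我是?*y吗', cut_text=lambda s: [s]))
# ['?*x', '我是', '?*y', '吗']
```

For matching to work, the sentence has to be tokenized by the same `cut_text` function as the literal parts of the pattern.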