# Implementing a Dialog Robot Based on Pattern Matching

----

## Pattern Match

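Whether a machine can hold a conversation has long been regarded as an important sign of machine intelligence. Alan Turing proposed a method for testing machine intelligence in one of his papers: a human converses with an unseen party and, judging only from the conversation, tries to tell whether it is a machine or a real person. If the human cannot tell the difference, we call the machine "intelligent". This test later became known as the "Turing test", and the story was made into a well-known film, *The Imitation Game*.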

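Since Turing proposed this as the benchmark for machine intelligence, the task is clearly difficult. Since the 1960s, many scientists have tried to tackle it from various angles, and even today only parts of the problem have been solved. There are many ways to build a dialog robot; in this assignment we provide a quick, template-based way to configure one.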

Pattern: (I want A)<br>
Response: (If you had A, what would it mean to you?)

Input: (I want a vacation)<br>
Response: (If you had a vacation, what would it mean to you?)

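To detect and define templates, we need a special symbol type called a "variable", which serves as a placeholder. For example, to express the target "I want X", we can write "I want ?X", where ?X is a placeholder symbol.

If the input is "I want holiday", then 'holiday' is bound to '?X'.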

def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])

In [2]:
eg = '?holiday'

In [3]:
is_variable(eg)

True

In [4]:
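def pat_match(pattern, saying):
    # If either side runs out, it is a match only when both ran out together
    if not pattern or not saying: return not pattern and not saying
    # This first version reports a match as soon as it reaches a variable
    if is_variable(pattern[0]): return True
    else:
        if pattern[0] != saying[0]: return False
        else:
            return pat_match(pattern[1:], saying[1:])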

In [5]:
is_variable(eg[0])

True

In [6]:
saying = 'I want ?holiday'
pattern = '?'

In [7]:
pat_match(pattern, saying)

True
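One subtlety in the cell above: is_variable('?') returns True, because all() over an empty sequence is True, so a bare question mark (for example, a standalone "?" token in a template) is treated as a variable. A stricter sketch, assuming a variable name must contain at least one letter (is_variable_strict is a hypothetical name, not part of the original code):

```python
def is_variable(pat):
    # Original check: '?' followed only by letters (an empty name passes too)
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])


def is_variable_strict(pat):
    # Hypothetical stricter check: require a non-empty alphabetic name after '?'
    return pat.startswith('?') and pat[1:].isalpha()


print(is_variable('?'), is_variable_strict('?'))     # True False
print(is_variable('?X'), is_variable_strict('?X'))   # True True
```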

### For Example

In [8]:
pat_match('I want ?X'.split(), "I want holiday".split())

True

In [9]:
'I want ?X'.split()

['I', 'want', '?X']

In [10]:
"I want holiday".split()

['I', 'want', 'holiday']

In [11]:
pat_match('I have dreamed a ?X'.split(), "I dreamed about dog".split())

False

In [12]:
pat_match('I dreamed about ?X'.split(), "I dreamed about dog".split())

True

## Get the Matching Variables

The function above can tell whether two patterns match, but what we really want is the value bound to each variable.

We modify the program as follows:

In [13]:
def pat_match(pattern, saying):
    if not pattern or not saying: return False  # ran out of tokens before reaching a variable
    if is_variable(pattern[0]):
        return pattern[0], saying[0]
    else:
        if pattern[0] != saying[0]: return False
        else:
            return pat_match(pattern[1:], saying[1:])

In [14]:
pattern = 'I want ?X'.split()
saying = "I want holiday".split()

In [15]:
pat_match(pattern, saying)

('?X', 'holiday')

In [16]:
pat_match("?X equals ?X".split(), "2+2 equals 2+2".split())

('?X', '2+2')

However, if a pattern contains two variables, the program above can no longer handle it. We can modify it as follows:

In [17]:
def pat_match(pattern, saying):
    if not pattern or not saying: return []
    
    if is_variable(pattern[0]):
        return [(pattern[0], saying[0])] + pat_match(pattern[1:], saying[1:])
    else:
        if pattern[0] != saying[0]: return []
        else:
            return pat_match(pattern[1:], saying[1:])

This gives us:

In [18]:
pat_match("?X greater than ?Y".split(), "3 greater than 2".split())

[('?X', '3'), ('?Y', '2')]

***Detail Analysis***

In [19]:
pattern = "?X greater than ?Y".split()
pattern

['?X', 'greater', 'than', '?Y']

In [20]:
saying = "3 greater than 2".split()
saying

['3', 'greater', 'than', '2']

In [21]:
pattern[0]

'?X'

In [22]:
saying[0]

'3'

In [23]:
pattern[1]

'greater'

In [24]:
saying[1]

'greater'

In [25]:
pat_match('I have dreamed a ?X'.split(), "I dreamed about dog".split())

[]

In [26]:
pattern = 'I have dreamed a ?X'.split()
pattern

['I', 'have', 'dreamed', 'a', '?X']

In [27]:
saying = "I dreamed about dog".split()
saying

['I', 'dreamed', 'about', 'dog']

In [28]:
pattern[1]

'have'

In [29]:
saying[1]

'dreamed'
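The walkthrough above shows that a failed match returns [], which is also what a successful match without variables returns. And because results are concatenated, a mismatch after a variable still yields a non-empty partial list: "?X greater than ?Y" against "3 less than 2" gives [('?X', '3')]. A minimal sketch that keeps the two cases apart, assuming we return None for failure and a dict of bindings on success (pat_match_strict is a hypothetical name). It also requires both token lists to be fully consumed and, unlike In [16], checks that a repeated variable binds the same token each time:

```python
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])


def pat_match_strict(pattern, saying, bindings=None):
    """Return a dict of variable bindings, or None if the pattern does not match."""
    bindings = {} if bindings is None else bindings
    if not pattern or not saying:
        # Success only if both sides are exhausted together
        return bindings if not pattern and not saying else None
    head, token = pattern[0], saying[0]
    if is_variable(head):
        if head in bindings and bindings[head] != token:
            return None  # same variable bound to two different tokens
        return pat_match_strict(pattern[1:], saying[1:], {**bindings, head: token})
    if head != token:
        return None
    return pat_match_strict(pattern[1:], saying[1:], bindings)


print(pat_match_strict("?X greater than ?Y".split(), "3 greater than 2".split()))
# {'?X': '3', '?Y': '2'}
print(pat_match_strict("?X greater than ?Y".split(), "3 less than 2".split()))
# None
```

With this convention, callers test `result is not None` instead of relying on the truthiness of a list.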

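Once we know what each variable is bound to, we can easily substitute the values into a predefined template.

To make the substitution easier, we create two new functions: one converts the parsed result into a dictionary, and the other performs the substitution on a template according to that dictionary.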

In [30]:
def pat_to_dict(patterns):
    return {k: v for k, v in patterns}

***Detail Analysis***

In [31]:
p = [(1,2), (3,4), (5,6)]

In [32]:
pat_to_dict(p)

{1: 2, 3: 4, 5: 6}

In [33]:
for k, v in p:
    print(k)
    print(v)
    print('----')

1
2
----
3
4
----
5
6
----


In [34]:
def substitute(rule, parsed_rules):
    if not rule: return []
    
    return [parsed_rules.get(rule[0], rule[0])] + substitute(rule[1:], parsed_rules)
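For reference, pat_to_dict is equivalent to the built-in dict(pairs), and substitute recurses once per token, so a very long template could in principle hit Python's default recursion limit (typically 1000). An iterative sketch with the same behavior:

```python
def pat_to_dict(patterns):
    # dict() accepts an iterable of (key, value) pairs directly
    return dict(patterns)


def substitute(rule, parsed_rules):
    # Replace each token that is a bound variable; keep every other token as-is
    return [parsed_rules.get(token, token) for token in rule]


bindings = pat_to_dict([('?X', 'iPhone')])
print(' '.join(substitute("Why do you need ?X ?".split(), bindings)))
# Why do you need iPhone ?
```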

In [35]:
got_patterns = pat_match("I want ?X".split(), "I want iPhone".split())
got_patterns

[('?X', 'iPhone')]

In [36]:
substitute("What if you mean if you got a ?X".split(), pat_to_dict(got_patterns))

['What', 'if', 'you', 'mean', 'if', 'you', 'got', 'a', 'iPhone']

***Detail Analysis***

In [37]:
rule = "What if you mean if you got a ?X".split()
rule

['What', 'if', 'you', 'mean', 'if', 'you', 'got', 'a', '?X']

In [38]:
rule[0]

'What'

In [39]:
parsed_rules = pat_to_dict(got_patterns)
parsed_rules

{'?X': 'iPhone'}

In [40]:
[parsed_rules.get(rule[0])]

[None]

In [41]:
[parsed_rules.get(rule[0], rule[0])]

['What']

Turning the output above into a sentence is just as simple: we use Python's join method:

In [42]:
john_pat = pat_match('?P needs ?X'.split(), "John needs resting".split())

In [43]:
' '.join(substitute("What if you mean if you got a ?X".split(), pat_to_dict(got_patterns)))

'What if you mean if you got a iPhone'

In [44]:
john_pat = pat_match('?P needs ?X'.split(), "John needs vacation".split())

In [45]:
substitute("Why does ?P need ?X ?".split(), pat_to_dict(john_pat))

['Why', 'does', 'John', 'need', 'vacation', '?']

In [46]:
' '.join(substitute("Why does ?P need ?X ?".split(), pat_to_dict(john_pat)))

'Why does John need vacation ?'

----

## Issue 1

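Write a function get_response(saying, response_rules) whose input is a string plus the rules we define (such as the patterns written above) and whose output is a reply.

Once we define some patterns, we can generate template-based dialog: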

### Complete Version: Single-Word Variables

In [47]:
# return boolean
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])

In [48]:
# return pattern dictionary
def pat_to_dict(pattern):
    return {k: v for k, v in pattern}

In [49]:
# return pattern list, type = [()]
def pat_match(rule, saying):
    if not rule or not saying: return []
    
    if is_variable(rule[0]):
        return [(rule[0], saying[0])] + pat_match(rule[1:], saying[1:])
    else:
        if rule[0] != saying[0]: return []
        else:
            return pat_match(rule[1:], saying[1:])

In [50]:
# return str list
def substitute(rule, pattern_dic):
    if not rule: return []
    
    return [pattern_dic.get(rule[0], rule[0])] + substitute(rule[1:], pattern_dic)

In [51]:
import random
def get_response(saying, rules):
    """Match `saying` against every pattern in `rules` and fill a randomly
    chosen response template with the captured variables.

    Example: get_response('I need iPhone', defined_patterns) returns one of
    the templates for "I need ?X" with ?X replaced by 'iPhone'.
    If several patterns match, the last matching one is used.
    """
    response = 'Sorry, I can not understand!'
    
    for rule, responses in rules.items():
        pattern = pat_match(rule.split(), saying.split())
        
        if pattern:
            pattern_dic = pat_to_dict(pattern)
            res_rule = random.choice(responses).split()
            response = ' '.join(substitute(res_rule, pattern_dic))
            
    return response

***Test***

In [52]:
# Set rules
defined_patterns = {
    "I need ?X": ["Imagine you will get ?X soon", "Why do you need ?X ?"], 
    "My ?X told me something": ["Tell me more about your ?X", "What do you think about your ?X ?"]
}

In [53]:
# Test 1
saying = "My mother told me something"
get_response(saying, defined_patterns)

'Tell me more about your mother'

In [54]:
# Test 2
saying = 'I need iPhone'
get_response(saying, defined_patterns)

'Imagine you will get iPhone soon'

In [55]:
# Test 3
saying = 'I love iPhone'
get_response(saying, defined_patterns)

'Sorry, I can not understand!'
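Two details of get_response are worth knowing: the loop does not stop at the first match, so when several rules match, the last one in dictionary order wins; and random.choice makes the replies hard to test. A sketch of a variant, assuming we want the first matching rule and an injectable random source for reproducible output (get_response_first and its rng parameter are hypothetical, not part of the assignment):

```python
import random


def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])


def pat_match(rule, saying):
    # Same token-by-token matcher as above: a list of (variable, token) pairs
    if not rule or not saying: return []
    if is_variable(rule[0]):
        return [(rule[0], saying[0])] + pat_match(rule[1:], saying[1:])
    if rule[0] != saying[0]: return []
    return pat_match(rule[1:], saying[1:])


def get_response_first(saying, rules, rng=random,
                       default='Sorry, I can not understand!'):
    """Reply using the FIRST rule whose pattern matches `saying`.

    `rng` is anything with a .choice() method; pass random.Random(seed)
    to make the reply reproducible.
    """
    for rule, responses in rules.items():
        pairs = pat_match(rule.split(), saying.split())
        if pairs:
            bindings = dict(pairs)
            template = rng.choice(responses).split()
            return ' '.join(bindings.get(tok, tok) for tok in template)
    return default


rules = {"I need ?X": ["Why do you need ?X ?"],
         "I ?Y ?X": ["Do you really ?Y ?X ?"]}
print(get_response_first("I need iPhone", rules, rng=random.Random(0)))
# Why do you need iPhone ?
```

Both rules match "I need iPhone"; this variant answers with the first one, whereas the loop in get_response would keep the second.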

----

## Segment Match

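The approach above can already carry on a basic conversation, but our patterns match word for word: "I need iPhone" matches "I need ?X", but "I need an iPhone" cannot be captured correctly, because ?X binds exactly one word. What can we do?

To solve this, we can introduce a new variable type "?\*X". The extra asterisk (\*) means the variable can match a sequence of several words.

First, as before, we need a function that checks whether a token is such a multi-word (segment) variable.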

In [56]:
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])

In [57]:
def is_pattern_segment(pattern):
    return pattern.startswith('?*') and all(a.isalpha() for a in pattern[2:])

In [58]:
is_pattern_segment('?*P')

True
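Because '*' is not a letter, is_variable rejects a segment token such as '?*X' while is_pattern_segment accepts it, and the reverse holds for '?X'. The two checks never overlap, so the matcher can test them in either order. A quick check:

```python
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])


def is_pattern_segment(pattern):
    return pattern.startswith('?*') and all(a.isalpha() for a in pattern[2:])


for token in ['?X', '?*X', 'want']:
    print(token, is_variable(token), is_pattern_segment(token))
# ?X True False
# ?*X False True
# want False False
```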

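The most important new function here is segment_match. Its input is a pattern that starts with a segment variable. It scans the input for the first position where the next literal word of the pattern appears and the words after it also match (checked by is_match), and binds everything before that position to the variable-length variable. It returns that binding together with the number of tokens consumed, or fail if no such position exists.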

In [59]:
fail = [True, None]

def segment_match(pattern, saying):
    seg_pat, rest = pattern[0], pattern[1:]
    seg_pat = seg_pat.replace('?*', '?')

    if not rest: return (seg_pat, saying), len(saying)    
    
    for i, token in enumerate(saying):
        if rest[0] == token and is_match(rest[1:], saying[(i + 1):]):
            return (seg_pat, saying[:i]), i
    
    return fail

# Check whether the rest of pattern matches with the rest of saying.
# A token containing non-letters (e.g. another variable) ends the check early.
def is_match(rest, saying):
    if not rest:
        return not saying
    if not all(a.isalpha() for a in rest[0]):
        return True
    if not saying or rest[0] != saying[0]:
        return False
    return is_match(rest[1:], saying[1:])

Then we rewrite the earlier pat_match as follows; the main addition is the is_pattern_segment branch.

In [60]:
def pat_match_with_seg(pattern, saying):
    if not pattern or not saying: return []
    
    pat = pattern[0]
    
    if is_variable(pat):
        rest_match = pat_match_with_seg(pattern[1:], saying[1:])
        # Propagate failure instead of returning a partial match
        return fail if rest_match == fail else [(pat, saying[0])] + rest_match
    
    elif is_pattern_segment(pat):
        seg = segment_match(pattern, saying)  # call once and reuse the result
        if seg == fail:
            return fail
        match, index = seg
        rest_match = pat_match_with_seg(pattern[1:], saying[index:])
        return fail if rest_match == fail else [match] + rest_match
    
    elif pat == saying[0]:
        return pat_match_with_seg(pattern[1:], saying[1:])
    
    else:
        return fail

In [61]:
segment_match('?*P is very good'.split(), "My dog and my cat is very good".split())

(('?P', ['My', 'dog', 'and', 'my', 'cat']), 5)

***Detail Analysis***

In [62]:
pattern = '?*P is very good'.split()
saying = "My dog and my cat is very good".split()
print('pattern:', pattern)
print('saying:', saying)

pattern: ['?*P', 'is', 'very', 'good']
saying: ['My', 'dog', 'and', 'my', 'cat', 'is', 'very', 'good']


In [63]:
seg_pat, rest = pattern[0], pattern[1:]
print('seg_pat:', seg_pat)
print('rest:', rest)

seg_pat: ?*P
rest: ['is', 'very', 'good']


In [64]:
seg_pat = seg_pat.replace('?*', '?')
print('seg_pat:', seg_pat)

seg_pat: ?P


In [65]:
rest[0]

'is'

In [66]:
all(a.isalpha() for a in rest[0])

True

In [67]:
for i, token in enumerate(saying):
    if rest[0] == token and is_match(rest[1:], saying[(i + 1):]):
        print(rest[1:], saying[(i+1):])

['very', 'good'] ['very', 'good']


In [68]:
# rest = []
# if not rest:
#     print((seg_pat, saying), len(saying))

In [69]:
# rest[0] = 'go??od'
# if not all(a.isalpha() for a in rest[0]):
#     print('True')
# else:
#     print('None')

In [70]:
pat_match_with_seg("?*X greater than ?*Y".split(), "My mother told me something".split())

[True, None]

Now we can match patterns like the following:

In [71]:
pat_match_with_seg('?*P is very good and ?*X'.split(), 
                   "My dog is very good and my cat is very cute".split())

[('?P', ['My', 'dog']), ('?X', ['my', 'cat', 'is', 'very', 'cute'])]
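segment_match binds the segment to everything before the first occurrence of the next literal word that passes the is_match look-ahead, and never revisits that choice. Because it compares rest[0] with an input token, a segment followed directly by another variable (for example '?*X ?Y' or '?*X ?*Y') fails on ordinary input. A common alternative is backtracking: try each possible segment length and keep the first one for which the rest of the pattern also matches. A minimal sketch of that technique, returning a dict on success and None on failure (match_backtrack is a hypothetical name, not part of the original code):

```python
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])


def is_pattern_segment(pat):
    return pat.startswith('?*') and all(a.isalpha() for a in pat[2:])


def match_backtrack(pattern, saying, bindings=None):
    """Return {variable: value} if pattern matches all of saying, else None.

    Single variables (?X) bind one token; segment variables (?*X) bind a
    list of zero or more tokens and are stored under '?X'.
    """
    bindings = {} if bindings is None else bindings
    if not pattern:
        return bindings if not saying else None
    head, rest = pattern[0], pattern[1:]
    if is_pattern_segment(head):
        name = '?' + head[2:]
        # Try the shortest segment first, then grow it; backtrack on failure
        for end in range(len(saying) + 1):
            result = match_backtrack(rest, saying[end:], {**bindings, name: saying[:end]})
            if result is not None:
                return result
        return None
    if not saying:
        return None
    if is_variable(head):
        return match_backtrack(rest, saying[1:], {**bindings, head: saying[0]})
    return match_backtrack(rest, saying[1:], bindings) if head == saying[0] else None


print(match_backtrack('?*X ?Y'.split(), 'I really need help'.split()))
# {'?X': ['I', 'really', 'need'], '?Y': 'help'}
```

The cost is that a pattern with several segment variables may try many split points, which is usually acceptable for short chat sentences.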

***Detail Analysis***

In [72]:
pattern = '?*P is very good and ?*X'.split()
print('pattern:', pattern)

pattern: ['?*P', 'is', 'very', 'good', 'and', '?*X']


In [73]:
saying = "My dog is very good and my cat is very cute".split()
print('saying:', saying)

saying: ['My', 'dog', 'is', 'very', 'good', 'and', 'my', 'cat', 'is', 'very', 'cute']


In [74]:
pat = pattern[0]
print('pat:', pat)

pat: ?*P


In [75]:
saying[0]

'My'

If we keep defining templates and matching against them, we can handle more complex inputs:

In [76]:
response_pair = {
    'I need ?X': ["Why do you need ?X"],
    "I don't like my ?X": ["What bad things did ?X do to you?"]
}

In [77]:
pattern = pat_match_with_seg('I need ?*X'.split(), "I need an iPhone".split())
substitute("Why do you need ?X".split(), pat_to_dict(pattern))

['Why', 'do', 'you', 'need', ['an', 'iPhone']]

***Detail Analysis***

In [78]:
pat_match_with_seg('I need ?*X'.split(), "I need an iPhone".split())

[('?X', ['an', 'iPhone'])]

In [79]:
pat_to_dict(pattern)

{'?X': ['an', 'iPhone']}

In [80]:
"Why do you need ?X".split()

['Why', 'do', 'you', 'need', '?X']

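We can see that pat_to_dict has a small problem in this scenario: a segment variable is bound to a list of tokens, so the list ends up nested inside the response. That is easy to fix: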

In [81]:
def pat_to_dict(patterns):
    return {k: ' '.join(v) if isinstance(v, list) else v for k, v in patterns}

In [82]:
pattern = pat_match_with_seg('I need ?*X'.split(), "I need an iPhone".split())
substitute("Why do you need ?X".split(), pat_to_dict(pattern))

['Why', 'do', 'you', 'need', 'an iPhone']

----

### Complete Version: Segment Variables

In [83]:
# return boolean
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])

In [84]:
# return boolean
def is_pattern_segment(pattern):
    return pattern.startswith('?*') and all(a.isalpha() for a in pattern[2:])

In [85]:
# Return ((variable, matched tokens), number of tokens consumed),
# or fail if no split point works. type = ((str, [str]), int)

fail = [True, None]

def segment_match(pattern, saying):
    seg_pat, rest = pattern[0], pattern[1:]
    seg_pat = seg_pat.replace('?*', '?')

    if not rest: return (seg_pat, saying), len(saying)    
    
    for i, token in enumerate(saying):
        if rest[0] == token and is_match(rest[1:], saying[(i + 1):]):
            return (seg_pat, saying[:i]), i
    
    return fail

# Check whether the rest of pattern matches with the rest of saying.
# A token containing non-letters (e.g. another variable) ends the check early.
# Return boolean
def is_match(rest, saying):
    if not rest:
        return not saying
    if not all(a.isalpha() for a in rest[0]):
        return True
    if not saying or rest[0] != saying[0]:
        return False
    return is_match(rest[1:], saying[1:])

In [86]:
def pat_match_with_seg(pattern, saying):
    if not pattern or not saying: return []
    
    pat = pattern[0]
    
    if is_variable(pat):
        rest_match = pat_match_with_seg(pattern[1:], saying[1:])
        # Propagate failure instead of returning a partial match
        return fail if rest_match == fail else [(pat, saying[0])] + rest_match
    
    elif is_pattern_segment(pat):
        seg = segment_match(pattern, saying)  # call once and reuse the result
        if seg == fail:
            return fail
        match, index = seg
        rest_match = pat_match_with_seg(pattern[1:], saying[index:])
        return fail if rest_match == fail else [match] + rest_match
    
    elif pat == saying[0]:
        return pat_match_with_seg(pattern[1:], saying[1:])
    
    else:
        return fail

In [87]:
def pat_to_dict(pattern):
    return {k: ' '.join(v) if isinstance(v, list) else v for k, v in pattern}  

In [88]:
def substitute(rule, pattern_dic):
    if not rule: return []
    
    return [pattern_dic.get(rule[0], rule[0])] + substitute(rule[1:], pattern_dic)

In [89]:
import random
def get_response(saying, rules):
    response = 'Sorry, I can not understand!'
    
    for rule, responses in rules.items():
        pattern = pat_match_with_seg(rule.split(), saying.split())
        
        if pattern and pattern != fail:
            pattern_dic = pat_to_dict(pattern)
            res_rule = random.choice(responses).split()
            response = ' '.join(substitute(res_rule, pattern_dic))
            
    return response

***Test***

Suppose we define the following templates:

In [90]:
# Set rules
rule_responses = {
    '?*x hello ?*y': ['How do you do', 'Please state your problem'],
    '?*x I want ?*y': ['what would it mean if you got ?y', 'Why do you want ?y', 'Suppose you got ?y soon'],
    '?*x if ?*y': ['Do you really think it\'s likely that ?y', 'Do you wish that ?y', 'What do you think about ?y', 'Really-- if ?y'],
    '?*x no ?*y': ['why not?', 'You are being a bit negative', 'Are you saying \'No\' just to be negative?'],
    '?*x I was ?*y': ['Were you really', 'Perhaps I already knew you were ?y', 'Why do you tell me you were ?y now?'],
    "I was ?*X": ["Were you really ?X ?", "I already knew you were ?X ."]
}

In [91]:
# Test 1
saying = "I am mike, hello"
for i in range(3):
    print(get_response(saying, rule_responses))

How do you do
Please state your problem
Please state your problem


In [92]:
# Test 2
saying = "Mom I want an iphone"
for i in range(3):
    print(get_response(saying, rule_responses))

Why do you want an iphone
what would it mean if you got an iphone
Why do you want an iphone


In [93]:
# Test 3
saying = "I love iphone"
get_response(saying, rule_responses)

'Sorry, I can not understand!'

----

## Issue 2

Rewrite the program above so that it supports Chinese input. Hint: you may need the jieba word segmentation library.

In [94]:
import random
import jieba

fail = [True, 'None']

# Check if the pattern is variable ?X
# return boolean
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])

# Check if the pattern is segment variable ?*X
# return boolean
def is_pattern_segment(pattern):
    return pattern.startswith('?*') and all(a.isalpha() for a in pattern[2:])


# Return pair of pattern_segment, matched pattern value, and len
# type = ((str, [str]), int)
def segment_match(pattern, saying):
    seg_pat, rest = pattern[0], pattern[1:]
    seg_pat = seg_pat.replace('?*', '?')

    if not rest: return (seg_pat, saying), len(saying)    
    
    for i, token in enumerate(saying):
        if rest[0] == token and is_match(rest[1:], saying[(i + 1):]):
            return (seg_pat, saying[:i]), i
    
    return fail


# Check whether the rest of pattern matches with the rest of saying.
# A token containing non-letters (e.g. another variable) ends the check early.
# Return boolean
def is_match(rest, saying):
    if not rest:
        return not saying
    if not all(a.isalpha() for a in rest[0]):
        return True
    if not saying or rest[0] != saying[0]:
        return False
    return is_match(rest[1:], saying[1:])


# Match the pattern variables with input string and return the pair list,
# or fail if the pattern does not match
def pat_match_with_seg(pattern, saying):
    if not pattern or not saying: return []
    
    pat = pattern[0]
    
    if is_variable(pat):
        rest_match = pat_match_with_seg(pattern[1:], saying[1:])
        # Propagate failure instead of returning a partial match
        return fail if rest_match == fail else [(pat, saying[0])] + rest_match
    
    elif is_pattern_segment(pat):
        seg = segment_match(pattern, saying)  # call once and reuse the result
        if seg == fail:
            return fail
        match, index = seg
        rest_match = pat_match_with_seg(pattern[1:], saying[index:])
        return fail if rest_match == fail else [match] + rest_match
    
    elif pat == saying[0]:
        return pat_match_with_seg(pattern[1:], saying[1:])
    
    else:
        return fail


def pat_to_dict(pattern):
    # Join multi-token captures without spaces, since written Chinese has none
    return {k: ''.join(v) if isinstance(v, list) else v for k, v in pattern}


def substitute(rule, pattern_dic):
    if not rule: return []
    return [pattern_dic.get(rule[0], rule[0])] + substitute(rule[1:], pattern_dic)


def get_response(saying, rules):
    response = '对不起，我不明白你的意思，能说的详细一点吗？'
    # jieba splits '?*x' into '?', '*', 'x'; these tables glue the variables back together
    replace1 = {'? * x': '?*x', '? * y': '?*y', '? * z': '?*z'}
    replace2 = {'? x': '?x', '? y': '?y', '? z': '?z'} 
    
    saying_cut = jieba.lcut(saying)
    
    for rule, responses in rules.items():
        rule_str = ' '.join(jieba.cut(rule))
        for k, v in replace1.items():
            rule_str = rule_str.replace(k, v)
        
        pattern = pat_match_with_seg(rule_str.split(), saying_cut)
        
        # Keep the last matching rule; 'None' marks a partial match that ended in fail
        if pattern and pattern != fail and 'None' not in pattern:
            pattern_dic = pat_to_dict(pattern)
            res_rule = random.choice(responses)
            res_rule_str = ' '.join(jieba.cut(res_rule))
            for k, v in replace2.items():
                res_rule_str = res_rule_str.replace(k, v)
                
            # Chinese needs no spaces between words, so join with ''
            response = ''.join(substitute(res_rule_str.split(), pattern_dic))
            
    return response

***Test***

In [95]:
rules_dic = {
    '?*x你好?*y': ['你好呀', '请告诉我你的问题'],
    '?*x我想?*y': ['你觉得?y有什么意义呢？', '为什么你想?y', '你可以想想你很快就可以?y了'],
    '?*x我想要?*y': ['?x想问你，你觉得?y有什么意义呢?', '为什么你想?y', '?x觉得... 你可以想想你很快就可以有?y了', '你看?x像?y不', '我看你就像?y'],
    '?*x喜欢?*y': ['喜欢?y的哪里？', '?y有什么好的呢？', '你想要?y吗？'],
    '?*x讨厌?*y': ['?y怎么会那么讨厌呢?', '讨厌?y的哪里？', '?y有什么不好呢？', '你不想要?y吗？'],
    '?*xAI?*y': ['你为什么要提AI的事情？', '你为什么觉得AI要解决你的问题？'],
    '?*x机器人?*y': ['你为什么要提机器人的事情？', '你为什么觉得机器人要解决你的问题？'],
    '?*x对不起?*y': ['不用道歉', '你为什么觉得你需要道歉呢?'],
    '?*x我记得?*y': ['你经常会想起这个吗？', '除了?y你还会想起什么吗？', '你为什么和我提起?y'],
    '?*x如果?*y': ['你真的觉得?y会发生吗？', '你希望?y吗?', '真的吗？如果?y的话', '关于?y你怎么想？'],
    '?*x我?*z梦见?*y':['真的吗? --- ?y', '你在醒着的时候，以前想象过?y吗？', '你以前梦见过?y吗'],
    '?*x妈妈?*y': ['你家里除了?y还有谁?', '嗯嗯，多说一点和你家里有关系的', '她对你影响很大吗？'],
    '?*x爸爸?*y': ['你家里除了?y还有谁?', '嗯嗯，多说一点和你家里有关系的', '他对你影响很大吗？', '每当你想起你爸爸的时候， 你还会想起其他的吗?'],
    '?*x我愿意?*y': ['我可以帮你?y吗？', '你可以解释一下，为什么想?y'],
    '?*x我很难过，因为?*y': ['我听到你这么说， 也很难过', '?y不应该让你这么难过的'],
    '?*x难过?*y': ['我听到你这么说， 也很难过', '不应该让你这么难过的，你觉得你拥有什么，就会不难过?',
                 '你觉得事情变成什么样，你就不难过了?'],
    '?*x就像?*y': ['你觉得?x和?y有什么相似性？', '?x和?y真的有关系吗？', '怎么说？'],
    '?*x和?*y都?*z': ['你觉得?z有什么问题吗?', '?z会对你有什么影响呢?'],
    '?*x和?*y一样?*z': ['你觉得?z有什么问题吗?', '?z会对你有什么影响呢?'],
    '?*x我是?*y': ['真的吗？', '?x想告诉你，或许我早就知道你是?y', '你为什么现在才告诉我你是?y'],
    '?*x我是?*y吗': ['如果你是?y会怎么样呢？', '你觉得你是?y吗', '如果你是?y，那意味着什么?'],
    '?*x你是?*y吗':  ['你为什么会对我是不是?y感兴趣?', '那你希望我是?y吗', '你要是喜欢， 我就会是?y'],
    '?*x你是?*y' : ['为什么你觉得我是?y'],
    '?*x因为?*y' : ['?y是真正的原因吗？', '你觉得会有其他原因吗?'],
    '?*x我不能?*y': ['你或许现在就能?y', '如果你能?y,会怎样呢？'],
    '?*x我觉得?*y': ['你经常这样感觉吗？', '除了这个，你还有什么其他的感觉吗？'],
    '?*x我?*y你?*z': ['其实很有可能我们互相?y'],
    '?*x你为什么不?*y': ['你自己为什么不?y', '你觉得我不会?y', '等我心情好了，我就?y'],
    '?*x好的?*y': ['好的', '你是一个很正能量的人'],
    '?*x嗯嗯?*y': ['好的', '你是一个很正能量的人'],
    '?*x不嘛?*y': ['为什么不？', '你有一点负能量', '你说 不，是想表达不想的意思吗？'],
    '?*x不要?*y': ['为什么不？', '你有一点负能量', '你说 不，是想表达不想的意思吗？'],
    '?*x有些人?*y': ['具体是哪些人呢?'],
    '?*x有的人?*y': ['具体是哪些人呢?'],
    '?*x某些人?*y': ['具体是哪些人呢?'],
    '?*x每个人?*y': ['我确定不是人人都是', '你能想到一点特殊情况吗？', '例如谁？', '你看到的其实只是一小部分人'],
    '?*x所有人?*y': ['我确定不是人人都是', '你能想到一点特殊情况吗？', '例如谁？', '你看到的其实只是一小部分人'],
    '?*x总是?*y': ['你能想到一些其他情况吗?', '例如什么时候?', '你具体是说哪一次？', '真的---总是吗？'],
    '?*x一直?*y': ['你能想到一些其他情况吗?', '例如什么时候?', '你具体是说哪一次？', '真的---总是吗？'],
    '?*x或许?*y': ['你看起来不太确定'],
    '?*x可能?*y': ['你看起来不太确定'],
    '?*x他们是?*y吗？': ['你觉得他们可能不是?y？']
}

In [96]:
# Test 1
saying = "我讨厌下雨"
for i in range(3):
    print(get_response(saying, rules_dic))


下雨怎么会那么讨厌呢?
你不想要下雨吗？
你不想要下雨吗？


In [97]:
# Test 2
saying = "明天我想游泳"
for i in range(3):
    print(get_response(saying, rules_dic))

为什么你想游泳
为什么你想游泳
你觉得游泳有什么意义呢？


In [98]:
# Test 3
saying = "明天如果下雨"
for i in range(3):
    print(get_response(saying, rules_dic))

真的吗？如果下雨的话
关于下雨你怎么想？
真的吗？如果下雨的话


In [99]:
# Test 4
saying = "自然语言处理"
get_response(saying, rules_dic)

'对不起，我不明白你的意思，能说的详细一点吗？'

***Detail Analysis***

In [100]:
saying = "我讨厌下雨"
rule = '?*x讨厌?*y'

In [101]:
saying_cut = [word for word in jieba.cut(saying)]

In [102]:
saying_cut

['我', '讨厌', '下雨']

In [103]:
rule_cut = [word for word in jieba.cut(rule)]
rule_cut

['?', '*', 'x', '讨厌', '?', '*', 'y']

In [104]:
rule_str = ' '.join(jieba.cut(rule))
rule_str

'? * x 讨厌 ? * y'

In [105]:
replace = {'? * x': '?*x',
          '? * y': '?*y',
          '? * z': '?*z'}

In [106]:
for k, v in replace.items():
    rule_str = rule_str.replace(k, v)

In [107]:
rule_str

'?*x 讨厌 ?*y'
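The replace table is a workaround for jieba splitting '?*x' into '?', '*', 'x', and it only covers the names x, y and z. It also depends on jieba emitting the variable letter as its own token, which may not hold when the letter is followed by Latin letters, as in '?*xAI?*y'. A sketch of another approach, assuming variable names are a single ASCII letter as in rules_dic: split the variables out with a regular expression first, then segment only the literal text. tokenize_rule and its cut parameter are hypothetical; in practice you would pass jieba.lcut, while the default below keeps each literal piece whole so the example runs without jieba:

```python
import re

# Assumption: a variable is '?' or '?*' followed by exactly ONE ASCII letter,
# matching the names x, y, z used in rules_dic. The capturing group makes
# re.split keep each variable in the result, at the odd indices.
VAR_RE = re.compile(r'(\?\*?[A-Za-z])')


def tokenize_rule(rule, cut=lambda text: [text]):
    """Split a rule or response template into tokens, keeping '?*x' / '?y' whole.

    `cut` segments the literal text between variables (e.g. jieba.lcut).
    """
    tokens = []
    for i, piece in enumerate(VAR_RE.split(rule)):
        if i % 2 == 1:        # a captured variable
            tokens.append(piece)
        elif piece:           # literal text: segment it, drop pure whitespace
            tokens.extend(t for t in cut(piece) if t.strip())
    return tokens


print(tokenize_rule('?*x讨厌?*y'))
# ['?*x', '讨厌', '?*y']
print(tokenize_rule('?*xAI?*y'))
# ['?*x', 'AI', '?*y']

# Response templates are tokenized the same way, so no second replace table is needed
template = tokenize_rule('?y怎么会那么讨厌呢?')
print(''.join({'?y': '下雨'}.get(t, t) for t in template))
# 下雨怎么会那么讨厌呢?
```

The same cut function should be used for the rule and for the user's sentence, so that literal words are segmented consistently on both sides.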

In [108]:
pattern = pat_match_with_seg(rule_str.split(), saying_cut)
pattern

[('?x', ['我']), ('?y', ['下雨'])]

In [109]:
print(pattern and pattern != fail)

True


In [110]:
pattern_dic = pat_to_dict(pattern)
pattern_dic

{'?x': '我', '?y': '下雨'}

In [111]:
res_rule = '?y怎么会那么讨厌呢?'

In [112]:
res_rule_str = ' '.join(jieba.cut(res_rule))
res_rule_str

'? y 怎么 会 那么 讨厌 呢 ?'

In [113]:
replace2 = {'? x': '?x', '? y': '?y', '? z': '?z'}

In [114]:
for k, v in replace2.items():
    res_rule_str = res_rule_str.replace(k, v)

In [115]:
res_rule_str

'?y 怎么 会 那么 讨厌 呢 ?'

In [116]:
substitute(res_rule_str.split(), pattern_dic)

['下雨', '怎么', '会', '那么', '讨厌', '呢', '?']

In [117]:
''.join(substitute(res_rule_str.split(), pattern_dic))

'下雨怎么会那么讨厌呢?'

----

## Issue 4

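1. What are the advantages and disadvantages of this program? How could it be improved?
2. What does "data-driven" mean? How is it reflected in this program?
3. What is the relationship between data-driven approaches and AI?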

### Ans 1

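**Advantages:**<br>
It matches the input sentence against predefined rules and, on a successful match, replies with a response chosen at random from the candidates. This makes it somewhat entertaining, and repeated runs can produce different replies.

**Disadvantages:**
1. Writing the matching rules and responses in advance takes a great deal of time.
2. With too many rules, matching may become inaccurate and may take too long.

**Ideas for improvement:**<br>
Try defining grammar rules, which would reduce the number of mechanical matching rules needed.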

### Ans 2

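**Data-driven:**<br>
My understanding is that a data-driven system collects large amounts of data, cleans and organizes it, extracts the relevant information, and trains and fits models on the historical data to make decisions automatically; whenever new data comes in, the system can decide on its own.

**How it shows up in this program:**<br>
The program's behavior is defined by the rule dictionary (the data) rather than by hard-coded logic: the system matches the input sentence (the data) against these rules and returns a specific reply (the decision).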
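To make the data-driven side more concrete: the matching code never changes, and the bot's behavior comes entirely from the rule dictionary. A minimal sketch, assuming the rules are stored as a JSON document (the content and the load_rules helper below are hypothetical examples), shows that new behavior can be added by editing data only; the loaded dictionary would then be passed to get_response:

```python
import json

# Hypothetical rule file content; in practice this could live in a rules.json file
RULES_JSON = '''
{
    "I need ?X": ["Why do you need ?X ?"],
    "My ?X told me something": ["Tell me more about your ?X"]
}
'''


def load_rules(text):
    """Parse rules from JSON: pattern string -> list of response templates."""
    rules = json.loads(text)
    for pattern, responses in rules.items():
        if not isinstance(responses, list) or not responses:
            raise ValueError(f'rule {pattern!r} needs a non-empty list of responses')
    return rules


rules = load_rules(RULES_JSON)
print(sorted(rules))
# ['I need ?X', 'My ?X told me something']
```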

### Ans 3

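My understanding is that training and fitting on historical data both rely on machine learning methods. In that sense, without AI there would be no automated decision system, and therefore no data-driven approach.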