# Dialogue system and chatbot

## definition

| Attributes          | Dialogue Systems                              |
|---------------------|-----------------------------------------------|
| Input               | Series of utterances in a conversation        |
| Output              | Series of responses maintaining a conversation|
| Scope               | Multi-turn                                    |
| Type                | Task-driven, open-ended (chitchat)            |
| Components          | Natural Language Understanding (NLU), Dialogue Management, Natural Language Generation (NLG) |
| Algorithm           | Rule-based, IR-based, neural                  |
| Evaluation Metrics  | Dialogue success rate, slot error rate, user satisfaction |
| Benchmark Datasets  | MultiWOZ, DSTC, Persona-Chat, Movie Dialog, Ubuntu Dialogue Corpus, Topical-Chat, Switchboard |
| Information Source  | Dialogue corpora, user utterances, knowledge bases, the web, external APIs, predefined rules, internal state |


## history

| Year | Era | Dialogue Systems | Dialogue Dataset |
| ---- | --- | ---------------- | ---------------- |
| 1960s | Knowledge-based and closed-domain | ELIZA (psychotherapy) | No |
| 1970s-1980s | More complex knowledge-based and closed-domain | SHRDLU (blocks world), PARRY (schizophrenia) | No |
| 1990s | IR-based and closed-domain | TRAINS (transportation), A.L.I.C.E. | Switchboard, ATIS |
| 2000s | IR-based and open-domain | POMDP-based systems | No |
| 2010s-Present | Neural | Apple's Siri, Google Now, Microsoft's Cortana, Amazon's Alexa, OpenAI's ChatGPT | Movie Dialog, Ubuntu Dialogue, MultiWOZ |


## dialog

The words "dialog" (American English) and "dialogue" (British English) are used interchangeably; both refer to a conversation between two individuals.

### speech act

Constative: the speaker commits to something being the case (answer, claim, confirm, deny, disagree, state).

Directive: the speaker attempts to get the listener to do something (advise, ask, forbid, invite, order, request).

Commissive: the speaker commits to a future course of action (promise, plan, bet, oppose).

Acknowledgment: the speaker expresses an attitude toward the listener with respect to some social action (apologize, greet, thank, accept apology).
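
As a minimal sketch, the four classes above can be stored as a lookup table from dialogue-act labels to speech-act classes. The names `SPEECH_ACT_CLASSES` and `speech_act_class` are hypothetical, and the act lists are copied from the definitions above, so they are illustrative rather than exhaustive.

```python
from typing import Optional

# Hypothetical lookup table; the act labels are taken from the four definitions above.
SPEECH_ACT_CLASSES = {
    "constative": {"answer", "claim", "confirm", "deny", "disagree", "state"},
    "directive": {"advise", "ask", "forbid", "invite", "order", "request"},
    "commissive": {"promise", "plan", "bet", "oppose"},
    "acknowledgment": {"apologize", "greet", "thank", "accept apology"},
}


def speech_act_class(act: str) -> Optional[str]:
    """Return the speech-act class of a dialogue-act label, or None if it is not listed."""
    label = act.strip().lower()
    for cls, acts in SPEECH_ACT_CLASSES.items():
        if label in acts:
            return cls
    return None


print(speech_act_class("request"))  # directive
print(speech_act_class("Promise"))  # commissive
```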

### grounding

For communication to be successful, both parties have to know that they understand each other or where they misunderstand each other.

Common ground: the set of mutually agreed beliefs among the parties in a dialogue. Each party maintains:

- their own beliefs about the state of affairs being discussed;

- beliefs about the other party's beliefs about that state of affairs;

- beliefs about the other party's beliefs about their own beliefs.
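
A toy sketch of grounding, assuming beliefs are plain strings and that a proposition enters the common ground only after the listener acknowledges it. It tracks only the first two levels of belief listed above; `Party`, `utter`, and `common_ground` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Party:
    name: str
    beliefs: set[str] = field(default_factory=set)              # own beliefs about the world
    beliefs_about_other: set[str] = field(default_factory=set)  # what I think the other party believes


def utter(speaker: Party, listener: Party, prop: str, acknowledged: bool) -> None:
    """Speaker asserts `prop`; the listener either acknowledges it or not."""
    speaker.beliefs.add(prop)
    listener.beliefs_about_other.add(prop)  # listener now knows the speaker holds `prop`
    if acknowledged:
        listener.beliefs.add(prop)             # listener accepts the claim
        speaker.beliefs_about_other.add(prop)  # the acknowledgment tells the speaker so


def common_ground(a: Party, b: Party) -> set[str]:
    """Propositions both parties hold and each believes the other holds."""
    return a.beliefs & b.beliefs & a.beliefs_about_other & b.beliefs_about_other


agent, customer = Party("agent"), Party("customer")
utter(customer, agent, "meeting on the 12th", acknowledged=True)
utter(agent, customer, "two non-stop flights", acknowledged=False)
print(common_ground(agent, customer))  # {'meeting on the 12th'}
```

The unacknowledged claim stays in the speaker's private beliefs, which is exactly the gap that grounding acts such as "uh-huh" or "OK" are meant to close.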

### initiative

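- user-initiative: the user leads, e.g. asking questions that the system answers

- system-initiative: the system leads and asks the questions, e.g. job interview, survey questionnaire (form filling)

- mixed-initiative: either party can take the lead at different points in the conversation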

### inference

Conversational implicature: the speaker doesn't answer the question directly, but assumes the information provided allows the listener (here, the agent) to infer the requested information.

e.g. airline booking

Agent: And, **what day** in May did you want to travel? 

Customer: OK uh I need to be there for a meeting that’s **from the 12th to the 15th**.
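
The inference the agent has to make can be sketched as a hand-written rule, assuming the date range refers to an event at the destination and that arriving the day before is enough. The regular expression and `infer_departure_day` are hypothetical and ignore month boundaries; a real system would resolve dates against a calendar.

```python
import re
from typing import Optional

# Matches phrases like "from the 12th to the 15th" (day numbers only).
DATE_RANGE = re.compile(r"from the (\d{1,2})(?:st|nd|rd|th) to the (\d{1,2})(?:st|nd|rd|th)")


def infer_departure_day(utterance: str) -> Optional[int]:
    """Infer a travel day from an indirect answer that mentions an event's date range."""
    match = DATE_RANGE.search(utterance)
    if match is None:
        return None  # nothing to infer from; the agent should ask again
    event_start = int(match.group(1))
    return event_start - 1  # hypothetical rule: travel the day before the event starts


print(infer_departure_day("I need to be there for a meeting that's from the 12th to the 15th."))  # 11
```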

Gricean Maxims (philosopher H.P. Grice): guidelines for effective communication. The maxims are:

- Maxim of Quantity: Make your contribution as informative as required, but not more or less than is required.

- Maxim of Quality: Do not say what you believe to be false, and do not say that for which you lack adequate evidence.

- Maxim of Relevance: Be relevant. In other words, your contribution should relate to the purpose of the conversation.

- Maxim of Manner: Be clear, brief, and orderly. Avoid obscurity and ambiguity.

### structure

**Adjacency pair**: one speaker's utterance (the first part) prompts a response from another speaker (the second part). The two parts are tightly coupled, so the second part is highly predictable from the first. Adjacency pairs come in many types.

Here are a few examples:

Question-Answer:

A: "How are you?"

B: "I'm good, thanks."

Greeting-Greeting:

A: "Hello!"

B: "Hi there!"

Offer-Acceptance/Refusal:

A: "Would you like some coffee?"

B: "Yes, please." / "No, thank you."

Invitation-Acceptance/Decline:

A: "Would you like to come to the party tonight?"

B: "I'd love to!" / "I'm sorry, I can't."

Compliment-Response:

A: "I love your dress."

B: "Thank you!"
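
Because the second part is so predictable, a dialogue system can store adjacency pairs as a table from first-part type to the second-part types it expects. A minimal sketch; `EXPECTED_SECOND_PARTS`, `completes_pair`, and the act-type names are hypothetical labels for the examples above.

```python
# Hypothetical table: first-part type -> acceptable second-part types (from the examples above).
EXPECTED_SECOND_PARTS = {
    "question": {"answer"},
    "greeting": {"greeting"},
    "offer": {"acceptance", "refusal"},
    "invitation": {"acceptance", "decline"},
    "compliment": {"response"},
}


def completes_pair(first: str, second: str) -> bool:
    """True if `second` is an expected response type to a first part of type `first`."""
    return second in EXPECTED_SECOND_PARTS.get(first, set())


print(completes_pair("offer", "refusal"))           # True
print(completes_pair("question", "greeting"))       # False
print(sorted(EXPECTED_SECOND_PARTS["invitation"]))  # ['acceptance', 'decline']
```

A system can use the same table in the other direction: after asking a question, it knows to expect an answer and can narrow what its NLU listens for.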

opening and closing: conventional sequences that begin and end a conversation (e.g. greetings at the start, farewells at the end).

presequence: initial actions that set up or prepare for a main action in a conversation, e.g. "Are you there?", "Can I ask you something?"

feedback: listeners provide backchannels/continuers (nod, uh-huh, yeah, right) or assessments (great) to show they are paying attention or understand.

topic management: initiation, shift, and closure of topics during a conversation.

subdialogue: a short exchange embedded within the main conversation

- clarification

    User: What do you have going to **UNKNOWN WORD** on the 5th? 

    System: Let’s see, going **where** on the 5th?

    User: Going to Hong Kong. 

    System: OK, here are some flights...

- correction

    Agent: OK. There's **two non-stops**

    Client: **Actually**, what day of the week is the 15th? 
    
    Agent: It’s a Friday.

    Client: Uh hmm. I would consider staying there an extra day til Sunday.
    
    Agent: OK...OK. On Sunday I have ...
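
The clarification subdialogue can be sketched as a template-based response: if a required slot could not be recognized, the system reprises what it did understand and puts a wh-word where the missing value goes. The slot names `destination` and `date` and the function `respond` are hypothetical.

```python
from typing import Dict, Optional


def respond(parse: Dict[str, Optional[str]]) -> str:
    """Ask a clarification question if a slot is missing, otherwise continue the task."""
    destination = parse.get("destination")
    date = parse.get("date")
    if destination is None or date is None:
        # Reprise the understood words; put a wh-word where the value is missing.
        return f"Let's see, going {destination or 'where'} on {date or 'what day'}?"
    return f"OK, here are some flights to {destination} on {date}..."


# First turn: the destination was an unknown word, so it is missing from the parse.
print(respond({"destination": None, "date": "the 5th"}))
# Let's see, going where on the 5th?
print(respond({"destination": "Hong Kong", "date": "the 5th"}))
# OK, here are some flights to Hong Kong on the 5th...
```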

## evaluation

task-driven dialog: dialogue success rate, slot error rate

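chatbot: human evaluation (participants who chat with the system, or observers who read the transcripts), adversarial evaluation (a classifier is trained to tell human responses from system responses)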
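
Both task-driven metrics are simple ratios. A minimal sketch, assuming slots are compared by exact match as `slot -> value` dictionaries: slot error rate counts inserted, deleted, and substituted slots over the number of reference slots in the sentence, and dialogue success rate is the fraction of dialogues in which the task was completed. The function names and the example values are hypothetical.

```python
from typing import Dict, List


def slot_error_rate(reference: Dict[str, str], hypothesis: Dict[str, str]) -> float:
    """(inserted + deleted + substituted slots) / number of reference slots."""
    substituted = sum(1 for s in reference if s in hypothesis and hypothesis[s] != reference[s])
    deleted = sum(1 for s in reference if s not in hypothesis)
    inserted = sum(1 for s in hypothesis if s not in reference)
    return (inserted + deleted + substituted) / len(reference)  # assumes >= 1 reference slot


def dialogue_success_rate(task_completed: List[bool]) -> float:
    """Fraction of dialogues in which the user's task was completed."""
    return sum(task_completed) / len(task_completed)


ref = {"person": "Chris", "time": "10:30", "room": "Gates 104"}
hyp = {"person": "Chris", "time": "11:30", "room": "Gates 104"}
print(slot_error_rate(ref, hyp))  # 0.3333333333333333 (one substituted slot out of three)
print(dialogue_success_rate([True, False, True, True]))  # 0.75
```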

## open-ended dialog system

application: entertainment (chitchat), therapy (ELIZA, PARRY)

algorithm: rule-based (ELIZA, PARRY), corpus-based (information retrieval, generation (seq2seq and RL))
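
The two families of algorithms can be contrasted with toy sketches. The rule-based responder imitates ELIZA's pattern-and-transform style using a few made-up rules (not ELIZA's actual script); the IR-based responder returns the response paired with the most similar turn in a tiny hypothetical corpus, using cosine similarity over bag-of-words counts.

```python
import math
import re
from collections import Counter

# --- Rule-based (ELIZA-style): ordered (pattern, template) rules, illustrative only ---
RULES = [
    (re.compile(r".*\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r".*\bI need (.*)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r".*\bmy (mother|father)\b.*", re.IGNORECASE), "Tell me more about your {0}."),
]
# Swap first- and second-person words when echoing the user's words back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}


def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def eliza_reply(utterance: str) -> str:
    text = utterance.rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."  # default when no rule fires


# --- IR-based: pick the response whose paired turn is most similar to the input ---
CORPUS = [  # hypothetical (turn, response) pairs
    ("do you like movies", "I love science fiction films."),
    ("what is your favorite food", "Pizza, definitely."),
    ("how is the weather", "Sunny and warm here."),
]


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ir_reply(utterance: str) -> str:
    query = bag_of_words(utterance)
    _, response = max(CORPUS, key=lambda pair: cosine(query, bag_of_words(pair[0])))
    return response


print(eliza_reply("I am sad about my job."))  # Why do you say you are sad about your job?
print(ir_reply("What's your favorite food?"))  # Pizza, definitely.
```

The rule-based responder can only say what its authors wrote templates for, while the IR-based responder can only say what already appears in its corpus; generation-based systems (seq2seq, RL) produce new responses instead.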

## task-driven dialog system

### application

- virtual assistant

    products: Apple Siri, Amazon Alexa, Microsoft Cortana, Google Now

    daily-life tasks: play music, set alarms, check the weather

- customer service

    customer inquiries: AT&T HMIHY ("How May I Help You?")

    travel arrangements (booking tickets/hotels/flights/restaurants)

- e-commerce: payments and product purchases

- education: tutoring

- health care: mental health support, booking hospital appointments

### dialogue-state framework

![image.png](attachment:image.png)

components: 

- user (environment)

- ASR (automatic speech recognition)

- NLU (natural language understanding):

    - **intent detection**: determine the user's goal (e.g. booking a flight, setting an alarm), which selects the frame to fill. The conversation is viewed as one or more frames, each consisting of slots that need to be filled with information from the user's utterances; each slot corresponds to a question the system can ask the user.

    - slot filling: framed as a BIO sequence-labelling task, the same tagging format used in NER: 'B' marks the beginning of a slot value, 'I' a token inside it, and 'O' a token outside any slot.

- Dialogue management

    - Dialogue state tracker: maintains the current state of the dialogue

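    - Dialogue policy: decides which action to take at each step of the conversation based on the current dialogue state. Policies can be rule-based, learned with supervised machine learning, or learned with reinforcement learning.

        - GUS policy: ask questions until the frame is filled, then query the database

        - RL policy: learned through reinforcement learning, e.g. with rewards for successfully completed dialogues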

- NLG (natural language generation)

- speech synthesis

- agent
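
The NLU-to-policy path above can be sketched end to end for a single hypothetical flight-booking frame: keyword-based intent detection (a stand-in for a trained classifier), BIO decoding for slot filling, a state tracker that merges slots into the frame, and a GUS-style policy that asks for the first unfilled slot and issues a database query once the frame is full. The frame, slot names, questions, keywords, and the `QUERY` string are all assumptions for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical frame for a flight-booking intent: slot -> question to ask the user.
FLIGHT_FRAME = {
    "origin": "What city are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day would you like to leave?",
}
INTENT_KEYWORDS = {"BookFlight": {"fly", "flight", "flights"}, "SetAlarm": {"alarm", "wake"}}


def detect_intent(tokens: List[str]) -> str:
    """Keyword stand-in for an intent classifier."""
    words = {t.lower() for t in tokens}
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return "Unknown"


def bio_to_slots(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Decode BIO tags such as B-destination / I-destination / O into slot values."""
    slots: Dict[str, str] = {}
    slot: Optional[str] = None
    span: List[str] = []
    for token, tag in zip(tokens + [""], tags + ["O"]):  # trailing "O" flushes the last span
        if tag.startswith("I-") and tag[2:] == slot:
            span.append(token)
            continue
        if slot is not None:
            slots[slot] = " ".join(span)  # close the open span
        # Start a new span on B-; a stray I- with no matching B- is treated like O.
        slot, span = (tag[2:], [token]) if tag.startswith("B-") else (None, [])
    return slots


def update_state(state: Dict[str, str], new_slots: Dict[str, str]) -> Dict[str, str]:
    """State tracker: merge newly filled slots into the frame; newer values overwrite older ones."""
    return {**state, **new_slots}


def gus_policy(state: Dict[str, str]) -> str:
    """Ask for the first unfilled slot; once the frame is full, query the database."""
    for slot, question in FLIGHT_FRAME.items():
        if slot not in state:
            return question
    return f"QUERY flights {state['origin']} -> {state['destination']} on {state['date']}"


tokens = "I want to fly to Hong Kong on Friday".split()
tags = ["O", "O", "O", "O", "O", "B-destination", "I-destination", "O", "B-date"]
print(detect_intent(tokens))  # BookFlight
state = update_state({}, bio_to_slots(tokens, tags))
print(gus_policy(state))  # What city are you leaving from?
state = update_state(state, {"origin": "Boston"})
print(gus_policy(state))  # QUERY flights Boston -> Hong Kong on Friday
```

An RL policy would replace `gus_policy` with a learned mapping from dialogue state to action, while the NLU and state-tracking steps stay the same.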

![image.png](attachment:image.png)