A first pass at designing a voice interface
-----------------------

Today we are going to cobble together advice from a number of different sources about how to design a voice experience (in our case, an Alexa "Skill"). The sources we've drawn from refer to generic applications and not necessarily those that serve journalistic ends, but I think the overall advice is solid. You are free to follow it or to come up with your own approach. (If you choose the latter, please document the steps you followed so we can talk about your process as well.)

>Like many technologies that seem fresh off the presses (virtual reality, anyone?), voice user interfaces have been in the public consciousness for decades and in research circles even longer. Bell Laboratories debuted their “Audrey” system (the first voice controlled UI) in 1952, predating even Star Trek’s aspirational voice controlled computer!
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[From *Voice User Interface Design: New Solutions to Old Problems*](https://medium.com/microsoft-design/voice-user-interface-design-new-solutions-to-old-problems-baa36a64b3e4)

<br><br>
<img src=https://cdn-images-1.medium.com/max/1600/1*_7DuZTz-nVAkLnuHHcZY5A.jpeg style="width: 60%; border: #000000 1px outset;"/>

<br><br>
Since that point, there have been several "milestones" in conversational interfaces. [This summary](https://www.altexsoft.com/blog/business/a-comprehensive-guide-to-chatbots-best-practices-for-building-conversational-interfaces/) offers a timeline that actually starts with ELIZA, whom we met last week.

<br><br>
<img src=https://www.altexsoft.com/media/2017/05/History-of-chatbots-1024x447.jpg style="width: 70%; border: #000000 1px outset;"/>
<br><br>

Before jumping ahead 65 years or so from "Audrey" and "ELIZA," you could reasonably be wondering why so many digital conversationalists seem to be female. I'd point you to an article in The Atlantic, ["Why Do So Many Digital Assistants Have Feminine Names?"](https://www.theatlantic.com/technology/archive/2016/03/why-do-so-many-digital-assistants-have-feminine-names/475884/), which cites everything from academic studies showing that we take orders or attend to warning messages when they are delivered in a female voice, to outright sexism. I've seen a couple of articles suggest instead that ["Genderless bots are the wave of the future"](http://www.thedrum.com/news/2017/04/28/genderless-bots-are-the-wave-the-future). These two articles are by no means meant to be representative of the writing on this. It is just hard not to see "Audrey" and "ELIZA" and "Alexa" and wonder.

**Everything old is new again.**

*The following introduction was written by Suman DebRoy. A similar accounting of the rise of voice interfaces [can be found here at moz.com](https://moz.com/blog/voice-strategy-guide).*

A new paradigm is ushered in because there were inefficiencies in the previous paradigm, either present in the design or caused by evolutionary usage. Let's start with that 1980s supercomputer in your pocket: the smartphone. Think about the apps on your phone. You can launch every app independently. But there are also digital assistants trying to become a central intelligence in your phone, through which you communicate with some of the apps. Apple’s Siri, Amazon’s Alexa, Facebook’s M, Google Now and Microsoft’s Cortana all provide a single interface to control specific app capabilities. However, none of them allows us to do anything drastically more advanced than reducing the number of taps we make on a phone.

What exactly is better than... *there's an app for that*? The way humans perceive personal assistants is changing, as the word "personal" starts to take precedence over the "assistant" role. It may be only a matter of time until conversational agents invade consumer markets. There is [growing anticipation](http://observer.com/2016/01/2016-will-be-the-year-of-conversational-commerce/) because the stats around user interactions with chat bots vs. app usage are incredible.

The App Store and Google Play each host over 1.5 million apps. Yet *on average, the number of apps downloaded by a person in the US every month is zero*. It seems like another paradigm shift is happening: the onset of messaging. Here are four main indicators that messaging is making apps irrelevant:

1. App downloads are slowing considerably:
    - Apps aren’t dying. But the entire space is collapsing, just like so many other industries before it. It's too crowded now and too hard to break into, with numerous forced taps just for onboarding and countless separate interfaces to keep track of. Apps come with their own friction components: walled gardens, sign-up drags, untimely push notifications and re-installs. Both app makers and app users are getting increasingly frustrated with the ecosystem.
    - Switching between apps is costly. As a remedy, bots within WeChat let its 600 million monthly users book taxis, check in for flights, buy cinema tickets, manage banking and reserve doctors’ appointments without ever leaving the app.
2. User retention is poor:
    - It is incredibly hard to make an app and keep people interested or engaged. On average, the number of daily active users of an app drops by 77% within the first 3 days, and by a stunning 95% within the first 3 months.
3. Artificial Intelligence is improving: 
    - There are many things happening in the AI space, in the subfields of computer vision, natural language processing, algorithmic art, speech recognition etc. The field most applicable to bots is natural language understanding. 
    - The current state of bot intelligence is somewhat of an ugly marriage of bits of AI that kind of work and lots of hand coding ([but this can change soon; yes, scientists are on it](http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/)). To be honest, bot-world AI is still waiting for a Pokémon Go moment with a giant breakout hit, but [we are getting there (in a fun way)](https://www.theguardian.com/technology/2016/jun/28/chatbot-ai-lawyer-donotpay-parking-tickets-london-new-york?CMP=share_btn_tw).
4. Messaging usage is outgrowing app usage:
    - The big 4 messaging platforms now have more users than the big 4 social networks! Can you name them?
    - As an example, 40% of US teens use [Kik](https://en.wikipedia.org/wiki/Kik_Messenger)!
    

**Bot design**

As I mentioned, [Amazon offers advice about voice interfaces and designing conversations](https://developer.amazon.com/designing-for-voice). 
They suggest you begin with the purpose of your skill and then form stories about your users, what makes them interact with your interface -- "what people need to and can do."

>**Identify the purpose and capabilities**
<br>
Describe one or more scenarios in which people will find your skill useful and desirable. Determine the capabilities of the skill by asking the following questions:
* What is the purpose of the skill? Why will people want to use it?
* What will the person be doing before, during, and after interacting with the skill?
* What will people get from the skill that they cannot get another way?
<br><br>

>**Identify the user stories**
<br>
Based on the purpose and capabilities of the skill, identify individual steps and actions.
* What can a user do, or not do, with the skill?
* What information is the person expected to have available?
* What are the ways a user can invoke the skill?
* What features directly support the purpose?
* Is there information that you need from other experiences, for example from a website or from a mobile app?

They then ask you to outline basic scripts that "show the conversation between the user and Alexa, like in a movie or a play." [Have a look at their guidelines and samples here.](https://developer.amazon.com/designing-for-voice/design-process/) They close the design process by outlining what you need to do to expand your basic scripts to fully articulate your application, so that you are prepared for when users don't act in the ways you expect.

As you formulate your scripts, it might be good to spend some time thinking about *what constitutes **identifiable elements** of "conversation"*. We can then, next time, translate these concepts into code. Again, we are not trying to have these bots "pass" for human, but instead to have them carry out enough basic conversational structure to get a task done, say. Interestingly, however, people will try to converse with a bot EVEN IF they know that the bot's sole purpose is clearly transactional.

<br>
<br>

| Element of Conversation | Possible Techniques to Compute/ Quantify |
| ------ | ----------- |
|1. Notifications/ Recalling relevant things   |  Time Series Analysis, Alerting, Keyword caches |
|2. Learning topics in context | Topic Mining/Modeling - extract the topic from the words in text |
|3. Understanding Social Networks (offline and online)  | Network Science, the study of the structure of how things are connected and how information flows through it |
|4. Responding to Emotion  | Sentiment Analysis |
|5. Having Episodic Memory  | Some kind of graphical model, [see Aditi's data post](https://medium.com/@aditinair/episodic-memory-modeling-for-conversational-agents-7c82e25b06b4#.9k65cziqw). |
|6. Portraying Personality  | Decision Tree, which is a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. |
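
To make one row of the table concrete, here is a minimal sketch of "Responding to Emotion" using a tiny hand-made word list. Everything in it is an assumption for illustration: the word lists, the function names `sentiment_score` and `respond`, and the canned replies. A real skill would more likely use a trained sentiment model or a much larger lexicon.

```python
import re

# Hypothetical, hand-picked word lists; a real lexicon would be far larger.
POSITIVE = {"great", "love", "thanks", "good", "awesome", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "awful", "angry", "broken"}


def sentiment_score(text):
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def respond(text):
    """Pick a canned reply whose tone matches the user's apparent mood."""
    score = sentiment_score(text)
    if score > 0:
        return "Glad to hear it! What would you like to do next?"
    if score < 0:
        return "Sorry about that. Let me try to help."
    return "Okay. What would you like to do next?"


print(respond("I hate this, it is broken"))
```

Counting words like this ignores negation ("not good") and sarcasm, which is one reason it is only a starting point.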

<br>
<br>

**Scripted Bots, Decision Trees and Hybrids**

The simplest bots are scripted bots -- like ELIZA.  Their entire interaction is hard-coded: a “script” determines what the bot can and cannot do.  The “script” is a [decision tree](https://en.wikipedia.org/wiki/Decision_tree) where responding to one question takes you down a specific path, which opens up a new, pre-determined set of possibilities.

For example, here's how you could design dialogue for a bot that helps you decide if you should have a cookie:
<br>
<br>

<img src= "http://static.twentytwowords.com/wp-content/uploads/cookie.gif" style="width: 40%;"/>
<br>
<br>
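
A scripted bot of this kind can be sketched in a few lines of Python. The sketch below is only an illustration: the questions and advice in `TREE` are made up (they are not transcribed from the flowchart above), and `run_script` is a hypothetical helper name. Each inner node is a dictionary holding a question and a branch for "yes" and "no"; each leaf is a string of final advice.

```python
# A hypothetical yes/no script. Inner nodes are dicts; leaves are strings.
TREE = {
    "question": "Are you hungry?",
    "yes": {
        "question": "Have you already had a cookie today?",
        "yes": "Maybe have some fruit instead.",
        "no": "Go ahead, have a cookie.",
    },
    "no": "Then you probably do not need a cookie.",
}


def run_script(tree, answers):
    """Walk the tree using a sequence of 'yes'/'no' answers.

    Returns the final advice if a leaf is reached, the next question
    if the answers run out first, or a reprompt if an answer is not
    understood.
    """
    node = tree
    for answer in answers:
        if isinstance(node, str):  # already at a leaf; ignore extra answers
            break
        key = answer.strip().lower()
        if key not in ("yes", "no"):
            return "Sorry, please answer yes or no. " + node["question"]
        node = node[key]
    return node if isinstance(node, str) else node["question"]


print(run_script(TREE, ["yes", "no"]))
```

Note that the reprompt branch is the first small step toward handling users who do not act in the ways the script expects.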

In hybrid bots, some part of the interaction is driven by scripts (pre-written), but the majority is driven by an algorithm that isn't necessarily hardcoded. Instead, the model is generated from the current state of the data and then fitted into a conversation template, making it sound like a natural language response.

This method can lead to hybrid bots, or fully automated ones:
- extreme mashup + silly fun, like [NYTimes Minus Context](https://twitter.com/NYTMinusContext) (this is automated)
- scripted-funny + data-driven responses, like: poncho, xiaoice 
- scripted-onboarding + data-driven responses, like: digg, tay 
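
The "data fitted into a conversation template" idea described above can be sketched as follows. This is a minimal illustration under stated assumptions: the weather data is made up, and `TEMPLATES` and `weather_reply` are hypothetical names. The sentence templates are the scripted part; the choice of template and the values filled in come from the data.

```python
from string import Template

# Scripted part: hand-written sentence templates (hypothetical wording).
TEMPLATES = {
    "rain": Template(
        "Bring an umbrella: there is a $chance% chance of rain in $city."
    ),
    "clear": Template(
        "No umbrella needed in $city today; the high is $high degrees."
    ),
}


def weather_reply(data):
    """Choose a template based on the data, then fill in its blanks."""
    kind = "rain" if data["rain_chance"] >= 50 else "clear"
    return TEMPLATES[kind].substitute(
        chance=data["rain_chance"], city=data["city"], high=data["high"]
    )


# Data-driven part: in a real bot this would come from a live source.
made_up_data = {"city": "New York", "rain_chance": 70, "high": 61}
print(weather_reply(made_up_data))
```

Adding more templates per situation, and picking among them at random, is a cheap way to make such a bot sound less repetitive.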

As Natural Language Processing becomes smarter, we will be able to do more fun things with bots at scale. The two key areas making lots of progress are (1) NL understanding, i.e. figuring out what the chat message means in context, and (2) NL generation, i.e. generating a natural-language report from data (think about baseball or financial reports written by AI).

**Some basic principles**

In consulting a number of guides, like the text
[Designing Bots](http://shop.oreilly.com/product/0636920057741.do) and online blog posts
[here](https://voiceui.fjordnet.com/), [here](https://careerfoundry.com/en/blog/ux-design/ultimate-guide-to-voice-ui-design) and [here](https://www.altexsoft.com/blog/business/a-comprehensive-guide-to-chatbots-best-practices-for-building-conversational-interfaces/), we found considerable consistency in approach.

Most recommend clearly defining what you want your skill to do and mapping out the flow of a person's conversation with the bot. Tools like decision trees will help you here. There is also a concern with finding the "easiest" way to do something. People, they suggest, don't want to listen to menus as they would in a phone tree; instead, they want things to play out in an order that makes sense. They also suggest endowing your bot with a personality... I suppose we should talk about that.