# Checking IBM Watson ASR differences between the Vive headset microphone and a radio mic

Here's the original text:

*We present an immersive multi-person game developed for testing models of non-verbal behaviour in conversation. People interact in a virtual environment using avatars that are driven, by default, by their real-time head and hand movements. However, on the press of a button each participant's real movements can be substituted by "fake" avatar movements generated by algorithms. The object of the game is to score points in two ways a) by faking without being detected and b) by detecting when others are faking. This enables what amounts to a non-verbal Turing test in which the effectiveness of different algorithms for controlling non-verbal behaviour can be directly tested and evaluated in live interaction.*

## Audio files:

`audio/emily/headset_mic.wav`

<audio controls src="audio/emily/headset_mic.wav"></audio>

`audio/emily/radio_mic.wav`

<audio controls src="audio/emily/radio_mic.wav"></audio>

In [1]:
import difflib
import json
import os

In [2]:
TRANSCRIPTS_DIR = 'transcripts'
TARGET = '''
We present an immersive multi-person game developed for testing models
of non-verbal behaviour in conversation. People interact in a virtual
environment using avatars that are driven, by default, by their real-time
head and hand movements. However, on the press of a button each
participant's real movements can be substituted by "fake" avatar
movements generated by algorithms. The object of the game is to score
points in two ways a) by faking without being detected and b) by
detecting when others are faking. This enables what amounts to a
non-verbal Turing test in which the effectiveness of different algorithms
for controlling non-verbal behaviour can be directly tested and evaluated
in live interaction.
'''
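# Keep only the alphanumeric characters of each whitespace-separated word.
# Hyphenated words collapse into one token ('non-verbal' becomes 'nonverbal').
TARGET = [''.join([c for c in word if c.isalnum()]) for word in TARGET.split()]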
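
The transcripts diffed further down are mostly lower-case and split hyphenated words into separate tokens ("non", "verbal"), while the target list keeps capitals and joins them ("nonverbal"). That mismatch inflates any comparison against the target. The following is a minimal sketch of a stricter normalisation, assuming lower-casing and hyphen-splitting are acceptable for the comparison. `normalise` is a hypothetical helper, not part of this notebook, and the transcript words would need the same lower-casing to compare fairly.

```python
import re


def normalise(text):
    """Lower-case, split on whitespace and hyphens, keep alphanumerics only.

    Hypothetical helper for illustration; not used elsewhere in the notebook.
    """
    words = re.split(r'[\s\-]+', text.lower())
    cleaned = (''.join(c for c in word if c.isalnum()) for word in words)
    # Drop tokens that were pure punctuation and are now empty.
    return [word for word in cleaned if word]


sample = 'We present an immersive multi-person game for non-verbal behaviour.'
print(normalise(sample))
```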

Write the target to a file for convenience.

In [3]:
with open('target.txt', 'w') as f:
    for word in TARGET:
        print(word, file=f)

In [4]:
def dedup_words(iterable):
    """Yield the word of the first record seen for each word ID."""
    known_ids = set()
    for item in iterable:
        id_ = item['id']
        if id_ not in known_ids:
            known_ids.add(id_)
            yield item['word']

In [5]:
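def load_transcript(filepath):
    """Load a transcript stored as one JSON object per line."""
    with open(filepath) as f:
        full_transcript = [json.loads(line) for line in f]

    words = [d['word'] for d in full_transcript]

    return {
        'full': full_transcript,
        'words': words,
        # Word from the first record seen for each ID.
        'first_appearance': list(dedup_words(full_transcript)),
        # Word from the last record seen for each ID: scan in reverse, then restore order.
        'last_appearance': list(dedup_words(full_transcript[::-1]))[::-1],
    }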


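def get_transcript_name(transcript_filepath):
    """Return the path without its first directory and without the file extension."""
    parts = transcript_filepath.split(os.sep)
    parts.pop(0)  # drop the leading directory (TRANSCRIPTS_DIR)
    filename = parts[-1]
    extless_filename, _ = os.path.splitext(filename)
    parts[-1] = extless_filename
    return os.path.join(*parts)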

In [6]:
def load_transcripts(transcripts_dir):
    transcripts = {}
    for dirpath, _, filenames in os.walk(transcripts_dir):
        for filename in filenames:
            transcript_filepath = os.path.join(dirpath, filename)
            transcript_name = get_transcript_name(transcript_filepath)
            transcripts[transcript_name] = load_transcript(transcript_filepath)
    return transcripts

In [7]:
ts = load_transcripts(TRANSCRIPTS_DIR)

# Diff

## Code meaning

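(copied from [Python's difflib docs](https://docs.python.org/3.7/library/difflib.html#difflib.Differ))

- `-` line unique to sequence 1
- `+` line unique to sequence 2
- no prefix: line common to both sequences (`ndiff` emits a two-space prefix, which the `diff` helper below strips)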
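
To make the codes concrete, here is a small sketch that runs `difflib.ndiff` on two short word lists. The lists are made up for illustration and are not taken from the transcripts.

```python
import difflib

# Toy word lists, invented for this example.
sequence_1 = ['we', 'present', 'a', 'game']
sequence_2 = ['we', 'present', 'the', 'game']

lines = list(difflib.ndiff(sequence_1, sequence_2))
for line in lines:
    # repr() makes the two-character prefix visible, including the
    # two spaces that mark a word common to both sequences.
    print(repr(line))
```

Words shared by both lists carry a two-space prefix, `a` is reported as unique to sequence 1 and `the` as unique to sequence 2.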

In [8]:
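def diff(a, b):
    """Yield ndiff lines for a and b, skipping blank lines and '?' hint lines."""
    for line in difflib.ndiff(a, b):
        # strip() also removes the two-space prefix of lines common to both.
        line = line.strip()
        if line and not line.startswith('? '):
            yield line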

This is the diff of the first appearance of each word. Words have IDs, so any later occurrence of the same ID is ignored.
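
The difference between keeping the first and the last appearance of an ID can be sketched on toy data. The records below are invented, and integer IDs are an assumption; only the `id` and `word` keys come from the notebook's own code.

```python
def dedup_words(iterable):
    """Yield the word of the first record seen for each ID (as in the notebook)."""
    known_ids = set()
    for item in iterable:
        if item['id'] not in known_ids:
            known_ids.add(item['id'])
            yield item['word']


# Invented records: each ID appears twice, the second time with a revised word.
records = [
    {'id': 0, 'word': 'be'},
    {'id': 0, 'word': 'we'},
    {'id': 1, 'word': 'presents'},
    {'id': 1, 'word': 'present'},
]

first_appearance = list(dedup_words(records))
# Scanning in reverse keeps the last record per ID; reverse again to restore order.
last_appearance = list(dedup_words(records[::-1]))[::-1]

print(first_appearance)
print(last_appearance)
```

Here the first appearance keeps the early guesses (`be`, `presents`) and the last appearance keeps the revised words (`we`, `present`).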

In [9]:
for line in diff(ts['emily/headset_mic']['first_appearance'], ts['emily/radio_mic']['first_appearance']):
    print(line)

we
present
- anime
- a
- must
- mall
- passing
+ and
+ the
+ mass
+ multi
+ part
game
+ device
+ for
+ teh
+ model
+ from
+ non
+ verbal
+ but
- to
- to
- protest
- more
- not
- by
- behavior
- and
- co
- and
- interact
in
a
+ uh
+ into
- two
+ to
+ a
+ virtual
and
by
- you
+ using
- avatars
+ avatar
that
+ trip
+ by
- tried
- to
- I
- do
- but
- there
+ the
- real
+ by
+ the
+ built
time
had
- me
- on
- Han
- made
- however
+ moved
+ and
+ how
+ moved
+ how
on
the
- plus
+ press
of
a
button
each
- about
- two
- me
+ for
+ is
+ move
can
be
- so
+ self
by
+ faith
- thank
- on
- the
- top
- movements
- Jen
- by
- Aug
- the
- old
of
+ a
+ Tom
+ is
+ our
+ go
+ over
+ it
the
game
- is
+ disc
to
score
points
- and
- to
+ into
+ a
ways
- eight
- bye
- for
- without
- being
- to
- he
- dissect
- when
- all
- the
- faking
a
- now
- to
- amount
+ by
+ faith
+ with
+ be
to
be
- in
+ by
+ just
+ went
+ all
+ this
+ fake
+ this
+ name
+ one
+ amounts
+ to
+ be
+ a
- non
+ no
- bug
- cherry
- becau

This is the diff of the last appearance of each word. The last appearance is more "stable" than the first, so agreement between the two mics is higher.

In [10]:
for line in diff(ts['emily/headset_mic']['last_appearance'], ts['emily/radio_mic']['last_appearance']):
    print(line)

- be
+ we
present
and
a
massive
multi
passing
game
developed
+ for
- protesting
+ testing
models
of
non
verbal
behavior
in
conversation
- people
+ uh
interact
in
a
virtual
environment
by
using
avatars
that
+ driven
- trend
- and
by
default
- by
+ but
- their
+ the
real
time
had
movements
and
- handmade
- mints
+ hand
+ movements
however
on
the
press
of
a
button
each
participants
will
movements
can
be
substituted
by
+ fake
+ avatar
- thank
- on
- the
- Tom
movements
generated
by
algorithms
the
object
of
the
game
is
to
score
points
in
two
ways
eight
by
faking
without
being
detected
- de
+ beat
+ by
dissecting
when
others
a
faking
- isn't
- able
- to
+ this
+ enables
+ what
amounts
to
be
a
non
- bubble
- cheering
+ verbal
+ Turing
test
in
which
the
effectiveness
of
different
algorithms
for
controlling
non
verbal
behavior
can
be
directly
tested
and
evaluated
- and
- light
+ in
+ life
interaction


# Edit distance

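Here, "edit distance" means the number of `+` and `-` lines in a diff. A substituted word therefore counts twice, once as `-` and once as `+`.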
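
For comparison, the sketch below counts `+`/`-` lines next to a classic word-level Levenshtein distance, where a substitution costs 1. The Levenshtein function is an alternative measure added for illustration, not something this notebook uses, and both function names are hypothetical. The example words are the final mismatch from the last-appearance diff above.

```python
import difflib


def plus_minus_count(a, b):
    """Count the '+' and '-' lines that difflib.ndiff reports for a and b."""
    return sum(1 for line in difflib.ndiff(a, b) if line.startswith(('+ ', '- ')))


def levenshtein(a, b):
    """Word-level Levenshtein distance: insert, delete and substitute cost 1 each."""
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        current = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            current.append(min(
                previous[j] + 1,         # delete word_a
                current[j - 1] + 1,      # insert word_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


headset = ['and', 'light', 'interaction']
radio = ['in', 'life', 'interaction']

print(plus_minus_count(headset, radio))
print(levenshtein(headset, radio))
```

Two substituted words give a `+`/`-` count of 4 but a Levenshtein distance of 2, so the figures in this section are best read as relative comparisons rather than as word error counts.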

In [11]:
def edit_distance(a, b):
    # Count only the lines marking a word unique to one of the sequences.
    return sum(1 for line in diff(a, b) if line.startswith(('+', '-')))

In [12]:
edit_distance(ts['emily/headset_mic']['first_appearance'], ts['emily/radio_mic']['first_appearance'])

161

In [13]:
edit_distance(ts['emily/headset_mic']['first_appearance'], TARGET)

162

In [14]:
edit_distance(ts['emily/radio_mic']['first_appearance'], TARGET)

153

In [15]:
edit_distance(ts['emily/headset_mic']['last_appearance'], ts['emily/radio_mic']['last_appearance'])

41

In [16]:
edit_distance(ts['emily/headset_mic']['last_appearance'], TARGET)

78

In [17]:
edit_distance(ts['emily/radio_mic']['last_appearance'], TARGET)

57

# Word count

In [18]:
len(TARGET)

111

In [19]:
len(ts['emily/headset_mic']['first_appearance'])

119

In [20]:
len(ts['emily/radio_mic']['first_appearance'])

118