<div style="margin: 0 auto 10px; height: 70px; border: 2px solid gray; border-radius: 6px;">
  <div style="float: left; margin: 5px 10px 5px 10px; "><img src="img/bfh.jpg" /></div>
  <div style="float: right; margin: 20px 30px 0; font-size: 15pt; font-weight: bold; color: #98b7d2;"><a href="https://moodle.bfh.ch/course/view.php?id=39255" style="color: #98b7d2;">BTE5476 - Project-Oriented Digital Signal Processing </a></div>
</div>
<div style="clear: both; font-size: 30pt; font-weight: bold; color: #64788b; margin-left: 30px;">
    DTMF Dialing - Part 1
</div>

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as sp
import IPython
from scipy.io import wavfile
import os
import csv

In [None]:
plt.rcParams["figure.figsize"] = (9,2.5)

# Introduction

<div style="float: right; margin: 10px;"><img src="img/phone.jpg" width="250"></div>

DTMF (Dual-Tone Multi-Frequency) is a signaling protocol used to transmit simple numeric information over the frequency band provided by analog telephone lines, that is, between 300 Hz and 3400 Hz. When you use the keypad of an analog phone such as the one shown on the right, the sequence of dialed digits is transmitted to the phone company's switches in the form of audible _dial tones_. Today, cell phones and office phones are directly connected to digital networks and therefore no longer use DTMF for dialing. But DTMF is still frequently used in automated attendant systems (i.e., those phone menus where you are told to "press 1 to talk to customer service", etc.).


## DTMF specifications

In DTMF the phone's keypad is arranged in a $4\times 3$ grid and each key is associated with a unique *pair* of frequencies, as shown by this table:


|            | **1209 Hz** | **1336 Hz** | **1477 Hz** |
|------------|:-----------:|:-----------:|:-----------:|
| **697 Hz** |      1      |      2      |      3      |
| **770 Hz** |      4      |      5      |      6      |
| **852 Hz** |      7      |      8      |      9      |
| **941 Hz** |      *      |      0      |      #      |


When a key is pressed, two oscillators operating at the frequencies associated with the key send their output over the phone line. For instance, if the digit '1' is pressed, the oscillators will produce the following continuous-time signal
$$
    x(t) = \sin(2\pi\cdot 1209\cdot t) + \sin(2\pi\cdot697\cdot t)
$$

When dialing a multi-digit number, successive dial tones must be separated by a silent gap; although the official standard does not prescribe exact timings, a DTMF receiver should be designed according to the following specifications:
 * valid dial tones can be as short as 40 ms
 * the silent gap between tones can also be as short as 40 ms
 * actual tone frequencies can deviate up to $\pm 1.5\%$ from their nominal values 
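
To get a feeling for what the $\pm 1.5\%$ tolerance implies, the following sketch computes the band that each nominal frequency can occupy and the guard interval between neighboring bands. The variable names are illustrative only and are not used elsewhere in the notebook.

```python
import numpy as np

# nominal DTMF frequencies in Hz, taken from the table above
ROW_FREQS = [697, 770, 852, 941]
COL_FREQS = [1209, 1336, 1477]
TOLERANCE = 0.015  # +/- 1.5 %

freqs = np.array(sorted(ROW_FREQS + COL_FREQS), dtype=float)
lower = freqs * (1 - TOLERANCE)  # lowest admissible value for each tone
upper = freqs * (1 + TOLERANCE)  # highest admissible value for each tone

for f_nom, f_lo, f_hi in zip(freqs, lower, upper):
    print(f'{f_nom:6.0f} Hz -> [{f_lo:7.1f}, {f_hi:7.1f}] Hz')

# distance between the top of one band and the bottom of the next one
guard = lower[1:] - upper[:-1]
print('smallest guard interval (Hz):', round(float(guard.min()), 1))
```

All guard intervals are positive, so the tolerance bands never overlap; the tightest spot is between 697 Hz and 770 Hz, where about 51 Hz separate the two bands. A decoder therefore needs a frequency resolution comfortably finer than that.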

# Digital DTMF

DTMF was developed in the late 1950s and the first commercial DTMF phones hit the market in the 1960s. At the time, the system was implemented using analog hardware and the various frequencies were generated by a set of individual electronic oscillators.

<div style="float: left; margin: 0px;"><img src="img/mt8870.jpg" width="150"></div>

Obviously this is no longer the case and today DTMF signals are generated and decoded by dedicated (and extremely inexpensive) [DSP chips](https://www.microsemi.com/product-directory/dtmf-dual-tone-multiple-frequency-receivers/4317-mt8870d). In this notebook we will implement our own digital DTMF algorithms but, before anything else, let's review the relationship between the DTMF frequency values in Hz as specified by the standard and the digital frequencies that we will need to use in discrete time. This is relatively straightforward even without any formal knowledge of sampling and interpolation since all the signals involved are pure sinusoids.

## Digital to analog 

A digital-to-analog (D/A) converter, such as the soundcard in your PC, creates its analog output by interpolating the incoming digital samples at a rate of $F_s$ samples per second; this rate is the "clock" of the D/A converter and, although it is an _interpolation_ rate, it is usually referred to as the _sampling rate_ or sampling _frequency_ of the system, using the same term that we use for an analog-to-digital converter.

When a soundcard with interpolation rate $F_s$ "plays" a discrete-time sequence of the form $x[n] = \cos(\omega_0 n)$ (that is, a discrete-time sinusoid with digital frequency $\omega_0 \in [-\pi, \pi]$), it outputs the continuous-time sinusoid $x(t) = \cos(2\pi f_0 t)$ where

$$
    f_0 = \frac{\omega_0}{2\pi}F_s. \tag{1}
$$

This means that the analog frequency of the output depends _both_ on the frequency of the discrete-time sinusoid _and_ on the interpolation rate of the soundcard, which is usually a design parameter. In general, we want to keep all sampling rates as low as possible since the power consumption of a D/A chip is approximately proportional to $F_s^2$. 

As an example, here you can listen to how the pitch changes when the _same_ discrete-time sinusoid is played by the soundcard at different interpolation rates (and note how the duration of the audio also changes, obviously): 

In [None]:
w = 2 * np.pi * 0.05 
x = np.sin(w * np.arange(0, 8000))

for Fs in [8000, 16000, 4000]:
    print(f'Using an interpolation rate of {Fs} samples per second:')
    display(IPython.display.Audio(x, rate=Fs, normalize=False))
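
The pitches heard in this experiment follow directly from eq. (1). As a minimal sketch (the helper name `analog_frequency` is hypothetical and not part of the notebook), we can compute the output frequency and the playback duration for each interpolation rate:

```python
import numpy as np

def analog_frequency(w0: float, rate: float) -> float:
    # eq. (1): f0 = w0 / (2 pi) * Fs
    return w0 / (2 * np.pi) * rate

w0 = 2 * np.pi * 0.05   # same digital frequency as in the cell above
num_samples = 8000      # same signal length as in the cell above

for rate in [8000, 16000, 4000]:
    f0 = analog_frequency(w0, rate)
    duration = num_samples / rate
    print(f'Fs = {rate:5d}: f0 = {f0:6.1f} Hz, duration = {duration:.2f} s')
```

Doubling the interpolation rate doubles the pitch and halves the duration, which matches what you hear.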

**Exercise: minimal rate.** What is the minimal interpolation rate required to implement a digital DTMF transmitter?

## Analog to digital

The soundcard in your PC also works as an analog-to-digital (A/D) converter: it records an incoming audio signal by measuring (that is, by _sampling_) its amplitude $F_s$ times per second. 

If the input is a sinusoid of the form $x(t) = \sin(2\pi f_0 t)$ the resulting discrete-time signal will be 

$$
    x[n] = \sin(\omega_0 n) \qquad \text{with} \qquad \omega_0 = 2\pi\frac{f_0}{F_s}.
$$

As long as the sampling frequency is larger than _twice_ the frequency of the input sinusoid, the sequence of samples is a perfect representation of the analog waveform in the sense that $x[n]$ can be interpolated back into $x(t)$ _exactly_ by a D/A converter also operating at $F_s$ samples per second.

**Exercise: aliasing.** Consider an A/D converter followed in cascade by a D/A converter; both converters operate at the same rate $F_s$. Assume that the input to the cascade is the signal $x(t) = \sin(2\pi f_0 t)$ with $f_0 = 1.6F_s$.

What is the analog signal at the output of the cascade?

## Final design parameter

Although in theory we could use a lower value, in the rest of the notebook we will use $F_s = 8000$ Hz:
 * since the telephone channel is "naturally" bandlimited to 4000 Hz, with a sampling frequency of 8 kHz no additional anti-aliasing filter is needed
 * in most soundcards, this is the lowest available sampling rate.

In [None]:
Fs = 8000
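
With the sampling rate fixed, each nominal DTMF frequency maps to a digital frequency $\omega_0 = 2\pi f_0 / F_s$. The short sketch below lists these values; the names `DTMF_HZ` and `omega` are illustrative only.

```python
import numpy as np

Fs = 8000
DTMF_HZ = [697, 770, 852, 941, 1209, 1336, 1477]

# digital frequency (in radians) for each nominal DTMF frequency
omega = [2 * np.pi * f0 / Fs for f0 in DTMF_HZ]

for f0, w0 in zip(DTMF_HZ, omega):
    print(f'{f0:5d} Hz -> {w0:.4f} rad = {w0 / np.pi:.4f} pi')
```

All values fall well inside the interval $[0, \pi]$, so every tone can be represented without aliasing at this rate.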

# The encoder

In the next exercise, your task will be to implement a DTMF encoder as a Python function: the function takes a string of key values as input and returns their DTMF encoding as a discrete-time audio signal that can be played at a rate of $F_s$ samples/second.

To get you started, here is a partial implementation where:
 * the DTMF frequency pairs are available as a dictionary, indexed by the key values
 * the durations (in seconds) of the tones and the silence gap are also specified.

**Exercise: implement a DTMF encoder.** Complete the function below so that it returns the DTMF encoding of a series of key values, passed as a string. The encoding should be padded with 250 milliseconds of silence both at the beginning and at the end.

In [None]:
def DTMF_encode(digits: str, Fs=8000) -> np.ndarray: 
    PADDING_SEC = 0.25
    TONE_SEC, SPACE_SEC = 0.2, 0.1
    DTMF_FREQS = {
        '1': (697, 1209), '2': (697, 1336), '3': (697, 1477),
        '4': (770, 1209), '5': (770, 1336), '6': (770, 1477),
        '7': (852, 1209), '8': (852, 1336), '9': (852, 1477),
        '*': (941, 1209), '0': (941, 1336), '#': (941, 1477),        
    }
    
    # index range for tone intervals
    #n = np.arange(...)
    
    #  output signal, start with initial silence
    #x = np.zeros(...)
    
    for k in digits:
        try:
            # select the DTMF frequencies
            ... 
            # append tones and space to output
            # x = np.r_[ x, ... ]
        except KeyError:
            print(f'invalid key: {k}')
            return None
    # append final silence and return
    return #...

Let's test it and evaluate it "by ear":

In [None]:
x = DTMF_encode('123##45', Fs=Fs)
IPython.display.Audio(x, rate=Fs)

# The decoder

As is always the case in telecommunication systems, the encoder is the easy part; but now we need to build a decoder. Designing a robust decoder is difficult because there are a lot of things that can degrade the quality of the received signals; for instance:
 * there will always be some noise in the signal, and sometimes LOTS of noise
 * the signal could be affected by nonlinear distortion
 * the durations of tones and gaps can vary a lot
 * if the internal clocks at encoder and decoder are running at slightly different speeds, the DTMF frequencies will deviate from their nominal values.

This week, your take-home exercise is the implementation of a DTMF decoder. To get you started, let's look at the DTMF signal in more detail and see how we can process it. 

## Something from last week

Here is a function that returns the power spectrum of a signal (that is, the squared magnitude of its DFT) over the positive frequency axis together with the axis labels in Hz; we implemented this in last week's notebook, where you can review the details if needed:

In [None]:
def spectral_power(x: np.ndarray, Fs: float) -> tuple[np.ndarray, np.ndarray]:
    L = len(x) // 2
    return np.linspace(0, Fs/2, L+1), np.abs(np.fft.fft(x)[:L+1] / len(x)) ** 2
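
As a quick usage sketch, we can apply the function to one second of the two-tone signal for the key '1' and read off the two largest spectral peaks. The function is repeated here only to keep the example self-contained; the test signal and the peak-picking via `argsort` are assumptions made for illustration.

```python
import numpy as np

def spectral_power(x, Fs):
    # same computation as in the cell above
    L = len(x) // 2
    return np.linspace(0, Fs/2, L+1), np.abs(np.fft.fft(x)[:L+1] / len(x)) ** 2

Fs = 8000
n = np.arange(Fs)  # one second of samples, so DFT bins are 1 Hz apart
tone = np.sin(2 * np.pi * 697 / Fs * n) + np.sin(2 * np.pi * 1209 / Fs * n)

f, P = spectral_power(tone, Fs)
# frequencies of the two largest values, in increasing order
peaks = np.sort(f[np.argsort(P)[-2:]])
print(peaks)
```

Each peak has a height of 1/4 because a unit-amplitude sinusoid splits into two complex exponentials of amplitude 1/2, one on the positive and one on the negative frequency axis.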

## Time-Frequency Analysis of a DTMF signal

A DTMF signal carries information both in time and in frequency: key values are encoded by frequency pairs while the sequence of keys is encoded in time. But this creates a problem: if we look at the spectrum of a DTMF signal we can easily see what keys have been pressed but we can't determine their order or their multiplicity:

In [None]:
a = DTMF_encode('159', Fs)
b = DTMF_encode('915915', Fs)

plt.subplot(1,2,1)
plt.plot(*spectral_power(a, Fs))
plt.subplot(1,2,2)
plt.plot(*spectral_power(b, Fs));

On the other hand, if we look at the signals in time we can see the number of key presses and their order but we cannot easily see the DTMF frequencies:

In [None]:
plt.subplot(1,2,1)
plt.plot(np.arange(0, len(a)) / Fs, a)
plt.subplot(1,2,2)
plt.plot(np.arange(0, len(b)) / Fs, b);

The solution is **time-frequency** analysis and the most common tool for this is the *spectrogram*. In a spectrogram we split the input signal into shorter segments (or *chunks*) and we compute the magnitude of the DFT for each segment independently. The result is a 2D matrix where each magnitude value is indexed by the chunk number and by the DFT index. To visualize it, the magnitude values are *color-coded* so that low values appear dark and high values appear bright.

Here are the spectrograms of the previous signals, and you can see how both the time and the frequency information are clearly visible.

In [None]:
plt.subplot(1,2,1)
f, t, X = sp.spectrogram(a, Fs)
plt.pcolormesh(t, f, X)
plt.subplot(1,2,2)
f, t, X = sp.spectrogram(b, Fs)
plt.pcolormesh(t, f, X);
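
The chunk-and-DFT procedure described above can also be written by hand in a few lines. The following is only a simplified sketch (non-overlapping chunks, no windowing, hypothetical function name `simple_spectrogram`) and it does not reproduce the default settings of `sp.spectrogram`:

```python
import numpy as np

def simple_spectrogram(x, Fs, chunk_len=256):
    # split the signal into non-overlapping chunks, discarding leftover samples
    num_chunks = len(x) // chunk_len
    chunks = np.reshape(x[:num_chunks * chunk_len], (num_chunks, chunk_len))
    # magnitude of the DFT of each chunk, positive frequencies only
    X = np.abs(np.fft.rfft(chunks, axis=1))
    f = np.fft.rfftfreq(chunk_len, d=1/Fs)               # frequency axis in Hz
    t = (np.arange(num_chunks) + 0.5) * chunk_len / Fs   # chunk centers in seconds
    return f, t, X.T  # rows are frequencies, columns are chunks

# test signal: 0.2 s at 697 Hz followed by 0.2 s at 1477 Hz
Fs = 8000
n = np.arange(int(0.2 * Fs))
x_test = np.r_[np.sin(2 * np.pi * 697 / Fs * n), np.sin(2 * np.pi * 1477 / Fs * n)]

f, t, X = simple_spectrogram(x_test, Fs)
print(X.shape)                   # (frequency bins, chunks)
print(f[np.argmax(X, axis=0)])   # dominant frequency in each chunk
```

The dominant frequency of each chunk is only accurate to within one DFT bin, here $F_s / 256 \approx 31$ Hz: shorter chunks give better time resolution at the price of coarser frequency resolution.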

## Signal segmentation

Our time-frequency analysis suggests that, in order to decode a DTMF signal, we need to isolate each tone interval before we look at its frequency content. Since we know that tone intervals are separated by silent gaps, a simple segmentation strategy is the following:
 * for each input sample compute the *local power* of the signal
 * look for power level transitions from low (silence gap) to high (tone intervals).

To monitor the signal's power level remember that:
 * energy is the sum of the squared samples
 * power is energy averaged in time.
   
With this, the simplest and most common method to compute the local power is to square each incoming sample and filter the sequence of squared samples with a narrowband lowpass filter such as a moving average or leaky integrator:

In [None]:
# using a moving average over 10 ms (that is, with a delay of about 5 ms)
M = int(Fs * 0.01)
local_energy = sp.lfilter(np.ones(M) / M, 1, a ** 2)
plt.plot(a);
plt.plot(local_energy);

In [None]:
# using a leaky integrator with the same delay
lam = M / (M + 2) 
local_energy = sp.lfilter(1 - lam, [1, -lam], a ** 2)
plt.plot(a);
plt.plot(local_energy);
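
As a sanity check on the two estimators, recall that a unit-amplitude sinusoid has power 1/2. The sketch below uses an arbitrary test tone at 697 Hz (an assumption made for illustration) and prints the steady-state output of both filters:

```python
import numpy as np
import scipy.signal as sp

Fs = 8000
M = int(Fs * 0.01)
n = np.arange(Fs // 2)
s = np.sin(2 * np.pi * 697 / Fs * n)  # unit-amplitude sinusoid, power 1/2

# moving average of the squared samples
ma = sp.lfilter(np.ones(M) / M, 1, s ** 2)

# leaky integrator of the squared samples, same parameters as above
lam = M / (M + 2)
li = sp.lfilter([1 - lam], [1, -lam], s ** 2)

print(f'moving average:   {ma[-1]:.3f}')
print(f'leaky integrator: {li[-1]:.3f}')
```

Both values should settle close to 0.5, with a small residual ripple since neither filter completely removes the oscillating component of the squared sinusoid.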

## State machines

Most digital signal processing applications need to work in real time, that is, they need to process an input stream of samples continuously, one sample at a time. Last week, for instance, we saw how to implement filters as Python *classes* so that a filter could be called repeatedly on individual samples; using classes allowed the filter to preserve the contents of its delay blocks across calls (that is, the filter's internal *state* was persistent).  

In more complex systems that implement some decision logic, the processing algorithm needs to behave differently according to the *current* properties of the input; for instance, when decoding DTMF, we want to look at the frequency content in tone intervals but do nothing for the silence gaps. 

The most common **design pattern** in this case is a *state machine*; every time a new sample arrives:
 * we first perform the required pre-processing steps (e.g. compute the local energy)
 * we then apply a different set of processing steps that depend on an internal *state*
 * finally, we check if we need to switch to a different state for the next iteration.

For example, if we want to isolate the tone segments in a DTMF signal, we can use a machine with three states:
 1. waiting for the input power to increase
 2. checking that the power increase lasts long enough to indicate a tone
 3. waiting for the power to decrease and signal the end of the tone

Here is an example that produces a segmentation of a DTMF signal based on a user-defined power threshold:

In [None]:
def tone_clips(x: np.ndarray, t_pow: float, Fs: float) -> list[tuple[int, int]]:
    # gain for leaky integrator to compute power level 
    LI_GAIN = 0.01 
    # minimum tone length in samples (corresponding to 20 ms) for tone detection
    MIN_LEN = int(20 * Fs / 1000) 
    
    ret = []
    power = 0
    # delay of leaky integrator, rounded to a whole number of samples
    delay = round((1 - LI_GAIN) / LI_GAIN)
    
    # state machine with three states
    WAIT_TONE, CHECK_TONE, WAIT_END = 0, 1, 2
    state = WAIT_TONE
    
    for n in range(0, len(x)):
        # compute power with leaky integrator
        power = (1 - LI_GAIN) * power + LI_GAIN * x[n] * x[n]
        # check if power is above threshold
        power_high = power > t_pow

        if state == WAIT_TONE:  # we are waiting for the power to go high
            if power_high:
                count = 0
                state = CHECK_TONE
        elif state == CHECK_TONE:  # make sure power stays high before switching
            if not power_high:
                state = WAIT_TONE  # false alarm: go back to waiting
            else:
                count += 1
                if count > MIN_LEN:
                    state = WAIT_END  # power was high for more than MIN_LEN samples (20 ms), it's a tone
        else:  # wait for the end of the tone interval
            if power_high:
                count += 1
            else:
                ret.append( (n-count-delay, n-delay) )
                state = WAIT_TONE
    return ret

In [None]:
def show_tone_clips(x: np.ndarray, t_pow: float, Fs: float):
    plt.plot(x)
    for clip in tone_clips(x, t_pow, Fs):
        plt.axvline(clip[0], color='green')        
        plt.axvline(clip[1], color='red')

In [None]:
show_tone_clips(x, 0.5, Fs)

## Your turn

You now have pretty much all the elements to implement a simple DTMF decoder. Once you have it working, you can try it with the set of test signals available in the data directory of this notebook. The test signals contain some of the typical impairments that a real-world decoder has to face: noise, detuning, timing issues, attenuation. How many of them can your decoder get right? Find out by running the following cells, and don't worry if there are mistakes: some of the test signals are really difficult!

In [None]:
# function prototype for your decoder

def DTMF_decode(x: np.ndarray, Fs: float, 
                # additional arguments here if needed
               ) -> str:
    keys = ''
    ...
    return keys

In [None]:
def read_test_file(filename: str) -> tuple[int, np.ndarray]:
    # helper function to load a DTMF test file
    fs, x = wavfile.read(os.path.join('data', filename))    
    # normalize audio data to [-1, 1] if necessary
    if x.dtype == np.int16:
        x = x / 32767.0
    return fs, x

In [None]:
def test_decoder(basename: str):
    # decode all test files with the same basename and check the results
    with open(os.path.join('data', f'{basename}.log'), 'r') as csvfile:
        test_files = csv.DictReader(csvfile, fieldnames=['ix', 'keys'], delimiter=';')
        for file in test_files:
            fs, s = read_test_file(f'{basename}{file["ix"]}.wav')
            # use your DTMF decoding function here!
            res = DTMF_decode(s, fs, 
                              # other parameters if needed
                              )
            flag = 'OK' if str(res).strip() == str(file["keys"]).strip() else "ERROR"
            print(f'{file["ix"]}: {flag}   (encoded: {file["keys"]}, decoded: {res})')

In [None]:
test_decoder('test')