# STT (Speech to Text)
This notebook illustrates how to implement STT (Speech to Text) using the [ReSpeaker 4-Mic Array](https://www.seeedstudio.com/ReSpeaker-4-Mic-Array-for-Raspberry-Pi-p-2941.html).

This example notebook does the following:

* import Python libraries
* select the RPi switch and use MicroblazeLibrary
* initialize the AC108 voice capture ADCs
* capture audio data
* play audio data
* convert the audio format
* recognize the speech

The [ReSpeaker 4-Mic Array](https://www.seeedstudio.com/ReSpeaker-4-Mic-Array-for-Raspberry-Pi-p-2941.html) captures the audio, and the captured data is then sent to a speech recognition service.
![PYNQ Z2 and ReSpeaker](./data/respeaker_pynq.jpg)

The overlay includes a custom IP core to transfer audio data.
![Block Design](./data/block_design.png)

### 1. ReSpeaker 4-Mic Array Introduction
The ReSpeaker 4-Mic Array is a 4-microphone expansion board designed for AI and voice applications. This means that you can build a more powerful and flexible voice product that integrates Amazon Alexa Voice Service, Google Assistant, and so on.

Several algorithms can be used with the 4-mic array, such as DOA (direction of arrival), VAD (voice activity detection), NS (noise suppression) and KWS (keyword spotting).
![PYNQ Z2 and ReSpeaker](./data/4_mic_array.jpg)

### 2. Prepare the overlay
Download the overlay first, then select the shared pins to connect to the RPi header (by default, the pins are connected to PMODA instead). The cell below imports the libraries used in the rest of the notebook.

In [1]:
from pynq import PL
from pynq import Overlay
from pynq import MMIO
import numpy as np
import matplotlib.pyplot as plt
import wave
from IPython.display import Audio as IPAudio
import audioop
from soundfile import SoundFile
from aip import AipSpeech
from respeaker import *

### 3. Initialize hardware
Load the overlay and initialize the ReSpeaker over I2C.

The block design includes a ReSpeaker IP core that transfers the audio data in PCM TDM format.

In [2]:
ol = Overlay("./overlays/respeaker_wifi.bit")
ol.download()
ac108_init()

### 3. Define parameters
Define the address of the custom IP core and the Baidu STT credentials.

In [3]:
RESPEAKER_ADDR = 0x43C00000
RESPEAKER_RANGE = 0x1000
RESPEAKER_OFFSET = 0x00

APP_ID = input('Please input APP ID: ')
API_KEY = input('Please input API KEY: ')
SECRET_KEY = input('Please input SECRET KEY: ')
client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

Please input APP ID: ******
Please input API KEY: ******
Please input SECRET KEY: ******
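
To avoid retyping the keys on every run, the credentials could instead be read from environment variables. This is a minimal sketch, not part of the Baidu SDK: the variable names `BAIDU_APP_ID`, `BAIDU_API_KEY`, `BAIDU_SECRET_KEY` and the helper `load_credentials` are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical variable names, chosen for this sketch only.
CREDENTIAL_VARS = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")


def load_credentials(env=None):
    """Return (app_id, api_key, secret_key) read from a mapping of variables."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing credential variables: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

With the variables set, the client could then be created with `client = AipSpeech(*load_credentials())`.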


### 4. Create MMIO and numpy array instances
The MMIO class allows a Python object to access memory-mapped addresses in the system. In particular, it gives access to the registers and address space of peripherals in the PL.

In [4]:
mmio = MMIO(RESPEAKER_ADDR, RESPEAKER_RANGE)

cap_cnt = 44100
ch1 = np.zeros(shape=(cap_cnt),dtype=np.uint32)
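
To make the idea of a memory-mapped read concrete without hardware, the sketch below emulates a register window in software. `FakeMMIO` is a hypothetical stand-in written for illustration; it is not how pynq implements `MMIO`, and it assumes little-endian 32-bit registers.

```python
import struct


class FakeMMIO:
    """Software stand-in for a memory-mapped register window (illustration only)."""

    def __init__(self, base_addr, length):
        self.base_addr = base_addr
        self.length = length
        self._mem = bytearray(length)  # backing store instead of real hardware

    def _check(self, offset):
        # Accept only word-aligned offsets that fit inside the window.
        if offset % 4 != 0 or not 0 <= offset <= self.length - 4:
            raise ValueError("offset must be word-aligned and inside the window")

    def write(self, offset, value):
        self._check(offset)
        struct.pack_into("<I", self._mem, offset, value & 0xFFFFFFFF)

    def read(self, offset=0):
        self._check(offset)
        return struct.unpack_from("<I", self._mem, offset)[0]


# Same base address and range as in the parameter cell.
regs = FakeMMIO(0x43C00000, 0x1000)
regs.write(0x00, 0x12345678)
sample = regs.read(0x00)
```

Each `read` returns one unsigned 32-bit word, which is why the capture buffer `ch1` uses `dtype=np.uint32`.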

### 5. Set audio format and capture audio
The audio format is a 44.1 kHz sample rate with a 32-bit sample depth.
The MMIO instance reads each sample from the register of the IP core in the PL.
The loop records 5 seconds of audio, one second (`cap_cnt` samples) per iteration.

In [5]:
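# Mono WAV file with 4-byte (32-bit) samples at 44.1 kHz
Wave_write1 = wave.open(r"recong.wav", 'wb')
Wave_write1.setnchannels(1)
Wave_write1.setsampwidth(4)
Wave_write1.setframerate(cap_cnt)  # cap_cnt equals the sample rate (44100)

for t in range(0,5):
    # Fill the buffer with one second of samples, then append it to the file
    for i in range(0,cap_cnt):
        ch1[i] = mmio.read(RESPEAKER_OFFSET)
    Wave_write1.writeframes(ch1.tobytes())
Wave_write1.close()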
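
The WAV settings can be checked without the board. This sketch writes one second of a synthetic 440 Hz tone as 32-bit mono PCM with the same `wave` calls, then reads the header back. The tone, the file name `tone32.wav` and the helper `write_tone` are assumptions made for illustration.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 44100   # frames per second, as in the capture cell
SAMPLE_WIDTH = 4      # bytes per sample (32-bit PCM)


def write_tone(path, seconds=1, freq=440.0):
    """Write a mono 32-bit PCM sine tone and return the number of frames written."""
    n_frames = SAMPLE_RATE * seconds
    amplitude = 2 ** 30  # half of full scale, leaves headroom
    samples = [
        int(amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
        for i in range(n_frames)
    ]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        # Pack the samples as little-endian signed 32-bit integers.
        wav.writeframes(struct.pack("<%di" % n_frames, *samples))
    return n_frames


path = os.path.join(tempfile.gettempdir(), "tone32.wav")
frames_written = write_tone(path)

# Read the header back: channels, sample width, frame rate, frame count.
with wave.open(path, "rb") as wav:
    params = (wav.getnchannels(), wav.getsampwidth(),
              wav.getframerate(), wav.getnframes())
```

For the 5-second recording above, the same header check should report `5 * cap_cnt` frames.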

### 6. Play in notebook
Because the samples are stored as 32-bit PCM in a WAV file, the audio can be played directly in the notebook.

In [6]:
IPAudio("recong.wav")

### 7. Convert audio format
Convert the audio to 16-bit samples at a 16 kHz sample rate.

In [7]:
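file = SoundFile('recong.wav')
# Writable buffer: cap_cnt * 4 bytes holds 2 * cap_cnt int16 frames,
# i.e. the first 2 seconds of the recording
temp_data = bytearray(cap_cnt * 4)

# Read the 32-bit samples as 16-bit integers
file.buffer_read_into(temp_data, dtype='int16')

# Resample mono 16-bit audio from 44.1 kHz to 16 kHz;
# ratecv returns a tuple (converted_bytes, state)
data = audioop.ratecv(temp_data, 2, 1, 44100, 16000, None)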
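
The `audioop` module was removed from the standard library in Python 3.13, so newer interpreters need a replacement for this step. The sketch below is a swapped-in technique and does not claim to match what `audioop.ratecv` does internally: it keeps the top 16 bits of each 32-bit sample and resamples by plain linear interpolation, with no anti-aliasing filter. All function names are hypothetical.

```python
import struct


def int32_to_int16(samples):
    """Keep the 16 most significant bits of each signed 32-bit sample."""
    return [s >> 16 for s in samples]


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no low-pass filter, so aliasing is possible)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for j in range(n_out):
        pos = j * src_rate / dst_rate      # fractional index into the source
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(int(round(samples[i] * (1 - frac) + nxt * frac)))
    return out


def to_pcm16_bytes(samples):
    """Pack signed 16-bit samples as little-endian PCM bytes."""
    return struct.pack("<%dh" % len(samples), *samples)


# One second of a synthetic 32-bit sawtooth, converted to 16 kHz / 16-bit PCM.
saw32 = [(i % 1000) << 16 for i in range(44100)]
pcm16 = to_pcm16_bytes(resample_linear(int32_to_int16(saw32), 44100, 16000))
```

For speech intended for recognition, a resampler with proper low-pass filtering is likely to give cleaner results than this interpolation.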

### 8. Recognize
Send the 16 kHz PCM bytes (`data[0]`) to the recognition service and print the result.

In [8]:
result = client.asr(data[0], 'pcm', 16000, {
    'dev_pid': 1536,
})

print(result)

{'corpus_no': '6678463040283282544', 'err_msg': 'success.', 'err_no': 0, 'result': ['测试'], 'sn': '268622123641554950848'}
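
The response is a dictionary; in the output above, `err_no` is 0 and `result` holds a list of transcripts. A minimal sketch of a check before using the text follows. It assumes that a non-zero `err_no` signals a failure, which is inferred from the sample output rather than confirmed here, and the helper name `extract_text` is hypothetical.

```python
def extract_text(response):
    """Return the first transcript, or raise if the service reported an error."""
    err_no = response.get("err_no")
    if err_no != 0:
        raise RuntimeError(
            "recognition failed: err_no=%s, err_msg=%s"
            % (err_no, response.get("err_msg"))
        )
    candidates = response.get("result") or []
    if not candidates:
        raise RuntimeError("recognition returned no transcript")
    return candidates[0]


# The response printed above, reused as sample input.
sample_response = {
    "corpus_no": "6678463040283282544",
    "err_msg": "success.",
    "err_no": 0,
    "result": ["测试"],
    "sn": "268622123641554950848",
}
text = extract_text(sample_response)
```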
