Install the following packages using pip:
```
pip install ipywidgets
pip install pyaudio
pip install vosk
pip install transformers
pip install torch
```

In [1]:
%pip install ipywidgets

Note: you may need to restart the kernel to use updated packages.


In [None]:
import ipywidgets as widgets
from IPython.display import display
from queue import Queue
from threading import Thread

messages = Queue()
recordings = Queue()

record_button = widgets.Button(
    description='Record',
    disabled=False,
    button_style='success',
    tooltip='Record',
    icon='microphone'
)


stop_button = widgets.Button(
    description='Stop',
    disabled=False,
    button_style='warning',
    tooltip='Stop',
    icon='stop'
)

output = widgets.Output()

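def start_recording(data):
    # A non-empty `messages` queue acts as the "keep running" flag for both threads
    messages.put(True)

    with output:
        display("Starting...")
        record = Thread(target=record_microphone)
        record.start()
        transcribe = Thread(target=speech_recognition, args=(output,))
        transcribe.start()

def stop_recording(data):
    with output:
        # Clear the flag; skip the blocking get() if Record was never clicked
        if not messages.empty():
            messages.get()
        display("Stopped.")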
    
record_button.on_click(start_recording)
stop_button.on_click(stop_recording)

display(record_button, stop_button, output)
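
The cell above uses a non-empty `messages` queue as the signal that recording is active. As a minimal sketch of the same start/stop pattern without audio hardware, the example below swaps in a `threading.Event` for that flag and a `None` sentinel to shut down the consumer. The names `producer`, `consumer`, and `run_demo`, and the fake chunk strings, are hypothetical stand-ins for `record_microphone`, `speech_recognition`, and real audio data.

```python
import time
from queue import Queue
from threading import Event, Thread

def producer(running, out_queue):
    """Stand-in for record_microphone: emit fake chunks while the flag is set."""
    n = 0
    while running.is_set():
        out_queue.put(f"chunk-{n}")
        n += 1
        time.sleep(0.01)

def consumer(in_queue, results):
    """Stand-in for speech_recognition: process chunks until a None sentinel arrives."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        results.append(item.upper())

def run_demo(duration=0.1):
    running = Event()
    chunks = Queue()
    results = []

    running.set()  # equivalent of clicking Record
    prod = Thread(target=producer, args=(running, chunks))
    cons = Thread(target=consumer, args=(chunks, results))
    prod.start()
    cons.start()

    time.sleep(duration)

    running.clear()   # equivalent of clicking Stop
    prod.join()       # no more chunks will be queued after this point
    chunks.put(None)  # tell the consumer to finish once the queue is drained
    cons.join()
    return results

print(run_demo())
```

One possible advantage of this arrangement is that stopping never blocks and no queued chunk is lost, because the consumer only exits after it has seen everything the producer emitted.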

In [3]:
!python -m pip install pyaudio

Collecting pyaudio
  Downloading PyAudio-0.2.14-cp38-cp38-win_amd64.whl.metadata (2.7 kB)
Downloading PyAudio-0.2.14-cp38-cp38-win_amd64.whl (164 kB)
Installing collected packages: pyaudio
Successfully installed pyaudio-0.2.14


List the available audio devices and note the index of the input device you want to record from (e.g., your microphone); the recording function needs this index.

In [4]:
import pyaudio
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    print(p.get_device_info_by_index(i))

p.terminate()

{'index': 0, 'structVersion': 2, 'name': 'Microsoft Sound Mapper - Input', 'hostApi': 0, 'maxInputChannels': 2, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.09, 'defaultLowOutputLatency': 0.09, 'defaultHighInputLatency': 0.18, 'defaultHighOutputLatency': 0.18, 'defaultSampleRate': 44100.0}
{'index': 1, 'structVersion': 2, 'name': 'External Microphone (Realtek(R)', 'hostApi': 0, 'maxInputChannels': 2, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.09, 'defaultLowOutputLatency': 0.09, 'defaultHighInputLatency': 0.18, 'defaultHighOutputLatency': 0.18, 'defaultSampleRate': 44100.0}
{'index': 2, 'structVersion': 2, 'name': 'Microphone Array (Intel® Smart ', 'hostApi': 0, 'maxInputChannels': 2, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.09, 'defaultLowOutputLatency': 0.09, 'defaultHighInputLatency': 0.18, 'defaultHighOutputLatency': 0.18, 'defaultSampleRate': 44100.0}
{'index': 3, 'structVersion': 2, 'name': 'Microsoft Sound Mapper - Output', 'hostApi': 0, 'maxInputChan
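
Only devices whose `maxInputChannels` is greater than zero can be recorded from, so the listing can be narrowed to candidate microphones. The following is a minimal sketch: `input_devices` and `list_input_devices` are hypothetical helper names, and the sample data is illustrative (the second entry is an invented output-only device).

```python
def input_devices(device_infos):
    """Return (index, name) pairs for devices that can capture audio."""
    return [(info["index"], info["name"])
            for info in device_infos
            if info.get("maxInputChannels", 0) > 0]

def list_input_devices():
    """Query PyAudio for all devices and keep only the input-capable ones."""
    import pyaudio  # imported here so input_devices() works without PyAudio installed
    p = pyaudio.PyAudio()
    try:
        infos = [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()
    return input_devices(infos)

# Illustrative data shaped like the device dictionaries (only the relevant fields)
sample = [
    {"index": 1, "name": "External Microphone (Realtek(R)", "maxInputChannels": 2},
    {"index": 99, "name": "Hypothetical Speakers", "maxInputChannels": 0},
]
print(input_devices(sample))
```

On a machine with PyAudio installed, calling `list_input_devices()` should return the same kind of list for the real hardware.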

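The record_microphone function records audio from the selected input device and puts each segment of roughly RECORD_SECONDS seconds on the recordings queue. Set CHANNELS, FRAME_RATE, RECORD_SECONDS, AUDIO_FORMAT, and input_device_index to match your setup; SAMPLE_SIZE is the number of bytes per sample (2 for paInt16). The input_device_index value in the code (7) is machine-specific, so replace it with the index you noted above.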

In [21]:
CHANNELS = 1
FRAME_RATE = 16000
RECORD_SECONDS = 5
AUDIO_FORMAT = pyaudio.paInt16
SAMPLE_SIZE = 2

def record_microphone(chunk=1024):
    p = pyaudio.PyAudio()

    stream = p.open(format=AUDIO_FORMAT,
                    channels=CHANNELS,
                    rate=FRAME_RATE,
                    input=True,
                    input_device_index=7,  # Use the index of your microphone
                    frames_per_buffer=chunk)

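    frames = []

    # Keep recording while the "running" flag is set (messages is non-empty)
    while not messages.empty():
        data = stream.read(chunk)
        frames.append(data)
        # Once about RECORD_SECONDS of audio has accumulated, hand it to the transcriber
        if len(frames) >= (FRAME_RATE * RECORD_SECONDS) / chunk:
            recordings.put(frames.copy())
            frames = []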

    stream.stop_stream()
    stream.close()
    p.terminate()
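
The loop condition `len(frames) >= (FRAME_RATE * RECORD_SECONDS) / chunk` determines how much audio each queued segment holds. As a worked sketch using the values from this cell (`segment_stats` is a hypothetical helper, not part of the notebook), the arithmetic is:

```python
import math

CHANNELS = 1
FRAME_RATE = 16000
RECORD_SECONDS = 5
SAMPLE_SIZE = 2  # bytes per sample for 16-bit audio

def segment_stats(chunk=1024):
    """Compute the size of one segment as produced by the recording loop."""
    # The loop emits a segment once the chunk count reaches this threshold
    chunks = math.ceil(FRAME_RATE * RECORD_SECONDS / chunk)
    samples = chunks * chunk
    seconds = samples / FRAME_RATE
    n_bytes = samples * SAMPLE_SIZE * CHANNELS
    return chunks, samples, seconds, n_bytes

print(segment_stats())  # (79, 80896, 5.056, 161792)
```

With the default chunk size, each segment is 79 chunks long, which is slightly more than five seconds of audio because the chunk count is rounded up.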

In [9]:
%pip install vosk
%pip install transformers
%pip install torch

Collecting vosk
  Using cached vosk-0.3.45-py3-none-win_amd64.whl.metadata (1.8 kB)
Collecting srt (from vosk)
  Using cached srt-3.5.3.tar.gz (28 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting websockets (from vosk)
  Downloading websockets-12.0-cp312-cp312-win_amd64.whl.metadata (6.8 kB)
Using cached vosk-0.3.45-py3-none-win_amd64.whl (14.0 MB)
Downloading websockets-12.0-cp312-cp312-win_amd64.whl (124 kB)

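The speech_recognition function takes each recorded segment from the recordings queue, transcribes it with a Vosk model, and passes the text through recasepunc to restore casing and punctuation before appending it to the output widget.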

In [22]:
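import subprocess
import json
import time
from queue import Empty
from vosk import Model, KaldiRecognizer

model = Model(model_name="vosk-model-small-en-us-0.15")
rec = KaldiRecognizer(model, FRAME_RATE)
rec.SetWords(True)

def speech_recognition(output):
    # Runs while the "running" flag is set (messages is non-empty)
    while not messages.empty():
        try:
            # Time out so the thread can exit after Stop instead of blocking forever
            frames = recordings.get(timeout=1)
        except Empty:
            continue

        rec.AcceptWaveform(b''.join(frames))
        result = rec.Result()
        text = json.loads(result)["text"]

        # Restore casing and punctuation with the recasepunc script and checkpoint,
        # which are expected under ./recasepunc
        cased = subprocess.check_output('python recasepunc/recasepunc.py predict recasepunc/checkpoint', shell=True, text=True, input=text)
        output.append_stdout(cased)
        time.sleep(1)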
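
Because `rec.SetWords(True)` is enabled, the JSON returned by `rec.Result()` is expected to carry per-word entries alongside the `text` field. The sketch below parses such a payload; `parse_result` is a hypothetical helper, and the sample string is hand-written to illustrate the assumed shape (a `result` list with `word`, `start`, `end`, and `conf` keys), not captured from a real run.

```python
import json

def parse_result(result_json):
    """Return (text, words) where words is a list of (word, start, end, conf)."""
    data = json.loads(result_json)
    text = data.get("text", "")
    # "result" may be absent when nothing was recognized, so default to an empty list
    words = [(w["word"], w["start"], w["end"], w["conf"])
             for w in data.get("result", [])]
    return text, words

# Hand-written example payload (assumed shape, not real recognizer output)
sample = json.dumps({
    "result": [
        {"conf": 1.0, "start": 0.12, "end": 0.48, "word": "hello"},
        {"conf": 0.93, "start": 0.51, "end": 0.95, "word": "world"},
    ],
    "text": "hello world",
})

text, words = parse_result(sample)
print(text)
for word, start, end, conf in words:
    print(f"{start:5.2f}-{end:5.2f}  {word}  (conf {conf:.2f})")
```

Word timings like these could be used, for example, to align the transcript with the recorded audio.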

Conclusion

This project demonstrates a basic implementation of a voice recording and speech recognition system using Python. The system can be extended with additional features such as handling longer recordings, integrating with other NLP models for further processing, and enhancing the user interface.