Installing Whisper

In [None]:
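# brew is a shell command, not an IPython magic, so it is run with "!"
!brew install ffmpeg
# The demos below import faster_whisper, which ships in the faster-whisper package
%pip install -U faster-whisper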

Installing Piper and checking that it installed correctly

In [6]:
%pip install -q --upgrade pip
%pip install -q piper-tts
import shutil, sys
print("piper CLI:", shutil.which("piper"))
try:
    import piper
    print("piper module:", getattr(piper, "__version__", "import ok"))
except Exception as e:
    print("import error:", e)

Note: you may need to restart the kernel to use updated packages.
piper CLI: /Users/mukundsenthilkumar/Documents/Offline-AI-Kiosk/venv/bin/piper
piper module: import ok


Whisper Demo 1: Using English audio file

In [None]:
from faster_whisper import WhisperModel
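# int8 quantization on CPU keeps memory use low and runs well on Apple Silicon
model = WhisperModel("medium", device="cpu", compute_type="int8")
# task defaults to "transcribe"; the language is auto-detected when not given
segments, info = model.transcribe("WhisperDemo1.m4a")
# segments is a lazy generator; joining it runs the actual transcription
text = "".join(s.text for s in segments)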
print(text)

 Hello, testing, testing, testing.
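
The demo above joins all segments into one string. Each segment that faster-whisper yields also carries `start` and `end` times in seconds, which can be handy for a kiosk that needs to show or log when something was said. The sketch below is a minimal, hypothetical helper (`format_segments` and `FakeSegment` are names made up for this example); it uses stand-in objects so it runs without a model or an audio file.

```python
from collections import namedtuple

# Stand-in for the segment objects that faster-whisper yields; real segments
# expose the same .start, .end and .text attributes.
FakeSegment = namedtuple("FakeSegment", ["start", "end", "text"])


def format_segments(segments):
    """Return one '[start -> end] text' line per segment."""
    lines = []
    for seg in segments:
        lines.append(f"[{seg.start:6.2f}s -> {seg.end:6.2f}s] {seg.text.strip()}")
    return "\n".join(lines)


demo = [
    FakeSegment(0.0, 1.5, " Hello, testing,"),
    FakeSegment(1.5, 3.0, " testing, testing."),
]
print(format_segments(demo))
```

With a real model, the same helper could be called as `format_segments(segments)` in place of the `"".join(...)` line.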


Whisper Demo 2: Using Hindi audio file

In [10]:

from faster_whisper import WhisperModel
model = WhisperModel("medium", device="cpu", compute_type="int8") 
segments, info = model.transcribe("WhisperDemoHi.m4a", language="hi")  
text = "".join(s.text for s in segments)
print(text)

 हाई नमस्ते मेरे नाम कार्ते क्या है आपका नाम क्या है? मैं और पच श्कूल गही था और वाहाँपे भहूत साथ ईपराई करीथी और जै मैं पराई कर था हैं, मेरी पैर में चॉत लगी थी, और मैं चणлишкомह नहीं पा रहा था तो मैं इसलिक या करूँ? अच्छाने के नहीं गेनाफु।


Whisper Demo 3: Translating Hindi to English

In [11]:
from faster_whisper import WhisperModel
model = WhisperModel("medium", device="cpu", compute_type="int8") 
segments, info = model.transcribe("WhisperDemoHi.m4a", task="translate", language="hi")  
text = "".join(s.text for s in segments)
print(text)

 Hi, Namaste. My name is Kartik. What is your name? What else should I say or should I keep talking? I went to school the day before yesterday and I was studying a lot. When I was studying, I had an injury in my leg and I was not able to walk. So what should I do? Enough.
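
Each demo cell above constructs a new `WhisperModel`, which reloads the weights every time. One possible way to avoid that is to cache the loaded model per configuration. The sketch below assumes the same constructor arguments used in the demos; `get_model` is a hypothetical helper name, not part of faster-whisper.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_model(size="medium", device="cpu", compute_type="int8"):
    """Load a faster-whisper model once per distinct configuration."""
    # Imported lazily so that defining this helper does not load the library.
    from faster_whisper import WhisperModel

    return WhisperModel(size, device=device, compute_type=compute_type)


# Hypothetical usage in later cells (same calls as the demos above):
# segments, info = get_model().transcribe(
#     "WhisperDemoHi.m4a", task="translate", language="hi"
# )
# print("".join(s.text for s in segments))
```

Repeated calls with the same arguments return the same model object, so only the first call pays the loading cost.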


Whisper Demo 4: Recording speech and converting it to text

In [2]:
import sounddevice as sd
from scipy.io.wavfile import write
import numpy as np
import os

file_path = "/Users/mukundsenthilkumar/Documents/Offline-AI-Kiosk/input.wav"
if os.path.exists(file_path):
    os.remove(file_path)

SAMPLE_RATE = 16000
DURATION_S  = 5
OUT_WAV     = "input.wav"

print("🎙️ Recording... speak now")
audio = sd.rec(int(DURATION_S * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1, dtype='float32')
sd.wait()  # block until the recording has finished
# Clip to [-1, 1] before scaling so loud samples cannot overflow int16
audio_i16 = (np.clip(audio, -1.0, 1.0) * 32767).astype("int16")
write(OUT_WAV, SAMPLE_RATE, audio_i16)
print(f"✅ Saved {OUT_WAV}")

from faster_whisper import WhisperModel
model = WhisperModel("medium", device="cpu", compute_type="int8")  # great on Apple Silicon
segments, info = model.transcribe("input.wav", task="translate", language="hi")  # or task="transcribe"
text = "".join(s.text for s in segments)
print(text)

🎙️ Recording... speak now
✅ Saved input.wav
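
The recording cell is meant to produce a 16 kHz, mono, 16-bit WAV file. A quick sanity check of the file header can catch a wrong device setting before the file is handed to Whisper. The sketch below uses only the standard-library `wave` module; `describe_wav` and `write_test_tone` are hypothetical helper names, and the tone file is generated only so the example runs without a microphone.

```python
import math
import os
import struct
import tempfile
import wave


def describe_wav(path):
    """Read basic format information from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": wf.getnframes() / rate,
        }


def write_test_tone(path, sample_rate=16000, seconds=1.0, freq=440.0):
    """Write a quiet sine tone as 16-bit mono PCM (stand-in for a recording)."""
    n = int(sample_rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(frames)


tone_path = os.path.join(tempfile.gettempdir(), "tone_check.wav")
write_test_tone(tone_path)
wav_info = describe_wav(tone_path)
print(wav_info)
```

In the notebook, `describe_wav("input.wav")` could be called right after the recording is saved.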



Piper Demo 1: Testing Male voice

In [14]:
import os, shutil, sys, subprocess
from IPython.display import Audio, display
from pathlib import Path

file_path = "/Users/mukundsenthilkumar/Documents/Offline-AI-Kiosk/hello1.wav"
if os.path.exists(file_path):
    os.remove(file_path)

PIPER = shutil.which("piper") or [sys.executable, "-m", "piper"]
VOICE = str(Path("voices/en_US-hfc_male-medium.onnx").resolve())

cmd = ([PIPER] if isinstance(PIPER, str) else PIPER) + ["-m", VOICE, "-f", "hello1.wav"]
subprocess.run(cmd, input="Hello from Piper. I am super awesome".encode("utf-8"), check=True)
print("✅ Piper generated hello1.wav")
display(Audio("hello1.wav", autoplay=True))


✅ Piper generated hello1.wav
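
Both Piper demos build the same command: the `piper` executable if it is on the PATH, otherwise `python -m piper`, followed by the `-m` (voice model) and `-f` (output file) flags used above. A small helper can keep that logic in one place. This is a sketch only; `build_piper_cmd` and `speak` are hypothetical names, and `speak` is defined but not called here because it needs Piper and a voice file to be present.

```python
import shutil
import subprocess
import sys


def build_piper_cmd(voice_path, out_wav):
    """Return the argument list for a Piper run, mirroring the demo cells."""
    exe = shutil.which("piper")
    # Fall back to running the module with the current interpreter.
    base = [exe] if exe else [sys.executable, "-m", "piper"]
    return base + ["-m", str(voice_path), "-f", str(out_wav)]


def speak(text, voice_path, out_wav):
    """Synthesize text to out_wav; requires Piper and the voice model."""
    subprocess.run(
        build_piper_cmd(voice_path, out_wav),
        input=text.encode("utf-8"),
        check=True,
    )
    return out_wav


cmd = build_piper_cmd("voices/en_US-hfc_male-medium.onnx", "hello1.wav")
print(cmd)
```

A demo cell could then reduce to `speak("Hello from Piper.", VOICE, "hello1.wav")` followed by the `display(Audio(...))` call.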


Piper Demo 2: Testing a longer message with a female voice

In [18]:
import os, shutil, sys, subprocess
from IPython.display import Audio, display
from pathlib import Path

file_path = "/Users/mukundsenthilkumar/Documents/Offline-AI-Kiosk/hello2.wav"
if os.path.exists(file_path):
    os.remove(file_path)

PIPER = shutil.which("piper") or [sys.executable, "-m", "piper"]
VOICE = str(Path("voices/en_US-hfc_female-medium.onnx").resolve())

text = """
Yo, it’s Alex Pak, the Korean with no stack,
Face so flat, even mirrors talk back - "Bro, you good?"
Looking like a pancake that never rose,
Profile straight as the lies he told his hoes.

No jawline, just a suggestion,
Side view got no f-ing’ dimension.
Origami face, fold it in half,
Siri can't find depth - bitch, do the math.

Came outta Seoul with no Seoul glow,
Got that “NPC default” flow.
Kimchi hot, but Pak stays cold,
Built like a rice cracker three days old.

Swagger on zero, rizz in decline,
Girl said "talk dirty" - he replied in a flatline.
Tried to pose up, thought he snapped,
But the camera said "nah," and the pixels clapped.

He a chopstick in a world full of blades,
No edge, no sauce, just rice with no taste.
Dropped in the cypher, thought he was lit,
But even K-pop said, “Nah fam, you don’t fit.”
"""

cmd = ([PIPER] if isinstance(PIPER, str) else PIPER) + ["-m", VOICE, "-f", "hello2.wav"]
subprocess.run(cmd, input=text.encode("utf-8"), check=True)
print("✅ Piper generated hello2.wav")
display(Audio("hello2.wav", autoplay=True))

✅ Piper generated hello2.wav
