# Building a Simple Local AI Voice Assistant

# Requirements

1. Be able to control the voice assistant from voice
2. BE able to talk to it as well as perform commands/actions 
3. Tasks
   1. Create tasks in some task backlog db
   2. Create, read, edit and delete files locally
   3. Send emails
   4. Fully local setup (audio transcription and the AI should be fully local)
   5. Answer questions about personal notes and knowledge management stuff 

In [17]:
# 1 - Voice Control 
# We'll need an audion transcription model to convert audio to text 
# We'll use whisper turbo 3
# source: https://huggingface.co/openai/whisper-large-v3-turbo
# pip install --upgrade pip
# pip install --upgrade transformers datasets[audio] accelerate

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3-turbo"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

result = pipe("audio-testfile.mp3")
print(result["text"])

Device set to use cuda:0
Using custom `forced_decoder_ids` from the (generation) config. This is deprecated in favor of the `task` and `language` flags/config options.
Transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`. See https://github.com/huggingface/transformers/pull/28687 for more details.


,


In [18]:
# 2 - Interaction using LLM
# We'll use llama 3.2 with Ollama https://ollama.com/
# pip install ollama
# https://github.com/ollama/ollama-python

import ollama
response = ollama.chat(model='llama2:13b', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])



The sky appears blue because of a phenomenon called Rayleigh scattering, which is the scattering of light by small particles in the atmosphere. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen, as well as small particles like dust and water droplets. These particles scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths. This is why the sky appears blue during the daytime, as the blue light is scattered in all directions and reaches our eyes from all parts of the sky.

In addition to Rayleigh scattering, the sky can also appear blue due to a phenomenon called Mie scattering, which is the scattering of light by larger particles like dust and water droplets. This type of scattering occurs at shorter wavelengths than Rayleigh scattering and can make the sky appear more blue, especially in conditions where there are high levels of pollution or aerosols in the atmosphere.


In [19]:
def get_response(prompt):
    response = ollama.chat(model='llama2:13b', 
                           messages=[{'role': 'user', 'content': prompt}])
    return response['message']['content']

get_response("What is the country known for having the best weather in the world?")

"\nThere is no definitive answer to what country has the best weather in the world, as weather conditions can vary greatly depending on the location and time of year. However, there are several countries that are known for their pleasant and consistent weather, such as:\n\n1. Hawaii, USA - Known for its tropical climate and warm temperatures year-round, with average temperatures ranging from 70°F to 85°F (21°C to 30°C).\n2. Costa Rica - With a tropical climate and two distinct seasons (dry and rainy), Costa Rica offers mild weather conditions throughout the year, with average temperatures ranging from 70°F to 84°F (21°C to 30°C).\n3. Australia - Australia has a diverse range of climates, but the southern states such as Tasmania and Victoria are known for their mild weather conditions, with average temperatures ranging from 50°F to 70°F (10°C to 21°C) during the summer months.\n4. South Africa - The western coast of South Africa, particularly the region around Cape Town, is known for it

In [20]:
import pyaudio
import wave

def record_audio(filename="prompt.mp3", duration=5, sample_rate=44100, channels=2, chunk=1024):
    """
    Record audio from the microphone and save it to a file.
    
    :param filename: Name of the output file (default: "prompt.mp3")
    :param duration: Duration of the recording in seconds (default: 5)
    :param sample_rate: Sample rate of the recording (default: 44100 Hz)
    :param channels: Number of audio channels (default: 2 for stereo)
    :param chunk: Number of frames per buffer (default: 1024)
    """
    p = pyaudio.PyAudio()

    stream = p.open(format=pyaudio.paInt16,
                    channels=1,
                    rate=sample_rate,
                    input=True,
                    frames_per_buffer=chunk)

    print("Recording...")

    frames = []

    for i in range(0, int(sample_rate / chunk * duration)):
        data = stream.read(chunk)
        frames.append(data)

    print("Recording finished.")

    stream.stop_stream()
    stream.close()
    p.terminate()

    # Save the recorded data as a WAV file
    wf = wave.open(filename.replace('.mp3', '.wav'), 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(sample_rate)
    wf.writeframes(b''.join(frames))
    wf.close()

    print(f"Audio saved as {filename.replace('.mp3', '.wav')}")

# Example usage:            
record_audio()

Recording...
Recording finished.
Audio saved as prompt.wav


In [21]:
def transcribe(audio_filepath):
    result = pipe(audio_filepath)
    return result.get("text")

transcribe("./prompt.wav")

','

In [22]:
record_audio()
prompt = transcribe("./prompt.wav")
get_response(prompt)

Recording...
Recording finished.
Audio saved as prompt.wav


'\nI don\'t understand what you mean by ",". Could you explain?'

Stiching together models

Taking the recorded transcripts and putting into llama3.2 for now
Then have a Whisper model read out the responses

In [23]:
from transformers import pipeline

pipe = pipeline("text-to-speech", model="suno/bark")

from transformers import AutoProcessor, AutoModel

processor = AutoProcessor.from_pretrained("suno/bark")
model = AutoModel.from_pretrained("suno/bark")

inputs = processor(
    text=["Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."],
    return_tensors="pt",
)

speech_values = model.generate(**inputs, do_sample=True)

record_audio()
prompt = transcribe("./prompt.wav")
get_response(prompt)
# transcribe("./response.wav")

# def transcribe_and_respond(audio_input):
#     if audio_input is None:
#         return "No audio input provided."

#     # Transcribe the audio input
#     transcription = transcribe(audio_input.name)
#     if not transcription:
#         return "Transcription failed. Please try again."   
#     # Get the response from the LLM
#     response = get_response(transcription)
#     if not response:
#         return "Failed to get a response from the LLM. Please try again."
#     # Return the transcription and response
#     return transcription, response



Device set to use cuda:0
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.


Recording...


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.


Recording finished.
Audio saved as prompt.wav


'\nI don\'t understand what you mean by "<SYS>" and "</SYS>". Could you explain?'

**Tasks**
   1. Create tasks in some task backlog db
   2. Create, read, edit and delete files locally
   3. Send emails
   4. Fully local setup (audio transcription and the AI should be fully local)
   5. Answer questions about personal notes and knowledge management stuff 

   6. GPS Directions
   7. Local trivia
   8. 

In [41]:
# Creating the task db first before writing the tools for the model
import pandas as pd
from datetime import datetime

# Create an empty DataFrame for the tasks database
tasks_df = pd.DataFrame(columns=['task', 'status', 'creation_date', 'completed_date'])

tasks_df

Unnamed: 0,task,status,creation_date,completed_date


In [42]:
# Tool for adding a task
def add_task(task_description):
    """
    Add a task to the tasks database.
    """
    new_task = pd.DataFrame({
        'task': [task_description],
        'status': ['Not Started'],
        'creation_date': [datetime.now().strftime('%Y-%m-%d %H:%M:%S')],
        'completed_date': [None]
    })
    global tasks_df
    tasks_df = pd.concat([tasks_df, new_task], ignore_index=True)
    
    return tasks_df

# Tool Calling

Tool calling is about giving LLMs the ability to perform actions.

```
{
    'type': 'function',
    'function': {
        'name': 'create_file',
        'description': 'Create a new file with given content',
        'parameters': {
            'type': 'object',
            'properties': {
                'filename': {
                    'type': 'string',
                    'description': 'The name of the file to create',
                },
                'content': {
                    'type': 'string',
                    'description': 'The content to write to the file',
                },
            },
            'required': ['filename', 'content'],
        },
    },
},
```

In [43]:
tool_add_tasks_to_db = {
    'type': 'function',
    'function': {
        'name': 'add_task',
        'description': 'Add a task to the tasks database',
        'parameters': {
            'type': 'object',
            'properties': {
                'task_description': {
                    'type': 'string',
                    'description': 'The description of the task to add',
                },
            },
            'required': ['task_description'],
        },
    },
}

In [44]:
# Creating tasks in a backlog task db
def get_response_with_tools(prompt):
    response = ollama.chat(model='llama3.2', 
                           messages=[{'role': 'user', 'content': prompt}],
                           tools=[tool_add_tasks_to_db])
    # Process tool calls if present
    if 'tool_calls' in response['message']:
        for tool_call in response['message']['tool_calls']:
            if tool_call['function']['name'] == 'add_task':
                task_description = tool_call['function']['arguments']['task_description']
                add_task(task_description)
                print(f"Task added: {task_description}")
    else:
        return response['message']['content']

In [45]:
get_response_with_tools("Create a task to create a local voice AI assistant")

Task added: Create a local voice AI assistant


In [46]:
tasks_df

Unnamed: 0,task,status,creation_date,completed_date
0,Create a local voice AI assistant,Not Started,2025-07-16 14:16:46,


In [47]:
tool_create_file = {
            'type': 'function',
            'function': {
                'name': 'create_file',
                'description': 'Create a new file with given content',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'filename': {
                            'type': 'string',
                            'description': 'The name of the file to create',
                        },
                        'content': {
                            'type': 'string',
                            'description': 'The content to write to the file',
                        },
                    },
                    'required': ['filename', 'content'],
                },
            },
        }
tool_read_file = {
            'type': 'function',
            'function': {
                'name': 'read_file',
                'description': 'Read the content of a file',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'filename': {
                            'type': 'string',
                            'description': 'The name of the file to read',
                        },
                    },
                    'required': ['filename'],
                },
            },
        }
tool_delete_file = {
            'type': 'function',
            'function': {
                'name': 'delete_file',
                'description': 'Delete a file',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'filename': {
                            'type': 'string',
                            'description': 'The name of the file to delete',
                        },
                    },
                    'required': ['filename'],
                },
            },
        }

tool_edit_file = {
            'type': 'function', 
            'function': {
                'name': 'edit_file',
                'description': 'Edit the content of a file',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'filename': {
                            'type': 'string',
                            'description': 'The name of the file to edit',
                        },
                        'content': {
                            'type': 'string',
                            'description': 'The content to write to the file',
                        },
                    },
                    'required': ['filename', 'content'],
                },
            },
        }
tools = [tool_create_file, tool_read_file, tool_delete_file, tool_add_tasks_to_db]

In [48]:
import os
# Writing functions to create, read, edit and delete files

def create_file(filename, content):
    with open(filename, 'w') as file:
        file.write(content)
    return f"File {filename} created successfully"

def read_file(filename):
    with open(filename, 'r') as file:
        return file.read()

def edit_file(filename, content):
    with open(filename, 'w') as file:
        file.write(content)
    return f"File {filename} edited successfully"

def delete_file(filename):
    os.remove(filename)
    return f"File {filename} deleted successfully"

# Creating tasks in a backlog task db
def get_response_with_tools(prompt):
    response = ollama.chat(model='llama3.2', 
                           messages=[{'role': 'user', 'content': prompt}],
                           tools=tools)
    # Process tool calls if present
    if 'tool_calls' in response['message']:
        for tool_call in response['message']['tool_calls']:
            if tool_call['function']['name'] == 'add_task':
                task_description = tool_call['function']['arguments']['task_description']
                add_task(task_description)
                print(f"Task added: {task_description}")
            elif tool_call['function']['name'] == 'create_file':
                print("Creating file...")
                filename = tool_call['function']['arguments']['filename']
                content = tool_call['function']['arguments']['content']
                create_file(filename, content)
                print(f"File created: {filename}")
            elif tool_call['function']['name'] == 'read_file':
                print("Reading file...")
                filename = tool_call['function']['arguments']['filename']
                content = read_file(filename)
                print(f"File content: {content}")
            elif tool_call['function']['name'] == 'delete_file':
                print("Deleting file...")
                filename = tool_call['function']['arguments']['filename']
                delete_file(filename)
                print(f"File deleted: {filename}")
    else:
        return response['message']['content']

In [55]:
get_response_with_tools("Create a file called 'test.txt' with the content 'Hello, world!'")

Creating file...
File created: test.txt


In [57]:
tasks_df

Unnamed: 0,task,status,creation_date,completed_date
0,Create a local voice AI assistant,Not Started,2025-07-16 14:16:46,


In [58]:
record_audio(duration=5)
prompt = transcribe("./prompt.wav")
get_response_with_tools(prompt)

Recording...
Recording finished.
Audio saved as prompt.wav




Creating file...
File created: experience.txt


In [59]:
# Save tasks_df to CSV file
tasks_df.to_csv('tasks.csv', index=False)
print("Tasks saved to tasks.csv")

Tasks saved to tasks.csv
