# My Own Jarvis! Building a Voice Assistant
Using OpenAI's ChatGPT **Function Calling** feature, you can build a personal AI assistant similar to Iron Man's Jarvis.  

![](assets/stt_chatgpt_tts_en.png)

Conversations in Korean are also supported. Real-time conversation is implemented in three steps:

1. Convert the user's voice to text using the STT (Speech to Text) service provided by Microsoft Azure.
2. Use ChatGPT's Function Calling feature to selectively execute predefined APIs or code.
3. Speak the execution results back to the user using the TTS (Text to Speech) service provided by Microsoft Azure.

> **Notes**
>- ***The route search functions in this notebook call Kakao APIs. If you are in another country, replace them with a route search API available there and modify the code accordingly.***
>- This code was tested with a Python 3.11.4 kernel and Azure OpenAI version 1.13.3.
>- The code below is for PoC purposes. It is not a complete solution and should be used as a reference.
>- To utilize the STT and TTS features, you will need hardware such as a microphone and speakers. If your development environment is container-based, it might not work properly.
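
The three steps listed above amount to a listen, decide, speak loop. The following is a minimal sketch of one turn of that loop. The names `run_turn`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stand-ins so the sketch runs without a microphone or API keys; they are not part of this notebook.

```python
from typing import Callable, List


def run_turn(listen: Callable[[], str],
             answer: Callable[[str], str],
             speak: Callable[[str], None]) -> str:
    """Run one conversational turn: speech in, text answer, speech out."""
    text = listen()        # 1. STT: voice -> text
    reply = answer(text)   # 2. Model with function calling: text -> reply
    speak(reply)           # 3. TTS: reply -> voice
    return reply


# Hypothetical stand-ins for the Azure services
spoken: List[str] = []


def fake_stt() -> str:
    return "What time is it in Seoul?"


def fake_llm(text: str) -> str:
    return f"You asked: {text}"


def fake_tts(text: str) -> None:
    spoken.append(text)


reply = run_turn(fake_stt, fake_llm, fake_tts)
print(reply)
```

Passing the three stages in as callables keeps each one replaceable, which is convenient when testing in a container that has no audio devices.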

In [None]:
# Run this cell when working on a desktop (local environment). Reinstall the libraries only if necessary.
# First, you need to install the Python runtime. https://www.python.org/downloads/
# !pip install azure-identity
# !pip install -r ../requirements.txt

Load the environment variables required for execution. Save the necessary information in advance in a `.env` file, as outlined below:

1. Azure OpenAI API details
2. Azure Speech API details
3. Kakao REST API details (https://developers.kakao.com/console/app)
4. OpenWeatherMap API details (https://openweathermap.org/current)
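
Before running the notebook, it can help to confirm that none of these values is missing. The sketch below assumes the variable names that this notebook reads with `os.getenv`; the helper `missing_vars` and the sample environment are hypothetical.

```python
import os

# Variable names read with os.getenv in this notebook
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "OPENAI_API_VERSION",
    "DEPLOYMENT_NAME", "AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION",
    "AZURE_SPEECH_LANGUAGE", "KAKAO_REST_API_KEY", "WEATHER_API_KEY",
]


def missing_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not (env.get(name) or "").strip()]


# Example with a hypothetical, partly filled environment
sample_env = {name: "dummy" for name in REQUIRED_VARS}
sample_env["WEATHER_API_KEY"] = ""
print(missing_vars(sample_env))  # ['WEATHER_API_KEY']
```

Calling `missing_vars()` with no argument after `load_dotenv()` checks the real environment.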

In [1]:
import azure.cognitiveservices.speech as speechsdk
import os
from openai import AzureOpenAI
import json
import requests
import pytz
from urllib import parse
from datetime import datetime
from dotenv import load_dotenv
load_dotenv()

client = AzureOpenAI(
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT","").strip(),
    api_key        = os.getenv("AZURE_OPENAI_API_KEY"),
    api_version    = os.getenv("OPENAI_API_VERSION")
)

deployment_name    = os.getenv('DEPLOYMENT_NAME')
speech_key         = os.getenv("AZURE_SPEECH_KEY")         # Speech Key for Azure Speech Service
speech_region      = os.getenv("AZURE_SPEECH_REGION")      # Service region for Azure Speech Service
speech_language    = os.getenv("AZURE_SPEECH_LANGUAGE")    # Language for Azure Speech Service
KAKAO_API_KEY      = os.getenv("KAKAO_REST_API_KEY")       # API key for Kakao REST API
WEATHER_API_KEY    = os.getenv("WEATHER_API_KEY")          # OpenWeatherMap API key for weather information

### Collect user commands as text using the Azure Speech To Text (STT) engine

In [2]:
# Azure Cognitive Speech to Text function
def stt():
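    # Creates a recognizer with the given settings
    # Uses the Azure Speech key and region; the language comes from AZURE_SPEECH_LANGUAGE and falls back to Korean
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region, speech_recognition_language=speech_language or 'ko-KR')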
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    print("Please speak now~")

    # Starts speech recognition and returns after recognizing a single utterance. The end of a
    # single utterance is determined by listening for silence at the end or processing 15 seconds maximum of audio.
    result = speech_recognizer.recognize_once()

    # Checking the result
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Speech recognition result: {}".format(result.text))
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("No matching speech found: {}".format(result.no_match_details))
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Speech recognition was canceled: {}".format(
            cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(
                    cancellation_details.error_details))
    return result

### Read text aloud using the Azure Text To Speech (TTS) engine

In [3]:
# Azure Cognitive Text to Speech function
def tts(input):
    # Set the voice name, refer to https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts for full list.
    # speech_config.speech_synthesis_voice_name = "ko-KR-InJoonNeural"
    # Creates a synthesizer with the given settings
    # Azure STT & TTS API key
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
    speech_config.speech_synthesis_voice_name = "ko-KR-SeoHyeonNeural"
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    # Synthesizes the received text to speech.
    result = speech_synthesizer.speak_text_async(input).get()

    # Checking the result
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized to speaker for text [{}]".format(input))
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Speech synthesis canceled: {}".format(
            cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(
                    cancellation_details.error_details))
            # Only hint at credentials when the cancellation was caused by an error
            print("Did you update the subscription info?")

### Find location coordinates (latitude/longitude) based on a place name using Kakao API

In [4]:
# Function using Kakao REST API (includes address-to-coordinates conversion)
headers = {
    "Authorization": f"KakaoAK {KAKAO_API_KEY}",
    "Content-Type": "application/json",
}    

def get_location_xy(keyword="Microsoft Korea"):
    url = "https://dapi.kakao.com/v2/local/search/keyword.json"
    params = {"query": keyword}

    response = requests.get(url, headers=headers, params=params)
    if response.status_code != 200:
        raise Exception(f"API request failed: {response.status_code}")

    places = response.json().get('documents', [])
    if not places:
        # Indexing an empty result list would raise an unhelpful IndexError
        raise ValueError(f"No place found for keyword: {keyword}")

    # Return the top-ranked match
    place = places[0]
    return {'place_name': place.get('place_name'), 'y': place.get('y'), 'x': place.get('x')}


# Convert from seconds to hours, minutes, and seconds
def convert_second(seconds):
    seconds = seconds % (24 * 3600)
    hour = seconds // 3600
    seconds %= 3600
    minutes = seconds // 60
    seconds %= 60
    
    return "%d hours %d minutes %d seconds" % (hour, minutes, seconds)

# Convert from meters to kilometers
def convert_meter(meter):
    return str(round(meter / 1000, 2))
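
Note that `convert_second` wraps at 24 hours, so a 25-hour duration would be reported as 1 hour. If longer durations matter, one possible alternative uses `divmod` and keeps the full hour count. The name `format_duration` is hypothetical and not part of this notebook.

```python
def format_duration(seconds: int) -> str:
    """Format seconds as hours, minutes, seconds without wrapping at 24 hours."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return "%d hours %d minutes %d seconds" % (hours, minutes, secs)


print(format_duration(3725))   # 1 hours 2 minutes 5 seconds
print(format_duration(90000))  # 25 hours 0 minutes 0 seconds
```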


In [5]:
print(get_location_xy("Microsoft Korea"))

{'place_name': '한국마이크로소프트', 'y': '37.574780065995', 'x': '126.979071101633'}


### Function 1. Kakao Mobility Route API

In [6]:
# Kakao route API
def get_directions(origin, destination, waypoints="", priority="RECOMMEND", car_fuel="GASOLINE", car_hipass="true", alternatives="false", road_details="false"):
    # Collect location coordinates based on the keyword
    xy_info = get_location_xy(origin)
    origin_xy_info = xy_info["x"] + "," + xy_info["y"] + ",name=" + xy_info["place_name"]
    xy_info = get_location_xy(destination)
    destin_xy_info = xy_info["x"] + "," + xy_info["y"] + ",name=" + xy_info["place_name"]
    
    params = {
        "origin": origin_xy_info,
        "destination": destin_xy_info,
        "waypoints": waypoints,
        "priority": priority,
        "car_fuel": car_fuel,
        "car_hipass": car_hipass,
        "alternatives": alternatives,
        "road_details": road_details,
    }
    # Let requests build and percent-encode the query string
    url = "https://apis-navi.kakaomobility.com/v1/directions"
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()  # Surface HTTP errors before reading the body
    
    response_summary = response.json()["routes"][0]["summary"]
    return_data = {
        "origin_name": response_summary["origin"]["name"],
        "destination_name": response_summary["destination"]["name"],
        "taxi_fare": response_summary["fare"]["taxi"],
        "toll_fare": response_summary["fare"]["toll"],
        "distance": convert_meter(response_summary["distance"]) + "km",
        "duration": convert_second(response_summary["duration"]),
    }
    
    return json.dumps(return_data)
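
Place names such as 한국마이크로소프트 contain non-ASCII characters that must be percent-encoded when they appear in a query string. As a minimal sketch of what that encoding looks like, the standard library's `urllib.parse.urlencode` can be applied to a parameter dictionary. The coordinates and place name below are sample values only.

```python
from urllib import parse

params = {
    "origin": "126.979071101633,37.574780065995,name=한국마이크로소프트",
    "priority": "RECOMMEND",
}

# Each value is percent-encoded as UTF-8, so the result is plain ASCII
query = parse.urlencode(params)
print(query)

# Decoding the query string recovers the original values
decoded = {key: values[0] for key, values in parse.parse_qs(query).items()}
print(decoded == params)  # True
```

Passing a dictionary through the `params` argument of `requests.get` applies the same kind of encoding automatically.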

### Function 2. Kakao Mobility Future Route API

In [7]:
# Kakao future route API
def get_future_directions(origin, destination, departure_time, waypoints="", priority="RECOMMEND", car_fuel="GASOLINE", car_hipass="true", alternatives="false", road_details="false"):
    # Collect location coordinates based on the keyword
    xy_info = get_location_xy(origin)
    origin_xy_info = xy_info["x"] + "," + xy_info["y"] + ",name=" + xy_info["place_name"]
    xy_info = get_location_xy(destination)
    destin_xy_info = xy_info["x"] + "," + xy_info["y"] + ",name=" + xy_info["place_name"]
    
    params = {
        "origin": origin_xy_info,
        "destination": destin_xy_info,
        "waypoints": waypoints,
        "priority": priority,
        "car_fuel": car_fuel,
        "car_hipass": car_hipass,
        "alternatives": alternatives,
        "road_details": road_details,
        "departure_time": departure_time,
    }
    # Let requests build and percent-encode the query string
    url = "https://apis-navi.kakaomobility.com/v1/future/directions"
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()  # Surface HTTP errors before reading the body
    
    response_summary = response.json()["routes"][0]["summary"]
    return_data = {
        "origin_name": response_summary["origin"]["name"],
        "destination_name": response_summary["destination"]["name"],
        "taxi_fare": response_summary["fare"]["taxi"],
        "toll_fare": response_summary["fare"]["toll"],
        "distance": convert_meter(response_summary["distance"]) + "km",
        "duration": convert_second(response_summary["duration"]),
    }
    
    return json.dumps(return_data)
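
The `departure_time` argument is expected in `%Y%m%d%H%M` form, as stated in the tool description for `get_future_directions`. The following sketch shows one way to produce and validate such a string with the standard library. The helper names `to_departure_time` and `is_valid_departure_time` are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%Y%m%d%H%M"


def to_departure_time(moment: datetime) -> str:
    """Format a datetime as a 12-digit departure_time string."""
    return moment.strftime(TIME_FORMAT)


def is_valid_departure_time(value: str) -> bool:
    """Accept only 12-digit strings that parse as a real date and time."""
    if len(value) != 12 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, TIME_FORMAT)
        return True
    except ValueError:
        return False


print(to_departure_time(datetime(2023, 12, 21, 18, 0)))  # 202312211800
print(is_valid_departure_time("202312211800"))           # True
print(is_valid_departure_time("2023-12-21 18:00"))       # False
```

A check like this could run before the request is sent, since the model generates the value and may not always follow the format.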


### Function 3. Retrieve Real-Time Local Time Information

In [8]:
def get_current_time(location):
    try:
        # Look up the timezone by its name (for example, Asia/Seoul)
        timezone = pytz.timezone(location)

        # Get the current time in the timezone
        now = datetime.now(timezone)
        current_time = now.strftime("%Y%m%d%H%M")

        return current_time
    except pytz.UnknownTimeZoneError:
        # Raised when the name is not a known timezone
        return "Sorry, unable to find the timezone for this location."

### Function 4. Retrieve Real-Time Local Weather Information

In [9]:
# Function to fetch weather for a specific location
def get_current_weather(location="서울 종로구 종로1길 50"):
    xy_info = get_location_xy(location)
    params = {
        "lat": xy_info["y"],
        "lon": xy_info["x"],
        "units": "metric",
        "lang":  "en",
        "appid": WEATHER_API_KEY
    }
    url = "https://api.openweathermap.org/data/2.5/weather"
    # Do not send the Kakao Authorization header to OpenWeatherMap
    response = requests.get(url, params=params)
    response.raise_for_status()
    data = response.json()  # Parse the response body once

    return_data = {
        "Weather_main": data["weather"][0]["main"],
        "Weather_description": data["weather"][0]["description"],
        "Temperature_Celsius": data["main"]["temp"],
        "Humidity": data["main"]["humidity"],
        "Cloudiness": data["clouds"]["all"]
    }

    return json.dumps(return_data)

#### Verifying OpenWeatherMap Weather API Functionality

In [10]:
# Verify that weather information is collected properly
response = get_current_weather("한국마이크로소프트")
print(response)

{"Weather_main": "Clouds", "Weather_description": "overcast clouds", "Temperature_Celsius": -2.34, "Humidity": 38, "Cloudiness": 87}


### Define Functions for Use with OpenAI Function Calling

In [11]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_directions",
            "description": "API to search routes based on origin and destination information",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},                    
                },
                "required": ["origin", "destination"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_future_directions",
            "description": "API to search routes based on origin and destination information based on future departure_time",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "departure_time": {
                        "type": "string",
                        "description": "The time format of the given time must be converted to %Y%m%d%H%M format. If there is no year information, 2023 is used as the default. ",
                    },
                },
                "required": ["origin", "destination", "departure_time"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current time in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location name. The pytz is used to get the timezone for that location. Location names should be in a format like Asia/Seoul, America/New_York, Asia/Bangkok, Europe/London"
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather information in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name. City names should be in a format like Seoul, Busan, Sokcho, Daegu"
                    }
                },
                "required": ["location"],
            },
        }
    }
]

available_functions = {
    "get_directions": get_directions,
    "get_future_directions": get_future_directions,
    "get_current_time": get_current_time,
    "get_current_weather": get_current_weather,
} 
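
Every name declared in `tools` needs a matching entry in `available_functions`, otherwise the model can request a function that cannot be executed. A small consistency check along these lines may catch the mismatch early. The helper `unregistered_tools` and the sample data are hypothetical.

```python
def unregistered_tools(tools, available_functions):
    """Return tool names declared in `tools` that have no Python callable."""
    declared = [t["function"]["name"] for t in tools if t.get("type") == "function"]
    return [name for name in declared if name not in available_functions]


# Hypothetical miniature registry for illustration
sample_tools = [
    {"type": "function", "function": {"name": "get_current_time"}},
    {"type": "function", "function": {"name": "get_stock_price"}},
]
sample_functions = {"get_current_time": lambda location: "202501020322"}

print(unregistered_tools(sample_tools, sample_functions))  # ['get_stock_price']
```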

#### Function to validate the arguments provided to each function

In [12]:
import inspect

# Helper method used to check if the correct arguments are provided to a function
def check_args(function, args):
    sig = inspect.signature(function)
    params = sig.parameters

    # Check if there are extra arguments
    for name in args:
        if name not in params:
            return False
    # Check if the required arguments are provided 
    for name, param in params.items():
        if param.default is param.empty and name not in args:
            return False

    return True
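
To see how the signature check behaves, here is a condensed restatement of `check_args` applied to a hypothetical `sample_route` function with one required and one optional parameter. It is a sketch for illustration, not a replacement for the cell above.

```python
import inspect


def check_args(function, args):
    """Return True if args has no unknown names and covers every required parameter."""
    params = inspect.signature(function).parameters
    if any(name not in params for name in args):
        return False
    return all(name in args for name, p in params.items() if p.default is p.empty)


# Hypothetical function: `origin` is required, `destination` is optional
def sample_route(origin, destination="강남역"):
    return f"{origin} -> {destination}"


print(check_args(sample_route, {"origin": "A"}))                 # True
print(check_args(sample_route, {"destination": "B"}))            # False (origin missing)
print(check_args(sample_route, {"origin": "A", "mode": "car"}))  # False (unknown name)
```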

#### Function to automatically select and execute functions based on user intent using the OpenAI GPT model

In [13]:
def run_conversation(messages, tools, available_functions, deployment_name):
    # Step 1: Send the conversation and available functions to GPT
    response = client.chat.completions.create(
        model = deployment_name,
        messages = messages,
        tools = tools,
        tool_choice="auto"    # auto is default, but we'll be explicit
    )

    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    # Step 2: Check if GPT wanted to call a function
    if not tool_calls:
        # No function was requested; the caller treats None as "cannot answer"
        return None

    # Step 3: Call every requested function and build one tool message per call
    tool_messages = []
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        # Verify function exists
        if function_name not in available_functions:
            print("Function " + function_name + " does not exist")
            return None
        function_to_call = available_functions[function_name]

        # Note: The JSON arguments may not always be valid, so handle parse errors
        try:
            function_args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError:
            print("Invalid JSON arguments for function: " + function_name)
            return None
        # Verify the arguments match the function signature
        if check_args(function_to_call, function_args) is False:
            print("Invalid arguments for function: " + function_name)
            return None
        function_response = function_to_call(**function_args)

        tool_messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            }
        )

    # Pick a system prompt based on the function that was called (the last one if there were several)
    if function_name == "get_directions" or function_name == "get_future_directions":
        messages.append(
            {"role": "system", "content": "You are a bot that guides users through car routes. When the user provides the origin and destination names, you provide summary route guidance information."}
        )
    elif function_name == "get_current_weather":
        messages.append(
            {"role": "system", "content": "You are an agent that tells the user about the weather. You describe based on the given data without making interpretations or irrelevant statements."},
        )
    elif function_name == "get_current_time":
        messages.append(
            {"role": "system", "content": "You are a bot that tells global times. Reply strictly based on the given data without interpretations or additional statements."},
        )
    else:
        messages.append(
            {"role": "system", "content": "You are an AI assistant that helps users find information. Your answers must be factual. Try to provide concise, clear responses."},
        )

    # Add assistant response to messages
    messages.append(response_message)  # Extend conversation with assistant's reply
    # Add function responses to messages
    messages.extend(tool_messages)     # Extend conversation with one tool message per call

    # Get a new response from GPT where it can see the function response
    second_response = client.chat.completions.create(
        model = deployment_name,
        messages = messages,
        temperature=0
    )

    return second_response
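
The arguments the model returns for a tool call arrive as a JSON string, which may be malformed or may not match the target function. One possible way to keep the conversation going is to turn such failures into an error string that can be sent back as the tool result. The sketch below is an assumption about how that could look; `dispatch`, `registry`, and `echo_city` are hypothetical names, and no API call is made.

```python
import json


def dispatch(function_name, arguments_json, registry):
    """Run one requested function and always return a string for the tool message."""
    if function_name not in registry:
        return json.dumps({"error": f"unknown function: {function_name}"})
    try:
        kwargs = json.loads(arguments_json)
        return str(registry[function_name](**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # JSONDecodeError: malformed JSON; TypeError: wrong or missing arguments
        return json.dumps({"error": f"bad arguments: {exc}"})


# Hypothetical registry with a single function
registry = {"echo_city": lambda location: json.dumps({"location": location})}

print(dispatch("echo_city", '{"location": "Seoul"}', registry))
print(dispatch("echo_city", '{"location": ', registry))
print(dispatch("missing", "{}", registry))
```

Returning an error string instead of raising lets the model see what went wrong and phrase a reply, at the cost of hiding the exception from the caller.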

#### Function for natural language queries with GPT
In this example, a request that does not trigger Function Calling is not answered; the user receives a fallback message instead.

In [14]:
def gpt(input):
    messages = [
        {"role": "user", "content": input}
    ]
    assistant_response = run_conversation(messages, tools, available_functions, deployment_name)
    # If assistant_response is empty
    if not assistant_response:
        return "I cannot answer this question. Please try asking in a different way."
    else:
        content = json.dumps(assistant_response.choices[0].message.content, ensure_ascii=False, indent=4)
        content = content.replace("\\n", "\n").replace("\\\"", "\"")
        return content
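
Because `gpt` passes the reply through `json.dumps`, the returned text is wrapped in double quotes, which is why the printed answers in this notebook appear quoted. The short sketch below, using a made-up answer string, shows that behavior and how `json.loads` reverses it.

```python
import json

answer = "The time in Hawaii is 3:22 AM."

# json.dumps serializes a Python string as a quoted JSON string
as_json = json.dumps(answer, ensure_ascii=False)
print(as_json)              # "The time in Hawaii is 3:22 AM."

# json.loads restores the plain text without the surrounding quotes
print(json.loads(as_json))  # The time in Hawaii is 3:22 AM.
```

If the quotes are unwanted in the text handed to `tts`, returning the message content directly would be one option to consider.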


Use GPT to inquire and verify if each functionality operates correctly.

In [17]:
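future_time = ""
# future_time = "2023년 12월 21일 18시에"  # "At 18:00 on December 21, 2023"
origin_name = "한국마이크로소프트"  # Microsoft Korea
destin_name = "강남역"  # Gangnam Station
# Route query: "How long does it take from {origin_name} to {destin_name}?"
query = f"{future_time} {origin_name}에서 {destin_name}까지 얼마나 걸려?"
# The assignment below overrides the route query; comment it out to test routing
query = "How's the weather in Paris, France?"
# query = "What time is it in Hawaii?"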

print(gpt(query))

"The weather in Paris, France, is currently clear with a clear sky. The temperature is -2.24°C, the humidity is 64%, and there is no cloudiness."


In [18]:
query = "What time is it in Hawaii?"

print(gpt(query))

"The time in Hawaii (Honolulu) is 3:22 AM on January 2, 2025."


## The code below requires a local PC with a physical microphone and speaker
> **Note**
>- If executed in Docker or Codespaces, it might not work properly.

In [None]:
# if __name__ == "__main__":
tts("Try asking about weather, navigation, or time.")

while True:
    result_stt = stt().text
    print(result_stt)
    if(result_stt == ""):
        # Speech recognition failed
        print("Speech recognition failed")
        tts("Speech recognition failed. Please try again.")
    elif result_stt.strip().rstrip(".").lower() in ("exit", "close"):
        # Matches "Exit" or "Close" regardless of case or a trailing period
        print("Ending conversation")
        break
    else:
        # Speech recognition succeeded
        result_gpt = gpt(result_stt)
        tts(result_gpt)

Now it's time to add your own APIs one by one.
Good luck!