# Technical Documentation

## Overview

This documentation covers the components of a Python application that integrates Azure Cognitive Services for speech recognition, Twilio for messaging, and OpenAI for natural language processing. The application serves as a smart personal assistant for managing calls. The llm_module.py file is responsible for managing the interaction with the language model, processing user input, and generating appropriate responses. The azure_tts.py module handles text-to-speech conversion by communicating with Azure's API to generate audio from text and encoding the result in Base64 format for playback or further processing.


### Directory Structure

- `config.py`: Contains configuration settings and utility functions.
- `audio_server.py`: Handles audio playback.
- `azure_service_asr.py`: Manages speech recognition and interaction with the language model.
- `llm_module.py`: Handles interactions with the OpenAI language model and processes user requests.
- `azure_tts.py`: Converts text into speech using Azure Text-to-Speech services.
- `utils.py`: Provides helper functions for communication and message management.

## Configuration (`config.py`)

## Module Overview 
This module contains the configuration settings and utility functions shared by the rest of the application. It holds API keys, service endpoints, and other environment-specific variables so that the individual components can be integrated without hard-coding credentials.

### Environment Variables

The following environment variables are loaded from a `.env` file:

- `AZURE_TTS_URL`: URL for Azure Text-to-Speech service.
- `AZURE_SUB_KEY`: Subscription key for Azure services.
- `OPENAI_API_KEY`: API key for OpenAI services.
- `TWILIO_ACCOUNT_SID`: Twilio account SID for messaging.
- `TWILIO_AUTH_TOKEN`: Authentication token for Twilio.
- `WHATSAPP_NUMBER`: WhatsApp number for sending messages.
- `RECIPIENT_NUMBER`: Recipient's number for WhatsApp messages.
- `AZURE_API_KEY`: API key for Azure services.
- `AZURE_REGION`: Region for Azure services.
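
Because every service call depends on these values, it can help to fail fast when one is missing. The following is a minimal sketch, not the project's actual loader: `REQUIRED_ENV_VARS` and `find_missing_env_vars` are hypothetical names, and the sketch assumes the `.env` file has already been loaded into the process environment.

```python
import os

# Variable names come from the list above; the helper itself is hypothetical.
REQUIRED_ENV_VARS = [
    "AZURE_TTS_URL", "AZURE_SUB_KEY", "OPENAI_API_KEY",
    "TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN",
    "WHATSAPP_NUMBER", "RECIPIENT_NUMBER",
    "AZURE_API_KEY", "AZURE_REGION",
]


def find_missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


missing = find_missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```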

### Global Variables

- `client`: Instance of OpenAI client.
- `recipient_name`: Placeholder for the recipient's name (default: "Mr. Ravi Ranjan").
- `call_state`: Tracks the current call state (default: "LISTENING").
- `recognizer`: Holds the speech recognizer instance.

### Functions

#### `change_call_state(new_call_state)`

Changes the global call state to the provided new state.

- **Parameters**:
  - `new_call_state` (str): The new state to set (e.g., "LISTENING", "SPEAKING").
  
- **Returns**: 
  - (str): The updated call state.
  
- **Behavior**: This function updates the global `call_state` variable and returns the new state. It is used to control the flow of the assistant's operations based on the current interaction phase (e.g., listening or speaking).

**Code**:
```python
def change_call_state(new_call_state):
    """Change the global call state to the new state provided."""
    global call_state
    call_state = new_call_state
    return call_state
```

#### `get_call_state()`

Retrieves the current call state.

- **Returns**:
  - (str): The current call state.
  
- **Behavior**: This function simply returns the current value of the `call_state` variable, allowing other components of the application to determine if the assistant is currently listening or speaking.

**Code**:
```python
def get_call_state():
    """Retrieve the current call state."""
    return call_state
```

#### `change_recognizer(new_recognizer)`

Updates the global recognizer with a new recognizer instance.

- **Parameters**:
  - `new_recognizer`: The new recognizer instance to set.
  
- **Returns**:
  - The updated recognizer instance.
  
- **Behavior**: This function changes the global `recognizer` variable to a new instance, which is essential when initializing or switching recognizers.

**Code**:
```python
def change_recognizer(new_recognizer):
    """Update the global recognizer with a new recognizer instance."""
    global recognizer
    recognizer = new_recognizer
    return recognizer
```


#### `get_recognizer()`

Gets the current recognizer instance.

- **Returns**:
  - The current recognizer instance.
  
- **Behavior**: Returns the instance of the recognizer that is currently set, allowing other parts of the code to access the speech recognition functionality.

**Code**:
```python
def get_recognizer():
    """Get the current recognizer instance."""
    return recognizer
```

### Tools

- **send_message_to_whatsapp**: Tool for sending messages via WhatsApp.
  
  - **Parameters**:
    - `message` (str): The summary message to be sent to the recipient.
    
  - **Returns**: 
    - None.
    
  - **Behavior**: This tool definition is passed to the language model so that it can request a WhatsApp message to the recipient, supplying the `message` parameter described above.

**Code**:
```python
tools=[
    {
    "type": "function",
    "function": {
        "name": "send_message_to_whatsapp",
        "description": "This tool sends a message to Mr. Ravi Ranjan via WhatsApp.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {
                    "type": "string",
                    "description": "The summary message to be sent to Mr. Ravi Ranjan."
                }
            },
            "required": ["message"]
        }
    }
   }
]
```
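
When the model decides to use a tool, it returns the function name together with the arguments as a JSON-encoded string that follows the schema above. The sketch below shows one way such a call could be decoded and routed. `dispatch_tool_call`, `registry`, and `fake_send_message_to_whatsapp` are hypothetical names used only for illustration; the real sender is not shown in this document.

```python
import json


def dispatch_tool_call(function_name, function_arguments, registry):
    """Decode the JSON argument string and call the matching Python function."""
    handler = registry.get(function_name)
    if handler is None:
        return f"Unknown tool: {function_name}"
    try:
        kwargs = json.loads(function_arguments or "{}")
    except json.JSONDecodeError as exc:
        return f"Invalid arguments for {function_name}: {exc}"
    return handler(**kwargs)


# Stand-in for the real sender, which is not shown in this document.
def fake_send_message_to_whatsapp(message):
    return f"sent: {message}"


registry = {"send_message_to_whatsapp": fake_send_message_to_whatsapp}
result = dispatch_tool_call(
    "send_message_to_whatsapp",
    '{"message": "Caller asked for a callback."}',
    registry,
)
print(result)  # sent: Caller asked for a callback.
```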
---

## Audio Service (`audio_service.py`)

## Module Overview 
This module is responsible for handling audio playback within the application. It manages audio streams, controls playback options, and ensures smooth delivery of sound during interactions, such as speech recognition or feedback.

### Functions

#### `play_audio(audio_base64)`

Plays audio from a base64-encoded string. Manages call state for automatic speech recognition (ASR) during playback.

- **Parameters**:
  - `audio_base64` (str): A base64-encoded string representing the audio data.
  
- **Returns**:
  - None.
  
- **Behavior**: 
  - This function decodes the base64-encoded audio string, writes it to a temporary audio file (`output.mp3`), and plays the audio using the `playsound` library. 
  - It changes the `call_state` to "SPEAKING" before playback to pause ASR and reverts it back to "LISTENING" afterward, allowing for a seamless user experience.
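
One robustness consideration for the flow described above: if playback raises an exception, the state should still return to "LISTENING", otherwise the recognizer would ignore all further input. The sketch below illustrates that idea with `try`/`finally`. It is not the module's implementation: `play_audio_safely` is a hypothetical name, and the `player` and `set_state` parameters are stand-ins for `playsound` and `change_call_state`.

```python
import base64
import os
import tempfile


def play_audio_safely(audio_base64, player, set_state):
    """Decode audio, play it, and always return to LISTENING afterwards."""
    audio_data = base64.b64decode(audio_base64)
    # A unique temporary file avoids overwriting a shared output.mp3.
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as audio_file:
        audio_file.write(audio_data)
        path = audio_file.name
    set_state("SPEAKING")  # Pause ASR handling while audio is playing
    try:
        player(path)
    finally:
        set_state("LISTENING")  # Runs even if playback fails
        os.remove(path)
```

Passing the player and state setter as arguments also makes the function easy to exercise without a sound device.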

**Code**:
```python
def play_audio(audio_base64):
    """
    Play audio from a base64-encoded string.

    This function decodes a base64-encoded audio string, writes it to an 
    output file, and plays the audio using the playsound library. 
    It also manages the call state for automatic speech recognition (ASR) 
    by changing the state to "SPEAKING" before playback and reverting 
    it to "LISTENING" afterward.

    Args:
        audio_base64 (str): A base64-encoded string representing the audio data.

    Returns:
        None
    """

    audio_data = base64.b64decode(audio_base64)
    audio_file_path = "output.mp3"  

    with open(audio_file_path, "wb") as audio_file:
        audio_file.write(audio_data)
    
    # Stop ASR before playing audio
    change_call_state("SPEAKING")
    print("[DEBUG] Playing audio and pausing ASR...")
    playsound(audio_file_path)  # This will play the audio file

    change_call_state("LISTENING")
    print("[DEBUG] Audio playback finished. Restarting ASR...")
```

---

## Azure Service for ASR (`azure_service_asr.py`)

## Module Overview
This module manages the speech recognition functionality by interfacing with Azure's Automatic Speech Recognition (ASR) services. It handles audio input, processes speech recognition requests, and ensures accurate transcription of spoken language into text.
### Global Variables

- `call_state`: Initialized with the initial state.
- `messages`: Holds message array for conversation history.

### Functions

#### `initialize_speech_recognizer()`

Initializes the Azure speech recognizer with the provided configuration.

- **Returns**:
  - `recognizer`: Instance of the speech recognizer.
  
- **Behavior**: This function creates and configures an instance of the Azure Speech Recognizer, setting it up with the necessary API keys and region information for Azure services.

**Code**:
```python
def initialize_speech_recognizer():
    """Initialize the Azure speech recognizer."""
    speech_config = speechsdk.SpeechConfig(subscription=AZURE_API_KEY, region=AZURE_REGION)
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    return recognizer
```
#### `start_recognition_if_listening(recognizer)`

Starts continuous speech recognition if the current call state is "LISTENING".

- **Parameters**:
  - `recognizer`: The recognizer instance to start.
  
- **Returns**:
  - None.
  
- **Behavior**: This function checks the current `call_state`. If it is "LISTENING", it invokes the `start_continuous_recognition()` method on the recognizer, allowing it to begin processing audio input from the microphone.

**Code**:
```python
def start_recognition_if_listening(recognizer):
    """Start continuous recognition if the call state is LISTENING."""
    global call_state  # Access the global call_state variable
    if call_state == "LISTENING":
        recognizer.start_continuous_recognition()
        print("[DEBUG] ASR started and is now listening...")
    else:
        print("[DEBUG] ASR is not in the LISTENING state.")
```


#### `process_recognized_text(recognized_text, recognizer)`

Processes the recognized text and interacts with the language model.

- **Parameters**:
  - `recognized_text` (str): The text recognized by the speech recognizer.
  - `recognizer`: The recognizer instance used for processing.
  
- **Returns**:
  - None.
  
- **Behavior**: This function stops the speech recognition temporarily while processing the recognized text. It interacts with the language model (LLM) to generate responses based on the input, ensuring that the assistant can handle user queries effectively.

**Code**:
```python
def process_recognized_text(recognized_text, recognizer):
    """Process the recognized text and interact with the LLM."""
    global call_state  # Access the global call_state variable
    change_call_state("SPEAKING")
    recognizer.stop_continuous_recognition()  # Stop ASR during response processing
    print("[DEBUG] ASR stopped for LLM response processing.")

```
#### `log_llm_response(llm_response, function_name, function_arguments, function_id, first_chunk_time, error_occurred)`

Logs details of the response from the language model.

- **Parameters**:
  - `llm_response`: The response from the language model.
  - `function_name`: Name of the function called.
  - `function_arguments`: Arguments used in the function call.
  - `function_id`: ID of the function.
  - `first_chunk_time`: Time taken for the first response chunk.
  - `error_occurred`: Flag indicating if an error occurred.
  
- **Returns**:
  - None.
  
- **Behavior**: This function prints debug information about the LLM's response and related parameters to the console. This is useful for monitoring and troubleshooting the interaction with the language model.

**Code**:
```python
def log_llm_response(llm_response, function_name, function_arguments, function_id, first_chunk_time, error_occurred):
    """Log details of the LLM response."""
    print(f"[DEBUG] LLM Response: {llm_response}")
    print(f"[DEBUG] Function Name: {function_name}")
    print(f"[DEBUG] Function Arguments: {function_arguments}")
    print(f"[DEBUG] Function ID: {function_id}")
    print(f"[DEBUG] First Chunk Time: {first_chunk_time} ms")
    print(f"[DEBUG] Error Occurred: {error_occurred}")

```

#### `handle_llm_response(llm_response, function_name, function_arguments, function_id)`

Handles the response from the language model appropriately, sending messages if necessary.

- **Parameters**:
  - `llm_response`: The response from the language model.
  - `function_name`: Name of the function to be called if applicable.
  - `function_arguments`: Arguments for the function call.
  - `function_id`: ID of the function being handled.
  
- **Returns**:
  - None.
  
- **Behavior**: This function processes the LLM's response. If the response contains a message, it appends it to the conversation history. If a function needs to be called (e.g., sending a message), it performs the function call and logs the outcome.

**Code**:
```python
def handle_llm_response(llm_response, function_name, function_arguments, function_id):
    """Handle the LLM response appropriately."""
    global call_state  # Access the global call_state variable
    if llm_response:
        messages.append({"role": "assistant", "content": llm_response})
    elif function_name:
        append_asst_msg(messages=messages, function_id=function_id, function_name=function_name,
                        function_args=function_arguments)
        print(f"[DEBUG] Calling function '{function_name}' with arguments: {function_arguments}")
        # Call the WhatsApp sending function
        function_returns = send_message_to_whatsapp(function_arguments)
        append_tool_call_message(messages=messages, function_id=function_id, function_name=function_name,
                                 function_returns=function_returns)
        llm_response, _, _, _, _, _ = process_chunk(None, client, messages, tools)
        print(f"LLM Response after function call: {llm_response}")
        messages.append({"role": "assistant", "content": llm_response})
    
    change_call_state("LISTENING")
```
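
The helpers `append_asst_msg` and `append_tool_call_message` are not shown in this document. As a rough guide, the sketch below builds the two message shapes that the OpenAI chat format uses around a tool call: an assistant message carrying `tool_calls`, followed by a `tool` message carrying the result. The function names here are hypothetical, and the real helpers may differ in detail.

```python
def build_assistant_tool_call(function_id, function_name, function_args):
    """Assistant turn that records which tool the model asked for."""
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": function_id,
                "type": "function",
                "function": {"name": function_name, "arguments": function_args},
            }
        ],
    }


def build_tool_result(function_id, function_returns):
    """Tool turn that reports the outcome back to the model."""
    return {
        "role": "tool",
        "tool_call_id": function_id,  # Must match the id in the assistant turn
        "content": str(function_returns),
    }


history = []
history.append(
    build_assistant_tool_call(
        "call_1", "send_message_to_whatsapp", '{"message": "Call back at 5 pm."}'
    )
)
history.append(build_tool_result("call_1", "Message sent"))
```

Keeping the two entries adjacent and linked by the same ID is what lets the follow-up `process_chunk` call produce a spoken confirmation.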

#### `handle_recognition_result(evt, recognizer)`

Handles the result of the speech recognition event.

- **Parameters**:
  - `evt`: The event object containing recognition details.
  - `recognizer`: The recognizer instance handling the event.
  
- **Returns**:
  - None.
  
- **Behavior**: This function responds to recognition events by checking the result's reason (e.g., recognized speech, no match, canceled). If speech was recognized, it processes the recognized text. It also logs the results of the interaction with the language model and continues listening for further input.

**Code**:
```python
def handle_recognition_result(evt, recognizer):
    """Handle the recognition result from the speech recognizer."""
    global call_state  # Access the global call_state variable

    if call_state != "LISTENING":
        print("[DEBUG] ASR ignored input since call_state is not LISTENING.")
        return

    print("[DEBUG] LISTENING AGAIN")
    recognized_text = ""  # Stays empty unless speech was actually recognized
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        recognized_text = evt.result.text.strip()
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print("[DEBUG] No speech could be recognized.")
    elif evt.result.reason == speechsdk.ResultReason.Canceled:
        print(f"[DEBUG] Recognition canceled: {evt.result.cancellation_details.reason}")

    if not recognized_text:
        return  # Nothing to send to the LLM; keep listening

    print(f"[DEBUG] ASR recognized text: {recognized_text}")
    process_recognized_text(recognized_text, recognizer)

    # Process the recognized text with process_chunk
    llm_response, function_name, function_arguments, function_id, first_chunk_time, error_occurred = process_chunk(recognized_text, client, messages, tools)

    # Log the results
    log_llm_response(llm_response, function_name, function_arguments, function_id, first_chunk_time, error_occurred)

    # Handle the LLM response, then resume recognition
    handle_llm_response(llm_response, function_name, function_arguments, function_id)
    recognizer.start_continuous_recognition()

```

#### `recognize_speech_continuously()`

Continuously recognizes speech using Azure Speech Service.

- **Returns**:
  - None.
  
- **Behavior**: 
  - This function initializes the speech recognizer, sets up the event listeners for recognition events, and starts the recognition process. 
  - It maintains an ongoing loop, allowing the system to keep listening for speech input until the `call_state` is set to "STOP", at which point it safely terminates recognition.

**Code**:
```python
def recognize_speech_continuously():
    """Continuously recognize speech using Azure Speech Service."""
    global call_state  # Access the global call_state variable
    call_state = get_call_state()  # Refresh call_state if needed
    
    recognizer = initialize_speech_recognizer()
    change_recognizer(recognizer)  # Ensure the recognizer is set globally
    print("[DEBUG] ASR initialized. Waiting to start recognition...")
    
    initiate_conversation_with_llm()
    print("Listening... Speak into your microphone.")
    
    recognizer.recognized.connect(lambda evt: handle_recognition_result(evt, recognizer))
    
    # Start ASR only if it's in the correct state
    start_recognition_if_listening(recognizer)
    
    try:
        while True:
            call_state = get_call_state()
            if call_state == "STOP":
                recognizer.stop_continuous_recognition()
                print("[DEBUG] ASR is now STOPPING")
                break 
    except KeyboardInterrupt:
        print("[DEBUG] Stopping recognition...")
        print(f"traceback in recognize_speech_continuously: {traceback.format_exc()}")
        recognizer.stop_continuous_recognition()
```
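
The `while True` loop above polls the call state without pausing, which keeps one CPU core busy while the assistant waits. A possible alternative, sketched here under the assumption that the stop signal can be expressed as a `threading.Event` rather than the `"STOP"` call state, is to block on the event with a timeout. `wait_until_stopped` and `on_stop` are hypothetical names; in the application, `on_stop` would wrap `recognizer.stop_continuous_recognition()`.

```python
import threading


def wait_until_stopped(stop_event, on_stop, poll_seconds=0.1):
    """Block until stop_event is set, then run the shutdown callback."""
    try:
        # wait() sleeps for up to poll_seconds and returns True once the event is set.
        while not stop_event.wait(timeout=poll_seconds):
            pass
    except KeyboardInterrupt:
        print("[DEBUG] Stopping recognition...")
    on_stop()


stop_event = threading.Event()
threading.Timer(0.05, stop_event.set).start()  # Simulates another thread requesting a stop
wait_until_stopped(stop_event, lambda: print("[DEBUG] ASR is now STOPPING"))
```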

---



## LLM Module (`llm_module.py`)

## Module Overview
This module handles interactions with the OpenAI language model, processing user requests, and generating responses. It serves as the bridge between user input and the language model's capabilities, enabling dynamic conversation and content generation.

### Global Variables

- `messages`: List of messages to maintain the conversation context, initialized with a system prompt.
- `call_state`: Current state of the call, retrieved using `get_call_state()`.

### Functions

#### `get_message_array()`

Returns the current array of messages.

- **Returns**:
  - The current array of messages (list).

- **Behavior**: This function provides access to the conversation history, allowing other components to review the messages exchanged with the user.

**Code**:
```python
def get_message_array():
    """Returns the current array of messages."""
    return messages
```

---

#### `append_user_message(messages, request)`

Appends a user's message to the messages array if the request is not empty.

- **Parameters**:
  - `messages`: The current array of messages (list).
  - `request` (str): The user's message to append.

- **Returns**: 
  - None.

- **Behavior**: This function checks if the user's request is not empty before appending it to the messages array, maintaining a record of user interactions.

**Code**:
```python
def append_user_message(messages, request):
    """Appends a user's message to the messages array if the request is not empty."""
    if request:
        messages.append({"role": "user", "content": request})
```


#### `create_chat_completion(client, messages, tools)`

Creates a chat completion using the specified client and messages.

- **Parameters**:
  - `client`: The OpenAI client instance used for communication.
  - `messages`: The array of messages forming the conversation context (list).
  - `tools`: The available tools for processing (list).

- **Returns**:
  - The chat completion response object.

- **Behavior**: This function generates a response from the OpenAI model based on the current messages and tools, enabling dynamic interactions with the user.

**Code**:
```python
def create_chat_completion(client, messages, tools):
    """Creates a chat completion using the specified client and messages."""
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
        stream=True
    )
```
---

#### `determine_break_punctuation(count)`

Returns punctuation marks used for breaking continuous speech based on the count of processed chunks.

- **Parameters**:
  - `count` (int): The number of processed chunks.

- **Returns**:
  - A list of punctuation marks (list).

- **Behavior**: This function determines which punctuation marks are appropriate for splitting the text based on the number of chunks, ensuring coherent audio playback.

**Code**:
```python
def determine_break_punctuation(count):
    """Returns punctuation marks used for breaking continuous speech based on the count of processed chunks."""
    if count <= 2:
        return [',', '!', ':', '.', '?', '|', '।', '፧', '፨', '،', '؛', '؟']
    else:
        return ['.', '?', '।', '፧', '፨', '؛', '؟']

```
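
The effect of the two lists is that the first two audio segments may break on a comma, which gets speech to the caller sooner, while later segments break only at sentence-ending marks. The short example below copies the function so that it runs on its own; `has_break` is a hypothetical helper that mirrors the `any(...)` check used by the streaming loop.

```python
def determine_break_punctuation(count):
    """Copied from the listing above so the example is self-contained."""
    if count <= 2:
        return [',', '!', ':', '.', '?', '|', '।', '፧', '፨', '،', '؛', '؟']
    return ['.', '?', '।', '፧', '፨', '؛', '؟']


def has_break(chunk, count):
    """True when the chunk contains a break mark allowed for this segment count."""
    return any(punc in chunk for punc in determine_break_punctuation(count))


print(has_break(",", 1))  # True: early segments may split on a comma
print(has_break(",", 3))  # False: later segments wait for a sentence end
print(has_break(".", 3))  # True
```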

---

#### `check_punctuation_split(continious_string, current_gpt_chunk)`

Checks if the continuous string can be split based on punctuation rules.

- **Parameters**:
  - `continious_string` (str): The ongoing text that is being processed.
  - `current_gpt_chunk` (str): The latest chunk of text received.

- **Returns**:
  - A tuple containing:
    - A boolean indicating if the string can be split (bool).
    - The modified continuous string (str).
    - Any discarded string that was not included in the split (str).

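- **Behavior**: This function decides whether the accumulated text can be sent to text-to-speech. It refuses to split when the text is empty or when the last word is a title abbreviation ("Mr.", "Dr.", "Ms.", "Mrs."). When the latest chunk is a comma or period and the last word is numeric, that word is held back as the discarded string so that a number such as `1,000` is not cut in half.

**Code**: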
```python
def check_punctuation_split(continious_string, current_gpt_chunk):
    """Checks if the continuous string can be split based on punctuation rules."""
    words = continious_string.split()
    if not words:
        return False, "", ""

    last_word = words[-1]
    if last_word in ["Mr.", "Dr.", "Ms.", "Mrs."]:
        return False, continious_string, ""

    discarded_string = ""
    if current_gpt_chunk in [",", "."]:
        if check_before_or_after_comma_is_number(last_word):
            discarded_string = " " + last_word
            continious_string = ' '.join(words[:-1])
    
    return True, continious_string, discarded_string
```
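
The helper `check_before_or_after_comma_is_number` is called above but not defined in this document. The version below is a guess at its intent, written only so that the splitting rules can be tried out: it treats a word as numeric when it consists of digits followed by a trailing comma or period. The copy of `check_punctuation_split` is taken from the listing above.

```python
def check_before_or_after_comma_is_number(word):
    """Hypothetical: True for digits followed by ',' or '.', such as "1," or "3."."""
    stripped = word.rstrip(",.")
    return stripped != word and stripped.replace(",", "").replace(".", "").isdigit()


def check_punctuation_split(continious_string, current_gpt_chunk):
    """Copied from the listing above so the example is self-contained."""
    words = continious_string.split()
    if not words:
        return False, "", ""
    last_word = words[-1]
    if last_word in ["Mr.", "Dr.", "Ms.", "Mrs."]:
        return False, continious_string, ""
    discarded_string = ""
    if current_gpt_chunk in [",", "."]:
        if check_before_or_after_comma_is_number(last_word):
            discarded_string = " " + last_word
            continious_string = ' '.join(words[:-1])
    return True, continious_string, discarded_string


print(check_punctuation_split("Please call Dr.", "."))  # (False, 'Please call Dr.', '')
print(check_punctuation_split("The total is 1,", ","))  # (True, 'The total is', ' 1,')
print(check_punctuation_split("Hello there,", ","))     # (True, 'Hello there,', '')
```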
---

#### `generate_and_play_audio(text)`

Generates audio from text and plays it if the text is not empty.

- **Parameters**:
  - `text` (str): The text to convert to audio.

- **Returns**:
  - None.

- **Behavior**: If the text is not empty, this function generates audio from it with `generate_audio_azure` and plays the result through `play_audio`. It logs an error if no audio was produced.

**Code**:
```python
def generate_and_play_audio(text):
    """Generates audio from text and plays it if the text is not empty."""
    if text:
        print("chunk:->", text)
        audio_base64 = generate_audio_azure(text)
        if audio_base64:
            play_audio(audio_base64)  # Implement play_audio for playback
        else:
            print("[ERROR] Failed to generate audio.")
```
---

#### `extract_function_calls(chunk)`

Extracts function calls from a chunk of response data.

- **Parameters**:
  - `chunk`: The response chunk from the language model.

- **Returns**:
  - A tuple containing:
    - The extracted function name (str).
    - The function arguments (str).
    - The function ID (str).

- **Behavior**: This function inspects the first tool call in the response chunk, if any, and returns its name, ID, and argument fragment so that the caller can invoke the matching tool.

**Code**:
```python
def extract_function_calls(chunk):
    """Extracts function calls from a chunk of response data."""
    function_name = None
    function_id = None
    function_arguments = ''
    
    if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.tool_calls:
        tool_call = chunk.choices[0].delta.tool_calls[0]
        if tool_call.function:
            function_name = tool_call.function.name
            function_id = tool_call.id
            print(f"function_name: {function_name}")
            if tool_call.function.arguments:
                function_arguments += tool_call.function.arguments
                print(f"function_arguments :{function_arguments}")
    
    return function_name, function_arguments, function_id
```
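
In a streamed response, the function name and ID typically arrive in the first tool-call fragment, while the JSON arguments arrive in pieces across later chunks. The sketch below shows how those fragments can be joined. It uses `SimpleNamespace` objects as stand-ins for the real chunk objects; `make_chunk` and `accumulate_tool_call` are hypothetical helpers, and only the attributes read by the listing above are modeled.

```python
from types import SimpleNamespace


def make_chunk(name=None, call_id=None, arguments=None):
    """Build a fake streaming chunk with the attributes the parser reads."""
    function = SimpleNamespace(name=name, arguments=arguments)
    tool_call = SimpleNamespace(id=call_id, function=function)
    delta = SimpleNamespace(content=None, tool_calls=[tool_call])
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def accumulate_tool_call(chunks):
    """Join name, id, and argument fragments spread over several chunks."""
    name, call_id, arguments = None, None, ""
    for chunk in chunks:
        if not chunk.choices or not chunk.choices[0].delta.tool_calls:
            continue
        tool_call = chunk.choices[0].delta.tool_calls[0]
        call_id = tool_call.id or call_id  # Keep the first id that was seen
        if tool_call.function:
            name = tool_call.function.name or name
            arguments += tool_call.function.arguments or ""
    return name, arguments, call_id


chunks = [
    make_chunk(name="send_message_to_whatsapp", call_id="call_1", arguments=""),
    make_chunk(arguments='{"message": '),
    make_chunk(arguments='"Call back at 5 pm."}'),
]
print(accumulate_tool_call(chunks))
```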
---

#### `process_streaming_response(response, initial_timestamp)`

Processes the streaming response from the language model, accumulating generated text and managing audio playback.

- **Parameters**:
  - `response`: The streaming response object containing generated text chunks.
  - `initial_timestamp`: The timestamp when the streaming started.

- **Returns**:
  - A tuple with the complete response string (str), function name (str), function arguments (str), function ID (str), and time to the first chunk (float, in milliseconds; `0` if no segment was produced).

- **Behavior**: This function accumulates the streamed text, sends each completed segment to text-to-speech as soon as a break punctuation mark arrives, records how long the first segment took, and collects any function-call data found in the stream. Text left over when the stream ends is spoken as a final segment.

**Code**:
```python
def process_streaming_response(response, initial_timestamp):
    """
    Processes the streaming response from the language model, accumulating generated text and managing audio playback.

    It tracks the time taken for the first chunk, detects function calls, and generates audio output as needed.

    Args:
        response: The streaming response object containing generated text chunks.
        initial_timestamp: The timestamp when the streaming started.

    Returns:
        A tuple with the complete response string, function name, function arguments, function ID, and time for the first chunk.
    """

    complete_string = ""
    continious_string = ""
    first_chunk_time = 0
    count = 1
    function_id = None
    function_name = None
    function_arguments = ''
    chunk_timestamp = initial_timestamp
    
    for chunk in response:
        current_gpt_chunk = chunk.choices[0].delta.content if chunk.choices else ""
        
        if current_gpt_chunk and chunk.choices and chunk.choices[0].delta:
            break_punctuation = determine_break_punctuation(count)
            
            if any(punc in current_gpt_chunk for punc in break_punctuation):
                continious_string += current_gpt_chunk
                complete_string += current_gpt_chunk

                if count == 1:
                    first_chunk_time = (time.time() - chunk_timestamp) * 1000
                    print(f"[DEBUG] Time taken to generate 1st chunk: {first_chunk_time} ms")

                should_split, continious_string, discarded_string = check_punctuation_split(continious_string, current_gpt_chunk)
                if should_split:
                    chunk_timestamp = time.time()
                    continious_string = continious_string.strip()
                    generate_and_play_audio(continious_string)
                    continious_string = discarded_string
                    count += 1
            else:
                continious_string += current_gpt_chunk
                complete_string += current_gpt_chunk

        # Extract function calls if any
        fn_name, fn_args, fn_id = extract_function_calls(chunk)
        if fn_name:
            function_name = fn_name
        if fn_args:
            function_arguments += fn_args
        if fn_id:
            function_id = fn_id

    # Speak whatever text remains after the stream ends
    if continious_string:
        print("chunk:->", continious_string)
        audio_base64 = generate_audio_azure(continious_string)

        if audio_base64:
            play_audio(audio_base64)
        else:
            print("[ERROR] Failed to generate final audio.")

    return complete_string, function_name, function_arguments, function_id, first_chunk_time
```
---

#### `process_chunk(request, client, messages, tools)`

Handles a user request by appending the message and generating a response from the chat client.

- **Parameters**:
  - `request`: The user's message to process (str).
  - `client`: The chat client for interaction with the language model.
  - `messages`: The conversation context (list).
  - `tools`: The available tools for processing (list).

- **Returns**:
  - A tuple containing the complete response string (str), function name (str), function arguments (str), function ID (str), first chunk time (float, in milliseconds), and an error flag (bool). On error, the first five values are `None` and the flag is `True`.

- **Behavior**: This function appends the user's message to the conversation, requests a streamed completion from the OpenAI model, and passes the stream to `process_streaming_response`. Any exception raised along the way is caught, logged with its traceback, and reported through the error flag.

**Code**:
```python
def process_chunk(request, client, messages, tools):
    """
    Handles a user request by appending the message and generating a response from the chat client.

    Captures exceptions during processing and manages the overall flow of information.

    Args:
        request: The user's message to process.
        client: The chat client for interaction with the language model.
        messages: The conversation context.
        tools: The available tools for processing.

    Returns:
        A tuple containing the complete response string, function name, function arguments, function ID, first chunk time, and an error flag.
    """
    try:
        append_user_message(messages, request)
        response = create_chat_completion(client, messages, tools)

        initial_timestamp = time.time()
        (complete_string, function_name,
         function_arguments, function_id,
         first_chunk_time) = process_streaming_response(response, initial_timestamp)

        return complete_string, function_name, function_arguments, function_id, first_chunk_time, False

    except Exception as e:
        print(f"[ERROR] An error occurred while processing the chunk: {e}")
        print(f"traceback in process chunk : {traceback.format_exc()}")
        return None, None, None, None, None, True
```
---

#### `initiate_conversation_with_llm()`

Initiates a conversation with the language model by sending a greeting message to the caller.

- **Returns**:
  - None.

- **Behavior**: This function sends an initial query to the language model, prompting it to introduce the assistant and ask for the caller's name and purpose. It appends the generated response to the conversation context.

**Code**:
```python
def initiate_conversation_with_llm():
    """
    Initiates a conversation with the language model by sending a greeting message to the caller.

    Sends an initial query to generate an introductory message and appends the response to the conversation context.

    Returns:
        None
    """
    global messages
    if call_state == "LISTENING":
        initial_query = "Please introduce yourself to the caller as Mr. Ravi Ranjan’s assistant and ask for their name and the purpose of their call."
        print("[DEBUG] Sending initial query to LLM...")

        try:
            # process_chunk appends the query, calls the model, and plays the audio,
            # so no separate chat completion request is needed here.
            llm_response, _, _, _, _, _ = process_chunk(initial_query, client, messages, tools)
            if llm_response:
                messages.append({"role": "assistant", "content": llm_response})

            print(f"LLM Generated Greeting: {llm_response}")
        except Exception as e:
            print(f"[ERROR] An error occurred while initiating the conversation: {e}")
            print(f"traceback in initial chunk: {traceback.format_exc()}")
```
---

## Azure TTS Module (`azure_tts.py`)

## Module Overview
This module is responsible for converting text into speech using Azure Text-to-Speech (TTS) services. It takes generated text responses and synthesizes them into natural-sounding audio output for effective communication with users.

### Functions

#### `generate_audio_azure(text, lang="en-US", voice_name="en-US-AvaNeural", tts_style="chat", retry=True)`

Converts text into audio using the Azure Text-to-Speech API.

- **Parameters**:
  - `text` (str): The text to convert into speech.
  - `lang` (str, optional): The language code for the speech synthesis (default is `"en-US"`).
  - `voice_name` (str, optional): The voice name to use for speech synthesis (default is `"en-US-AvaNeural"`).
  - `tts_style` (str, optional): The style of speech (default is `"chat"`).
  - `retry` (bool, optional): A flag to retry the request in case of failure (default is `True`).

- **Returns**:
  - A Base64-encoded audio string (str) if the request is successful.
  - An empty string (`""`) if the request fails.

- **Behavior**:
  1. The function sanitizes the input text by replacing problematic characters such as `&`, `<`, and `>` to ensure compatibility with Azure TTS's XML format.
  2. It constructs an SSML (Speech Synthesis Markup Language) payload with the specified language, voice, and style.
  3. It sends a POST request to the Azure TTS API using the provided configuration.
  4. If the request succeeds, it encodes the audio response into Base64 format and returns it.
  5. It handles errors gracefully, printing error messages and returning an empty string in case of failure.

- **Headers**:
  - `Ocp-Apim-Subscription-Key`: The subscription key for Azure API authentication.
  - `Content-Type`: Specifies that the request content is SSML.
  - `X-Microsoft-OutputFormat`: Sets the audio output format (e.g., 16 kHz mono MP3).
  - `User-Agent`: Identifies the client making the request.

- **Request Data**:
  - The function generates an SSML payload containing the text, language, and voice settings for the speech synthesis.

- **Error Handling**:
  - If the API returns a non-200 status code, the function raises an exception internally.
  - That exception, like any other error during the request or processing, is caught and logged, and the function returns an empty string.
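
The sanitizing, SSML construction, and Base64 steps described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `build_ssml` and `encode_audio` are hypothetical helper names, the SSML template is one plausible layout for Azure's `mstts:express-as` style element, and escaping with `xml.sax.saxutils.escape` is an assumption about how the problematic characters are replaced. The network request is omitted.

```python
import base64
from xml.sax.saxutils import escape


def build_ssml(text, lang="en-US", voice_name="en-US-AvaNeural", tts_style="chat"):
    """Build an SSML payload (hypothetical helper, illustrative template)."""
    # escape() turns & into &amp;, < into &lt;, and > into &gt;
    safe_text = escape(text)
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="{lang}">'
        f'<voice name="{voice_name}">'
        f'<mstts:express-as style="{tts_style}">{safe_text}</mstts:express-as>'
        f"</voice></speak>"
    )


def encode_audio(audio_bytes):
    """Encode raw audio bytes as a Base64 string (hypothetical helper)."""
    return base64.b64encode(audio_bytes).decode("ascii")


ssml = build_ssml("Tom & Jerry <3")
print(ssml)
```

Escaping before interpolation keeps the payload well-formed XML even when the language model produces text containing reserved characters.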

### Example Usage

```python
from azure_tts import generate_audio_azure

# Example text to convert to audio
text_to_convert = "Hello, this is an example of text-to-speech conversion."

# Generate audio in Base64 format
audio_base64 = generate_audio_azure(text_to_convert)

if audio_base64:
    print("Audio successfully generated!")
else:
    print("Failed to generate audio.")
```
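
Because the function returns Base64 text rather than raw bytes, a caller that wants a playable file has to decode it first. The sketch below shows one way to do that; `save_base64_audio` is a hypothetical helper that is not part of `azure_tts.py`, and the right file extension depends on the output format requested in the `X-Microsoft-OutputFormat` header.

```python
import base64
from pathlib import Path


def save_base64_audio(audio_base64, path):
    """Decode a Base64 audio string and write the raw bytes to `path`.

    Hypothetical helper for illustration. Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(audio_base64)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```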
---

## Utils Module (`utils.py`)

## Module Overview
This module provides helper functions for sending WhatsApp messages through Twilio, checking text for numbers next to commas or periods, and appending tool-call entries to the conversation history.

### Functions

#### `send_message_to_whatsapp(message)`

Sends a WhatsApp message using Twilio's API.

- **Parameters**:
  - `message` (str): The content of the message to be sent.

- **Returns**:
  - A success message (`"Message sent successfully"`) if the operation is successful.
  - An error message (`"Error in sending message."`) if the operation fails.

- **Behavior**:
  1. Authenticates with Twilio using credentials from `config.py`.
  2. Attempts to send a WhatsApp message to the recipient number.
  3. Prints the message SID if successful or logs the error if it fails.

- **Example Usage**:
  ```python
  from twilio.rest import Client

  # ACCOUNT_SID, AUTH_TOKEN, WHATSAPP_NUMBER, and RECIPIENT_NUMBER are
  # expected to come from config.py.

  def send_message_to_whatsapp(message):
      client = Client(ACCOUNT_SID, AUTH_TOKEN)
      try:
          msg = client.messages.create(
              from_=WHATSAPP_NUMBER,
              body=message,
              to=f"whatsapp:{RECIPIENT_NUMBER}"
          )
          print(f"[INFO] WhatsApp message sent. SID: {msg.sid}")
          return "Message sent successfully"
      except Exception as e:
          print(f"[ERROR] Failed to send WhatsApp message: {e}")
          return "Error in sending message."

  message = "Hello from Twilio!"
  result = send_message_to_whatsapp(message)
  print(result)
  ```
  
#### `check_before_or_after_comma_is_number(s)`

**Description**:  
Checks whether the given string contains a digit immediately before or after a comma (`,`) or a period (`.`).

- **Parameters**:
  - `s` (str): The input string to validate.

- **Returns**:
  - `True` if a digit is found immediately before or after a comma or period.
  - `False` otherwise.

- **Behavior**:
  - Uses the regular expression `r'\d[,.]|[,.]\d'` to detect patterns such as:
    - `1,000`
    - `3.14`
    - `,1`
  - Handles exceptions gracefully by logging an error message and returning `False`.

- **Example Usage**:
  ```python
  text_1 = "The price is 1,000 USD."
  text_2 = "Hello, world!"
  result_1 = check_before_or_after_comma_is_number(text_1)  # returns True
  result_2 = check_before_or_after_comma_is_number(text_2)  # returns False
  ```
  ```
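
For reference, the documented behavior can be reproduced with a few lines around the regular expression given above. This is a sketch under the assumption that the function simply searches the string with that pattern; the exact error message and the precompiled `_DIGIT_NEAR_SEPARATOR` name are illustrative.

```python
import re

# Pattern taken from the description: a digit directly before or after , or .
_DIGIT_NEAR_SEPARATOR = re.compile(r"\d[,.]|[,.]\d")


def check_before_or_after_comma_is_number(s):
    """Return True if a digit sits directly next to a comma or period in `s`."""
    try:
        return bool(_DIGIT_NEAR_SEPARATOR.search(s))
    except Exception as e:
        # Non-string input (for example None) ends up here.
        print(f"[ERROR] Could not check string: {e}")
        return False


print(check_before_or_after_comma_is_number("The price is 1,000 USD."))
print(check_before_or_after_comma_is_number("Hello, world!"))
```

Note that the pattern also matches a digit followed by a sentence-ending period, as in `"I have 3."`.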

#### `append_asst_msg(messages, function_id, function_name, function_args)`

**Description**:  
This function appends a message to the `messages` list, representing a response from an assistant. The message includes metadata about a tool or function the assistant has invoked.

- **Parameters**:
  - `messages` (list): The conversation history stored as a list of message objects.
  - `function_id` (str): A unique identifier for the invoked function or tool call.
  - `function_name` (str): The name of the function/tool invoked by the assistant.
  - `function_args` (str): The arguments passed to the function/tool.

- **Returns**:  
  - None (modifies the `messages` list in place).

- **Behavior**:  
  - Appends an object to the `messages` list with the following structure:
    ```json
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "function_id",
          "function": {
            "name": "function_name",
            "arguments": "function_args"
          },
          "type": "function"
        }
      ]
    }
    ```

- **Example Usage**:
  ```python
  conversation = []
  append_asst_msg(conversation, "tool_001", "fetch_data", '{"param1": "value1"}')
  print(conversation)  
  ```
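
A minimal sketch of a function that produces the structure documented above is shown here. It is inferred from the described message shape rather than copied from `utils.py`, so treat it as an illustration of the format.

```python
def append_asst_msg(messages, function_id, function_name, function_args):
    """Append an assistant message that records a tool call (sketch)."""
    messages.append({
        "role": "assistant",
        "content": None,  # serialized as null in JSON
        "tool_calls": [
            {
                "id": function_id,
                "function": {
                    "name": function_name,
                    "arguments": function_args,  # JSON-encoded string
                },
                "type": "function",
            }
        ],
    })


conversation = []
append_asst_msg(conversation, "tool_001", "fetch_data", '{"param1": "value1"}')
print(conversation[0]["tool_calls"][0]["function"]["name"])
```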


#### `append_tool_call_message`

**Description**:  
Appends a tool call message to the conversation history. This message captures the return value of a previously invoked tool or function.

- **Parameters**:
  - `messages` (list): The conversation history, represented as a list of message objects. This list is modified in place to include the new tool call message.
  - `function_id` (str): A unique identifier that links this message to the original tool or function call.
  - `function_name` (str): The name of the tool or function that generated the return value.
  - `function_returns` (str): The return value or output of the tool or function.

- **Returns**:  
  - None (modifies the `messages` list in place).

- **Behavior**:  
  - Appends a dictionary to the `messages` list with the following structure:
    ```json
    {
      "tool_call_id": "function_id",
      "role": "tool",
      "name": "function_name",
      "content": "function_returns"
    }
    ```
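
The sketch below shows one way to build the tool message described above. The parameter order is an assumption based on the order in which the parameters are listed, and the body is inferred from the documented structure rather than taken from `utils.py`.

```python
def append_tool_call_message(messages, function_id, function_name, function_returns):
    """Append a tool message carrying a tool's return value (sketch)."""
    messages.append({
        "tool_call_id": function_id,  # must match the id of the original tool call
        "role": "tool",
        "name": function_name,
        "content": function_returns,
    })


conversation = []
append_tool_call_message(conversation, "tool_001", "fetch_data", '{"status": "ok"}')
print(conversation[0]["role"])
```

The `tool_call_id` is what lets the language model pair this result with the assistant message that requested the call, so the same identifier should be used for both.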


## Frontend Module (`frontend.py`)

## Module Overview
This module implements a phone call interface in Streamlit that lets users manage a call session with speech recognition. The interface includes buttons to start and end calls, a timer that tracks call duration, and a message display area that shows the conversation history.

### Page Configuration
- `st.set_page_config()`: Configures the page title, icon, and layout to be centered.

### Custom Styles
- `st.markdown()`: Applies custom CSS styles for the main container, status messages, and timer display to enhance the user interface.

### Session State Initialization
The following session states are initialized:
- **`call_status`**: Indicates the current status of the call (e.g., "No call in progress").
- **`call_start_time`**: Records the time when the call starts.
- **`elapsed_time`**: Tracks the total duration of the call in seconds.
- **`messages`**: A list to store chat messages during the call.
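
The initialization can be sketched as a small helper that fills in missing keys only, so values survive Streamlit reruns. `init_session_state` is a hypothetical name, and every default other than the `"No call in progress"` status is an assumption. The helper accepts any mutable mapping, which makes it usable with `st.session_state` as well as a plain dictionary.

```python
def init_session_state(state):
    """Set default values for keys that are not in `state` yet (sketch)."""
    defaults = {
        "call_status": "No call in progress",
        "call_start_time": None,  # assumed default: no call started yet
        "elapsed_time": 0,        # assumed default: zero seconds
        "messages": [],           # assumed default: empty chat history
    }
    for key, value in defaults.items():
        if key not in state:
            state[key] = value
    return state


# In the Streamlit app this would be called as init_session_state(st.session_state).
state = init_session_state({})
print(state["call_status"])
```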

## Helper Functions

### `format_duration(seconds)`
**Description**: Formats the call duration in hours, minutes, and seconds.

- **Parameters**:
  - `seconds` (int): The total duration in seconds.
  
- **Returns**:
  - (str): A string representing the formatted duration (HH:MM:SS).
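
One straightforward way to produce the `HH:MM:SS` string is with `divmod`. This is a sketch of the described behavior, assuming zero-padded two-digit fields, and is not necessarily the module's exact code.

```python
def format_duration(seconds):
    """Format a duration in seconds as HH:MM:SS (sketch)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_duration(3661))  # prints 01:01:01
```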

### `on_call_connected()`
**Description**: Executed when the call is connected. It changes the call state to "LISTENING" and starts the speech recognition process.

### `on_call_ended()`
**Description**: Executed when the call ends. It changes the call state to "STOP" to perform cleanup actions.

## Main Logic
- **Call Timer Logic**: Calculates the elapsed time based on the call status.
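
The timer logic can be sketched as a pure function. `compute_elapsed` is a hypothetical helper, and the assumption that the timer only advances while the status is `"Call is started"` is inferred from the status strings used by the call control buttons.

```python
import time


def compute_elapsed(call_status, call_start_time, elapsed_time, now=None):
    """Return the call duration in whole seconds (sketch).

    While the call is running, the duration is measured from the start time.
    Otherwise the last stored value is returned unchanged.
    """
    if call_status == "Call is started" and call_start_time is not None:
        current = time.time() if now is None else now
        return int(current - call_start_time)
    return elapsed_time


print(compute_elapsed("Call is started", 100.0, 0, now=165.4))  # prints 65
```

Passing `now` explicitly keeps the function easy to test; in the app it would default to the current time on each rerun.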

## User Interface Elements

### Title and Main Container
- `st.title()`: Displays the title of the application.
- `st.markdown()`: Creates a styled main container for the UI.

### Status Container
- `set_markdown(status_message)`: A helper function to display the current call status with appropriate styling based on whether the call is ongoing or has ended.

### Call Timer Display
- Displays the formatted call duration when the call is active.

## Call Control Buttons
Two buttons are provided for managing call status:

### 📞 Start Call:
- Sets the call status to "Call is started".
- Records the start time and updates the status message.
- Calls `on_call_connected()` to initiate speech recognition.

### 🔴 End Call:
- Sets the call status to "Call is ended".
- Updates the status message.
- Calls `on_call_ended()` to stop any ongoing processes.

## Message Display
- Iterates through the stored messages in `st.session_state.messages` and displays them in the UI, indicating whether they are from the assistant or the user.

## Footer
- Adds a horizontal line to separate the footer from the main content.

![Phone Call Interface](ui_image.jpeg)

This image illustrates the layout of the phone call interface, showcasing the call status, timer, and message display area. The design emphasizes clarity and ease of use, ensuring a smooth user experience during phone call management.
