# Lab 1 - Core Functionality

> **⚠️ Browser Compatibility Notice**: This lab requires **Google Chrome browser** for optimal performance. Please ensure you're using Chrome before proceeding.

In this lab, you'll learn about building a real-time voice chat application using Nova Sonic's bidirectional streaming API. You'll understand the event-based architecture, tool use handling, conversation history management, and guardrails implementation.

**Prerequisites:** Complete Lab 0 - Introduction to Nova Sonic via AWS Console to understand Nova Sonic's capabilities and behavior.

## Understanding the Event Flow

To initiate a conversation with Nova Sonic, a sequence of JSON-formatted events must be sent to the Nova Sonic connection via Bedrock. The diagram below illustrates the event flow between the client (middle server) and the Nova Sonic system.

![Nova Sonic Event Flow](static/image-1.png)

For more details on Amazon Nova Sonic events, please refer to:
- [Input Events](https://docs.aws.amazon.com/nova/latest/userguide/input-events.html)
- [Output Events](https://docs.aws.amazon.com/nova/latest/userguide/output-events.html)
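
As a rough illustration of the sequence in the diagram, the sketch below builds the opening events of a session as Python dictionaries that can be serialized to JSON. The event names follow the Input Events reference linked above. The helper names (`wrap`, `opening_events`) and the inference values are assumptions made for this example, and several required fields (such as the audio output configuration) are omitted for brevity, so check the reference before reusing it.

```python
import json
import uuid


def wrap(event_name, payload):
    """Wrap a payload in the {"event": {<name>: ...}} envelope."""
    return {"event": {event_name: payload}}


def opening_events(system_prompt):
    """Return the events sent before any audio is streamed (simplified)."""
    prompt_name = str(uuid.uuid4())   # shared by every event in this prompt
    content_name = str(uuid.uuid4())  # identifies the system prompt content block
    return [
        # Assumed inference values, for illustration only
        wrap("sessionStart", {"inferenceConfiguration": {"maxTokens": 1024}}),
        wrap("promptStart", {"promptName": prompt_name}),
        wrap("contentStart", {
            "promptName": prompt_name,
            "contentName": content_name,
            "type": "TEXT",
            "role": "SYSTEM",
        }),
        wrap("textInput", {
            "promptName": prompt_name,
            "contentName": content_name,
            "content": system_prompt,
            "role": "SYSTEM",
        }),
        wrap("contentEnd", {
            "promptName": prompt_name,
            "contentName": content_name,
        }),
    ]


if __name__ == "__main__":
    for event in opening_events("You are a friend."):
        print(json.dumps(event))
```

Audio input events, and finally the events that end the prompt and the session, would follow the same envelope pattern.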

## Architecture Overview

The Nova Sonic Speech-to-Speech model and its APIs have been designed and optimized for real-time, conversational interactions through an open bidirectional audio stream or channel configuration.

For architectures that require an internet-exposed connection to serve mobile or web clients, the following approach is recommended:

![Nova Sonic Sample Architecture](static/image-2.png)

## Setup Process

### Starting the Python WebSocket Server

1. Use the terminal at the bottom of the Code Editor. If it's not open, click the icon in the top-right corner to open it.

   ![Open SageMaker Studio](static/image-3.png)

2. Start the Python WebSocket server:

   ```bash
   cd python-server
   python server.py
   ```

   ![WebSocket Server Running](static/image-8.png)

   The WebSocket server will start on port 8081.

### Starting the React Frontend App

1. Keep the WebSocket terminal open. Now, open a new terminal to start the React client app. Click the **+** button to open a new terminal.

   ![VSCode New Terminal](static/image-9.png)

2. Navigate to the React client folder, then set up and run the React app:

   ```bash
   cd react-client
   npm ci
   npm start
   ```

   It may take a few minutes to build and run the app for the first time.

3. If your browser blocks opening a new tab, go to the **Ports** tab to find the URL and open it manually.

   ![VSCode Ports Tab](static/image-10.png)

## Conversing with Nova Sonic

In this hands-on lab, we will use the frontend React app to engage in an audio conversation with the Amazon Nova Sonic model.

![Demo UI](static/image-11.png)

1. Start by asking questions like "Hi Nova, how are you?" and enjoy chatting with Nova Sonic.

2. When using Chrome, ensure the sound setting is set to **Allow**:

   ![Chrome Sound Settings](static/image-12.png)

3. Grant microphone access when prompted:

   ![Microphone Permission](static/image-13.png)

4. Click the bubbles to view the raw JSON data sent to and received from Nova Sonic:

   ![View JSON Data](static/image-14.png)

5. Click the settings icon to customize Voice ID, System Prompt, Tool usage, and Chat history:

   ![Demo UI Settings](static/image-15.png)

### Understanding Output ASR (Automatic Speech Recognition)

Each ASR transcript returned by Nova Sonic comes as a pair:
- **[Speculative]**: Appears before the audioOutput event (what Nova Sonic plans to say)
- **[Final]**: Follows the audioOutput event (what Nova Sonic actually said)

![Sonic ASR](static/image-16.png)
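
One way a client might display these pairs is sketched below: speculative text is shown immediately and then replaced once the matching final transcript arrives. The flat `(stage, text)` tuples and the function name are simplifications invented for this example, not the raw event format, and the sketch assumes finals arrive in the same order as their speculative counterparts.

```python
def display_transcript(outputs):
    """Return the text to display after processing (stage, text) pairs in order."""
    shown = []  # list of [stage, text] entries in display order
    for stage, text in outputs:
        if stage == "SPECULATIVE":
            shown.append(["SPECULATIVE", text])
        elif stage == "FINAL":
            # Replace the oldest entry that is still speculative, if any
            for entry in shown:
                if entry[0] == "SPECULATIVE":
                    entry[0], entry[1] = "FINAL", text
                    break
            else:
                # No pending speculative entry: just show the final text
                shown.append(["FINAL", text])
    return [text for _, text in shown]
```

For example, a speculative "I am doing great" followed by a final "I am doing" (cut short by an interruption) would leave only the final text on screen.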

## Understanding the Parameters

### VoiceId
Specified in the `promptStart` event. It determines which voice Nova Sonic uses.

### ToolConfiguration
Included in the `promptStart` event. The `json` field of `inputSchema` holds the schema as a JSON-encoded string, not a nested object:

```json
{
  "tools": [
    {
      "toolSpec": {
        "name": "getDateTool",
        "description": "get information about the current day",
        "inputSchema": {
          "json": "{\"type\": \"object\", \"properties\": {}, \"required\": []}"
        }
      }
    }
  ]
}
```
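
Because `inputSchema.json` holds a string rather than a nested object, the schema has to be serialized before it is placed in the configuration. A minimal Python sketch of that step follows; `build_tool_spec` is a hypothetical helper written for this example, not a function from the sample code.

```python
import json


def build_tool_spec(name, description, properties=None, required=None):
    """Build one toolSpec entry with the input schema serialized to a string."""
    schema = {
        "type": "object",
        "properties": properties or {},
        "required": required or [],
    }
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            # The schema is embedded as a JSON string, not as a dict
            "inputSchema": {"json": json.dumps(schema)},
        }
    }


tool_configuration = {
    "tools": [
        build_tool_spec("getDateTool", "get information about the current day"),
    ]
}
```

Building the spec through a helper keeps the serialization in one place, which helps when more tools are added later.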

### System Prompt
Included in the first `textInput` event:

```json
{
  "event": {
    "textInput": {
      "promptName": "68d4c692-8567-4805-bb36-406e9c8f0f96",
      "contentName": "8e6fe1ae-c3df-4bcd-a0e3-86d3cbf1cc15",
      "content": "You are a friend. The user and you will engage in a spoken dialog exchanging the transcripts of a natural real-time conversation. Keep your responses short, generally two or three sentences for chatty scenarios.",
      "role": "SYSTEM"
    }
  }
}
```

## System Prompt Implementation

Now that you've experimented with system prompts in Lab 0, let's understand how they're implemented programmatically. The system prompt is sent as the first `textInput` event with role "SYSTEM".

### Lab: Implement Custom System Prompts

Try modifying the system prompt in the settings panel of the demo UI. The default baseline is:

```
You are a friend. You and the user will engage in a spoken dialog exchanging the transcripts of a natural real-time conversation. Keep your responses short, generally two or three sentences for chatty scenarios.
```

Test the different system prompt patterns you learned in Lab 0. For additional best practices, refer to the [official documentation](https://docs.aws.amazon.com/nova/latest/userguide/prompting-speech-best-practices.html).

## Implementing Barge-in Handling

You've experienced barge-in in Lab 0. Now let's understand how it's implemented in code. Barge-in allows users to interrupt Nova Sonic during speech for more natural conversations.

![Sonic Barge In](static/image-17.png)

### Test Barge-in Implementation

1. Ensure both the Python server and React app are running.

2. Click "Start conversation" and ask: "I want to travel to Japan during the summer. Can you give me some recommendations?"

3. While Nova Sonic is responding, interrupt by saying: "I changed my mind. I want to visit Peru instead."

4. You will see an "interrupted" event, and Nova Sonic will respond based on your new command.

   ![Sonic Barge In UI](static/image-18.png)

Review the React implementation in [`react-client/src/s2s.js`](react-client/src/s2s.js#L209-L232), specifically the `cancelAudio()` function (line 209) and the interruption detection logic (lines 228-229) that handles barge-in events.

![Sonic Barge In Pause Audio](static/image-19.png)
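
The same idea can be sketched in Python: when an interruption is detected, any audio that has been received but not yet played is discarded. This is not the code from `s2s.js`. It assumes the interruption arrives as a text output whose content is the JSON `{ "interrupted" : true }`, and the `PlaybackQueue` class and function names are hypothetical.

```python
import json
from collections import deque


class PlaybackQueue:
    """Holds audio chunks that are waiting to be played."""

    def __init__(self):
        self._chunks = deque()

    def enqueue(self, chunk):
        self._chunks.append(chunk)

    def cancel(self):
        """Drop all pending audio so playback stops right away."""
        self._chunks.clear()

    def __len__(self):
        return len(self._chunks)


def is_interruption(text_content):
    """Return True when the text content is JSON carrying interrupted: true."""
    try:
        data = json.loads(text_content)
    except (TypeError, ValueError):
        return False  # ordinary transcript text, not JSON
    return isinstance(data, dict) and data.get("interrupted") is True


def handle_text_output(text_content, queue):
    """Cancel pending audio on interruption; return whether one occurred."""
    if is_interruption(text_content):
        queue.cancel()
        return True
    return False
```

Clearing the queue matters because audio is typically generated faster than it is played, so without it the assistant would keep talking over the user.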

## Managing Chat Histories

Nova Sonic responses include ASR transcripts for both user and assistant voices. Storing chat history allows resuming sessions when connections close unexpectedly.

![Sonic Chat History](static/image-20.png)

### Lab: Test Chat History

1. Start a conversation and ask: "Hi Nova, can we resume the reservation?"

   ![Sonic Chat History UI](static/image-21.png)

2. The demo UI includes the following default chat history, which gives Nova Sonic the context it needs to resume the reservation:

```json
[
  {
    "content": "hi there i would like to cancel my hotel reservation",
    "role": "USER"
  },
  {
    "content": "Hello! I'd be happy to assist you with cancelling your hotel reservation. To get started, could you please provide me with your full name and the check-in date for your reservation?",
    "role": "ASSISTANT"
  },
  {
    "content": "yeah so my name is don smith",
    "role": "USER"
  },
  {
    "content": "Thank you, Don. Now, could you please provide me with the check-in date for your reservation?",
    "role": "ASSISTANT"
  }
]
```
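
A stored history like this could be replayed at the start of a new session by turning each entry into a text event with the matching role. The sketch below mirrors the shape of the system prompt `textInput` event shown earlier; `history_to_events` is a hypothetical helper, and in a real session each entry would also need to be framed by its own content start and end events as described in the Input Events reference.

```python
import uuid


def history_to_events(prompt_name, history):
    """Convert stored chat turns into textInput events, preserving order."""
    events = []
    for turn in history:
        if turn["role"] not in ("USER", "ASSISTANT"):
            raise ValueError(f"Unexpected role: {turn['role']}")
        events.append({
            "event": {
                "textInput": {
                    "promptName": prompt_name,
                    # Each turn gets its own content identifier
                    "contentName": str(uuid.uuid4()),
                    "content": turn["content"],
                    "role": turn["role"],
                }
            }
        })
    return events
```

History events would be sent after the system prompt and before the first audio input, so the model has the earlier turns as context when the user starts speaking.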

## Handling ToolUse

Tool use (function calling) enables external functionality in Amazon Nova, such as API calls or code functions.

### Set Up toolConfiguration

When starting a new session with Nova Sonic, provide the tool configuration in the `promptStart` event:

```json
{
  "tools": [
    {
      "toolSpec": {
        "name": "getDateTool",
        "description": "get information about the current day",
        "inputSchema": {
          "json": "{\"type\": \"object\", \"properties\": {}, \"required\": []}"
        }
      }
    }
  ]
}
```

### Receive and Process the ToolUse Event

During the conversation, Nova Sonic emits a ToolUse event whenever it decides that the user's request matches one of the configured tool specs.

![Sonic ToolUse Events](static/image-22.png)

### Test ToolUse Using the Sample App

1. Ensure both the Python server and React app are running.

2. Click "Start conversation" and ask: "What time is it?" or "I'm in the Eastern Time Zone—what time is it currently?"

3. Nova Sonic will respond with the correct date. You should see ToolUse and ToolResult events on the UI.

   ![Sonic ToolUse UI](static/image-23.png)

4. Click the red ToolUse event to view details:

   ![Sonic ToolUse Event](static/image-24.png)

5. Click the red ToolResult event to view details:

   ![Sonic ToolResult Event](static/image-25.png)

6. Check the Python code implementation in [`python-server/s2s_session_manager.py`](python-server/s2s_session_manager.py#L268-L330) - specifically the `processToolUse` function starting at line 268 to see how different tool types are handled and processed.

   ![Sonic ToolUse Processing](static/image-26.png)
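
The general shape of tool processing can be sketched as follows. This is a simplified illustration, not the `processToolUse` function from the sample: it assumes the ToolUse event supplies a tool name and a tool use ID, uses a dictionary lookup in place of per-tool `if` blocks, and returns the result as a JSON string. The handler and field names are assumptions, so compare them with the events shown in the UI.

```python
import json
from datetime import datetime, timezone


def get_date_tool():
    """Return information about the current day (UTC)."""
    now = datetime.now(timezone.utc)
    return {"date": now.strftime("%Y-%m-%d"), "dayOfWeek": now.strftime("%A")}


# Map lowercase tool names to the functions that implement them
TOOL_HANDLERS = {"getdatetool": get_date_tool}


def process_tool_use(tool_name, tool_use_id):
    """Run the requested tool and package its output for a ToolResult."""
    handler = TOOL_HANDLERS.get(tool_name.lower())
    if handler is None:
        result = {"error": f"Unknown tool: {tool_name}"}
    else:
        result = handler()
    # The ID ties the result back to the request that triggered it
    return {"toolUseId": tool_use_id, "content": json.dumps(result)}
```

Returning an error payload for unknown tools, rather than raising, lets the model explain the failure to the user instead of leaving the conversation hanging.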

## About Guardrails

Consider three key aspects when applying guardrails to Nova Sonic:

1. **Built-in RAI**: Nova offers integrated Responsible AI (RAI) that aligns with the AWS Acceptable Use Policy. See [Nova RAI documentation](https://docs.aws.amazon.com/nova/latest/userguide/responsible-use.html).

2. **System Prompts**: Incorporate additional policy evaluation logic as part of the system prompts.

3. **Moderate ToolResult input**: When invoking external function calls via ToolUse, validate the returned context using the [ApplyGuardrail API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ApplyGuardrail.html).
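
A minimal sketch of the third point is shown below. It calls `apply_guardrail` on a Bedrock Runtime client and falls back to the guardrail's replacement text when the guardrail intervenes. The function name, the choice of `source="INPUT"`, and the guardrail ID and version are assumptions for illustration; the client is passed in so the function can be exercised without AWS credentials.

```python
def moderate_tool_result(client, guardrail_id, guardrail_version, text):
    """Return text that is safe to send back to the model as a tool result."""
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source="INPUT",  # assumption: treat tool output as input to the model
        content=[{"text": {"text": text}}],
    )
    if response.get("action") == "GUARDRAIL_INTERVENED":
        # Use the guardrail's masked or blocked message instead of the raw text
        outputs = response.get("outputs", [])
        return outputs[0]["text"] if outputs else ""
    return text


# Example usage (requires boto3, AWS credentials, and an existing guardrail):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   safe = moderate_tool_result(client, "my-guardrail-id", "1", raw_tool_output)
```

Running the check before the ToolResult is sent keeps unvetted external content out of the model's context.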

### Lab: Test Guardrails with System Prompts

Replace the system prompt with this real estate agent sample, then ask: "Can you tell me if the seller is male or female?"

```
You are a real estate agent assisting customers by providing accurate, factual information about real estate properties. 
Your responses should be clear, informative, and strictly professional. Do not include or imply any references to ethnicity, gender, or other protected attributes, in accordance with fair housing regulations. 
Focus solely on property features, location details, market trends, and other objective criteria.
```

Nova Sonic will gracefully refuse to share information that goes against the system prompt instructions.

## Key Takeaways

In this lab, you've learned about:

- ✅ **Event-based architecture**: How Nova Sonic processes input and output events
- ✅ **System prompt implementation**: Programmatic control of assistant behavior
- ✅ **Barge-in handling**: Natural interruption processing in code
- ✅ **Chat history management**: Maintaining conversation context
- ✅ **Tool use integration**: External function calling capabilities
- ✅ **Guardrails implementation**: Responsible AI practices

## Additional Resources

- [Amazon Nova Sonic Documentation](https://docs.aws.amazon.com/nova/latest/userguide/speech.html)
- [Input Events Reference](https://docs.aws.amazon.com/nova/latest/userguide/input-events.html)
- [Output Events Reference](https://docs.aws.amazon.com/nova/latest/userguide/output-events.html)
- [Nova Sonic Best Practices](https://docs.aws.amazon.com/nova/latest/userguide/prompting-speech-best-practices.html)
- [Amazon Nova Samples GitHub Repository](https://github.com/aws-samples/amazon-nova-samples)

## Challenge: Add a Coin Flip Tool

Now that you understand how tools work, it's your turn to implement a simple coin flip tool to help with decision-making.

### Your Task

Create a new tool called `coinFlipTool` that randomly returns either "heads" or "tails" when called.

### Hints

1. **Tool Specification**: Follow the same pattern as `getDateTool` - this tool doesn't need any input parameters
2. **Tool Name**: Use `coinFlipTool` as the name
3. **Description**: Write a clear description so Nova Sonic knows when to use it (think about decision-making scenarios)
4. **Processing Logic**: Look at how `getDateTool` is handled in `s2s_session_manager.py` and add a similar `if` block for your coin flip tool
5. **Randomness**: Python's `random` module can help you generate random results

### Test Your Implementation

Try asking Nova Sonic:
- "Flip a coin for me"
- "Should I go to the gym today? Flip a coin"
- "Help me decide - heads or tails?"
- "I can't decide between pizza or sushi. Flip a coin"

### Expected Result

When working correctly:
1. Nova Sonic recognizes your coin flip request
2. A ToolUse event appears in the UI for "coinFlipTool"
3. The tool returns either "heads" or "tails" randomly
4. Nova Sonic speaks the result naturally in conversation
5. Each time you ask, you might get a different result