# Amazon Nova Act
Amazon Nova Act is an AI model trained to complete tasks in a web browser. With the Nova Act SDK, a few lines of code, and natural language prompts, you can automate a range of tasks on the same websites people use, such as:

- 📝 automating workflows across complex multi-element web forms
- ✈️ automating payments and bookings in Hospitality and Travel
- 🖥️ automating QA testing of user interfaces
- 📥 automating backoffice workflows for Human Capital Services such as creating and refreshing job postings

In this notebook you will learn about the foundational elements of Nova Act and how to use them to build reliable and robust browser-based automations. By the end of this notebook you should have a good understanding of best practices around prompting Nova Act, working with sensitive data, managing authentication, and much more. This notebook is designed for developers who are looking to develop browser-based automations.

In order to run this notebook, you will need a terminal session running locally, as well as Python 3.10 or later. You do not need to be familiar with Python in order to complete this workshop.

# Module 1: Nova Act Fundamentals

## Module Overview

In Module 1 you will learn the fundamentals of working with Nova Act. You will start with basic prompting techniques, then learn how to debug Nova Act by reviewing its log files. Next, you will learn how to extract structured data from a website, and you will wrap up the module with best practices for prompting and screen resolutions.

After completing Module 1 you will be equipped to write and debug advanced Nova Act workflows. Let's get started!

## Note on using this notebook

#### NOTE: Run the steps in this notebook in a terminal on your local machine, not inside the notebook itself. Use this notebook as a guide to the commands you will run in the terminal. This workshop demonstrates Nova Act launching a web browser, and a notebook cannot launch one.

**1. Get and set your API key**

NOTE: if you are running this as part of an AWS event, your instructor may provide you with an API key. In that case, you can skip this next step of obtaining your own API key.

You authenticate with Nova Act via an API key. You can obtain an API key from [https://nova.amazon.com/act](https://nova.amazon.com/act). Copy your key and set it in your environment:

```sh
# Replace <your api key> with your actual API key
export NOVA_ACT_API_KEY="<your api key>"

# On Windows (PowerShell)
$env:NOVA_ACT_API_KEY = "<your api key>"
```

Let's confirm you have set the API key.

```sh
echo $NOVA_ACT_API_KEY

# On Windows
echo $env:NOVA_ACT_API_KEY
```

You should see your API key in the output. If you do not, return to step 1 and obtain your API key.

**2. Install nova-act**

`nova-act` is available via the `pip` installer. Before we install it, let's first create a virtual environment where we will run our workshop commands.

```sh
mkdir nova-act-workshop
cd nova-act-workshop
python3 -m venv .venv
source .venv/bin/activate

# On Windows (PowerShell), activate the environment with:
# .venv\Scripts\Activate.ps1
```

Next, install `nova-act`

```sh
pip install nova-act
```

Congratulations! You're all set up and ready to begin writing Nova Act workflows. In the next section you will create your first Nova Act workflow!

---
# Section 2: Interactive Mode

## Overview

In this section you will learn about the interactive mode of Nova Act and run your first workflows.

As a developer, you build your workflows with the Nova Act SDK. The SDK's `NovaAct` class has an `act()` method that you use to pass prompts to Nova Act. Interactive mode is a great way to start learning Nova Act: you can observe how Nova Act operates on a website for each prompt you provide, then adjust your `act()` prompts until the workflow runs reliably and consistently.

Once you have a workflow that is working well, you can move the `act()` calls into a Python script and run the workflow from there.

In the steps below you will build a workflow that searches for coffee makers on amazon.com and returns the price of the first result.

**1. Start a Python shell environment**

From the terminal type `python`
```sh
python
```

You should now be within a Python shell.

**2. Start a Nova Act session**

To start, import the `NovaAct` class.

```python
from nova_act import NovaAct
```

Next, create a `NovaAct` instance that uses `amazon.com` as its starting page. Pass the website URL via the `starting_page` parameter, which is the only required parameter; we will cover the optional parameters throughout the workshop. Then start the session with `nova.start()`.

```python
nova = NovaAct(starting_page="https://amazon.com")
nova.start()
```

You should see a Chrome browser launch and open to `amazon.com`. The first time you launch Nova Act it will take a few moments longer for the browser to launch as the underlying modules need to be instantiated.

**3. Write your first prompt**

Great! You've started your first Nova Act session. Now let's ask Nova Act to search for coffee makers. You issue prompts to `NovaAct` with `nova.act("<prompt goes here>")`. Run the command below in your Python shell. Note: wait for the search to complete, and do not adjust the browser window size.

```python
nova.act("search for coffee makers")
```

After a few moments you should see a search happening on the website, and the search results displayed. Congratulations, you just successfully completed your first Nova Act prompt!

**4. Understanding the prompt and response**

So what just happened? Let's explore that.

When you make a `nova.act()` call, the Nova Act SDK makes an inference call to the Nova Act AI model, which runs in the cloud. The inference call includes a screenshot of the current web page, page metadata, and your prompt. This is why you should not resize the window: a different window size changes the screenshot and can lead to inconsistent results during inference.

The inference response typically includes an action and screen coordinates, such as `agentClick(" <box>757,561,812,718</box> ");` which the SDK then carries out on the website. This step of taking actions on the website is called the `actuation` step. Other actions include `agentType()` and `agentScroll()`.
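When you read the step logs while debugging, it can help to pull the coordinates out of an action. The sketch below parses the `<box>…</box>` fragment from a logged action string and computes its midpoint. The SDK performs actuation itself; this is only a reading aid. The coordinate order (top, left, bottom, right, in pixels) is an assumption inferred from the logs in this section, where the search field sits near the top of the page and the scroll box `0,0,812,1600` covers the whole 1600-pixel-wide viewport; it is not taken from SDK documentation. `parse_box` and `box_center` are hypothetical helper names.

```python
import re

# Matches the "<box>a,b,c,d</box>" fragment seen in the Nova Act step logs.
BOX_PATTERN = re.compile(r"<box>\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*</box>")


def parse_box(action: str) -> tuple[int, int, int, int]:
    """Extract the four box coordinates from a logged action string.

    Assumption: the order is (top, left, bottom, right), inferred from the
    example logs rather than from SDK documentation.
    """
    match = BOX_PATTERN.search(action)
    if match is None:
        raise ValueError(f"no <box> found in {action!r}")
    top, left, bottom, right = (int(g) for g in match.groups())
    return top, left, bottom, right


def box_center(box: tuple[int, int, int, int]) -> tuple[int, int]:
    """Return the (x, y) midpoint of a box, i.e. roughly where a click lands."""
    top, left, bottom, right = box
    return (left + right) // 2, (top + bottom) // 2


if __name__ == "__main__":
    box = parse_box('agentClick("<box>10,1062,48,1110</box>");')
    print(box, box_center(box))  # (10, 1062, 48, 1110) (1086, 29)
```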

Along with the action to carry out, the inference response includes the model's reasoning about the request and how it decided on the next step. This reasoning is contained within a `think()` element.

During inference, the model may determine it needs to carry out multiple actions to complete the task, which would result in multiple `think()` elements. We can see all of this in our terminal for the prompt we just specified.

```sh
>>> nova.act("search for coffee makers")
eab1> act("search for coffee makers")
..............
eab1> think("I am on the amazon homepage. My task is to search for coffee makers. I see a search field, but it is empty. I need to type 'coffee makers' into the search field.");
>> agentType("coffee makers", "<box>10,372,48,1062</box>");
......................
eab1> think("The search field now contains 'coffee makers', so my last action was successful. The search has not been initiated. I see a search button to the right of the search field. I need to click the search button to initiate the search.");
>> agentClick("<box>10,1062,48,1110</box>");
....................
eab1> think("I am now on the search results page for coffee makers, so my last action was successful. I have searched for coffee makers. The task is complete, and I need to simply return because no information was asked of me.");
>> return;
```

You can see the `act()` call containing our prompt. Our single `act()` call resulted in two `think()` steps, each with an associated action: an `agentType()` for typing into the search field, and an `agentClick()` for clicking the search button.

Note that we didn't specifically say to `click the search bar`, but instead used a natural language instruction of `search for coffee makers`. When you prompt Nova Act, think of constructing your prompts using the same terminology you would use if you were asking a person to carry out those steps.

**5. Return a value from Nova Act**

Let's try another prompt by asking Nova Act to return the price of the first coffee maker. Assign the result to a variable so you can use the returned value later.
```python
result = nova.act("return the price of the first item")
```

Let's review the output in the terminal. Note that if you are running this from a location where you are unable to purchase the item, you may not be able to see the price.
```sh
>>> result = nova.act("return the price of the first item")
be84> act("return the price of the first item")
................
be84> think("I am on the search results page for coffee makers. My task is to return the price of the first item. The first item is not visible, it may be lower on the page. I should scroll down the page to reveal the first item.");
>> agentScroll("down", "<box>0,0,812,1600</box>");
..................
be84> think("The page has scrolled down, and the results are now visible. Therefore, my last action was successful. I see the first item and its price is $99.99. The task is complete, and I need to return with the price of the first item.");
>> return "$99.99";
```

In this example, Nova Act extracts information from the web page and returns it. In a later section you will learn how to structure the output returned from the page. Once again, a single `nova.act()` call resulted in two `think()` steps: one to scroll the first item into view, and one to read its price and return it.

**6. Terminate an interactive session**

For the remainder of the workshop you will run Nova Act within Python scripts. This allows us to more quickly write and run the Nova Act workflows. Let's go ahead and end our interactive session. Some handy commands to know:
* *Control+X* - this terminates an active Nova Act instruction
* *nova.stop()* - this terminates an active Nova Act instance
* *Control+D* - this terminates the interactive Python session

To end the session, first stop the Nova Act instance, then press *Control+D* to exit the Python shell:

```python
nova.stop()
```

#### Summary
Congratulations! You just completed this section, in which you ran your first Nova Act workflow in interactive mode. You learned how to read the response returned from a `nova.act()` call, and how the SDK and the AI model work together to complete tasks on a website.

---
# Section 3: Logging, Debugging, and Headless Mode

## Overview
In this section you will learn how to debug Nova Act workflows and learn about headless mode.

**1. Nova Act log files**

Nova Act logs the result of each `act` call in its own HTML file. This file includes all of the individual steps (the `think` elements) that were carried out as part of that `act` call. The following are captured in the log file for each step: the screenshot, the prompt, and the inference result (`think`). A link to the HTML file is shown after an `act` call has completed.

*TIP: You can specify the folder where Nova Act writes its log files, which lets you centralize the log files from all of your Nova Act runs. To do this, pass the `logs_directory` parameter to the `NovaAct` constructor, for example `NovaAct(starting_page="http://www.amazon.com/", logs_directory="<absolute_folder_path>")`.*

**2. Debugging an act call**

Let's make an `act` call and see what is contained in the HTML file. Copy the Nova Act script below and save it to a file called `coffee-mugs.py`.

```python
from nova_act import NovaAct

with NovaAct(starting_page="http://www.amazon.com") as nova:
    nova.act("search for coffee mugs")
```

Run the script:

```sh
python coffee-mugs.py
```

When this completes, you should see a link to the HTML file.
```
d56c> ** View your act run here: /var/folders/...-39d640727e15_search_for_coffee_mugs.html
```

Let's inspect the contents of the file. Copy the link to the HTML file and paste it in your browser.

![search-coffee-mugs](./search-coffee-mugs.png)

At the top we see the prompt that we sent, `search for coffee mugs`. Then for each step that was carried out for this `act`, we see:
* a screenshot of the website at the start of the step
* a red bounding box indicating where Nova Act needs to carry out its next step
* the thinking step

In the first step, we can see a red bounding box around the search bar near the top of the page, along with the `think` step explaining how Nova Act decided to act on this page.

The thinking step and the red bounding box are very valuable when debugging why Nova Act chose to carry out a step the way it did. If Nova Act is not carrying out a step as you intended it to, you can use these to inform how you should adjust the prompt to have Nova Act correctly carry out the step.

Each HTML file contains all of the steps within a single `act` call, so a script with multiple `nova.act()` calls produces one HTML file per call.


**3. Headless mode and video capture**

Nova Act can run in headless mode, where the browser window is not visible during runtime. This is how you would typically run Nova Act in production and in the cloud. Enable it by passing `headless=True` to the `NovaAct` constructor.

You can also record a video of the entire workflow, which is valuable for debugging a headless run and for creating demos of your workflows. Enable video capture by passing `record_video=True`. Let's run a script that runs in headless mode and records a video.

Copy the script below into a file named `coffee-mugs-headless.py`.

```python
from nova_act import NovaAct

with NovaAct(starting_page="http://www.amazon.com", headless=True, record_video=True) as nova:
    nova.act("search for coffee mugs")
```

Run the script:
```sh
python coffee-mugs-headless.py
```

In the terminal you can follow Nova Act as it runs. Wait until it completes, and then continue.

The video recording is saved in the same folder as the HTML files. Using the log path from the terminal output, locate and open that folder. You should see a file named `session_video_tab-0.webm`. Double-click the file to open it in your browser. If it opens in an application that can't play it, open it from your browser instead (`File -> Open`).

If you play the video, you can see all of the steps that were carried out on the website during the workflow.

The section on CAPTCHAs describes how you can observe the browser when it is running in headless mode.

**4. Metadata of an act call**

Each `act()` call returns an `ActResult` object when it completes successfully. It contains the response payload, if the prompt asked for one (you will do that in the next section), along with metadata such as the `session_id` and `act_id`, which help during debugging.

```sh
ActResult(
    response = None
    parsed_response = None
    valid_json = None
    matches_schema = None
    metadata = ActMetadata(
        session_id = eab1497d-e89b-4b2a-92d5-bdb203df7f0d
        act_id = ffc38fbf-9a19-49b0-8224-8f2d6d7e0678
        num_steps_executed = 3
        start_time = 2025-05-15 09:03:12.306233 EDT
        end_time = 2025-05-15 09:03:45.974266 EDT
        prompt = 'search for coffee mugs'
    )
)
```
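When you report or search for a failed run, it helps to log the identifiers from `ActResult.metadata` on one line. The sketch below reads the `session_id`, `act_id`, `num_steps_executed`, and `prompt` fields shown above; `describe_act` is a hypothetical helper that works on any result object with that shape.

```python
def describe_act(result) -> str:
    """Summarize an ActResult's metadata on one line for logs or bug reports."""
    meta = result.metadata
    return (
        f"session={meta.session_id} act={meta.act_id} "
        f"steps={meta.num_steps_executed} prompt={meta.prompt!r}"
    )
```

For example, inside the `with NovaAct(...) as nova:` block of `coffee-mugs.py`, you could write `print(describe_act(nova.act("search for coffee mugs")))`.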

**Summary**

In this section you learned how Nova Act logging works and how to use the log files to debug a workflow by looking at the bounding boxes along with what the model was thinking. You also learned how to create a video recording of the workflow.

# Section 4: Structured Output

## Overview

**1. Structured output overview**

Nova Act can return a structured representation of the data on a website. Combined with Nova Act's ability to navigate a website, this enables use cases such as market research. Structured output is defined with [Pydantic](https://docs.pydantic.dev/latest/) classes: you define the structure you want Nova Act to return, and pass its JSON schema to the `nova.act()` call via the `schema` parameter.

Here is an example of a class, `Book`, we can use to capture simple details of a book. We include a second class, `BookList`, to contain all of the books on a page.

```python
from pydantic import BaseModel

class Book(BaseModel):
    title: str
    author: str

class BookList(BaseModel):
    books: list[Book]
```

**2. Return structured data from a workflow**

Now let's see how to do this within a script. In this example, we will use Wikipedia to get The New York Times number-one fiction books of 2020 and return the result as JSON. The script loads the Wikipedia page, extracts the book data into the `BookList` schema, and then validates the data and converts it to JSON.

Copy the code below and save it locally in a file named `top-books.py`.

```python
import json
from nova_act import NovaAct
from pydantic import BaseModel

class Book(BaseModel):
    title: str
    author: str

class BookList(BaseModel):
    books: list[Book]

def get_books(year: int) -> BookList | None:
    """
    Get top books of the year and return as a BookList. Return None if there is an error.
    """
    with NovaAct(
        starting_page=f"https://en.wikipedia.org/wiki/List_of_The_New_York_Times_number-one_books_of_{year}#Fiction"
    ) as nova:
        result = nova.act("Return the books in the Fiction list",
            # Specify the schema for parsing.
            schema=BookList.model_json_schema())
        if not result.matches_schema:
            # act response did not match the schema ¯\_(ツ)_/¯
            return None
        # Parse the JSON into the pydantic model.
        book_list = BookList.model_validate(result.parsed_response)
        book_list_json = json.dumps(book_list.model_dump(), indent=2)
        print(f"Books on the page are {book_list_json}")
        return book_list

if __name__ == "__main__":
    get_books(2020)
```

Run the script:

```sh
python top-books.py
```

You should see an output resembling this:
```
80a6> act("Return the books in the Fiction list, format output with jsonschema: {"$defs": {"Book": {"properties": {"title": {"title": "Title", "type": "string"}, "author": {"title": "Author", "type": "string"}}, "required": ["title", "author"], "title": "Book", "type": "object"}}, "properties": {"books": {"items": {"$ref": "#/$defs/Book"}, "title": "Books", "type": "array"}}, "required": ["books"], "title": "BookList", "type": "object"}")
..............
80a6> think("Extracting:  Return the books in the Fiction list");
>> return "{\"books\": [{\"title\": \"Where the Crawdads Sing\", \"author\": \"Delia Owens\"}, {\"title\": \"American Dirt\", \"author\": \"Jeanine Cummins\"}, {\"title\": \"Golden in Death\", \"author\": \"J. D. Robb\"}, {\"title\": \"American Dirt\", \"author\": \"Jeanine Cummins\"}, {\"title\": \"One Minute Out\", \"author\": \"Mark Greaney\"}, {\"title\": \"Blindside\", \"author\": \"James Patterson and James O. Born\"}, {\"title\": \"House of Earth and Blood\", \"author\": \"Sarah J. Maas\"}, {\"title\": \"The Mirror & the Light\", \"author\": \"Hilary Mantel\"}, {\"title\": \"The Boy from the Woods\", \"author\": \"Harlan Coben\"}, {\"title\": \"Little Fires Everywhere\", \"author\": \"Celeste Ng\"}]}";

Books on the page are {
  "books": [
    {
      "title": "Where the Crawdads Sing",
      "author": "Delia Owens"
    },
    {
      "title": "American Dirt",
      "author": "Jeanine Cummins"
    },
    {
      "title": "Golden in Death",
      "author": "J. D. Robb"
    },
    {
      "title": "American Dirt",
      "author": "Jeanine Cummins"
    },
    {
      "title": "One Minute Out",
      "author": "Mark Greaney"
    },
    {
      "title": "Blindside",
      "author": "James Patterson and James O. Born"
    },
    {
      "title": "House of Earth and Blood",
      "author": "Sarah J. Maas"
    },
    {
      "title": "The Mirror & the Light",
      "author": "Hilary Mantel"
    },
    {
      "title": "The Boy from the Woods",
      "author": "Harlan Coben"
    },
    {
      "title": "Little Fires Everywhere",
      "author": "Celeste Ng"
    }
  ]
}
```
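Note that "American Dirt" appears twice in the output, most likely because the Wikipedia page lists the number-one book week by week, so a title that regains the top spot is listed again. If you only want distinct titles, you can post-process the validated model. The sketch below repeats the `Book` and `BookList` models so it runs on its own; `dedupe_books` is a hypothetical helper, and matching on case-folded title and author is a design choice.

```python
from pydantic import BaseModel


class Book(BaseModel):
    title: str
    author: str


class BookList(BaseModel):
    books: list[Book]


def dedupe_books(book_list: BookList) -> BookList:
    """Drop repeated (title, author) pairs while keeping the original order."""
    seen: set[tuple[str, str]] = set()
    unique: list[Book] = []
    for book in book_list.books:
        key = (book.title.casefold(), book.author.casefold())
        if key not in seen:
            seen.add(key)
            unique.append(book)
    return BookList(books=unique)
```

In `top-books.py`, you could apply it right after validation: `book_list = dedupe_books(BookList.model_validate(result.parsed_response))`.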

**3. The BOOL_SCHEMA**

Nova Act provides the built-in `BOOL_SCHEMA` constant, which makes `act()` return `True` or `False` as the answer to a yes/no question in the prompt. For example, we can ask Nova Act whether there is a search bar on the page.

Copy the code below and save it locally in a file named `bool-schema.py`.

```python
from nova_act import NovaAct, BOOL_SCHEMA

with NovaAct(starting_page="http://www.amazon.com") as nova:
    prompt = "Is there a search bar on the page?"
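    result = nova.act(prompt, schema=BOOL_SCHEMA)
    # parsed_response holds the boolean answer; printing result would show the whole ActResult.
    print(f"{prompt} : {result.parsed_response}")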
```

Run the script:
```sh
python bool-schema.py
```

You should see an output resembling this:
```
Is there a search bar on the page? : True
```

**Summary**

In this section you learned how to return structured data from a workflow, and how to use the built-in `BOOL_SCHEMA` constant.

# Section 5: Prompting Best Practices

## Overview

In this section you will learn best practices for prompting Nova Act. Nova Act is designed for enterprise use at scale, so it is critical that workflows run reliably and consistently over many runs.

**1. Be concise**

As a general rule of thumb, a single `nova.act()` call should comprise 3-5 actions (e.g., type, click). Concise prompts make scripts run more reliably and consistently, whereas longer prompts increase the variance in results as the number of actions grows.

As an example, this prompt may work when you run it once, but over many runs the output may vary.
```python
nova.act("Search for coffee makers then view the details of the first product and then add it to your cart")
```

Instead, break this down into multiple prompts.
```python
nova.act("Search for coffee makers")
nova.act("View the details of the first product")
nova.act("Add it to your cart")
```
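In a script, the same advice becomes a list of short prompts issued one `act()` call at a time, which also gives you one HTML log per step. The sketch below uses a hypothetical `run_steps` helper; it accepts any object with an `act()` method, which keeps it easy to test without a browser.

```python
STEPS = [
    "Search for coffee makers",
    "View the details of the first product",
    "Add it to your cart",
]


def run_steps(nova, steps: list[str]) -> list:
    """Issue each short prompt as its own act() call, in order, and collect the results."""
    results = []
    for step in steps:
        # One focused prompt per call keeps each call to a few browser actions.
        results.append(nova.act(step))
    return results
```

Inside a `with NovaAct(starting_page="https://amazon.com") as nova:` block, call `run_steps(nova, STEPS)`.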

**2. Data formatting and filling out forms**

When filling out a web form, you should explicitly instruct Nova Act on what data to input into each form element. This works well when you know the page structure ahead of time. However, there are cases where you may not know the page ahead of time, for example if you are creating one script to run on multiple sites. In those cases you can provide a data structure to Nova Act and ask it to populate the web form. Nova Act will use the field names and descriptions on the page to match to the input data it was provided.

Also, format the data in a conversational tone, just as you would if you were instructing another person to enter the data. So instead of using structured data formats like JSON or key values like `firstName: Bobby`, use a syntax like `The first name is Bobby.`. New lines are not included in the prompt so be sure to include proper punctuation to delineate inputs. Ending each line with a `.` (period) is a good practice.

In this example, we pass our input data into the Nova Act prompt, and assuming there are similarly labeled fields on the page, Nova Act will populate the page with this data.
```python
data = """
The first name is Bobby.
The last name is McGee.
The phone number is 555-555-5555.
"""

nova.act(f"Fill out the form using this data: {data}")
```
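If your input data already lives in a dictionary, you can generate the conversational sentences instead of writing them by hand. The sketch below uses a hypothetical `to_conversational` helper that turns each field/value pair into a sentence ending in a period, following the guidance above.

```python
def to_conversational(fields: dict[str, str]) -> str:
    """Render field/value pairs as sentences such as: The first name is Bobby."""
    sentences = []
    for name, value in fields.items():
        sentence = f"The {name} is {value}"
        # End every entry with a period so the inputs stay clearly delimited.
        if not sentence.endswith("."):
            sentence += "."
        sentences.append(sentence)
    return " ".join(sentences)
```

You would then build `data = to_conversational({...})` and send it with `nova.act(f"Fill out the form using this data: {data}")`.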

**3. Working with dates**

Date pickers are among the more complex UI controls. Specify the start and end dates as absolute dates rather than relative expressions such as "next weekend".
```python
nova.act("select dates march 23 to march 28")
```
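If your workflow starts from relative dates, such as a stay beginning a week from today, one option is to resolve them to absolute dates in Python before building the prompt. The sketch below is a hypothetical helper; including the year is an extra precaution on our part, not a documented requirement.

```python
from datetime import date, timedelta


def date_range_prompt(start: date, end: date) -> str:
    """Build a date-picker prompt that names both dates absolutely, year included."""
    if end < start:
        raise ValueError("end date is before start date")
    return (
        f"select dates {start:%B} {start.day}, {start.year} "
        f"to {end:%B} {end.day}, {end.year}"
    )


# Resolve a relative request ("five nights starting a week from today")
# to concrete dates before prompting.
check_in = date.today() + timedelta(days=7)
check_out = check_in + timedelta(days=5)
prompt = date_range_prompt(check_in, check_out)
```

Pass `prompt` to `nova.act()` just as in the example above.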

**4. Set the user agent**

Nova Act comes with Playwright's Chrome and Chromium browsers. These use the default User Agent set by Playwright. You can override this with the `user_agent` option:
```python
nova = NovaAct(..., user_agent="MyUserAgent/2.7")
```

**5. Build dynamic workflows with BOOL_SCHEMA**

For many workflows, you may need Nova Act to first identify what actions it needs to carry out on the page, and then create prompts for those actions. This is helpful when you need to support multiple websites with a single Nova Act workflow. In a previous section you learned about the `BOOL_SCHEMA` constant. We can use this to understand the elements on a page, for example to learn `Is there a captcha on the screen?`:

```python
result = nova.act("Is there a captcha on the screen?", schema=BOOL_SCHEMA)
if result.matches_schema and result.parsed_response:
    input("Please solve the captcha and hit return when done")
```

Similarly, you can ask Nova Act to return all the form elements on a page and use them to drive your prompts, for example by iterating over each element and creating a prompt for it.
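One way to implement this pattern is to ask for the field labels with a Pydantic schema and then issue one short prompt per field you have data for. Everything in the sketch below is illustrative: the `FormFields` schema, the prompt wording, and the `prompts_for_fields` and `fill_form` helpers are hypothetical. `fill_form` expects an already started `NovaAct` session and uses the `matches_schema` and `parsed_response` fields covered earlier.

```python
from pydantic import BaseModel


class FormFields(BaseModel):
    # Hypothetical schema: the visible label of each input field in the form.
    labels: list[str]


def prompts_for_fields(labels: list[str], data: dict[str, str]) -> list[str]:
    """Build one short prompt per field we have data for (case-insensitive match)."""
    lookup = {key.casefold(): value for key, value in data.items()}
    prompts = []
    for label in labels:
        value = lookup.get(label.casefold())
        if value is not None:
            prompts.append(f"Enter {value} in the {label} field.")
    return prompts


def fill_form(nova, data: dict[str, str]) -> list[str]:
    """Ask for the form's field labels, then fill each field we have data for."""
    result = nova.act(
        "Return the label of every input field in the form",
        schema=FormFields.model_json_schema(),
    )
    if not result.matches_schema:
        return []
    fields = FormFields.model_validate(result.parsed_response)
    prompts = prompts_for_fields(fields.labels, data)
    for prompt in prompts:
        nova.act(prompt)
    return prompts
```

With an active session, for example inside `with NovaAct(starting_page="<form url>") as nova:`, call `fill_form(nova, {"First name": "Bobby", "Last name": "McGee"})`. Fields without matching data are skipped rather than guessed.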

**Summary**
In this section you learned best practices for prompting Nova Act. In the next section you will learn about optimal screen resolutions.

# Section 6: Screen Resolution Best Practices

## Overview

In this section you will learn about optimal screen resolutions and best practices related to window sizes.

**1. Screen resolutions**

Nova Act is optimized for screen resolutions between `864×1296` and `1536×2304`. Performance may degrade outside this range.

When running in `headless` mode, Nova Act launches a browser at a resolution of `1600x813`. When running in `headed` mode (browser visible), make sure your screen can display `1600x813`. If it cannot, the `headed` browser runs at a lower resolution, and websites may render differently than they do in `headless` mode. This matters because many websites adjust their page and menu layouts to the screen resolution, for example by collapsing menu items into a hamburger menu at lower resolutions.

Let's look at two examples that set different resolutions. The first uses a lower resolution of `1536x864` and the second a higher resolution of `1920x1080`. Notice the difference in the page content.

Here is the code for the lower resolution. Save it to a file named `low-res.py`.
```python
from nova_act import NovaAct

def main():
    with NovaAct(starting_page="https://amazon.com", screen_height=864, screen_width=1536) as nova:
        nova.act("search for pinatas")

if __name__ == "__main__":
    main()
```

Then run it with:
```sh
python low-res.py
```

You should see a screen resolution like the following:
![low-res](864x1536.png)

Now let's run it at a higher resolution. (Note: your local machine's resolution must support `1920x1080` to see this example. If it does not, adjust your resolution, or simply follow the code and look at the screenshot below.) Save the code below to a file named `high-res.py`.

```python
from nova_act import NovaAct

def main():
    with NovaAct(starting_page="https://amazon.com", screen_height=1080, screen_width=1920) as nova:
        nova.act("search for pinatas")

if __name__ == "__main__":
    main()
```

Then run it with:
```sh
python high-res.py
```

You should see a screen resolution like the following:
![high-res](./1080x1920.png)

Notice how the higher resolution screenshot contains more content than the lower resolution screenshot.

**2. Adjusting window size during a workflow**

While Nova Act is running, **do not modify** the browser window size. Nova Act uses screenshots to understand the website, and it specifies actions as bounding boxes in screen coordinates. Resizing the window may confuse Nova Act because elements can become hidden or move around the page.

If you find you have adjusted the window size, simply stop the script, and then run it again.

You can run Nova Act in `headless` mode, which removes any possibility of accidentally interfering with the Nova Act browser window. See the later section on CAPTCHAs for tips on observing a `headless` browser.

**Summary**
In this section you learned about best practices related to window size and screen resolution.

---
# Module 1 Summary

Congratulations! You have completed Module 1 of the Nova Act Fundamentals workshop. Here's what you've learned:

## ✅ Key Concepts Covered:

1. **Environment Setup**: API key configuration and Nova Act installation
2. **Interactive Mode**: Basic Nova Act operations and prompt construction
3. **Logging & Debugging**: HTML log files, headless mode, and video capture
4. **Structured Output**: Using Pydantic models and BOOL_SCHEMA for data extraction
5. **Prompting Best Practices**: Concise prompts, form filling, and dynamic workflows
6. **Resolution Best Practices**: Optimal screen sizes and window management

## 🚀 Next Steps:

You are now equipped to:
- Write and debug advanced Nova Act workflows
- Extract structured data from websites
- Build reliable, enterprise-scale automation
- Handle various web UI elements and scenarios

## 💡 Key Takeaways:

- Keep prompts concise (3-5 actions per `act()` call)
- Use structured output for data extraction
- Leverage logging for debugging
- Maintain consistent screen resolutions
- Build adaptive workflows with BOOL_SCHEMA