# E2E Testing Agent

## Overview
This notebook defines an agent that controls a headless browser and performs end-to-end (E2E) tests on web pages.
Users provide the URL of the page and describe the test cases in natural language.
The agent interprets these instructions, then generates and executes the tests.

## Motivation
Programmatic browser control and E2E testing have traditionally been done by writing scripts with frameworks such as Puppeteer or Playwright. This notebook uses an agent so that the user can specify E2E test cases in natural language instead. The agent then creates and executes these tests through Playwright.

## Key Components
1. [LangGraph](https://langchain-ai.github.io/langgraph/) - the agent implementation
2. [Playwright](https://github.com/microsoft/playwright-python) - the Python version of Playwright, used to generate and run the script that executes the test
3. [Taipy](https://taipy.io/) - a library used to spin up a simple website for demonstrating the agent's capabilities
4. [langchain_community.agent_toolkits.PlayWrightBrowserToolkit](https://python.langchain.com/v0.1/docs/integrations/toolkits/playwright/) - a LangChain toolkit of Playwright browser tools

## Method
The E2E test generation process consists of the following steps:

1. **Instructions To Actions Conversion**: Converts the user's testing instructions into well-defined action steps to be implemented.

2. **Playwright Code Generation**: Generates Playwright code chunks that execute the specified action steps.

3. **Assertions Generation**: Creates assertions that determine whether the test has passed.

4. **Test Execution**: Runs the generated Playwright test case.

5. **Report Generation**: Creates a concise report of the test results.


## Conclusion
TODO: write concise conclusion.

## Setup and Imports

Import necessary libraries and set up the environment.

In [None]:
import os
from typing import TypedDict, Annotated, Sequence, List
from langgraph.graph import Graph, END
from langchain_openai import AzureChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain.prompts.chat import SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain.output_parsers import PydanticOutputParser
from langchain_core.runnables.graph import MermaidDrawMethod

from playwright.async_api import async_playwright
import asyncio

from pydantic import BaseModel, Field

from IPython.display import display, Image
from dotenv import load_dotenv

load_dotenv()

Initialize the LLM instance.

In [2]:
llm = AzureChatOpenAI(
    model_name=os.getenv("AZURE_OPENAI_LLM_MODEL"),
    deployment_name=os.getenv("AZURE_OPENAI_LLM_MODEL_DEPLOYMENT"),
    temperature=0.0,
    streaming=True,
)

## Define Data Structures

Define the structure for the graph state using TypedDict.

In [3]:
class GraphState(TypedDict):
    messages: Annotated[Sequence[HumanMessage | AIMessage], "The messages in the conversation"]
    query: Annotated[str, "A user query containing instructions for the creation of the test case"]
    actions: Annotated[List[str], "List of actions for which to generate the code."]
    target_url: Annotated[str, "Valid URL of the website to test."]
    current_action: Annotated[int, "The current action to generate the code for."]
    script: Annotated[str, "The generated Playwright script."]
    website_state: Annotated[str, "DOM state of the website."]

class ActionList(BaseModel):
    actions: List[str] = Field(..., description="List of atomic actions for end-to-end testing")
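
As a rough illustration of how nodes are meant to treat this state, the sketch below uses a reduced, hypothetical `MiniState` (not the notebook's `GraphState`) and a hypothetical node `advance_action`. It assumes each node returns a new dictionary instead of mutating its input, which is the pattern the graph functions in this notebook follow.

```python
from typing import List, TypedDict


class MiniState(TypedDict):
    """Reduced, hypothetical stand-in for GraphState (illustration only)."""
    actions: List[str]
    current_action: int


def advance_action(state: MiniState) -> MiniState:
    """Return a copy of the state that points at the next action."""
    # {**state, ...} copies every key and overrides only the ones listed,
    # so the caller's dictionary is left untouched.
    return {**state, "current_action": state["current_action"] + 1}


before: MiniState = {"actions": ["Navigate to the target URL."], "current_action": 0}
after = advance_action(before)
```

Because `before` is never modified, each node's input can be inspected afterwards when debugging.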

## Define Graph Functions

Define the functions that will be used in the LangGraph workflow.

In [10]:
async def convert_user_instruction_to_actions(state: GraphState) -> GraphState:
    "Parse user instructions into a list of actions to be executed."

    output_parser = PydanticOutputParser(pydantic_object=ActionList)

    chat_template = ChatPromptTemplate.from_messages(
        [
            SystemMessagePromptTemplate.from_template(
                """
                You are an end-to-end testing specialist.
                Your goal is to break down general business end-to-end testing tasks into smaller well-defined actions.
                These actions will be later used to write the actual code that will execute the tests.
                """
            ),
            HumanMessagePromptTemplate.from_template(
                """
                Convert the below instructions into a dictionary with the key 'actions' and a list of atomic steps as its value.
                These steps will later be used to generate end-to-end test scripts.
                Each action should be a clear, atomic step that can be translated into code.
                Try to generate the minimum number of actions that accomplish what the user intends to test.
                First action must always be navigating to the target URL.
                Only return an output in the format of a dictionary with the key 'actions' whose value is a Python list.

                Examples:
                Input: "Test the login flow of the website"
                Output: {{
                    "actions": [
                        "Navigate to the login page via the URL.",
                        "Locate and enter a valid email in the 'Email' input field",
                        "Enter a valid password in the 'Password' input field",
                        "Click the 'Login' button to submit credentials",
                        "Verify that the user is logged in by validating that user name appears in the website header."
                    ]
                }}

                Input: "Test adding item to the shopping cart."
                Output: {{
                    "actions": [
                        "Navigate to the product listing page via the URL.",
                        "Click on the first product in the listing to open product details",
                        "Click the 'Add to Cart' button to add the selected item",
                        "Verify the selected item appears in the shopping cart sidebar or page"
                    ]
                }}

                Input: {query}
                Output:
                """
            ),
        ]
    )

    chain = chat_template | llm | output_parser

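    # Use the async API so the event loop is not blocked while the LLM responds.
    parsed = await chain.ainvoke({"query": state["query"]})

    # The parser returns an ActionList; the state stores the plain list of strings.
    return {**state, "actions": parsed.actions}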
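
The prompt asks the model for a dictionary with an `actions` list, and `PydanticOutputParser` turns that reply into an `ActionList`. The sketch below is a standard-library stand-in for that step, not the parser's actual implementation. The hypothetical helper `parse_actions` assumes the reply is, or contains, a single JSON object, and the non-empty check is an extra assumption that the `ActionList` schema itself does not impose.

```python
import json
from typing import List


def parse_actions(raw_reply: str) -> List[str]:
    """Extract the 'actions' list from a model reply (hypothetical helper)."""
    # Models sometimes wrap JSON in extra text, so keep only the outermost braces.
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("reply does not contain a JSON object")

    payload = json.loads(raw_reply[start : end + 1])
    actions = payload.get("actions")

    # Like ActionList, require a list of strings; rejecting an empty list
    # is an additional assumption made for this sketch.
    if not isinstance(actions, list) or not actions:
        raise ValueError("'actions' must be a non-empty list")
    if not all(isinstance(step, str) for step in actions):
        raise ValueError("every action must be a string")
    return actions


reply = 'Output: {"actions": ["Navigate to the target URL.", "Click the Login button"]}'
steps = parse_actions(reply)
```

Failing loudly on a malformed reply keeps bad input from reaching the code generation steps.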

In [5]:
# Note: to run Playwright in Jupyter on Windows, apply the workaround from this issue: https://github.com/microsoft/playwright-python/issues/178#issuecomment-1302869947
async def get_initial_action(state: GraphState) -> GraphState:
    """Initialize a Playwright script with first action. This action is always navigation to the target URL and DOM state retrieval."""
    initial_script = f"""
async def generated_script_run():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
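        await page.goto("{state['target_url']}")

        # Define your next action here

        # Capture the DOM after the actions so the agent can inspect the page.
        dom_state = await page.content()
        await browser.close()
        return dom_state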
"""
    
    return {
        **state,
        "script": initial_script,
        "current_action": state["current_action"] + 1
    }

In [6]:
async def get_website_state(state: GraphState) -> GraphState:
    """Get the current DOM of the website"""

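    # exec() inside a function cannot create local names, so run the script in an
    # explicit namespace that also provides the async_playwright it depends on.
    namespace = {"async_playwright": async_playwright}
    exec(state["script"], namespace)
    dom_content = await namespace["generated_script_run"]()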

    return {
        **state,
        'website_state': dom_content
    }
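
Code run through `exec` inside a function does not create names that the function can then call directly, so one common approach is to pass an explicit namespace dictionary and read the defined function back from it. The sketch below shows that pattern with a hypothetical stand-in script that avoids Playwright so it runs without a browser; `SCRIPT`, `greeting`, and `run_generated` are illustrative names only.

```python
import asyncio

# Hypothetical stand-in for a generated script; it avoids Playwright so the
# example runs without a browser.
SCRIPT = '''
async def generated_script_run():
    return "<html><body>" + greeting + "</body></html>"
'''


async def run_generated(script: str) -> str:
    """Execute a script string and await the coroutine function it defines."""
    # Names the script depends on are injected through the namespace dict.
    namespace = {"greeting": "hello"}
    exec(script, namespace)
    # The function defined by the script is read back from the same dict.
    return await namespace["generated_script_run"]()


dom = asyncio.run(run_generated(SCRIPT))
```

Inside a notebook, where an event loop is already running, the last line would be `dom = await run_generated(SCRIPT)` instead of `asyncio.run(...)`.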

In [7]:

async def generate_code_for_action(state: GraphState) -> GraphState:
    "Generate code for a single action"
    pass

async def get_action_generation_status(state: GraphState) -> GraphState:
    "Decides whether to stop the generation of the actions and move to the assertions"
    pass

async def generate_assertions(state: GraphState) -> GraphState:
    "Generates test assertion that will be applied after action code"
    pass

async def execute_test_case(state: GraphState) -> GraphState:
    "Execute the whole test case with its assertions"
    pass

async def generate_test_report(state: GraphState) -> GraphState:
    "Generate the report from the test results"
    pass
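
These functions are still stubs. As one possible direction for `generate_code_for_action`, the sketch below shows how generated code could be spliced into the script at the `# Define your next action here` placeholder. The helper `insert_action_code` is hypothetical, and the sample script and action are illustrative; in the real workflow the action code would come from the LLM.

```python
import textwrap

PLACEHOLDER = "# Define your next action here"


def insert_action_code(script: str, action_code: str) -> str:
    """Insert generated code above the placeholder comment (hypothetical helper)."""
    for line in script.splitlines():
        if line.strip() == PLACEHOLDER:
            # Reuse the placeholder's indentation so the new code stays
            # inside the `async with` block.
            indent = line[: len(line) - len(line.lstrip())]
            break
    else:
        raise ValueError("placeholder comment not found in script")

    new_code = textwrap.indent(textwrap.dedent(action_code).strip(), indent)
    # Keep the placeholder after the new code so the next action can be added.
    return script.replace(indent + PLACEHOLDER, new_code + "\n" + indent + PLACEHOLDER, 1)


script = (
    "async def generated_script_run():\n"
    "    async with async_playwright() as p:\n"
    "        page = await (await p.chromium.launch()).new_page()\n"
    "        # Define your next action here\n"
)
updated = insert_action_code(script, 'await page.click("text=Login")')
```

Keeping the placeholder in place after each insertion lets the same helper be called once per action.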

## Set Up LangGraph Workflow

Define graph with nodes.

In [12]:
workflow = Graph()

workflow.add_node("convert_user_instruction_to_actions", convert_user_instruction_to_actions)
workflow.add_node("get_initial_action", get_initial_action)
workflow.add_node("get_website_state", get_website_state)
workflow.add_node("generate_code_for_action", generate_code_for_action)
workflow.add_node("get_action_generation_status", get_action_generation_status)
workflow.add_node("generate_assertions", generate_assertions)
workflow.add_node("execute_test_case", execute_test_case)
workflow.add_node("generate_test_report", generate_test_report)

Add edges to the graph.

In [13]:


workflow.set_entry_point("convert_user_instruction_to_actions")

workflow.add_edge("convert_user_instruction_to_actions", "get_initial_action")
workflow.add_edge("get_initial_action", "get_website_state")
workflow.add_edge("get_website_state", "generate_code_for_action")
workflow.add_edge("generate_code_for_action", "get_action_generation_status")

workflow.add_conditional_edges("get_action_generation_status", lambda x: x, ['get_website_state', 'generate_assertions'])

workflow.add_edge("generate_assertions", "execute_test_case")
workflow.add_edge("execute_test_case", "generate_test_report")

workflow.add_edge("generate_test_report", END)



app = workflow.compile()
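
The conditional edge currently uses `lambda x: x` as a placeholder, which returns the state itself instead of the name of the next node. A routing function along the following lines could take its place. This is a sketch under the assumption that `current_action` counts the actions that already have code; `RoutingState` and `route_after_status` are hypothetical names.

```python
from typing import List, TypedDict


class RoutingState(TypedDict):
    """Only the keys the router reads (hypothetical subset of GraphState)."""
    actions: List[str]
    current_action: int


def route_after_status(state: RoutingState) -> str:
    """Return the name of the node that should run next."""
    # Actions remain: loop back, refresh the DOM, and generate more code.
    if state["current_action"] < len(state["actions"]):
        return "get_website_state"
    # Every action has code: move on to writing assertions.
    return "generate_assertions"


# Possible wiring (assumption, not part of the original notebook):
# workflow.add_conditional_edges(
#     "get_action_generation_status",
#     route_after_status,
#     ["get_website_state", "generate_assertions"],
# )
```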

## Display Graph Structure

In [None]:
display(
    Image(
        app.get_graph().draw_mermaid_png(
            draw_method=MermaidDrawMethod.API,
        )
    )
)

## Define Workflow Function

Define a function to run the workflow and display results.

In [14]:
# TODO: Finish the workflow
async def run_workflow(query: str, target_url: str):
    """Run the LangGraph workflow"""
    try:
        initial_state = {
            "messages": [],
            "query": query,
            "actions": [],
            "target_url": target_url,
            "current_action": 0,
            "script": None,
            "website_state": None,
        }

        result = await app.ainvoke(initial_state)

        return result
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return None

## Execute Workflow

Start a Flask mockup page as a subprocess (this does not block notebook execution) to evaluate the workflow.

In [6]:
import subprocess

process = subprocess.Popen(
    ["flask", "run",],
    env={"FLASK_APP": "app.py", "FLASK_ENV": "development", **os.environ}
)
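
Because the subprocess starts in the background, the server may not be accepting connections yet when the workflow runs. A small polling helper such as the sketch below can bridge that gap. `wait_for_server` is a hypothetical helper built only on the standard library, and its timeout values are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Poll `url` until it answers or `timeout` seconds pass (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2.0):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status such as 404.
            return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            time.sleep(interval)
    return False
```

In this notebook the call might look like `wait_for_server("http://localhost:5000")` before invoking `run_workflow`.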

Run the workflow with a sample query.

In [None]:
query = "Test a registration form that contains username, password and password confirmation fields. After submitting it, verify that registration was successful."
target_url = "http://localhost:5000"
result = await run_workflow(query, target_url)

Terminate the Flask subprocess.

In [None]:
process.kill() 
print("Flask app terminated.")

## Display Testing Report

Display the report from the execution of the generated tests.
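
The report step is not implemented yet. As a rough idea of what the display could look like, the sketch below renders a list of step results as Markdown. The `StepResult` structure and `format_report` are hypothetical, and the sample data is illustrative only, not output from a real run.

```python
from typing import List, TypedDict


class StepResult(TypedDict):
    """Hypothetical record for one executed action or assertion."""
    description: str
    passed: bool


def format_report(results: List[StepResult]) -> str:
    """Render step results as a short Markdown report."""
    passed = sum(1 for r in results if r["passed"])
    lines = [f"**{passed}/{len(results)} steps passed**", ""]
    for r in results:
        mark = "PASS" if r["passed"] else "FAIL"
        lines.append(f"- [{mark}] {r['description']}")
    return "\n".join(lines)


# Illustrative input only; these are not results from a real run.
sample = [
    {"description": "Navigate to the target URL.", "passed": True},
    {"description": "Verify that registration was successful.", "passed": False},
]
report = format_report(sample)
```

In the notebook, the string could be rendered with `display(Markdown(report))` after importing `Markdown` from `IPython.display`.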