<a href="https://colab.research.google.com/github/erlichsefi/ScrapeAnything/blob/main/browser_base_translation%20/AutoJavaScript.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AutoGPT with JavaScript

## install selenium & chromium

In [1]:
# According to: https://stackoverflow.com/questions/51046454/how-can-we-use-selenium-webdriver-in-colab-research-google-com
%%capture
%%shell
# Ubuntu no longer distributes chromium-browser outside of snap
#
# Proposed solution: https://askubuntu.com/questions/1204571/how-to-install-chromium-without-snap

# Add debian buster
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF

# Add keys
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A

apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg

# Prefer debian repo for chromium* packages only
# Note the double-blank lines between entries
cat > /etc/apt/preferences.d/chromium.pref << 'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF

# Install chromium and chromium-driver
apt-get update
apt-get install chromium chromium-driver

# Install selenium
pip install selenium
apt install chromium-chromedriver
pip install pandas
pip install openai

## util to convert the on screen to data

In [17]:
script_with_logs = """
// Get all elements in the HTML page
const elements = document.getElementsByTagName('*');

// Create an array to store the element details
const elementDetails = [];

// Iterate through each element
for (let i = 0; i < elements.length; i++) {
  const element = elements[i];

  // Get the bounding rectangle of the element
  const rect = element.getBoundingClientRect();

  // Get the text content of the element
  const textContent = element.textContent.trim();

  // Get the tooltip value if it exists
  const tooltip = element.hasAttribute('title') ? element.getAttribute('title') : '';

  // Get the aria-label value
  const ariaLabel = (element.hasAttribute('aria-label') ? element.getAttribute('aria-label') : '');

  // Get the nodeName
  const e_type = element.nodeName;

  // Get the data-initial-value
  const data_initial_value = (element.hasAttribute('data-initial-value') ? element.getAttribute('data-initial-value') : '')

  // Get innerText
  const innerText = element.innerText

  // Store the element, its bounding rectangle, text content, and tooltip details
  const elementInfo = {
    element: element,
    rect: rect,
    textContent: textContent.replaceAll(",",";"),
    ariaLabel: ariaLabel.replaceAll(",",";"),
    tooltip: tooltip.replaceAll(",",";"),
    e_type: e_type.replaceAll(",",";"),
    data_initial_value: data_initial_value.replaceAll(",",";"),
    innerText: innerText !== undefined ? innerText.replaceAll(",",";"): "",
  };
  if ( elementInfo.rect.width > 0 && elementInfo.rect.height > 0){
    if (elementInfo.innerText != '' || elementInfo.data_initial_value != '' || elementInfo.tooltip != '' || elementInfo.textContent != '' || elementInfo.ariaLabel != ''){
      elementDetails.push(elementInfo);
    }

  }
}

let parents = elementDetails.map(e => e.element.parentElement);
let withoutParents = elementDetails.filter(elementDetail => !( parents.includes(elementDetail.element)));

const counts = {};
for (const num of parents) {
  counts[num] = counts[num] ? counts[num] + 1 : 1;
}
// Display the element details
console.log("centerX,centerY,ElementType,textContent,TooltipValue,AriaLabel,data-initial-value");
console.log(withoutParents.map( e=> (e.rect.left + (e.rect.width / 2))+","+(e.rect.top + (e.rect.height / 2))+","+e.e_type+","+e.textContent+","+e.tooltip+","+e.ariaLabel+","+e.data_initial_value).join("\\n"));
"""

def screen_to_table(wd):
  import pandas as pd
  import io

  script = f"""
  var consoleLogs = [];
  var originalLog = console.log;
  console.log = function(message) {{
      consoleLogs.push(message);
      originalLog.apply(console, arguments);
  }};

  {script_with_logs}

  return consoleLogs;
  """
  logs = wd.execute_script(script)
  try:
     return pd.read_csv(io.StringIO("\n".join(logs)), sep=",")
  except Exception as e:
    print(f"WARNING:\n On Table Data: {logs}\n Error {e}")
    return pd.read_csv(io.StringIO("\n".join(logs)), sep=",",on_bad_lines="skip")

In [3]:
def start_browesr():
  from selenium import webdriver
  from selenium.webdriver.chrome.service import Service

  service = Service(executable_path=r'/usr/bin/chromedriver')
  chrome_options = webdriver.ChromeOptions()
  chrome_options.add_argument('--headless')
  chrome_options.add_argument('--no-sandbox')
  chrome_options.add_argument('--lang=en')
  chrome_options.headless = True
  return webdriver.Chrome(service=service,options=chrome_options)

def web_driver_to_image(wd,file_name):
  full_path = f"{file_name}.png"
  wd.save_screenshot(full_path)
  return full_path


import os
import base64
from PIL import Image
from IPython.display import display, HTML

def display_images_side_by_side(before_file, after_file):
    if not os.path.isfile(before_file):
        print(f"Error: File '{before_file}' not found.")
        return
    if not os.path.isfile(after_file):
        print(f"Error: File '{after_file}' not found.")
        return

    _, before_ext = os.path.splitext(before_file)
    _, after_ext = os.path.splitext(after_file)
    valid_extensions = ['.png', '.jpg', '.jpeg', '.gif']

    if before_ext.lower() not in valid_extensions:
        print(f"Error: Invalid file type. Only {', '.join(valid_extensions)} are supported.")
        return
    if after_ext.lower() not in valid_extensions:
        print(f"Error: Invalid file type. Only {', '.join(valid_extensions)} are supported.")
        return

    try:
        display_side_by_side([
            (before_file, 'Before'),
            (after_file, 'After')
        ])
    except Exception as e:
        print(f"Error displaying images: {e}")

def display_side_by_side(images_with_titles):
    html = "<style>td img{max-width:100%; max-height:100%;} td.title-cell{text-align:center; font-size:18px;}</style>"
    html += "<table>"
    html += "<tr>"
    for _, title in images_with_titles:
        html += f"<td class='title-cell'>{title}</td>"
    html += "</tr>"
    html += "<tr>"
    for file, _ in images_with_titles:
        image_data = base64.b64encode(open(file, 'rb').read()).decode('utf-8')
        html += f"<td><img src='data:image/png;base64,{image_data}' /></td>"
    html += "</tr>"
    html += "</table>"

    display(HTML(html))

def draw_on_screen(webdriver,filename,x,y,**kwarg):
  from PIL import Image, ImageDraw
  # Perform mouse click at X and Y coordinates
  # Open the screenshot image using Pillow
  final_fname = f"click_location_{filename}"
  final_fname = web_driver_to_image(webdriver,final_fname)
  image = Image.open(final_fname)

  # Create a drawing context on the image
  draw = ImageDraw.Draw(image)

  # Define the size of the marker
  marker_size = 10

  # Draw a marker at the specified coordinates
  draw.rectangle([(x - marker_size, y - marker_size), (x + marker_size, y + marker_size)], outline="red")

  # Save the marked screenshot

  image.save(final_fname)
  return filename

def get_screen_size(webdriver):
  window_size = webdriver.get_window_size()
  width = window_size["width"]
  height = window_size["height"]
  return f"width={width},height={height}"


def get_scroll_height(web_driver):
    import time
    initial_scroll_position = web_driver.execute_script("return window.pageYOffset")

    # Scroll down a bit
    web_driver.execute_script("window.scrollBy(0, 100);")

    # Wait for a brief moment
    time.sleep(1)

    # Get the scroll position after scrolling down
    scroll_down_position = web_driver.execute_script("return window.pageYOffset")

    # Scroll up to the initial position
    web_driver.execute_script(f"window.scrollTo(0, {initial_scroll_position});")

    # Wait for a brief moment
    time.sleep(1)

    # Scroll up a bit
    web_driver.execute_script("window.scrollBy(0, -100);")

    # Wait for a brief moment
    time.sleep(1)

    # Get the scroll position after scrolling up
    scroll_up_position = web_driver.execute_script("return window.pageYOffset")

    # Scroll back to the initial position
    web_driver.execute_script(f"window.scrollTo(0, {initial_scroll_position});")

    # Compare the scroll positions
    if scroll_down_position > initial_scroll_position or scroll_up_position < initial_scroll_position:
        return "Client can scroll both up and down!"
    elif scroll_down_position > initial_scroll_position:
        return "Client can scroll down!"
    elif scroll_up_position < initial_scroll_position:
        return "Client can scroll up!"
    else:
        return "Client cannot scroll either up or down!"

def get_scroll_width(web_driver):
    import time
    initial_scroll_position = web_driver.execute_script("return window.pageXOffset")

    # Scroll right a bit
    web_driver.execute_script("window.scrollBy(100, 0);")

    # Wait for a brief moment
    time.sleep(1)

    # Get the scroll position after scrolling right
    scroll_right_position = web_driver.execute_script("return window.pageXOffset")

    # Scroll left to the initial position
    web_driver.execute_script(f"window.scrollTo({initial_scroll_position}, 0);")

    # Wait for a brief moment
    time.sleep(1)

    # Scroll left a bit
    web_driver.execute_script("window.scrollBy(-100, 0);")

    # Wait for a brief moment
    time.sleep(1)

    # Get the scroll position after scrolling left
    scroll_left_position = web_driver.execute_script("return window.pageXOffset")

    # Scroll back to the initial position
    web_driver.execute_script(f"window.scrollTo({initial_scroll_position}, 0);")

    # Compare the scroll positions
    if scroll_right_position > initial_scroll_position or scroll_left_position < initial_scroll_position:
        return "Client can scroll both left and right!"
    elif scroll_right_position > initial_scroll_position:
        return "Client can scroll right!"
    elif scroll_left_position < initial_scroll_position:
        return "Client can scroll left!"
    else:
        return "Client cannot scroll either left or right!"


def get_scroll_options(web_driver):
    width = get_scroll_width(web_driver)
    height = get_scroll_height(web_driver)
    return f"On the Width Axis, {width}. On the Height Axis, {height}"

# Tools
base on https://github.com/mpaepper/llm_agents/blob/main/llm_agents/tools/google_search.py

In [4]:
from pydantic import BaseModel

class ToolInterface(BaseModel):
    name: str
    description: str
    web_driver: object
    click_on_screen:bool = False

    def is_click_on_screen(self) -> bool:
      return self.click_on_screen



def example_tool(tool,setup_function=None,*arg,**kwarg):
  wd = start_browesr()
  if setup_function:
    setup_function(wd)
  b_filename = web_driver_to_image(wd,f"{str(tool.__class__)}_before")
  tool().use(wd,*arg,**kwarg)
  a_filename = web_driver_to_image(wd,f"{str(tool.__class__)}_after")
  display_images_side_by_side(b_filename,a_filename)

In [5]:
def click_on_screen(wd, x, y):
  js_script = f"return document.elementFromPoint({x}, {y})"
  input_field = wd.execute_script(js_script)
  # Enter the text into the input field
  input_field.click()
  return wd


class ClickOnCoordinates(ToolInterface):
  """Click on certain coordinate on the screen """

  name = "Click on coordinates on the screen"
  description = "click on x,y coordinates in order to move to the next screen. Input format: {{\"x\": <place_num_here>,\"y\":<place_num_here>}}"
  click_on_screen = True

  def use(self,web_driver:object, x: float, y:float) -> str:
      click_on_screen(web_driver,x,y)

In [6]:
class GoToURL(ToolInterface):
  """ Go to a specific url address """

  name = "Go to a specific url web address"
  description = "Change the url to a provied URL. Input format: {{\"url\":\"<place_url_here>\"}}"
  click_on_screen = True

  def use(self, web_driver:object, url: str)-> None:
      web_driver.get(url)


def change_url(web_driver,first_page):
  web_driver.get(first_page)

#example_tool(GoToURL,url="https://www.google.com/",setup_function=lambda wd:change_url(wd,"https://www.n12.co.il/"))

In [7]:
class ScrollDown(ToolInterface):
    """Scroll down the web page by half the screen height"""

    name = "Scroll Down"
    description = "Scroll down the web page by half the screen height, no input."

    def use(self, web_driver: object) -> None:
        # Get the height of the web page
        page_height = web_driver.execute_script("return document.body.scrollHeight")

        # Get the height of the viewport
        viewport_height = web_driver.execute_script("return window.innerHeight")

        # Calculate the scroll distance (half the screen height)
        scroll_distance = viewport_height // 2

        # Scroll down the web page
        web_driver.execute_script(f"window.scrollBy(0, {scroll_distance});")


#example_tool(ScrollDown,setup_function=lambda wd:change_url(wd,"https://www.n12.co.il/"))

In [8]:
class ScrollUp(ToolInterface):
    """Scroll up the web page by half the screen height"""

    name = "Scroll Up"
    description = "Scroll up the web page by half the screen height, no input."

    def use(self, web_driver: object) -> None:
        # Get the height of the viewport
        viewport_height = web_driver.execute_script("return window.innerHeight")

        # Calculate the scroll distance (half the screen height)
        scroll_distance = viewport_height // 2

        # Scroll up the web page
        web_driver.execute_script(f"window.scrollBy(0, -{scroll_distance});")

#example_tool(ScrollUp,setup_function=lambda wd:change_url(wd,"https://stackoverflow.com/a/20464320"))

In [9]:
class ScrollRight(ToolInterface):
    """Scroll the web page to the right by half the screen width"""

    name = "Scroll Right"
    description = "Scroll the web page to the right by half the screen width, no input"

    def use(self, web_driver: object) -> None:
        # Get the width of the viewport
        viewport_width = web_driver.execute_script("return window.innerWidth")

        # Calculate the scroll distance (half the screen width)
        scroll_distance = viewport_width // 2

        # Scroll the web page to the right
        web_driver.execute_script(f"window.scrollBy({scroll_distance}, 0);")

#example_tool(ScrollRight,setup_function=lambda wd:change_url(wd,"https://www.n12.co.il/"))

In [10]:
from selenium.webdriver.common.action_chains import ActionChains

class GoBack(ToolInterface):
    """go back to previous page"""

    name = "Go Back"
    description = "Go back to the previous page,no input."

    def use(self, web_driver: object) -> None:
        # Simulate clicking the browser's "Next" button
        web_driver.back()

def change_url_twice(web_driver,first_page,second_page):
  web_driver.get(first_page)
  web_driver.get(second_page)

#example_tool(GoBack,setup_function=lambda wd:change_url_twice(wd,"https://www.google.com","https://www.n12.co.il/"))

In [11]:



class EnterText(ToolInterface):
    """Click on a field and enter text"""

    name = "Enter Text"
    description = "Click on a field and enter text, Input format: {{\"text\":\"<text_to_enter>\",\"x\": <place_num_here>,\"y\":<place_num_here>}}"
    click_on_screen = True

    def use(self, web_driver: object, x:float ,y:float, text: str) -> None:
        js_script = f"return document.elementFromPoint({x}, {y})"
        input_field = web_driver.execute_script(js_script)
        print(input_field)
        # Enter the text into the input field
        input_field.click()
        input_field.send_keys(text)
#example_tool(EnterText,x=250,y=250,text="text to enter",setup_function=lambda wd:change_url(wd,"https://docs.google.com/document/d/1o1dTLtEeLGJ9iVAWHolGivi1RYnv2JbcSVEkP7PXB-Q/edit?usp=sharing"))

# AutoAgent



In [12]:
import openai
import os

from pydantic import BaseModel
from typing import List


class ChatLLM(BaseModel):
    model: str = 'gpt-3.5-turbo'
    temperature: float = 0.0
    openai.api_key = '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'

    def generate(self, prompt: str, stop: List[str] = None):
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=self.temperature,
            stop=stop
        )
        return response.choices[0].message.content

In [13]:
import datetime
import re

from pydantic import BaseModel
from typing import List, Dict, Tuple


FINAL_ANSWER_TOKEN = "Final Answer:"
OBSERVATION_TOKEN = "Observation:"
THOUGHT_TOKEN = "Thought:"
PROMPT_TEMPLATE = """

Today is {today}, The site i'm looking on is {site_url}, Here is a representation of what is see on my screen in a table shape.

{on_screen_data}

Screen Size: {screen_size}
Scroll Options: {scroll_ratio}
You should accomplish the task given to you as best as you can using the following tools:

{tool_description}

Use the following format:

Question: the input question you must answer
Thought: comment on what you want to do next
Action: the action to take, exactly one element of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation repeats N times, use it until you are sure of the answer)
Thought: I now know the final answer
Final Answer: your final answer to the original input question

Begin!

Task To Accomplish: {task_to_accomplish}
Thought: {previous_responses}
"""


class Agent(BaseModel):
    llm: ChatLLM
    webdriver : object
    tools: List[ToolInterface]
    prompt_template: str = PROMPT_TEMPLATE
    max_loops: int = 1
    # The stop pattern is used, so the LLM does not hallucinate until the end
    stop_pattern: List[str] = [f'\n{OBSERVATION_TOKEN}', f'\n\t{OBSERVATION_TOKEN}']

    @property
    def tool_description(self) -> str:
        return "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])

    @property
    def tool_names(self) -> str:
        return ",".join([tool.name for tool in self.tools])

    @property
    def tool_by_names(self) -> Dict[str, ToolInterface]:
        return {tool.name: tool for tool in self.tools}

    def run(self, task_to_accomplish: str, url:str):
        webdriver = start_browesr()
        webdriver.set_window_size(1024, 768)
        try:
            webdriver.get(url)
            previous_responses = []
            num_loops = 0
            on_screen = screen_to_table(webdriver)
            screen_size = get_screen_size(webdriver)
            file_name = web_driver_to_image(webdriver,"step_1")
            scroll_ratio = get_scroll_options(webdriver)
            prompt = self.prompt_template.format(
                    today = datetime.date.today(),
                    tool_description=self.tool_description,
                  tool_names=self.tool_names,
                  task_to_accomplish=task_to_accomplish,
                  on_screen_data="{on_screen_data}",
                  previous_responses='{previous_responses}',
                  screen_size="{screen_size}",
                  scroll_ratio="{scroll_ratio}",
                  site_url=url
            )

            print(f"StaticPrompt={prompt}")
            while num_loops < self.max_loops:
                num_loops += 1
                print(f"--- Iteration {num_loops} ---")

                curr_prompt = prompt.format(previous_responses='\n'.join(previous_responses),
                                            on_screen_data=on_screen.to_csv(index=False),
                                            screen_size=screen_size,scroll_ratio=scroll_ratio
                )

                print("\n\n#input:")
                print(f"screenshot={file_name}")
                print(f"Prompt={curr_prompt}")



                generated, tool, tool_input = self.decide_next_action(curr_prompt)
                if tool == 'Final Answer':
                    return tool_input
                if tool not in self.tool_by_names:
                    raise ValueError(f"Unknown tool: {tool}")




                tool_executor = self.tool_by_names[tool]
                if tool_executor.is_click_on_screen():
                  draw_on_screen(webdriver,f"step_click_location_{str(num_loops)}",**tool_input)

                tool_executor.use(webdriver,**tool_input)


                on_screen = screen_to_table(webdriver)
                screen_size = get_screen_size(webdriver)
                scroll_ratio = get_scroll_options(webdriver)
                file_name = web_driver_to_image(webdriver,f"step_{str(num_loops+1)}")
                generated += f"\n{OBSERVATION_TOKEN} \n{THOUGHT_TOKEN}"

                previous_responses.append(generated)

            print("------ Final Screen ------")
            print(f"screenshot={file_name}")
        except Exception as e:
                if webdriver:
                  webdriver.close()
                raise e

    def parse_json(self,tool_input:str):
      import json
      try:
        response = json.loads(tool_input)
      except Exception as e:
        raise ValueError(f"Output of LLM is not parsable as JSON: `{tool_input}`, error = {e}")
      return response

    def decide_next_action(self, prompt: str) -> str:
        print("\n\n#output:")

        generated = self.llm.generate(prompt, stop=self.stop_pattern)
        print(f"Gnerated={generated}")

        tool, tool_input = self._parse(generated)
        print(f"Tool={tool}")
        print(f"Args={tool_input}")

        return generated, tool, self.parse_json(tool_input)

    def _parse(self, generated: str) -> Tuple[str, str]:
        if FINAL_ANSWER_TOKEN in generated:
            return "Final Answer", generated.split(FINAL_ANSWER_TOKEN)[-1].strip()

        regex = r"Action: [\[]?(.*?)[\]]?[\n]*Action Input:[\s]*(.*)"
        match = re.search(regex, generated, re.DOTALL)
        if not match:
            raise ValueError(f"Output of LLM is not parsable for next tool use: `{generated}`")
        tool = match.group(1).strip()
        tool_input = match.group(2)
        return tool, tool_input.strip(" ").strip('"')

In [18]:
agent = Agent(max_loops=0,llm=ChatLLM(), tools=[GoToURL(),ClickOnCoordinates(),EnterText(),GoBack(),ScrollRight(),ScrollUp(),ScrollDown()])
agent.run("Log into my Gmail account, user name is 'erlichsefi@gmail.com', password is '1234567'","https://www.google.com")

  chrome_options.headless = True


StaticPrompt=

Today is 2023-07-03, The site i'm looking on is https://www.google.com, Here is a representation of what is see on my screen in a table shape.

{on_screen_data}

Screen Size: {screen_size}
Scroll Options: {scroll_ratio}
You should accomplish the task given to you as best as you can using the following tools:

Go to a specific url web address: Change the url to a provied URL. Input format: {{"url":"<place_url_here>"}}
Click on coordinates on the screen: click on x,y coordinates in order to move to the next screen. Input format: {{"x": <place_num_here>,"y":<place_num_here>}}
Enter Text: Click on a field and enter text, Input format: {{"text":"<text_to_enter>","x": <place_num_here>,"y":<place_num_here>}}
Go Back: Go back to the previous page,no input.
Scroll Right: Scroll the web page to the right by half the screen width, no input
Scroll Up: Scroll up the web page by half the screen height, no input.
Scroll Down: Scroll down the web page by half the screen height, no input

In [19]:
import os
import glob

files = glob.glob('/content/*')
for f in files:
  if os.path.isfile(f):
    os.remove(f)

In [20]:
agent = Agent(max_loops=3,llm=ChatLLM(), tools=[ClickOnCoordinates(),EnterText(),GoBack(),ScrollRight(),ScrollUp(),ScrollDown()])
agent.run("Log into my Gmail account, user name is 'erlichsefi@gmail.com', password is '1234567'","https://www.google.com")

  chrome_options.headless = True


StaticPrompt=

Today is 2023-07-03, The site i'm looking on is https://www.google.com, Here is a representation of what is see on my screen in a table shape.

{on_screen_data}

Screen Size: {screen_size}
Scroll Options: {scroll_ratio}
You should accomplish the task given to you as best as you can using the following tools:

Click on coordinates on the screen: click on x,y coordinates in order to move to the next screen. Input format: {{"x": <place_num_here>,"y":<place_num_here>}}
Enter Text: Click on a field and enter text, Input format: {{"text":"<text_to_enter>","x": <place_num_here>,"y":<place_num_here>}}
Go Back: Go back to the previous page,no input.
Scroll Right: Scroll the web page to the right by half the screen width, no input
Scroll Up: Scroll up the web page by half the screen height, no input.
Scroll Down: Scroll down the web page by half the screen height, no input.

Use the following format:

Question: the input question you must answer
Thought: comment on what you want t