# RLAgent: Automating tasks on Neuroglancer

This code provides the infrastructure to use Neuroglancer as a fully automated agent. It relies heavily on the Selenium library and JavaScript to control Neuroglancer's state, which gives a flexible and fast way to interact with Neuroglancer in headless, local, or remote sessions. More importantly, it makes it possible to simulate a human's interaction and provides a real environment for Reinforcement Learning training.

## The main kinds of interaction are:
- **click** (left, right, middle, double) at a specific position
- change the **JSON state** programmatically to navigate along continuous dimensions (zoom, rotation, translation), add new layers, etc.

## It can:
- follow a recorded sequence of clicks and JSON state changes made by a human
- acquire screenshots in less than 5 ms
- provide the infrastructure for using Reinforcement Learning to navigate the Neuroglancer UI. This task is technically challenging and has been decomposed into smaller tasks.

## Example of the agent following a human's interaction by replaying the recorded actions:

<video width="900" height="600" controls>
  <source src="tutorial_images/agent_test.mov" type="video/mp4">
  Your browser does not support the video tag.
</video>


# Prerequisites

Before running this notebook, ensure you have the following dependencies installed:

- **Python**: (Version used: **3.10.16**)
- **Selenium**: (Version used: **4.29.0**)
- **Pillow**: (Version used: **11.1.0**)
- **numpy**: (Version used: **2.2.4**)
- **scipy**: (Version used: **1.12.1**)
- **torch**

We already provide two Chrome drivers, one for macOS and one for Linux. You will very probably need to install the version that matches your system: check your current Chrome version in Chrome itself, then download the matching driver [here](https://googlechromelabs.github.io/chrome-for-testing/).
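
As a quick sanity check before starting a session, the major version of the browser and of the driver can be compared. The sketch below is not part of this repository: the helper names and the example version strings are made up for illustration, and it assumes the version text looks like what `chromedriver --version` typically prints.

```python
import re


def major_version(version_text: str) -> int:
    """Return the major version from text such as 'Google Chrome 134.0.6998.88'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"No version number found in: {version_text!r}")
    return int(match.group(1))


def driver_matches_browser(browser_text: str, driver_text: str) -> bool:
    """True when browser and driver share the same major version."""
    return major_version(browser_text) == major_version(driver_text)


# Hypothetical version strings, for illustration only.
browser = "Google Chrome 134.0.6998.88"
driver = "ChromeDriver 134.0.6998.35 (abcdef)"
print(driver_matches_browser(browser, driver))  # True
```

If the check fails, downloading the driver whose major version equals the browser's is usually the fix.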

# Neuroglancer Automation:

The main classes are <span style="color:blue; font-weight:bold;">Agent</span> and <span style="color:green; font-weight:bold;">ChromeNGL</span>.

The <span style="color:blue; font-weight:bold;">Agent</span> class is the main class that is used to interact with the Neuroglancer UI.

The <span style="color:green; font-weight:bold;">ChromeNGL</span> class is used to control the Chrome browser and the Neuroglancer UI.


In [9]:
from Agent import Agent


This will open a Chrome browser. With headless=True, it will not be visible but all functions will still be available.

In [10]:
agent = Agent(start_session=True, headless=False)


Login attempted. Waiting for confirmation...
Login successful!


In [15]:
agent.start_neuroglancer_session() #defaults to neuroglancer-demo.appspot.com and a default JSON state
# For graphene or other middle-auth+ sessions, we need to start a local host session with the URL we want to start from. This function does the extra google auth login.
#agent.start_graphene_session()

Now that we have a Neuroglancer session we can interact with it.

For convenience, we define a pos_state variable that contains the position, crossSectionScale, projectionOrientation, projectionScale.

Here the observed state is the full state: `state = (pos_state, curr_image)`.


In [16]:
pos_state, curr_image, json_state = agent.prepare_state(image_path=None, euler_angles=False, resize=False, add_mouse=False, fast=True)
# if only the state is needed, we can use the get_state function
#pos_state = agent.get_state(euler_angles=False)

print(pos_state) # list of floats and ints
print("Size of current image: ", curr_image.size)
#curr_image.show() # PIL image of the current state
#print(json_state) # JSON state of the current state as a dictionary for easy access to this state in the future if needed
# change url with agent.chrome_ngl.change_url(url)

[[143944.703125, 61076.59375, 192.5807647705078], 2.0339912586467497, [-0.4705163836479187, 0.8044001460075378, -0.30343097448349, 0.1987067461013794], 13976.00585680798]
Size of current image:  (1046, 992)
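
Since `pos_state` is a nested list, a model will usually want it as one flat vector. A minimal sketch, assuming the layout printed above (position, crossSectionScale, projectionOrientation, projectionScale); `flatten_pos_state` is a hypothetical helper, not a method of `Agent`.

```python
from typing import List, Sequence


def flatten_pos_state(pos_state: Sequence) -> List[float]:
    """Flatten [position, crossSectionScale, orientation, projectionScale]."""
    position, cross_section_scale, orientation, projection_scale = pos_state
    values = [*position, cross_section_scale, *orientation, projection_scale]
    return [float(v) for v in values]


# The pos_state printed by the cell above.
example = [
    [143944.703125, 61076.59375, 192.5807647705078],
    2.0339912586467497,
    [-0.4705163836479187, 0.8044001460075378, -0.30343097448349, 0.1987067461013794],
    13976.00585680798,
]
flat = flatten_pos_state(example)
print(len(flat))  # 9 values when the orientation is a quaternion
```

With `euler_angles=True` the orientation would have three components instead of four, so the flat vector would have 8 entries.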


# Interacting Programmatically with `Agent.apply_actions()`

The `Agent.apply_actions()` function is designed to handle interactions by applying incremental changes to a JSON structure and specifying raw positions for mouse clicks.
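
To make "incremental changes to a JSON structure" concrete, here is a minimal sketch of adding deltas to the four navigation fields of a state dictionary. It only illustrates the idea: `apply_deltas` is a hypothetical function, plain addition is an assumption, and the real `Agent.apply_actions()` may handle the orientation differently.

```python
import copy


def apply_deltas(json_state, delta_position=(0.0, 0.0, 0.0),
                 delta_cross_section_scale=0.0,
                 delta_orientation=(0.0, 0.0, 0.0, 0.0),
                 delta_projection_scale=0.0):
    """Return a copy of json_state with the deltas added (input is not mutated)."""
    new_state = copy.deepcopy(json_state)
    new_state["position"] = [
        p + d for p, d in zip(new_state["position"], delta_position)
    ]
    new_state["crossSectionScale"] += delta_cross_section_scale
    # Assumption: deltas are added component-wise; the result is not renormalized.
    new_state["projectionOrientation"] = [
        q + d for q, d in zip(new_state["projectionOrientation"], delta_orientation)
    ]
    new_state["projectionScale"] += delta_projection_scale
    return new_state


# Toy state with made-up values.
state = {
    "position": [1.0, 2.0, 3.0],
    "crossSectionScale": 2.0,
    "projectionOrientation": [0.0, 0.0, 0.0, 1.0],
    "projectionScale": 1000.0,
}
moved = apply_deltas(state, delta_position=(10.0, 0.0, 0.0), delta_projection_scale=500.0)
print(moved["position"], moved["projectionScale"])
```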

### Action Vector Structure

The function takes a vector with the following components:

```plaintext
(
    left_click, right_click, double_click,  # 3 booleans for mouse clicks
    x, y,                                  # 2 floats for absolute mouse position
    key_Shift, key_Ctrl, key_Alt,          # 3 booleans for modifier keys
    json_change,                           # 1 boolean indicating a JSON change
    delta_position_x, delta_position_y, delta_position_z,  # 3 floats for position deltas
    delta_crossSectionScale,               # 1 float for cross-section scaling
    delta_projectionOrientation_q1, delta_projectionOrientation_q2,
    delta_projectionOrientation_q3, delta_projectionOrientation_q4,  # 4 floats for orientation quaternion
    delta_projectionScale                  # 1 float for projection scaling
)
```
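
Writing 18 positional values by hand is error-prone, so a small builder with named arguments can help. This is a sketch only: `build_action_vector` is a hypothetical helper that follows the quaternion layout listed above and is not part of the `Agent` class.

```python
def build_action_vector(left_click=False, right_click=False, double_click=False,
                        x=0.0, y=0.0,
                        shift=False, ctrl=False, alt=False,
                        json_change=False,
                        delta_position=(0.0, 0.0, 0.0),
                        delta_cross_section_scale=0.0,
                        delta_orientation=(0.0, 0.0, 0.0, 0.0),
                        delta_projection_scale=0.0):
    """Assemble the 18-entry action vector in the order documented above."""
    if len(delta_position) != 3:
        raise ValueError("delta_position needs 3 components")
    if len(delta_orientation) != 4:
        raise ValueError("delta_orientation needs 4 quaternion components")
    return [
        int(left_click), int(right_click), int(double_click),
        float(x), float(y),
        int(shift), int(ctrl), int(alt),
        int(json_change),
        *(float(v) for v in delta_position),
        float(delta_cross_section_scale),
        *(float(v) for v in delta_orientation),
        float(delta_projection_scale),
    ]


# A JSON-only action: move 10 units along x and change the projection scale.
vector = build_action_vector(x=100, y=100, json_change=True,
                             delta_position=(10, 0, 0),
                             delta_orientation=(0.1, 0, 0, 0),
                             delta_projection_scale=1000)
print(len(vector))  # 18
```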


In [17]:
# A click action takes priority over any JSON change: the JSON change is ignored if a click is present. This is easy to modify and was done to simplify training.
action_vector = [
    0, 0, 0,  # left, right, double click (none active)
    100, 100,  # x, y
    0, 0, 0,  # no modifier keys
    1,  # apply the JSON change
    10, 0, 0,  # position change
    0,  # cross-section scaling
    0.1, 0, 0, 0,  # orientation change (quaternion by default)
    1000  # projection scaling (log-scale in neuroglancer)
]
#print(action_vector)
agent.apply_actions(action_vector, euler_angles=False)

For an environment test, we can define a loop and use a model to decide on the action.

In [18]:
# ExampleLoop

for i in range(100):
    pos_state, curr_image, json_state = agent.prepare_state(image_path=None, euler_angles=True, resize=False, add_mouse=False, fast=True) # time is about 0.05 seconds
    # --> action_vector = model.predict(pos_state, curr_image)
    # For example purposes, let's use a fixed action
    action_vector = [
        0, 0, 0,  # left, right, double click booleans
        100, 100,  # x, y
        0, 0, 0,  # no modifier keys
        1,  # apply the JSON change
        10, 0, 0,  # position change
        0,  # cross-section scaling
        0.2, 0, 0,  # orientation change in Euler angles, which is better for a model to learn or a human to understand
        2000  # projection scaling (log-scale in neuroglancer)
    ]
    agent.apply_actions(action_vector, json_state=json_state, euler_angles=True, verbose=False)

# If you need to directly change the JSON state, suppose adding a new layer, you can do it with the following function:
# agent.chrome_ngl.change_JSON_state(json_state)
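
The two loops above differ only in how the orientation is expressed: four quaternion components or three Euler angles. The sketch below shows one standard conversion between the two (yaw-pitch-roll, angles in radians, quaternion ordered as x, y, z, w). Which convention `Agent` uses internally is not stated here, so treat both the axis order and the component order as assumptions.

```python
import math


def euler_to_quaternion(roll: float, pitch: float, yaw: float):
    """Convert roll (x), pitch (y), yaw (z) in radians to a unit quaternion (x, y, z, w)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    w = cr * cp * cy + sr * sp * sy
    return (x, y, z, w)


# A rotation of 0.2 rad about the first axis only.
quat = euler_to_quaternion(0.2, 0.0, 0.0)
norm = math.sqrt(sum(c * c for c in quat))
print(quat, norm)
```

Three independent angles are easier for a policy to output than four components that must stay normalized, which is one reason to prefer the Euler form during training.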

If we want, we can replay an episode recorded by a human.

In [19]:
episode_path = "./episodes/raw/1800x900/episode_5.json"
agent.follow_episode(episode_path, sleep_time=0.1)


Step:  1 Action:  Action Event: Drag
Step:  2 Action:  Action Event: Drag
Step:  3 Action:  Action Event: Drag
Step:  4 Action:  Action Event: Drag
Step:  5 Action:  Action Event: Drag
Step:  6 Action:  Action Event: Drag
Step:  7 Action:  Action Event: Drag
Step:  8 Action:  Action Event: Drag
Step:  9 Action:  Action Event: Drag
Step:  10 Action:  Action Event: Drag
Step:  11 Action:  Action Event: Drag
Step:  12 Action:  Action Event: Drag
Step:  13 Action:  Action Event: Drag
Step:  14 Action:  Action Event: Inside render: Single Click: Right Click | Relative position: x=1463, y=205 with keys: None 
Step:  15 Action:  Action Event: Drag
Step:  16 Action:  Action Event: Drag
Step:  17 Action:  Action Event: Drag
Step:  18 Action:  Action Event: Drag
Step:  19 Action:  Action Event: Drag
Step:  20 Action:  Action Event: Drag
Step:  21 Action:  Action Event: Drag
Step:  22 Action:  Action Event: Drag
Step:  23 Action:  Action Event: Drag
Step:  24 Action:  Action Event: Drag
Step:  25
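
If the replay log needs to be inspected afterwards, the click events can be pulled out with a regular expression. This is a sketch based only on the log format printed above; the `Left` and `Middle` variants are assumed by analogy with the `Right Click` line, and `parse_click` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches lines such as:
# "Step:  14 Action:  Action Event: Inside render: Single Click: Right Click | Relative position: x=1463, y=205 ..."
CLICK_PATTERN = re.compile(
    r"Step:\s+(?P<step>\d+)\s+Action:\s+Action Event:.*?"
    r"(?P<button>Left|Right|Middle) Click \| Relative position: "
    r"x=(?P<x>\d+), y=(?P<y>\d+)"
)


def parse_click(line: str) -> Optional[Tuple[int, str, int, int]]:
    """Return (step, button, x, y) for a click line, or None for other events."""
    match = CLICK_PATTERN.search(line)
    if match is None:
        return None
    return (int(match["step"]), match["button"], int(match["x"]), int(match["y"]))


log_lines = [
    "Step:  13 Action:  Action Event: Drag",
    "Step:  14 Action:  Action Event: Inside render: Single Click: Right Click "
    "| Relative position: x=1463, y=205 with keys: None ",
]
clicks = [c for c in map(parse_click, log_lines) if c is not None]
print(clicks)  # [(14, 'Right', 1463, 205)]
```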

If we need to parse an episode from its recorded sequence of JSON states and actions, we can use the `parse_episode` function.

In [None]:
episode_path = "./episodes/raw/1800x900/episode_5.json"
save_path = "./reparsed_episodes/test/"
parsed_data = agent.parse_episode(episode_path, save_path)

You can also create synthetic data by running the following script and adapting its code to your needs.

In [None]:
!python utils/datagen/collect.py

# Reinforcement Learning Framework: --WIP

In [None]:
# An environment step is defined as the sequence of actions:
# 1. get the state
# 2. make decision and apply the action
# 3. get the next state
# 4. compute the reward
# 5. update the policy
# -----> 
#new_pos_state, new_curr_image, new_json_state = agent.prepare_state(image_path=None, euler_angles=False, resize=False, add_mouse=False)
#agent.apply_actions(action_vector)
# update policy
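
The steps listed above can be sketched as a single function. Everything here is illustrative: `StubAgent` only mimics the two `Agent` methods used in this notebook so that the example runs without a browser, and the distance-based reward and the fixed policy are assumptions, not the project's actual design.

```python
import math


class StubAgent:
    """Stand-in exposing prepare_state/apply_actions; not the real Agent."""

    def __init__(self):
        self.position = [0.0, 0.0, 0.0]

    def prepare_state(self, **kwargs):
        pos_state = [list(self.position), 1.0, [0.0, 0.0, 0.0, 1.0], 1000.0]
        return pos_state, None, {"position": list(self.position)}

    def apply_actions(self, action_vector, **kwargs):
        if action_vector[8]:  # json_change flag
            for i in range(3):
                self.position[i] += action_vector[9 + i]


def reward_fn(pos_state, target):
    """Hypothetical reward: negative distance between position and target."""
    return -math.dist(pos_state[0], target)


def env_step(agent, policy, target):
    pos_state, image, _ = agent.prepare_state()            # 1. get the state
    action = policy(pos_state, image)                      # 2. decide ...
    agent.apply_actions(action)                            #    ... and act
    next_pos_state, next_image, _ = agent.prepare_state()  # 3. next state
    reward = reward_fn(next_pos_state, target)             # 4. reward
    # 5. the policy update would use this transition
    return (pos_state, image), action, reward, (next_pos_state, next_image)


def fixed_policy(pos_state, image):
    """Always move 10 units along x via a JSON change."""
    return [0, 0, 0, 0, 0, 0, 0, 0, 1, 10, 0, 0, 0, 0, 0, 0, 0, 0]


transition = env_step(StubAgent(), fixed_policy, target=[100.0, 0.0, 0.0])
print(transition[2])  # -90.0
```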

# Controlling Neuroglancer from a cluster with a hosting machine: --WIP

Use proxy/startHost.py to start the hosting machine.

Start the cluster script and, instead of calling the usual functions, write the action vectors to a file that the hosting machine reads. This is experimental and has synchronization issues: the cluster does not wait for the hosting machine to read the action vectors and return the states.
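
One possible way to address the synchronization issue is to tag every action with a step id and have the cluster block until a reply with the same id appears. The sketch below is not how `proxy/startHost.py` works; the file layout, the `step_id` and `state` fields, and all function names are assumptions for illustration. Writes go through a temporary file plus `os.replace` so a reader never sees a half-written file on a local file system; shared network file systems may behave differently.

```python
import json
import os
import tempfile
import time


def write_json_atomic(path, payload):
    """Write payload to path so readers see either the old or the new file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    os.replace(tmp_path, path)


def read_json(path):
    """Return the parsed file, or None if it does not exist yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None


def send_action_and_wait(action_path, state_path, step_id, action_vector,
                         timeout=5.0, poll=0.01):
    """Cluster side: publish an action, then wait for the matching state."""
    write_json_atomic(action_path, {"step_id": step_id, "action": action_vector})
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        reply = read_json(state_path)
        if reply is not None and reply.get("step_id") == step_id:
            return reply["state"]
        time.sleep(poll)
    raise TimeoutError(f"No state received for step {step_id}")
```

The hosting machine would run the mirror image: poll the action file, apply the action when a new `step_id` shows up, and write the resulting state back with that same id.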