# Chat in an Ai2Thor space with EMISSOR

In this notebook, we demonstrate how to chat with the agent in the Ai2Thor space and save the so-called ```signals``` in EMISSOR.
For this, we import 1) a LeolaniChatClient, which captures the signals and stores them in an EMISSOR scenario, and 2) an Ai2ThorClient, which lets us interact with the space through an agent. Three modalities are considered:

1. text modality for the turns of the user and the agent
2. action modality for the actions carried out by the agent in the space (mostly navigation)
3. image modality for the objects found by the agent in the space

EMISSOR stores the signals for each modality in a scenario folder. At the start of an interaction, a new scenario folder is created together with a JSON file of the same name, the so-called scenario JSON. The scenario JSON contains the metadata for the scenario as a whole, including a temporal ruler with a start and end time. The temporal ruler is used to align the signals in a temporal sequence.
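
As a minimal sketch of what this temporal alignment amounts to, the snippet below orders signals from different modalities by the `start` value of their `time` ruler. It assumes the signals are available as plain dictionaries shaped like the JSON examples further down in this notebook. The helper name `order_by_start` and the sample values are hypothetical and not part of the EMISSOR API.

```python
def order_by_start(signals):
    """Return the signals sorted by the start time of their temporal ruler."""
    return sorted(signals, key=lambda signal: signal["time"]["start"])


# Hypothetical, heavily trimmed signals from two modalities.
signals = [
    {"modality": "IMAGE", "time": {"start": 1730187274400, "end": 1730187274400}},
    {"modality": "TEXT", "time": {"start": 1730187274264, "end": 1730187274264}},
    {"modality": "TEXT", "time": {"start": 1730187274333, "end": 1730187274333}},
]

for signal in order_by_start(signals):
    print(signal["time"]["start"], signal["modality"])
```

Because `sorted` is stable, signals with identical start times keep the order in which they were passed in.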

For the environment state that Ai2Thor exposes, see https://ai2thor.allenai.org/ithor/documentation/environment-state

In [1]:
from leolani_client import LeolaniChatClient, Action
emissor_path = "./emissor"
HUMAN="Piek"
AGENT="Ai2Thor"
leolaniClient = LeolaniChatClient(emissor_path=emissor_path, agent=AGENT, human=HUMAN)

In [2]:
import ai2thor
from ai2thor.controller import Controller
import numpy as np

In [3]:
from ai2thor_client import Ai2ThorClient

In [4]:
ai2ThorClient = Ai2ThorClient()

max_context=50
AI = "AI"

utterance = "Hi %s. Tell me what to do." % HUMAN
print(AGENT+">"+utterance)
leolaniClient._add_utterance(AGENT, utterance) 

utterance = "This is what I can do:"+str(ai2ThorClient.what_i_can_do())
print(AGENT+">"+utterance)
leolaniClient._add_utterance(AGENT, utterance) 

utterance = input(HUMAN+"> ")
leolaniClient._add_utterance(HUMAN, utterance) 

while not (utterance.lower() == "stop" or utterance.lower() == "bye"):
        ai2ThorClient.process_instruction(utterance)
        for utterance in ai2ThorClient._answers:
            print(AGENT+">"+str(utterance))
            leolaniClient._add_utterance(AGENT, utterance)
            
        for obj, objectType, coord, image in ai2ThorClient._perceptions:
            leolaniClient._add_image(obj['name'], objectType, coord, image) 

        for action in ai2ThorClient._actions:
            leolaniClient._add_action(action)
            
        utterance = input(HUMAN+"> ")
        leolaniClient._add_utterance(HUMAN, utterance) 

ai2ThorClient._controller.stop()
# After completion, we save the scenario in the defined emissor folder.
leolaniClient._save_scenario() 

Ai2Thor>Hi Piek. Tell me what to do.
Ai2Thor>This is what I can do:('I can do the following:', "['find', 'describe', 'move', 'go', 'turn', 'forward', 'back', 'left', 'right', 'open', 'close', 'look']")


Piek>  describe


Ai2Thor>I see 67 things there.
Apple
Bottle
	I can break it.
Bowl
Bread
ButterKnife
Cabinet
	I can open it.
Cabinet
	I can open it.
Cabinet
	I can open it.
Cabinet
	I can open it.
Cabinet
	I can open it.
Cabinet
	I can open it.
CellPhone
	I can break it.
Chair
	I can move it.
Chair
	I can move it.
CoffeeMachine
	I can move it.
CounterTop
CounterTop
CounterTop
CreditCard
Cup
	I can break it.
DishSponge
Drawer
	I can open it.
Drawer
	I can open it.
Drawer
	I can open it.
Egg
	I can break it.
Faucet
Floor
Fork
Fridge
	I can open it.
GarbageCan
	I can move it.
HousePlant
	I can move it.
Knife
Lettuce
LightSwitch
Microwave
	I can move it.
	I can open it.
Mug
	I can break it.
Pan
PaperTowelRoll
PepperShaker
Plate
	I can break it.
Pot
Potato
SaltShaker
Shelf
Shelf
Shelf
ShelvingUnit
	I can move it.
Sink
SinkBasin
SoapBottle
Spatula
Spoon
Statue
	I can break it.
StoveBurner
StoveBurner
StoveBurner
StoveBurner
StoveKnob
StoveKnob
StoveKnob
StoveKnob
Toaster
	I can move it.
Tomato
Vase
	I can br

Piek>  bye


## Modalities in EMISSOR

For each modality a separate JSON file is used. Since EMISSOR does not support actions, they are currently stored as special text signals.
Signals can have annotations. For text signals, we annotate the source of the turn, which is the HUMAN, the AGENT, or ACTION. The first two mean that the human or the agent entered the turn, whereas ACTION marks an action carried out by Ai2Thor. The ```text.json``` file thus contains a mixture of turns and actions. Here is an example of a turn as a text signal, where the value in the annotation is the speaker ```Piek```:

```
  {
    "@context": {...},
    "@type": "TextSignal",
    "id": "770630ea-3930-4d26-9d9c-27b62312f78c",
    "ruler": {...},
    "seq": [...],
    "modality": "TEXT",
    "time": {...},
    "files": [],
    "mentions": [
      {
        "@context": {...},
        "@type": "Mention",
        "id": "d9dbd4b9-d6a8-40c3-9ea6-e906ecd98100",
        "segment": [
          {
            "@context": {...},
            "@type": "Index",
            "container_id": "770630ea-3930-4d26-9d9c-27b62312f78c",
            "start": 0,
            "stop": 8,
            "_py_type": "emissor.representation.container-Index"
          }
        ],
        "annotations": [
          {
            "@context": {...},
            "@type": "Annotation",
            "type": "ConversationalAgent",
            "value": "Piek",
            "source": "LEOLANI",
            "timestamp": 1730187274264,
            "_py_type": "emissor.representation.scenario-Annotation"
          }
        ]
      }
    ],
    "text": "find cup"
  },
```
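
To make the role of the annotation concrete, here is a small sketch that reads the speaker from a text signal. It assumes the signal has been loaded as a plain dictionary with the fields shown above. The helper `speaker_of` is a hypothetical name, not an EMISSOR function, and the sample is a trimmed copy of the signal above.

```python
def speaker_of(signal):
    """Return the value of the first ConversationalAgent annotation, or None."""
    for mention in signal.get("mentions", []):
        for annotation in mention.get("annotations", []):
            if annotation.get("type") == "ConversationalAgent":
                return annotation.get("value")
    return None


# Trimmed copy of the signal shown above (elided fields left out).
turn = {
    "@type": "TextSignal",
    "modality": "TEXT",
    "mentions": [
        {
            "segment": [{"start": 0, "stop": 8}],
            "annotations": [
                {"type": "ConversationalAgent", "value": "Piek", "source": "LEOLANI"}
            ],
        }
    ],
    "text": "find cup",
}

print(speaker_of(turn), ">", turn["text"])
```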


Here is a text signal in which Ai2Thor reports what it found in a chat turn, so the value in the annotation is ```Ai2Thor```:

```
  {
    "@context": {...},
    "@type": "TextSignal",
    "id": "1d550a4a-52fa-484a-97dc-64665da699db",
    "ruler": {
      "@context": {...},
      "@type": "Index",
      "container_id": "1d550a4a-52fa-484a-97dc-64665da699db",
      "start": 0,
      "stop": 135,
      "_py_type": "emissor.representation.container-Index"
    },
    "seq": [...],
    "modality": "TEXT",
    "time": {
      "@type": "TemporalRuler",
      "container_id": "<emissor.persistence.persistence.ScenarioController object at 0x10d858430>",
      "start": 1730187274333,
      "end": 1730187274333
    },
    "files": [],
    "mentions": [
      {
        "@context": {...},
        "@type": "Mention",
        "id": "791475d7-17b6-48b0-974a-78f548486648",
        "segment": [
          {
            "@context": {...},
            "@type": "Index",
            "container_id": "1d550a4a-52fa-484a-97dc-64665da699db",
            "start": 0,
            "stop": 135,
            "_py_type": "emissor.representation.container-Index"
          }
        ],
        "annotations": [
          {
            "@context": {...},
            "@type": "Annotation",
            "type": "ConversationalAgent",
            "value": "Ai2Thor",
            "source": "LEOLANI",
            "timestamp": 1730187274333,
            "_py_type": "emissor.representation.scenario-Annotation"
          }
        ]
      }
    ],
    "text": "I found 1 instances of type cup in my view\nCup_8266e2aa at {'x': 1.0786449909210205, 'y': 0.8995606899261475, 'z': -0.7677611708641052}"
  },
```
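
In the examples in this notebook, the `start` and `stop` values of a mention segment match the character span of the whole text (0 to 135 here). The sketch below recovers the mentioned text by slicing. It assumes that the offsets are character offsets with an exclusive `stop`, which is consistent with the examples but not confirmed here as a general EMISSOR rule. `mention_text` is a hypothetical helper.

```python
def mention_text(signal, mention):
    """Return the part of the signal text covered by the mention's first segment."""
    segment = mention["segment"][0]
    return signal["text"][segment["start"]:segment["stop"]]


# Trimmed copy of the signal shown above.
signal = {
    "text": "I found 1 instances of type cup in my view\n"
            "Cup_8266e2aa at {'x': 1.0786449909210205, "
            "'y': 0.8995606899261475, 'z': -0.7677611708641052}",
    "mentions": [{"segment": [{"start": 0, "stop": 135}]}],
}

mention = signal["mentions"][0]
print(len(signal["text"]))
print(mention_text(signal, mention) == signal["text"])
```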

The next example shows an action as a text signal:

```
  {
    "@context": {...},
    "@type": "TextSignal",
    "id": "859064d1-5b01-4dbf-8f17-98d69d6e62db",
    "ruler": {
      "@context": {...},
      "@type": "Index",
      "container_id": "859064d1-5b01-4dbf-8f17-98d69d6e62db",
      "start": 0,
      "stop": 4,
      "_py_type": "emissor.representation.container-Index"
    },
    "seq": [...],
    "modality": "TEXT",
    "time": {...},
    "files": [],
    "mentions": [
      {
        "@context": {...},
        "@type": "Mention",
        "id": "8547f62b-d310-4049-b355-aca6824a2de6",
        "segment": [
          {
            "@context": {...},
            "@type": "Index",
            "container_id": "859064d1-5b01-4dbf-8f17-98d69d6e62db",
            "start": 0,
            "stop": 4,
            "_py_type": "emissor.representation.container-Index"
          }
        ],
        "annotations": [
          {
            "@context": {...},
            "@type": "Annotation",
            "type": "ConversationalAgent",
            "value": "ACTION",
            "source": "LEOLANI",
            "timestamp": 1730187274335,
            "_py_type": "emissor.representation.scenario-Annotation"
          }
        ]
      }
    ],
    "text": "Look"
  },
```
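
Since turns and actions share one file, a consumer has to separate them again. The following sketch does this by checking whether the annotation value is `ACTION`. It assumes signals are plain dictionaries shaped like the examples above. The function names and the shortened sample texts are hypothetical.

```python
ACTION = "ACTION"


def annotation_value(signal):
    """Return the first ConversationalAgent annotation value of a signal, or None."""
    values = (
        annotation.get("value")
        for mention in signal.get("mentions", [])
        for annotation in mention.get("annotations", [])
        if annotation.get("type") == "ConversationalAgent"
    )
    return next(values, None)


def split_turns_and_actions(signals):
    """Split text signals into (speaker, text) turns and a list of action names."""
    turns, actions = [], []
    for signal in signals:
        value = annotation_value(signal)
        if value == ACTION:
            actions.append(signal["text"])
        else:
            turns.append((value, signal["text"]))
    return turns, actions


def make_signal(value, text):
    """Build a trimmed text signal for this example only."""
    annotation = {"type": "ConversationalAgent", "value": value}
    return {"text": text, "mentions": [{"annotations": [annotation]}]}


signals = [
    make_signal("Piek", "find cup"),
    make_signal("Ai2Thor", "I found 1 instances of type cup in my view"),
    make_signal("ACTION", "Look"),
]

turns, actions = split_turns_and_actions(signals)
print(turns)
print(actions)
```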

The metadata for the images is saved in ```image.json```. The image in which an object was detected is saved in the subdirectory ```image```, where each image file has the name of the object. Here is the representation of an image signal in the ```image.json``` metadata file:

```
  {
    "@context": {...},
    "@type": "ImageSignal",
    "id": "3eca43c8-a9a8-4743-a20d-85badb1f13ff",
    "ruler": {
      "@context": {...},
      "@type": "MultiIndex",
      "container_id": "3eca43c8-a9a8-4743-a20d-85badb1f13ff",
      "bounds": [
        0,
        0,
        640,
        480
      ],
      "_py_type": "emissor.representation.container-MultiIndex"
    },
    "array": "",
    "modality": "IMAGE",
    "time": {
      "@context": {...},
      "@type": "TemporalRuler",
      "container_id": "cc1d9b3d-dcb2-466e-9b56-aae0944453fe",
      "start": 1730191697555,
      "end": 1730191697555
    },
    "files": [
      "./emissor/cc1d9b3d-dcb2-466e-9b56-aae0944453fe/image/Apple_f33eaaa0.jpg"
    ],
    "mentions": [
      {
        "@context": {...},
        "@type": "Mention",
        "id": "7607d8ef-c67f-4a66-a28b-84f2f2a14bb4",
        "segment": [
          {
            "@context": {...},
            "@type": "MultiIndex",
            "container_id": "3eca43c8-a9a8-4743-a20d-85badb1f13ff",
            "bounds": [
              0,
              0,
              -1,
              0
            ],
            "_py_type": "emissor.representation.container-MultiIndex"
          }
        ],
        "annotations": [
          {
            "@context": {...},
            "@type": "Annotation",
            "type": "apple",
            "value": {
              "_py_type": "builtins-dict"
            },
            "source": "Ai2Thor",
            "timestamp": 1730191697,
            "_py_type": "emissor.representation.scenario-Annotation"
          }
        ]
      }
    ]
  }
```
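
As an illustration of how this metadata might be consumed, the sketch below collects the annotation types, the image file names, and the image bounds from one image signal. It assumes the signal has been loaded as a plain dictionary with the fields shown above. `describe_image_signal` is a hypothetical helper, and the sample is a trimmed copy of the signal above.

```python
from pathlib import PurePosixPath


def describe_image_signal(signal):
    """Summarise an image signal: annotation types, file names, and bounds."""
    labels = [
        annotation["type"]
        for mention in signal.get("mentions", [])
        for annotation in mention.get("annotations", [])
    ]
    files = [PurePosixPath(path).name for path in signal.get("files", [])]
    return {"labels": labels, "files": files, "bounds": signal["ruler"]["bounds"]}


# Trimmed copy of the image signal shown above.
image_signal = {
    "@type": "ImageSignal",
    "ruler": {"bounds": [0, 0, 640, 480]},
    "files": [
        "./emissor/cc1d9b3d-dcb2-466e-9b56-aae0944453fe/image/Apple_f33eaaa0.jpg"
    ],
    "mentions": [
        {"annotations": [{"type": "apple", "source": "Ai2Thor", "timestamp": 1730191697}]}
    ],
}

print(describe_image_signal(image_signal))
```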
