# Prerequisites:

In [None]:
import gym
import interaction_gym
import numpy as np
import event_inference as event
import random

Set the random seed for this run. All seeding is done when the model is initialized.

In [None]:
seed = 0

# The model:

The computational model used here learns probabilistic event-schema representations. Each schema is composed of three probability distributions:
- a starting condition, which encodes the kinds of observations that typically activate this schema
- a dynamics model, which encodes how the observations typically change during an event
- an ending condition, which encodes the kind of observation typically required for this event to end

All three components are modeled as Gaussian distributions in the space of observations.
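To make the three components concrete, here is a minimal sketch of one schema as three Gaussians over the observation space. It assumes diagonal covariances and models the dynamics as a Gaussian over the change between consecutive observations; both are simplifying assumptions, and `EventSchema` and its methods are hypothetical names, not the CAPRI implementation.

```python
import numpy as np


def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance given by `var`."""
    x, mean, var = map(np.asarray, (x, mean, var))
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


class EventSchema:
    """Hypothetical schema: start, dynamics, and end Gaussians over observations."""

    def __init__(self, dim):
        # Each component is a (mean, variance) pair in observation space
        self.start = (np.zeros(dim), np.ones(dim))
        self.dynamics = (np.zeros(dim), np.ones(dim))  # assumption: models o(t) - o(t-1)
        self.end = (np.zeros(dim), np.ones(dim))

    def start_loglik(self, o):
        # How well o fits the typical observations that activate this schema
        return diag_gaussian_logpdf(o, *self.start)

    def dynamics_loglik(self, o_prev, o):
        # Scores the change between consecutive observations
        change = np.asarray(o, dtype=float) - np.asarray(o_prev, dtype=float)
        return diag_gaussian_logpdf(change, *self.dynamics)

    def end_loglik(self, o):
        # How well o fits the typical observations that end this event
        return diag_gaussian_logpdf(o, *self.end)
```

For example, `EventSchema(dim=18).start_loglik(observation)` would score how well an 18-dimensional observation (matching `dim_observation=18` in the model initialization) fits the starting condition; a higher log-likelihood means a better fit.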

Two examples of what these schemata could encode for the events 'reaching' and 'falling':

<img src="Doc/schema1.png" width="300">

<img src="Doc/schema2.png" width="300">

The model distinguishes between training and testing phases. During training, the system learns the schemata in a supervised fashion, i.e., it receives explicit labels of the ongoing event. During testing, on the other hand, the system has to use the learned distributions to infer the probability of an event being active, given the perceived sensorimotor information.
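Inferring which event is active during testing can be framed as Bayesian filtering: a transition prior predicts the next event, and each schema's likelihood for the current observation reweights that prediction. The sketch below assumes a prior that keeps the current event with probability `stay_prob` and otherwise switches uniformly to another event. This is only loosely analogous to the `no_transition_prior` parameter of the model initialized below, and both functions are hypothetical, not CAPRI's API.

```python
import numpy as np


def transition_matrix(num_events, stay_prob):
    """T[j, i] = P(e_t = i | e_{t-1} = j): stay with stay_prob, else switch uniformly."""
    switch = (1.0 - stay_prob) / (num_events - 1)
    T = np.full((num_events, num_events), switch)
    np.fill_diagonal(T, stay_prob)
    return T


def update_event_posterior(prior, log_likelihoods, T):
    """One Bayes-filter step: predict with T, then weight by the observation likelihoods."""
    predicted = T.T @ np.asarray(prior, dtype=float)  # P(e_t = i) = sum_j T[j, i] P(e_{t-1} = j)
    log_post = np.log(predicted) + np.asarray(log_likelihoods, dtype=float)
    log_post -= log_post.max()  # numerical stability before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

Here `log_likelihoods` would come from schema components such as the dynamics model. With equal likelihoods, the posterior simply follows the transition prior; strong evidence for another event overrides the preference for staying.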

We first initialize the model:

In [None]:
model = event.CAPRI(epsilon_start=0.01, epsilon_dynamics=0.001, epsilon_end=0.001,
                    no_transition_prior=0.9, dim_observation=18, num_policies=3, 
                    num_models=4, r_seed=seed, sampling_rate=2)

# The environment:

We test the system in a simple agent-patient interaction simulation, in which multiple event sequences $E$ can be observed. Each sequence $E$ is composed of multiple events $e_i$. The following table shows the possible event sequences and their component events:

<img src="Doc/events.png" width="500">

The system interacts with the environment through different gaze policies $\pi$. The gaze policy determines where the system looks. Depending on the gaze, the system receives clear or noisy sensory information about the agent and the patient. In the simulation, the gaze position is shown as a small red dot.

We initialize the environment:

In [None]:
env = interaction_gym.InteractionEventGym(sensory_noise_base=1.0, sensory_noise_focus=0.01, randomize_colors=True)
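The two noise parameters suggest how gaze affects perception. Here is a minimal sketch, assuming that the entity under the gaze is observed with Gaussian noise of standard deviation `sensory_noise_focus` and every other entity with `sensory_noise_base`. The function `noisy_observation` and the gaze labels `"agent"` and `"patient"` are hypothetical, and how `InteractionEventGym` actually combines the two values is not shown in this notebook.

```python
import numpy as np


def noisy_observation(agent, patient, gaze_on, noise_base=1.0, noise_focus=0.01, rng=None):
    """Hypothetical gaze-dependent sensing: the fixated entity is seen sharply."""
    rng = np.random.default_rng() if rng is None else rng
    agent = np.asarray(agent, dtype=float)
    patient = np.asarray(patient, dtype=float)
    # Pick the noise standard deviation for each entity depending on the gaze target
    sd_agent = noise_focus if gaze_on == "agent" else noise_base
    sd_patient = noise_focus if gaze_on == "patient" else noise_base
    return (agent + rng.normal(0.0, sd_agent, agent.shape),
            patient + rng.normal(0.0, sd_patient, patient.shape))
```

With the values used above (1.0 vs. 0.01), the fixated entity would be observed with a noise standard deviation 100 times smaller, which is what makes the choice of gaze policy matter.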

# Training the system:

The code below runs the sensorimotor loop. During training, the system is trained on 3000 event sequences. For each sequence, a random initial policy $\pi$ is sampled. In every time step, the system receives a new observation and updates its event schemata.

`env.render()` visualizes the simulation. It is commented out in the training loop so that training runs faster; uncomment it to watch the simulation.

In [None]:
for episode in range(3000):

    # Reset the environment to a new event sequence
    observation = env.reset()

    # Sample a one-hot encoding of the initial policy pi(0)
    policy_t = np.array([0.0, 0.0, 0.0])
    policy_t[random.randint(0, 2)] = 1.0
    for t in range(3000):

        # Rendering if desired (slows down training):
        # env.render()  # or: env.render(store_video=True, video_identifier=0)

        # Perform pi(t) and receive the new observation o(t)
        observation, reward, done, info = env.step(policy_t)

        # Update the event probabilities and event schemata, and infer the next policy;
        # during training, the event label info is passed as e_i
        policy_t, P_ei = model.step(o_t=observation, pi_t=policy_t, training=True, done=done, e_i=info)

        # Move on to the next sequence once the event sequence is over
        if done:
            print(f"Episode {episode} done after {t} time steps")
            break
env.close()
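Because training is supervised (the event label `info` is passed to `model.step` as `e_i`), each schema component can in principle be estimated from the observations that carry the matching label. The sketch below assumes an online mean and variance estimate (Welford's algorithm) and fits only the starting conditions, treating the first observation after each label change as a start sample. `RunningGaussian` and `fit_start_conditions` are hypothetical, and CAPRI's actual update rule (including the role of the `epsilon_*` parameters) is not shown here.

```python
import numpy as np


class RunningGaussian:
    """Online mean/variance estimate (Welford's algorithm) for one Gaussian component."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)  # running sum of squared deviations from the mean

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Sample variance; reported as infinite until two samples have been seen
        return self.m2 / (self.n - 1) if self.n > 1 else np.full_like(self.mean, np.inf)


def fit_start_conditions(stream, dim):
    """Fit one starting-condition Gaussian per event label from (label, observation) pairs."""
    starts = {}
    prev_label = None
    for label, obs in stream:
        if label != prev_label:  # labeled event boundary: obs is a start sample
            if label not in starts:
                starts[label] = RunningGaussian(dim)
            starts[label].update(obs)
        prev_label = label
    return starts
```

Dynamics and ending conditions could be fitted the same way, from observation changes within a label and from the last observation before a label change, respectively.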

# Testing the system:

A training phase is followed by a testing phase. Our testing phases are inspired by studies of goal prediction in infants. In these studies, infants observe reaching movements performed by a hand or a mechanical claw. Their gaze is tracked to determine whether they anticipate the action goal. If the infant looks at the reaching target before the agent actually reaches it, the gaze is considered goal-predictive.
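The definition of a goal-predictive gaze translates directly into a check over a recorded gaze trace. Here is a minimal sketch, assuming the gaze at each time step has been mapped to a label such as `"agent"` or `"patient"` and that the time step at which the agent reaches the goal is known; the function and the labels are hypothetical.

```python
def is_goal_predictive(gaze_targets, reach_time, goal="patient"):
    """Return True if the gaze rests on the goal at any step before the agent reaches it.

    gaze_targets: per-time-step gaze labels (hypothetical encoding).
    reach_time: first time step at which the agent reaches the goal.
    """
    return any(target == goal for target in gaze_targets[:reach_time])
```

In the notebook, the gaze at each step is the one-hot `policy_t`. Which policy index corresponds to looking at the patient is not stated here, so that mapping would have to be taken from `interaction_gym`.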

There are various experimental findings on this goal-predictive gaze. Young infants with little grasping experience do not perform a goal-predictive gaze. Eleven-month-old infants perform a goal-predictive gaze when the event is performed by a familiar agent (hand), but not when it is performed by an unfamiliar agent (claw).

Our testing phases likewise show reach-grasp-and-carry motions performed by a hand or a claw. Note that during training, only hand agents perform reaching and transporting.

Our hypothesis is that if our model attempts to minimize uncertainty about future events and event boundaries and chooses its gaze accordingly, an anticipatory gaze behavior similar to that of infants can emerge. However, the system can only anticipate the goal if it is able to identify the event. This is unlikely if the system has little experience with reaching (young infants / few training phases) or if the event is performed by an agent that was never observed before (claw agent).
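One way to read "minimize uncertainty and choose the gaze accordingly" is to pick the policy whose predicted event distribution has the lowest entropy. The sketch below assumes those per-policy predictions are already available and only considers uncertainty about events, not about event boundaries; `choose_gaze_policy` is hypothetical and not necessarily how CAPRI selects its policies.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution, ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))


def choose_gaze_policy(predicted_event_probs):
    """Pick the policy whose predicted event distribution is least uncertain.

    predicted_event_probs: array of shape (num_policies, num_events); row k is the
    (assumed given) predicted P(e) after following policy k.
    Returns a one-hot policy vector.
    """
    probs = np.asarray(predicted_event_probs, dtype=float)
    costs = [entropy(row) for row in probs]
    policy = np.zeros(len(probs))
    policy[int(np.argmin(costs))] = 1.0
    return policy
```

The returned vector has the same one-hot form as `policy_t` in the loops of this notebook.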

<img src="Doc/hypothesis.png" width="500">

The code below runs the testing phase 10 times, with the system inferring both the ongoing event and its gaze policy. Here, the agent is a hand (`claw=False`); set `claw=True` to test with the claw agent:

In [None]:
for episode in range(10):

    # Reset the environment to a grasping sequence
    observation = env.reset_to_grasping(claw=False)  # claw=False: hand agent, claw=True: claw agent

    # Start from a fixed one-hot policy pi(0) (policy index 2)
    policy_t = np.array([0.0, 0.0, 0.0])
    # policy_t[random.randint(0, 2)] = 1  # alternative: random initial policy
    policy_t[2] = 1.0
    for t in range(3000):
        # Rendering (comment out to run faster):
        env.render()  # or: env.render(store_video=True, video_identifier=0)

        # Perform pi(t) and receive the new observation o(t)
        observation, reward, done, info = env.step(policy_t)

        # Update the event probabilities and infer the next policy;
        # training=False: the learned schemata are used for inference
        policy_t, P_ei = model.step(o_t=observation, pi_t=policy_t, training=False, done=done, e_i=info)
        print("Event = ", info)
        print("P_ei = ", P_ei)
        print("Policy = ", policy_t)

        # Move on to the next sequence once the event sequence is over
        if done:
            print(f"Episode {episode} done after {t} time steps")
            break
env.close()