<a href="https://colab.research.google.com/github/dcownden/PerennialProblemsOfLifeWithABrain/blob/main/sequences/P1C1_BehaviourAsPolicy/P1C1_Sequence1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> &nbsp; <a href="https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/dcownden/PerrenialProblemsOfLifeWithABrain/main/sequences/P1C1_BehaviourAsPolicy/P1C1_Sequence1.ipynb" target="_parent"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open in Kaggle"/></a>

# **Part 1 Behaviour, environments and optimization: evolution and learning**

### **Animals have evolved in an environment; their behaviour is only meaningful within this environment.**

### Objectives: Introduce the core concepts of 
* ### **environment**, where an organism lives;
* ### **behaviour**, what the organism does there; 
* ### **optimization**, how learning and evolution shape behaviour so that the organism's behaviour is better suited to its environment.


___
# Chapter 1.1 Behaviour as a policy in the context of an environment

### Objective: Develop examples of how behaviour is described and evaluated relative to being [good](## "This is a very loaded term, to be unpacked carefully later") in an environmental niche. 

You will learn:
*   What is a policy? A formalization of behaviour as a function that determines an organism's behaviour (usually) based on the organism's experience of their environment.
*   What is a good policy? Rewards (and other signals?) from the environment in response to the organism's behaviour are integrated into a Loss/Objective function to evaluate (and sometimes improve) a policy.
*   What is stochasticity? The environment and an organism's behaviour may have elements of randomness. This can make policy evaluation challenging; was the policy generating bad behaviour or was the organism just unlucky?
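
To make the idea of a policy concrete, here is a minimal sketch of two policies written as plain Python functions. The names `ACTIONS`, `random_policy` and `toward_food_policy`, and the dictionary-shaped observation, are assumptions made for this illustration only; they are not part of the notebook's simulation code.

```python
import numpy as np

# Hypothetical action set for illustration: name -> (dx, dy) step on a grid.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def random_policy(observation, rng):
    """Ignores the observation and picks one of the four actions uniformly."""
    return rng.choice(list(ACTIONS))

def toward_food_policy(observation, rng):
    """Steps along x, then along y, toward the food in the observation."""
    (cx, cy), (fx, fy) = observation["critter"], observation["food"]
    if fx != cx:
        return "right" if fx > cx else "left"
    if fy != cy:
        return "up" if fy > cy else "down"
    return rng.choice(list(ACTIONS))  # already on the food: any move will do

rng = np.random.default_rng(0)
obs = {"critter": (3, 3), "food": (5, 3)}
print(random_policy(obs, rng))       # varies with the seed
print(toward_food_policy(obs, rng))  # right
```

Both functions have the same signature, so either could be dropped into the same evaluation loop. The only difference is whether the observation influences the chosen action.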



___
# **Sequence 1.1.1: Gridworld Introduction**

### Objective:

Create a simple environment-organism system to illustrate how an organism's **behaviour** within an **environment** can be evaluated using **rewards**. Within this context we will see how behaviour guided by intelligence can be better than random behaviour, and how **randomness** can make evaluation of behaviour difficult.


# Setup



In [None]:
import ipywidgets as widgets
import threading
from IPython.display import display
import numpy as np
import matplotlib.pyplot as plt
import time
import asyncio
#import jax

!pip3 install vibecheck datatops > /dev/null 2> /dev/null #google.colab
from vibecheck import DatatopsContentReviewContainer

#from google.colab import output as colab_output
#colab_output.enable_custom_widget_manager()

In [None]:
# @title Plotting Functions
# @markdown You don't need to worry about how this code works – but you do need to **run the cell**

def make_grid(x_size, y_size):
  """Plots an x_size by y_size grid with cells centered on integer indices and
  returns fig and ax handles for further use
  Args:
    x_size (int): size of grid in x dimension
    y_size (int): size of grid in y dimension

  Returns:
    fig (matplotlib.figure.Figure): figure handle for the grid
    ax: (matplotlib.axes._axes.Axes): axes handle for the grid
  """
  fig, ax = plt.subplots(figsize = (7,6), layout='constrained')
  #ax.axis("equal")
  ax.set_xticks(np.arange(0, x_size, 1))
  ax.set_yticks(np.arange(0, y_size, 1))
  # Labels for major ticks
  ax.set_xticklabels(np.arange(0, x_size, 1),fontsize=8)
  ax.set_yticklabels(np.arange(0, y_size, 1),fontsize=8)

  # Minor ticks
  ax.set_xticks(np.arange(0.5, x_size-0.5, 1), minor=True)
  ax.set_yticks(np.arange(0.5, y_size-0.5, 1), minor=True)

  # Gridlines based on minor ticks
  ax.grid(which='minor', color='grey', linestyle='-', linewidth=2, alpha=0.5)

  # Remove minor ticks
  ax.tick_params(which='minor', bottom=False, left=False)

  ax.set_xlim(( -0.5, x_size-0.5))
  ax.set_ylim(( -0.5, y_size-0.5))
  return fig, ax

def plot_food(fig, ax, xy_food_loc):
  food = ax.scatter([], [], s=150, marker='o', color='red', label='Food')
  food.set_offsets(xy_food_loc)
  return food

def plot_critter(fig, ax, xy_critter_loc):
  critter = ax.scatter([], [], s=250, marker='h', color='blue', label='Critter')
  critter.set_offsets(xy_critter_loc)
  return critter

In [None]:
# @title Simulation Functions, Variables and Widgets
# @markdown You don't need to worry about how this code works – but you do need to **run the cell**

# keep track of all relevant states of the simulation in a global state dictionary
# maybe turn this into an object if we want to have multiple independent simulations going on?

def init_loc(x_size, y_size, num):
  """Returns random 2d grid locations, without replacement,
  in both unraveled/flat coordinates and as xy pairs"""
  int_loc = np.random.choice(x_size * y_size, num, replace=False)
  xy_loc = np.vstack(np.unravel_index(int_loc, (x_size, y_size))).T
  return int_loc, xy_loc

def init_state(x_size=7,
              y_size=7,
              num_food=10,
              xy_critter=None):
  state = {'num_moves': 0,
           'num_eats': 0,
           'num_food': num_food,
           'xy_food_loc': np.array([[]]),
           'int_food_loc': np.array([]),
           'x_size': x_size,
           'y_size': y_size}

  #randomly initialize food locations
  if num_food > x_size * y_size -1:
    state['num_food'] = x_size * y_size -1
  state['int_food_loc'], state['xy_food_loc'] = init_loc(state['x_size'],state['y_size'],state['num_food'])

  if xy_critter is None:
    # put critter roughly in the middle
    state['xy_critter_loc'] = np.array([[np.floor(state['x_size']/2), np.floor(state['y_size']/2)]], dtype=int)
    state['int_critter_loc'] = np.ravel_multi_index((state['xy_critter_loc'][:,0],state['xy_critter_loc'][:,1]), (state['x_size'], state['y_size']))
  else:
    state['xy_critter_loc'] = np.array(xy_critter).reshape((1,2))
    state['int_critter_loc'] = np.ravel_multi_index((state['xy_critter_loc'][:,0],state['xy_critter_loc'][:,1]), (state['x_size'], state['y_size']))
  return state

state = init_state()

def update_critter_location(x, y):
  state['num_moves'] = state['num_moves'] + 1
  state['xy_critter_loc'] = np.array([[x,y]])
  state['int_critter_loc'] = np.ravel_multi_index((state['xy_critter_loc'][:,0],state['xy_critter_loc'][:,1]), (state['x_size'], state['y_size']))

def eating():
  # is the critter on a food patch
  if state['int_critter_loc'][0] in set(state['int_food_loc']):
    #critter is on a food patch
    state['num_eats'] = state['num_eats'] + 1
    # figure out where the new food will go where there isn't already food
    new_food_loc = np.random.choice(list(set(np.arange(state['x_size']*state['y_size'])) - set(state['int_food_loc'])))
    #remove the eaten food
    new_food_loc_set = set(state['int_food_loc'])
    new_food_loc_set.remove(state['int_critter_loc'][0])
    # add the new food
    new_food_loc_set.add(new_food_loc)

    #update state
    state['int_food_loc'] = np.array(np.sort(list(new_food_loc_set)))
    state['xy_food_loc'] = np.vstack(np.unravel_index(state['int_food_loc'], (state['x_size'], state['y_size']))).T
    state['eating_string'] = ('You ate the food here, new food at ' + str(np.unravel_index(new_food_loc, (state['x_size'], state['y_size']))))
  else:
    state['eating_string'] = 'No food here'

def update_state(x, y):
  update_critter_location(x, y)
  eating()
  #print(state_dict['food_locationssUnravel'])

output = widgets.Output()

def plot_fig_from_scratch():
  fig, ax = make_grid(state['x_size'], state['y_size'])
  plot_food(fig, ax, state['xy_food_loc'])
  plot_critter(fig, ax, state['xy_critter_loc'])
  fig.legend(loc = "outside right upper")
  plt.show()

up_button = widgets.Button(description="Up")
down_button = widgets.Button(description="Down")
left_button = widgets.Button(description="Left")
right_button = widgets.Button(description="Right")
random_movement = widgets.Checkbox(
    value=False,
    description='Move Randomly',
    disabled=False,
    indent=False
)

def random_click():
  move = np.random.randint(0,4)
  if move == 0:
    up_button.click()
  elif move == 1:
    down_button.click()
  elif move == 2:
    left_button.click()
  elif move == 3:
    right_button.click()
  else:
    print('should not happen')


class Timer:
  def __init__(self, timeout, callback):
    self._timeout = timeout
    self._callback = callback
  async def _job(self):
    await asyncio.sleep(self._timeout)
    self._callback()
  def start(self):
    self._task = asyncio.ensure_future(self._job())
  def cancel(self):
    self._task.cancel()

click_timer = Timer(5.0, random_click)

def button_output_update(which_button, x, y):
  with output:
    if random_movement.value == True:
      # Timer stores its pending task in `_task` once start() has been called
      if hasattr(click_timer, '_task'):
        click_timer.cancel()
    output.clear_output(wait=True)
    print("Moved " + which_button + ". " + "Critter is now at ({}, {}).".format(x,y))
    print(state['eating_string'])
    print("Moves: {0} \tFood Eaten: {1} \tFood Per Move: {2}".format(state['num_moves'], state['num_eats'], state['num_eats'] / state['num_moves']))
    plot_fig_from_scratch()
    if random_movement.value == True:
      click_timer.start()

def on_up_button_clicked(b):
  which_button = 'up'
  x = state['xy_critter_loc'][0][0]
  y = state['xy_critter_loc'][0][1]
  if y >= state['y_size']-1:
    y = state['y_size'] - 2
  elif y < state['y_size']-1:
    y = y+1
  update_state(x, y)
  button_output_update(which_button, x, y)

def on_down_button_clicked(b):
  which_button = 'down'
  x = state['xy_critter_loc'][0][0]
  y = state['xy_critter_loc'][0][1]
  if y > 0 :
    y = y-1
  elif y <= 0:
    y = 1
  update_state(x, y)
  button_output_update(which_button, x, y)

def on_left_button_clicked(b):
  which_button = 'left'
  x = state['xy_critter_loc'][0][0]
  y = state['xy_critter_loc'][0][1]
  if x > 0 :
    x = x-1
  elif x <= 0:
    x = 1
  update_state(x, y)
  button_output_update(which_button, x, y)

def on_right_button_clicked(b):
  which_button = 'right'
  x = state['xy_critter_loc'][0][0]
  y = state['xy_critter_loc'][0][1]
  if x >= state['x_size']-1:
    x = state['x_size'] - 2
  elif x < state['x_size']-1:
    x = x+1
  update_state(x, y)
  button_output_update(which_button, x, y)

up_button.on_click(on_up_button_clicked)
down_button.on_click(on_down_button_clicked)
left_button.on_click(on_left_button_clicked)
right_button.on_click(on_right_button_clicked)

In [None]:
# @title Other Helper Functions
# @markdown You don't need to worry about how this code works – but you do need to **run the cell**

class dotdict(dict):
  def __getattr__(self, name):
    return self[name]

def content_review(notebook_section: str):
  return DatatopsContentReviewContainer(
    "",  # No text prompt
    notebook_section,
    {
      "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
      "name": "neuro_book",
      "user_key": "xuk960xj",
    },
  ).render()

# Micro 1.1.1.1: Initializing Gridworld

Before we introduce an organism with **behaviour** we're going to build an **environment** for them to behave in. To start, this world will consist of a 7 x 7 grid of cells. Let's make a picture of that and see what it looks like.

In [None]:
############################################################################
## TO DO for students: replace ... with the correct arguments (inputs) in the
## make_grid function below to make a grid of the right size and shape. You can
## use the tool tip by hovering over the word make_grid to find out how to use
## it. You can also use the tool tip to view the source code. How does it work?
raise NotImplementedError("Student exercise: make grid using the make_grid function")
############################################################################

fig, ax = make_grid(...)
plt.show()

In [None]:
# to_remove solution

fig, ax = make_grid(7, 7)
plt.show()

Bonus activity: Tweak the make_grid function in the Plotting Functions cell above to make the gridlines green.

Wow, what a boring environment. Let's add an organism and something for that organism to interact with. We'll start with 10 food items scattered randomly throughout the grid, never more than one food item per cell. To plot these food items we need to initialize their locations by randomly sampling grid coordinates [without replacement](## "never picking the same (x,y) coordinate pair twice"). We'll place the agent roughly in the middle of the grid.
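
As a small illustration of sampling without replacement, the sketch below draws distinct cells on a deliberately different, smaller grid (4 x 3 with 5 items, values chosen only for this example). The seeded generator `rng` is an assumption added for repeatability; the notebook's own `init_loc` uses `np.random.choice` directly.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, assumed here for repeatability
x_size, y_size, num = 4, 3, 5   # illustrative sizes, not the exercise answer

# Draw 5 distinct cell numbers from 0..11; replace=False forbids repeats.
flat = rng.choice(x_size * y_size, size=num, replace=False)

# Convert each flat cell number into an (x, y) pair on the 4 x 3 grid.
xy = np.column_stack(np.unravel_index(flat, (x_size, y_size)))

print(xy.shape)                 # (5, 2)
print(len(set(flat.tolist())))  # 5, so every sampled cell is distinct
```

Sampling flat cell numbers first and converting to coordinates afterwards is what makes "no two items in the same cell" easy to enforce with a single call.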

In [None]:
################################################################################
# TODO for students: replace ... in init_loc(...) to initialize the right number of
# food item locations, in coordinates that make sense for our grid environment
# then comment out or remove the next line.
raise NotImplementedError("Exercise: initialize the food locations with init_loc")
################################################################################
def init_loc(x_size, y_size, num):
  """Returns random 2d grid locations, without replacement,
  in both unraveled/flat coordinates and as xy pairs"""
  int_loc = np.random.choice(x_size * y_size, num, replace=False)
  xy_loc = np.vstack(np.unravel_index(int_loc, (x_size, y_size))).T
  return int_loc, xy_loc

fig, ax = make_grid(7, 7)
int_food_loc, xy_food_loc = init_loc(...)
plot_food(fig, ax, xy_food_loc)

xy_critter_loc = (5,5)
plot_critter(fig, ax, xy_critter_loc)

fig.legend(loc='outside right upper')
plt.show()

In [None]:
#to_remove solution
def init_loc(x_size, y_size, num):
  """Returns random 2d grid locations, without replacement,
  in both unraveled/flat coordinates and as xy pairs"""
  int_loc = np.random.choice(x_size * y_size, num, replace=False)
  xy_loc = np.vstack(np.unravel_index(int_loc, (x_size, y_size))).T
  return int_loc, xy_loc

fig, ax = make_grid(7, 7)
int_food_loc, xy_food_loc = init_loc(7,7,10)
plot_food(fig, ax, xy_food_loc)

xy_critter_loc = (5,5)
plot_critter(fig, ax, xy_critter_loc)

fig.legend(loc='outside right upper')
plt.show()

In [None]:
# @markdown Vibe Check
content_review("Sequence 1.1.1 Micro 1")

---
# Micro 1.1.1.2: Random Eating

Now that we have an environment scattered with food and an organism, let's introduce some behaviour. The organism drifts around the environment randomly (connect to real life situations where this strategy is employed) and eats the food that they happen to stumble upon. When food is eaten, the organism gets a **reward**, in this case a *Food Eaten* point, and a new food item appears randomly somewhere else in the environment (that doesn't already have food). Run the following cell to see what this looks like.

In [None]:
# @title Random Movement
# @markdown You don't need to worry about how this code works, yet – just **run the cell** for now and watch what happens
display(output)

# 30 random button presses, one every ~0.1 seconds
state = init_state(x_size=7, y_size=7, num_food=10)
for ii in range(30):
  random_click()
  time.sleep(0.1)

When the organism is just drifting around randomly, how good is it at eating lots of food? What is its efficiency in terms of food per movement? Now run the cell above a few more times. Does the organism always eat the same amount of food, or does it change between simulation runs?
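
One way to put numbers on this question is to repeat the random walk many times without any widgets or plotting. The sketch below is a simplified stand-in written for this purpose, not the notebook's simulation code: `random_run` is a hypothetical helper that mimics the same rules (four moves, bouncing off walls, eaten food reappearing in a cell that has no food).

```python
import numpy as np

def random_run(rng, size=7, num_food=10, num_moves=30):
    """Food eaten per move for one random walk on a size x size grid."""
    cells = [(x, y) for x in range(size) for y in range(size)]
    food = {cells[i] for i in rng.choice(len(cells), num_food, replace=False)}
    x = y = size // 2  # start roughly in the middle
    eaten = 0
    steps = [(0, 1), (0, -1), (-1, 0), (1, 0)]
    for _ in range(num_moves):
        dx, dy = steps[rng.integers(4)]
        # Bounce off walls: a move that would leave the grid is reversed.
        x = x + dx if 0 <= x + dx < size else x - dx
        y = y + dy if 0 <= y + dy < size else y - dy
        if (x, y) in food:
            eaten += 1
            # New food goes to a cell that currently holds no food.
            empty = [c for c in cells if c not in food]
            food.add(empty[rng.integers(len(empty))])
            food.remove((x, y))
    return eaten / num_moves

rng = np.random.default_rng(0)
rates = np.array([random_run(rng) for _ in range(200)])
print(f"mean {rates.mean():.2f}, std {rates.std():.2f}")
```

The spread of `rates` across runs is the run-to-run variability the question asks about. The mean is a steadier summary of how well random drifting does than any single run.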

In [None]:
# to_remove explanation

"""
The amount of food eaten varies a lot from simulation run to simulation run.
Usually it manages to eat one or two pieces of food, sometimes more, sometimes less.
""";

Before we move on, it's important to test that our simulation is running as we expect. Randomness can make testing hard, but this can be overcome in part by setting up the environment in such a way that the outcome becomes deterministic. In the two cells below, change how the Grid World is initialized in terms of size, shape and number of food items: in one case so that the organism is guaranteed to achieve 100% efficiency in terms of food per move, and in the other so that the organism is guaranteed to achieve 0% efficiency.

In [None]:
###############################################################################
# TODO for students: replace ... in init_state(...) to initialize a grid world
# where the organism is always 100% efficient.
raise NotImplementedError("Exercise: make random movement 100% efficient")
################################################################################

display(output)

# 30 random moves, one every ~0.1 seconds
state = init_state(...)
for ii in range(30):
  random_click()
  time.sleep(0.1)

In [None]:
#to_remove solution

display(output)

# 30 random moves, one every ~0.1 seconds
state = init_state(x_size=3, y_size=2, num_food=5)
for ii in range(30):
  random_click()
  time.sleep(0.1)

In [None]:
###############################################################################
# TODO for students: replace ... in init_state(...) to initialize a grid world
# where the organism is always 0% efficient.
raise NotImplementedError("Exercise: make random movement 0% efficient")
################################################################################

display(output)

# 30 random button presses, one every ~0.1 seconds
state = init_state(...)
for ii in range(30):
  random_click()
  time.sleep(0.1)

In [None]:
#to_remove solution

display(output)

# 30 random button presses, one every ~0.1 seconds
state = init_state(x_size=7, y_size=7, num_food=0)
for ii in range(30):
  random_click()
  time.sleep(0.1)

In [None]:
# @markdown Vibe Check
content_review("Sequence 1.1.1 Micro 2")

---
# Micro 1.1.1.3: Better Than Random Eating
Now it's your turn to control the organism. Run the next cell and see how much more efficient your control of the organism is than random drifting, in terms of food per movement.

In [None]:
# @title Controlled Movement
# @markdown You don't need to worry about how this code works – just **run the cell** and then use the buttons to guide the organism

# user in control
state = init_state(x_size=7, y_size=7, num_food=10)
with output:
  output.clear_output(wait=True)
  print('Press a button')
  plot_fig_from_scratch()
display(widgets.HBox([left_button, widgets.VBox([up_button, down_button]), right_button]), output)

Hopefully you were able to perform much better than a policy of random flailing. Even in this relatively simple, contrived and constrained foraging problem, intelligence helps a lot. What kinds of strategies and heuristics did you use to guide your choice of direction? In some very real sense, the whole purpose of a nervous system and a brain is to solve problems of this kind, i.e., based on inputs from the environment, which actions should be taken to maximize reward.
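
One heuristic you may have used is "head for the closest food". The sketch below writes that rule down as code. The function name `nearest_food_move` and its inputs are assumptions for this illustration, and the rule is only one plausible strategy, not necessarily the best one.

```python
def nearest_food_move(critter, food_cells):
    """Return 'up', 'down', 'left' or 'right' toward the closest food item."""
    cx, cy = critter
    # Manhattan distance counts the button presses needed to reach a cell.
    fx, fy = min(food_cells, key=lambda f: abs(f[0] - cx) + abs(f[1] - cy))
    if fx != cx:
        return "right" if fx > cx else "left"
    # Same column as the target; 'up' is an arbitrary choice if already on it.
    return "up" if fy >= cy else "down"

print(nearest_food_move((3, 3), [(0, 0), (3, 5), (6, 6)]))  # up
```

The input here is exactly what you see on the plot (critter position and food positions) and the output is one button press, which is the "inputs from the environment to actions" mapping described above.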

In [None]:
# @markdown Vibe Check
content_review("Sequence 1.1.1 Micro 3")

---
# Micro 1.1.1.4: Optimal Eating
Finally we'd like to introduce a time traveling super organism, GW7x7-10-30, from the last chapter of this book. GW7x7-10-30 has absolutely mastered 7x7 gridworld, with 10 food items and a 30 round duration. Run the next cell and see how efficient a highly optimized and specialized gridworld organism can be. Are you as good as GW7x7-10-30? If not, you gotta read this book 😉 (If you can't beat the AI, at least learn how to program the AI.)

(Last chapter not made yet so working on a minimal sketch/skeleton version to produce this agent)

In [None]:
# @title Optimal Movement
# @markdown You don't need to worry about how this code works – just **run the cell** to watch the super organism behave. See if you can outperform them.

# master organism in control

In [None]:
# @markdown Vibe Check
content_review("Sequence 1.1.1 Micro 4")

---
# Graveyard


Useful scavenging from github and SO

In [None]:
out = widgets.Output(layout=widgets.Layout(height='300px'))

x = np.linspace(0,1,100)

def update_plot(w):
    with out:
        # Without clear_output(), figures get appended below each other inside
        # the output widget
        # Ah ha! Got it! I need wait=True!
        out.clear_output(wait=True)
        plt.plot(x, x**p_widget.value)
        plt.show()

p_widget = widgets.FloatSlider(min=0, max=2, step=0.1, value = 1)
update_plot([])
p_widget.observe(update_plot)
display(p_widget, out)

In [None]:
import ipywidgets as widgets

x = 0
y = {"x": 0}
b = widgets.Button(description="Do it")

def doit(obj):
    print(x, y)
    # x += 1 # uncommenting makes the above print fail
    y["x"] += 1 # this is ok

b.on_click(doit)
display(b)

In [None]:
# how to get figures to work in this context nicely???
a = widgets.FloatSlider(description='a', min=0, max=10)
b = widgets.FloatSlider(description='b', min=0, max=10)
c = widgets.FloatSlider(description='c', min=0, max=10)

def f(a, b, c):
    print('{}*{}*{}={}'.format(a, b, c, a*b*c))

out = widgets.interactive_output(f, {'a': a, 'b': b, 'c': c})

widgets.HBox([widgets.VBox([a, b, c]), out])

In [None]:
from matplotlib import animation, rc, patches
from IPython.display import HTML

In [None]:
# First set up the figure, the axis, and the plot element we want to animate
fig, ax = plt.subplots()

ax.set_xticks(np.arange(0, 11, 1))
ax.set_yticks(np.arange(0, 11, 1))

# Labels for major ticks
ax.set_xticklabels(np.arange(0, 11, 1))
ax.set_yticklabels(np.arange(0, 11, 1))

# Minor ticks
ax.set_xticks(np.arange(1.5, 10.5, 1), minor=True)
ax.set_yticks(np.arange(1.5, 10.5, 1), minor=True)

# Gridlines based on minor ticks
ax.grid(which='minor', color='grey', linestyle='-', linewidth=2, alpha=0.5)

# Remove minor ticks
ax.tick_params(which='minor', bottom=False, left=False)

ax.set_xlim(( 0.5, 10.5))
ax.set_ylim(( 0.5, 10.5))

#ax.axis("equal")
scatter = ax.scatter([], [])
#critterCircle = plt.Circle((5,5), 0.4)
#dot = ax.add_patch(critterCircle)
#scatter.set_offsets([[5, 5]])

scatter.set_offsets([[3, 11-3]])
#fig.canvas.draw()
#fig.canvas.flush_events()

In [None]:
def init():
    scatter.set_offsets([[5, 5]])
    return scatter,

In [None]:
def animate_dot(i):
    print(i)
    x = i
    y = 11-i
    scatter.set_offsets([[x, y]])
    return scatter,

In [None]:
# call the animator. blit=True means only re-draw the parts that have changed.
anim = animation.FuncAnimation(fig, animate_dot, init_func=init,
                               frames=(np.arange(10)+1), interval=500, blit=True)

In [None]:
HTML(anim.to_html5_video())

In [None]:
rc('animation', html='html5')

In [None]:
anim

These movies are really pretty compared to drawing everything from scratch each time. Need to figure out some way to combine the two eventually, but will keep prototyping in janky mode for now.

In [None]:
#will python find ffmpeg on the github worker?