# Block construction agents

This is the main general analysis file for the [block construction task](https://github.com/cogtoolslab/block_construction). 

The data should be loaded from a dataframe produced by `experiment_runner.py`.

In [None]:
# set up imports
import os
import sys
__file__ = os.getcwd()
proj_dir =  os.path.dirname(os.path.realpath(__file__))
sys.path.append(proj_dir)
utils_dir = os.path.join(proj_dir,'utils')
sys.path.append(utils_dir)
analysis_dir = os.path.join(proj_dir,'analysis')
analysis_utils_dir = os.path.join(analysis_dir,'utils')
sys.path.append(analysis_utils_dir)
agent_dir = os.path.join(proj_dir,'model')
sys.path.append(agent_dir)
agent_util_dir = os.path.join(agent_dir,'utils')
sys.path.append(agent_util_dir)
experiments_dir = os.path.join(proj_dir,'experiments')
sys.path.append(experiments_dir)
df_dir = os.path.join(proj_dir,'results/dataframes')

In [None]:
from analysis.utils.analysis_helper import *
from analysis.utils.analysis_graphs import *
from analysis.utils.analysis_figures import *
import utils.blockworld as bw

In [None]:
import model.utils.decomposition_functions

In [None]:
import random

In [None]:
#inline plots
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

In [None]:
plt.rcParams["figure.figsize"] = (40,7)
plt.rcParams.update({'font.size': 22})

In [None]:
#display all columns
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 20)
pd.set_option('display.max_colwidth', 100)
pd.set_option('display.min_rows', 40)

Let's load the results of the experiment

In [None]:
df_paths = ['subgoal planning full BFS3.pkl']

In [None]:
#load all experiments as one dataframe
df = pd.concat([pd.read_pickle(os.path.join(df_dir,l)) for l in df_paths])
print("Loaded dataframe")

In [None]:
# preprocess the data
preprocess_df(df)

In [None]:
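# consider saving the costly preprocessing
# build the filename from the source names rather than from the string form of the list
preprocessed_name = "_".join(os.path.splitext(p)[0] for p in df_paths) + "_preprocessed.pkl"
df.to_pickle(os.path.join(df_dir, preprocessed_name))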

In [None]:
display(df)

We'll have data for the following agents, agent configurations and on the following silhouettes:

In [None]:
list(smart_short_agent_names(df['agent_attributes'].unique()))

In [None]:
df['agent_attributes'].unique()

In [None]:
df[['agent_label','decomposed_silhouette']]

In [None]:
type(df.iloc[7]['decomposed_silhouette'][0]['decomposition'])

In [None]:
list(df['world'].unique())

In [None]:
df.columns

Let's add human readable labels for nice plotting if the smart labels aren't good enough:

In [None]:
df['agent_attributes_string'] = df['agent_attributes'].astype(str)

In [None]:
#produce dict for agent naming
agents = df['agent_attributes_string'].unique()
agent_labels = dict(zip(list(agents),["XXX"]*len(agents)))
#now manually add names and save back
agent_labels

In [None]:
#save back into agent_labels (just copy and paste...)
agent_labels = {"{'agent_type': 'BFS_Agent', 'scoring_function': 'F1score', 'horizon': 1, 'scoring_type': 'Average'}": '1',
 "{'agent_type': 'BFS_Agent', 'scoring_function': 'random_scoring', 'horizon': 1, 'scoring_type': 'Average'}": 'random',
 "{'agent_type': 'BFS_Agent', 'scoring_function': 'F1score', 'horizon': 2, 'scoring_type': 'Average'}": '2',
 "{'agent_type': 'BFS_Agent', 'scoring_function': 'F1score', 'horizon': 3, 'scoring_type': 'Average'}": '3',
 "{'agent_type': 'BFS_Agent', 'scoring_function': 'F1score', 'horizon': 4, 'scoring_type': 'Average'}": '4'}

In [None]:
#write to dataframe
df['agent_label'] = df['agent_attributes_string'].replace(agent_labels)

In [None]:
#only include big silhouettes
df = df[df['world'].str.contains('int_struct')]

A graphical illustration of the silhouettes used in the dataframe:

In [None]:
illustrate_worlds(df.sort_values('world'))

## Overview of agents
All agents use pure F1 score to judge the value of intermediate states.

| Agent |  | Parameters |
|:--|:--|:--|
| Random (special case of BFS) | Randomly chooses a legal action. | *None* |
| Breadth first search | Agent performs breadth first search on the tree of all possible actions and chooses the sequence of actions that has the highest average reward over the next *planning depth* steps | Horizon: how many steps in advance is the tree of possible actions searched? |
| MCTS | Implements Monte Carlo Tree Search | Horizon: the number of rollouts per run |
| Naive Q learning | Implements naive Q learning with an epsilon-greedy exploration policy | Maximum number of episodes: how many episodes per run |
| A* search | Implements the A* search algorithm. Runs until a winning state is found or an upper limit of steps is reached. Is deterministic. | *None* |
| Beam search | Implements beam search: searches the tree of possible actions, but only keeps the best *beam size* actions at each iteration. Is deterministic. | Beam size: the number of states kept under consideration at every step |
| Construction paper agent | Implements the construction paper agent: it occludes part of the silhouette, uses a lower level agent to build the remainder, then 'slides up' the occluder and builds again... | Lower level agent: the lower level planning agent together with its parameters |
| Subgoal planning agent | Implements planning over `lookahead` many subgoals akin to Correa and Ho, then acts out the first one and plans again. | Lookahead: how many subgoals to plan into the future; c_weight: how to weigh cost against reward |
| Full Subgoal planning agent | Implements planning over all possible subgoals akin to Correa and Ho and then acts out all of them. | c_weight: how to weigh cost against reward |

### Glossary

**Run**: training (if applicable) and running one agent on one particular silhouette.

**Silhouette**: the particular outline (and set of baseblocks) that the agent has to reconstruct.

**State**: state of the blockworld environment consisting of the blocks that have already been placed in it.

## Qualitative visualizations

### Build animation

In [None]:
#parameters for random run
agent_type = 'Subgoal_Planning_Agent'
lookahead = 1
c_weight = 0.1

In [None]:
# select the matching runs once; the selection does not change between iterations
a_w_df = df.query("lookeahead == @lookahead and c_weight == @c_weight and world == 'int_struct_11' and agent_type == @agent_type")
for c in range(10):
    # pick one run at random and animate it
    r_ID = random.choice(a_w_df['run_ID'].unique())
    random_run = a_w_df[a_w_df['run_ID']==r_ID]
    # the last row of a run holds the reason the run ended
    print(random_run.tail(1)['world_failure_reason'].item())
    build_animation(random_run, f"{agent_type} la: {lookahead} cw: {c_weight} nr. {c+1}")

## Success

### Success vs efficiency
How does the success of an agent relate to its computational efficiency?

In [None]:
scatter_success_cost(df)

### Rate of perfect reconstruction per agent

How often does an agent succeed in getting a perfect reconstruction?

In [None]:
mean_win_per_agent(df)

### Rate of perfect reconstruction per agent over silhouettes
How often does an agent achieve a perfect reconstruction in a particular world?

In [None]:
mean_win_per_agent_over_worlds(df)

### F1 score
What [F1 score](https://en.wikipedia.org/wiki/F1_score) does the agent achieve? Since F1 score decreases if an agent keeps building after being unable to perfectly recreate the structure, we look at the peak of F1 score for every run. 

So here is the average peak F1 score per agent conditioned on outcome of the run.
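
As a minimal sketch of why the peak is used, assume each state is a boolean occupancy grid and F1 is computed cell-wise against the target silhouette. The helper names `f1_score` and `peak_f1` are hypothetical and are not the functions used by `mean_peak_score_per_agent`.

```python
import numpy as np

def f1_score(built, target):
    """Cell-wise F1 between two boolean occupancy grids of the same shape (assumed representation)."""
    built = np.asarray(built, dtype=bool)
    target = np.asarray(target, dtype=bool)
    true_pos = np.logical_and(built, target).sum()
    if true_pos == 0:
        return 0.0
    precision = true_pos / built.sum()   # share of built cells that lie inside the silhouette
    recall = true_pos / target.sum()     # share of silhouette cells that are covered
    return float(2 * precision * recall / (precision + recall))

def peak_f1(states, target):
    """Highest F1 reached at any step of a run."""
    return max(f1_score(state, target) for state in states)

# Hypothetical run: the agent completes the silhouette, then keeps building outside of it.
target = [[0, 0], [1, 1]]
states = [[[0, 0], [1, 0]], [[0, 0], [1, 1]], [[1, 0], [1, 1]]]
print(f1_score(states[-1], target), peak_f1(states, target))
```

In this toy run the final state scores lower than the peak, which is the effect the peak measure is meant to remove.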

In [None]:
mean_peak_score_per_agent(df)

#### F1 score over silhouettes
What is the peak F1 score for a particular silhouette?

In [None]:
mean_peak_F1_per_agent_over_worlds(df)

## Failure kinds
In runs where the agent fails to achieve a perfect reconstruction, what is the reason for the failure?

**Full** indicates that no further block can be placed.

**Unstable** indicates that the structure has collapsed.

**Did not finish** means that the agent hasn't finished building, either because it terminated or because it reached the limit on the number of steps (40).

In [None]:
mean_failure_reason_per_agent(df)

### Failure kinds over silhouettes
What kinds of failure does a certain agent tend to produce on a certain silhouette?

In [None]:
mean_failure_reason_per_agent_over_worlds(df)

### Holes
The most common failure mode is leaving holes in the structure that should be covered with a block, but that the agent can't reach because a block has been placed above them. Here, the measure is defined as the number of grid cells that are in the silhouette and not built on, but whose cell directly above is built. The higher this number, the more or the larger the holes that the agent leaves.
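
The definition above can be sketched on plain occupancy grids. This is an illustration under assumptions, not the implementation of `bw.holes`: `count_holes` is a hypothetical helper, grids are boolean arrays, and row 0 is taken to be the top of the grid.

```python
import numpy as np

def count_holes(built, target):
    """Count cells that are in the silhouette, not built, and have a built cell directly above.

    Assumes boolean grids of equal shape with row 0 as the top row.
    """
    built = np.asarray(built, dtype=bool)
    target = np.asarray(target, dtype=bool)
    # above_built[r, c] is True when the cell one row higher (r - 1, c) is built;
    # the top row has nothing above it and stays False.
    above_built = np.zeros_like(built)
    above_built[1:, :] = built[:-1, :]
    return int(np.sum(target & ~built & above_built))

# Hypothetical 2x2 example: the top-left cell is built over an empty silhouette cell.
target = [[1, 1], [1, 1]]
built = [[1, 0], [0, 0]]
print(count_holes(built, target))
```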

In [None]:
mean_score_per_agent(df,scoring_function=bw.holes)

### Heatmaps
To get a qualitative sense of what the agents are doing, let's look at the heatmap of all runs of an agent on a particular silhouette at the moment of peak F1. Looking at final states will include many runs in which the agent has simply filled the entire area. 
Brighter colors indicate higher average occupancy of a cell. The target is shown on the left.

Note that if the final block placed was unstable it is still included here.

In [None]:
heatmaps_at_peak_per_agent_over_world(df)

In [None]:
heatmaps_per_agent_over_world(df)

## Greediness
Do agents prefer a greedy policy (using large blocks to cover much area) or a conservative policy of using smaller blocks and more steps?

### Number of steps taken
On average, how many steps does an agent take before the end of the run? 

Looking at the average number of steps for runs with perfect reconstruction ("win") tells us whether an agent builds with larger or smaller blocks. 

Looking at the average number of steps for runs with failed reconstructions ("fail") tells us whether the failures occur early or late in the process. Since many failures are due to the agent simply filling everything with blocks, this number is likely high and not very informative.

In [None]:
avg_steps_to_end_per_agent(df)

### Growth rate of F1 score
What is the mean F1 score over every action up to the peak F1 score of a particular run? For runs conditioned on perfect reconstruction, this measures how quickly F1 grows.

The higher this number is, the more F1 score is gained early on in the run (i.e., a logarithmic-looking curve of F1 score). 
Note that the bars conditioned on winning runs all have a peak F1 score of 1 and are thus directly comparable.
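
A small sketch of this measure, assuming a run is given as a list of per-step F1 scores. `mean_f1_to_peak` is a hypothetical name and not the helper behind `mean_avg_area_under_curve_to_peakF1_per_agent`.

```python
import numpy as np

def mean_f1_to_peak(f1_per_step):
    """Mean F1 over all steps up to and including the first step at which F1 peaks."""
    scores = np.asarray(f1_per_step, dtype=float)
    peak_idx = int(np.argmax(scores))  # argmax returns the first occurrence of the maximum
    return float(scores[:peak_idx + 1].mean())

# Two hypothetical winning runs of equal length: one gains F1 early, one gains it late.
early = [0.6, 0.9, 1.0]
late = [0.1, 0.2, 1.0]
print(mean_f1_to_peak(early), mean_f1_to_peak(late))
```

Both runs peak at 1, but the run that gains F1 early has the higher mean, which is what the bars compare.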

In [None]:
mean_avg_area_under_curve_to_peakF1_per_agent(df)

### Average F1 score over time
What is the average F1 score over time? 

A decreasing line indicates that the agent keeps choosing the least-bad action once a perfect reconstruction is no longer possible.

For runs that terminate early, the last F1 score is carried forward so that the later part of the graph is not dominated by outliers. Thus, a perfect reconstruction at step 8 is counted as a score of 1 for the last 12 steps. 
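
The carry-forward step can be sketched as follows, assuming each run is a non-empty list of per-step F1 scores. `pad_with_last` and `mean_f1_over_time` are hypothetical helpers, not the code behind `graph_mean_F1_over_time_per_agent`.

```python
import numpy as np

def pad_with_last(scores, n_steps):
    """Extend a run to n_steps by repeating its last F1 score (assumes a non-empty run)."""
    scores = list(scores)[:n_steps]
    return scores + [scores[-1]] * (n_steps - len(scores))

def mean_f1_over_time(runs, n_steps):
    """Mean F1 per step across runs after padding every run to the same length."""
    padded = np.array([pad_with_last(run, n_steps) for run in runs], dtype=float)
    return padded.mean(axis=0)

# Hypothetical data: the first run wins at step 2, the second keeps building for 4 steps.
runs = [[0.5, 1.0], [0.2, 0.4, 0.6, 0.5]]
print(mean_f1_over_time(runs, n_steps=4))
```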

In [None]:
graph_mean_F1_over_time_per_agent(df)

### Average block size over time
What is the average size of the block placed at a certain step?

Note that only runs that aren't finished at a given step are included in the calculation of the mean/std, so later steps might be less informative.

In [None]:
graph_avg_blocksize_over_time_per_agent(df.query("lookeahead == 1"))

## Consistency
### Average pairwise Euclidean distance of block placements
How consistent are different runs of the same agent on the same silhouette?

Here, we measure the average pairwise distance for an agent across runs on the same silhouette. A lower score indicates higher similarity. A score of 0 indicates that all runs were identical—this occurs when the agent is deterministic.

>For any pair of action sequences, we define the “raw action dissimilarity” as the mean Euclidean distance between corresponding pairs of [x, y, w, h] action vectors. When two sequences are of different lengths, we evaluate this metric over the first k actions in both, where k represents the length of the shorter sequence.
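
The quoted definition can be sketched as below. This is an illustration under the assumption that each run is a non-empty list of [x, y, w, h] vectors; the function names are hypothetical and do not mirror `mean_pairwise_raw_euclidean_distance_between_runs`.

```python
from itertools import combinations

import numpy as np

def raw_action_dissimilarity(seq_a, seq_b):
    """Mean Euclidean distance between corresponding [x, y, w, h] actions.

    Only the first k actions are compared, where k is the length of the shorter sequence.
    """
    k = min(len(seq_a), len(seq_b))
    a = np.asarray(seq_a[:k], dtype=float)
    b = np.asarray(seq_b[:k], dtype=float)
    return float(np.linalg.norm(a - b, axis=1).mean())

def mean_pairwise_dissimilarity(sequences):
    """Average dissimilarity over all unordered pairs of runs (needs at least two runs)."""
    pairs = combinations(sequences, 2)
    return float(np.mean([raw_action_dissimilarity(a, b) for a, b in pairs]))

# Hypothetical runs on the same silhouette.
run_a = [[0, 0, 1, 2], [1, 0, 2, 1]]
run_b = [[0, 0, 1, 2], [4, 4, 2, 1], [5, 5, 1, 1]]
print(raw_action_dissimilarity(run_a, run_b))
print(mean_pairwise_dissimilarity([run_a, run_a, run_b]))
```

Because every unordered pair of runs is compared, the number of comparisons grows quadratically with the number of runs.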

In [None]:
#The number of pairs scales quadratically with the number of runs, so this takes a really long time
mean_pairwise_raw_euclidean_distance_between_runs(df)

### Trajectory graph
The trajectory graph (adapted from [block_construction](https://github.com/cogtoolslab/block_construction)) shows the path that all runs of a single agent on a particular silhouette take through state space. The y axis orders states by F1 score. The size of a node indicates how common a certain state is. The color of an edge indicates whether it ends in failure (red) or in at least one perfect reconstruction.

In [None]:
trajectory_per_agent_over_world(df)

## Locality bias
### Proportion of local block placements
Does the agent prefer to place a block on top of or next to the last placed block?

The score is calculated by looking at what percentage of blocks placed during a run touch (either top/bottom or sides, not corners) the block placed immediately before. 
A score of 1 indicates that every block touched the block placed immediately before it; a score of 0 indicates that the agent switched to a different location at every opportunity.
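
A minimal sketch of this score, assuming blocks are axis-aligned rectangles given as (x, y, w, h) with (x, y) as one corner on an integer grid. `touches` and `proportion_touching_last` are hypothetical helpers, not the code behind `mean_touching_last_block_per_agent`.

```python
def interval_overlap(lo_a, hi_a, lo_b, hi_b):
    """Length of the overlap of two intervals; 0 if they only meet, negative if there is a gap."""
    return min(hi_a, hi_b) - max(lo_a, lo_b)

def touches(block_a, block_b):
    """True if two (x, y, w, h) rectangles share part of an edge; corner contact does not count."""
    x1, y1, w1, h1 = block_a
    x2, y2, w2, h2 = block_b
    x_overlap = interval_overlap(x1, x1 + w1, x2, x2 + w2)
    y_overlap = interval_overlap(y1, y1 + h1, y2, y2 + h2)
    side_by_side = x_overlap == 0 and y_overlap > 0
    stacked = y_overlap == 0 and x_overlap > 0
    return side_by_side or stacked

def proportion_touching_last(blocks):
    """Share of blocks (from the second one on) that touch the block placed just before."""
    pairs = list(zip(blocks, blocks[1:]))
    if not pairs:
        return 0.0
    return sum(touches(prev, cur) for prev, cur in pairs) / len(pairs)

# Hypothetical run: stack, jump to a new location, then build next to the last block.
run = [(0, 0, 2, 1), (0, 1, 1, 2), (4, 0, 1, 2), (5, 0, 2, 1)]
print(proportion_touching_last(run))
```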

In [None]:
mean_touching_last_block_per_agent(df)

## Order
Do agents show a bias in which order different parts of the structure are built? To analyze this, here are the heatmaps per silhouette and per agent that show for each cell the average index of the block in that cell. A value of 3.2 would indicate that the block in that cell was placed on average as the 3.2rd block in that run.
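
One way to compute such a heatmap, sketched under the assumption that each run is an ordered list of (x, y, w, h) blocks on an integer grid. `mean_block_index_heatmap` is a hypothetical helper and not the code behind `heatmaps_block_index_per_agent_over_world`.

```python
import numpy as np

def mean_block_index_heatmap(runs, grid_shape):
    """Per-cell mean of the 1-based placement index of the block covering that cell.

    Cells never covered in any run are returned as NaN.
    """
    index_sum = np.zeros(grid_shape)
    counts = np.zeros(grid_shape)
    for blocks in runs:
        for index, (x, y, w, h) in enumerate(blocks, start=1):
            index_sum[y:y + h, x:x + w] += index
            counts[y:y + h, x:x + w] += 1
    # Suppress the 0/0 warning for cells that were never covered.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, index_sum / counts, np.nan)

# Two hypothetical runs that place the same two blocks in opposite order.
runs = [[(0, 0, 1, 1), (1, 0, 1, 1)], [(1, 0, 1, 1), (0, 0, 1, 1)]]
print(mean_block_index_heatmap(runs, grid_shape=(1, 2)))
```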

In [None]:
heatmaps_block_index_per_agent_over_world(df)

## Planning cost
### How many states are evaluated during planning?
How many states are evaluated during planning? This is a proxy for how expensive and effective the planning of an agent is. 

Low scores for the runs conditioned on perfect reconstructions indicate that often when a solution can be found, it can be found quickly.

In [None]:
total_avg_states_evaluated_per_agent(df)

## The effect of tool use
What effect does the use of a tool have?

### Pairwise efficiency
If we run an agent with and without tools, what is the rate of perfect reconstructions it achieves?

In [None]:
#manually fill the list of agent pairs with (tool, no tool, label)
agent_pairs = (
    [
        (
       "{'agent_type': 'Construction_Paper_Agent', 'decomposition_function': 'random_1_4', 'lower level: agent_type': 'BFS_Agent', 'lower level: scoring_function': 'random_scoring', 'lower level: horizon': 1, 'lower level: scoring_type': 'Final_state', 'lower level: random_seed': None}",
       "{'agent_type': 'BFS_Agent', 'scoring_function': 'random_scoring', 'horizon': 1, 'scoring_type': 'Final_state'}",
       "Random"),
       (
       "{'agent_type': 'Construction_Paper_Agent', 'decomposition_function': 'random_1_4', 'lower level: agent_type': 'BFS_Agent', 'lower level: scoring_function': 'silhouette_score', 'lower level: horizon': 1, 'lower level: scoring_type': 'Final_state', 'lower level: random_seed': None}",
       "{'agent_type': 'BFS_Agent', 'scoring_function': 'silhouette_score', 'horizon': 1, 'scoring_type': 'Final_state'}",
       "Horizon 1"),
       (
       "{'agent_type': 'Construction_Paper_Agent', 'decomposition_function': 'random_1_4', 'lower level: agent_type': 'BFS_Agent', 'lower level: scoring_function': 'silhouette_score', 'lower level: horizon': 2, 'lower level: scoring_type': 'Final_state', 'lower level: random_seed': None}",
       "{'agent_type': 'BFS_Agent', 'scoring_function': 'silhouette_score', 'horizon': 2, 'scoring_type': 'Final_state'}",
       "Horizon 2")]
)

In [None]:
scatter_success_pairs(df,agent_pairs)

## Subgoal planning

### How many subgoals are found?
This graph displays the mean number of actual subgoals per run (as in: subgoals passed to the lower level agent, ignoring lookahead) per agent. Note that this only works for agents that act out one subgoal before having to plan again. 

In [None]:
mean_num_subgoals_per_agent(df)

### What is the cost of subgoal planning?

In [None]:
#TODO subgoal planning bar graph

In [None]:
df

In [None]:
df['ratio_successful'] = df['_all_sequences'].apply(lambda x:len([s for s in x if s.complete()])/len(x))

In [None]:
df[['world','all_sequences_planning_cost','lower level: horizon','lower level: scoring_function','ratio_successful']].sort_values(by=['world','lower level: scoring_function','lower level: horizon'])

In [None]:
seqs = df.query("all_sequences_planning_cost == 0")['_all_sequences'].head(1).item()

In [None]:
df.columns

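In [None]:
def get_names(x):
    """Return the subgoal names of a sequence, or the value itself if it has no names() method."""
    try:
        return x.names()
    except AttributeError:
        return x

In [None]:
# get_names must be defined before it is applied
df['seq_names'] = df['_chosen_subgoal_sequence'].apply(get_names)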

In [None]:
df.query("c_weight == 0")