# Formalizing communicative efficiency as program abstraction

In the previous section, we saw that Architects used increasingly concise language to describe the scenes they were viewing. In particular, we saw that the entities they referred to became increasingly complex, from individual blocks to entire towers.  

Here we try to make sense of this process through the lens of *abstraction learning*. We assume that, as people are exposed to scenes that share a number of elements, they acquire *abstractions*: patterns they can use to represent the scenes more efficiently (in perception or in language).

In this section, we formalize the notion of abstraction with the concept of *program abstraction*. By representing scenes as simple computer programs, we can apply abstraction learning algorithms to propose the set of abstractions participants have in their heads after seeing each scene.

In [1]:
import os
import sys
import urllib, io
os.getcwd()

import numpy as np
import scipy.stats as stats
import pandas as pd

import pymongo as pm
from collections import Counter
import json
import re
import ast

from PIL import Image, ImageOps, ImageDraw, ImageFont 

from io import BytesIO
import base64

from IPython.display import clear_output

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", message="numpy.dtype size changed")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed")

## Representing block towers as programs

We assume a basic level of understanding of block-building concepts, encapsulated by a *domain-specific language* (DSL).

Our *base DSL* (adapted from the [DreamCoder building task](https://arxiv.org/abs/2006.08381)) contains the following primitives:
- **h**: horizontal domino
- **v**: vertical domino
- **r_x**: move right x places, where x in {1,2,3,4,5,6,7,8,9}
- **l_x**: move left x places,  where x in {1,2,3,4,5,6,7,8,9}
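
To make the primitives concrete, here is a minimal sketch of an interpreter for base-DSL programs. Several details are assumptions made for illustration rather than part of the DSL definition above: programs are written as space-separated tokens, a horizontal domino occupies 2×1 grid cells and a vertical domino 1×2, each block is placed with its left edge at the cursor, and blocks drop until they rest on whatever is below them. The name `run_program` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed footprints (width, height) in grid cells; illustrative only.
BLOCK_DIMS = {"h": (2, 1), "v": (1, 2)}
# Movement tokens: r_x or l_x with x in 1..9.
MOVE = re.compile(r"^([rl])_([1-9])$")


def run_program(program: str) -> List[Tuple[str, int, int]]:
    """Interpret a space-separated base-DSL program.

    Returns one (block, x, y) triple per placed block, where (x, y) is the
    bottom-left cell of the block under the assumed drop rule.
    """
    cursor = 0
    heights: Dict[int, int] = {}  # column index -> current stack height
    placed: List[Tuple[str, int, int]] = []
    for token in program.split():
        if token in BLOCK_DIMS:
            width, height = BLOCK_DIMS[token]
            columns = range(cursor, cursor + width)
            # The block rests on the tallest column it covers.
            y = max(heights.get(c, 0) for c in columns)
            for c in columns:
                heights[c] = y + height
            placed.append((token, cursor, y))
            continue
        match = MOVE.match(token)
        if match is None:
            raise ValueError(f"unknown token: {token!r}")
        step = int(match.group(2))
        cursor += step if match.group(1) == "r" else -step
    return placed


# Two vertical dominoes side by side with a horizontal domino across the top.
print(run_program("v r_1 v l_1 h"))
# [('v', 0, 0), ('v', 1, 0), ('h', 0, 2)]
```

Under these assumptions, the five-token program `v r_1 v l_1 h` describes a small arch-like tower: the kind of recurring unit that a learned abstraction could later name with a single token.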

## Library learning


In this section we formalize abstraction learning as the acquisition of *program abstractions*. Programs provide a natural analog of abstractions: program fragments, which are substrings of programs expressed in our *base DSL*.

By augmenting the base DSL with additional program fragments, we create new libraries that can express individual block towers more efficiently (i.e. with fewer tokens). This benefit comes at the cost of storing the new abstractions in memory.

Here we aim to capture the change in DSL across trials, as people are exposed to more and more scenes (programs) that share subunits (program fragments). For each participant, we run the Abstraction phase from [DreamCoder](https://arxiv.org/abs/2006.08381) to propose new fragments after every trial.
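
The trade-off between shorter programs and a larger library can be illustrated with a toy stand-in for the abstraction step. This is not the DreamCoder algorithm: it simply counts repeated token substrings in the programs seen so far and scores each candidate by tokens saved minus tokens needed to store it. All names (`count_fragments`, `net_saving`, `propose_fragment`) and the scoring rule are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Fragment = Tuple[str, ...]


def count_fragments(programs: Sequence[Sequence[str]],
                    min_len: int = 2, max_len: int = 5) -> Counter:
    """Count every contiguous token substring of length min_len..max_len.

    Overlapping occurrences are counted separately, so counts can slightly
    overstate how often a fragment could actually be substituted.
    """
    counts: Counter = Counter()
    for prog in programs:
        for n in range(min_len, max_len + 1):
            for i in range(len(prog) - n + 1):
                counts[tuple(prog[i:i + n])] += 1
    return counts


def net_saving(fragment: Fragment, count: int) -> int:
    """Tokens saved by naming the fragment, minus the cost of storing it.

    Each use replaces len(fragment) tokens with one token; storing the
    fragment in the library is assumed to cost len(fragment) tokens.
    """
    return count * (len(fragment) - 1) - len(fragment)


def propose_fragment(programs: Sequence[Sequence[str]]) -> Optional[Fragment]:
    """Return the fragment with the largest positive net saving, if any."""
    counts = count_fragments(programs)
    if not counts:
        return None
    fragment, count = max(counts.items(), key=lambda kv: net_saving(*kv))
    return fragment if net_saving(fragment, count) > 0 else None


# Hypothetical trial sequence for one participant (token lists in the base DSL).
trials: List[List[str]] = [
    ["v", "r_1", "v", "l_1", "h"],
    ["v", "r_1", "v", "l_1", "h", "r_2", "h"],
]

# Propose a fragment after every trial, using all programs seen so far.
for t in range(1, len(trials) + 1):
    print(t, propose_fragment(trials[:t]))
```

In this toy example nothing is proposed after the first trial, because a fragment used once never repays its storage cost; after the second trial the shared five-token unit becomes worth storing.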




In [None]:
# learned libraries


## Program generation

Now we have a library of abstractions available to each participant at each trial. Next we need to know how people use these abstractions to represent each scene. In other words, what is the program, expressed in the current DSL, that most efficiently expresses the block tower the participant is looking at?

How people search for programs using their abstraction libraries is an interesting question in itself. Here, however, our focus is on a different question: how do people choose between more or less efficient ways of expressing a concept, given uncertainty about how 

It's therefore most important for us to find the single most efficient program that represents each tower. Fortunately for us, DreamCoder libraries uniquely determine this most efficient program. Programs involving larger abstractions can be found through enumeration. For less compressed programs, enumeration takes too long, but we can simply swap the learned abstractions from our DSLs into the base-DSL program through text-matching.
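
The text-matching step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analyses: programs are token lists, a library maps a hypothetical abstraction name (such as `f0`) to the base-DSL tokens it stands for, and matching proceeds greedily from left to right, trying longer fragments first. Greedy substitution is not guaranteed to find the shortest rewrite when fragments overlap.

```python
from typing import Dict, List


def rewrite_with_library(program: List[str],
                         library: Dict[str, List[str]]) -> List[str]:
    """Replace occurrences of library fragments with their names.

    Scans left to right; at each position the longest matching fragment
    wins. Tokens that match no fragment are copied through unchanged.
    """
    # Try longer fragments first so larger abstractions take priority.
    fragments = sorted(library.items(), key=lambda kv: len(kv[1]), reverse=True)
    rewritten: List[str] = []
    i = 0
    while i < len(program):
        for name, body in fragments:
            if body and program[i:i + len(body)] == body:
                rewritten.append(name)
                i += len(body)
                break
        else:
            # No fragment starts at this position.
            rewritten.append(program[i])
            i += 1
    return rewritten


# Hypothetical library with two learned fragments.
library = {
    "f0": ["v", "r_1", "v", "l_1", "h"],
    "f1": ["v", "r_1"],
}
base_program = ["v", "r_1", "v", "l_1", "h", "r_2", "h"]

compressed = rewrite_with_library(base_program, library)
print(compressed, len(base_program), "->", len(compressed), "tokens")
```

Here the seven-token base program is rewritten as `f0 r_2 h`, three tokens, because the longer fragment `f0` is tried before `f1`.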

