## Sampling the space of pure data 



One easy way to produce sample data is with **ap co**, which repeats its arguments once for each input atom...

In [5]:
ap co a : 1 2 3 4 
ap co a b : 1 2 3 4 
ap co (:) : 1 2 3 4 

a a a a
a b a b a b a b
◎ ◎ ◎ ◎

In most sampling situations, we'll want some specified collection d1, d2, d3, ... where d1, d2, d3 are each data.  To produce such a collection, you don't want to produce d1 d2 d3, because concatenation merges the items into a single data and destroys the boundaries between them.  Instead, an easy solution is to produce a sequence 

* (:d1) (:d2) (:d3)

where each data in the collection has its own container.  We'll do a bit of this by hand...

In [6]:
(put : x y z ) (put : a b c )

(:x y z) (:a b c)

In [7]:
Def repn : {rep A : first B : nat : 0} 



In [8]:
repn a : 5 

0 1 2 3 4

In [9]:
ap {put : repn a:B} : first 5 : nat : 0 

◎ (:0) (:0 1) (:0 1 2) (:0 1 2 3)
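
The packaging idea above can also be pictured outside coda. The following Python sketch is only an analogy: Python lists stand in for coda data, and the helper names `concatenate` and `package` are hypothetical, not coda operators. It shows why concatenation loses the boundaries between items while per-item containers keep them.

```python
# Analogy only: Python lists stand in for coda data.
# The names concatenate and package are hypothetical, not coda operators.

def concatenate(items):
    """Join the items end to end; the boundaries between them disappear."""
    flat = []
    for item in items:
        flat.extend(item)
    return flat

def package(items):
    """Wrap each item in its own container, like (:d1) (:d2) (:d3)."""
    return [tuple(item) for item in items]

d1, d2 = ["x", "y", "z"], ["a", "b", "c"]

flat = concatenate([d1, d2])    # one run of six atoms, boundary lost
packed = package([d1, d2])      # two containers, boundary kept

print(flat)
print(packed)
```

After concatenation there is no way to tell where the first item ended, whereas the packaged form can still be processed one item at a time.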

For convenience, **sample.odd** and **sample.even** produce odd-length and even-length sequences of the simplest atom (:) ("Hydrogen"), packaged as above...

In [10]:
sample.odd : 4 
sample.even : 4

(:◎) (:◎ ◎ ◎) (:◎ ◎ ◎ ◎ ◎) (:◎ ◎ ◎ ◎ ◎ ◎ ◎)
◎ (:◎ ◎) (:◎ ◎ ◎ ◎) (:◎ ◎ ◎ ◎ ◎ ◎)

The **sample.pure** operator lets one sample the whole space of pure data via 

* sample.pure : width depth 

where **width** is the maximum length of a data sequence and **depth** is the maximum nesting depth.  For example...

In [11]:
#
#  sample.pure : <width> <depth> 
#
sample.pure : 2 2 

◎ (:◎) (:(:◎)) (:(:◎ ◎)) (:𝟬) (:𝝞) (:◎◎) (:) (:◎) (:◎◎) (:◎ ◎) (:◎ (:◎)) (:◎ (:◎ ◎)) (:◎ 𝟬) (:◎ 𝝞) (:◎ ◎◎) (:◎ ) (:◎ ◎) (:◎ ◎◎) (:(:◎) ◎) (:(:◎) (:◎)) (:(:◎) (:◎ ◎)) (:(:◎) 𝟬) (:(:◎) 𝝞) (:(:◎) ◎◎) (:(:◎) ) (:(:◎) ◎) (:(:◎) ◎◎) (:(:◎ ◎) ◎) (:(:◎ ◎) (:◎)) (:(:◎ ◎) (:◎ ◎)) (:(:◎ ◎) 𝟬) (:(:◎ ◎) 𝝞) (:(:◎ ◎) ◎◎) (:(:◎ ◎) ) (:(:◎ ◎) ◎) (:(:◎ ◎) ◎◎) (:𝟬 ◎) (:𝟬 (:◎)) (:𝟬 (:◎ ◎)) (:𝟬 𝟬) (:𝟬 𝝞) (:𝟬 ◎◎) (:𝟬 ) (:𝟬 ◎) (:𝟬 ◎◎) (:𝝞 ◎) (:𝝞 (:◎)) (:𝝞 (:◎ ◎)) (:𝝞 𝟬) (:𝝞 𝝞) (:𝝞 ◎◎) (:𝝞 ) (:𝝞 ◎) (:𝝞 ◎◎) (:◎◎ ◎) (:◎◎ (:◎)) (:◎◎ (:◎ ◎)) (:◎◎ 𝟬) (:◎◎ 𝝞) (:◎◎ ◎◎) (:◎◎ ) (:◎◎ ◎) (:◎◎ ◎◎) (: ◎) (: (:◎)) (: (:◎ ◎)) (: 𝟬) (: 𝝞) (: ◎◎) (: ) (: ◎) (: ◎◎) (:◎ ◎) (:◎ (:◎)) (:◎ (:◎ ◎)) (:◎ 𝟬) (:◎ 𝝞) (:◎ ◎◎) (:◎ ) (:◎ ◎) (:◎ ◎◎) (:◎◎ ◎) (:◎◎ (:◎)) (:◎◎ (:◎ ◎)) (:◎◎ 𝟬) (:◎◎ 𝝞) (:◎◎ ◎◎) (:◎◎ ) (:◎◎ ◎) (:◎◎ ◎◎)

The direct approach is rarely useful because the universe of pure data grows extremely rapidly with **width** and **depth**, so much so that

* sample.pure : 2 3 

is already challenging for a laptop: by the table below, it contains 68,583,243 items.

### Size of sample.pure versus width and depth:

|         | width 1  | width 2 | width 3 | width 4  | width 5 |
|---------|---------:|--------:|--------:|---------:|--------:|
|depth 1  |    2     |    3    |    4    |     5    |    6    |
|depth 2  |    5     |     91   |    4369   |   406,901    |    62,193,781    |
|depth 3  |    26     |     68,583,243   |    ?    |     ?    |    ?    |
|depth 4  |    677     |     ?   |    ?    |     ?    |    ?    |
|depth 5  |    458,330     |     ?   |    ?    |     ?    |    ?    |
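
The filled-in entries of the table are consistent with a simple recurrence. This is an observation inferred from the numbers above, not a formula taken from coda itself: if a data is a sequence of at most `width` codas, and each coda (A:B) pairs two data of the previous depth, then the count at each depth is a sum of even powers of the count one level down. A minimal Python sketch (the function name `pure_count` is hypothetical) reproduces every known entry:

```python
def pure_count(width, depth):
    """Count pure data with at most `width` codas and nesting depth <= `depth`.

    Assumed recurrence (inferred from the table, not taken from coda itself):
    a data is a sequence of 0..width codas, and each coda (A:B) pairs two
    data of the previous depth.
    """
    n = 1  # depth 0: only the empty data
    for _ in range(depth):
        codas = n * n  # number of choices for one coda (A:B)
        n = sum(codas ** k for k in range(width + 1))
    return n

# Reproduce the entries that are filled in above.
known = {
    (1, 1): 2, (2, 1): 3, (3, 1): 4, (4, 1): 5, (5, 1): 6,
    (1, 2): 5, (2, 2): 91, (3, 2): 4369, (4, 2): 406901, (5, 2): 62193781,
    (1, 3): 26, (2, 3): 68583243,
    (1, 4): 677, (1, 5): 458330,
}
for (width, depth), expected in known.items():
    assert pure_count(width, depth) == expected

print(len(str(pure_count(3, 3))))  # number of digits in the width 3, depth 3 entry
```

If the recurrence holds beyond the known entries, the width 3, depth 3 entry is already roughly 7 × 10^21, which would explain why the remaining cells are left as "?".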

Of course, the space of pure data includes all data, meaning all mathematical/logical/computational objects.  Undecided data ("variables"), for example, already appear in sample.pure : 2 2, which you can detect by counting, for example.

In [12]:
ap {count : get : B} : sample.pure : 2 2 

0 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

### Practical searching 

Although **sample.pure** samples all pure data and, therefore, all mathematical objects, these samples are typically far too large to be practically helpful.  The operator **sample.data** is much more practical.  It takes a sequence of codas and a width.

* sample.data <...sequence of codas..> : width 

A few examples will illustrate. 

In [13]:
#
#   Just one argument coda "a" and width <= 5 gives 
#
sample.data a : 5 

(:a) (:a a) (:a a a a) (:a a a a a) (:a a a) ◎

In [14]:
#
#   Note that the sample is delivered, as usual, as (:data1) (:data2) ... 
#
#   If you provide two codas, a and b, all sequences of a and b up to the given width are included. 
#
sample.data a b : 5

(:a a b a b) (:a a a b a) (:a a a) (:b b b b a) (:b b b a a) (:b a b a) (:b a a) (:a a b a a) (:b a a a a) (:a b b a b) (:b a b b b) (:a a b a) (:a a a b) (:b a b b) (:b a a b a) (:b a b) (:a b b b a) (:a b a b) (:b a a b) (:b a b a b) ◎ (:a b b b b) (:b a) (:a b a b b) (:b b a a a) (:b a b a a) (:a a a a) (:a b a) (:a a a b b) (:b b b a) (:b b a b) (:b a a b b) (:a a a a a) (:b b) (:a b b) (:a b) (:b a b b a) (:a a b b a) (:b b b) (:a a a a b) (:a a b) (:b b a a b) (:a b b a a) (:b a a a b) (:a b a a) (:a b a b a) (:a b a a b) (:b b a b a) (:b b b b b) (:a a b b b) (:a a) (:b b a b b) (:b b b a b) (:b b a a) (:a b b b) (:a a b b) (:a) (:a b b a) (:b a a a) (:b) (:a b a a a) (:b b a) (:b b b b)
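
The two outputs above suggest that, for plain atoms, sample.data enumerates every sequence of the given codas with length from 0 up to the width (6 items for one coda, 63 for two). The Python sketch below models that reading with `itertools.product`. It is an analogy rather than coda's implementation, the function name `sample_sequences` is hypothetical, and the ordering differs from coda's output.

```python
from itertools import product

def sample_sequences(symbols, width):
    """All sequences over `symbols` with length 0..width (hypothetical model)."""
    result = []
    for length in range(width + 1):
        # product(..., repeat=length) yields every ordered choice, repetition allowed
        result.extend(product(symbols, repeat=length))
    return result

one = sample_sequences(["a"], 5)
two = sample_sequences(["a", "b"], 5)

print(len(one), len(two))
```

The two counts agree with the number of items in the outputs above. Under this reading, n codas and width w give 1 + n + n^2 + ... + n^w items, so the sample size grows quickly with both the number of codas and the width.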

In [15]:
#
#   You can add language literals <A> and <B> and a special symbol <{$}> to create language expressions.
#
sample.data <A> <B> <{$}> a b (defs:Basic) : 2 

(:b get) (:right b) (:{B : co} has) (:nif nif) (:has {B : pass}) (:is {B if}) (:bin {isnt : B}) (:{a : B} domain) (:domain {B isnt}) (:{B get} nif) (:if {right : B}) (:{B : isnt} right) (:{B plus} A) (:{B prod}) (:star get) (:{if B} A) (:star if) (:B {B : pass}) (:bin isnt) (:co {domain : B}) (:{B : isnt} a) (:has {A B}) (:A bin) (:co {B hasnt}) (:{isnt B} hasnt) (:{b B} arg) (:{right : B} co) (:put {co : B}) (:isnt nif) (:has {sum : B}) (:sum hasnt) (:{get : B} get) (:{prod B}) (:has {a : B}) (:{B : pass} isnt) (:{B star} star) (:{B : hasnt} bin) (:{B sum} get) (:has {B : isnt}) (:get null) (:{null : B} bin) (:A {get B}) (:isnt {has : B}) (:pass {hasnt B}) (:co {is : B}) (:{isnt B}) (:{B A} isnt) (:{a B}) (:{get : B} if) (:b {B : is}) (:{b B} plus) (:{domain : B} b) (:{left : B} nif) (:prod {has B}) (:prod bin) (:{co : B} arg) (:{isnt B} isnt) (:domain if) (:get {B pass}) (:{B : has} prod) (:nif {b : B}) (:{B domain} is) (:a {sum B}) (:if arg) (:{B b} bin) (:{prod : B} has) (:{is B} a

In [16]:
#
#   To meaningfully search, one should add domains representing established definitions.  To do this, first see 
#   all currently available definitions like so...
#
defs:

= Def Let allcodes alphabet ap aq ar arg as assign bin bool by cases co coda codes collect count def defs demo digits dir domain down down1 endswith equal equiv eval eval1 first float_div float_inv float_max float_min float_prod float_sort float_sum floats get has hasnt help home if import in info int_div int_inv int_max int_min int_prod int_sort int_sum ints is isnt join ker kernel language last left let letters localdef log log+ log- logs module multi nat nat_max nat_min nat_prod nat_sort nat_sum nats nchar nif not nth nth1 null once out pass pause permutation plus post pre printable prod pure put readpath rep repn rev ri right sample.atom sample.data sample.even sample.odd sample.pure sample.window skip some source sources split star startswith step sum tail term text_sort theorem up up1 use use1 while with wrap ◎ 𝝞 𝟬

In [17]:
#
#    Each definition is in a "module" which is either a Python file (.py) or a coda file (.co). 
#
once : module : defs:

Time Apply Text Sequence Search Source Define Number Import Log Logic Basic IO Help Evaluate Collect  Path Variable Sample Language

In [18]:
#
#    Typically one would want some but not all definitions for a search.  For instance, one typically doesn't want to search 
#    over IO operations or Sample operations (to avoid recursion) or over help system operations.  A typical choice is...
#
defs : Apply Basic Logic Number Sequence  

= ap aq ar arg as bin bool by co count domain equal first float_div float_inv float_max float_min float_prod float_sort float_sum floats get has hasnt if int_div int_inv int_max int_min int_prod int_sort int_sum ints is isnt ker kernel last left nat nat_max nat_min nat_prod nat_sort nat_sum nats nif not nth nth1 null once pass plus post pre prod put rep rev ri right skip some star sum tail text_sort while

In [19]:
sample.data (defs:Basic) : 2

(:prod arg) (:prod bin) (:get if) (:nif co) (:has) (:star right) (:hasnt null) (:null sum) (:left) (:co get) (:null pass) (:get arg) (:co bin) (:get pass) (:left co) (:right hasnt) (:has pass) (:left right) (:prod if) (:has right) (:pass co) (:null has) (:left get) (:put isnt) (:plus pass) (:nif if) (:domain arg) (:get put) (:sum get) (:put put) (:co plus) (:bin co) (:co sum) (:bin put) (:bin left) (:sum domain) (:is) (:is co) (:hasnt bin) (:co pass) (:nif hasnt) (:isnt nif) (:null put) (:right left) (:isnt right) (:prod prod) (:if if) (:sum pass) (:is right) (:left put) (:right) (:right isnt) (:nif put) (:get right) (:hasnt pass) (:plus star) (:star sum) (:isnt star) (:bin) (:arg pass) (:star is) (:plus prod) (:pass right) (:sum bin) (:nif get) (:has hasnt) (:null hasnt) (:bin if) (:is left) (:put pass) (:arg co) (:pass star) (:domain co) (:star isnt) (:isnt if) (:left nif) (:domain get) (:sum arg) (:null prod) (:is sum) (:isnt) (:is isnt) (:right co) (:domain hasnt) (:plus left) (:le

In [20]:
count : defs : Sequence

13

In [21]:
sample.data (defs:Sequence) : 2

(:rep rev) (:tail nth1) (:first once) (:by pre) (:last skip) (:rep rep) (:rep by) (:once tail) (:pre nth) (:by nth1) (:post once) (:last post) (:rev count) (:once first) (:nth1 once) ◎ (:nth1 count) (:by count) (:pre rev) (:skip rep) (:pre post) (:nth1 last) (:tail) (:skip nth) (:post post) (:pre first) (:once by) (:first count) (:last) (:rep tail) (:nth count) (:pre once) (:post nth) (:skip post) (:first nth1) (:rev pre) (:skip first) (:post) (:tail post) (:first tail) (:tail once) (:nth rep) (:tail nth) (:rep once) (:nth1 post) (:rev post) (:nth pre) (:first last) (:nth nth) (:rev first) (:rep nth) (:nth nth1) (:pre pre) (:by nth) (:first by) (:post rep) (:by by) (:count pre) (:tail rep) (:tail pre) (:last count) (:skip count) (:count rep) (:rev) (:post count) (:nth tail) (:once pre) (:rev nth) (:nth1 nth) (:once skip) (:tail skip) (:by tail) (:post nth1) (:by post) (:rev rev) (:last first) (:skip nth1) (:skip pre) (:skip once) (:nth skip) (:by last) (:pre tail) (:pre rep) (:nth1 ski

In [22]:
defs:Sequence

by count first last nth nth1 once post pre rep rev skip tail

In [23]:
sample.data rev first tail last skip rep nth1 once count nth pre post : 2

(:once first) (:count count) (:count nth1) (:post pre) (:rev tail) (:pre first) (:rep count) (:first rev) (:tail count) (:rep nth1) (:rep once) (:post first) (:count rep) (:tail pre) (:skip post) (:nth once) (:once post) (:post count) (:first post) (:first rep) (:tail once) (:first nth) (:first count) (:nth) (:tail first) (:first pre) (:count first) (:rev) (:once skip) (:tail last) (:last skip) (:nth nth) (:rep skip) (:pre once) (:last rep) (:nth pre) (:once nth1) (:tail rep) (:rep post) (:post rev) (:nth nth1) (:post skip) (:first first) (:first last) (:post nth) (:rep first) (:tail post) (:rep nth) (:nth1 pre) (:once last) (:nth count) (:last last) (:once nth) (:rev rep) (:nth1 last) (:nth1 nth1) (:last pre) (:nth1 count) (:rev count) (:post) (:pre pre) (:nth1 nth) (:post tail) (:count rev) (:nth1 tail) (:last count) (:count skip) (:rev nth) (:rev once) (:post rep) (:first nth1) (:rev pre) (:pre rep) (:nth first) (:skip count) (:once rep) (:nth1) (:nth rep) (:last tail) ◎ (:skip skip

In [24]:
#
#    To create a typical sample, we start with A, B and {$} for language purposes, add a few "variables" (here x? and y?) and 
#    mix in the above definitions. 
#    
sample.data <A> <B> x? y? <{$}> (defs:Apply Basic) : 2 

(:get {ar B}) (:ker {B get}) (:{B domain} ker) (:nif sum) (:{B ar} while) (:put {B star}) (:while star) (:{pass B} prod) (:plus is) (:{B ar}) (:pass {star B}) (:{null B} prod) (:if prod) (:null as) (:bin {domain B}) (:nif pass) (:{B kernel} co) (:{B kernel} kernel) (:if {B : is}) (:{hasnt : B} (x:)) (:{B : pass} isnt) (:ar {right : B}) (:B {B : arg}) (:has ri) (:(x:) {if B}) (:{B : kernel} ap) (:as {B : star}) (:(x:) null) (:left {B : right}) (:A {get B}) (:star {has : B}) (:{B hasnt} kernel) (:has {B put}) (:left {ri : B}) (:{B : left} ap) (:sum {is B}) (:(y:) {put : B}) (:(y:) {B : plus}) (:{B : domain} get) (:{B is} star) (:co {co B}) (:{B right} ker) (:bin ker) (:prod {ap B}) (:ap {B : A}) (:pass {B}) (:star {B domain}) (:plus {A B}) (:sum {B if}) (:{as B} hasnt) (:{if : B} while) (:{if B}) (:{get B} plus) (:ker {ap B}) (:while {B : domain}) (:sum {B : co}) (:plus {sum B}) (:{B ar} co) (:nif {B : ker}) (:pass while) (:plus {while B}) (:{B kernel} has) (:get A) (:hasnt {B : as}) (:c

In [25]:
Let Sample : sample.data <A> <B> x? y? <{$}>  : 2 



In [26]:
#
#   Sample? is then a collection of data that can be used, for instance, to test for theorems or search for 
#   algebraic structures of interest, as shown in study tutorials. 
#
count : Sample? 

138

For larger-scale searching, one would typically increase the width beyond 2, search in parallel and, perhaps, add more variables.

Such samples are useful for theorem testing and for finding algebraic structures of interest, as illustrated in the theorem tutorial and in various study notebooks. 