## Sampling the space of pure data 



One easy way to produce sample data is with **rep**, which repeats its arguments once for each input atom...

In [1]:
rep a : 1 2 3 4 
rep a b : 1 2 3 4 
rep (:) : 1 2 3 4 

1 2 3 4
1 2 3 4
1 2 3 4

In most sampling situations, we'll want some specified collection d1, d2, d3, ..., where d1, d2, d3, ... are each data.  To produce such a collection, you don't want to produce d1 d2 d3, because concatenation merges them into a single sequence and the boundaries between the items are lost.  Instead, an easy solution is to produce the sequence

* (:d1) (:d2) (:d3)

where each data in the collection has its own container.  We'll do a bit of this by hand...

In [2]:
(put : x y z ) (put : a b c )

(:x y z) (:a b c)
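The reason for giving each item its own container can be sketched in Python, modeling a coda sequence as a Python list and a container (:...) as a tuple. This is only an analogy: `concat` and `put` below are hypothetical stand-ins, not coda's implementation. Concatenating two sequences loses the boundary between them, while wrapping each one first keeps every item recoverable.

```python
def concat(*seqs):
    """Model of juxtaposition in coda: sequences simply run together."""
    out = []
    for s in seqs:
        out.extend(s)
    return out

def put(seq):
    """Hypothetical model of (put : ...): wrap a whole sequence as one item."""
    return tuple(seq)

d1, d2 = ["x", "y", "z"], ["a", "b", "c"]

flat = concat(d1, d2)                     # one run of six items, boundary lost
packaged = concat([put(d1)], [put(d2)])   # two items, like (:x y z) (:a b c)

# Each original item can be recovered from the packaged form.
recovered = [list(item) for item in packaged]
```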

In [3]:
Def repn : {rep A : first B : nat : 0} 



In [4]:
repn a : 5 

0 1 2 3 4

In [5]:
ap {put : repn a:B} : first 5 : nat : 0 

◎ (:0) (:0 1) (:0 1 2) (:0 1 2 3)
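A rough Python analogue of the cell above, assuming (as the displayed outputs suggest) that `repn a : k` yields 0 ... k-1 and that `ap` applies a function to every input atom and concatenates the results. The helper names mirror the coda ones but are illustrative only; the empty tuple plays the role of the empty container, which coda prints as ◎.

```python
def repn(k):
    """Analogue of the repn defined above, based on its displayed output: 0 .. k-1."""
    return list(range(k))

def ap(f, atoms):
    """Apply f to each atom and concatenate the resulting sequences."""
    out = []
    for atom in atoms:
        out.extend(f(atom))
    return out

# Wrap each result in its own container (a tuple), like {put : repn a:B}.
prefixes = ap(lambda b: [tuple(repn(b))], range(5))
```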

For convenience, **sample.odd** and **sample.even** produce odd- and even-length sequences of the simplest atom (:) ("Hydrogen"), packaged as above...

In [6]:
sample.odd : 4 
sample.even : 4

(:◎) (:◎ ◎ ◎) (:◎ ◎ ◎ ◎ ◎) (:◎ ◎ ◎ ◎ ◎ ◎ ◎)
◎ (:◎ ◎) (:◎ ◎ ◎ ◎) (:◎ ◎ ◎ ◎ ◎ ◎)
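Reading off the output above, the k-th item (counting from 0) holds 2k+1 copies of hydrogen for **sample.odd** and 2k copies for **sample.even**. A Python sketch of that pattern, with the hydrogen atom modeled as the string "◎" and each container as a tuple (the function names are hypothetical):

```python
H = "◎"  # stand-in for the hydrogen atom (:)

def sample_odd(n):
    """n containers holding 1, 3, 5, ... copies of H."""
    return [tuple([H] * (2 * k + 1)) for k in range(n)]

def sample_even(n):
    """n containers holding 0, 2, 4, ... copies of H (the first one is empty)."""
    return [tuple([H] * (2 * k)) for k in range(n)]

odd_lengths = [len(t) for t in sample_odd(4)]
even_lengths = [len(t) for t in sample_even(4)]
```

The empty first item of `sample_even` corresponds to the bare ◎ at the start of the second output line, since an empty container is itself hydrogen.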

The **sample.pure** operator lets one sample the whole space of pure data via 

* sample.pure : width depth 

where **width** is the maximum data length and **depth** is the maximum depth.  For example...

In [7]:
#
#  sample.pure : <width> <depth> 
#
sample.pure : 2 2 

◎ (:◎) (:(:◎)) (:(:◎ ◎)) (:𝟬) (:𝝞) (:◎◎) (:) (:◎) (:◎◎) (:◎ ◎) (:◎ (:◎)) (:◎ (:◎ ◎)) (:◎ 𝟬) (:◎ 𝝞) (:◎ ◎◎) (:◎ ) (:◎ ◎) (:◎ ◎◎) (:(:◎) ◎) (:(:◎) (:◎)) (:(:◎) (:◎ ◎)) (:(:◎) 𝟬) (:(:◎) 𝝞) (:(:◎) ◎◎) (:(:◎) ) (:(:◎) ◎) (:(:◎) ◎◎) (:(:◎ ◎) ◎) (:(:◎ ◎) (:◎)) (:(:◎ ◎) (:◎ ◎)) (:(:◎ ◎) 𝟬) (:(:◎ ◎) 𝝞) (:(:◎ ◎) ◎◎) (:(:◎ ◎) ) (:(:◎ ◎) ◎) (:(:◎ ◎) ◎◎) (:𝟬 ◎) (:𝟬 (:◎)) (:𝟬 (:◎ ◎)) (:𝟬 𝟬) (:𝟬 𝝞) (:𝟬 ◎◎) (:𝟬 ) (:𝟬 ◎) (:𝟬 ◎◎) (:𝝞 ◎) (:𝝞 (:◎)) (:𝝞 (:◎ ◎)) (:𝝞 𝟬) (:𝝞 𝝞) (:𝝞 ◎◎) (:𝝞 ) (:𝝞 ◎) (:𝝞 ◎◎) (:◎◎ ◎) (:◎◎ (:◎)) (:◎◎ (:◎ ◎)) (:◎◎ 𝟬) (:◎◎ 𝝞) (:◎◎ ◎◎) (:◎◎ ) (:◎◎ ◎) (:◎◎ ◎◎) (: ◎) (: (:◎)) (: (:◎ ◎)) (: 𝟬) (: 𝝞) (: ◎◎) (: ) (: ◎) (: ◎◎) (:◎ ◎) (:◎ (:◎)) (:◎ (:◎ ◎)) (:◎ 𝟬) (:◎ 𝝞) (:◎ ◎◎) (:◎ ) (:◎ ◎) (:◎ ◎◎) (:◎◎ ◎) (:◎◎ (:◎)) (:◎◎ (:◎ ◎)) (:◎◎ 𝟬) (:◎◎ 𝝞) (:◎◎ ◎◎) (:◎◎ ) (:◎◎ ◎) (:◎◎ ◎◎)
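One way to picture what **sample.pure** enumerates is the following Python sketch. It assumes a model that is consistent with the counts in this notebook but is not taken from coda's source: a datum is a tuple of at most **width** atoms, and an atom (L:R) is a pair of data one level shallower. The empty datum is the only one of depth 0.

```python
from collections import Counter
from itertools import product

def pure_data(width, depth):
    """All data with at most `width` atoms, nested at most `depth` levels.

    Assumed model: a datum is a tuple of atoms; an atom (L:R) is a pair (L, R) of data.
    """
    if depth == 0:
        return [()]  # only the empty datum has depth 0
    smaller = pure_data(width, depth - 1)
    atoms = [(left, right) for left in smaller for right in smaller]
    data = []
    for k in range(width + 1):
        data.extend(product(atoms, repeat=k))  # every sequence of k atoms
    return data

data = pure_data(2, 2)
lengths = Counter(len(d) for d in data)  # how many data have 0, 1 and 2 atoms
```

Under this model `sample.pure : 2 2` has 91 members: 1 empty datum, 9 single atoms (3 × 3 choices of L and R) and 81 pairs of atoms, which matches the distribution of counts shown further below.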

The direct approach is rarely useful because the universe of pure data grows extremely rapidly with **width** and **depth**, so much so that

* sample.pure : 2 3 

is already challenging for a laptop.  

### Size of sample.pure versus width and depth:

|         | width 1 |    width 2 | width 3 | width 4 |    width 5 |
|---------|--------:|-----------:|--------:|--------:|-----------:|
| depth 1 |       2 |          3 |       4 |       5 |          6 |
| depth 2 |       5 |         91 |   4,369 | 406,901 | 62,193,781 |
| depth 3 |      26 | 68,583,243 |       ? |       ? |          ? |
| depth 4 |     677 |          ? |       ? |       ? |          ? |
| depth 5 | 458,330 |          ? |       ? |       ? |          ? |
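Every known entry in the table is consistent with a simple recurrence. This is our reading of the numbers, not a formula documented for sample.pure. Let N(w, d) be the number of data with width at most w and depth at most d. Then N(w, 0) = 1 (the empty datum), there are A = N(w, d-1)² possible atoms (L:R), and N(w, d) = 1 + A + A² + ... + A^w. A sketch that computes it under that assumption:

```python
def pure_count(width, depth):
    """Assumed size of sample.pure : width depth.

    N(w, 0) = 1, and N(w, d) = sum_{k=0..w} A**k with A = N(w, d-1)**2 atoms.
    """
    n = 1  # N(w, 0): just the empty datum
    for _ in range(depth):
        atoms = n * n  # choices for (L:R) from the previous level
        n = sum(atoms ** k for k in range(width + 1))
    return n

for d in (1, 2):
    print(f"depth {d}:", [pure_count(w, d) for w in range(1, 6)])
# depth 1: [2, 3, 4, 5, 6]
# depth 2: [5, 91, 4369, 406901, 62193781]
print(pure_count(2, 3), pure_count(1, 5))  # 68583243 458330
```

For width 1 the recurrence reduces to N(d) = N(d-1)² + 1, which is why the first column runs 2, 5, 26, 677, 458,330. Python's arbitrary-precision integers mean the function can also be evaluated for the cells marked "?", though the results only estimate sample.pure if the assumed model holds.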

Of course, the space of pure data includes all data, meaning all mathematical/logical/computational objects.  Undecided data ("variables"), for instance, already appear in sample.pure : 2 2, which you can detect, for example, by counting.

In [8]:
ap {count : get : B} : sample.pure : 2 2 

0 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

### Practical searching 

Although **sample.pure** samples all pure data and, therefore, all mathematical objects, these samples are typically far too large to be practically helpful.  The **sample.data** operator is much more practical.  It takes a sequence of codas and a width:

* sample.data <...sequence of codas..> : width 

A few examples will illustrate. 

In [9]:
#
#   Just one argument coda "a" and width <= 5 gives 
#
sample.data a : 5 

◎ (:a a a a a) (:a) (:a a) (:a a a) (:a a a a)

In [10]:
#
#   Note that the sample is delivered, as usual, as (:data1) (:data2) ... 
#
#   If you provide two codas, a and b, every sequence of a's and b's (with repetition) up to the width is included. 
#
sample.data a b : 5

(:b b a b) (:b a a a) (:a a a a) (:b) (:b a a b) (:a b a) (:b b b b b) (:a a a b a) (:a a b b b) (:a b b b a) (:b b a b b) (:a b) (:a) (:a b a b) (:b b a a) (:b a b a a) (:a a b a a) (:b b a b a) (:a b a a b) (:a a) (:a b a b b) (:b b b a b) (:b a b) (:b a b a) (:a a a b) (:b a) (:a a b b a) (:b a b b b) (:a b a b a) (:a b b b b) (:a b b a) (:a a b a b) (:a a a b b) (:b a b b a) (:b a b a b) (:b b b a a) (:b a a) (:a b b) (:a a a) (:b a b b) ◎ (:b b b) (:a b b a b) (:b a a b b) (:b b a) (:a b a a a) (:b b a a b) (:b b b b) (:b b a a a) (:b a a a a) (:b a a b a) (:a a b a) (:a a b) (:a b b a a) (:a a a a a) (:b b) (:a a b b) (:a b a a) (:b b b a) (:b a a a b) (:a a a a b) (:b b b b a) (:a b b b)
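Both outputs are consistent with **sample.data** returning every sequence of length 0 up to the width over the given codas: 6 items for one coda and 63 (= 1 + 2 + 4 + 8 + 16 + 32) for two. A Python sketch under that assumption, with each sequence packaged as a tuple in the spirit of (:data1) (:data2) ...; it does not model the special symbol <{$}>, and the order of results may differ from coda's.

```python
from itertools import product

def sample_data(codas, width):
    """Hypothetical model: every sequence of 0..width codas (with repetition), as tuples."""
    return [seq for k in range(width + 1) for seq in product(codas, repeat=k)]

one = sample_data(["a"], 5)       # (), ('a',), ('a', 'a'), ... up to five a's
two = sample_data(["a", "b"], 5)  # all a/b sequences of length 0..5
```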

In [11]:
#
#   You can add language literals <A> and <B> and a special symbol <{$}> to create language expressions.
#
sample.data <A> <B> <{$}> a b (defs:Basic) : 2 

(:isnt {put : B}) (:hasnt sum) (:{bin : B} domain) (:domain {B : right}) (:A {has : B}) (:{has : B} if) (:{hasnt : B} plus) (:{B : a} B) (:{has B} A) (:B {B domain}) (:{B : null} star) (:{prod : B} pass) (:star {isnt : B}) (:{is : B} right) (:{B : A} bin) (:null {B : put}) (:{B : if} null) (:nif left) (:if {B}) (:b sum) (:pass {B : is}) (:{B plus} if) (:pass {sum : B}) (:plus hasnt) (:{B : arg}) (:a {arg : B}) (:right {get B}) (:{B : bin} is) (:if {right : B}) (:bin {sum B}) (:nif {B get}) (:{null B} get) (:put {B}) (:{B : hasnt} if) (:{B nif} star) (:{B put} put) (:has {B : sum}) (:{B : hasnt} sum) (:{B arg} hasnt) (:co {right B}) (:is sum) (:b {B : A}) (:star {nif B}) (:co if) (:B {isnt : B}) (:{B : right} has) (:{hasnt : B} arg) (:{B : hasnt} bin) (:{is : B}) (:isnt null) (:prod {B if}) (:{B domain} null) (:if {B : bin}) (:bin {has : B}) (:sum {if : B}) (:A {B if}) (:{B plus} has) (:arg domain) (:sum {B : has}) (:B {put B}) (:{left B} pass) (:{pass : B} arg) (:right {if : B}) (:arg 

In [12]:
#
#   To meaningfully search, one should add domains representing established definitions.  To do this, first see 
#   all currently available definitions like so...
#
defs:

= Def Let allcodes alphabet ap aq ar arg as assign bin bool by cases co coda codes collect count def defs demo digits dir domain down down1 endswith equal equiv eval eval1 first float_div float_inv float_max float_min float_prod float_sort float_sum floats get has hasnt help home if import in info int_div int_inv int_max int_min int_prod int_sort int_sum ints is isnt join ker kernel language last left let letters localdef log log+ log- logs module multi nat nat_max nat_min nat_prod nat_sort nat_sum nats nchar nif not nth nth1 null once out pass pause permutation plus post pre printable prod pure put readpath rep repn rev ri right sample.atom sample.data sample.even sample.odd sample.pure sample.window skip some source sources split star startswith step sum tail term text_sort theorem up up1 use use1 while with wrap ◎ 𝝞 𝟬

In [13]:
#
#    Each definition belongs to a "module", which is either a Python file (.py) or a coda file (.co). 
#
once : module : defs:

Time Apply Text Sequence Search Source Define Number Import Log Logic Basic IO Help Evaluate Collect  Path Variable Sample Language

In [14]:
#
#    Typically one would want some but not all definitions for a search.  For instance, one typically doesn't want to search 
#    over IO operations or Sample operations (to avoid recursion) or over help system operations.  A typical choice is...
#
defs : Apply Basic Logic Number Sequence  

= ap aq ar arg as bin bool by co count domain equal first float_div float_inv float_max float_min float_prod float_sort float_sum floats get has hasnt if int_div int_inv int_max int_min int_prod int_sort int_sum ints is isnt ker kernel last left nat nat_max nat_min nat_prod nat_sort nat_sum nats nif not nth nth1 null once pass plus post pre prod put rep rev ri right skip some star sum tail text_sort while

In [15]:
sample.data (defs:Basic) : 2

(:has hasnt) (:get is) (:co domain) (:left null) (:hasnt nif) (:star left) (:sum prod) (:sum hasnt) (:prod hasnt) (:sum arg) (:if put) (:co has) (:hasnt arg) (:co hasnt) (:hasnt left) (:arg nif) (:domain hasnt) (:if co) (:domain isnt) (:get plus) (:bin put) (:plus hasnt) (:co is) (:put put) (:nif if) (:plus co) (:arg put) (:nif null) (:get get) (:isnt put) (:put null) (:put isnt) (:domain nif) (:co null) (:domain sum) (:isnt hasnt) (:null) (:right get) (:arg) (:bin pass) (:left left) (:put) (:null nif) (:star nif) (:null pass) (:has put) (:put prod) (:is) (:star plus) (:put arg) (:has right) (:get) (:if if) (:if get) (:nif hasnt) (:if plus) (:left co) (:right arg) ◎ (:left arg) (:arg arg) (:put if) (:arg isnt) (:right left) (:if null) (:is star) (:co left) (:hasnt plus) (:is put) (:right hasnt) (:prod sum) (:if arg) (:plus nif) (:left put) (:prod bin) (:plus null) (:plus pass) (:bin has) (:isnt arg) (:domain pass) (:put get) (:put domain) (:right is) (:isnt bin) (:isnt plus) (:plus arg

In [16]:
count : defs : Sequence

13

In [17]:
sample.data (defs:Sequence) : 2

(:first rep) (:nth tail) (:pre last) (:rep post) (:by pre) (:tail count) (:nth1 pre) (:last skip) (:rep tail) (:nth1 rev) (:once count) (:first first) (:nth1 post) (:once pre) (:rev post) (:by once) (:once rev) (:nth1 once) (:count rep) (:count) (:tail post) (:once last) (:count rev) (:tail first) (:rev count) (:pre rev) (:first pre) (:last count) (:by rep) (:tail last) (:post nth1) (:once first) (:skip by) (:once) (:skip rep) (:nth1 last) (:pre rep) (:tail nth) (:once nth1) (:first rev) (:skip) (:pre count) (:skip rev) (:first by) (:post by) (:first skip) (:rep once) (:nth1 by) (:rev) (:once skip) (:nth) (:nth1 skip) (:last nth) (:nth1 count) (:count nth) (:pre skip) (:by nth1) (:pre nth1) (:count once) (:post once) (:skip once) (:first last) (:last post) (:count post) (:rep first) (:rep last) (:tail skip) (:rep count) (:first nth) (:post pre) (:by) (:nth1 tail) (:pre) (:first nth1) (:first) (:skip pre) (:nth nth1) (:skip post) (:nth once) (:skip nth1) (:rev first) (:post nth) (:count

In [18]:
defs:Sequence

by count first last nth nth1 once post pre rep rev skip tail

In [19]:
sample.data rev first tail last skip rep nth1 once count nth pre post : 2

(:first count) (:nth1 pre) (:rep pre) (:rev rep) (:count nth) (:once tail) (:tail nth1) (:rep rev) (:last first) (:nth rev) (:post nth1) (:post post) (:skip post) (:first nth) (:once first) (:nth1 nth1) (:pre skip) (:pre tail) (:skip) (:pre rep) (:nth1 rev) (:once last) (:once rep) (:nth1 last) (:first last) (:tail once) (:rev first) (:last pre) (:nth post) (:count skip) (:nth nth) (:nth1 post) (:last count) (:post once) (:last rep) (:once) (:skip rev) (:rev last) (:nth1 rep) (:tail rev) (:skip rep) (:post first) (:rep rep) (:rep count) (:once rev) (:once post) (:once pre) (:tail pre) (:last rev) (:count post) (:first pre) (:nth last) (:count once) (:first) (:post rep) (:count tail) (:nth1 nth) (:count last) (:nth1 count) (:tail rep) (:tail) (:once skip) (:rev nth1) (:first rep) (:post rev) (:once nth1) (:nth pre) (:skip first) (:post tail) (:first once) (:rev) (:nth first) (:tail first) (:rep nth1) (:last) (:skip nth1) (:rep post) (:first tail) (:post skip) (:skip last) (:rev nth) (:p

In [20]:
#
#    To create a typical sample, we start with A, B and {$} for language purposes, add a few "variables" (here x? and y?) and 
#    mix in the above definitions. 
#    
sample.data <A> <B> x? y? <{$}> (defs:Apply Basic) : 2 

(:if {is B}) (:bin {while : B}) (:has {is B}) (:put {B aq}) (:{B isnt} prod) (:prod {B domain}) (:arg {arg : B}) (:pass {domain : B}) (:{while : B} A) (:{B : aq} B) (:is {B : put}) (:{B null} (y:)) (:hasnt {isnt B}) (:{B is} co) (:{B : ker} bin) (:{B plus} ker) (:pass {ar : B}) (:kernel nif) (:kernel {B prod}) (:{has : B} nif) (:pass {kernel : B}) (:{aq : B} (x:)) (:{B prod} sum) (:{B co} left) (:{has : B} plus) (:domain {B}) (:is kernel) (:{B : prod} pass) (:kernel {B : ker}) (:{prod : B} if) (:{B as} while) (:{B : domain} arg) (:null isnt) (:plus {ap B}) (:{kernel : B} ar) (:{B hasnt} co) (:bin {B isnt}) (:star {B : (y:)}) (:put {sum B}) (:{B as} arg) (:{B : get} aq) (:right {ar : B}) (:co {B ri}) (:{hasnt : B} hasnt) (:pass {B if}) (:aq {B (y:)}) (:is put) (:{B (x:)} ker) (:A {left : B}) (:{if : B} sum) (:{B star} kernel) (:{B : is} ri) (:star is) (:isnt {ker : B}) (:has null) (:{B co} hasnt) (:{B : get} if) (:nif null) (:(y:) {B : as}) (:hasnt ri) (:aq star) (:(x:) {right B}) (:plu

In [21]:
Let Sample : sample.data <A> <B> x? y? <{$}>  : 2 



In [22]:
#
#   Sample? is then a collection of data that can be used, for instance, to test theorems or to search for 
#   algebraic structures of interest, as shown in the study tutorials. 
#
count : Sample? 

138

For larger-scale searching, one would typically increase the width beyond 2, search in parallel and, perhaps, add more variables.
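To see why raising the width calls for parallel search, here is a rough size estimate. It assumes plain codas only, where a sample of width w over n codas has 1 + n + n² + ... + n^w members; <{$}> produces additional language expressions, so this formula does not apply to it (Sample above has 138 members, whereas five plain codas at width 2 would give 31). The function name is hypothetical.

```python
def sample_size(n_codas, width):
    """Members of sample.data for n plain codas: all sequences of length 0..width."""
    return sum(n_codas ** k for k in range(width + 1))

# The 13 Sequence definitions at increasing widths.
for w in range(1, 6):
    print(w, sample_size(13, w))
```

Each extra unit of width multiplies the work by roughly the number of codas, so splitting a large sample across workers (for example by the first coda of each sequence) is a natural way to parallelize.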

This can be useful for theorem testing and for searching for algebraic structures of interest as illustrated in the theorem tutorial and in various study notebooks. 