## Sampling the space of pure data 



One easy way to produce sample data is with **rep**, which repeats its arguments once for each input atom...

In [1]:
rep a : 1 2 3 4 
rep a b : 1 2 3 4 
rep (:) : 1 2 3 4 

a a a a
a b a b a b a b
◎ ◎ ◎ ◎
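As an informal mental model only (not the coda implementation), **rep** can be pictured as the Python function below. A data is modeled as a list of atoms, and the hypothetical `rep(args, inputs)` appends one copy of `args` for each item of `inputs`, ignoring what those items are:

```python
def rep(args, inputs):
    """Model of `rep args : inputs`: one copy of args per input atom."""
    out = []
    for _ in inputs:        # the values of the input atoms are ignored
        out.extend(args)    # append one copy of the arguments
    return out

print(rep(["a"], [1, 2, 3, 4]))       # ['a', 'a', 'a', 'a']
print(rep(["a", "b"], [1, 2, 3, 4]))  # ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b']
```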

In most sampling situations, we want a specified collection d1, d2, d3, ... where each of d1, d2, d3 is a data. Simply producing d1 d2 d3 does not work, because concatenation merges the items and loses the boundaries between them. An easy solution is to produce the sequence

* (:d1) (:d2) (:d3)

where each data in the collection has its own container. We'll do a bit of this by hand...
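The same point in Python terms, as an analogy only (lists stand in for coda data): concatenating two lists loses the boundary between them, while nesting each one in its own list keeps it.

```python
# Lists stand in for coda data; this is an analogy, not coda's representation.
d1 = ["x", "y", "z"]
d2 = ["a", "b", "c"]

flat = d1 + d2      # like writing d1 d2: the boundary between them is lost
packed = [d1, d2]   # like (:d1) (:d2): each item keeps its own container

print(flat)         # ['x', 'y', 'z', 'a', 'b', 'c']
print(packed)       # [['x', 'y', 'z'], ['a', 'b', 'c']]
print(len(packed))  # 2
```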

In [2]:
(put : x y z ) (put : a b c )
repn a : 5

(:x y z) (:a b c)
(repn a:5)

In [3]:
def repn : {rep A : first B : nat : 0} 
repn a : 5

(repn a:5)

In [4]:
ap {put : repn a:B} : first 5 : nat : 0 

◎ (:a) (:a a) (:a a a) (:a a a a)
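Read as Python, the definition of **repn** and the `ap` call above amount to the sketch below. It assumes `first 5 : nat : 0` denotes 0, 1, 2, 3, 4, which is what the output shows; the names `repn` and `packed` are only illustrative.

```python
def repn(atom, n):
    """Model of `repn atom : n`, defined above as `rep A : first B : nat : 0`."""
    return [atom] * n  # n copies of the atom

# Model of `ap {put : repn a:B} : first 5 : nat : 0`: apply repn to each of
# 0, 1, 2, 3, 4 and keep every result in its own container (a nested list).
packed = [repn("a", b) for b in range(5)]
print(packed)  # [[], ['a'], ['a', 'a'], ['a', 'a', 'a'], ['a', 'a', 'a', 'a']]
```

The empty first item corresponds to the leading ◎ in the coda output.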

For convenience, **sample.odd** and **sample.even** produce containers holding odd and even numbers of copies of the simplest atom (:) ("Hydrogen"), packaged as above...

In [5]:
sample.odd : 4 
sample.even : 4

(:◎) (:◎ ◎ ◎) (:◎ ◎ ◎ ◎ ◎) (:◎ ◎ ◎ ◎ ◎ ◎ ◎)
◎ (:◎ ◎) (:◎ ◎ ◎ ◎) (:◎ ◎ ◎ ◎ ◎ ◎)
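The lengths in these outputs follow a simple pattern: sample.odd : n gives containers of 1, 3, ..., 2n-1 copies of (:), and sample.even : n gives 0, 2, ..., 2n-2 copies. A Python model of that pattern, inferred only from the outputs above (the function names are hypothetical):

```python
EMPTY = "◎"  # stand-in for the simplest atom (:)

def sample_odd(n):
    """Model of `sample.odd : n`: containers of 1, 3, ..., 2n-1 copies of (:)."""
    return [[EMPTY] * (2 * k + 1) for k in range(n)]

def sample_even(n):
    """Model of `sample.even : n`: containers of 0, 2, ..., 2n-2 copies of (:)."""
    return [[EMPTY] * (2 * k) for k in range(n)]

print([len(d) for d in sample_odd(4)])   # [1, 3, 5, 7]
print([len(d) for d in sample_even(4)])  # [0, 2, 4, 6]
```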

The **sample.pure** operator lets one sample the whole space of pure data via 

* sample.pure : width depth 

where **width** is the maximum data length and **depth** is the maximum depth.  For example...

In [6]:
#
#  sample.pure : <width> <depth> 
#
sample.pure : 2 2 

◎ (:◎) (:(:◎)) (:(:◎ ◎)) (:𝟬) (:𝝞) (:◎◎) (:) (:◎) (:◎◎) (:◎ ◎) (:◎ (:◎)) (:◎ (:◎ ◎)) (:◎ 𝟬) (:◎ 𝝞) (:◎ ◎◎) (:◎ ) (:◎ ◎) (:◎ ◎◎) (:(:◎) ◎) (:(:◎) (:◎)) (:(:◎) (:◎ ◎)) (:(:◎) 𝟬) (:(:◎) 𝝞) (:(:◎) ◎◎) (:(:◎) ) (:(:◎) ◎) (:(:◎) ◎◎) (:(:◎ ◎) ◎) (:(:◎ ◎) (:◎)) (:(:◎ ◎) (:◎ ◎)) (:(:◎ ◎) 𝟬) (:(:◎ ◎) 𝝞) (:(:◎ ◎) ◎◎) (:(:◎ ◎) ) (:(:◎ ◎) ◎) (:(:◎ ◎) ◎◎) (:𝟬 ◎) (:𝟬 (:◎)) (:𝟬 (:◎ ◎)) (:𝟬 𝟬) (:𝟬 𝝞) (:𝟬 ◎◎) (:𝟬 ) (:𝟬 ◎) (:𝟬 ◎◎) (:𝝞 ◎) (:𝝞 (:◎)) (:𝝞 (:◎ ◎)) (:𝝞 𝟬) (:𝝞 𝝞) (:𝝞 ◎◎) (:𝝞 ) (:𝝞 ◎) (:𝝞 ◎◎) (:◎◎ ◎) (:◎◎ (:◎)) (:◎◎ (:◎ ◎)) (:◎◎ 𝟬) (:◎◎ 𝝞) (:◎◎ ◎◎) (:◎◎ ) (:◎◎ ◎) (:◎◎ ◎◎) (: ◎) (: (:◎)) (: (:◎ ◎)) (: 𝟬) (: 𝝞) (: ◎◎) (: ) (: ◎) (: ◎◎) (:◎ ◎) (:◎ (:◎)) (:◎ (:◎ ◎)) (:◎ 𝟬) (:◎ 𝝞) (:◎ ◎◎) (:◎ ) (:◎ ◎) (:◎ ◎◎) (:◎◎ ◎) (:◎◎ (:◎)) (:◎◎ (:◎ ◎)) (:◎◎ 𝟬) (:◎◎ 𝝞) (:◎◎ ◎◎) (:◎◎ ) (:◎◎ ◎) (:◎◎ ◎◎)
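One model of pure data that reproduces the counts in the table below is recursive: depth 0 holds only the empty data, and at each further depth an atom is a pair (left : right) of shallower data while a data is a sequence of at most **width** atoms. The Python sketch below enumerates that model. It is inferred from the counts, not taken from the sample.pure source, and it makes no attempt to match the display order above.

```python
from itertools import product

def pure(width, depth):
    """Enumerate a model of pure data up to the given width and depth.

    A data is a tuple of atoms; an atom is a (left, right) pair of data.
    """
    data = [()]  # depth 0: only the empty data
    for _ in range(depth):
        # every atom pairs two data from the previous level
        atoms = [(left, right) for left in data for right in data]
        # every sequence of 0..width atoms is a data at this level
        data = [seq for n in range(width + 1)
                for seq in product(atoms, repeat=n)]
    return data

print(len(pure(2, 2)))  # 91
print(len(pure(1, 3)))  # 26
```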

The direct approach is rarely useful because the number of pure data grows extremely rapidly with **width** and **depth**, so much so that

* sample.pure : 2 3 

already produces 68,583,243 items, which is challenging for a laptop.

### Size of sample.pure versus width and depth

|         | width 1 | width 2    | width 3 | width 4 | width 5    |
|---------|--------:|-----------:|--------:|--------:|-----------:|
| depth 1 | 2       | 3          | 4       | 5       | 6          |
| depth 2 | 5       | 91         | 4,369   | 406,901 | 62,193,781 |
| depth 3 | 26      | 68,583,243 | ?       | ?       | ?          |
| depth 4 | 677     | ?          | ?       | ?       | ?          |
| depth 5 | 458,330 | ?          | ?       | ?       | ?          |
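Every entry in the table fits a simple recurrence: with a(0) = 1 and m = a(d-1)^2, the count at depth d is a(d) = 1 + m + m^2 + ... + m^width, because an atom is a pair of shallower data and a data is a sequence of up to **width** atoms. This Python sketch evaluates that recurrence; treat it as a model that matches the published numbers, not as the sample.pure implementation.

```python
def count_pure(width, depth):
    """Count pure data with at most `width` atoms per sequence and nesting `depth`."""
    n = 1  # depth 0: only the empty data
    for _ in range(depth):
        atoms = n * n  # an atom pairs a left and a right data from the level below
        n = sum(atoms ** k for k in range(width + 1))  # sequences of 0..width atoms
    return n

print(count_pure(2, 2))  # 91
print(count_pure(2, 3))  # 68583243
print(count_pure(1, 5))  # 458330
```

Python integers have arbitrary precision, so the same function can be evaluated for the ? cells, but the counts grow doubly exponentially with depth, which is why enumerating those samples is out of reach.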

### Practical searching 

Although **sample.pure** samples all pure data and, therefore, all mathematical objects, its samples are typically far too large to be practically helpful. The **sample.data** operator is much more practical. It takes a sequence of codas and a width.

* sample.data <...sequence of codas...> : width 

A few examples will illustrate. 

In [7]:
#
#   A single coda "a" with width 5 gives every run of a's of length 0 to 4: 
#
sample.data a : 5 

(:a a) ◎ (:a a a) (:a) (:a a a a)

In [8]:
#
#   Note that the sample is delivered, as usual, as (:data1) (:data2) ... 
#
#   With two codas, a and b, every sequence of a's and b's of length 0 to 4 is included (31 in all, in no particular order). 
#
sample.data a b : 5

(:a a a b) (:b a b) (:b b b) (:b b a a) (:a a b a) (:b b) (:b a a) (:a a a) (:a b a) (:a b b a) (:b b a) (:a b a b) (:b a b b) (:b a a a) (:a a b b) (:a) (:a b b b) (:a a b) (:a a a a) (:a b a a) (:b a) (:b) (:b a a b) (:b b b a) (:a b) (:b a b a) (:b b b b) (:b b a b) (:a b b) ◎ (:a a)
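The two outputs above are consistent with a simple rule: `sample.data codas : width` returns every sequence of the given codas with fewer than **width** items, each in its own container (5 items for one coda, 1 + 2 + 4 + 8 + 16 = 31 for two). A Python model of that rule, inferred from these outputs, with a hypothetical function name and no handling of special codas such as <{$}>:

```python
from itertools import product

def sample_data(codas, width):
    """All sequences of the codas with length 0..width-1, one per container."""
    return [list(seq) for n in range(width)
            for seq in product(codas, repeat=n)]

print(len(sample_data(["a"], 5)))       # 5
print(len(sample_data(["a", "b"], 5)))  # 31
```

Unlike the real operator, which returns its items in no particular order, this sketch lists them by increasing length.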

In [9]:
#
#   Including the coda <{$}> generates source code (the {...} items) intermixed with sequences of the other codas
#   that you prescribe. 
#
sample.data <A> <B> <{$}> a : 3

(:B {B : A}) (:{a : B} A) (:a {A B}) (:A {a : B}) (:A {B}) (:A {B A}) (:A {B : a}) (:A a) (:{B A} A) (:a) (:{B a} B) (:a {a : B}) ◎ (:{B : A} B) (:{A : B} a) (:a {B : a}) (:B a) (:{B A} a) (:{A B} B) (:{A : B} A) (:{B : a} B) (:a {B}) (:{A B}) (:a {a B}) (:a B) (:A B) (:A {A : B}) (:{a B} B) (:a A) (:{a : B}) (:a a) (:{B} A) (:{a : B} B) (:A {B : A}) (:{B : a}) (:{B a} A) (:{B}) (:B B) (:B {a B}) (:{A B} A) (:B {B a}) (:{B : a} A) (:B {A : B}) (:A) (:{B A}) (:B {B : a}) (:{B : a} a) (:A A) (:A {B a}) (:{B : A} a) (:{B} a) (:{a : B} a) (:a {A : B}) (:a {B : A}) (:a {B A}) (:{B : A} A) (:{B : A}) (:{B a} a) (:B {a : B}) (:B A) (:a {B a}) (:{A : B} B) (:{A B} a) (:B {A B}) (:{a B} a) (:{B} B) (:{A : B}) (:A {A B}) (:{B A} B) (:{B a}) (:{a B}) (:A {a B}) (:B {B}) (:B) (:{a B} A) (:B {B A})

In [10]:
#
#   One can add some chosen number of "variables" 
#
sample.data <A> <B> <{$}> x? y? (x:) : 2 

(:(y:)) (:{B : (y:)}) (:{B A}) (:{B (x:)}) (:{(y:) : B}) (:(x:)) (:{A : B}) (:{(y:) B}) (:{B : (x:)}) (:{B (x:)}) (:{A B}) (:{B : A}) ◎ (:{B}) (:{B (y:)}) (:{(x:) : B}) (:{(x:) B}) (:{(x:) : B}) (:B) (:{B : (x:)}) (:(x:)) (:{(x:) B}) (:A)
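At width 2 the output above fits a simple pattern: each coda appears alone, and <{$}> contributes the code {B} plus {X B}, {B X}, {X : B} and {B : X} for every other coda X, including A. The Python sketch below reproduces the 23 items (18 distinct) under that inferred pattern. It is a reading of this one output, not the sample.data algorithm, and it covers only width 2.

```python
def sample_data_width2(codas):
    """Inferred model of `sample.data <A> <B> <{$}> ... : 2`.

    `codas` lists every coda except <{$}>, in the form they are displayed.
    """
    items = ["◎"]                                # the empty data
    items += [f"(:{c})" for c in codas]          # each coda on its own
    codes = ["{B}"]                              # code generated from <{$}>
    for x in codas:
        if x == "B":
            continue
        codes += [f"{{{x} B}}", f"{{B {x}}}", f"{{{x} : B}}", f"{{B : {x}}}"]
    items += [f"(:{c})" for c in codes]          # each code in its own container
    return items

# x? and (x:) both appear as (x:) in the output, hence the repeated items.
sample = sample_data_width2(["A", "B", "(x:)", "(y:)", "(x:)"])
print(len(sample))       # 23
print(len(set(sample)))  # 18
```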

In [11]:
#
#   To meaningfully search, one should add domains representing established definitions.  To do this, first see 
#   all currently available definitions like so...
#
defs:

◎ 𝟬 𝝞 language pass null bin put get domain left right if nif * readpath dir terminal help source demo info module defs nat code_sort ints int_sum int_prod int_sort int_min int_max int_inv int_div nats floats float_sum float_prod float_sort float_min float_max float_inv float_div rev first tail last rep nth1 once count bool not = equal or and nor xor nand xnor iff imply some home start startcontext homecontext setDefaultDepth getDefaultDepth step eval def ap aq by ax unicode wrap startswith endswith join split coda codx codes digits letters printable alphabet allcodes pure replace rreplace with undefined theorem sample.even sample.odd sample.pure sample.window sample.data nth pre post in import arg let while kernel ker mor hasnt has isnt is sample.letter sample.variable sample.domain sample.number sample.module repn

In [12]:
#
#    Each definition belongs to a "module", which is either a Python file (.py) or a coda file (.co). The output below
#    lists the module of each definition above, in the same order.
#
module : defs:

   Language Basic Basic Basic Basic Basic Basic Basic Basic Basic Basic Basic IO IO IO Help Help Help Help Help Help Number Number Number Number Number Number Number Number Number Number Number Number Number Number Number Number Number Number Number Sequence Sequence Sequence Sequence Sequence Sequence Sequence Sequence Logic Logic Logic Logic Logic Logic Logic Logic Logic Logic Logic Logic Logic Import Import Import Import Evaluate Evaluate Evaluate Evaluate Define Apply Apply Apply Apply Code Code Code Code Code Code Compile Compile Text Text Text Text Text Text Text Variable Variable Variable Variable Theorem Sample Sample Sample Sample Sample Sequence Sequence Sequence Logic Import Define  Apply Apply Apply Apply Basic Basic Basic Basic   Sample   

In [13]:
#
#    Typically one wants some, but not all, definitions for a search. For instance, one usually doesn't want to search 
#    over IO operations, Sample operations (to avoid recursion) or help system operations. A typical choice is...
#
defs : Apply Basic Logic Number Sequence  

pass null bin put get domain left right if nif * nat code_sort ints int_sum int_prod int_sort int_min int_max int_inv int_div nats floats float_sum float_prod float_sort float_min float_max float_inv float_div rev first tail last rep nth1 once count bool not = equal or and nor xor nand xnor iff imply some ap aq by ax nth pre post in while kernel ker mor hasnt has isnt is
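Pairing the `defs:` and `module : defs:` listings position by position shows which module each definition comes from, and `defs : <modules>` appears to keep only the definitions whose module is listed. A Python sketch of that filter over a small hand-copied subset (the dictionary and `defs_in` are illustrative only):

```python
# A few (definition -> module) pairs copied from the two listings above.
MODULES = {
    "pass": "Basic", "readpath": "IO", "help": "Help", "nat": "Number",
    "rev": "Sequence", "not": "Logic", "ap": "Apply", "sample.pure": "Sample",
}

def defs_in(*modules):
    """Model of `defs : <modules>`: definitions belonging to any listed module."""
    wanted = set(modules)
    return [name for name, module in MODULES.items() if module in wanted]

print(defs_in("Apply", "Basic", "Logic", "Number", "Sequence"))
# ['pass', 'nat', 'rev', 'not', 'ap']
```

Leaving out Sample in this way is what keeps the sampling operators themselves out of a search.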

In [14]:
#
#    To create a typical sample, we start with A, B and {$} for language purposes, add a few "variables" (x? y? z?) and 
#    mix in the above definitions. 
#    
let sample : sample.data <A> <B> <{$}> (x? y? z?) (defs:Apply Basic Logic Number Sequence) : 2 
sample?

(:once) (:{(x:) : B}) (:{B : A}) (:{B aq}) (:{B =}) (:{floats B}) (:{tail : B}) (:{B int_min}) (:{B float_inv}) (:{float_sort : B}) (:float_div) (:{once B}) (:{B nth}) (:{B : kernel}) (:{bool : B}) (:{B nat}) (:{B nif}) (:{B : left}) (:{imply : B}) (:{hasnt B}) (:{B float_sum}) (:{B : equal}) (:count) (:{B nth1}) (:{is : B}) (:{(y:) B}) (:*) (:or) (:{int_sum B}) (:{ax B}) (:{pass : B}) (:nor) (:{equal : B}) (:{float_div : B}) (:{hasnt : B}) (:{code_sort : B}) (:{B xnor}) (:right) (:{B nats}) (:aq) (:{xor : B}) (:{B get}) (:{B ap}) (:{ax : B}) (:{B : nat}) (:not) (:{B : some}) (:{bool B}) (:{B : has}) (:{bin B}) (:{first B}) (:equal) (:{* B}) (:{int_max : B}) (:{nth B}) (:{B : tail}) (:{bin : B}) (:{float_inv : B}) (:{B : float_sum}) (:pre) (:{int_div : B}) (:nth1) (:{int_prod B}) (:{B code_sort}) (:{(z:) B}) (:{rep : B}) (:int_sort) (:{B is}) (:while) (:{B pre}) (:{B : not}) (:{domain B}) (:{B : while}) (:{int_sort : B}) (:hasnt) (:{B while}) (:{B some}) (:{B : aq}) (:{nth : B}) (:int_

In [15]:
#
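#   sample? is then a collection of data that can be used, for instance, to test conjectured theorems or to search for 
#   algebraic structures of interest, as shown in the study tutorials. Here count gives its size, and undefined shows
#   which of its codas have no definition (just the three variables).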
#
count : sample? 
undefined : sample?

358
(x:) (y:) (z:)
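The count of 358 can be checked by hand, assuming the width-2 pattern visible in In [10] carries over: besides A and B there are 3 variables and the 67 definitions listed in In [13], so 72 codas appear alone, and <{$}> contributes {B} plus four codes for each of the 71 codas other than B. Adding the empty data gives

$$
\underbrace{1}_{\text{empty}} + \underbrace{72}_{\text{single codas}} + \underbrace{(1 + 4 \cdot 71)}_{\text{generated code}} = 1 + 72 + 285 = 358.
$$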

For larger-scale searches, one would typically increase the width beyond 2, search in parallel and perhaps add more variables.

The theorem tutorial and various study notebooks illustrate such searches for theorems and algebraic structures of interest. 