authors Breck Yunits
https://breckyunits.com Breck Yunits
buildConcepts planets.csv planets.json planets.tsv
buildMeasures planetMeasures.tsv
sortBy -Values
date 4/21/2024
tags All ScrollSets
title ScrollSets: source code for CSVs
header.scroll
keyboardNav
mediumColumns 1
printTitle
printAuthors
scrollsets.png
caption More examples of ScrollSets from sets.scroll.pub.
https://sets.scroll.pub/ sets.scroll.pub
The source code for this blog post contains a ScrollSet about the planets and generates this HTML file as well as a CSV, a TSV, and a JSON file. This page demonstrates *ScrollSets*.
dateline
link scrollsets.scroll source code for this blog post
link planets.csv CSV
link planets.tsv TSV
link planets.json JSON
ScrollSets are useful for small single-day projects and for large multi-year projects with thousands of concepts, like PLDB (a Programming Language Database).
https://pldb.io PLDB
***
ScrollSets are normal plain text files written in Scroll that also contain measurements of concepts and output that data into formats ready for data visualization and analysis tools.
https://scroll.pub/ Scroll
match 1
ScrollSets are line-oriented but represent one or more tables. You might call them _deconstructed csvs_ or _deconstructed spreadsheets_.
- Use LLMs to *instantly generate ScrollSets* that are ready for human verification and improvement.
- Intermingle structured data with markup to *annotate any and every part of a ScrollSet* while still generating strict tabular files for data analysis tools.
- Put data, schema, citations, and documentation *all* in one (or more) plain text file(s) to easily share, collaborate on, and improve, all *tracked by git for trust*.
- Add unlimited citations (or none) to *every* measurement.
# Quick Code Example:
codeWithHeader planets.scroll
 This ScrollSet has 2 measures (columns) and 2 concepts (rows).
 Documentation, column definitions, rows and *any notes/markup/content* can go in the same file.
 # Measures (aka Header, aka Columns, aka Schema)
 idParser
  // Every concept needs an "id" (or other concept delimiter)
  extends abstractIdParser
 moonsParser
  extends abstractIntegerMeasureParser
 # Concepts (aka Rows)
 id mars
 moons 2
  // I verified moon count with Google. - BY
 id jupiter
 moons 63
  // Note: the moons of Jupiter have their own Wikipedia Page
   https://en.wikipedia.org/wiki/Moons_of_Jupiter moons of Jupiter
 buildConcepts demo.csv
# The code above generates an HTML page and this:
codeWithHeader demo.csv
 id,moons
 mars,2
 jupiter,63
# Overview:
- ScrollSets are built from 4 atomic elements:
 - concepts
  - think of rows in a spreadsheet
  - denoted by a line starting with `id `
  - concepts are multiple lines of measurements
 - measures
  - think of these as the column names in a spreadsheet, along with meta information about the column
  - aka "parsers"
  - measures are defined in Parsers that start with a line like `moonsParser`
 - values
  - these are just the values of the measurements
 - measurements
  - concept & measure & value = measurement
  - 1 measurement = 1 line
  - measurements can have nested comments that are stripped when compiling to TSV/CSV
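The four elements above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Scroll itself compiles ScrollSets: the function names `parse_concepts` and `to_csv` are hypothetical, and the sketch assumes measure names are known up front and that nested comments are indented.

```python
import csv
import io

def parse_concepts(text, measures):
    """Group flat 'measure value' lines into one dict per concept."""
    concepts = []
    for line in text.splitlines():
        # Indented lines are nested comments/notes; blank lines carry no data.
        if not line.strip() or line[0].isspace():
            continue
        measure, _, value = line.partition(" ")
        if measure not in measures:
            continue  # markup or prose, not a measurement
        if measure == "id":
            concepts.append({})  # a line starting with "id " opens a new concept
        if concepts:
            concepts[-1][measure] = value  # concept & measure & value = measurement
    return concepts

def to_csv(concepts, measures):
    """Render the concepts as CSV, one column per measure."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=measures, lineterminator="\n")
    writer.writeheader()
    writer.writerows(concepts)
    return out.getvalue()

SOURCE = """\
id mars
moons 2
 // I verified moon count with Google. - BY
id jupiter
moons 63
"""

rows = parse_concepts(SOURCE, ["id", "moons"])
print(to_csv(rows, ["id", "moons"]))
```

Each measurement stays on its own line in the source, and the nested comment never reaches the CSV.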
# How to use
- A concept is like a row in a database. All concepts need an id (or other concept delimiter). When you write `id [conceptId]`, Scroll knows that is the beginning of a new concept.
- Measure definitions (aka "parsers") must come before the first concept and are written as Parsers, just like any other Scroll Parser. Measure parsers need to extend one of the abstract measure parser classes defined in `measures.scroll`.
// A schema is a set of measure definitions. You can think of measures as columns. Measure names (currently) can only contain [a-zA-Z0-9_]. They cannot contain spaces or periods (the period is reserved for nested measures).
- Measurements are then done like this `appeared 2024`
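The naming rule for measures can be checked mechanically. The Python sketch below is an assumption-laden illustration, not Scroll's own validation: `is_valid_measure_name` and `is_valid_measure_path` are hypothetical helpers, and `orbit.years` is an invented nested measure name used only to show the reserved period.

```python
import re

# Measure names may (currently) only contain these characters.
MEASURE_NAME = re.compile(r"[a-zA-Z0-9_]+")

def is_valid_measure_name(name):
    """True if the name has no spaces, periods, or other disallowed characters."""
    return MEASURE_NAME.fullmatch(name) is not None

def is_valid_measure_path(path):
    """True if every period-separated segment is itself a valid measure name."""
    return all(is_valid_measure_name(part) for part in path.split("."))

print(is_valid_measure_name("yearsToOrbitSun"))
print(is_valid_measure_name("surface gravity"))
print(is_valid_measure_path("orbit.years"))
```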
# FAQ
? Isn't the better idea to enhance existing spreadsheet GUIs with LLM generation capabilities?
Almost certainly. Using ScrollSets will be much slower and worse than future spreadsheet apps with carefully crafted LLM integrations.
However, it's important to also have simple, lower tech, timeless tools and ScrollSets is one of those.
? Can't you do this same thing with YAML and/or Markdown?
Yes! You can easily achieve the same thing as LLMs & ScrollSets using LLMs & YAML, or LLMs & YAML & Markdown.
https://yaml.org/ YAML
https://github.github.com/gfm/ Markdown
For YAML, just put your documentation and schema in YAML comments up top and then have a tiny script to read that YAML and dump CSV/TSV/JSON or whatever. YAML gives you loads of data structures to use and is widely supported in many languages. But generating HTML from the same file would require more work.
If you want to intermix markup content with your data, you can use Markdown to add the marked up content and then have code sections embedding the YAML and a tiny script to parse out those YAML blocks and write your data to disk.
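As a rough sketch of the Markdown route, the "tiny script" could look something like the Python below. It only pulls the fenced YAML blocks out of a Markdown string and joins them into one multi-document YAML stream; turning that into CSV/TSV/JSON would additionally need a YAML parser, which is left out here. The names `extract_yaml_blocks` and `to_yaml_stream` and the sample document are hypothetical.

```python
import re

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# A fenced block whose info string is "yaml" or "yml"; group 1 is the body.
YAML_BLOCK = re.compile(
    rf"^{FENCE}ya?ml[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

def extract_yaml_blocks(markdown):
    """Return the body of every fenced YAML block, in document order."""
    return [match.group(1) for match in YAML_BLOCK.finditer(markdown)]

def to_yaml_stream(blocks):
    """Join the blocks into one multi-document YAML stream."""
    return "".join("---\n" + block for block in blocks)

DOC = "\n".join([
    "# Planets",
    "Notes about Mars.",
    FENCE + "yaml",
    "id: mars",
    "moons: 2",
    FENCE,
    "Notes about Jupiter.",
    FENCE + "yaml",
    "id: jupiter",
    "moons: 63",
    FENCE,
    "",
])

blocks = extract_yaml_blocks(DOC)
print(to_yaml_stream(blocks))
# To write the data to disk, pass the same string to a file's write() method.
```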
? So, why use Scroll for storing data instead of YAML?
Either can do the job. I expect the Scroll design to end up being more ergonomic, but that might not be true or may be unimportant.
// ergonomic: relating to or designed for efficiency and comfort in the working environment.
If you don't like Scroll's (evolving) version and want to switch, it will always be straightforward to automatically refactor to YAML.
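To give a feel for how small such a refactor could be, here is a minimal Python sketch that emits YAML from concepts that have already been read into dictionaries. It is an assumption, not a Scroll feature: `concepts_to_yaml` is a hypothetical helper, every value is kept as a string, and it relies on JSON-style double-quoted strings being valid YAML scalars.

```python
import json

def concepts_to_yaml(concepts):
    """Emit a YAML sequence of mappings, one mapping per concept."""
    lines = []
    for concept in concepts:
        prefix = "- "  # the first measurement of a concept starts a new list item
        for measure, value in concept.items():
            # json.dumps quotes and escapes the value as a double-quoted string.
            lines.append(f"{prefix}{measure}: {json.dumps(value)}")
            prefix = "  "
    return "\n".join(lines) + "\n"

planets = [
    {"id": "mars", "moons": "2"},
    {"id": "jupiter", "moons": "63"},
]
print(concepts_to_yaml(planets))
```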
? What other related work is out there?
This is a simple pattern to implement, so it has likely been done a few times before. Please let me know so I can include links to, and learn from, any prior art.
? What are the advanced features?
- Types correctly exported in JSON
- Supports nested measures
- Support for computed measures
- Autojoins across files on ids^roadmap
- Auto generates normalized tables for array measures^roadmap
- Support for text blobs^roadmap
^roadmap Planned.
label +
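The first two features, typed JSON export and nested measures, can be pictured with the Python sketch below. It is a guess at the general shape, not Scroll's implementation: the `CONVERTERS` table, the helper names, the nested-object JSON layout, and the `orbit.years` measure are all hypothetical. Only the abstract parser names come from the measures defined in this post.

```python
import json

# Hypothetical converters keyed by the abstract measure parser a measure extends.
CONVERTERS = {
    "abstractIntegerMeasureParser": int,
    "abstractFloatMeasureParser": float,
    "abstractBooleanMeasureParser": lambda raw: raw == "true",
    "abstractStringMeasureParser": str,
}

def set_nested(target, dotted_name, value):
    """Store value under a dotted measure name: 'orbit.years' -> {'orbit': {'years': value}}."""
    *parents, leaf = dotted_name.split(".")
    for key in parents:
        target = target.setdefault(key, {})
    target[leaf] = value

def to_typed_json(concept, schema):
    """Convert raw string values to typed ones and serialize the concept as JSON."""
    typed = {}
    for measure, raw in concept.items():
        # Measures without a known type (such as id) stay strings.
        convert = CONVERTERS.get(schema.get(measure), str)
        set_nested(typed, measure, convert(raw))
    return json.dumps(typed)

schema = {
    "moons": "abstractIntegerMeasureParser",
    "orbit.years": "abstractFloatMeasureParser",
    "hasLife": "abstractBooleanMeasureParser",
}
earth = {"id": "Earth", "moons": "1", "orbit.years": "1", "hasLife": "true"}
print(to_typed_json(earth, schema))
```

With types applied, `moons` is exported as a number and `hasLife` as a boolean rather than as strings.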
? What is the origin of ScrollSets?
LLM dataset generation is a _major_ breakthrough in datasets. ScrollSets are, at best, a minor improvement. They are designed to work alongside LLMs to help solve the Dataset Needed problem.
https://breckyunits.com/dataset-needed.html Dataset Needed
ScrollSets evolved out of TrueBase. ScrollSets have eliminated the need for the TrueBase software (and existing TrueBase sites should be migrated to ScrollSets), but were informed by the TrueBase build experience.
https://truebase.treenotation.org TrueBase
Although ScrollSets are designed for a world with LLMs, the design is meant to be useful without them as well, and would also have been mildly useful 30 years ago.
? What were the design goals?
- Have an LLM do the bulk of the work while humans supervise to remove hallucinations.
- Store everything (documentation, schema, all concepts) in 1 clean plain text file, or split it into many files (using the `import` parser).
- The ScrollSet syntax balances _looseness_ useful in creative thinking with the _tightness_ needed by tabular data visualization and analysis tools.
? Why are measures and concepts root-level features and not indented?
The normal way to implement this in Scroll would be something like:
code
 measures
  id string
  moons int
 concept
  id mars
  moons 2
 concept
  id jupiter
  moons 63
The flat design was chosen for ergonomic reasons. ScrollSets seem like they might be useful enough to be worth breaking from Scroll convention a bit. Like all things in Scroll, ScrollSets are an experiment, and maybe this design will evolve.
# Extended Example: a Planets ScrollSet
Below is the ScrollSet embedded in this Scroll file.
planets.csv
printTable
tableSearch
# Measurements of the measures
planetMeasures.tsv
printTable
## Extended Measures Example
belowAsCodeUntil // end measures
idParser
 extends abstractIdParser
diameterParser
 extends abstractIntegerMeasureParser
 description What is the diameter of the planet?
surfaceGravityParser
 extends abstractIntegerMeasureParser
 description What is the surface gravity of the planet?
yearsToOrbitSunParser
 extends abstractFloatMeasureParser
 description How many Earth years does it take for the planet to orbit the Sun?
moonsParser
 extends abstractIntegerMeasureParser
 description How many moons does the planet have?
 boolean isMeasureRequired true
 float sortIndex 1.1
akaParser
 extends abstractStringMeasureParser
 description What are the alternative names for the planet?
ageParser
 extends abstractIntegerMeasureParser
 description How old is this planet?
hasLifeParser
 extends abstractBooleanMeasureParser
 description Does this planet have life?
wikipediaParser
 extends abstractUrlMeasureParser
 description URL to the Wikipedia page.
// end measures
## Extended Concepts Example
belowAsCodeUntil // end concepts
id Mars
moons 2
 // TIL Mars has 2 moons!
diameter 6794
surfaceGravity 4
yearsToOrbitSun 1.881
hasLife false
id Jupiter
moons 63
 // The moons of Jupiter have their own Wikipedia Page
  https://en.wikipedia.org/wiki/Moons_of_Jupiter moons of Jupiter
diameter 142984
surfaceGravity 25
yearsToOrbitSun 11.86
hasLife false
id Earth
moons 1
diameter 12756
surfaceGravity 10
yearsToOrbitSun 1
aka Pale Blue Dot
hasLife true
wikipedia https://en.wikipedia.org/wiki/Earth
age 4500000000
 // Note: It was only during the 19th century that geologists realized Earth's age was at least many millions of years.
id Mercury
moons 0
diameter 4879
surfaceGravity 4
yearsToOrbitSun 0.241
hasLife false
id Saturn
moons 64
diameter 120536
surfaceGravity 9
yearsToOrbitSun 29.46
hasLife false
id Uranus
moons 27
diameter 51118
surfaceGravity 8
yearsToOrbitSun 84.01
hasLife false
id Venus
moons 0
diameter 12104
surfaceGravity 9
yearsToOrbitSun 0.615
hasLife false
id Neptune
moons 14
diameter 49572
surfaceGravity 11
yearsToOrbitSun 164.79
hasLife false
// end concepts
# Related
printRelated ScrollSets
footer.scroll