Simplify Scroll Datasets syntax more
Breck Yunits authored and Breck Yunits committed Apr 21, 2024
1 parent 5671e7c commit 3eaa3bf
Showing 6 changed files with 46 additions and 58 deletions.
Binary file added blog/datasets.png
66 changes: 35 additions & 31 deletions blog/datasets.scroll
@@ -6,6 +6,10 @@ title Scroll Datasets: source code for CSVs

byLine https://breckyunits.com Breck Yunits

image datasets.png
caption More examples of Scroll Datasets from datasets.scroll.pub.
http://datasets.scroll.pub/ datasets.scroll.pub

date 4/21/2024

The source code for this blog post contains a dataset about the planets and generates this HTML file as well as a CSV, a TSV, and a JSON file. It demonstrates Scroll Datasets.
@@ -27,7 +31,7 @@ Scroll Datasets are normal plain text blog posts written in Scroll that also con
https://scroll.pub/ Scroll
match 1

Scroll Datasets are line oriented but represent a table(s). You might call them _deconstructed spreadsheets_.
Scroll Datasets are line oriented but represent a table(s). You might call them _deconstructed csvs_ or _deconstructed spreadsheets_.

- Use LLMs to *instantly generate datasets* that are ready for human review and improvement.
- Intermingle structured data with markup to *annotate any and every part of a dataset* while still generating strict tabular files for data analysis tools.
@@ -41,29 +45,29 @@ code

Documentation, column definitions, rows and *any notes/markup/content* can go in the same file.

# Schema:
# Measures (aka Header, aka Columns, aka Schema)

id: string
moons: int

# Data:
# Concepts (aka Rows)

:::
::

id: mars
moons: 2

I verified moon count with Google. - BY

:::
::

id: jupiter
moons: 63

The moons of Jupiter have their own Wikipedia Page
https://en.wikipedia.org/wiki/Moons_of_Jupiter moons of Jupiter

:::
::

writeDataset demo.csv

@@ -77,15 +81,15 @@ code

# Overview:
- A dataset consists of 4 atomic elements:
- measures (think columns)
- measures (think columns or the header row in a CSV)
- concepts (think rows)
- values (think values)
- measurements (concept & measure & value = measurement)
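The four elements in this list can be illustrated with plain objects. This is only a sketch: the `concepts` array reuses the mars and jupiter rows from the demo, and using `id` as the concept identifier is an assumption made for the example, not something Scroll is confirmed to do.

```javascript
// Hypothetical data, borrowed from the demo dataset in this post.
const concepts = [
  { id: "mars", moons: 2 },
  { id: "jupiter", moons: 63 }
]

// One measurement per (concept, measure, value) triple.
const measurements = concepts.flatMap(concept =>
  Object.entries(concept).map(([measure, value]) => ({ concept: concept.id, measure, value }))
)

console.log(measurements.length) // 4: two measures for each of two concepts
console.log(measurements[1]) // the "moons" measurement for mars
```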

# How to use
- Measure definitions are done like this `appeared:: int`
- A concept is like a row in a database. Concepts are delimited by `::`.
- Measure definitions must come before the first concept (`::`) and are written like: `appeared: int`
// A schema is a set of measure definitions. You can think of measures as columns. Measure names (currently) can only contain [a-zA-Z0-9_]. They cannot contain spaces or periods (the period is reserved for nested measures).
- A concept is like a row in a database. Concepts are marked by the `:::`.
- Measurements are done like this `appeared: 2024`
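The rules in this list can be sketched as a tiny parser. This is a minimal illustration under stated assumptions, not the code in scroll.js: `parseDatasetSketch` and `toCsv` are hypothetical names, values are kept as strings, and no CSV escaping is done.

```javascript
// Hypothetical sketch, not Scroll's implementation.
// A measurement line is a name made of [a-zA-Z0-9_], a colon, then a space or end of line.
const measurementLine = /^([a-zA-Z0-9_]+):(?: (.*))?$/

// Collect [name, value] pairs from a chunk; other lines (notes, links) are ignored.
const readPairs = chunk =>
  chunk
    .split("\n")
    .map(line => line.match(measurementLine))
    .filter(Boolean)
    .map(match => [match[1], match[2] ?? ""])

const parseDatasetSketch = text => {
  // Everything before the first "::" line is treated as measure definitions.
  const [header, ...chunks] = text.split(/^::.*$/m)
  const measures = readPairs(header).map(([name]) => name)
  const concepts = chunks
    .map(chunk => Object.fromEntries(readPairs(chunk)))
    .filter(concept => Object.keys(concept).length > 0)
  return { measures, concepts }
}

// Naive CSV writer: assumes values contain no commas, quotes, or newlines.
const toCsv = ({ measures, concepts }) =>
  [measures.join(","), ...concepts.map(c => measures.map(m => c[m] ?? "").join(","))].join("\n")

const demo = `id: string
moons: int

::

id: mars
moons: 2

I verified moon count with Google. - BY

::

id: jupiter
moons: 63

::
`

console.log(toCsv(parseDatasetSketch(demo)))
```

Free-text notes and link lines fall through because they do not match the measurement pattern, which is what lets prose and data share one file in this sketch.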

# FAQ
@@ -160,28 +164,28 @@ tableSearch

## Measures

id:: string
id: string

title:: string
title: string

diameter:: int
description What is the diameter of the planet?
diameter: int
What is the diameter of the planet?

surfaceGravity:: int
description What is the surface gravity of the planet?
surfaceGravity: int
What is the surface gravity of the planet?

yearsToOrbitSun:: float
description How many Earth years does it take for the planet to orbit the Sun?
yearsToOrbitSun: float
How many Earth years does it take for the planet to orbit the Sun?

moons:: int
description How many moons does the planet have?
moons: int
How many moons does the planet have?

aka:: string
description What are the alternative names for the planet?
aka: string
What are the alternative names for the planet?

# Data
# Concepts

:::
::

id: mars
title: Mars
@@ -192,7 +196,7 @@ moons: 2

// Til Mars has 2 moons!

:::
::

id: jupiter
title: Jupiter
@@ -204,7 +208,7 @@ moons: 63
The moons of Jupiter have their own Wikipedia Page
https://en.wikipedia.org/wiki/Moons_of_Jupiter moons of Jupiter

:::
::

id: earth
title: Earth
@@ -219,7 +223,7 @@ age: 4500000000

* Note: It was only during the 19th century that geologists realized Earth's age was at least many millions of years.

:::
::

id: mercury
title: Mercury
@@ -228,7 +232,7 @@ surfaceGravity: 4
yearsToOrbitSun: 0.241
moons: 0

:::
::

id: saturn
title: Saturn
@@ -237,7 +241,7 @@ surfaceGravity: 9
yearsToOrbitSun: 29.46
moons: 64

:::
::

id: uranus
title: Uranus
@@ -246,7 +250,7 @@ surfaceGravity: 8
yearsToOrbitSun: 84.01
moons: 27

:::
::

id: venus
title: Venus
@@ -255,7 +259,7 @@ surfaceGravity: 9
yearsToOrbitSun: 0.615
moons: 0

:::
::

id: neptune
title: Neptune
@@ -264,6 +268,6 @@ surfaceGravity: 11
yearsToOrbitSun: 164.79
moons: 14

:::
::

import footer.scroll
21 changes: 3 additions & 18 deletions grammar/datasets.grammar
@@ -8,7 +8,7 @@ measureTypeCell
enum string int bool url uid float enum

conceptStartParser
crux :::
crux ::
description Begins a concept.
extends abstractCommentParser
javascript
@@ -19,23 +19,8 @@ conceptStartParser
measureNameCell
highlightScope keyword

measureDefinitionParser
description Define a measure for a dataset.
cells measureNameCell measureTypeCell
pattern ^[a-zA-Z0-9_]+::( |$)
// Currently just treated as a comment while testing if Datasets is worth developing.
extends abstractCommentParser
javascript
compile() {
return `<div>${Utils.linkify(this.getLine())}</div>`
}
example
:::
name:: string
order:: int

measurementParser
description Add a measurement to a concept.
description Add a measurement to a concept. Also used for defining measures.
cells measureNameCell
pattern ^[a-zA-Z0-9_]+:( |$)
catchAllCellType valueCell
@@ -45,7 +30,7 @@ measurementParser
return `<div>${Utils.linkify(this.getLine())}</div>`
}
example
:::
::
id: earth
order: 3

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "scroll-cli",
"version": "77.0.0",
"version": "77.1.0",
"description": "Tools for thoughts.",
"main": "scroll.js",
"engines": {
5 changes: 4 additions & 1 deletion releaseNotes.scroll
@@ -4,8 +4,11 @@ title Scroll Release Notes

startColumns

# 77.1.0 4/21/2024
- 🎉 Simplified datasets further after user tests.

# 77.0.0 4/21/2024
- 🎉 Added Scroll Datasets, which consists of the `:::`, `printDataset`, `writeDataset`, and `*:` and `*::` keywords.
- 🎉 Added Scroll Datasets, which consists of the `::`, `printDataset`, `writeDataset`, and `*:` and `*::` keywords.
link blog/datasets.html Scroll Datasets

- ⚠️ BREAKING: if you had lines starting with a word then colon, that used the catchall paragraph, such as `Sidenote: yada yada.`, those will now be parsed incorrectly as measures. Just explicitly make them paragraphs `* Sidenote: yada yada.`
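To check which existing lines the breaking change touches, the `measurementParser` pattern from grammar/datasets.grammar can be tried on its own. This is a rough sketch: `isParsedAsMeasurement` is a hypothetical helper, and it tests only the regex, not how Scroll orders its parsers.

```javascript
// Pattern copied from measurementParser in grammar/datasets.grammar.
const measurementPattern = /^[a-zA-Z0-9_]+:( |$)/

// Hypothetical helper: does this line match the measurement pattern?
const isParsedAsMeasurement = line => measurementPattern.test(line)

console.log(isParsedAsMeasurement("Sidenote: yada yada.")) // true, now treated as a measure
console.log(isParsedAsMeasurement("* Sidenote: yada yada.")) // false, stays a paragraph
console.log(isParsedAsMeasurement("https://scroll.pub/ Scroll")) // false, "//" follows the colon
```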
10 changes: 3 additions & 7 deletions scroll.js
@@ -89,11 +89,11 @@ const parseDataset = content => {
const schema = {}
tree.forEach(node => {
const word = node.getWord(0)
if (word.endsWith("::")) schema[word.replace("::", ":")] = node
if (word.endsWith(":")) schema[word] = node
})
return schema
}
const conceptDelimiter = /^:::/m
const conceptDelimiter = /^::/m
let schema = null
const concepts = content
.split(conceptDelimiter)
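A small sketch of how the new delimiter regex behaves when used with `String.prototype.split`. The sample strings are made up for illustration, and the second case only shows what the regex does by itself, not how Scroll handles old files.

```javascript
const conceptDelimiter = /^::/m // same regex as the new line in this hunk

// Hypothetical content: one measure definition followed by two concepts.
const content = "id: string\n::\nid: mars\n::\nid: venus\n"
const chunks = content.split(conceptDelimiter)
console.log(chunks) // three chunks: "id: string\n", "\nid: mars\n", "\nid: venus\n"

// An old-style ":::" line still splits, but a stray ":" stays in the next chunk.
const legacyChunks = "a: 1\n:::\nb: 2".split(conceptDelimiter)
console.log(legacyChunks) // two chunks: "a: 1\n" and ":\nb: 2"
```

The `m` flag is what lets `^` match at the start of every line rather than only at the start of the whole string.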
@@ -551,11 +551,7 @@ import footer.scroll`
// If this proves useful maybe make slight adjustments to Scroll lang to be more imperative.
if (file.has(scrollKeywords.writeDataset)) {
file.scrollProgram.findNodes(scrollKeywords.writeDataset).forEach(node => {
const link = node.getWord(1)
if (!link) {
this.log(`⚠️ No filename provided after ${scrollKeywords.writeDataset} keyword. Skipping`)
return
}
const link = node.getWord(1) || permalink.replace(".html", ".tsv")

const extension = link.split(".").pop()
fileSystem.write(folder + link, file.makeDataset(extension))
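The filename fallback in this hunk can be tried in isolation. This is a sketch: `datasetFilename` and `extensionOf` are hypothetical helpers, and the example assumes `permalink` is a string such as "datasets.html".

```javascript
// Hypothetical helper mirroring the fallback expression in the hunk above.
const datasetFilename = (wordAfterKeyword, permalink) =>
  wordAfterKeyword || permalink.replace(".html", ".tsv")

// Same extension logic as the hunk: take the text after the last period.
const extensionOf = filename => filename.split(".").pop()

console.log(datasetFilename("demo.csv", "datasets.html")) // demo.csv
console.log(datasetFilename(undefined, "datasets.html")) // datasets.tsv
console.log(extensionOf(datasetFilename(undefined, "datasets.html"))) // tsv
```

With the fallback, a bare `writeDataset` line produces a TSV named after the page instead of being skipped with a warning.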