Defining Projections
There are two ways to define projections: TypeScript-based and YAML- or JSON-based. The former approach allows for maximum flexibility in the design of the widgets but also results in increased development and maintenance effort. The latter approach is limited to purely textual projections but requires less effort for setup and learning and allows for flexible changes and extension at runtime.
Projections in YAML or JSON are defined as extensions of an existing object: either a projection package, a subprojection, or a root projection. Each YAML or JSON file can contain an arbitrary number of such extensions; therefore, the root element of each file is an array of objects. Each object corresponds to one extension and has a type field that indicates what kind of entity the object represents. All JSON- and YAML-based projection definitions obey the JSON schema defined here.
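The role of the type field can be illustrated with a small TypeScript sketch. The interfaces and the function `summarizeExtension` below are hypothetical and heavily simplified; they only cover the two type values that appear in the samples on this page, and the authoritative definition remains the JSON schema.

```typescript
// Hypothetical, simplified types (not Puredit's real schema types).
interface PackageExtensionSketch {
  type: "packageExtension";
  package: string;
  rootProjections: { name: string }[];
}

interface ProjectionExtensionSketch {
  type: "projectionExtension";
  package: string;
  parentProjection: string;
  parentParameter: string;
  subProjections: { name: string }[];
}

type ExtensionSketch = PackageExtensionSketch | ProjectionExtensionSketch;

// Summarize one extension; switching on `type` tells the compiler which fields exist.
function summarizeExtension(extension: ExtensionSketch): string {
  switch (extension.type) {
    case "packageExtension":
      return `${extension.package}: ${extension.rootProjections.length} root projection(s)`;
    case "projectionExtension":
      return (
        `${extension.parentProjection}.${extension.parentParameter}: ` +
        `${extension.subProjections.length} subprojection(s)`
      );
  }
}

// The root element of a definition file is an array of such objects.
const fileContent: ExtensionSketch[] = [
  {
    type: "packageExtension",
    package: "py-polars",
    rootProjections: [{ name: "Polars:Dataframe:CreateFromCsv" }],
  },
  {
    type: "projectionExtension",
    package: "py-polars",
    parentProjection: "Polars:Column:Chain",
    parentParameter: "columnChain",
    subProjections: [{ name: "Polars:Column:Median" }, { name: "Polars:Column:Mean" }],
  },
];

const summaries = fileContent.map(summarizeExtension);
for (const summary of summaries) {
  console.log(summary);
}
```

Modeling the extensions as a union keyed by type mirrors how a single file can mix several kinds of extensions in one array.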
We recommend storing the projection definitions in *.puredit.yaml or *.puredit.json files. If the VS Code extension is installed in your VS Code editor, it automatically uses the included JSON schema to validate the files and provide code completion. In addition, when a folder is opened in VS Code, it is scanned for files with these endings, and the files found are loaded automatically. YAML or JSON files can also be added and removed manually in the VS Code settings under Puredit: Declarative Projection Descriptors.
In the following sections, we go through all the ways to define new projections and extend existing ones using YAML or JSON. For brevity, we only show a YAML sample for each use case. Since there is feature parity between definitions in JSON and YAML, the corresponding JSON code for each sample can be obtained using an online YAML-to-JSON translator.
The simplest use case is to define a flat root projection, i.e. a root projection without any subprojections, as in the code sample below: we define an extension to the package py-polars with one root projection named Polars:Dataframe:CreateFromCsv. Since the names of all projections must be unique across all packages loaded in the projectional editor, we recommend structuring the names as in the code samples here.
The description field describes the semantic meaning of the code matched by the pattern. The editor uses it when projections are searched and when they are displayed in the code completion. The field isExpression indicates whether the pattern is an expression pattern (true) or a statement pattern (false).
The field template defines the code pattern we want to match. In this example, we want to match occurrences of the function call pl.read_csv("/with/some/file/path"), which creates a dataframe from a CSV file. The file path should be variable. The static parts of the code we want to match occur in the pattern exactly as in the source code. All parameters are enclosed in <% %>.
The field segmentWidgets is an array of strings defining the widgets that replace the code matched by the pattern. In this example, the pattern contains neither chains nor aggregations, therefore we only require one widget. Just as in the template, the parameters are enclosed in <% %>.
The field parameters is an array of objects, where each object represents one of the parameters in the template. In this example, we model the identifier pl as a context variable since it must be defined by some import before the pattern. The file path is modeled as an argument matching the node type string.
NOTE: The pattern matching algorithm allows template arguments to match multiple arbitrary node types (both leaf and non-leaf node types). However, the TextInput control used to display them is only designed to handle arguments matching strings, integers, or identifiers. Combinations of these or other node types will also be displayed, but the user experience when editing them will probably be poor!
NOTE: The names of the projections loaded in an editor must be unique across all packages! Overwriting existing projections (i.e. creating a projection with the same name as an existing projection) is not supported and results in undefined behavior!
NOTE: The names of the parameters in the template and widgets must exactly match the names of the parameters defined under the field parameters! Also, the names of the parameters must be unique within the projection.
- package: py-polars
  type: packageExtension
  rootProjections:
    - name: Polars:Dataframe:CreateFromCsv
      type: rootProjection
      description: Create a new dataframe from CSV.
      isExpression: true
      template: "<%pl%>.read_csv(<%filePath%>)"
      segmentWidgets:
        - New dataframe from CSV file <%filePath%>
      parameters:
        - name: pl
          type: contextVariable
        - name: filePath
          type: argument
          nodeTypes:
            - string
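The rule that every parameter used in the template and the widgets must be declared exactly once under parameters can be checked mechanically. The following is a minimal sketch of such a check, not part of Puredit: the type `ProjectionDescriptor` and the functions `findPlaceholders` and `checkParameters` are hypothetical names, and the check only looks at the `<% %>` placeholders of a single flat projection.

```typescript
// Hypothetical, simplified shape of one projection definition.
// The field names follow the YAML sample; the type itself is not part of Puredit.
interface ProjectionDescriptor {
  name: string;
  template: string;
  segmentWidgets: string[];
  parameters: { name: string; type: string }[];
}

// Collect every parameter name enclosed in <% %> within a string.
function findPlaceholders(text: string): string[] {
  const names: string[] = [];
  const regex = /<%([A-Za-z_][A-Za-z0-9_]*)%>/g;
  let match = regex.exec(text);
  while (match !== null) {
    names.push(match[1]);
    match = regex.exec(text);
  }
  return names;
}

// Return a list of problems; an empty list means the definition is consistent.
function checkParameters(descriptor: ProjectionDescriptor): string[] {
  const problems: string[] = [];
  const declared: string[] = [];
  for (const parameter of descriptor.parameters) {
    if (declared.indexOf(parameter.name) >= 0) {
      problems.push(`parameter "${parameter.name}" is declared more than once`);
    } else {
      declared.push(parameter.name);
    }
  }
  const texts = [descriptor.template].concat(descriptor.segmentWidgets);
  for (const text of texts) {
    for (const placeholder of findPlaceholders(text)) {
      if (declared.indexOf(placeholder) < 0) {
        problems.push(`placeholder "${placeholder}" is not declared under parameters`);
      }
    }
  }
  return problems;
}

// The flat root projection from the YAML sample, written as an object literal.
const createFromCsv: ProjectionDescriptor = {
  name: "Polars:Dataframe:CreateFromCsv",
  template: "<%pl%>.read_csv(<%filePath%>)",
  segmentWidgets: ["New dataframe from CSV file <%filePath%>"],
  parameters: [
    { name: "pl", type: "contextVariable" },
    { name: "filePath", type: "argument" },
  ],
};

console.log(checkParameters(createFromCsv).length); // 0, i.e. no problems found
```

A misspelled placeholder such as `<%filepath%>` in a widget would show up as one entry in the returned list.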
The next sample is more complex: We want to define a pattern to match code samples like the following:
no_nulls = weather_data.select(avg_temp="Avg. Temp", city="Station.City").drop_nulls()
reduced_cols = weather_data.select(avg_temp="Avg. Temp", city="Station.City")
names = student_data.select("name").drop_nulls()
Here, too, we define a root projection, but the pattern in this example also contains a chain, so we also need to define subprojections. Additionally, one of the subprojections contains an aggregation, so the subprojections have to be nested.
We begin by defining the root projection as in the example above, but the second parameter (dataframeChain) is more complex: since it is a chain, it requires subprojections. The subprojection for the chain start is defined under the field startSubProjection. It has the same fields as a root projection, except that the field isExpression does not exist. Next, the subprojections for the chain links are defined as an array of objects under the field linkSubProjections.
While the first chain link is rather trivial, the second one also contains subprojections, since the columns to select are modeled as an aggregation. In this case, we aggregate the arguments of the select function, hence the field nodeType of the aggregation must be set to argument_list. The online Tree-sitter Playground can be used to find the correct node type for an aggregation.
NOTE: The parameter columns in the template deliberately covers the parentheses of the function call, since they are part of the argument_list node in the syntax tree! Nevertheless, the parentheses must be covered by the widgets of the select subprojection. This is the case for all aggregations: The start and end token of the projection must be covered by the aggregation in the pattern and by the widgets.
Since an aggregation of argument_list does not require a start pattern, we leave out the field startSubProjection and proceed with the partSubProjections, which are defined as an array of objects, just like the chain links.
- type: packageExtension
  package: py-polars
  rootProjections:
    - type: rootProjection
      name: Polars:Dataframe:AnotherChain
      description: Transform a dataframe and store the result.
      isExpression: false
      template: "<%result%> = <%dataframeChain%>"
      segmentWidgets:
        - "Define <%result%> as transformation of"
      parameters:
        - type: argument
          name: result
          nodeTypes:
            - identifier
        - type: chain
          name: dataframeChain
          minimumLength: 1
          startSubProjection:
            type: chainStart
            name: Polars:Dataframe:BaseDataframe
            description: Dataframe to transform.
            template: "<%sourceDataFrame%>"
            segmentWidgets:
              - "Dataframe <%sourceDataFrame%> transformed by"
            parameters:
              - type: argument
                name: sourceDataFrame
                nodeTypes:
                  - identifier
          linkSubProjections:
            - type: chainLink
              name: Polars:Dataframe:Shift
              description: Shift the rows in a dataframe.
              template: shift(<%numberOfRows%>)
              segmentWidgets:
                - "shifting rows by <%numberOfRows%>"
              parameters:
                - type: argument
                  name: numberOfRows
                  nodeTypes:
                    - integer
            - type: chainLink
              name: Polars:Dataframe:SelectList
              description: Select columns from a dataframe.
              template: select<%columns%>
              segmentWidgets:
                - "selecting column(s)"
                - "end columns"
              parameters:
                - type: aggregation
                  name: columns
                  nodeType: argument_list
                  partSubProjections:
                    - type: aggregationPart
                      name: Polars:AnotherColumn
                      description: Column to select from a dataframe.
                      template: "<%columnName%>"
                      segmentWidgets:
                        - "<%columnName%>"
                      parameters:
                        - type: argument
                          name: columnName
                          nodeTypes:
                            - string
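To make the terms chain start and chain link more concrete, the following sketch splits a Python method chain into its start and its links. It is only an illustration: Puredit matches chains on the Tree-sitter syntax tree, whereas the hypothetical function `splitChain` below works on the raw text and only understands brackets and quoted strings without escape sequences.

```typescript
// Hypothetical result type: the chain start and the chain links as plain text.
interface ChainParts {
  start: string;
  links: string[];
}

// Split an expression at every "." that is neither nested in brackets nor quoted.
function splitChain(expression: string): ChainParts {
  const segments: string[] = [];
  let current = "";
  let depth = 0; // nesting depth of (), [] and {}
  let quote = ""; // the quote character we are currently inside of, or ""
  for (let i = 0; i < expression.length; i++) {
    const ch = expression.charAt(i);
    if (quote !== "") {
      // Inside a string literal: only the matching quote ends it.
      if (ch === quote) {
        quote = "";
      }
      current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      current += ch;
    } else if (ch === "(" || ch === "[" || ch === "{") {
      depth++;
      current += ch;
    } else if (ch === ")" || ch === "]" || ch === "}") {
      depth--;
      current += ch;
    } else if (ch === "." && depth === 0) {
      // A top-level dot separates two elements of the chain.
      segments.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  segments.push(current);
  return { start: segments[0], links: segments.slice(1) };
}

const chainParts = splitChain(
  'weather_data.select(avg_temp="Avg. Temp", city="Station.City").drop_nulls()'
);
console.log(chainParts.start); // weather_data
console.log(chainParts.links.length); // 2
```

In the sample, `weather_data` plays the role of the chain start, while `select(...)` and `drop_nulls()` are the chain links. The dots inside the quoted column names do not split the chain because they are part of string literals.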
In the previous examples, we defined all projections from scratch. However, we can also extend existing projections with new subprojections. To do so, we require
- The name of the projection to extend, in our example Polars:Column:Chain
- The name of the parameter to use the subprojection for, in our example columnChain. This name must be looked up in the definition of the projection to extend.
In this example, we add two chain links to the existing projection Polars:Column:Chain.
- package: py-polars
  type: projectionExtension
  parentProjection: Polars:Column:Chain
  parentParameter: columnChain
  subProjections:
    - name: Polars:Column:Median
      type: chainLink
      description: Take the median of a column.
      template: "median(<%column%>)"
      segmentWidgets:
        - "<%column%> taking its median"
      parameters:
        - name: column
          type: argument
          nodeTypes:
            - string
    - name: Polars:Column:Mean
      type: chainLink
      description: Take the mean of a column.
      template: "mean(<%column%>)"
      segmentWidgets:
        - "<%column%> taking its mean"
      parameters:
        - name: column
          type: argument
          nodeTypes:
            - string
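Conceptually, a projectionExtension adds entries to the subprojections of one parameter of an existing projection. The sketch below models that lookup with simplified, hypothetical types and a hypothetical function `applyExtension`. It reflects an assumption about the behavior described above and is not Puredit's actual loading code.

```typescript
// Hypothetical, simplified data model (not Puredit's real types).
interface SubProjectionSketch {
  name: string;
  type: string;
}

interface ParameterSketch {
  name: string;
  subProjections: SubProjectionSketch[];
}

interface ProjectionSketch {
  name: string;
  parameters: ParameterSketch[];
}

interface ProjectionExtensionSketch {
  parentProjection: string;
  parentParameter: string;
  subProjections: SubProjectionSketch[];
}

// Find the first item with the given name, or undefined if there is none.
function findByName<T extends { name: string }>(items: T[], wanted: string): T | undefined {
  for (const item of items) {
    if (item.name === wanted) {
      return item;
    }
  }
  return undefined;
}

// Append the subprojections of the extension to the referenced parameter.
function applyExtension(
  projections: ProjectionSketch[],
  extension: ProjectionExtensionSketch
): void {
  const parent = findByName(projections, extension.parentProjection);
  if (parent === undefined) {
    throw new Error(`Unknown projection: ${extension.parentProjection}`);
  }
  const parameter = findByName(parent.parameters, extension.parentParameter);
  if (parameter === undefined) {
    throw new Error(`Unknown parameter: ${extension.parentParameter}`);
  }
  for (const subProjection of extension.subProjections) {
    parameter.subProjections.push(subProjection);
  }
}

// An existing projection with one chain parameter and no links yet (illustrative).
const loadedProjections: ProjectionSketch[] = [
  { name: "Polars:Column:Chain", parameters: [{ name: "columnChain", subProjections: [] }] },
];

applyExtension(loadedProjections, {
  parentProjection: "Polars:Column:Chain",
  parentParameter: "columnChain",
  subProjections: [
    { name: "Polars:Column:Median", type: "chainLink" },
    { name: "Polars:Column:Mean", type: "chainLink" },
  ],
});

console.log(loadedProjections[0].parameters[0].subProjections.length); // 2
```

The two error cases show why the names of the parent projection and the parent parameter have to be looked up carefully: a typo in either leaves the extension without a target.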
To allow reusing existing subprojections, existing projections can be referenced in aggregations, as in the example below. We define a new chain link for the existing projection Polars:Dataframe:Chain, namely a subprojection that matches a variant of the select function where the columns to select are not passed as positional arguments but as a list. The aggregation part projections are not defined again, since the values allowed in the list can already be matched by existing projections. Instead, these projections are referenced.
- type: projectionExtension
  package: py-polars
  parentProjection: Polars:Dataframe:Chain
  parentParameter: dataframeChain
  subProjections:
    - type: chainLink
      name: Polars:Dataframe:SelectList
      description: Select a list of columns
      template: "select(<%columns%>)"
      segmentWidgets:
        - "select column(s)"
        - "end columns"
      parameters:
        - type: aggregation
          name: columns
          nodeType: list
          partSubProjections:
            - type: aggregationPartReference
              referencedProjection: Polars:Column
            - type: aggregationPartReference
              referencedProjection: Polars:Column:Chain
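An aggregationPartReference carries only the name of another projection, so the name has to be resolved to the projection itself when the definitions are loaded. A minimal sketch of such a lookup follows. The types and the function `resolveParts` are hypothetical and are not taken from Puredit's implementation; the description of Polars:Column:Chain is a placeholder.

```typescript
// Hypothetical, simplified types (not Puredit's real types).
interface KnownProjection {
  name: string;
  description: string;
}

interface InlinePart {
  type: "aggregationPart";
  name: string;
  description: string;
}

interface PartReference {
  type: "aggregationPartReference";
  referencedProjection: string;
}

type PartEntry = InlinePart | PartReference;

// Replace every reference by the projection it names; keep inline parts as they are.
function resolveParts(entries: PartEntry[], known: KnownProjection[]): KnownProjection[] {
  const resolved: KnownProjection[] = [];
  for (const entry of entries) {
    if (entry.type === "aggregationPart") {
      resolved.push({ name: entry.name, description: entry.description });
      continue;
    }
    let target: KnownProjection | undefined = undefined;
    for (const candidate of known) {
      if (candidate.name === entry.referencedProjection) {
        target = candidate;
        break;
      }
    }
    if (target === undefined) {
      throw new Error(`Referenced projection not found: ${entry.referencedProjection}`);
    }
    resolved.push(target);
  }
  return resolved;
}

// Projections assumed to be loaded already.
const knownProjections: KnownProjection[] = [
  { name: "Polars:Column", description: "Column to select from a dataframe." },
  { name: "Polars:Column:Chain", description: "(illustrative placeholder)" },
];

const resolvedParts = resolveParts(
  [
    { type: "aggregationPartReference", referencedProjection: "Polars:Column" },
    { type: "aggregationPartReference", referencedProjection: "Polars:Column:Chain" },
  ],
  knownProjections
);

console.log(resolvedParts.length); // 2
```

Failing loudly on an unknown name is a design choice of this sketch; how Puredit itself reacts to a dangling reference is not specified here.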
A projection defined in TypeScript is essentially a plain JavaScript object containing all the information we defined in YAML or JSON above.
We begin by defining the root projection. To this end, we first define the required parameters using the helper functions imported from @puredit/parser. Notice how the chain parameter references the templates from the subprojections below. We then define the pattern as a tagged template string, which is passed to the parser through the method expressionPattern together with the name for the pattern and the projection.
Next, the widget is defined. Since the sample projection is rather simple and only consists of text, we use the function simpleProjection from @puredit/projections to define the widget. The function yields a Svelte component which will be rendered in the projectional editor. Alternatively, the widget can be defined explicitly as a Svelte component (see e.g. the transpose projection in the py-pytorch package).
The pattern and the widget as well as all subprojections used by the projection or its subprojections are then put together in the root projection object.
NOTE: The array subProjections must contain all subprojections used in the root projection or in subprojections of the root projection!
// main.ts
import { arg, chain } from "@puredit/parser";
// simpleProjection is called at runtime, so it must not be a type-only import.
import { simpleProjection } from "@puredit/projections";
import type { RootProjection } from "@puredit/projections";
import { parser } from "../parser";
import { baseDataframeSubProjection } from "./baseDataframe";
import { selectSubProjection } from "./select";
import { columnSubProjection } from "./column";
import { columnWithAliasSubProjection } from "./columnWithAlias";
import { dropAllNullsSubProjection } from "./dropAllNulls";

// Parameters of the root pattern.
const resultDataframe = arg("resultDataframe", ["identifier"]);
const dataframeChain = chain(
  "dataframeChain",
  // Template of the chain start.
  baseDataframeSubProjection.template,
  // Templates of the allowed chain links.
  [
    selectSubProjection.template,
    dropAllNullsSubProjection.template,
  ],
  1
);

const pattern = parser.expressionPattern("Polars:Dataframe:Chain")`${resultDataframe} = ${dataframeChain}`;

const widget = simpleProjection(["Define", resultDataframe, "as transformation of"]);

export const dataFrameChainProjection: RootProjection = {
  pattern,
  description: "Transform a dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
  // All subprojections used directly or indirectly by this root projection.
  subProjections: [
    baseDataframeSubProjection,
    selectSubProjection,
    columnSubProjection,
    columnWithAliasSubProjection,
    dropAllNullsSubProjection,
  ],
};
Next, we define the required subprojections. The code follows a similar structure to that of the root projection: we first define the parameters and then use them to construct the pattern. The pattern is then passed to the method subPattern of the parser. The widgets are defined as in the root projection and then put together with the pattern in the subprojection object.
// baseDataframe.ts
import { arg } from "@puredit/parser";
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";

const sourceDataFrame = arg("sourceDataFrame", ["identifier"]);

export const template = parser.subPattern("Polars:Dataframe:ChainStart")`${sourceDataFrame}`;

const widget = simpleProjection(["Dataframe", sourceDataFrame, "transformed by"]);

export const baseDataframeSubProjection: SubProjection = {
  template,
  description: "Dataframe to transform.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
};

// select.ts
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";
import { agg } from "@puredit/parser";
import { columnSubProjection } from "./column";
import { columnWithAliasSubProjection } from "./columnWithAlias";

// The aggregation covers the whole argument_list node, including the parentheses.
const columns = agg("columns", "argument_list", [
  columnSubProjection.template,
  columnWithAliasSubProjection.template,
]);

const template = parser.subPattern("Polars:Dataframe:SelectColumns")`select${columns}`;

const beginWidget = simpleProjection(["reading column(s)"]);
const endWidget = simpleProjection(["end columns"]);

export const selectSubProjection: SubProjection = {
  template,
  description: "Select columns from a dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [beginWidget, endWidget],
};

// column.ts
import { arg } from "@puredit/parser";
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";

const columnName = arg("columnName", ["string"]);

const template = parser.subPattern("Polars:Column")`${columnName}`;

const widget = simpleProjection([columnName]);

export const columnSubProjection: SubProjection = {
  template,
  description: "Column to select from a dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
};

// columnWithAlias.ts
import { arg } from "@puredit/parser";
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";

const columnName = arg("columnName", ["string"]);
const columnAlias = arg("columnAlias", ["identifier"]);

const template = parser.subPattern("Polars:ColumnWithAlias")`${columnAlias}=${columnName}`;

const widget = simpleProjection([columnName, "as", columnAlias]);

export const columnWithAliasSubProjection: SubProjection = {
  template,
  description: "Column with alias to select from a dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
};

// dropAllNulls.ts
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";

const template = parser.subPattern("Polars:Dataframe:DropAllNulls")`drop_nulls()`;

const widget = simpleProjection(["removing nulls in all columns"]);

export const dropAllNullsSubProjection: SubProjection = {
  template,
  description: "Drop all nulls in dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
};