
Defining Projections

Andre Trump edited this page Aug 26, 2024 · 6 revisions

There are two ways to define projections: TypeScript-based and YAML- or JSON-based. The former approach allows for maximum flexibility in the design of the widgets but also results in increased development and maintenance effort. The latter approach is limited to purely textual projections but requires less effort for setup and learning and allows for flexible changes and extension at runtime.

YAML- or JSON-based Projections

Projections in YAML or JSON are defined as extensions of an existing object: either a projection package, a subprojection, or a root projection. Each YAML or JSON file can contain an arbitrary number of such extensions, so the root element of each file is an array of objects. Each object corresponds to one extension and has a type field that indicates what kind of entity the object represents. All JSON- and YAML-based projection definitions obey the Puredit JSON schema.
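
The file layout described above can be pictured with a small TypeScript sketch. The interfaces PackageExtension and ProjectionExtension and the helper countByType are illustrative assumptions and not part of Puredit; only the type values and field names are taken from the samples on this page, and the JSON schema remains the authoritative definition.

```typescript
// Hypothetical, simplified shapes for the two extension kinds used on this page.
// The JSON schema of Puredit is authoritative; these types are only a sketch.
interface PackageExtension {
  type: "packageExtension";
  package: string;
  rootProjections: unknown[];
}

interface ProjectionExtension {
  type: "projectionExtension";
  package: string;
  parentProjection: string;
  parentParameter: string;
  subProjections: unknown[];
}

type Extension = PackageExtension | ProjectionExtension;

// A descriptor file is an array of extensions; this is the JSON form of such a file.
const fileContent = `[
  { "type": "packageExtension", "package": "py-polars", "rootProjections": [] },
  {
    "type": "projectionExtension",
    "package": "py-polars",
    "parentProjection": "Polars:Column:Chain",
    "parentParameter": "columnChain",
    "subProjections": []
  }
]`;

// Count the extensions of each kind by inspecting the `type` field.
function countByType(extensions: Extension[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const extension of extensions) {
    counts[extension.type] = (counts[extension.type] ?? 0) + 1;
  }
  return counts;
}

const parsedExtensions = JSON.parse(fileContent) as Extension[];
const countsByType = countByType(parsedExtensions);
console.log(countsByType);
```

The cast after JSON.parse performs no validation; in practice, the schema validation offered by the editor would catch malformed entries.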

We recommend storing the projection definitions in *.puredit.yaml or *.puredit.json files. If the Puredit VS Code extension is installed, it automatically uses the included JSON schema to validate these files and offer code completion. In addition, when a folder is opened in VS Code, it is scanned for files with these endings and they are loaded automatically. YAML or JSON files can also be added and removed manually in the VS Code settings under Puredit: Declarative Projection Descriptors.

In the following sections, we go through all the ways to define new projections and extend existing ones using YAML or JSON. For brevity, we only show a YAML sample for each use case. Since there is feature parity between definitions in JSON and YAML, the corresponding JSON code for each sample can be obtained using an online YAML-to-JSON translator.

Defining a flat root projection

The simplest use case is to define a flat root projection, i.e. a root projection without any subprojections, as in the following code sample: we define an extension to the package py-polars with one root projection named Polars:Dataframe:CreateFromCsv. Since the names of all projections must be unique across all packages loaded in the projectional editor, we recommend structuring the names as in the code samples here.

The description field contains a description of the semantic meaning of the code matched by the pattern. The editor uses it when projections are searched and displayed in the code completion. The field isExpression indicates whether the pattern is an expression pattern (true) or a statement pattern (false).

The field template is used to define the code pattern we want to match. In this example, we want to match occurrences of the function call pl.read_csv("/with/some/file/path") to create a dataframe from a CSV file. The file path should be variable. The static parts of the code we want to match occur in the pattern exactly as in the source code. All parameters are enclosed in <% %>.

The field segmentWidgets is an array of strings defining the widgets to replace the code matched by the pattern. In this example, the pattern contains neither chains nor aggregations, therefore we only require one widget. Just as in the template, the parameters are enclosed in <% %>.

The field parameters is an array of objects, where each object represents one of the parameters in the template. In this example, we model the identifier pl as a context variable since it must be defined by some import before the pattern. The file path is modeled as an argument matching the node type string.

NOTE: The pattern matching algorithm allows defining template arguments that match multiple arbitrary node types (both leaf and non-leaf node types). However, the TextInput control used to display them is only designed to handle arguments matching strings, integers, or identifiers. Combinations of these or other node types will also be displayed, but the user experience when editing them will probably be bad!

NOTE: The names of the projections loaded in an editor must be unique across all packages! Overwriting existing projections (i.e. creating a projection with the same name as an existing projection) is not supported and results in undefined behavior!

NOTE: The names of the parameters in the template and widgets must match exactly the names of the parameters defined under the field parameters! Also, the names of the parameters must be unique in the projection.

- package: py-polars
  type: packageExtension
  rootProjections:
  - name: Polars:Dataframe:CreateFromCsv
    type: rootProjection
    description: Create a new dataframe from CSV.
    isExpression: true
    template: "<%pl%>.read_csv(<%filePath%>)"
    segmentWidgets:
    - New dataframe from CSV file <%filePath%>
    parameters:
    - name: pl
      type: contextVariable
    - name: filePath
      type: argument
      nodeTypes:
      - string
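
The rule that parameter names in the template and widgets must match the declared parameters can be checked mechanically. The following TypeScript sketch illustrates one way to do so; the helpers placeholderNames and checkParameters are hypothetical and not part of Puredit, and the sketch assumes that placeholders are written exactly as <%name%> without inner whitespace, as in the samples on this page.

```typescript
// Hypothetical helper, not part of Puredit: collects the names written as <%name%>.
function placeholderNames(text: string): string[] {
  const pattern = /<%(\w+)%>/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    names.push(match[1]);
  }
  return names;
}

interface ParameterCheck {
  undeclared: string[]; // used in the template or widgets but missing under `parameters`
  duplicated: string[]; // declared more than once under `parameters`
}

// Compares the placeholders of one projection with its declared parameter names.
function checkParameters(
  template: string,
  segmentWidgets: string[],
  parameterNames: string[]
): ParameterCheck {
  const declared = new Set(parameterNames);
  const undeclared: string[] = [];
  for (const text of [template, ...segmentWidgets]) {
    for (const used of placeholderNames(text)) {
      if (!declared.has(used) && undeclared.indexOf(used) === -1) {
        undeclared.push(used);
      }
    }
  }
  const duplicated = parameterNames.filter(
    (current, index) => parameterNames.indexOf(current) !== index
  );
  return { undeclared, duplicated };
}

// Values taken from the Polars:Dataframe:CreateFromCsv sample.
const sampleCheck = checkParameters(
  "<%pl%>.read_csv(<%filePath%>)",
  ["New dataframe from CSV file <%filePath%>"],
  ["pl", "filePath"]
);
console.log(sampleCheck); // both lists are empty for this sample
```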

Defining a root projection with subprojections

The next sample is more complex: We want to define a pattern to match code samples like the following:

no_nulls = weather_data.select(avg_temp="Avg. Temp", city="Station.City").drop_nulls()
reduced_cols = weather_data.select(avg_temp="Avg. Temp", city="Station.City")
names = student_data.select("name").drop_nulls()

Here, too, we define a root projection, but the pattern in this example contains a chain, so we also need to define subprojections. Additionally, one of the subprojections contains an aggregation, so we have to nest the subprojections.

We begin by defining the root projection as in the example above, but the second parameter (dataframeChain) is more complex since it is a chain and therefore requires subprojections. The subprojection for the chain start is defined under the field startSubProjection. It has the same fields as a root projection, except that the field isExpression does not exist. Next, the subprojections for the chain links are defined as an array of objects under the field linkSubProjections.

While the first chain link is rather trivial, the second one also contains subprojections, since the columns to select are modeled as an aggregation. In this case, we aggregate the arguments of the select function, hence the field nodeType of the aggregation must be set to argument_list. The online Tree-sitter Playground can be used to find the correct node type for an aggregation.

NOTE: The parameter columns in the template deliberately covers the parentheses of the function call, since they are part of the argument_list node in the syntax tree! Nevertheless, the parentheses must be covered by the widgets of the select subprojection. This is the case for all aggregations: the start and end tokens of the aggregation must be covered by the aggregation in the pattern and by the widgets.

Since an aggregation of argument_list does not require a start pattern, we leave out the field startSubProjection and proceed with the partSubProjections, which are defined as an array of objects, just like the chain links.

- type: packageExtension
  package: py-polars
  rootProjections:
  - type: rootProjection
    name: Polars:Dataframe:AnotherChain
    description: Transform a dataframe and store the result.
    isExpression: false
    template: "<%result%> = <%dataframeChain%>"
    segmentWidgets: 
      - "Define <%result%> as transformation of"
    parameters:
    - type: argument
      name: result
      nodeTypes:
        - identifier

    - type: chain
      name: dataframeChain
      minimumLength: 1
      startSubProjection:
        type: chainStart
        name: Polars:Dataframe:BaseDataframe
        description: Dataframe to transform.
        template: "<%sourceDataFrame%>"
        segmentWidgets:
        - "Dataframe <%sourceDataFrame%> transformed by"
        parameters:
        - type: argument
          name: sourceDataFrame
          nodeTypes:
          - identifier

      linkSubProjections:
      - type: chainLink
        name: Polars:Dataframe:Shift
        description: Shift the rows in a dataframe.
        template: shift(<%numberOfRows%>)
        segmentWidgets:
        - "shifting rows by <%numberOfRows%>"
        parameters:
        - type: argument
          name: numberOfRows
          nodeTypes:
            - integer

      - type: chainLink
        name: Polars:Dataframe:SelectList
        description: Select columns from a dataframe.
        template: select<%columns%>
        segmentWidgets:
        - "selecting column(s)"
        - "end columns"
        parameters:
        - type: aggregation
          name: columns
          nodeType: argument_list
          partSubProjections:
          - type: aggregationPart
            name: Polars:AnotherColumn
            description: Column to select from a dataframe.
            template: "<%columnName%>"
            segmentWidgets:
            - "<%columnName%>"
            parameters:
            - type: argument
              name: columnName
              nodeTypes:
              - string

Adding a subprojection to an existing projection

In the examples before, we defined all projections from scratch. However, we can also extend existing projections with new subprojections. To do so, we require

  1. The name of the projection to extend, in our example Polars:Column:Chain
  2. The name of the parameter to use the subprojection for, in our example columnChain. This name must be looked up in the definition of the projection to extend.

In this example, we add two chain links to the existing projection Polars:Column:Chain.

- package: py-polars
  type: projectionExtension
  parentProjection: Polars:Column:Chain
  parentParameter: columnChain
  subProjections:
  - name: Polars:Column:Median
    type: chainLink
    description: Take the median of a column.
    template: "median(<%column%>)"
    segmentWidgets:
    - "<%column%> taking its median"
    parameters:
    - name: column
      type: argument
      nodeTypes:
      - string

  - name: Polars:Column:Mean
    type: chainLink
    description: Take the mean of a column.
    template: "mean(<%column%>)"
    segmentWidgets:
    - "<%column%> taking its mean"
    parameters:
    - name: column
      type: argument
      nodeTypes:
      - string
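
Conceptually, such an extension addresses a parameter by the pair of projection name and parameter name and appends the new subprojections to it. The TypeScript sketch below illustrates this lookup for chain links only. It is not the implementation used by the editor: the interfaces, the function applyChainLinkExtension, and the stand-in definition of Polars:Column:Chain are assumptions made for illustration.

```typescript
// Hypothetical, simplified shapes; the real definitions carry more fields.
interface SubProjectionDef {
  name: string;
  type: string;
}

interface ParameterDef {
  name: string;
  type: string;
  linkSubProjections?: SubProjectionDef[];
}

interface ProjectionDef {
  name: string;
  parameters: ParameterDef[];
}

interface ProjectionExtensionDef {
  parentProjection: string;
  parentParameter: string;
  subProjections: SubProjectionDef[];
}

// Appends the chain links of an extension to the addressed chain parameter.
function applyChainLinkExtension(
  projections: ProjectionDef[],
  extension: ProjectionExtensionDef
): void {
  const parent = projections.find((p) => p.name === extension.parentProjection);
  if (parent === undefined) {
    throw new Error(`Unknown projection: ${extension.parentProjection}`);
  }
  const parameter = parent.parameters.find((p) => p.name === extension.parentParameter);
  if (parameter === undefined) {
    throw new Error(`Unknown parameter: ${extension.parentParameter}`);
  }
  const newLinks = extension.subProjections.filter((s) => s.type === "chainLink");
  parameter.linkSubProjections = (parameter.linkSubProjections ?? []).concat(newLinks);
}

// Stand-in for the existing projection; its real definition is not shown on this page.
const loadedProjections: ProjectionDef[] = [
  { name: "Polars:Column:Chain", parameters: [{ name: "columnChain", type: "chain" }] },
];

applyChainLinkExtension(loadedProjections, {
  parentProjection: "Polars:Column:Chain",
  parentParameter: "columnChain",
  subProjections: [
    { name: "Polars:Column:Median", type: "chainLink" },
    { name: "Polars:Column:Mean", type: "chainLink" },
  ],
});

console.log(loadedProjections[0].parameters[0].linkSubProjections);
```

The sketch fails loudly when the parent projection or parameter is unknown, which mirrors why the parameter name must be looked up in the definition of the projection to extend.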

Referencing existing projections

To allow reusing existing subprojections, existing projections can be referenced in aggregations as in the example below. We define a new chain link for the existing Polars:Dataframe:Chain, namely a subprojection that matches a variant of the select function where the columns to select are passed as a list rather than as positional arguments. The aggregation part projections are not defined anew, since the allowed values in the list can already be matched by existing projections. Consequently, these projections are referenced.

- type: projectionExtension
  package: py-polars
  parentProjection: Polars:Dataframe:Chain
  parentParameter: dataframeChain
  subProjections:
  - type: chainLink
    name: Polars:Dataframe:SelectList
    description: Select a list of columns
    template: "select(<%columns%>)"
    segmentWidgets:
      - "select column(s)"
      - "end columns"
    parameters:
      - type: aggregation
        name: columns
        nodeType: list
        partSubProjections:
          - type: aggregationPartReference
            referencedProjection: Polars:Column
          - type: aggregationPartReference
            referencedProjection: Polars:Column:Chain
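
One way to think about references is as a name lookup in the set of projections that are already loaded. The TypeScript sketch below illustrates this idea under that assumption. The types PartEntry and KnownProjection and the function resolvePartNames are hypothetical and do not reflect how the editor resolves references internally.

```typescript
// An aggregation part is either defined inline or given as a reference by name.
type PartEntry =
  | { type: "aggregationPart"; name: string }
  | { type: "aggregationPartReference"; referencedProjection: string };

// Hypothetical registry entry; a real projection carries a template, widgets, etc.
interface KnownProjection {
  name: string;
}

// Resolves every part to the name of the projection that will match it.
function resolvePartNames(parts: PartEntry[], known: Map<string, KnownProjection>): string[] {
  return parts.map((part) => {
    if (part.type === "aggregationPart") {
      return part.name;
    }
    const target = known.get(part.referencedProjection);
    if (target === undefined) {
      throw new Error(`Referenced projection not loaded: ${part.referencedProjection}`);
    }
    return target.name;
  });
}

// Stand-ins for the projections that are assumed to be loaded already.
const knownProjections = new Map<string, KnownProjection>([
  ["Polars:Column", { name: "Polars:Column" }],
  ["Polars:Column:Chain", { name: "Polars:Column:Chain" }],
]);

// The partSubProjections of the sample above.
const selectListParts: PartEntry[] = [
  { type: "aggregationPartReference", referencedProjection: "Polars:Column" },
  { type: "aggregationPartReference", referencedProjection: "Polars:Column:Chain" },
];

const resolvedNames = resolvePartNames(selectListParts, knownProjections);
console.log(resolvedNames);
```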

TypeScript-based Projections

A projection defined in TypeScript is essentially a plain JavaScript object containing all the information we defined in YAML or JSON above.

We begin by defining the root projection: To this end, we first define the required parameters using the helper functions imported from @puredit/parser. Notice how the chain parameter references the templates from the subprojections below. We then define the pattern as a tagged template string which is passed to the parser through the method expressionPattern together with the name for the pattern and the projection.

Next, the widget is defined. Since the sample projection is rather simple and only consists of text, we use the function simpleProjection from @puredit/projections to define the widget. The function yields a Svelte component which will be rendered in the projectional editor. Alternatively, the widget can be defined explicitly as a Svelte component (see e.g. the transpose projection in the py-pytorch package).

The pattern and the widget as well as all subprojections used by the projection or its subprojections are then put together in the root projection object.

NOTE: The array subProjections must contain all subprojections used in the root projection or in subprojections of the root projection!

// main.ts
import { arg, chain } from "@puredit/parser";
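// simpleProjection is called at runtime, so it must not be a type-only import.
import { simpleProjection } from "@puredit/projections";
import type { RootProjection } from "@puredit/projections";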
import { parser } from "../parser";
import { baseDataframeSubProjection } from "./baseDataframe";
import { selectSubProjection } from "./select";
import { columnSubProjection } from "./column";
import { columnWithAliasSubProjection } from "./columnWithAlias";
import { dropAllNullsSubProjection } from "./dropAllNulls";

const resultDataframe = arg("resultDataframe", ["identifier"]);
const dataframeChain = chain(
  "dataframeChain",
  baseDataframeSubProjection.template,
  [
    selectSubProjection.template,
    dropAllNullsSubProjection.template,
  ],
  1
);
const pattern = parser.expressionPattern("Polars:Dataframe:Chain")`${resultDataframe} = ${dataframeChain}`;
const widget = simpleProjection(["Define", resultDataframe, "as transformation of"]);

export const dataFrameChainProjection: RootProjection = {
  pattern,
  description: "Transform a dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
  subProjections: [
    baseDataframeSubProjection,
    selectSubProjection,
    columnSubProjection,
    columnWithAliasSubProjection,
    dropAllNullsSubProjection,
  ],
};

Next, we define the required subprojections. The code follows a similar structure as for the root projection: we first define the parameters and then use them to construct the pattern. The pattern is then passed to the method subPattern of the parser. The widgets are defined as in the root projection and then put together with the pattern in the subprojection object.

// baseDataframe.ts
import { arg } from "@puredit/parser";
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";

const sourceDataFrame = arg("sourceDataFrame", ["identifier"]);
export const template = parser.subPattern("Polars:Dataframe:ChainStart")`${sourceDataFrame}`;
const widget = simpleProjection(["Dataframe", sourceDataFrame, "transformed by"]);

export const baseDataframeSubProjection: SubProjection = {
  template,
  description: "Dataframe to transform.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
};

// select.ts
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";
import { agg } from "@puredit/parser";
import { columnSubProjection } from "./column";
import { columnWithAliasSubProjection } from "./columnWithAlias";

const columns = agg("columns", "argument_list", [
  columnSubProjection.template,
  columnWithAliasSubProjection.template,
]);
const template = parser.subPattern("Polars:Dataframe:SelectColumns")`select${columns}`;
const beginWidget = simpleProjection(["reading column(s)"]);
const endWidget = simpleProjection(["end columns"]);

export const selectSubProjection: SubProjection = {
  template,
  description: "Select columns from a dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [beginWidget, endWidget],
};

// column.ts
import { arg } from "@puredit/parser";
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";

const columnName = arg("columnName", ["string"]);
const template = parser.subPattern("Polars:Column")`${columnName}`;
const widget = simpleProjection([columnName]);

export const columnSubProjection: SubProjection = {
  template,
  description: "Column to select from a dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
};

// columnWithAlias.ts
import { arg } from "@puredit/parser";
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";

const columnName = arg("columnName", ["string"]);
const columnAlias = arg("columnAlias", ["identifier"]);
const template = parser.subPattern("Polars:ColumnWithAlias")`${columnAlias}=${columnName}`;
const widget = simpleProjection([columnName, "as", columnAlias]);

export const columnWithAliasSubProjection: SubProjection = {
  template,
  description: "Column with alias to select from a dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
};

// dropAllNulls.ts
import { simpleProjection, SubProjection } from "@puredit/projections";
import { parser } from "../parser";

const template = parser.subPattern("Polars:Dataframe:DropAllNulls")`drop_nulls()`;
const widget = simpleProjection(["removing nulls in all columns"]);

export const dropAllNullsSubProjection: SubProjection = {
  template,
  description: "Drop all nulls in dataframe.",
  requiredContextVariables: [],
  segmentWidgets: [widget],
};