# Code generation for recommender workflows

Anton Antonov  
RakuForPrediction at WordPress   
October 2025

----

## Introduction

This notebook demonstrates several ways to generate Raku code for the package "ML::SparseMatrixRecommender", using both grammar-based interpreters and LLM-based translators.

----

## Setup

In [3]:
use Data::Reshapers;
use Data::Importers;
use Data::Summarizers;

use DSL::Translators;
use DSL::Examples;
use ML::NLPTemplateEngine;

use Math::SparseMatrix :ALL;
use Math::SparseMatrix::DOK;
use Math::SparseMatrix::Utilities;

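use ML::SparseMatrixRecommender;

# Also used below; some notebook kernels may preload these
use LLM::Functions;
use LLM::Graph;
use Data::Translators;
use Data::TypeSystem;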

----

## Ingestion

Ingest a Titanic data CSV file from the Web:

In [2]:
my $url = 'https://raw.githubusercontent.com/antononcube/MathematicaVsR/refs/heads/master/Data/MathematicaVsR-Data-Titanic.csv';
my @dsData = data-import($url, headers => 'auto');
#@dsData .= map({ $_<id> = 'id.' ~ $_<id>; $_ });
#@dsData .= map({ $_<passengerAge> = $_<passengerAge>.Int; $_ });
@dsData.&dimensions

(1309 5)

Show that the data is a list of maps:

In [4]:
deduce-type(@dsData);

Vector(Assoc(Atom((Str)), Atom((Str)), 5), 1309)

----

## SMR

Create a recommender object over Titanic data:

In [None]:
my $smrObj = 
    ML::SparseMatrixRecommender.new
    .create-from-wide-form(@dsData,
        item-column-name => "id",
        tag-types => <passengerSex passengerClass passengerAge passengerSurvival>,
        :add-tag-types-to-column-names,
        tag-value-separator => ":")
    .apply-term-weight-functions("IDF", "None", "Cosine")
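
The arguments `tag-types`, `:add-tag-types-to-column-names`, and `tag-value-separator` determine how tags are named; the profile in the next pipeline refers to tags such as `passengerSex:male`. Here is a minimal plain-Raku sketch of that naming scheme. It is an illustration, not the package's implementation, and `%record` is a hypothetical row shaped like the elements of `@dsData`.

```raku
# Hypothetical record, shaped like one element of @dsData (all values are strings)
my %record =
    id                => '1',
    passengerAge      => '30',
    passengerClass    => '1st',
    passengerSex      => 'female',
    passengerSurvival => 'survived';

my @tag-types = <passengerSex passengerClass passengerAge passengerSurvival>;
my $separator = ':';

# Prefix each tag value with its tag type, joined by the separator
my @tags = @tag-types.map({ $_ ~ $separator ~ %record{$_} });

.say for @tags;
```

Prefixing with the tag type keeps values from different columns distinct even when they happen to coincide.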

Here is a profile-based recommendation pipeline:

In [None]:
#% html
my @field-names = 'score', 'id', |@dsData.head.keys.grep(* ne 'id').sort;
$smrObj
.recommend-by-profile({"passengerSex:male" => 1, "passengerAge:30" => 20}, 12)
.join-across(@dsData)
.take-value
==> to-html(:@field-names)   

In [None]:
$smrObj.take-matrices.keys

Here is a history-based recommendation pipeline:

In [None]:
#% html
my @field-names = [|<score id>, |$smrObj.take-matrices.keys.sort];
sink $smrObj
.recommend(<1089 1132>, 10, :!remove-history)
.echo-value
.join-across(@dsData, on => 'id')
.echo-value(as => {&to-pretty-table($_, :@field-names)});

----

## DSL translation (grammars, ProdGDT)

Here a recommender pipeline specified with natural language commands is translated into R code using the ProdGDT Web service:

In [None]:
'
create from @dsData; 
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st
'
==> dsl-web-translation(to => 'R', format => 'CODE')

Here a similar pipeline is translated into Raku with the sub `ToDSLCode` of the package "DSL::Translators":

In [None]:
'
create from @dsData; 
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value
'
==> ToDSLCode(to => 'Raku', format => 'CODE')
==> {.subst('.', "\n."):g}()
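
The last step of the feed above only reformats the generated one-liner by starting a new line before every dot. Here is a small sketch of that step on a shortened, hand-written code string (an assumption for illustration, not actual translator output):

```raku
# Shortened stand-in for a generated one-line pipeline
my $code = 'ML::SparseMatrixRecommender.new.create-from-wide-form(@dsData).echo-value()';

# Start a new line before every literal dot
my $formatted = $code.subst('.', "\n.", :g);

say $formatted;
```

The substitution is purely textual, so a dot inside a string literal or a number in the generated code would be split as well.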

----

## DSL translation (LLM examples)

Show known DSL translation examples in "DSL::Examples":

In [None]:
#% html
dsl-examples().map({ $_.key X $_.value.keys }).flat(1).map({ <language workflow> Z=> $_ })».Hash.sort.Array
==> to-dataset()
==> to-html(field-names => <language workflow>)

Define an LLM translation function:

In [None]:
my &llm-pipeline-segment = llm-example-function(dsl-examples()<Raku><SMRMon>);

Here is a recommender pipeline specified with natural language commands:

In [None]:
my $spec = q:to/END/;
new recommender;
create from @dsData; 
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;
classify by profile passengerSex:female, and passengerClass:1st on the tag passengerSurvival;
echo value
END

sink my @commands = $spec.lines;

Translate to Raku code:

In [None]:
@commands
.map({ .&llm-pipeline-segment })
.map({ .subst(/:i Output \h* ':'?/, :g).trim })
.join("\n.")

Alternatively, call the function directly on `$spec`:

In [None]:
&llm-pipeline-segment($spec)
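
The per-command variant post-processes each LLM response before joining the segments into one pipeline. The sketch below replays that post-processing on hypothetical responses (the strings in `@responses` are made up; a real run would obtain them from `&llm-pipeline-segment`):

```raku
# Hypothetical LLM responses, one per command
my @responses =
    'Output: ML::SparseMatrixRecommender.new',
    'output : create-from-wide-form(@dsData)',
    q[  apply-term-weight-functions('IDF', 'None', 'Cosine')  ];

# Strip an "Output:" label (case-insensitive), trim, and chain with method-call dots
my $pipeline = @responses.map({ .subst(/:i Output \h* ':'?/, :g).trim }).join("\n.");

say $pipeline;
```

Because the regex is not anchored, it also removes the word "output" if it appears inside a generated segment; anchoring it at the start of the string would be a stricter alternative.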

----

## NLP Template Engine

Translate a "free text" recommender pipeline specification using `concretize` of "ML::NLPTemplateEngine":

In [None]:
'create with dfTitanic; apply the LSI functions IDF, None, Cosine;recommend by profile 1st and male'
==> concretize(lang => "Raku")


----

## LLM graph


Make an LLM graph that:
- Tries all three translation methods above.
- Tries the LLM-based methods if the DSL grammar-based method does not work.
- Tries the LLM-based methods in parallel.
- Has a judge that picks which one of the generated results best adheres to the spec.

In [9]:
my %rules =
    dsl-grammar => { 
        eval-function => sub ($spec, $lang = 'Raku') { ToDSLCode($spec, to => $lang, format => 'CODE') }
    },

    llm-examples => { 
        llm-function => 
            sub ($spec, $lang = 'Raku', $split = False) { 
                my &llm-pipeline-segment = llm-example-function(dsl-examples(){$lang}<SMRMon>);
                return do if $split {
                    note 'with spec splitting...';
                    my @commands = $spec.lines;
                    @commands.map({ .&llm-pipeline-segment }).map({ .subst(/:i Output \h* ':'?/, :g).trim }).join("\n.")
                } else {
                    note 'no spec splitting...';
                    &llm-pipeline-segment($spec).subst(";\n", "\n."):g
                }
            },
    },

    nlp-template-engine => {
        llm-function => sub ($spec, $lang = 'Raku') { concretize($spec, :$lang) }
    },

    judge => sub ($spec, $lang, $dsl-grammar, $llm-examples, $nlp-template-engine) {
            [
                "Choose the generated code that most fully adheres to the spec:\n",
                $spec,
                "\nfrom the following $lang generation results:\n\n",
                "1) DSL-grammar:\n$dsl-grammar\n",
                "2) LLM-examples:\n$llm-examples\n",
                "3) NLP-template-engine:\n$nlp-template-engine\n",
                "and copy it:"
            ].join("\n\n")
    },
    
    report => {
            eval-function => sub ($spec, $lang, $dsl-grammar, $llm-examples, $nlp-template-engine, $judge) {
                [
                    '# Best generated code',
                    "Three $lang code generations were submitted for the spec:",
                    '```text',
                    $spec,
                    '```',
                    'Here are the results:',
                    to-html( ['dsl-grammar', 'llm-examples', 'nlp-template-engine'].map({ [ name => $_, code => ::('$' ~ $_)] })».Hash.Array, field-names => <name code> ).subst("\n", '<br/>'):g,
                    '## Judgement',
                    $judge.contains('```') ?? $judge !! "```$lang\n" ~ $judge ~ "\n```"
                ].join("\n\n")
            }
    }        
;

my $gBestCode = LLM::Graph.new(%rules)

LLM::Graph(size => 5, nodes => dsl-grammar, judge, llm-examples, nlp-template-engine, report)

In [21]:
my $spec = q:to/END/;
make a brand new recommender with the data @dsData;
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;
END

$gBestCode.eval(:$spec, lang => 'Raku', :split)

with spec splitting...


LLM::Graph(size => 5, nodes => dsl-grammar, judge, llm-examples, nlp-template-engine, report)

In [22]:
#% markdown
$gBestCode.nodes<report><result>

# Best generated code

Three Raku code generations were submitted for the spec:

```text

make a brand new recommender with the data @dsData;
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;


```

Here are the results:

<table border="1"><thead><tr><th>name</th><th>code</th></tr></thead><tbody><tr><td>dsl-grammar</td><td></td></tr><tr><td>llm-examples</td><td>ML::SparseMatrixRecommender.new(:ds =&gt; @dsData)<br/>.apply-term-weight-functions(&#39;IDF&#39;, &#39;None&#39;, &#39;Cosine&#39;)<br/>.recommend-by-profile({&#39;passengerSex.male&#39; =&gt; 1, &#39;passengerClass.1st&#39; =&gt; 1})<br/>.join-across(@dsData, on =&gt; &#39;id&#39;)<br/>.echo-value()</td></tr><tr><td>nlp-template-engine</td><td>my $smrObj = ML::SparseMatrixRecommender.new<br/>.create-from-wide-form([&quot;male&quot;, &quot;1st&quot;]set, item-column-name=&#39;id&#39;, :add-tag-types-to-column-names, tag-value-separator=&#39;:&#39;)<br/>.apply-term-weight-functions(&#39;IDF&#39;, &#39;None&#39;, &#39;Cosine&#39;)<br/>.recommend-by-profile([&quot;passengerSex:male&quot;, &quot;passengerClass:1st&quot;], 12, :!normalize)<br/>.join-across([&quot;male&quot;, &quot;1st&quot;]set)<br/>.echo-value();</td></tr></tbody></table>

## Judgement

```Raku
ML::SparseMatrixRecommender.new(:ds => @dsData)
.apply-term-weight-functions('IDF', 'None', 'Cosine')
.recommend-by-profile({'passengerSex.male' => 1, 'passengerClass.1st' => 1})
.join-across(@dsData, on => 'id')
.echo-value()
```

### Graph

In [12]:
#% html
$gBestCode.dot(engine => 'dot', :9graph-size, node-width => 1.7, node-color => 'grey', edge-color => 'grey', edge-width => 0.4, theme => 'default'):svg

----

## Fallback: DSL-grammar to LLM-examples

Instead of running the DSL-grammar and LLM computations in parallel, we can make an LLM graph in which the LLM computations are invoked only if the DSL-grammar parsing and interpretation fails. Here is such a graph:

In [18]:
my %rules =
    dsl-grammar => { 
        eval-function => sub ($spec, $lang = 'Raku') { 
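            my $res = ToDSLCode($spec, to => $lang, format => 'CODE'); 
            # Prefix that creates the recommender object in the generated code
            my $checkStr = 'my $obj = ML::SparseMatrixRecommender.new';
            # If the prefix occurs more than once, drop its first occurrence
            return do with $res.match(/ $checkStr /):g { 
                $/.list.elems > 1 ?? $res.subst($checkStr) !! $res 
            }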
        }
    },

    llm-examples => { 
        llm-function => 
            sub ($spec, $lang = 'Raku', $split = False) {
                my &llm-pipeline-segment = llm-example-function(dsl-examples(){$lang}<SMRMon>);
                return do if $split {
                    my @commands = $spec.lines;
                    @commands.map({ .&llm-pipeline-segment }).map({ .subst(/:i Output \h* ':'?/, :g).trim }).join("\n.")
                } else {
                    &llm-pipeline-segment($spec).subst(";\n", "\n."):g
                }
            },
        test-function => sub ($dsl-grammar) { !($dsl-grammar ~~ Str:D && $dsl-grammar.trim.chars) }
    },
    
    code => {
            eval-function => sub ($dsl-grammar, $llm-examples) {
                $dsl-grammar ~~ Str:D && $dsl-grammar.trim ?? $dsl-grammar !! $llm-examples
            }
    }        
;

my $gRobust = LLM::Graph.new(%rules):!async

LLM::Graph(size => 3, nodes => code, dsl-grammar, llm-examples)
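
The control flow encoded by the three nodes can be sketched in plain Raku. This is an illustration of the fallback logic only, not of how "LLM::Graph" schedules nodes; `grammar-translate` and `llm-translate` are hypothetical stand-ins for `ToDSLCode` and the LLM call.

```raku
# Hypothetical stand-ins; the first one "fails" (returns Nil) on unknown phrasing
sub grammar-translate(Str $spec) { $spec.starts-with('create') ?? 'GRAMMAR-CODE' !! Nil }
sub llm-translate(Str $spec) { 'LLM-CODE' }

# Same predicate as the test-function of the llm-examples node
sub needs-fallback($dsl-grammar) { !($dsl-grammar ~~ Str:D && $dsl-grammar.trim.chars) }

sub robust-translate(Str $spec) {
    my $dsl-grammar  = grammar-translate($spec);
    # The LLM is consulted only when the grammar result is missing or blank
    my $llm-examples = needs-fallback($dsl-grammar) ?? llm-translate($spec) !! Nil;
    # Same selection as the code node
    $dsl-grammar ~~ Str:D && $dsl-grammar.trim ?? $dsl-grammar !! $llm-examples
}

say robust-translate('create from @dsData');
say robust-translate('new recommender with @dsData, please');
```

The predicate treats an undefined value and a whitespace-only string alike, so a translator that returns an empty string instead of failing still triggers the fallback.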

Here the LLM graph is run over a spec that can be parsed by the DSL-grammar interpreter (notice the very short computation time):

In [8]:
my $spec = q:to/END/;
create from @dsData; 
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;
END

$gRobust.eval(:$spec, lang => 'Raku', :!split)

LLM::Graph(size => 3, nodes => code, dsl-grammar, llm-examples)

Here is the obtained result:

In [9]:
$gRobust.nodes<code><result>

my $obj = ML::SparseMatrixRecommender.new.create-from-wide-form(@dsData).apply-term-weight-functions(global-weight-func => "IDF", local-weight-func => "None", normalizer-func => "Cosine").recommend-by-profile(["passengerSex:male", "passengerClass:1st"]).join-across(@dsData, on => "id" ).echo-value()

Here is a spec that cannot be parsed by the DSL-grammar interpreter:

In [19]:
my $spec = q:to/END/;
new recommender with @dsData, please; 
also apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;
END

$gRobust.eval(:$spec, lang => 'Raku', :!split)

Cannot parse the command; error in rule recommender-object-phrase:sym<English> at line 1; target 'new recommender with @dsData, please' position 16; parsed 'new recommender', un-parsed 'with @dsData, please' .


LLM::Graph(size => 3, nodes => code, dsl-grammar, llm-examples)

Nevertheless, we obtain a result via LLM-examples:

In [20]:
$gRobust.nodes<code><result>

ML::SparseMatrixRecommender.new(@dsData)
  .apply-term-weight-functions('IDF', 'None', 'Cosine')
  .recommend-by-profile({'passengerSex'=>'male', 'passengerClass'=>'1st'})
  .join-across(@dsData, on => 'id')
  .echo-value()

Here is the corresponding graph plot:

In [22]:
#% html
$gRobust.dot(engine => 'dot', :7graph-size, node-width => 1.7, node-color => 'grey', edge-color => 'grey', edge-width => 0.4, theme => 'default'):svg