# Robust LLM pipelines

### ***Raku version 0.9***

Anton Antonov   
[South FL Data Science Study Group](https://www.meetup.com/data-science-study-group-south-florida/events/301740295/)   
July 2024   


-------

## What is this about?

We focus on five ways to build robust LLM pipelines (from a software architecture / engineering perspective).   
The first (DSL) one is the most important.

1. DSL for configuration-execution-conversion
   - Infrastructural, language-design level solution
2. Detailed, well crafted prompts
   - AKA "Prompt engineering"
3. Few-shot training with examples
4. Via a Question Answering System (QAS) and code templates
5. Using grammars
   - Pareto principle application
   - Or for filtering of multiple outputs

-----

## Universality of the methodology

- Programmed in three different languages (Python, Raku, WL)

- The method(s) are applied regardless of the software support

- This presentation uses a Raku-kernel Jupyter ***chatbook*** in VS Code
    - There is a corresponding Python chatbook package
    - Mathematica has chatbooks

**Remark:** The DSL for configuration-execution-conversion is described in detail in Stephen Wolfram's article: 

- ["The New World of LLM Functions: Integrating LLM Technology into the Wolfram Language"](https://writings.stephenwolfram.com/2023/05/the-new-world-of-llm-functions-integrating-llm-technology-into-the-wolfram-language/), [SW1].

-----

## Motivation example(s)

Here is a setup of an LLM persona that generates [Google Charts](https://developers.google.com/chart) code:

In [None]:
#% chat gc prompt, conf=chatgpt, model=gpt-4o, max-tokens=4096, temperature=0.4
@CodeWriterX|'Google Charts'

Because of recent events, consider the following LLM requests and responses:

In [None]:
#% chat gc > html
Show a regional map of Cuba and the Caribbean islands.

In [None]:
#% chat gc > html
Show a regional map of Cuba and the Caribbean islands.
Mark Havana's marine port. Use the div-id 'port'.

-----

## DSL

### Why?

Separation of:

  - LLM access configuration
  - Invocation
  - Post-processing of results

Note that:

- This is a fundamental, infrastructural design
- It is always applied, regardless of how well it is facilitated programmatically
- (And, yes, we claim it is facilitated very well by the Python, Raku, and WL implementations.)
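The separation can be illustrated without any LLM service. The following is a minimal sketch in core Raku, not the LLM::Functions API: the names `stub-evaluator`, `tidy`, and `make-llm-function` are hypothetical, and the evaluator fakes the LLM reply.

```raku
# All names below are hypothetical stand-ins, not the LLM::Functions API.

# 1. Access configuration: plain data that can be swapped without touching the rest.
my %conf = model => 'stub-model', temperature => 0.4;

# 2. Invocation: a stub evaluator that fakes the LLM reply (note the stray whitespace).
sub stub-evaluator(Str $prompt, %conf --> Str) {
    return "  {%conf<model>} says: $prompt \n";
}

# 3. Post-processing: turn the raw text into the form the caller wants.
sub tidy(Str $raw --> Str) { $raw.trim }

# The "LLM function" is just the composition of the three parts.
sub make-llm-function(%conf, &evaluator, &post-process) {
    return -> Str $prompt { post-process(evaluator($prompt, %conf)) };
}

my &ask = make-llm-function(%conf, &stub-evaluator, &tidy);
say ask('GDP of Japan in 2022');   # stub-model says: GDP of Japan in 2022
```

Because each part is passed in separately, the configuration or the post-processor can be replaced without changing the invocation code.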

### Example

Consider the following **LLM-function**:

In [7]:
my &gdp = llm-function(
    {"GDP of $^a in $^b. \n" ~ llm-prompt('NothingElse')('JSON')}, 
    e => $conf4o,
    form => sub-parser('JSON'):drop
)

-> **@args, *%args { #`(Block|3365964960968) ... }

In [8]:
my $res = &gdp('top 10 countries', 2022)

[countries => [{GDP => 25400, country => United States} {GDP => 17700, country => China} {GDP => 5040, country => Japan} {GDP => 4200, country => Germany} {GDP => 3200, country => India} {GDP => 3100, country => United Kingdom} {GDP => 2900, country => France} {GDP => 2100, country => Italy} {GDP => 2000, country => Canada} {GDP => 1800, country => South Korea}]]

In [9]:
.say for |$res

countries => [{GDP => 25400, country => United States} {GDP => 17700, country => China} {GDP => 5040, country => Japan} {GDP => 4200, country => Germany} {GDP => 3200, country => India} {GDP => 3100, country => United Kingdom} {GDP => 2900, country => France} {GDP => 2100, country => Italy} {GDP => 2000, country => Canada} {GDP => 1800, country => South Korea}]


**Question:** What did we expect to get as a result?

If the result has the *expected* shape, we can make this plot:

In [None]:
#% html
js-google-charts('BarChart', 
    $res.Hash, 
    :$format, 
    :$backgroundColor,
    :$legendTextStyle,
    :$hAxis,
    :$vAxis,
    div-id => 'gdp'
)

Here is the configuration:

In [None]:
.say for |$conf4o.Hash

**Question:** Do we expect the same *data shape* of the results when running the LLM request / function:
- Multiple times
- With different LLMs / configurations
- With different parameters
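
One way to make the question concrete is to test the shape before plotting. The sketch below is an assumption about what "expected shape" could mean here (a non-empty list of records with `country` and `GDP`); the helper `has-gdp-shape` and the sample data are hypothetical.

```raku
# Hypothetical shape check: a non-empty list of records,
# each a hash with a string <country> and a numeric <GDP>.
sub has-gdp-shape($res --> Bool) {
    return False unless $res ~~ Positional && $res.elems > 0;
    for $res.list -> $rec {
        return False unless $rec ~~ Map;
        return False unless $rec<country> ~~ Str:D && $rec<GDP> ~~ Numeric:D;
    }
    return True;
}

# Two shapes an LLM might return for the same request (made-up sample data).
my @flat    = {country => 'Japan', GDP => 5040}, {country => 'Germany', GDP => 4200};
my $wrapped = [countries => @flat];

say has-gdp-shape(@flat);              # True
say has-gdp-shape($wrapped);           # False
say has-gdp-shape($wrapped[0].value);  # True
```

The wrapped form mirrors the result shown above, where the records sit under a `countries` key; a check like this tells the pipeline whether unwrapping is needed.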


---------

## Sequence diagrams

### Creation

Here is a sequence diagram that follows the steps of a typical creation procedure of LLM configuration- and evaluator objects, and the corresponding LLM-function that utilizes them:

In [10]:
#% mermaid
sequenceDiagram
  participant User
  participant llmfunc as llm-function
  participant llmconf as llm-configuration
  participant LLMConf as LLM configuration
  participant LLMEval as LLM evaluator
  participant AnonFunc as Anonymous function
  User ->> llmfunc: prompt<br>conf spec
  llmfunc ->> llmconf: conf spec
  llmconf ->> LLMConf: conf spec
  LLMConf ->> LLMEval: wrap with
  LLMEval ->> llmfunc: evaluator object
  llmfunc ->> AnonFunc:  create with:<br>evaluator object<br>prompt
  AnonFunc ->> llmfunc: handle
  llmfunc ->> User: handle

### Execution

Here is a sequence diagram for making an LLM configuration with a global (engineered) prompt, and using that configuration to generate a chat message response:

In [11]:
#% mermaid
sequenceDiagram
  participant WWWOpenAI as WWW::OpenAI
  participant User
  participant llmfunc as llm-function
  participant llmconf as llm-configuration
  participant LLMConf as LLM configuration
  participant LLMChatEval as LLM chat evaluator
  participant AnonFunc as Anonymous function
  User ->> llmconf: engineered prompt
  llmconf ->> User: configuration object
  User ->> llmfunc: prompt<br> configuration object
  llmfunc ->> LLMChatEval: configuration object
  LLMChatEval ->> llmfunc: evaluator object
  llmfunc ->> AnonFunc: create with:<br> evaluator object<br>prompt
  AnonFunc ->> llmfunc: handle
  llmfunc ->> User: handle
  User ->> AnonFunc: invoke with<br>message argument
  AnonFunc ->> WWWOpenAI: engineered prompt<br>message
  WWWOpenAI ->> User: LLM response 


-----

## On prompt engineering

It is important to have a good, large collection of LLM prompts that is easy to search.

Also, the prompts should be ready to plug into LLM-functions or pipelines.

Some prompts from the collection are "simple" strings, some are (function) templates.
Here is an example of the latter:

In [12]:
llm-prompt('NothingElse')

-> $a = "paragraph" { #`(Block|3366026942096) ... }

Here is the template above filled-in with "SQL":

In [13]:
llm-prompt('NothingElse')('SQL')

ONLY give output in the form of a SQL.
Never explain, suggest, or converse. Only return output in the specified form.
If code is requested, give only code, no explanations or accompanying text.
If a table is requested, give only a table, no other explanations or accompanying text.
Do not describe your output. 
Do not explain your output. 
Do not suggest anything. 
Do not respond with anything other than the singularly demanded output. 
Do not apologize if you are incorrect, simply try again, never apologize or add text.
Do not add anything to the output, give only the output as requested. Your outputs can take any form as long as requested.
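
A function template of this kind can be pictured as a block with a default argument. The following is only a sketch with a shortened prompt text; `&nothing-else` is a hypothetical name, not the LLM::Prompts implementation.

```raku
# Hypothetical, shortened re-creation of a prompt template as a pointy block.
my &nothing-else = -> Str $form = 'paragraph' {
    "ONLY give output in the form of a {$form}.\n"
    ~ "Never explain, suggest, or converse. Only return output in the specified form."
};

say nothing-else('SQL').lines[0];   # ONLY give output in the form of a SQL.
say nothing-else().lines[0];        # ONLY give output in the form of a paragraph.
```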

--------

## Using examples

### Why?

- Easier than verbalizing a suitable prompt
- More concrete and precise results than with a prompt
- Fits inductive nature of LLMs
    - Hence, better results are expected

### Example

Consider the following number format normalization function:

In [14]:
my &num-norm = 
    llm-example-function([
        '1,034' => '1_034', '13,003,553' => '13_003_553', '9,323,003,553' => '9_323_003_553',
        '43 thousand USD' => '43E3', '3.9 thousand' => '3.9E3',
        '23 million USD' => '23E6', '2.3 million' => '2.3E6',
        '3.2343 trillion USD' => '3.2343E12', '0.3 trillion' => '0.3E12'
    ]);

-> **@args, *%args { #`(Block|3366027093120) ... }

The goal is a function that replaces human-readable number forms with "proper", parsable numbers:

In [15]:
&num-norm('3,78 thousand')

3.78E3

**Remark:** This is a powerful technique, which we are going to use in the ***Grammar-LLM*** chain-of-responsibility example below.
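
To see why examples fit LLMs well, here is one plausible way that input-output pairs can be laid out as a few-shot prompt. This is a sketch under assumptions: `few-shot-prompt` is a hypothetical helper, and the "Input:/Output:" layout is not necessarily what `llm-example-function` produces.

```raku
# Hypothetical helper: lay out example pairs as a few-shot prompt
# that ends with the new input and an open "Output:" slot.
sub few-shot-prompt(@examples, Str $input --> Str) {
    my @shots = @examples.map(-> $p { "Input: {$p.key}\nOutput: {$p.value}" });
    return (|@shots, "Input: $input\nOutput:").join("\n\n");
}

my @examples = '1,034' => '1_034', '23 million USD' => '23E6';
say few-shot-prompt(@examples, '3.9 thousand');
```

The LLM is then asked to continue the text, which amounts to completing the pattern set by the examples.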

------

## Using a Question Answering System (QAS)

### Why?

- With QAS we get answers to "parameter questions"
- It is *assumed* that getting smaller output is more robust than getting full-blown code
- It is definitely cheaper
- LLMs would not know your "secret" programming code
    - Its API, at most

### Example (mod graph)

In [16]:
my $query = q:to/END/;
I want to make a directed graph of the equivalence of the integers between 0 and 100 over mod 55.
END

I want to make a directed graph of the equivalence of the integers between 0 and 100 over mod 55.


In [17]:
my %ans =
    llm-textual-answer($query, [
        'What is the number range?',
        'What is the left boundary of the number range?',
        'What is the right boundary of the number range?',
        'Which mod to use?',
        'Is the graph directed or not? True/False'
    ],
    e => $conf3
):pairs;

.say for |%ans

What is the number range? => 0 to 100
Which mod to use? => 55
What is the left boundary of the number range? => 0
What is the right boundary of the number range? => 100
Is the graph directed or not? True/False => True


In [18]:
my @redges = (
    (%ans{'What is the left boundary of the number range?'}).Int
    ..
    (%ans{'What is the right boundary of the number range?'}).Int
).map({ $_.Str => (($_ ** 2) mod (%ans{'Which mod to use?'}.Int)).Str });

my $gMod = Graph.new(@redges, directed => %ans{'Is the graph directed or not? True/False'}.lc eq 'true' ?? True !! False)

Graph(vertexes => 101, edges => 101, directed => True)
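
The answers are free text, so the conversions above work only when the LLM replies tersely. A more defensive conversion could look like the sketch below; the helpers `answer-to-int` and `answer-to-bool` and the sample answers are hypothetical.

```raku
# Hypothetical helper: take the first integer found in a free-text answer.
sub answer-to-int(Str $answer --> Int) {
    with $answer.match(/ '-'? \d+ /) { return .Str.Int }
    die "No integer found in answer: $answer";
}

# Hypothetical helper: accept a few common spellings of a Boolean answer.
sub answer-to-bool(Str $answer --> Bool) {
    given $answer.trim.lc {
        when 'true'  | 'yes' { return True }
        when 'false' | 'no'  { return False }
        default { die "Cannot interpret as Boolean: $answer" }
    }
}

# Made-up answers, wordier than the ones shown above.
my %raw = right => 'The right boundary is 100.', directed => ' True ';

say answer-to-int(%raw<right>);      # 100
say answer-to-bool(%raw<directed>);  # True
```

Failing loudly on an uninterpretable answer is a design choice: it lets the pipeline retry or fall back instead of building a graph from wrong parameters.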

In [None]:
#%js
js-d3-graph-plot(
    $gMod.edges(:dataset),
    :$background, :$title-color, :$edge-thickness, 
    vertex-size => 5,
    vertex-color => 'SlateBlue',
    directed => $gMod.directed,
    title => "Mod {%ans{'Which mod to use?'}} graph", 
    width => 800,
    height => 800, 
    force => {charge => {strength => -100}, link => {minDistance => 4}}
    )

### Example 2 (LSA pipeline)

In [19]:
my $lsaCommand = q:to/END/;
Extract 20 topics from the text corpus aAbstracts using the method NNMF. 
Show statistical thesaurus with the words neural, function, and notebook.
END

Extract 20 topics from the text corpus aAbstracts using the method NNMF. 
Show statistical thesaurus with the words neural, function, and notebook.


In [20]:
concretize($lsaCommand, template => 'LatentSemanticAnalysis', lang => 'Python', llm => 'gemini', model => 'gemini-1.5-flash');

lsaObj = (LatentSemanticAnalyzer()
.make_document_term_matrix(docs=aAbstracts,
                           stop_words=Automatic,
                           stemming_rules=False,
                           min_length=3)
.apply_term_weight_functions(global_weight_func="IDF",
						   local_weight_func="None",
						   normalizer_func="Cosine")
.extract_topics(number_of_topics=40, min_number_of_documents_per_term=20, method="NNMF", max_steps = 16)
.echo_topics_interpretation(number_of_terms=20, wide_form=True)
.echo_statistical_thesaurus(terms=stemmerObj.stemWords(["neural", "function", "notebook"]),
						  wide_form=True,
						  number_of_nearest_neighbors=12,
						  method="cosine",
						  echo_function=lambda x: print(x.to_string())))

**Remark:** The LSA pipeline produced above might need manual editing (for example, the command asks for 20 topics, but the generated code sets `number_of_topics=40`). The target users of `concretize` are data scientists (full- or part-time).

An extension of that idea is to have an LLM-based classifier:

In [21]:
llm-classify($lsaCommand, <Classification LatentSemanticAnalysis QuantileRegression Recommendations>, e => $conf3)

[LatentSemanticAnalysis]

**Remark:** Using different models for `llm-classify` can produce different results. Because of that: 

1. Accuracy, precision, and recall have to be evaluated for the queries we focus on
2. The making of a more dedicated LLM classification persona should be considered

-------

## Grammar-LLM chain-of-responsibility utilization

### Why?

If we can make a grammar that can handle, say, 60-80% of the user-formulated requests (in a given problem area), why use LLMs all the time?

### Example

Setup:

- Assume we have conducted an opinion poll about programming languages.
- We want to collect the number of positive and negative opinions for each programming language.
- Most responders used simple statements, but some used elaborate ones.

Here is a grammar for parsing responses to a language-preference questionnaire: 

In [22]:
my grammar LoveHateProg {
	rule TOP { <who> <verb> <lang> <:punct>? }
	regex who { 'I' | 'We' }
	regex verb { 'love' | 'hate' | '♥️'+ | '🤮' }
	regex lang { 'Julia' | 'Perl' | 'Python' | 'R' | 'Raku' | 'WL' }
}

(LoveHateProg)

Here are corresponding actions:

In [23]:
my class LangPref {
    method TOP($/) { make $<lang>.made => $<verb>.made }
    method verb($/) { make so $/.Str ~~ / love | '♥️' / }
    method lang($/) { make $/.Str }
}

(LangPref)

Here is how such a response is parsed:

In [25]:
LoveHateProg.parse('I 🤮 R', actions => LangPref).made

R => False

Make an LLM function that produces similar results:

In [26]:
my &flop = llm-example-function([
    'I love R' => '{"R" : true}', 
    'I love Pytgon' => '{"Python" : true}', 
    'I love Mathematica' => '{"WL" : true}', 
    'We think we like Python' => '{"Python" : true}',
    'We hate Perl' => '{"Perl" : false}',
    'I ❤️ Perl' => '{"Perl" : true}',
    'we 🤮 Python' => '{"Python" : false}'],
    e => $conf4o,
    form => sub-parser('JSON'):drop
)

-> **@args, *%args { #`(Block|3365994786192) ... }

In [27]:
&flop("We like R most of the time")

{R => True}

Make a function:

In [28]:
sub get-lang-opinion($st) {
    my $op = LoveHateProg.parse($st, actions => LangPref).made;
    return 'Grammar' => $op with $op;
    return 'LLM' => &flop($st).head;
}

&get-lang-opinion

Here are statements and their parsing:

In [29]:
my @statements = [
    "I love Pytgon",
    "I love Python",
    "I 🤮 WL",
    "We like Perl",
    "We like R most of the time",
    "I hate Python",
    "I like WL",
    "I hate Mathematica",
    "I ♥️ Raku"
];

my @res = @statements.map({ get-lang-opinion($_) });

@res.elems

9

Here are the *parsed* results:

In [30]:
.say for @res

LLM => Python => True
Grammar => Python => True
Grammar => WL => False
LLM => Perl => True
LLM => R => True
Grammar => Python => False
LLM => WL => True
LLM => WL => False
Grammar => Raku => True


**Remark:** Note that misspelled languages, unknown verbs, and longer statements are LLM-handled. The rest are grammar-handled.
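
Whether a grammar reaches the targeted share of requests can be measured directly. The sketch below uses a deliberately tiny, hypothetical grammar (`TinyPref`) and made-up statements; it only counts which statements parse and which would be passed on to the LLM.

```raku
# A deliberately tiny, hypothetical grammar (smaller than LoveHateProg above).
grammar TinyPref {
    rule  TOP  { <who> <verb> <lang> }
    token who  { 'I' | 'We' }
    token verb { 'love' | 'hate' }
    token lang { 'Python' | 'R' | 'Raku' }
}

my @statements = 'I love Raku', 'We hate R', 'We like R most of the time', 'I love Pytgon';

# Statements the grammar parses in full, and the rest that would go to the LLM.
my @handled  = @statements.grep({ TinyPref.parse($_).defined });
my @fallback = @statements.grep({ !TinyPref.parse($_).defined });

say "Grammar handles {+@handled} of {+@statements} statements";
say "LLM fallback needed for: ", @fallback.join('; ');
```

Tracking this ratio over real requests shows when extending the grammar pays off and when the LLM fallback is carrying most of the load.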

Collection:

In [31]:
my @opinions = @res».value.classify(*.key).map({ $_.key => $_.value.categorize(*.value).nodemap(*.elems) });

.say for @opinions;

Raku => {True => 1}
Perl => {True => 1}
R => {True => 1}
WL => {False => 2, True => 1}
Python => {False => 1, True => 2}


---------

## Using Literate Programming scripts

Here we ingest some text (from Wikipedia):

In [36]:
my $txtEN = data-import('https://en.wikipedia.org/wiki/Béla_Bollobás', 'plaintext');

text-stats($txtEN);

(chars => 16961 words => 2211 lines => 313)

Here we extract a table of themes:

In [39]:
#% html
my $tblThemes = llm-synthesize(llm-prompt("ThemeTableJSON")($txtEN, "article", 10), e => $conf4o, form => sub-parser('JSON'):drop);
$tblThemes ==> data-translation(field-names=><theme content>)

theme,content
Early life and education,Participated in International Mathematical Olympiads; studied under Paul Erdős.
Career,Fellow of Trinity College; research in combinatorics and graph theory.
Awards and honours,Fellow of Royal Society; multiple prestigious awards.
Personal life,Married to Gabriella; has one son; sports enthusiast.
Selected works,"Authored books on graph theory, combinatorics, and percolation."
References,Citations and sources for information on Béla Bollobás.
External links,Links to interviews and additional resources.


We can have Literate Programming (LP) scripts that chain several text-processing steps like the ones above.

*(Live demo with "Can AI Solve Science.")*

-----

## References

### Articles, blog posts

[AA1] Anton Antonov,
["Workflows with LLM functions"](https://rakuforprediction.wordpress.com/2023/08/01/workflows-with-llm-functions/),
(2023),
[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).

[SW1] Stephen Wolfram,
["The New World of LLM Functions: Integrating LLM Technology into the Wolfram Language"](https://writings.stephenwolfram.com/2023/05/the-new-world-of-llm-functions-integrating-llm-technology-into-the-wolfram-language/),
(2023),
[Stephen Wolfram's Writings](https://writings.stephenwolfram.com).

### Notebooks

[AAn1] Anton Antonov,
["Workflows with LLM functions (in Raku)"](https://community.wolfram.com/groups/-/m/t/2982320),
(2023),
[Wolfram Community](https://community.wolfram.com).

[AAn2] Anton Antonov,
["Workflows with LLM functions (in Python)"](https://community.wolfram.com/groups/-/m/t/3027081),
(2023),
[Wolfram Community](https://community.wolfram.com).

[AAn3] Anton Antonov,
["Workflows with LLM functions (in WL)"](https://community.wolfram.com/groups/-/m/t/3027081),
(2023),
[Wolfram Community](https://community.wolfram.com).


### Packages

#### Raku

[AAp1] Anton Antonov,
[LLM::Functions Raku package](https://github.com/antononcube/Raku-LLM-Functions),
(2023-2024),
[GitHub/antononcube](https://github.com/antononcube).
([raku.land](https://raku.land/zef:antononcube/LLM::Functions))

[AAp2] Anton Antonov,
[LLM::Prompts Raku package](https://github.com/antononcube/Raku-LLM-Prompts),
(2023-2024),
[GitHub/antononcube](https://github.com/antononcube).
([raku.land](https://raku.land/zef:antononcube/LLM::Prompts))

[AAp3] Anton Antonov,
[Jupyter::Chatbook Raku package](https://github.com/antononcube/Raku-Jupyter-Chatbook),
(2023-2024),
[GitHub/antononcube](https://github.com/antononcube).
([raku.land](https://raku.land/zef:antononcube/Jupyter::Chatbook))

#### Python

[AAp4] Anton Antonov,
[LLMFunctionObjects Python package](https://pypi.org/project/LLMFunctionObjects/),
(2023-2024),
[PyPI.org/antononcube](https://pypi.org/user/antononcube).

[AAp5] Anton Antonov,
[LLMPrompts Python package](https://pypi.org/project/LLMPrompts/),
(2023-2024),
[PyPI.org/antononcube](https://pypi.org/user/antononcube/).

[AAp6] Anton Antonov,
[JupyterChatbook Python package](https://pypi.org/project/JupyterChatbook/),
(2023-2024),
[PyPI.org/antononcube](https://pypi.org/user/antononcube/).

[MWp1] Marc Wouts,
[jupytext Python package](https://github.com/mwouts/jupytext),
(2021-2024),
[GitHub/mwouts](https://github.com/mwouts).

#### R

[TKp1] Tomasz Kalinowski, Kevin Ushey, JJ Allaire, RStudio, Yuan Tang,
[reticulate R package](https://rstudio.github.io/reticulate/),
(2016-2024).


### Videos

[AAv1] Anton Antonov,
["Integrating Large Language Models with Raku"](https://www.youtube.com/watch?v=-OxKqRrQvh0),
(2023),
[The Raku Conference 2023 at YouTube](https://www.youtube.com/@therakuconference6823).

------

## *Setup*

In [None]:
use LLM::Configurations;
use ML::FindTextualAnswer;
use ML::NLPTemplateEngine;
use Hash::Merge;
use Graph;

use JavaScript::D3;
use JavaScript::Google::Charts;

use Data::Reshapers;
use Data::Summarizers;
use Data::Generators;
use Data::Importers;

### JavaScript

Here we prepare the notebook to visualize with JavaScript:

In [None]:
#% javascript
require.config({
     paths: {
     d3: 'https://d3js.org/d3.v7.min'
}});

require(['d3'], function(d3) {
     console.log(d3);
});

Verification:

In [None]:
#% js
js-d3-list-line-plot(10.rand xx 40, background => 'none', stroke-width => 2)

Here we set a collection of visualization variables:

In [None]:
my $title-color = 'Ivory';
my $stroke-color = 'SlateGray';
my $tooltip-color = 'LightBlue';
my $tooltip-background-color = 'none';
my $background = '1F1F1F';
my $color-scheme = 'schemeTableau10';
my $edge-thickness = 3;
my $vertex-size = 6;
my $mmd-theme = q:to/END/;
%%{
  init: {
    'theme': 'forest',
    'themeVariables': {
      'lineColor': 'Ivory'
    }
  }
}%%
END
my %force = collision => {iterations => 0, radius => 10},link => {distance => 180};
my %force2 = charge => {strength => -30, iterations => 4}, collision => {radius => 50, iterations => 4}, link => {distance => 30};

my %opts = :$background, :$title-color, :$edge-thickness, :$vertex-size;

{background => 1F1F1F, edge-thickness => 3, title-color => Ivory, vertex-size => 6}

### Google Charts

In [None]:
my $format = 'html';
my $titleTextStyle = { color => 'Ivory' };
my $backgroundColor = '#1F1F1F';
my $legendTextStyle = { color => 'Silver' };
my $legend = { position => "none", textStyle => {fontSize => 14, color => 'Silver'} };

my $hAxis = { title => 'x', titleTextStyle => { color => 'Silver' }, textStyle => { color => 'Gray'}, logScale => False, format => 'scientific'};
my $vAxis = { title => 'y', titleTextStyle => { color => 'Silver' }, textStyle => { color => 'Gray'}, logScale => False, format => 'scientific'};

my $annotations = {textStyle => {color => 'Silver', fontSize => 10}};
my $chartArea = {left => 50, right => 50, top => 50, bottom => 50, width => '90%', height => '90%'};

{bottom => 50, height => 90%, left => 50, right => 50, top => 50, width => 90%}