# TLDR LLM solutions for ~not reading~ software manuals 

#### Anton Antonov   
#### RakuForPrediction at WordPress   
#### RakuForPrediction-book at GitHub
#### August 2023

------

## Introduction

In this [Jupyter notebook](https://github.com/antononcube/RakuForPrediction-book/blob/main/Notebooks/Jupyter/TLDR-LLM-solutions-for-software-manuals.ipynb) we use [Large Language Model (LLM) functions](https://rakuforprediction.wordpress.com/2023/08/01/workflows-with-llm-functions/), [AAp1, AA1], to generate (hopefully) executable, correct, and harmless code for managing Operating System resources.

To be concrete and useful, we take the Markdown files of the articles ["It's time to rak!"](https://dev.to/lizmat/series/20329), [EM1], which explain the motivation and usage of the Raku module ["App::Rak"](https://raku.land/zef:lizmat/App::Rak), [EMp1], and we show how meaningful file-finding shell commands can be generated by LLMs exposed to the code-with-comments examples from those articles.

In other words, we apply the Too Long; Didn't Read (TLDR) attitude to the articles and to the [README](https://github.com/lizmat/App-Rak/blob/main/README.md) (user guide) of the related Raku module.
("App::Rak" is useful, but it has too many parameters for us to want to learn them all.)

**Remark:** We say that "App::Rak" uses a Domain Specific Language (DSL), one that is implemented with Raku's Command Line Interface (CLI) features.

### Procedure outline

1. Clone the corresponding [article repository](https://github.com/lizmat/articles), [EMr1]
2. Locate and ingest the "App::Rak" dedicated Markdown files
3. Extract code blocks from the Markdown files
   - Using ["Markdown::Grammar"](https://raku.land/zef:antononcube/Markdown::Grammar), [AAp5], functions
4. Get comment-and-code line pairs from the code blocks
   - Using Raku text manipulation capabilities
      - (After observing code examples) 
5. Generate LLM few-shot training rules from the comment-and-code pairs
6. Use the LLM example function to translate natural language commands into (valid and relevant) "App::Rak" DSL commands
   - With a few or a dozen natural language commands 
7. Use LLMs to generate natural language commands in order to test LLM-TLDR-er further 

Step 6 says how we do our TLDR -- we use LLM-translations of natural language commands.

### Alternative procedure

Instead of using Raku to process text, we can make LLM functions for extracting the comment-and-code pairs.
(That is also shown below.)    

### Extensions

1. Using LLMs to generate:
    - Stress tests for "App::Rak"
    - Variants of the gathered commands
        - And make new training rules with them
    - EBNF grammars for gathered commands
2. Compare OpenAI and PaLM and/or their different models 
    - Which one produces the best results?
    - Which ones produce better results for which subsets of commands?

### Article's structure

The exposition below follows the outlines in the procedure subsections above. 

The stress-testing and EBNF-generation extensions have their own sections: "Translating randomly generated commands" and "Grammar generation" respectively. 

**Remark:** The article/document/notebook was made with the Jupyter framework, using the Raku package ["Jupyter::Kernel"](https://raku.land/cpan:BDUGGAN/Jupyter::Kernel), [BDp1].  

--------

## Setup

In [1]:
use Markdown::Grammar;
use Data::Reshapers;
use Data::Summarizers;
use LLM::Functions;
use Text::SubParsers;
use Data::Translators;

------

## Workflow

### File names

In [2]:
my $dirName = $*HOME ~ '/GitHub/lizmat/articles';
my @fileNames = dir($dirName).grep(*.Str.contains('time-to-rak'));
@fileNames.elems

4

### Texts ingestion

Here we ingest the text of each file:

In [3]:
my %texts = @fileNames.map({ $_.basename => slurp($_) });
%texts.elems

4

Here is the number of characters per document:

In [4]:
%texts>>.chars

{its-time-to-rak-1.md => 7437, its-time-to-rak-2.md => 8725, its-time-to-rak-3.md => 14181, its-time-to-rak-4.md => 9290}

Here is the number of words per document:

In [5]:
%texts>>.words>>.elems

{its-time-to-rak-1.md => 1205, its-time-to-rak-2.md => 1477, its-time-to-rak-3.md => 2312, its-time-to-rak-4.md => 1553}

### Get Markdown code blocks

With the function `md-section-tree` we extract the code blocks from the Markdown files into data structures amenable to further programmatic manipulation (in Raku).
Here we get the code blocks from each text:

In [6]:
my %docTrees = %texts.map({ $_.key => md-section-tree($_.value, modifier => 'Code', max-level => 0) });
%docTrees>>.elems

{its-time-to-rak-1.md => 1, its-time-to-rak-2.md => 11, its-time-to-rak-3.md => 24, its-time-to-rak-4.md => 16}

Here we put all blocks into one array:

In [7]:
my @blocks = %docTrees.values.Array.&flatten;
@blocks.elems

52
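
For readers who prefer not to install "Markdown::Grammar", here is a rough core-Raku approximation of the same step. It is a sketch under simplifying assumptions (fences are exactly three backticks at the start of a line, with no `~~~` fences and no indented or nested fences), not a reimplementation of `md-section-tree`; the sub name `fenced-code-blocks` is hypothetical.

```raku
# Rough approximation of extracting fenced code blocks from Markdown text.
# Hypothetical helper (not part of Markdown::Grammar). Assumes fences are
# three backticks at the start of a line; ~~~ fences and nesting are ignored.
my $fence = '`' x 3;

sub fenced-code-blocks(Str $md --> List) {
    # ^^ and $$ anchor at line boundaries; in Raku '.' also matches newlines,
    # so the non-greedy .*? lets a block span several lines.
    $md.comb(/ ^^ $fence \N* \n .*? ^^ $fence \h* $$ /).List
}

# A small sample document with two fenced blocks.
my $md = (
    'Some prose.',
    $fence,
    '# look for "foo" in all files',
    '$ rak foo',
    $fence,
    'More prose.',
    $fence ~ 'raku',
    'say 42',
    $fence,
).join("\n") ~ "\n";

.say for fenced-code-blocks($md);
```

Unlike `md-section-tree`, this returns plain strings that still include the fence lines, which is enough for the regex-based pair extraction that follows.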

### Extract comment-and-code line pairs

Here we parse each code block, extract its comment-and-code pairs, and form the LLM training rules. (The printed result, 52, is the number of processed code blocks, not the number of rules.)

In [8]:
my @rules;
@blocks.map({ 
    given $_ { 
        for m:g/ '#' $<comment>=(\V+) \n '$' $<code>=(\V+) \n / -> $m {
           @rules.push( ($m<comment>.Str.trim => $m<code>.Str.trim) ) 
         } } }).elems

52

Here is the number of rules:

In [9]:
@rules.elems

69

Here is a sample of the rules:

In [10]:
.say for @rules.pick(4)

Show all lines with numbers between 1 and 65 => rak '/ \d+ <?{ 1 <= $/.Int <= 65 }> /'
same, without equal sign => rak foo -C4
search for 'sub min' in Rakudo's source => rak 'sub min' --rakudo
save --ignorecase as -i, without description => rak --ignorecase --save=i
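
The same regex can be packaged as a sub that returns the pairs instead of pushing them into an outer array from inside `map`. Here is a minimal self-contained sketch; the sub name `comment-code-pairs` is hypothetical, and the sample lines mimic the articles' code blocks.

```raku
# Extract "# comment" / "$ command" line pairs with the regex used above,
# returning them as Pairs (comment => command) instead of mutating @rules.
sub comment-code-pairs(Str $text --> List) {
    $text.match(/ '#' $<comment>=(\V+) \n '$' $<code>=(\V+) \n /, :g)
         .map(-> $m { $m<comment>.Str.trim => $m<code>.Str.trim })
         .List
}

# Sample block: two comment/command pairs separated by command output.
my $block = (
    '# Look for "ve" at the end of all lines in file "twenty"',
    '$ rak --type=ends-with ve twenty',
    'twenty',
    '5:five',
    '# Show the lines containing "ne"',
    '$ rak ne twenty',
).join("\n") ~ "\n";

.say for comment-code-pairs($block);
```

Note that the regex requires the command line to follow the comment line directly, so multi-line comments (like the truncated "number of characters between them" rule seen later) only keep their last line.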


### Nice tabulation with LLM function

To tabulate the rules "nicely" in the Jupyter notebook, we make an LLM function that produces an HTML table, and then we specify the corresponding "magic cell."
(This relies on the Jupyter-magics features of [BDp1].) Here is an LLM conversion function, [AA1]:

In [11]:
my &ftbl = llm-function({"Convert the $^a table $^b into an HTML table."}, e=>llm-configuration('PaLM', max-tokens=>800))

-> **@args, *%args { #`(Block|5984101446384) ... }

Here is the HTML table derivation:

In [12]:
%%html
my $tblHTML=&ftbl("plain text", to-pretty-table(@rules.pick(12).sort, align => 'l', field-names => <Key Value>))

| Key | Value |
|-----|-------|
| Look for "seven" at the start of all lines in file "twenty" | `rak --type=starts-with seven twenty` |
| Look for "six" as a word on any line in file "twenty" | `rak --type=words six twenty` |
| Look for "ve" at the end of all lines in file "twenty" | `rak --type=ends-with ve twenty` |
| Only accept Raku and Markdown files | `rak foo --extensions=#raku,#markdown` |
| Search for "foo" in the "lib" directory | `rak foo lib` |
| Show all unique "name" fields in JSON files | `rak --json-per-file '*' --unique` |
| Show the lines starting with "e" | `rak ^e twenty` |
| look for "Foo", while taking case into account | `rak Foo` |
| look for "foo" in all files | `rak foo` |
| remove the --frobnicate custom option | `rak --save=frobnicate` |


### Nice tabulation with "Markdown::Grammar"

Instead of using LLMs for HTML conversion, it is more "productive" to use the HTML interpreter provided by "Markdown::Grammar", [AAp5]:

In [13]:
%%html
sub to-html($x) { md-interpret($x.Str.lines[1..*-2].join("\n").subst('+--','|--', :g).subst('--+','--|', :g), actions=>Markdown::Actions::HTML.new) }
to-pretty-table(@rules.pick(12).sort) ==> to-html

| Value | Key |
|-------|-----|
| `rak '.subst("teen",10)' twenty --type=code` | Convert all occurrences of "teen" into "10" |
| `rak --type=contains ve twenty` | Look for "ve" anywhere on any line in file "twenty" |
| `rak foo --extensions=bat,ps1` | Only accept files with the .bat or the .ps1 extension |
| `rak foo --extensions=` | Only accept files without extension |
| `rak foo *` | Search all files and all subdirectories |
| `rak '{.tc if /^ t /}' twenty` | Show all lines starting with "t" titlecased |
| `rak '/ h t? /' twenty` | Show all lines that have an "h" in them, optionally followed by a "t" |
| `rak '/ \d+ /'` | Show all lines with numbers between 1 and 65 |
| `rak §six twenty` | Show the lines that contain "six" as a word |
| `rak --help=filesystem --pager=less` | produce extensive help on filesystem filters |


**Remark:** Of course, in order to program the above sub we need *to know* how to use "Markdown::Grammar". Producing HTML tables with LLMs is much easier -- only knowledge of "spoken English" is required.   
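
A third option, shown here as a hypothetical sketch rather than a function of any of the packages used in this notebook, is to write the rules straight into HTML with core Raku. The only extra knowledge needed is HTML escaping; without it, a command such as `rak '/ \d+ <?{ 1 <= $/.Int <= 65 }> /'` can lose its `<...>` part when rendered (compare the "between 1 and 65" row in the table above).

```raku
# Hypothetical helper: escape the characters that are special in HTML text.
sub html-escape(Str() $s --> Str) {
    # '&' is replaced first so that the entities added afterwards are not re-escaped.
    $s.subst('&', '&amp;', :g)
      .subst('<', '&lt;', :g)
      .subst('>', '&gt;', :g)
      .subst('"', '&quot;', :g)
}

# Hypothetical helper: render a list of Pairs as a two-column HTML table.
sub rules-to-html(@rules --> Str) {
    my @rows = @rules.map(-> $r {
        "<tr><td>{ html-escape($r.key) }</td><td>{ html-escape($r.value) }</td></tr>"
    });
    ('<table>', '<tr><th>Key</th><th>Value</th></tr>', |@rows, '</table>').join("\n")
}

say rules-to-html([
    'look for "foo" in all files' => 'rak foo',
    'Show all lines with numbers between 1 and 65' => q{rak '/ \d+ <?{ 1 <= $/.Int <= 65 }> /'},
]);
```

In the notebook, the output of `rules-to-html(@rules.pick(12))` could be placed in an `%%html` cell like the ones above.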

Alternatively, we can use the functions `to-dataset` and `data-translation` provided by "Data::Translators", [AAp6]:

In [14]:
%%html
@rules.pick(12).Array ==> to-dataset() ==> data-translation

| Value | Key |
|-------|-----|
| `rak --paths='~/Github/rakudo' --under-version-control --save=rakudo` | save searching in Rakudo's committed files as --rakudo |
| `rak foo` | look for "foo" in all files |
| `rak foo lib` | Search for "foo" in files of the "lib" directory |
| `rak --type=regex 'e.*t' twenty` | number of characters between them (.*), in file twenty |
| `rak ^seven$ twenty` | Show all the lines that consist of "seven" |
| `rak --type=contains ve twenty` | Look for "ve" anywhere on any line in file "twenty" |
| `rak --find --/file` | Show all directory names from current directory down |
| `rak foo -C` | search for "foo" and show two lines of context |
| `rak --description='Do not care about case' --save=i` | add / change description -i at a later time |
| `rak ^e twenty` | Show the lines starting with "e" |


### Code generation examples

Here we define an LLM function for generating "App::Rak" shell commands:

In [15]:
my &frak = llm-example-function(@rules, e => llm-evaluator('PaLM'))

-> **@args, *%args { #`(Block|5984270256032) ... }
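
`llm-example-function` turns the rules into few-shot examples for the LLM. Purely to illustrate the idea (this is an assumption, not the prompt that "LLM::Functions" actually builds), a few-shot prompt could be assembled from input/output pairs like this; `few-shot-prompt` is a hypothetical name.

```raku
# Illustration only: one way to lay out input/output pairs as a few-shot prompt.
# This is NOT the internal prompt format of LLM::Functions' llm-example-function.
sub few-shot-prompt(@rules, Str $query --> Str) {
    my $examples = @rules.map(-> $r { "Input: {$r.key}\nOutput: {$r.value}" }).join("\n\n");
    # The prompt ends with an open "Output:" slot for the model to complete.
    "$examples\n\nInput: $query\nOutput:"
}

my @sample-rules =
    'look for "foo" in all files'             => 'rak foo',
    'Search for "foo" in the "lib" directory' => 'rak foo lib';

say few-shot-prompt(@sample-rules, 'Find files that have ".nb" in their names');
```

The more (and the more varied) the example pairs, the more of the "App::Rak" option vocabulary the model sees, which is why all 69 extracted rules are passed to `llm-example-function`.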

In [16]:
my @cmds = ['Find files that have ".nb" in their names', 'Find files that have ".nb"  or ".wl" in their names',
 'Show all directories of the parent directory', 'Give me files without extensions and that contain the phrase "notebook"', 
 'Show all that have extension raku or rakumod and contain Data::Reshapers'];

my @tbl = @cmds.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;

@tbl.&dimensions

(5 2)

Here is a table showing the natural language commands and the corresponding translations to the "App::Rak" CLI DSL:

In [17]:
%%html
@tbl ==> data-translation(field-names => <Command App::Rak>)

| Command | App::Rak |
|---------|----------|
| Find files that have ".nb" in their names | `rak --extensions=nb` |
| Find files that have ".nb" or ".wl" in their names | `rak --extensions=nb,wl --find` |
| Show all directories of the parent directory | `rak --find --/file --parent` |
| Give me files without extensions and that contain the phrase "notebook" | `rak --extensions= --type=contains notebook` |
| Show all that have extension raku or rakumod and contain Data::Reshapers | `rak --extensions=raku,rakumod '/ Data::Reshapers /'` |


### Verification

Of course, the obtained "App::Rak" commands have to be verified to:
- Work  
- Produce expected results

We can program this verification with Raku or with the Jupyter framework, but we do not do that here.
(We do the verification manually outside of this notebook.)

**Remark:** I tried a dozen generated commands. Most *worked*. One did not work because of a current limitation of "App::Rak". Others needed appropriate nudging to produce the desired results.
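
As a starting point for automating that verification, here is a minimal sketch that runs a generated command through the shell and records its exit code and output. It assumes a POSIX shell and, for real use, an installed `rak`; the sub name `check-command` is hypothetical. A zero exit code only shows that a command ran, not that it produced the *expected* results.

```raku
# Run a shell command and capture its exit code, standard output, and standard error.
# Hypothetical helper; assumes a POSIX shell is available.
sub check-command(Str $cmd --> Hash) {
    my $proc = shell($cmd, :out, :err);
    my $out  = $proc.out.slurp(:close);
    my $err  = $proc.err.slurp(:close);
    %( :$cmd, :$out, :$err, exitcode => $proc.exitcode )
}

# Usage with generated commands (requires App::Rak to be installed):
# for 'rak --find --parent', 'rak --find --/file --parent' -> $c {
#     my %r = check-command($c);
#     say "{ %r<exitcode> == 0 ?? 'OK  ' !! 'FAIL' } $c";
# }

say check-command('echo hello')<out>.trim;
```

Checking for *expected* results would additionally require comparing `out` against something known, for example the output of a hand-written command for the same request.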

Here is an example of a command that produces code that "does not work":

In [18]:
&frak("Give all files that have extensions .nd and contain the command Classify")

rak '*.nd < Classify' --extensions=nd

Here are a few more:

In [19]:
&frak("give the names of all files in the parent directory")

rak --find --parent

In [20]:
&frak("Find all directories in the parent directory")

rak --find --/file --parent

Here is a generated command that exposes an "App::Rak" [limitation](https://github.com/lizmat/App-Rak/issues/44):

In [21]:
&frak("Find all files in the parent directory")

rak --find --parent

-------

## Translating randomly generated commands

Consider testing the applicability of the approach by generating a "good enough" sample of natural language commands for finding files or directories.

We can generate such commands via LLM. Here we define an LLM function with two parameters that returns a Raku list:

In [22]:
my &fcg = llm-function({"Generate $^a natural language commands for finding $^b in a file system. Give the commands as a JSON list."}, form => sub-parser('JSON'))

-> **@args, *%args { #`(Block|5984150766584) ... }

In [23]:
my @gCmds1 = &fcg(4, 'files').flat;
@gCmds1.raku

["Find all files with the word 'document' in their name", "Show me all the files in my Downloads folder", "List all files with the .pdf extension", "Where are the files I saved last week?"]

Here are the corresponding translations to the "App::Rak" DSL:

In [24]:
%%html
my @tbl1 = @gCmds1.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
@tbl1 ==> to-pretty-table(align=>'l', field-names => <Command App::Rak>) ==> to-html

| Command | App::Rak |
|---------|----------|
| Find all files with the word 'document' in their name | `rak --find document` |
| Show me all the files in my Downloads folder | `rak --find Downloads` |
| List all files with the .pdf extension | `rak --extensions=pdf` |
| Where are the files I saved last week? | `rak --saves --last-week` |


Let us redo the generation and translation using different specs: 

In [25]:
my @gCmds2 = &fcg(4, 'files that have certain extensions or contain certain words').flat;
@gCmds2.raku

["Find all files with the extension .pdf", "Find all files with the word 'training' in their name", "Search for all files with the extension .txt", "Look for all files with the word 'manual' in their name"]

In [26]:
%%html
my @tbl2 = @gCmds2.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
@tbl2 ==> to-pretty-table( align=>'l', field-names => <Command App::Rak>) ==> to-html

| Command | App::Rak |
|---------|----------|
| Find all files with the extension .pdf | `rak --extensions=pdf` |
| Find all files with the word 'training' in their name | `rak training --find` |
| Search for all files with the extension .txt | `rak --extensions=txt` |
| Look for all files with the word 'manual' in their name | `rak manual --find` |


**Remark:** Ideally, there would be an LLM-based system that 1) hallucinates "App::Rak" commands, 2) executes them, and 3) files GitHub issues if it thinks the results are sub-par.
(All done automatically.) On a more practical note, we can use a system that has only the first two components to stress test "App::Rak".

-------

## Alternative programming with LLM

In this section we show how to extract comment-and-code pairs using LLM functions (instead of *working hard* with Raku regexes).

Here is an LLM function that specifies the extraction:

In [27]:
my &fcex = llm-function({"Extract consecutive line pairs in which the first start with '#' and second with '\$' from the text $_. Group the lines as key-value pairs and put them in JSON format."}, 
form => 'JSON') 

-> **@args, *%args { #`(Block|5984150853688) ... }

Here are three code blocks:

In [28]:
my @focusInds = [3, 12, 45];
[@blocks[@focusInds], ] ==> transpose() ==> to-pretty-table(align=>'l')

+----------------------------------------------------------+
| 0                                                        |
+----------------------------------------------------------+
| ```                                                      |
| # Look for "ve" at the end of all lines in file "twenty" |
| $ rak --type=ends-with ve twenty                         |
| twenty                                                   |
| 5:fi𝐯𝐞                                                   |
| 12:twel𝐯𝐞                                                |
| ```                                                      |
| ```                                                      |
| # Show the lines containing "ne"                         |
| $ rak ne twenty                                          |
| twenty                                                   |
| 1:o𝐧𝐞                                                    |
| 9:ni𝐧𝐞                                                   |
| 19:ni𝐧𝐞teen           

Here we extract the comment-and-code lines from the code blocks:

In [29]:
my $exRes = &fcex(@blocks[@focusInds])

error	Cannot interpret the given input with the given spec 'JSON'.
input	{
  "# Look for "ve" at the end of all lines in file "twenty"": "$ rak --type=ends-with ve twenty",
  "# Show the lines containing "ne"": "$ rak ne twenty",
  "# save --after-context as -A, requiring a value": "$ rak --after-context=! --save=A",
  "# save --before-context as -B, requiring a value": "$ rak --before-context=! --save=B",
  "# save --context as -C, setting a default of 2": "$ rak --context='[2]' --save=C",
  "# search for "foo" and show two lines of context": "$ rak foo -C",
  "# search for "foo" and show 4 lines of context": "$ rak foo -C=4",
  "# same, without equal sign": "$ rak foo -C4"
}
parsed	

In [30]:
$exRes.raku

&CORE::infix:<orelse>(Failure.new(exception => X::AdHoc.new(payload => ${:error("Cannot interpret the given input with the given spec 'JSON'."), :input("\{\n  \"# Look for \"ve\" at the end of all lines in file \"twenty\"\": \"\$ rak --type=ends-with ve twenty\",\n  \"# Show the lines containing \"ne\"\": \"\$ rak ne twenty\",\n  \"# save --after-context as -A, requiring a value\": \"\$ rak --after-context=! --save=A\",\n  \"# save --before-context as -B, requiring a value\": \"\$ rak --before-context=! --save=B\",\n  \"# save --context as -C, setting a default of 2\": \"\$ rak --context='[2]' --save=C\",\n  \"# search for \"foo\" and show two lines of context\": \"\$ rak foo -C\",\n  \"# search for \"foo\" and show 4 lines of context\": \"\$ rak foo -C=4\",\n  \"# same, without equal sign\": \"\$ rak foo -C4\"\n}"), :parsed(Empty)}), backtrace => Backtrace.new), *.self)

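Re-running the same LLM function can give a different, parseable result (in the output below the inner double quotes have become single quotes):

In [31]: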
%%html
&fcex(@blocks[@focusInds]) ==> to-dataset() ==> data-translation

| Key | Value |
|-----|-------|
| # save --context as -C, setting a default of 2 | `$ rak --context='[2]' --save=C` |
| # search for 'foo' and show two lines of context | `$ rak foo -C` |
| # Show the lines containing 'ne' | `$ rak ne twenty` |
| # save --after-context as -A, requiring a value | `$ rak --after-context=! --save=A` |
| # same, without equal sign | `$ rak foo -C4` |
| # Look for 've' at the end of all lines in file 'twenty' | `$ rak --type=ends-with ve twenty` |
| # search for 'foo' and show 4 lines of context | `$ rak foo -C=4` |
| # save --before-context as -B, requiring a value | `$ rak --before-context=! --save=B` |

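The failure in In [29] happens because the LLM put unescaped double quotes inside the JSON strings (for example `"# Show the lines containing "ne""`). Instead of re-running the LLM function until it returns valid JSON, a fallback is to salvage the pairs with a regex. Here is a minimal sketch; the sub name `salvage-pairs` is hypothetical, and it assumes one key-value pair per line, keys starting with `#`, and values starting with `$`.

```raku
# Fallback sketch: recover "comment": "command" pairs line by line from LLM output
# that looks like JSON but is invalid because inner double quotes are not escaped.
sub salvage-pairs(Str $text --> List) {
    $text.lines.map(-> $line {
        # The non-greedy key stops at the first '":' sequence, so inner quotes are kept.
        $line ~~ / ^ \s* '"' $<k>=['#' .*?] '":' \s* '"' $<v>=['$' .*?] '"' ','? \s* $ /
            ?? ($<k>.Str.substr(1).trim => $<v>.Str.substr(1).trim)   # drop '#' and '$'
            !! Empty
    }).List
}

my $llm-output = q:to/END/;
{
  "# Show the lines containing "ne"": "$ rak ne twenty",
  "# same, without equal sign": "$ rak foo -C4"
}
END

.say for salvage-pairs($llm-output);
```

The recovered pairs have the same comment => command shape as the `@rules` extracted with the Raku regex earlier.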

-------

## Grammar generation

The "right way" of translating natural language DSLs to CLI DSLs like the one of "App::Rak" is to make a grammar for the natural language DSL and the corresponding interpreter.
This might be a lengthy process, so we might consider replacing it, or jump-starting it, with LLM-based grammar generation:
we ask an LLM to generate a grammar for a collection of DSL sentences. (For example, the keys of the rules above.) 
In this section we make a "teaser" demonstration of the latter approach.

Here we create an LLM function for generating grammars over collections of sentences:

In [32]:
my &febnf = llm-function({"Generate an $^a grammar for the collection of sentences:\n $^b "}, e => llm-configuration("OpenAI", max-tokens=>900))

-> **@args, *%args { #`(Block|5984270469112) ... }

Here we generate an EBNF grammar for the "App::Rak" code-example commands:

In [33]:
my $ebnf = &febnf('EBNF', @rules>>.key)



SentenceList -> Sentence | Sentence ListSentence

Sentence -> LookFor | Generate | Convert | Produce | Show | Reverse | Add | Save | Same | SetUp | Remove | Check | Start | Find | ShowUnique | ShowNum

LookFor -> "Look for" Literal WordsOrWord Parts? "in file" Filename

WordsOrWordParts -> Word | WordParts

WordParts -> WordPart ("or" WordPart)*

WordPart -> "ve" | "six" | "seven" | "eight" | "y" | "Y" | "u" | "teen" | "ne" | "e" | "t" | "h" | "foo" | "bar" | "fi"

Literal -> ("anywhere" | "as a word" | "at the start of all lines" | "at the end of all lines" | "as the whole line") ("having" AdditionalMarks)? ("containing" AddContains)? ("matching" AddMatches)? ("between" NumberOfCharacters)?

NumberOfCharacters -> "'" CharRange "'"

CharRange -> Char ("-" Char)?

Char -> [A-Za-z0-9]

AdditionalMarks -> [A-Za-z0-9 ]*

AddContains -> "y" | "Y"

AddMatches -> [A-Za-z0-9]+

Filename -> [A-Za-z0-9]+

Generate -> "Generate" HelpSpec

HelpSpec -> "extensive help on" (HaystackSpec | Filesyst
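
The generated EBNF above is cut off (most likely by the `max-tokens=>900` setting). For comparison, here is a minimal hand-written Raku grammar with actions: a sketch that covers only the four "Look for ... in file ..." sentence forms among the rules (e.g. `Look for "ve" at the end of all lines in file "twenty"`) and translates them into the corresponding "App::Rak" commands. The names `LookForCommand` and `LookForActions` are hypothetical.

```raku
# Hand-written sketch covering only the "Look for ... in file ..." sentence forms.
grammar LookForCommand {
    token TOP    { 'Look for' \s+ <target> \s+ <position> \s+ 'in file' \s+ <file> }
    token target { '"' $<text>=[<-["]>+] '"' }
    token file   { '"' $<text>=[<-["]>+] '"' }
    proto token position {*}
    token position:sym<starts-with> { 'at the start of all lines' }
    token position:sym<ends-with>   { 'at the end of all lines' }
    token position:sym<words>       { 'as a word on any line' }
    token position:sym<contains>    { 'anywhere on any line' }
}

# Actions that translate a parsed sentence into an App::Rak command line.
class LookForActions {
    method TOP($/) {
        make "rak --type={ $<position>.made } { $<target><text> } { $<file><text> }";
    }
    method position:sym<starts-with>($/) { make 'starts-with' }
    method position:sym<ends-with>($/)   { make 'ends-with' }
    method position:sym<words>($/)       { make 'words' }
    method position:sym<contains>($/)    { make 'contains' }
}

for 'Look for "ve" at the end of all lines in file "twenty"',
    'Look for "six" as a word on any line in file "twenty"' -> $s {
    say LookForCommand.parse($s, actions => LookForActions.new).made;
}
```

Unlike the LLM translations, this grammar either parses a sentence exactly or fails (`.parse` returns `Nil`), which makes its coverage easy to test but requires extending it by hand for every new sentence form.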

------

## References

### Articles

[AA1] Anton Antonov, ["Workflows with LLM functions"](https://rakuforprediction.wordpress.com/2023/08/01/workflows-with-llm-functions/), (2023), [RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).

[AA2] Anton Antonov, ["Graph representation of grammars"](https://rakuforprediction.wordpress.com/2023/07/06/graph-representation-of-grammars/), (2023), [RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).

[EM1] Elizabeth Mattijsen, ["It's time to rak! Series' Articles"](https://dev.to/lizmat/series/20329), (2022), [Lizmat series at Dev.to](https://dev.to/lizmat/series).

### Packages, repositories

[AAp1] Anton Antonov, [LLM::Functions Raku package](https://github.com/antononcube/Raku-LLM-Functions), (2023), [GitHub/antononcube](https://github.com/antononcube).

[AAp2] Anton Antonov, [WWW::OpenAI Raku package](https://github.com/antononcube/Raku-WWW-OpenAI), (2023), [GitHub/antononcube](https://github.com/antononcube).

[AAp3] Anton Antonov, [WWW::PaLM Raku package](https://github.com/antononcube/Raku-WWW-PaLM), (2023), [GitHub/antononcube](https://github.com/antononcube).

[AAp4] Anton Antonov, [Text::SubParsers Raku package](https://github.com/antononcube/Raku-Text-SubParsers), (2023), [GitHub/antononcube](https://github.com/antononcube).

[AAp5] Anton Antonov, [Markdown::Grammar Raku package](https://github.com/antononcube/Raku-Markdown-Grammar), (2023), [GitHub/antononcube](https://github.com/antononcube).

[AAp6] Anton Antonov, [Data::Translators Raku package](https://github.com/antononcube/Raku-Data-Translators), (2023), [GitHub/antononcube](https://github.com/antononcube).

[BDp1] Brian Duggan, [Jupyter::Kernel Raku package](https://raku.land/cpan:BDUGGAN/Jupyter::Kernel), (2017-2023), [GitHub/bduggan](https://github.com/bduggan/raku-jupyter-kernel).

[EMp1] Elizabeth Mattijsen, [App::Rak Raku package](https://github.com/lizmat/App-Rak), (2022-2023), [GitHub/lizmat](https://github.com/lizmat).

[EMr1] Elizabeth Mattijsen, [articles](https://github.com/lizmat/articles), (2018-2023), [GitHub/lizmat](https://github.com/lizmat).