# TLDR LLM solutions for ~not reading~ software manuals 

#### Anton Antonov   
#### RakuForPrediction at WordPress   
#### RakuForPrediction-book at GitHub
#### August 2023

------

## Introduction

In this [Jupyter notebook](https://github.com/antononcube/RakuForPrediction-book/blob/main/Notebooks/Jupyter/TLDR-LLM-solutions-for-software-manuals.ipynb) we use [Large Language Model (LLM) functions](https://rakuforprediction.wordpress.com/2023/08/01/workflows-with-llm-functions/), [AAp1, AA1], for generating (hopefully) executable, correct, and harmless code for Operating System resources managements.

In order to be concrete and useful, we take the Markdown files of the articles ["It's time to rak!"](https://dev.to/lizmat/series/20329), that explain the motivation and usage of the Raku module ["App::Rak"](https://raku.land/zef:lizmat/App::Rak), and we show how meaningful, file finding shell commands can be generated via LLMs exposed to the code-with-comments from those articles.

In other words, we prefer to apply the attitude Too Long; Didn't Read (TLDR) to articles and related Raku module [help](https://github.com/lizmat/App-Rak/blob/main/README.md).
(Because "App::Rak" is useful, but it has too many parameters that we prefer not learn that much about.)  

**Remark:** We say that "App::Rak" uses a Domain Specific Language (DSL), which is done with Raku's Command Line Interface (CLI) features.

### Procedure outline

1. Clone the corresponding [article repository](https://github.com/lizmat/articles)
2. Locate and ingest the "App::Rak" dedicated Markdown files
3. Extract code blocks from the Markdown files
   - Using ["Markdown::Grammar"](https://raku.land/zef:antononcube/Markdown::Grammar) functions
4. Get comment-and-code line pairs from the code blocks
   - Using Raku text manipulation capabilities
      - (After observing code examples) 
5. Generate from the comment-and-code pairs LLM few-shot training rules
6. Use the LLM example function to translate natural language commands into (valid and relevant) "App::Rak" DSL commands
   - With a few or a dozen natural language commands 
7. Use LLMs to generate natural language commands in order to test LLM-TLDR-er further 

Step 6 says how we do our TLDR -- we use LLM-translations of natural language commands.

### Alternative procedure

Instead of using Raku to process text we can make LLM functions for extracting the comment-and-code pairs.
(That is also shown below.)    

### Extensions

1. Using LLMs to generate:
    - Variants of the gathered commands
        - And make new training rules with them
    - EBNF grammars for gathered commands
2. Compare OpenAI and PaLM and or their different models 
    - Which one produces best results?
    - Which ones produce better result for which subsets of commands?

### Article's structure

The exposition below follows the outlines of procedure subsections above. The extensions 

The article/document/notebook was made with the Jupyter framework, using the Raku package ["Jupyter::Kernel"](https://raku.land/cpan:BDUGGAN/Jupyter::Kernel), [BD1].  

--------

## Setup

In [166]:
use Markdown::Grammar;
use Data::Reshapers;
use Data::Summarizers;
use LLM::Functions;
use Text::SubParsers;

------

## Workflow

### File names

In [167]:
my $dirName = $*HOME ~ '/GitHub/lizmat/articles';
my @fileNames = dir($dirName).grep(*.Str.contains('time-to-rak'));
@fileNames.elems

4

### Texts ingestion

Here we ingest the text of each file:

In [168]:
my %texts = @fileNames.map({ $_.basename => slurp($_) });
%texts.elems

4

Here are the number of characters per document:

In [169]:
%texts>>.chars

{its-time-to-rak-1.md => 7437, its-time-to-rak-2.md => 8725, its-time-to-rak-3.md => 14181, its-time-to-rak-4.md => 9290}

Here are the number of words per document:

In [170]:
%texts>>.words>>.elems

{its-time-to-rak-1.md => 1205, its-time-to-rak-2.md => 1477, its-time-to-rak-3.md => 2312, its-time-to-rak-4.md => 1553}

### Get Markdown code blocks

With the function `md-section-tree` we extract code blocks from Markdown documentation files into data structures amenable for further programmatic manipulation (in Raku.)
Here we get code blocks from each text:

In [171]:
my %docTrees = %texts.map({ $_.key => md-section-tree($_.value, modifier => 'Code', max-level => 0) });
%docTrees>>.elems

{its-time-to-rak-1.md => 1, its-time-to-rak-2.md => 11, its-time-to-rak-3.md => 24, its-time-to-rak-4.md => 16}

Here we put all blocks into one array:

In [172]:
my @blocks = %docTrees.values.Array.&flatten;
@blocks.elems

52

Here from each code block we parse-extract comment-and-code pairs:

In [173]:
my @rules;
@blocks.map({ 
    given $_ { 
        for m:g/ '#' $<comment>=(\V+) \n '$' $<code>=(\V+) \n / -> $m {
           @rules.push( ($m<comment>.Str.trim => $m<code>.Str.trim) ) 
         } } }).elems

52

Here is the number of rules:

In [174]:
@rules.elems

69

Here is a sample of the rules:

In [175]:
.say for @rules.pick(4)

produce extensive help on filesystem filters => rak --help=filesystem
same, with a regular expression => rak '/ foo $/'
save --before-context as -B, requiring a value => rak --before-context=! --save=B
Find files that have "lib" in their name from the current dir => rak lib --find


In order to tabulate "nicely" the rules in the Jupyter notebook, we make an LLM functions to produce an HTML table and then specify the corresponding "magic cell." 
(This relies on the Jupyter-magics features of [BDp1].) Here is an LLM conversion function, [AA1]:

In [176]:
my &ftbl = llm-function({"Conver the $^a table $^b into an HTML table."}, e=>llm-configuration('PaL<', max-tokens=>800))

-> **@args, *%args { #`(Block|5060670600040) ... }

Here is the HTML table derivation:

In [177]:
%%html
my $tblHTML=&ftbl("plain text", to-pretty-table(@rules.pick(12).sort, align => 'l', field-names => <Key Value>))

Key,Value
"Look for lines with either ""foo"" or ""bar""",rak '/ one | four /' twenty
Look for strings containing y or Y,rak --type=contains --ignorecase Y twenty
Only accept Raku and Markdown files,"rak foo --extensions=#raku,#markdown"
Only accept files with the .bat or the .ps1 extension,"rak foo --extensions=bat,ps1"
Show all filenames from current directory on down,rak --find attRot
"Show all the lines with exactly on character between ""e"" and ""t""",rak /e.t/ twenty
"look for ""Foo"", while taking case into account",rak Foo
"look for ""foo"" case-insensitively",rak foo
produce extensive help on filesystem filters,rak --help=filesystem --pager=less
"same, without equal sign",rak foo -C4


It is more "productive", though, to use the HTML interpreter provided by "Markdwon::Grammar":

In [178]:
%%html
my $textTBL=to-pretty-table(@rules.pick(12).sort).Str;
md-interpret($textTBL.subst('+--','|--', :g).subst('--+','--|', :g), actions=>Markdown::Actions::HTML.new)

-------------------------------------------------------------,--------------------------------------------------------------
Key,Value
-------------------------------------------------------------,--------------------------------------------------------------
"Check there's no ""frobnicate"" option anymore",rak --frobnicate
"Convert all occurrences of ""teen"" into ""10""","rak '.subst(""teen"",10)' twenty --type=code"
"Look for ""eight"" as the whole line in file ""twenty""",rak --type=equal eight twenty
"Look for ""seven"" at the start of all lines in file ""twenty""",rak --type=starts-with seven twenty
"Look for strings matching ""u"" having any additional marks",rak --type=contains --ignoremark ú twenty
Show all directory names from current directory down,rak --find --/file
Show number of files / lines authored by Scooby Doo,"rak --blame-per-line '*.author eq ""Scooby Doo""' --count-only"
"Show the lines containing ""ne""",rak ne twenty


### Code generation examples

Here we define an LLM function for generating "App::Rak" shell commands:

In [179]:
my &frak = llm-example-function(@rules, e => llm-evaluator('PaLM'))

-> **@args, *%args { #`(Block|5060670665440) ... }

In [180]:
my @cmds = ['Find files that have ".nb" in their names', 'Find files that have ".nb"  or ".wl" in their names',
 'Show all directories of the parent directory', 'Give me files without extensions and that contain the phrase "notebook"', 
 'Show all that have extension raku or rakumod and contain Data::Reshapers'];

my @tbl = @cmds.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;

@tbl.&dimensions

(5 2)

Here is a table showing the natural language commands and  

In [181]:
to-pretty-table(@tbl, align=>'l', field-names => <Command App::Rak>)

+--------------------------------------------------------------------------+-----------------------------------------------------------------+
| Command                                                                  | App::Rak                                                        |
+--------------------------------------------------------------------------+-----------------------------------------------------------------+
| Find files that have ".nb" in their names                                | rak --find --extensions=nb                                      |
| Find files that have ".nb"  or ".wl" in their names                      | rak --find --extensions=nb,wl                                   |
| Show all directories of the parent directory                             | rak --find --/file --parent                                     |
| Give me files without extensions and that contain the phrase "notebook"  | rak --type=words notebook --extensions=                         |

-------

## Translating random generated commands

Consider testing the applicability of the approach by generating a "good enough" sample of natural language commands for finding files or directories.

We can generate such commands via LLM. Here we define an LLM function with two parameters the returns a Raku list:

In [182]:
my &fcg = llm-function({"Generate $^_a natural language commands for finding $^b in a file system. Give the commands as a JSON list."}, form => sub-parser('JSON'))

-> **@args, *%args { #`(Block|5060694760760) ... }

In [183]:
my @gCmds1 = &fcg(4, 'files').flat;
@gCmds1.raku

["Find a file in my folder", "Search for a file in my directory", "Locate a file in my folder", "Look for a file in my file system"]

Here are the corresponding translations to the "App::Rak" DSL:

In [184]:
my @tbl1 = @gCmds.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
to-pretty-table(@tbl1, align=>'l', field-names => <Command App::Rak>)

+------------------------------------------+---------------------------+
| Command                                  | App::Rak                  |
+------------------------------------------+---------------------------+
| Show me all the files inside this folder | rak --find .              |
| Where are the files located?             | current directory on down |
| What files are inside this directory?    | rak --find                |
| List all the files in this folder        | rak --find                |
+------------------------------------------+---------------------------+

Let use redo the generation and translation using different specs: 

In [185]:
my @gCmds2 = &fcg(4, 'files that have certain extensions or contain certain words').flat;
@gCmds2.raku

["Find all files with the .docx extension", "Search for all files that contain 'Report' in their name", "Show me all files with the .txt extension", "Locate all files that have 'Project' in their name"]

In [186]:
my @tbl2 = @gCmds2.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
to-pretty-table(@tbl2, align=>'l', field-names => <Command App::Rak>)

+----------------------------------------------------------+-----------------------+
| Command                                                  | App::Rak              |
+----------------------------------------------------------+-----------------------+
| Find all files with the .docx extension                  | rak --extensions=docx |
| Search for all files that contain 'Report' in their name | rak --find Report     |
| Show me all files with the .txt extension                | rak --extensions=txt  |
| Locate all files that have 'Project' in their name       | rak --find Project    |
+----------------------------------------------------------+-----------------------+

-------

## Alternative programming with LLM

In [187]:
my &fcex = llm-function({"Extract the consecutive line pairs in which the first start with '#' and second with '\$' from the text $_. Group them as a key-value pairs and put them JSON format."}) 

-> **@args, *%args { #`(Block|5060694799560) ... }

In [188]:
&fcex(@blocks[0])



{
  "# produce results without any highlighting": "$ rak foo --/highlight",
  "# produce results as if piping to a file": "$ rak foo --no-human"
}

-------

## Extentions

The "right way" of translating natural language DSLs to CLI DSLs like the one of "App::Rak" is to make a grammar for the natural language DSL and the corresponding interpreter.
This might be a lengthy process, so, we might consider replacing it, or jump-starting it, with LLM-basd grammar generation:
we ask an LLM to generate a grammar for a collection DSL sentences. (For example, the keys of the rules above.) 
In this subsection we make a "teaser" demonstration of latter approach.

Here we create an LLM function for generating grammars over collections of sentences:

In [189]:
my &febnf = llm-function({"Generate an $^a grammar for the collection of sentences:\n $^b "}, e => llm-configuration("OpenAI", max-tokens=>900))

-> **@args, *%args { #`(Block|5060670827264) ... }

Here we generate an EBNF grammar for the "App::Rak" code-example commands:

In [190]:
my $ebnf = &febnf('EBNF', @rules>>.key)

 Look for the lines that contains two consecutive words that start with "ba" Show all the lines where the fifth character is "e"

SentenceList → Sentence | SentenceList Sentence

Sentence → ProduceResultsPipe | SpecifyLiteral | SpecifyRegExp | SaveIgnoreCase | SaveIgnoremark | AddChangeDescIgnoreCase | LiteralStringCheck | SaveWhitespace | SearchRakudo | SaveAfterContext | SaveBeforeContext | SaveContext | SearchContext | SmartCase | SearchCase | RemoveOption | StartRak | SearchFile | SearchSubDir | Extension | NoExtension | BehaviourFiles | HelpFilesystem | SearchDir | FindName | FindNumber | FindScooby | FindAnywhere | FindWord | FindStart | FindEnd | NumberCharacters | FindY | FindU | FindNE | FindSix | FindSeven | FindEight | FreqLetters | ShowContain | TitleCase | ReverseOrder | Optionally

ProduceResultsPipe → "produce" "results" "without" "any" "highlighting"
SpecifyLiteral → "specify" "a" "literal" "pattern" "at" "the" "end" "of" "a" "line"
SpecifyRegExp → "same," "with" "a" "r

------

## References

### Articles

[AA1] Anton Antonov, ["Workflows with LLM functions"](https://rakuforprediction.wordpress.com/2023/08/01/workflows-with-llm-functions/), (2023), [RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).

[AA2] Anton Antonov, ["Graph representation of grammars"](https://rakuforprediction.wordpress.com/2023/07/06/graph-representation-of-grammars/), (2023), [RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).

### Packages, paclets

[AAp1] Anton Antonov, [LLM::Functions Raku package](https://github.com/antononcube/Raku-LLM-Functions), (2023), [GitHub/antononcube](https://github.com/antononcube).

[AAp2] Anton Antonov, [WWW::OpenAI Raku package](https://github.com/antononcube/Raku-WWW-OpenAI), (2023), [GitHub/antononcube](https://github.com/antononcube).

[AAp3] Anton Antonov, [WWW::PaLM Raku package](https://github.com/antononcube/Raku-WWW-PaLM), (2023), [GitHub/antononcube](https://github.com/antononcube).

[AAp4] Anton Antonov, [Text::SubParsers Raku package](https://github.com/antononcube/Raku-Text-SubParsers), (2023), [GitHub/antononcube](https://github.com/antononcube).

[BDp1] Brian Duggan, [Jupyter::Kernel Raku package](https://raku.land/cpan:BDUGGAN/Jupyter::Kernel), (2017-2023), [GitHub/bduggan](https://github.com/bduggan/raku-jupyter-kernel).