# DocMarkuper <p style="font-size: 14px; color: dimgray;">LLM prompt</p>

<p style="font-size: 18px; font-style: italic; color: gray">Generate markup documentation comments</p>

----

## Introduction

This notebook demonstrates the use of LLMs to generate documentation markup comments for the code of large(r) code bases.
We consider a code base with more than 10 files to be large, meaning that prompting an LLM manually, file by file, becomes tedious.
We use the recently developed prompt "DocMarkuper", retrieved with `llm-prompt` from "LLM::Prompts", [AAp2]; the LLM calls are made with `llm-synthesize` from "LLM::Functions", [AAp1].

### Procedure outline

1. Pick a Raku package with a GitHub repository 
    - With a large enough number of files
    - Say, ["WWW::OpenAI"](https://raku.land/zef:antononcube/WWW::OpenAI)
2. Get the file URLs (at GitHub)
    - Using LLMs:
      - Generic interface
        - [Grok worked](https://x.com/i/grok/share/pbp6JKiPc7TjQGAaIgC7Ra4G2)
      - Or Raku code
        - ChatGPT's models *did not* work; Gemini's *did*
3. Batch documentation generation
    - For each file URL:
        - Slurp the file
        - Process it to have RakuDoc comments
        - Dump it into an output folder
        - Keep track of the file size and processing time 
4. Review the generated RakuDoc files
5. Compute basic statistics and plot size-vs-time graph

----

## Setup

### LLM configurations

```raku
sink my $conf5-mini = llm-configuration('ChatGPT', model => 'gpt-5-mini');
sink my $conf5 = llm-configuration('ChatGPT', model => 'gpt-5');
sink my $conf41-mini = llm-configuration('ChatGPT', model => 'gpt-4.1-mini', temperature => 0.55, max-tokens => 4096);
sink my $conf41 = llm-configuration('ChatGPT', model => 'gpt-4.1', temperature => 0.45, max-tokens => 8192);
sink my $conf4o-mini = llm-configuration('ChatGPT', model => 'gpt-4o-mini', temperature => 0.45, max-tokens => 8192);
sink my $conf4o = llm-configuration('ChatGPT', model => 'gpt-4o', temperature => 0.45, max-tokens => 8192);
sink my $conf-gemini-flash = llm-configuration('Gemini', model => 'gemini-2.0-flash', temperature => 0.45, max-tokens => 8192);
```

### JavaScript::D3

```javascript
#%javascript
require.config({
    paths: {
        d3: 'https://d3js.org/d3.v7.min'
    }
});

require(['d3'], function(d3) {
    console.log(d3);
});
```

```raku
#%js
js-d3-list-line-plot(10.rand xx 30, background => 'none')
```

----

## Basic example

Generate RakuDoc (or POD6) markup text from Raku code:

```raku
#% markdown

my $code = q:to/END/;
multi sub full-zip-code(Int $x) {
    my $x2 = $x.Str;
    return $x2.chars < 5 ?? '0' x (5 - $x2.chars) ~ $x2 !! $x2;
}

multi full-zip-code(Str $x) { return $x; }
END

my $res = llm-synthesize([
    llm-prompt("DocMarkuper")('Raku', $code),
    llm-prompt("NothingElse")('Raku')
    ])
```

```raku
=begin pod
= Function: full-zip-code(Int $x)
Returns a 5-digit zip code string, padding with leading zeros if necessary.
=end pod
multi sub full-zip-code(Int $x) {
    my $x2 = $x.Str;
    return $x2.chars < 5 ?? '0' x (5 - $x2.chars) ~ $x2 !! $x2;
}

=begin pod
= Function: full-zip-code(Str $x)
Returns the input string as is.
=end pod
multi full-zip-code(Str $x) { return $x; }
```
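The padding behavior described by the generated comments can be checked directly. The following is a minimal sketch that repeats the two definitions from the cell above and compares the `Int` candidate with `fmt('%05d')`, which is a standard alternative and not part of the original code:

```raku
multi sub full-zip-code(Int $x) {
    my $x2 = $x.Str;
    return $x2.chars < 5 ?? '0' x (5 - $x2.chars) ~ $x2 !! $x2;
}

multi sub full-zip-code(Str $x) { return $x; }

say full-zip-code(1234);     # 01234  (Int candidate pads with zeros)
say full-zip-code(90210);    # 90210  (already five digits)
say full-zip-code('0123');   # 0123   (Str candidate returns the input as is)

# Alternative for the Int case, not used in the original code
say 1234.fmt('%05d');        # 01234
```

Note that the `Str` candidate does not pad, which matches the second generated comment ("Returns the input string as is").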

----

## Scope

Generate Javadoc comments for Java code:

```raku
#% markdown
my $code = q:to/END/;
public class Greeting {
    private String message;

    public Greeting(String initialMessage) {
        this.message = initialMessage;
    }

    public void displayMessage() {
        System.out.println(message);
    }

    public static void main(String[] args) {
        Greeting helloGreeting = new Greeting("Hello, Java!");

        helloGreeting.displayMessage();

        Greeting welcomeGreeting = new Greeting("Welcome to the world of classes!");
        welcomeGreeting.displayMessage();
    }
}
END

llm-prompt("DocMarkuper")("Java", $code)
==> llm-synthesize
```

```java
/**
 * Represents a greeting message that can be displayed.
 */
public class Greeting {
    private String message;

    /**
     * Constructs a Greeting with an initial message.
     *
     * @param initialMessage the message to be stored and displayed
     */
    public Greeting(String initialMessage) {
        this.message = initialMessage;
    }

    /**
     * Prints the current greeting message to the standard output.
     */
    public void displayMessage() {
        System.out.println(message);
    }

    /**
     * The main method creates Greeting instances and displays their messages.
     *
     * @param args command-line arguments (not used)
     */
    public static void main(String[] args) {
        Greeting helloGreeting = new Greeting("Hello, Java!");

        helloGreeting.displayMessage();

        Greeting welcomeGreeting = new Greeting("Welcome to the world of classes!");
        welcomeGreeting.displayMessage();
    }
}
```

Called without arguments, the prompt function returns the prompt text for the default language (Raku), to be completed with a code string:

```raku
llm-prompt("DocMarkuper")()
```

```
You are an expert Raku programmer and a good writer of technical documentation.
For the given Raku code below preface every function or class method with corresponding RakuDoc lines.
Keep the user comments.
The documentation comments you generate are both concise and descriptive.
CODE:
```


----

## Neat Examples

Generate doc-comments for a Python class, generate examples of using that class, and place the code in a Python external evaluation cell:

```raku
sink my $code = q:to/END/;
class Car:
    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year
        self.speed = 0

    def accelerate(self, amount):
        self.speed += amount
        print(f"The {self.year} {self.make} {self.model} accelerates to {self.speed} mph.")

    def brake(self, amount):
        self.speed -= amount
        if self.speed < 0:
            self.speed = 0
        print(f"The {self.year} {self.make} {self.model} slows down to {self.speed} mph.")

    def display_info(self):
        print(f"Car Info: {self.year} {self.make} {self.model}, Current Speed: {self.speed} mph")
END
```

```raku
#% markdown 
llm-prompt("DocMarkuper")("Python", $code)
==> llm-synthesize
```

```python
class Car:
    """
    A class to represent a car with basic attributes and behaviors.

    :param make: Manufacturer of the car.
    :type make: str
    :param model: Model of the car.
    :type model: str
    :param year: Manufacturing year of the car.
    :type year: int
    """

    def __init__(self, make, model, year):
        """
        Initialize a new Car instance.

        :param make: Manufacturer of the car.
        :type make: str
        :param model: Model of the car.
        :type model: str
        :param year: Manufacturing year of the car.
        :type year: int
        """
        self.make = make
        self.model = model
        self.year = year
        self.speed = 0

    def accelerate(self, amount):
        """
        Increase the car's speed by a specified amount.

        :param amount: Amount to increase the speed by (mph).
        :type amount: int or float
        """
        # Increase speed by specified amount
        self.speed += amount
        print(f"The {self.year} {self.make} {self.model} accelerates to {self.speed} mph.")

    def brake(self, amount):
        """
        Decrease the car's speed by a specified amount, not allowing negative speed.

        :param amount: Amount to decrease the speed by (mph).
        :type amount: int or float
        """
        # Decrease speed by specified amount
        self.speed -= amount
        if self.speed < 0:
            self.speed = 0
        print(f"The {self.year} {self.make} {self.model} slows down to {self.speed} mph.")

    def display_info(self):
        """
        Display information about the car including current speed.
        """
        # Print current car information
        print(f"Car Info: {self.year} {self.make} {self.model}, Current Speed: {self.speed} mph")
```

----

## Batch documentation generation

Asking the LLM to list the files in the GitHub directory does not work with GPT-4.1 and GPT-5:

```raku
llm-synthesize([
    'List the individual files at this URL:',
    'https://github.com/antononcube/Raku-WWW-OpenAI/tree/main/lib/WWW/OpenAI',
    llm-prompt('NothingElse')('JSON')    
],
    e => $conf5,
    form => sub-parser('JSON'):drop
)
```

```
# {error => Unable to list files at the provided URL.}
```

Grok, used through its generic (chat) interface, does give the list:

```raku
my @file-names = <Audio.rakumod Batches.rakumod ChatCompletions.rakumod Embeddings.rakumod Files.rakumod FindTextualAnswer.rakumod ImageEdits.rakumod ImageGenerations.rakumod ImageVariations.rakumod Models.rakumod Moderations.rakumod Request.rakumod TextCompletions.rakumod>;
```

Gemini works when called from Raku code:

```raku
my @file-names = llm-synthesize([
    'List the individual files at this URL:',
    'https://github.com/antononcube/Raku-WWW-OpenAI/tree/main/lib/WWW/OpenAI',
    llm-prompt('NothingElse')('JSON')    
],
    e => $conf-gemini-flash,
    form => sub-parser('JSON'):drop
)
```

```
[WWW/OpenAI.rakumod WWW/OpenAI/Audio.rakumod WWW/OpenAI/Chat.rakumod WWW/OpenAI/Completions.rakumod WWW/OpenAI/Edits.rakumod WWW/OpenAI/Embeddings.rakumod WWW/OpenAI/Files.rakumod WWW/OpenAI/FineTunes.rakumod WWW/OpenAI/Images.rakumod WWW/OpenAI/Models.rakumod WWW/OpenAI/Moderations.rakumod]
```

Post-process the result, keeping only the file base names:

```raku
# Drop any directory prefix, e.g. 'WWW/OpenAI/Audio.rakumod' becomes 'Audio.rakumod'
@file-names = @file-names.map({ $_.IO.basename })
```

```
[Audio.rakumod Batches.rakumod ChatCompletions.rakumod Embeddings.rakumod Files.rakumod FindTextualAnswer.rakumod ImageEdits.rakumod ImageGenerations.rakumod ImageVariations.rakumod Models.rakumod Moderations.rakumod Request.rakumod TextCompletions.rakumod]
```
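As a small illustration of this post-processing step, the sketch below compares two ways of reducing prefixed paths to bare file names. The sample paths are taken from the Gemini output above; `IO::Path.basename` and `Str.subst` are core Raku methods.

```raku
my @paths = <WWW/OpenAI.rakumod WWW/OpenAI/Audio.rakumod WWW/OpenAI/Models.rakumod>;

# Base names: everything up to and including the last '/' is dropped
my @base-names = @paths.map(*.IO.basename);
say @base-names;                   # [OpenAI.rakumod Audio.rakumod Models.rakumod]

# Removing a fixed string only strips that one prefix
say @paths.map(*.subst('WWW/'));   # (OpenAI.rakumod OpenAI/Audio.rakumod OpenAI/Models.rakumod)
```

Taking the base name is independent of how deep the LLM-reported paths are, which matters because the raw URL prefix used below already ends in `lib/WWW/OpenAI`.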

Make the absolute URLs and sort them by file size in characters (a rough proxy for the number of LLM tokens):

```raku
my $raw-url = 'https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI';
my @urls = @file-names.map({ "$raw-url/$_"});
# Each file is downloaded here in order to measure its size
@urls .= sort({ slurp($_).chars });
```

```
[https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI/Moderations.rakumod
 https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI/Embeddings.rakumod
 https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI/Files.rakumod
 https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI/ImageVariations.rakumod
 https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI/Batches.rakumod
 https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI/ImageEdits.rakumod
 https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI/FindTextualAnswer.rakumod
 https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI/Audio.rakumod
 https://raw.githubusercontent.com/antononcube/Raku-WWW-OpenAI/refs/heads/main/lib/WWW/OpenAI/ImageGener
```
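Sorting by `slurp($_).chars` downloads every file once for the sort, and the batch loop further down downloads each file again. A possible variation, sketched here under the assumption that keeping the contents in memory is acceptable, is to store each content once and sort the names by the stored sizes. The hash `%content` and its entries are hypothetical stand-ins for the downloaded files, so the example runs without network access.

```raku
# Hypothetical stand-in for the downloaded files: name => content
my %content =
    'Small.rakumod'  => 'unit module Small;',
    'Large.rakumod'  => "unit module Large;\n" ~ ('sub f { }' ~ "\n") x 20,
    'Medium.rakumod' => "unit module Medium;\n" ~ 'sub g { }';

# Measure each content once and remember its size
my %size = %content.keys.map(-> $name { $name => %content{$name}.chars });

# Sort the names by the remembered sizes, smallest first
my @sorted = %size.keys.sort({ %size{$_} });
say @sorted;   # [Small.rakumod Medium.rakumod Large.rakumod]
```

With real URLs, `%content` would be filled by a single `slurp` per URL, and the batch loop could then read the code from the hash instead of slurping again.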

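Define the output directory for the RakuDoc/POD6 files and create it if it does not exist:

```raku
sink my $outputDirName = $*CWD ~ '/WWW-OpenAI-RakuDoc';
# `spurt` needs an existing directory
sink mkdir $outputDirName;
```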

Batch processing: 
- For each file/URL:
    - Slurp the file
    - Process it to have RakuDoc comments
    - Dump it into an output folder
    - Keep track of the file size and processing time 

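```raku
my @stats = do for @urls.kv -> $index, $file {
    my $tStart = now;

    # File name without the URL path
    my $basename = $file.split("/").tail;

    say "$index : $basename";

    # Download the code and remove the lines that start with '#'
    my $code = slurp($file);
    $code = $code.subst(/ ^^ '#' .*? $$/):g;

    # Ask the LLM to return the code with RakuDoc comments added
    my $res = llm-synthesize([
        llm-prompt("DocMarkuper")('Raku', $code),
        "\n\n",
        llm-prompt('NothingElse')('RakuDoc/Pod6')
        ],
        e => $conf41-mini
    );

    # Strip the Markdown code fences and save the result
    spurt("$outputDirName/{$basename}", $res.subst(/ ^ '```raku' | '```' $/):g);
    my $time = now - $tStart;
    say "\tTotal time: $time";

    # Record the processing time and the size statistics of the input
    {:$index, file => $basename, :$time, |text-stats($code)}
}
```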

```
0 : Moderations.rakumod
	Total time: 4.356633541
1 : Embeddings.rakumod
	Total time: 6.592157041
2 : Files.rakumod
	Total time: 6.476077321
3 : ImageVariations.rakumod
	Total time: 23.578356867
4 : Batches.rakumod
	Total time: 9.352477943
5 : ImageEdits.rakumod
	Total time: 17.739491059
6 : FindTextualAnswer.rakumod
	Total time: 11.585346637
7 : Audio.rakumod
	Total time: 17.264578126
8 : ImageGenerations.rakumod
	Total time: 9.60693558
9 : TextCompletions.rakumod
	Total time: 9.164557062
10 : Request.rakumod
	Total time: 11.214289112
11 : Models.rakumod
	Total time: 5.438209626
12 : ChatCompletions.rakumod
	Total time: 7.296680404
```


```
[{chars => 1374, file => Moderations.rakumod, index => 0, lines => 47, time => 4.356633541, words => 103}
 {chars => 2672, file => Embeddings.rakumod, index => 1, lines => 70, time => 6.592157041, words => 221}
 {chars => 3007, file => Files.rakumod, index => 2, lines => 95, time => 6.476077321, words => 279}
 {chars => 3603, file => ImageVariations.rakumod, index => 3, lines => 103, time => 23.578356867, words => 324}
 {chars => 3828, file => Batches.rakumod, index => 4, lines => 106, time => 9.352477943, words => 341}
 {chars => 4286, file => ImageEdits.rakumod, index => 5, lines => 119, time => 17.739491059, words => 394}
 {chars => 6175, file => FindTextualAnswer.rakumod, index => 6, lines => 160, time => 11.585346637, words => 583}
 {chars => 6967, file => Audio.rakumod, index => 7, lines => 178, time => 17.264578126, words => 716}
 {chars => 7257, file => ImageGenerations.rakumod, index => 8, lines => 173, time => 9.60693558, words => 752}
 {chars => 7514, file => TextCompletions.rakumod,
```

**Remark:** The package ["Data::Importers"](https://raku.land/zef:antononcube/Data::Importers) extends `slurp` to ingest the content of URLs.
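The two regex-based clean-up steps of the batch loop can be tried in isolation. The sketch below uses small hypothetical strings in place of a downloaded file and an LLM response; the code fence is built with `x` and interpolated into the regex only to keep literal backticks out of the example, so the second pattern is an equivalent of the one in the loop rather than a verbatim copy.

```raku
# Hypothetical file content with an ordinary and a declarator comment
my $code = ('# plain comment', 'sub f { 1 }', '#| declarator comment', 'sub g { 2 }').join("\n");

# Same pattern as in the loop: empty every line that starts with '#'
my $stripped = $code.subst(/ ^^ '#' .*? $$ /, :g);
say $stripped.lines.grep(*.chars).join(' | ');   # sub f { 1 } | sub g { 2 }

# Hypothetical LLM response wrapped in a Markdown code fence
my $fence = '`' x 3;
my $res = ($fence ~ 'raku', 'sub inc($x) { $x + 1 }', $fence).join("\n");

# Remove the opening fence at the start and the closing fence at the end
my $clean = $res.subst(/ ^ $fence 'raku' | $fence $ /, :g).trim;
say $clean;   # sub inc($x) { $x + 1 }
```

The first pattern removes every line beginning with `#`, which includes `#|` declarator comments, so any documentation already attached that way does not reach the LLM.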

----

## Execution stats

```raku
my @field-names = <index file time chars words lines>;
sink records-summary(@stats, field-names => @field-names.tail(*-2).List)
```

```
+-----------------------------+-----------------------+----------------------+----------------------+
| time                        | chars                 | words                | lines                |
+-----------------------------+-----------------------+----------------------+----------------------+
| Min    => 4.356633541       | Min    => 1374        | Min    => 103        | Min    => 47         |
| 1st-Qu => 6.534117181       | 1st-Qu => 3305        | 1st-Qu => 301.5      | 1st-Qu => 99         |
| Mean   => 10.74352233223077 | Mean   => 5507.307692 | Mean   => 499.923077 | Mean   => 146.846154 |
| Median => 9.352477943       | Median => 6175        | Median => 450        | Median => 160        |
| 3rd-Qu => 14.4249623815     | 3rd-Qu => 7583        | 3rd-Qu => 734        | 3rd-Qu => 185        |
| Max    => 23.578356867      | Max    => 9565        | Max    => 906        | Max    => 256        |
|                             |                       |                      |    
```

```raku
#% html
@stats 
==> { $_.map({ $_<time> = $_<time>.round(0.05); $_}) }()
==> to-html(:@field-names)
```

| index | file | time | chars | words | lines |
|-------|------|------|-------|-------|-------|
| 0 | Moderations.rakumod | 4.35 | 1374 | 103 | 47 |
| 1 | Embeddings.rakumod | 6.6 | 2672 | 221 | 70 |
| 2 | Files.rakumod | 6.5 | 3007 | 279 | 95 |
| 3 | ImageVariations.rakumod | 23.6 | 3603 | 324 | 103 |
| 4 | Batches.rakumod | 9.35 | 3828 | 341 | 106 |
| 5 | ImageEdits.rakumod | 17.75 | 4286 | 394 | 119 |
| 6 | FindTextualAnswer.rakumod | 11.6 | 6175 | 583 | 160 |
| 7 | Audio.rakumod | 17.25 | 6967 | 716 | 178 |
| 8 | ImageGenerations.rakumod | 9.6 | 7257 | 752 | 173 |
| 9 | TextCompletions.rakumod | 9.15 | 7514 | 669 | 192 |


```raku
#% js
js-d3-list-line-plot(
    @stats.map(*<lines time>).sort(*.head), 
    title-color => 'ivory',
    title => 'Size vs LLM-time', x-label => 'size', y-label => 'time, s', 
    :3stroke-width,
    background => '#1F1F1F', 
    :grid-lines)
```
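To complement the plot with a number, one option is the Pearson correlation between file size (in lines) and processing time. The sketch below is not part of the original notebook: the helper `pearson` is a hypothetical name, and the data are only the ten rows visible in the table above, so the result describes that subset rather than all 13 files.

```raku
# Lines and rounded times of the ten rows shown in the table above
my @lines = 47, 70, 95, 103, 106, 119, 160, 178, 173, 192;
my @time  = 4.35, 6.6, 6.5, 23.6, 9.35, 17.75, 11.6, 17.25, 9.6, 9.15;

# Pearson correlation coefficient of two equally long numeric lists
sub pearson(@x, @y) {
    my $mean-x = @x.sum / @x.elems;
    my $mean-y = @y.sum / @y.elems;
    my @dx = @x.map(* - $mean-x);
    my @dy = @y.map(* - $mean-y);
    (@dx Z* @dy).sum / sqrt((@dx Z* @dx).sum * (@dy Z* @dy).sum)
}

say "correlation(lines, time) = ", pearson(@lines, @time).round(0.01);

# Processing time per line of code, file by file
say "seconds per line: ", (@time Z/ @lines).map(*.round(0.001));
```

A coefficient near 1 would indicate that the LLM time grows roughly linearly with file size; a low value would suggest that other factors (for example, response length or service latency) dominate.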

----

## References

### Packages

[AAp1] Anton Antonov,
[LLM::Functions, Raku package](https://github.com/antononcube/Raku-LLM-Functions),
(2023-2025),
[GitHub/antononcube](https://github.com/antononcube).

[AAp2] Anton Antonov,
[LLM::Prompts, Raku package](https://github.com/antononcube/Raku-LLM-Prompts),
(2023-2025),
[GitHub/antononcube](https://github.com/antononcube).