# ShellAI – Natural Language Interfaces for the Custom Shell

This notebook explores the idea of using **AI models** to accept natural language input in your shell and translate it into concrete commands that your existing shell and command modules can execute.

It is **not** about model training in depth or API keys, but about:
- The *idea* of AI-assisted shell interaction.
- Typical model choices and their size/constraints.
- How to leverage your existing **command anatomy** (`cmd_spec_t`, `argtable3`, docs) as training or retrieval data.
- Design patterns (fine-tuning vs RAG) that keep the AI suggestions aligned with what your shell actually supports.


## 1. Motivation and high-level idea

Your course shell is already powerful:
- It has a grammar (from the BNFC shell lab) for pipelines, redirection, and background jobs.
- It has a growing library of commands, each implemented with a standard anatomy:
  - `cmd_spec_t` (name, summary, long_help).
  - Argument parsing and `--help` via `argtable3`.
  - Registration in a command registry.

**ShellAI** asks: can we add a **natural language front-end** on top of this?

Example:

- User types:

  ```text
  @show me all .c files, newest first, with their dates
  ```

- AI helper returns:

  ```text
  ls -lt --time-style=+%Y-%m-%d *.c
  ```

- Shell prints:

  ```text
  Suggested: ls -lt --time-style=+%Y-%m-%d *.c
  Run this? (y/n)
  ```

- If the user accepts, the shell feeds that line into the normal parse → execute pipeline.

The key property: **the AI never runs commands directly**. It only suggests a string; the shell remains responsible for parsing, checking, and executing.
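
A minimal sketch of that confirmation gate in C, assuming the shell reads the answer from a `FILE *` (normally `stdin`). The name `confirm_suggestion` is hypothetical. Anything other than an explicit "y" counts as "no", so a closed input stream or an accidental Enter never runs a suggestion.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Show the helper's suggestion and ask for explicit confirmation.
 * Returns true only if the answer starts with 'y' or 'Y' (after blanks);
 * EOF, an empty line, or anything else counts as "no". */
bool confirm_suggestion(FILE *in, FILE *out, const char *suggestion)
{
    char answer[16];

    fprintf(out, "Suggested: %s\nRun this? (y/n) ", suggestion);
    fflush(out);

    if (fgets(answer, sizeof answer, in) == NULL)
        return false;                   /* EOF or read error: never run */

    /* Discard the rest of an over-long answer so it is not
     * mistaken for the next command line. */
    if (strchr(answer, '\n') == NULL) {
        int c;
        while ((c = fgetc(in)) != '\n' && c != EOF)
            ;
    }

    const char *p = answer;
    while (*p == ' ' || *p == '\t')     /* skip leading blanks */
        p++;
    return tolower((unsigned char)*p) == 'y';
}
```

In the REPL this would be called as `confirm_suggestion(stdin, stdout, suggestion)`; taking the streams as parameters keeps the function testable with temporary files.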


## 2. Architecture: AI-assisted shell as a tool user

We treat the AI as a *helper process* that maps natural language → shell commands.

Baseline architecture:

- Shell (`mysh`):
  - REPL and parser (BNFC mini-shell).
  - Command registry and built-ins (`cmd_spec_t`, `run`, `print_usage`).
  - Integration point for AI:
    - If a line starts with `@`, send the rest of the line to `mysh_llm`.
    - Read one line back: the suggested command.
    - Ask the user for confirmation and, if accepted, execute it.

- AI helper (`mysh_llm`):
  - Standalone program (could be Python, C, etc.).
  - Talks to an AI model (local or remote) using its preferred API.
  - Can optionally perform retrieval over your command docs before calling the model.
  - Outputs exactly **one line**: a suggested shell command.

This separation has several advantages:
- You can swap models or implementations without changing the shell.
- The shell remains deterministic and testable; the AI is only a suggestion engine.
- You can log all `@` queries and suggested commands for debugging and safety.


## 3. What models exist for CLI-like behavior?

Many modern language models have strong familiarity with Unix command lines, even if they are **not dedicated CLI models**.

### 3.1 General-purpose large models

Examples (sizes and capabilities may change over time):

- Proprietary, cloud-hosted:
  - GPT-4-class models (OpenAI) – parameter counts are not publicly disclosed; trained on data that includes code and shell usage.
  - Claude 3, Gemini, etc. – sizes likewise undisclosed; also trained on large amounts of code.
- Open models:
  - LLaMA 3 family (e.g., 8B, 70B parameter variants).
  - Mistral / Mixtral (e.g., 7B, 8x7B MoE).
  - Other code-focused models (e.g., Code LLaMA, StarCoder-like models).

These models generally *already know* common Unix commands and patterns (e.g. `ls`, `grep`, pipes), because their training corpora include code repositories, configuration files, and shell transcripts.

### 3.2 CLI-specific or agentic wrappers

Most "shell GPT" tools are actually **wrappers** around a general model:
- They prompt the model with examples of command usage.
- They sometimes enforce safety rules (never run `rm -rf /`, etc.).
- They may inspect the filesystem or command outputs.
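
On the shell side, a safety rule could be as simple as a denylist checked before the confirmation prompt. The sketch below is deliberately crude: the patterns and the name `looks_dangerous` are illustrative assumptions, not a recommended policy.

```c
#include <stdbool.h>
#include <string.h>

/* Crude, illustrative denylist: substrings that should trigger a warning. */
static const char *const DENYLIST[] = {
    "rm -rf /",
    "mkfs",
    ":(){",          /* start of the classic fork bomb */
    "> /dev/sd",     /* writing straight to a disk device */
};

/* Returns true if the suggested command contains any denylisted substring. */
bool looks_dangerous(const char *cmd)
{
    for (size_t i = 0; i < sizeof DENYLIST / sizeof DENYLIST[0]; i++) {
        if (strstr(cmd, DENYLIST[i]) != NULL)
            return true;
    }
    return false;
}
```

Substring matching is both over- and under-inclusive: it flags `rm -rf /tmp/build` yet misses `rm -fr /`. A check like this can add a warning, but the user's confirmation remains the actual safeguard.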

Purely CLI-specific small models exist in research, but in practice you are more likely to:
- Use a general model + good prompting.
- Or fine-tune a mid-size open model (e.g., 7B–13B parameters) with shell-specific data.

For teaching, you do **not** need to pick a specific model. Instead, focus on:
- How to design prompts.
- How to constrain outputs.
- How to use your own shell’s knowledge (from the command anatomy) to improve reliability.


## 4. Using your command anatomy as knowledge

Your command modules already encode **structured knowledge** that an AI could use:

- `cmd_spec_t`:
  - `name` – the literal command name.
  - `summary` – one-line description.
  - `long_help` – longer description / Markdown.
- `argtable3` definitions:
  - Short/long option names.
  - Expected value types (file, string, integer, etc.).
  - Help strings per option.
- Generated docs:
  - `--help` or `--help-md` outputs.
  - Additional README or example sections.

We can exploit this in two main ways:

1. **Fine-tuning / transfer learning**: turn this information into a training dataset.
2. **Retrieval-Augmented Generation (RAG)**: dynamically feed relevant command docs to the model at query time.

Both approaches share a first step: **extract a machine-readable catalog** of your commands.

### 4.1 Building a command catalog

You can imagine a JSON file like:

```json
{
  "commands": [
    {
      "name": "hello",
      "summary": "print a friendly greeting",
      "description": "Print a greeting, optionally addressing a specific NAME.",
      "usage": "hello [-h] [-n NAME]",
      "options": [
        { "short": "-h", "long": "--help", "help": "show help and exit" },
        { "short": "-n", "long": "--name", "arg": "NAME", "help": "name to greet" }
      ]
    },
    {
      "name": "wc",
      "summary": "print newline, word, and byte counts", 
      "description": "Count lines, words, and bytes in files or stdin.",
      "usage": "wc [OPTION]... [FILE]...",
      "options": [ /* ... */ ]
    }
  ]
}
```

You can generate this in several ways:
- From `cmd_spec_t` and `argtable3` directly (C-side introspection).
- By running each command with `--help-json` or `--help-md` and parsing the output.
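
For the C-side introspection route, here is a sketch of a `--commands-json` emitter. It assumes a minimal `cmd_spec_t` with only the three fields named above (the real struct in your shell likely has more, such as `run` and `print_usage`). It emits just `name`, `summary`, and `description`, and escapes strings by hand so that the output is always valid JSON.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal cmd_spec_t; name and summary must be non-NULL. */
typedef struct {
    const char *name;
    const char *summary;
    const char *long_help;   /* may be NULL or empty */
} cmd_spec_t;

/* Write s as a JSON string literal, escaping quotes, backslashes,
 * and control characters. */
static void json_write_string(FILE *out, const char *s)
{
    fputc('"', out);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        switch (*p) {
        case '"':  fputs("\\\"", out); break;
        case '\\': fputs("\\\\", out); break;
        case '\n': fputs("\\n", out);  break;
        case '\t': fputs("\\t", out);  break;
        default:
            if (*p < 0x20)
                fprintf(out, "\\u%04x", (unsigned)*p);  /* other control chars */
            else
                fputc(*p, out);                        /* UTF-8 bytes pass through */
        }
    }
    fputc('"', out);
}

/* Print {"commands": [...]} for an array of n command specs. */
void print_commands_json(FILE *out, const cmd_spec_t *cmds, size_t n)
{
    fputs("{\"commands\": [", out);
    for (size_t i = 0; i < n; i++) {
        const cmd_spec_t *c = &cmds[i];
        /* Reuse the summary as description when there is no long help. */
        const char *desc = (c->long_help != NULL && strlen(c->long_help) > 0)
                               ? c->long_help : c->summary;
        fputs(i > 0 ? ", {\"name\": " : "{\"name\": ", out);
        json_write_string(out, c->name);
        fputs(", \"summary\": ", out);
        json_write_string(out, c->summary);
        fputs(", \"description\": ", out);
        json_write_string(out, desc);
        fputc('}', out);
    }
    fputs("]}\n", out);
}
```

A `--commands-json` flag could then call `print_commands_json(stdout, specs, n)` over your registry (the names `specs` and `n` are placeholders). Emitting `usage` and `options` as well would mean walking the `argtable3` entries for each command.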

Once you have such a catalog, you can either:
- Use it to generate synthetic training pairs (for fine-tuning).
- Index it for retrieval in a RAG setup.


## 5. Strategy A: Fine-tuning on your commands (transfer learning)

**Idea:** teach a base model to map natural language to your specific commands by training it on examples derived from your command catalog.

### 5.1 Creating training examples

For each command and option combination, you can synthesize pairs like:

- Input (natural language):

  ```text
  greet Alice with the hello command
  ```

- Output (target CLI):

  ```text
  hello -n Alice
  ```

Another example:

- Input:

  ```text
  count lines and words in all .c files
  ```

- Output:

  ```text
  wc -lw *.c
  ```

You can generate many such examples by combining:
- Command names and summaries.
- Option descriptions.
- Simple templates for common tasks ("list", "count", "search", etc.).
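
Here is a sketch of template-based synthesis in C that writes one JSON Lines record per (option, sample value) pair. The `option_info` struct, the single template "run CMD with HELP set to VALUE", and the `instruction`/`response` field names are assumptions chosen for illustration. Real datasets usually mix several phrasings per option.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option record, mirroring one entry of the catalog's "options". */
typedef struct {
    const char *command;    /* e.g. "hello" */
    const char *short_opt;  /* e.g. "-n" */
    const char *help;       /* e.g. "name to greet" */
} option_info;

/* Write one JSON Lines record per (option, sample value) pair using a single
 * fixed template. Values containing '"' or '\\' are skipped because this
 * sketch does not escape them; command and help strings are assumed clean.
 * Returns the number of records written. */
int emit_training_pairs(FILE *out, const option_info *opts, size_t n_opts,
                        const char *const *values, size_t n_values)
{
    int count = 0;
    for (size_t i = 0; i < n_opts; i++) {
        for (size_t j = 0; j < n_values; j++) {
            if (strpbrk(values[j], "\"\\") != NULL)
                continue;
            fprintf(out,
                    "{\"instruction\": \"run %s with %s set to %s\", "
                    "\"response\": \"%s %s %s\"}\n",
                    opts[i].command, opts[i].help, values[j],
                    opts[i].command, opts[i].short_opt, values[j]);
            count++;
        }
    }
    return count;
}
```

With the `hello` option and the value `Alice`, the first record is `{"instruction": "run hello with name to greet set to Alice", "response": "hello -n Alice"}`.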

### 5.2 Fine-tuning a model

In practice:
- Choose a base model, e.g. a 7B–13B open model that you can run on a GPU or cloud service.
- Format training data as instruction–response pairs.
- Run a small fine-tuning job with a parameter-efficient method such as LoRA, using an open-source framework.

From a **course** perspective, you do not need to actually run this; instead, you can:
- Discuss the process conceptually.
- Show a few example training pairs in this notebook.
- Optionally provide a script template for students who want to experiment.

Fine-tuning improves fluency and domain-specific behavior, but it is **heavier** to set up and maintain.


## 6. Strategy B: Retrieval-Augmented Generation (RAG)

RAG avoids fine-tuning by giving a general model **fresh context** on each query.

### 6.1 High-level RAG loop

1. User types an `@` query:

   ```text
   @list only hidden files in this directory
   ```

2. Shell passes the natural language text to your AI helper (`mysh_llm`).

3. `mysh_llm` performs **retrieval**:
   - Uses embeddings to find the most relevant commands and options from your command catalog.
   - For example, it might retrieve docs for `ls` and its `-a`/`--all` flag.

4. `mysh_llm` constructs a prompt to the base model:

   ```text
   You are a command suggestion engine for the mysh shell.
   You only suggest commands that exist and options that are documented.

   Available commands (subset):
   - ls: list directory contents.
     Usage: ls [OPTION]... [FILE]...
     Options:
       -a, --all    do not ignore entries starting with '.'

   User request: "list only hidden files in this directory".

   Suggest a single mysh command line:
   ```

5. The model responds, e.g.:

   ```text
   ls -d .??*
   ```

6. `mysh_llm` returns just that line to the shell, which then asks the user to confirm and executes it.
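
Step 3 does not have to start with embeddings. The sketch below substitutes plain keyword overlap, the same simplification suggested in the extra-credit outline in section 8. It scores each catalog entry by how many query words appear as whole words in the entry's name or summary. The `catalog_entry` type and the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical catalog entry: only the fields used for matching. */
typedef struct {
    const char *name;
    const char *summary;
} catalog_entry;

/* Case-insensitive: does word[0..len) occur as a whole word in text? */
static int contains_word(const char *text, const char *word, size_t len)
{
    for (const char *p = text; *p != '\0'; ) {
        while (*p != '\0' && !isalnum((unsigned char)*p)) p++;  /* skip separators */
        const char *start = p;
        while (*p != '\0' && isalnum((unsigned char)*p)) p++;   /* scan one word */
        if ((size_t)(p - start) == len) {
            size_t k = 0;
            while (k < len && tolower((unsigned char)start[k]) ==
                              tolower((unsigned char)word[k]))
                k++;
            if (k == len)
                return 1;
        }
    }
    return 0;
}

/* Score = number of query words found in the entry's name or summary. */
int score_entry(const char *query, const catalog_entry *e)
{
    int score = 0;
    for (const char *p = query; *p != '\0'; ) {
        while (*p != '\0' && !isalnum((unsigned char)*p)) p++;
        const char *start = p;
        while (*p != '\0' && isalnum((unsigned char)*p)) p++;
        size_t len = (size_t)(p - start);
        if (len > 0 && (contains_word(e->name, start, len) ||
                        contains_word(e->summary, start, len)))
            score++;
    }
    return score;
}

/* Index of the best-scoring entry, or -1 if no entry matches any word. */
int best_entry(const char *query, const catalog_entry *cat, size_t n)
{
    int best = -1, best_score = 0;
    for (size_t i = 0; i < n; i++) {
        int s = score_entry(query, &cat[i]);
        if (s > best_score) {
            best_score = s;
            best = (int)i;
        }
    }
    return best;
}
```

For "list only hidden files in this directory", `ls` ("list directory contents") matches two words and is selected. A real retriever would also drop stopwords and handle word forms; here "count" does not match "counts".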

### 6.2 Benefits of RAG

- You can change or extend your command set without retraining the model.
- The model sees **exact** command syntax at runtime, reducing hallucinations.
- You can include additional context, like environment variables or examples, in the retrieved docs.
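
The prompt in step 4 can be assembled mechanically from the retrieved entries. The sketch below is in C to match the rest of this notebook's code, although `mysh_llm.py` itself is Python. The names `retrieved_cmd` and `build_prompt` are hypothetical, and option lines are omitted for brevity. Like `snprintf`, the function returns the full prompt length, so a caller can detect truncation instead of silently sending a cut-off prompt.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical retrieved entry (fields mirror the JSON catalog). */
typedef struct {
    const char *name;
    const char *summary;
    const char *usage;
} retrieved_cmd;

/* Append formatted text at offset *used without writing past buf[size - 1];
 * *used keeps counting the full length even after the buffer is full. */
static void append(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    size_t off = *used < size ? *used : size;
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(buf + off, size - off, fmt, ap);
    va_end(ap);
    if (w > 0)
        *used += (size_t)w;
}

/* Build the prompt into buf (size must be > 0). A return value >= size
 * means the prompt was truncated. */
size_t build_prompt(char *buf, size_t size, const char *request,
                    const retrieved_cmd *cmds, size_t n)
{
    size_t used = 0;
    append(buf, size, &used,
           "You are a command suggestion engine for the mysh shell.\n"
           "You only suggest commands that exist and options that are documented.\n\n"
           "Available commands (subset):\n");
    for (size_t i = 0; i < n; i++)
        append(buf, size, &used, "- %s: %s\n  Usage: %s\n",
               cmds[i].name, cmds[i].summary, cmds[i].usage);
    append(buf, size, &used,
           "\nUser request: \"%s\".\n\nSuggest a single mysh command line:\n",
           request);
    return used;
}
```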

From a course perspective, RAG is a nice illustration of:
- How to combine **symbolic knowledge** (your commands and options) with an LLM.
- How retrieval and prompting can often replace full fine-tuning.


## 7. Shell integration recap (ties back to BNF_Shell)

On the shell side (C code), the integration is intentionally simple and model-agnostic.

### 7.1 Shell logic for `@` lines

In your BNFC-based shell (see `BNF_Shell` lab), you can implement:

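```c
// Returns 0 if a suggested command was run, -1 otherwise (one possible convention).
int handle_llm_line(const char *query) {
    (void)query;  // unused until step 1 is implemented
    // 1. Call external helper: mysh_llm "<query>"
    // 2. Capture one line of output: the suggested command
    // 3. Print it and ask the user to confirm
    // 4. If confirmed, run the existing parse/execute pipeline on that line
    return -1;    // placeholder: nothing has been run yet
}
```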

Main loop sketch:

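```c
// Needs <stdio.h>, <stdlib.h>, <string.h>; getline() is POSIX.1-2008.
char *line = NULL;  // getline() allocates and grows this buffer
size_t len = 0;

while (getline(&line, &len, stdin) != -1) {
    line[strcspn(line, "\n")] = '\0';  // drop the trailing newline
    if (line[0] == '@') {
        handle_llm_line(line + 1);     // skip '@'
        continue;
    }
    // normal path: parse line via yyparse, then execute_command(...)
}
free(line);
```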

This notebook (**ShellAI**) focuses on how `mysh_llm` should behave; the BNF_Shell notebook covers plumbing it into your C shell.

You can start with a dummy `mysh_llm` that ignores the query and prints a hard-coded command.
Once the plumbing works, you or your students can experiment with real models and RAG strategies.
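
Here is a sketch of steps 1 and 2 for a POSIX host, using `popen()`. The function `fetch_suggestion` and its `helper` parameter are hypothetical. The detail worth copying is the quoting. `popen()` passes its command string to `/bin/sh`, so the query is wrapped in single quotes. Without that, a query such as `it's; rm -rf x` would be parsed as shell syntax before `mysh_llm` ever saw it.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen()/pclose() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Quote s for /bin/sh: wrap it in single quotes and turn each ' into '\''.
 * Returns a malloc'd string, or NULL on allocation failure. */
static char *shell_quote(const char *s)
{
    size_t len = 2;                              /* opening and closing quote */
    for (const char *p = s; *p != '\0'; p++)
        len += (*p == '\'') ? 4 : 1;
    char *q = malloc(len + 1);
    if (q == NULL)
        return NULL;
    char *w = q;
    *w++ = '\'';
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\'') {
            memcpy(w, "'\\''", 4);               /* close, escaped quote, reopen */
            w += 4;
        } else {
            *w++ = *p;
        }
    }
    *w++ = '\'';
    *w = '\0';
    return q;
}

/* Run `helper '<query>'` and copy the first line of its stdout into out.
 * Returns 0 on success, -1 if the helper fails or prints nothing. */
int fetch_suggestion(const char *helper, const char *query, char *out, size_t size)
{
    char *quoted = shell_quote(query);
    if (quoted == NULL)
        return -1;

    size_t cmdlen = strlen(helper) + 1 + strlen(quoted) + 1;
    char *cmd = malloc(cmdlen);
    if (cmd == NULL) {
        free(quoted);
        return -1;
    }
    snprintf(cmd, cmdlen, "%s %s", helper, quoted);
    free(quoted);

    FILE *p = popen(cmd, "r");
    free(cmd);
    if (p == NULL)
        return -1;

    int ok = fgets(out, (int)size, p) != NULL;
    char junk[256];
    while (fgets(junk, sizeof junk, p) != NULL)
        ;                                        /* drain extra output */
    int status = pclose(p);

    if (!ok || status != 0)
        return -1;
    out[strcspn(out, "\r\n")] = '\0';            /* strip the line ending */
    return out[0] != '\0' ? 0 : -1;
}
```

Passing `"echo"` as the helper gives a trivial stand-in (it suggests the query itself), which is enough to test the plumbing before a real `mysh_llm` exists. If you prefer to bypass `/bin/sh` entirely, `pipe()`, `fork()`, and `execvp()` let you hand the query over as a plain `argv` element with no quoting at all.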


## 8. Extra-credit assignment outline

For students interested in ShellAI, an extra-credit assignment could be:

1. **Catalog extraction** (low effort, high value):
   - Write a script or small C program that runs each command with `--help-md` or `--help-json`.
   - Collect the output into a JSON catalog of commands and options.

2. **Prototype RAG helper**:
   - In Python, build a simple `mysh_llm` script that:
     - Loads the command catalog.
     - For each `@` query, picks a likely command by keyword matching (no embeddings needed to start).
     - Builds a prompt showing that command’s usage and options.
     - Calls an LLM API to generate a suggested command line.
     - Prints just that line.

3. **Shell integration**:
   - Modify your shell’s main loop so that `@` lines go through `handle_llm_line`.
   - Confirm that suggested commands are run only after user confirmation.

4. **Reflection**:
   - Write a short note on:
     - Where the model did well.
     - Where it hallucinated options or commands.
     - How RAG or better prompts improved robustness.

This assignment keeps the focus on **systems concepts** (processes, pipes, registries, and structured metadata) while giving a realistic glimpse of how AI-powered developer tools are built.


## 9. Example helper script: `mysh_llm.py`

This repository includes a **partly implemented helper script**:

- `ShellAI/mysh_llm.py`

Its purpose:
- Read a natural-language query from the command line.
- Get the current command catalog from the shell (or, failing that, from a local JSON file).
- Build a prompt that lists relevant commands and options.
- Optionally call an OpenAI model to get a suggested command line.
- Fall back to a simple heuristic if no model is configured.
- Print exactly **one line**: the suggested command.

### 9.1 Command catalog protocol (simple MCP-like pattern)

The script first tries to retrieve a catalog via:

```bash
mysh --commands-json
```

Expected JSON structure (example):

```json
{
  "commands": [
    {
      "name": "hello",
      "summary": "print a friendly greeting",
      "description": "Print a greeting, optionally addressing a specific NAME.",
      "usage": "hello [-h] [-n NAME]",
      "options": [
        { "short": "-h", "long": "--help", "arg": null, "help": "show help and exit" },
        { "short": "-n", "long": "--name", "arg": "NAME", "help": "name to greet" }
      ]
    }
  ]
}
```

If this command is not available, `mysh_llm.py` falls back to a local file `commands.json` with the same structure.

This pattern is conceptually similar to **Model Context Protocol (MCP)**:
- The shell plays the role of a **server** exposing structured knowledge (commands, options, docs).
- The helper script is a **client** that fetches that knowledge and feeds it to a model.

For this course, we use a very simple custom protocol (just a JSON command) instead of a full MCP implementation.

### 9.2 Using `mysh_llm.py` from the shell

Once integrated, your shell’s `handle_llm_line` can:

```c
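int handle_llm_line(const char *query) {
    // Launch mysh_llm and capture its stdout, e.g. via popen() or pipe()/fork()/exec().
    // Command: mysh_llm '<query>' (with popen(), quote the query so /bin/sh cannot reinterpret it).
    // Read exactly one line: the suggested CLI (e.g., "ls -lt *.c").
    // Print it, ask the user to confirm, and if yes, feed it back into yyparse/execute.
    (void)query;
    return -1;  // placeholder until the steps above are implemented
}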
```

Standalone usage from a terminal:

```bash
cd ShellAI
python mysh_llm.py "list all C files"
```

If you do **not** set `OPENAI_API_KEY`, the script will:
- Optionally log a note to stderr that no real model is configured.
- Use a simple heuristic to pick a plausible command from the catalog.

If you configure the OpenAI client by setting:
- `OPENAI_API_KEY` environment variable.
- Optionally `MYSH_LLM_MODEL` (default `gpt-4o-mini`).

then `mysh_llm.py` will:
- Use the command catalog as RAG context.
- Ask the model for a single-line suggestion.
- Return that suggestion to the shell.

This gives you a concrete starting point to experiment with **RAG-based ShellAI** without having to implement all the pieces from scratch.


## 10. Assignment: Implement and integrate `mysh_llm.py`

For this **ShellAI** part of the course, the main deliverable is a working `mysh_llm.py` helper and its integration with your shell.

### 10.1 Learning goals

By doing this assignment, you will:
- Design a simple **protocol** between your shell and an AI helper (JSON catalog, `@` lines).
- Implement a **RAG-style client** that uses your own command metadata instead of letting the model guess.
- Practice using an AI coding assistant as a collaborator, not a replacement:
  - You specify interfaces and constraints.
  - The AI helps fill in boilerplate and refine code.

### 10.2 What you should implement

1. **Command catalog output in your shell** (ties back to the BNF_Shell lab):
   - Add a built-in or flag, e.g. `mysh --commands-json`, that prints a JSON catalog of available commands.
   - At minimum, include for each command:
     - `name`
     - `summary`
     - `description` (can reuse `long_help` or `summary`)
   - Extra credit: add `usage` and `options` based on your `argtable3` definitions.

2. **Complete or adapt `mysh_llm.py`:**
   - Use or extend `ShellAI/mysh_llm.py` as a starting point.
   - Ensure it:
     - Calls your shell catalog command (or reads `commands.json`).
     - Parses the JSON into Python structures.
     - Selects relevant commands using a simple scoring or keyword match.
     - Builds a prompt that lists those commands and options.
     - Either:
       - Uses a real LLM (if you configure `OPENAI_API_KEY`), or
       - Uses a heuristic fallback (no network).
     - Prints exactly **one shell command line**.

3. **Shell integration (`@` lines):**
   - In your BNFC-based shell:
     - Detect when an input line begins with `@`.
     - Strip the `@` and pass the rest to `mysh_llm` via a helper like `handle_llm_line`.
     - Capture the suggested command line.
     - Show it to the user and ask for confirmation.
     - If confirmed, feed the suggested line back into your normal parse → execute pipeline.

You do **not** have to use a real cloud model to get credit; a heuristic-only `mysh_llm.py` that uses your command catalog is acceptable and already illustrates the integration.

### 10.3 Suggested AI prompt for implementing `mysh_llm.py`

You are encouraged to use your favorite AI coding assistant when implementing this helper. Below is a suggested starting prompt you can paste into your AI tool and adapt:

```text
You are my Python systems programming assistant.

Context:
- I am building a custom Unix-like shell called `mysh` for a course.
- Commands are implemented with a standard anatomy (`cmd_spec_t`, argtable3) and registered in a command registry.
- The shell will expose a JSON catalog of commands via a flag:

    mysh --commands-json

  The output is a JSON object like:

    {
      "commands": [
        {
          "name": "hello",
          "summary": "print a friendly greeting",
          "description": "Print a greeting, optionally addressing a specific NAME.",
          "usage": "hello [-h] [-n NAME]",
          "options": [
            { "short": "-h", "long": "--help", "arg": null, "help": "show help and exit" },
            { "short": "-n", "long": "--name", "arg": "NAME", "help": "name to greet" }
          ]
        },
        ...
      ]
    }

Goal:
- Implement a Python script `mysh_llm.py` that the shell will call when a line starts with `@`.
- The script should:
  1) Read a natural language query from argv[1].
  2) Retrieve the command catalog, preferably by running `mysh --commands-json`.
  3) Parse the JSON into Python data structures.
  4) Select a small subset of relevant commands based on keyword overlap with the query.
  5) Build a text prompt that lists those commands and options.
  6) EITHER:
     - Call an LLM API (if `OPENAI_API_KEY` is set) to get a suggested command line, OR
     - Use a simple heuristic fallback (e.g., pick the best command name).
  7) Print exactly ONE line to stdout: the suggested shell command line, with no extra text.

Constraints:
- Use standard Python 3 and the standard library only, except for the optional LLM call.
- For the LLM call, assume I may have `openai` installed and an `OPENAI_API_KEY` set; if not, gracefully skip and use the heuristic.
- Include clear functions for:
  - Loading the catalog from the shell or a local JSON file.
  - Scoring commands relative to the query.
  - Building the prompt.
  - Calling the LLM (optional).
  - Picking a heuristic fallback.

Please:
- Generate the full `mysh_llm.py` implementation.
- Add short comments explaining the main functions.
- Make sure the script can run as:

    python mysh_llm.py "list all C files"

  and print a single suggested command.
```

You may adapt this prompt with details specific to your environment (e.g., the real path to `mysh`, your catalog format, or your chosen LLM provider).

### 10.4 What to hand in

For grading, you should provide:
- Your `mysh_llm.py` script (and any small helper files it needs).
- A short note (markdown or text) describing:
  - How you implemented `mysh --commands-json`.
  - Whether you used a real LLM or only the heuristic mode.
  - One or two example `@` queries and the suggested commands they produced.

Optional: include a short reflection on how using an AI assistant changed the way you designed or implemented `mysh_llm.py`.


## 11. Agent MCP Integration – Designing the shell as an AI backend

This section treats your shell as a **backend for external agents**, using ideas similar to the Model Context Protocol (MCP).

The design goal:
- An external agent (running elsewhere, with its own Python/LLM/etc.) can safely and reliably:
  - Inspect and edit files.
  - Run commands and tests.
  - Manage processes.
  - Later, use simple networking tools.
- The shell provides a **small, well-defined set of capabilities** with structured input/output.
- Internally, your shell is BusyBox-like: most commands are built-ins, running in the same process (using `pthread` for background jobs and daemons), not external `exec` calls.

From an agent’s point of view, it does not matter whether a command is internal or external, as long as:
- The **interface** is stable and discoverable.
- Effects are well-defined.
- Results are machine-readable when needed.

### 11.1 Capabilities an agent needs (minimal toolset)

A useful MCP-style integration can be built from a relatively small set of tools. Here is a suggested **minimum set** that also doubles as a checklist for your shell command set.

**Filesystem tools**

- `fs.list` – list directory contents and basic metadata.
  - CLI analogue: `ls`, `find`.
  - Shell commands to implement:
    - `ls` (with `-a`, `-l`, etc.).
    - Optional: a more structured listing command or flag, e.g. `ls --json`.

- `fs.read` – read file contents.
  - CLI: `cat`, `head`, `tail`.
  - Shell commands:
    - `cat` (core).
    - Optional: `head`, `tail` for partial reads.

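- `fs.write` / `fs.append` – create, overwrite, or append to files, plus basic file management.
  - CLI: output redirection (`>`, `>>`) for file contents; `cp`, `mv`, `rm`, `mkdir`, `rmdir`, `touch` for files and directories.
  - Shell commands:
    - `cp`, `mv`, `rm`, `mkdir`, `rmdir`, `touch`.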

- `fs.stat` – inspect file type/size/timestamps.
  - CLI: `stat`.
  - Shell command:
    - `stat` (possibly with `--json` output for agents).

- `fs.search` – search text in files.
  - CLI: `grep` (or `rg` if you mimic ripgrep).
  - Shell command:
    - `grep` with at least basic regex or fixed-string matching.
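
As an example of the structured `--json` output suggested for `stat`, here is a sketch for POSIX hosts. The function name and the JSON field names (`path`, `type`, `size`, `mtime`) are assumptions; an embedded port would call its own filesystem layer instead of `stat()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Print one JSON object describing path. Returns 0 on success, -1 if stat()
 * fails or the path contains '"' or '\\' (this sketch does not escape them;
 * control characters are assumed absent). stat() follows symlinks. */
int stat_json(FILE *out, const char *path)
{
    struct stat st;

    if (strpbrk(path, "\"\\") != NULL)
        return -1;
    if (stat(path, &st) != 0)
        return -1;

    const char *type = S_ISDIR(st.st_mode) ? "directory"
                     : S_ISREG(st.st_mode) ? "file"
                     : "other";
    fprintf(out, "{\"path\": \"%s\", \"type\": \"%s\", \"size\": %lld, \"mtime\": %lld}\n",
            path, type, (long long)st.st_size, (long long)st.st_mtime);
    return 0;
}
```

Each call prints one JSON object on a single line, which an agent can parse directly instead of scraping human-oriented `stat` output.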

**Text editing tools**

For an agent, powerful but complex tools like `sed` are helpful but not strictly required. Since Python is not guaranteed inside your environment, it is better to design **simple, predictable editing commands** as built-ins.

Examples of agent-friendly editing commands:
- `edit replace-line FILE N TEXT` – replace line `N` with `TEXT`.
- `edit insert-line FILE N TEXT` – insert `TEXT` before line `N`.
- `edit delete-line FILE N` – delete line `N`.
- `edit replace FILE PATTERN REPLACEMENT` – simple global replacement.

These can be implemented as internal commands operating on whole files. For human users, you may still provide `sed`-like tools, but for agents, these structured operations are easier to use safely than arbitrary regex pipelines.
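
Here is a minimal sketch of the core of `edit replace-line`, operating on the whole file contents held in memory. `edit_replace_line` is a hypothetical helper. Line numbers are 1-based, as in the command descriptions above, and an out-of-range `N` is reported as an error rather than silently appended.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of src with line n (1-based) replaced by
 * text, or NULL if line n does not exist or allocation fails. The line's
 * original ending ("\n", or none on an unterminated last line) is kept. */
char *edit_replace_line(const char *src, long n, const char *text)
{
    if (n < 1)
        return NULL;

    const char *start = src;                    /* beginning of line n */
    for (long i = 1; i < n; i++) {
        const char *nl = strchr(start, '\n');
        if (nl == NULL)
            return NULL;                        /* fewer than n lines */
        start = nl + 1;
    }
    if (*start == '\0')
        return NULL;                            /* line n does not exist */

    const char *end = strchr(start, '\n');      /* end of line n */
    if (end == NULL)
        end = start + strlen(start);

    size_t prefix = (size_t)(start - src);
    size_t tlen = strlen(text);
    size_t suffix = strlen(end);                /* includes the '\n', if any */

    char *out = malloc(prefix + tlen + suffix + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, prefix);
    memcpy(out + prefix, text, tlen);
    memcpy(out + prefix + tlen, end, suffix + 1);  /* +1 copies the NUL */
    return out;
}
```

A built-in would read FILE into memory, call this, write the result to a temporary file in the same directory, and `rename()` it over the original. On POSIX filesystems that replacement is atomic, so an interrupted edit never leaves a half-written file.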

**Process and job tools**

Your shell already uses `pthread` and internal job management. For an agent, we want the following tools:

- `proc.list` – show running jobs.
  - CLI: `jobs`, `ps` (limited to shell-managed jobs).
  - Shell commands:
    - `jobs` – list jobs with IDs and state.
    - `ps` – optional, to show more detail.

- `proc.kill` – terminate or signal a job.
  - CLI: `kill`.
  - Shell command:
    - `kill JOB_ID` or `kill PID`.

- `proc.wait` – wait for a job to finish.
  - CLI: `wait`.
  - Shell command:
    - `wait JOB_ID`.

**Shell / environment tools**

- `cd`, `pwd` – change and inspect current directory.
- `env.get` / `env.set` – inspect and change environment variables.
  - CLI: `export`, `set`, `unset`.
- `which` / `type` – inspect how a command name is resolved (built-in vs external).

These are all familiar shell built-ins; the key for MCP is to make their behavior **predictable and scriptable**.

**Networking tools (later chapter)**

- Minimal HTTP client:
  - `http get URL` – fetch content.
  - `http post URL DATA` – send data.
- Optional: domain-specific tools for your embedded/FreeRTOS environment (e.g., device discovery or configuration).

For embedded targets without Python, an external agent (running on a PC) would use these commands via MCP or a serial/network bridge; the shell itself does not need to host the AI.

### 11.2 Mapping shell commands to MCP-style tools

Conceptually, you can think of an MCP server exposing tools like:

- `fs_list(path: string) -> [DirEntry]`
- `fs_read(path: string, offset?: int, length?: int) -> string`
- `fs_write(path: string, content: string, mode: string) -> void`
- `proc_list() -> [JobInfo]`
- `proc_kill(id: int, signal: string) -> void`
- `http_get(url: string) -> HttpResponse`

Inside your shell, these tools do **not** need to be separate programs:

- They can be thin wrappers around internal APIs / built-in commands written in C.
- For example, `fs_read` might call the same internal function that backs the `cat` command.
- `proc_list` might call your job table inspection code that backs `jobs`.

Your existing `cmd_spec_t` and registry make it straightforward to:

- Discover available commands and their metadata.
- Map from a tool name (e.g. `fs_read`) to a handler function.
- Eventually generate tool schemas (argument names, types, descriptions) from `argtable3`.
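
Here is a sketch of the name-to-handler mapping, assuming a shared handler signature `int (*)(int argc, char **argv, FILE *out)`. The two handlers are placeholders standing in for the code behind `cat` and `jobs`. The table layout and names are illustrative and do not reflect your registry's actual structure.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler signature shared by built-ins and agent tools. */
typedef int (*tool_fn)(int argc, char **argv, FILE *out);

/* Placeholder for the code behind `cat`: copy one file to out. */
static int tool_fs_read(int argc, char **argv, FILE *out)
{
    if (argc != 2)
        return 2;                               /* usage error */
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL)
        return 1;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        fwrite(buf, 1, n, out);
    fclose(f);
    return 0;
}

/* Placeholder for the job-table code behind `jobs`. */
static int tool_proc_list(int argc, char **argv, FILE *out)
{
    (void)argc;
    (void)argv;
    fputs("[]\n", out);                         /* no jobs in this sketch */
    return 0;
}

typedef struct {
    const char *name;       /* tool name exposed to the agent */
    const char *builtin;    /* shell command that backs it */
    tool_fn handler;
} tool_entry;

static const tool_entry TOOLS[] = {
    { "fs_read",   "cat",  tool_fs_read   },
    { "proc_list", "jobs", tool_proc_list },
};

/* Look up a tool by name; NULL means the shell does not provide it. */
const tool_entry *find_tool(const char *name)
{
    for (size_t i = 0; i < sizeof TOOLS / sizeof TOOLS[0]; i++)
        if (strcmp(TOOLS[i].name, name) == 0)
            return &TOOLS[i];
    return NULL;
}
```

Because the table also records which built-in backs each tool, the same data could drive both `--commands-json` and a future MCP tool listing.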

### 11.3 No Python inside the shell: what changes?

In this design, **Python is not required inside the shell environment**:

- The AI agent (and any Python-based RAG code, like `mysh_llm.py`) runs on a host machine or in the cloud.
- The shell runs on Linux, FreeRTOS, or other targets, providing only its own commands and internal APIs.
- Communication between the agent and the shell can be:
  - Local process calls (e.g., running `mysh` with a custom protocol over stdin/stdout).
  - Network sockets or serial links to an MCP-style server that wraps the shell.

This means:
- You **should not** rely on Python scripts for editing files or complex logic inside the embedded environment.
- Instead, design a small set of **deterministic, composable built-ins** (like the structured edit commands above) that an external agent can orchestrate.
- Tools like `sed` are helpful, but not mandatory; what matters is that the agent can:
  - Read a file.
  - Make a controlled change.
  - Write it back.

From the agent’s perspective, the main difference is **where the intelligence runs**:
- In the `mysh_llm.py` approach, some logic runs inside the same host OS as the shell.
- In an embedded or restricted environment, the AI logic stays outside; it uses MCP or a similar protocol to invoke shell capabilities.

Either way, the design principles are the same:
- Keep the shell’s command set **complete** and **consistent** for filesystem, processes, and networking.
- Provide a **structured catalog** (`--commands-json` and, later, per-command schemas).
- Avoid making the agent rely on fragile, human-oriented text parsing when structured outputs are possible.

### 11.4 Using this chapter as a requirements list

You can use the list above as:
- A **requirements checklist** for which built-ins your students implement.
- A **design target** for future MCP integration:
  - First, make sure each capability exists as a robust command.
  - Then, add structured output (`--json` flags, `--commands-json`).
  - Later, wrap those capabilities in a dedicated MCP server if desired.

This keeps your shell portable (Linux, FreeRTOS, embedded) while still being a first-class backend for AI agents that want to manipulate files, processes, and network resources safely and predictably.
