In [1]:
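import os

def list_files(startpath):
    """Return an indented text tree of all directories and files under startpath."""
    res = ""
    for root, dirs, files in os.walk(startpath):
        # Depth = separators left after stripping the startpath prefix from root
        level = root.replace(startpath, '').count(os.sep)
        indent = ' ' * 4 * level
        res += '{}{}/'.format(indent, os.path.basename(root)) + '\n'
        # Files are listed one level deeper than the directory that contains them
        subindent = ' ' * 4 * (level + 1)
        for f in files:
            res += '{}{}'.format(subindent, f) + '\n'
    return res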

file_tree = list_files('../follow-my-reading/')
print(file_tree)

/
    .gitlab-ci.yml
    pyproject.toml
    example.config.json
    main.py
    docs.py
    config.py
    makefile
    mypy.ini
    README.md
    .ruff.toml
    poetry.lock
    .gitignore
    Gemfile
plugins/
    eng_tesseract_plugin.py
    rus_vosk_plugin_disabled.py
    __init__.py
    en_ar_easyocr_plugin.py
    en_ru_easyocr_plugin.py
    multilang_tesseract_plugin.py
    ara_tesseract_plugin.py
    rus_tesseract_plugin.py
    whisper_plugin.py
    ara_vosk_plugin.py
    eng_vosk_plugin.py
.git/
    packed-refs
    config
    HEAD
    index
    description
    info/
        exclude
    hooks/
        commit-msg.sample
        fsmonitor-watchman.sample
        prepare-commit-msg.sample
        post-update.sample
        pre-push.sample
        pre-receive.sample
        sendemail-validate.sample
        pre-commit.sample
        pre-applypatch.sample
        applypatch-msg.sample
        pre-merge-commit.sample
        push-to-checkout.sample
        pre-rebase.sample
        update
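The tree above has two quirks. Because `startpath` ends in a slash, the root prints as a bare `/` and its subdirectories (`plugins/`, `.git/`) land at the same depth as the root itself; and the `.git/` internals are dumped into what becomes prompt context. A minimal sketch of a variant follows, assuming you want to skip tool directories while keeping dotfiles such as `.gitlab-ci.yml` (which the first prompt treats as useful configuration). The `ignored_dirs` parameter and its default set are hypothetical additions, not part of the original notebook.

```python
import os

# Hypothetical default: directories that add noise but no architecture information.
DEFAULT_IGNORED_DIRS = frozenset({".git", "__pycache__", ".venv", "node_modules"})


def list_files(startpath: str, ignored_dirs=DEFAULT_IGNORED_DIRS) -> str:
    """Return an indented text tree of startpath (directories end with '/')."""
    # normpath drops the trailing slash, so basename() is the repo name
    # and depth is measured correctly for subdirectories.
    startpath = os.path.normpath(startpath)
    lines = []
    for root, dirs, files in os.walk(startpath):
        # Pruning `dirs` in place stops os.walk from descending into ignored directories;
        # sorting it makes the traversal order (and therefore the prompt) deterministic.
        dirs[:] = sorted(d for d in dirs if d not in ignored_dirs)
        rel = os.path.relpath(root, startpath)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        indent = " " * 4 * level
        lines.append(f"{indent}{os.path.basename(root)}/")
        for name in sorted(files):
            lines.append(f"{indent}    {name}")
    return "\n".join(lines) + "\n"
```

Called as `list_files('../follow-my-reading/')`, it would print `follow-my-reading/` as the root with `plugins/` nested one level below it.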

In [2]:
from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama

# Alternative local models that were tried:
# llm = Ollama(model="llama3.2")
# llm = Ollama(model="hf.co/unsloth/gemma-3-12b-it-GGUF:Q4_K_M", request_timeout=3 * 60)
llm = Ollama(model="mistral", request_timeout=3 * 60)  # 3-minute timeout for slow local generation

In [3]:
SYSTEM_FIRST_PROMPT="""
You are tasked with explaining to a principal software engineer how to draw the best and most accurate system design diagram / architecture of a given project. This explanation should be tailored to the specific project's purpose and structure. To accomplish this, you will be provided with two key pieces of information:

1. The complete file tree of the project, including all directory and file names, which will be enclosed in <file_tree> tags in the user's message.

2. The README file of the project, which will be enclosed in <readme> tags in the user's message.

Analyze these components carefully, as they will provide crucial information about the project's structure and purpose. Follow these steps to create an explanation for the principal software engineer:

1. Identify the project type and purpose:
   - Examine the file structure and README to determine if the project is a full-stack application, an open-source tool, a compiler, or some other type of software.
   - Look for key indicators in the README, such as project description, features, or use cases.

2. Analyze the file structure:
   - Pay attention to top-level directories and their names (e.g., "frontend", "backend", "src", "lib", "tests").
   - Identify patterns in the directory structure that might indicate architectural choices (e.g., MVC pattern, microservices).
   - Note any configuration files, build scripts, or deployment-related files.

3. Examine the README for additional insights:
   - Look for sections describing the architecture, dependencies, or technical stack.
   - Check for any diagrams or explanations of the system's components.

4. Based on your analysis, explain how to create a system design diagram that accurately represents the project's architecture. Include the following points:

   a. Identify the main components of the system (e.g., frontend, backend, database, build system, external services).
   b. Determine the relationships and interactions between these components.
   c. Highlight any important architectural patterns or design principles used in the project.
   d. Include relevant technologies, frameworks, or libraries that play a significant role in the system's architecture.

5. Provide guidelines for tailoring the diagram to the specific project type:
   - For a full-stack application, emphasize the separation between frontend and backend, database interactions, and any API layers.
   - For an open-source tool, focus on the core functionality, extensibility points, and how it integrates with other systems.
   - For a compiler or language-related project, highlight the different stages of compilation or interpretation, and any intermediate representations.

6. Instruct the principal software engineer to include the following elements in the diagram:
   - Clear labels for each component
   - Directional arrows to show data flow or dependencies
   - Color coding or shapes to distinguish between different types of components

7. NOTE: Emphasize the importance of being very detailed and capturing the essential architectural elements. Don't overthink it; simply separating the project into as many components as possible is best.

Present your explanation and instructions within <explanation> tags, ensuring that you tailor your advice to the specific project based on the provided file tree and README content.
"""

In [4]:
response = llm.chat(
    [
        ChatMessage(
            role="system",
            content=SYSTEM_FIRST_PROMPT,
        ),
        ChatMessage(
            role="user",
            content=f"<file_tree>\n{file_tree}\n</file_tree>",
        ),
    ]
)
explanation = response.message.content
print(explanation)

 <explanation>
      This project appears to be a command-line tool with an API for speech and text processing, utilizing various plugins for different languages and services. The primary purpose of the software seems to be automatic transcription, translation, and comparison of audio files in multiple languages. Let's create a system design diagram tailored to this specific project:

     1. Main Components:
         - Frontend (not present as it is a command-line tool)
         - Backend (API server and core functionalities)
         - Database (Not explicitly mentioned, but configuration files suggest some form of storage)
         - Plugins (Language-specific modules for text recognition, speech synthesis, and translation)
         - External Services (Tesseract OCR and Whisper, among others)

     2. Relationships and Interactions:
         - The user interacts with the API to send audio files for processing.
         - The backend processes the audio files using appropriate plugi
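`SYSTEM_FIRST_PROMPT` tells the model that the README will arrive in `<readme>` tags, but the user message in the cell above only carries the file tree, so the model has to infer the project's purpose from file names alone. A minimal sketch of building the full user message, assuming the README sits at the repository root as `README.md` (as it does in the printed tree); `build_first_user_message` and `max_chars` are hypothetical names.

```python
import os


def build_first_user_message(repo_path: str, file_tree: str, max_chars: int = 20_000) -> str:
    """Combine the file tree and README into the tagged layout SYSTEM_FIRST_PROMPT expects."""
    readme_path = os.path.join(repo_path, "README.md")
    readme = ""
    if os.path.isfile(readme_path):
        with open(readme_path, encoding="utf-8", errors="replace") as fh:
            # Truncate very long READMEs so the prompt stays within the model's context window.
            readme = fh.read()[:max_chars]
    return (
        f"<file_tree>\n{file_tree}\n</file_tree>\n\n"
        f"<readme>\n{readme}\n</readme>"
    )
```

It would slot into the user `ChatMessage` as `content=build_first_user_message('../follow-my-reading/', file_tree)`.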

In [5]:
SYSTEM_SECOND_PROMPT = """
You are tasked with mapping key components of a system design to their corresponding files and directories in a project's file structure. You will be provided with a detailed explanation of the system design/architecture and a file tree of the project.

First, carefully read the system design explanation, which will be enclosed in <explanation> tags in the user's message.

Then, examine the file tree of the project, which will be enclosed in <file_tree> tags in the user's message.

Your task is to analyze the system design explanation and identify key components, modules, or services mentioned. Then, try your best to map these components to what you believe could be their corresponding directories and files in the provided file tree.

Guidelines:
1. Focus on major components described in the system design.
2. Look for directories and files that clearly correspond to these components.
3. Include both directories and specific files when relevant.
4. If a component doesn't have a clear corresponding file or directory, simply don't include it in the map.

Now, provide your final answer in the following format:

<component_mapping>
1. [Component Name]: [File/Directory Path]
2. [Component Name]: [File/Directory Path]
[Continue for all identified components]
</component_mapping>

Remember to be as specific as possible in your mappings, to use only what is given to you in the file tree, and to strictly follow the components mentioned in the explanation.
"""

In [6]:
response = llm.chat(
    [
        ChatMessage(
            role="system",
            content=SYSTEM_SECOND_PROMPT,
        ),
        ChatMessage(
            role="user",
            content=f"<explanation>{explanation}</explanation>\n\n<file_tree>\n{file_tree}\n</file_tree>",
        ),
    ]
)
component_mapping = response.message.content
print(component_mapping)

 <component_mapping>
1. Backend (API server and core functionalities): api/
   - v1/ (contains API routes and related functionality)
2. Plugins: plugins/
   - eng_tesseract_plugin.py, rus_vosk_plugin_disabled.py, en_ar_easyocr_plugin.py, en_ru_easyocr_plugin.py, multilang_tesseract_plugin.py, ara_tesseract_plugin.py, rus_tesseract_plugin.py, whisper_plugin.py, ara_vosk_plugin.py, eng_vosk_plugin.py
3. External Services: Not explicitly present in the file tree, but plugins seem to communicate with Tesseract OCR and Whisper
4. Database (Not explicitly mentioned but configuration files suggest some form of storage): Not clearly mapped based on the provided file tree
   </component_mapping>
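Nothing stops the model from mapping a component to a path that is not in the tree it was given: here `api/` and `v1/` do not appear in the (possibly truncated) tree printed by the first cell. A cheap guard, sketched under the assumption that the model keeps the `N. Component: path` line format requested in `SYSTEM_SECOND_PROMPT`, is to check every mapped path against the checkout. `check_mapping` is a hypothetical helper.

```python
import os
import re

# One mapping line: "<n>. <component>: <single path token>". Lines whose right-hand
# side is a sentence (e.g. "Not clearly mapped ...") do not match and are skipped.
_MAPPING_LINE = re.compile(r"^\s*\d+\.\s*(?P<name>[^:]+):\s*(?P<path>\S+)\s*$")


def check_mapping(component_mapping: str, repo_path: str) -> dict:
    """Map each component name to (path, exists_in_repo)."""
    results = {}
    for line in component_mapping.splitlines():
        match = _MAPPING_LINE.match(line)
        if match:
            path = match.group("path")
            exists = os.path.exists(os.path.join(repo_path, path))
            results[match.group("name").strip()] = (path, exists)
    return results
```

Entries reported as missing could be dropped before the mapping is passed on to the diagram prompt.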


In [7]:
SYSTEM_THIRD_PROMPT = """
You are a principal software engineer tasked with creating a system design diagram using Mermaid.js based on a detailed explanation. Your goal is to accurately represent the architecture and design of the project as described in the explanation.

The detailed explanation of the design will be enclosed in <explanation> tags in the user's message.

As a bonus, some of the components identified in the explanation have been mapped to their paths in the project file tree (either a directory or a file). This mapping will be enclosed in <component_mapping> tags in the user's message.

To create the Mermaid.js diagram:

1. Carefully read and analyze the provided design explanation.
2. Identify the main components, services, and their relationships within the system.
3. Determine the appropriate Mermaid.js diagram type to use (e.g., flowchart, sequence diagram, class diagram, architecture, etc.) based on the nature of the system described.
4. Create the Mermaid.js code to represent the design, ensuring that:
   a. All major components are included
   b. Relationships between components are clearly shown
   c. The diagram accurately reflects the architecture described in the explanation
   d. The layout is logical and easy to understand

Guidelines for diagram components and relationships:
- Use appropriate shapes for different types of components (e.g., rectangles for services, cylinders for databases, etc.)
- Use clear and concise labels for each component
- Show the direction of data flow or dependencies using arrows
- Group related components together if applicable
- Include any important notes or annotations mentioned in the explanation
- Just follow the explanation. It will have everything you need.

IMPORTANT!!: Please orient and draw the diagram as vertically as possible. You must avoid long horizontal lists of nodes and sections!

You must include click events for components of the diagram that have been specified in the provided <component_mapping>:
- Do not try to include the full url. This will be processed by another program afterwards. All you need to do is include the path.
- For example:
  - This is a correct click event: `click Example "app/example.js"`
  - This is an incorrect click event: `click Example "https://github.com/username/repo/blob/main/app/example.js"`
- Do this for as many components as specified in the component mapping, include directories and files.
  - If you believe the component contains files and is a directory, include the directory path.
  - If you believe the component references a specific file, include the file path.
- Make sure to include the full path to the directory or file exactly as specified in the component mapping.
- It is very important that you do this for as many files as possible. The more the better.

- IMPORTANT: THESE PATHS ARE FOR CLICK EVENTS ONLY, these paths should not be included in the diagram's node's names. Only for the click events. Paths should not be seen by the user.

Your output should be valid Mermaid.js code that can be rendered into a diagram.

Do not include an init declaration such as `%%{init: {'key':'etc'}}%%`. This is handled externally. Just return the diagram code.

Your response must strictly be just the Mermaid.js code, without any additional text or explanations.
No code fence or markdown ticks needed, simply return the Mermaid.js code.

Ensure that your diagram adheres strictly to the given explanation, without adding or omitting any significant components or relationships. 

For general direction, the provided example below is how you should structure your code:

```mermaid
flowchart TD 
    %% or graph TD, your choice

    %% Global entities
    A("Entity A"):::external
    %% more...

    %% Subgraphs and modules
    subgraph "Layer A"
        A1("Module A"):::example
        %% more modules...
        %% inner subgraphs if needed...
    end

    %% more subgraphs, modules, etc...

    %% Connections
    A -->|"relationship"| B
    %% and a lot more...

    %% Click Events
    click A1 "example/example.js"
    %% and a lot more...

    %% Styles
    classDef frontend %%...
    %% and a lot more...
```

EXTREMELY Important notes on syntax!!! (PAY ATTENTION TO THIS):
- Make sure to add colour to the diagram!!! This is extremely critical.
- In Mermaid.js syntax, node and edge labels that contain special characters must be wrapped in quotes! For example: `EX[/api/process (Backend)]:::api` and `API -->|calls Process()| Backend` are two examples of syntax errors. They should be `EX["/api/process (Backend)"]:::api` and `API -->|"calls Process()"| Backend` respectively. Notice the quotes. This is extremely important. Make sure to include quotes for any string that contains special characters.
- In Mermaid.js syntax, you cannot apply a class style directly within a subgraph declaration. For example: `subgraph "Frontend Layer":::frontend` is a syntax error. However, you can apply them to nodes within the subgraph. For example: `Example["Example Node"]:::frontend` is valid, and `class Example1,Example2 frontend` is valid.
- In Mermaid.js syntax, there cannot be spaces between the pipes and the quotes of a relationship label. For example: `A -->| "example relationship" | B` is a syntax error. It should be `A -->|"example relationship"| B`
- In Mermaid.js syntax, you cannot give subgraphs an alias like nodes. For example: `subgraph A "Layer A"` is a syntax error. It should be `subgraph "Layer A"` 
"""

In [8]:
response = llm.chat(
    [
        ChatMessage(
            role="system",
            content=SYSTEM_THIRD_PROMPT,
        ),
        ChatMessage(
            role="user",
            content=f"<explanation>{explanation}</explanation>\n\n<component_mapping>\n{component_mapping}\n</component_mapping>",
        ),
    ]
)
mermaid = response.message.content
print(mermaid)

 ```mermaid
flowchart TD

    subgraph "API Server":::backend
        V1("v1/"):::api
        Plugins("plugins/"):::plugins
        Database("Database (storage)"):::database
    end

    V1 -->|"Receives audio files"| Plugins
    Plugins -->|"Uses plugins for processing"| TesseractOCR, Whisper
    TesseractOCR, Whisper -->|"Performs text recognition and speech synthesis"| Plugins
    Plugins -->|"Stores results in the database"| Database
    Database -->|"Returns results to the user"| V1

    click V1 "api/"
    click Plugins "../plugins/*.py"
    click TesseractOCR "tesseract"
    click Whisper "whisper"
```
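The returned diagram breaks two of the syntax rules the prompt spelled out: it attaches a class to a subgraph declaration (`subgraph "API Server":::backend`), and it lists several edge endpoints with commas (`TesseractOCR, Whisper`), whereas Mermaid flowcharts join multiple nodes on one edge with `&` (e.g. `A --> B & C`). It also has no `classDef` lines, so none of the requested colour is applied. A heuristic post-processing sketch for the first two problems follows; `fix_common_mermaid_errors` is a hypothetical name, and this is regex patching rather than a Mermaid parser.

```python
import re

# A subgraph declaration ending in a :::class suffix, which Mermaid rejects.
_SUBGRAPH_CLASS = re.compile(r"^([ \t]*subgraph\b.*?):::[\w-]+[ \t]*$", re.MULTILINE)
# On an edge line: a quoted string or a |pipe label| (kept as-is), or a comma (replaced).
_EDGE_TOKEN = re.compile(r'"[^"]*"|\|[^|]*\||,\s*')


def _replace_comma(match: re.Match) -> str:
    token = match.group(0)
    # Quoted text and edge labels are returned unchanged; only bare commas become " & ".
    return token if token[0] in '"|' else " & "


def fix_common_mermaid_errors(code: str) -> str:
    """Heuristically repair two syntax errors seen in LLM-generated flowcharts."""
    # 1. `subgraph "Name":::cls` -> `subgraph "Name"`
    code = _SUBGRAPH_CLASS.sub(r"\1", code)
    fixed = []
    for line in code.splitlines():
        # 2. `A, B --> C` -> `A & B --> C` on edge lines (skipping %% comments)
        if "--" in line and not line.lstrip().startswith("%%"):
            line = _EDGE_TOKEN.sub(_replace_comma, line)
        fixed.append(line)
    return "\n".join(fixed)
```

The class can still be applied to the nodes inside the subgraph, which the prompt notes is valid.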


In [13]:
print(mermaid[12:-3])  # drop the 12-char opening fence (" ```mermaid" plus newline) and the 3-char closing fence

flowchart TD

    subgraph "API Server":::backend
        V1("v1/"):::api
        Plugins("plugins/"):::plugins
        Database("Database (storage)"):::database
    end

    V1 -->|"Receives audio files"| Plugins
    Plugins -->|"Uses plugins for processing"| TesseractOCR, Whisper
    TesseractOCR, Whisper -->|"Performs text recognition and speech synthesis"| Plugins
    Plugins -->|"Stores results in the database"| Database
    Database -->|"Returns results to the user"| V1

    click V1 "api/"
    click Plugins "../plugins/*.py"
    click TesseractOCR "tesseract"
    click Whisper "whisper"



In [None]:
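import mermaid as md  # mermaid-py; binds `md`, so the `mermaid` string from above is not shadowed
from mermaid.graph import Graph


# Graph takes a title and the diagram source (here the fence-stripped LLM output)
sequence = Graph('DFC2', mermaid[12:-3])
render = md.Mermaid(sequence)
render  # shown through the cell's HTML output, so it only displays in the notebook that ran this cell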
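Because the rendered diagram lives in the notebook's HTML output, one way to view it elsewhere is to write a small standalone page that loads Mermaid from a CDN, following the `<pre class="mermaid">` pattern from Mermaid's own documentation. A sketch, assuming the jsDelivr build of Mermaid 11 is reachable from the browser; `write_mermaid_html` and the output filename are hypothetical.

```python
import html

# Minimal page: Mermaid reads the diagram from the text content of <pre class="mermaid">.
_PAGE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<pre class="mermaid">
{diagram}
</pre>
<script type="module">
  import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs";
  mermaid.initialize({{ startOnLoad: true }});
</script>
</body>
</html>
"""


def write_mermaid_html(diagram: str, path: str, title: str = "System design") -> None:
    """Write a self-contained HTML page that renders `diagram` in a browser."""
    # Escape <, > and & so arrows like --> survive as text; the browser un-escapes
    # them before Mermaid reads the element's text content.
    page = _PAGE.format(title=html.escape(title), diagram=html.escape(diagram, quote=False))
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(page)
```

For example, `write_mermaid_html(mermaid[12:-3], "diagram.html")` and then open the file in a browser; Mermaid will still refuse a diagram that contains syntax errors.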