# Domain-Specific Agent Pipeline Diagram Description

This artifact describes the Domain-Specific Agent Pipeline diagram that the `diagram.py` script generates with the `diagrams` Python library. It also explains how to generate the image yourself, since this text-based interface cannot produce images directly.

## Visual Description of the Diagram

The diagram is a flowchart representing the Domain-Specific Agent Pipeline, structured as a top-to-bottom layout with orthogonal (right-angle) edges for clarity. It consists of six main clusters, each representing a high-level component of the pipeline, with nested sub-clusters for subcomponents and tools. Below is a detailed description of its appearance:

### Overall Layout
- **Shape**: Rectangular, with a white background (`bgcolor="white"`), sized to fit all components (approximately 1200x2000 pixels, depending on rendering).
- **Orientation**: Top-to-bottom (`rankdir="TB"`), with clusters stacked vertically and connected by blue arrows to show the high-level flow.
- **Clusters**: Each high-level component is enclosed in a rounded rectangle with a lightly tinted background (the `diagrams` defaults, since the script sets no cluster colors), labeled at the top (e.g., "Data Collection").
- **Edges**: Orthogonal arrows (`splines="ortho"`) in the library's default gray connect subcomponents within clusters, with blue arrows connecting the main components across clusters.
- **Padding**: Moderate padding (`pad="0.5"`) around the diagram keeps components from being cramped.

### High-Level Components and Clusters
The diagram includes six main clusters, each containing subcomponents and a nested "Tools" cluster:

1. **Data Collection Cluster** (Topmost):
   - **Appearance**: A rounded rectangle with a label "Data Collection" at the top.
   - **Subcomponents**: Ten rectangular nodes arranged in a vertical sequence, each with a label and an icon:
     - API Integration (generic client icon, e.g., a computer)
     - Web Scraping (generic client icon)
     - Database Queries (PostgreSQL icon, a database cylinder)
     - Data Cleaning (Python icon as a stand-in, since `diagrams` ships no Pandas icon)
     - Data Validation (generic client icon)
     - Data Profiling (generic client icon)
     - Data Visualization (generic client icon)
     - Data Storage (generic client icon)
     - Vector DB Conversion (generic client icon)
     - Version Control (generic client icon)
   - **Tools Sub-Cluster**: A smaller rounded rectangle labeled "Tools" containing eight nodes:
     - Requests, Scrapy/BeautifulSoup, SQLAlchemy, Pandas, Great Expectations, Plotly, DVC, Milvus (generic client icons, except Pandas, which uses a stand-in icon because `diagrams` ships no Pandas icon).
   - **Connections**: Arrows connect subcomponents in sequence (e.g., API Integration → Data Cleaning → Data Validation → ... → Version Control). Web Scraping and Database Queries also point to Data Cleaning, so three source arrows converge on that node.
   - **Position**: Top of the diagram, centered.

2. **Memory Management Cluster**:
   - **Appearance**: Below Data Collection, similar rounded rectangle with label "Memory Management".
   - **Subcomponents**: Eleven nodes in a grid-like arrangement (due to multiple parallel paths):
     - Data Isolation, Vector Database, Caching (Redis icon, a red database), Data Retention, Encryption, Indexing, Redis, Archiving, Key Management, Sharding, Backup (mostly generic client icons).
   - **Tools Sub-Cluster**: Contains RBAC, Milvus, Redis, AWS S3, AWS KMS (generic client or Redis icons).
   - **Connections**: Arrows connect related subcomponents (e.g., Data Isolation → Encryption → Key Management; Vector Database → Indexing → Sharding).
   - **Position**: Directly below Data Collection, aligned vertically.

3. **Domain Context Management Cluster**:
   - **Appearance**: Rounded rectangle labeled "Domain Context Management".
   - **Subcomponents**: Nine nodes in three parallel columns:
     - Domain Ontology → OWL → Reasoning
     - Knowledge Graph → Neo4j → Querying
     - DSL → ANTLR → Parsing
   - **Tools Sub-Cluster**: Protégé, Neo4j, ANTLR, SPARQL, Cypher (generic client icons).
   - **Connections**: Arrows within each column (e.g., Domain Ontology → OWL → Reasoning).
   - **Position**: Below Memory Management.

4. **Model Selection Cluster**:
   - **Appearance**: Rounded rectangle labeled "Model Selection".
   - **Subcomponents**: Nine nodes in three columns:
     - Model Evaluation → Metrics → Cross-Validation
     - Model Tuning → Hyperparameter Tuning → Grid Search
     - Model Deployment → Model Serving → FastAPI
   - **Tools Sub-Cluster**: Scikit-learn, Optuna, Docker (Docker icon, a whale), FastAPI (generic client icons).
   - **Connections**: Arrows within each column.
   - **Position**: Below Domain Context Management.

5. **Infrastructure as Code Cluster**:
   - **Appearance**: Rounded rectangle labeled "Infrastructure as Code".
   - **Subcomponents**: Eight nodes in four columns:
     - Terraform (Terraform icon, a stylized "T" built from angular blocks) → Resource Provisioning
     - Docker → Containerization
     - Kubernetes (a Kubernetes icon from the `diagrams.k8s` set) → Orchestration
     - CI/CD → GitHub Actions (GitHub icon, the Octocat silhouette)
   - **Tools Sub-Cluster**: Terraform, Docker, Kubernetes, GitHub Actions.
   - **Connections**: Arrows within each column.
   - **Position**: Below Model Selection.

6. **Monitoring & Maintenance Cluster** (Bottommost):
   - **Appearance**: Rounded rectangle labeled "Monitoring & Maintenance".
   - **Subcomponents**: Six nodes in three columns:
     - Prometheus (Prometheus icon, a flame) → Metrics Collection
     - Grafana (Grafana icon, an orange spiral) → Dashboards
     - Logging → ELK Stack
   - **Tools Sub-Cluster**: Prometheus, Grafana, ELK Stack.
   - **Connections**: Arrows within each column.
   - **Position**: Bottom of the diagram.
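
The cluster inventory above can be cross-checked with a small, library-free sketch. The `PIPELINE` dictionary and the `node_counts` helper are illustrative names and are not part of `diagram.py`; the node labels are copied from the lists above.

```python
# Illustrative inventory of the six clusters described above (not part of diagram.py).
PIPELINE = {
    "Data Collection": [
        "API Integration", "Web Scraping", "Database Queries", "Data Cleaning",
        "Data Validation", "Data Profiling", "Data Visualization", "Data Storage",
        "Vector DB Conversion", "Version Control",
    ],
    "Memory Management": [
        "Data Isolation", "Vector Database", "Caching", "Data Retention",
        "Encryption", "Indexing", "Redis", "Archiving", "Key Management",
        "Sharding", "Backup",
    ],
    "Domain Context Management": [
        "Domain Ontology", "OWL", "Reasoning",
        "Knowledge Graph", "Neo4j", "Querying",
        "DSL", "ANTLR", "Parsing",
    ],
    "Model Selection": [
        "Model Evaluation", "Metrics", "Cross-Validation",
        "Model Tuning", "Hyperparameter Tuning", "Grid Search",
        "Model Deployment", "Model Serving", "FastAPI",
    ],
    "Infrastructure as Code": [
        "Terraform", "Resource Provisioning", "Docker", "Containerization",
        "Kubernetes", "Orchestration", "CI/CD", "GitHub Actions",
    ],
    "Monitoring & Maintenance": [
        "Prometheus", "Metrics Collection", "Grafana", "Dashboards",
        "Logging", "ELK Stack",
    ],
}


def node_counts(pipeline):
    """Return {cluster name: number of subcomponent nodes}."""
    return {name: len(nodes) for name, nodes in pipeline.items()}


if __name__ == "__main__":
    for name, count in node_counts(PIPELINE).items():
        print(f"{name}: {count} nodes")
```

The counts should match the "Ten", "Eleven", "Nine", "Nine", "Eight", and "Six" figures quoted in the cluster descriptions; a mismatch suggests the description and the script have drifted apart.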

### High-Level Flow
- **Blue Arrows**: Blue arrows connect the first node of each cluster to the first node of the next cluster, forming a vertical backbone:
  - API Integration (Data Collection) → Data Isolation (Memory Management)
  - Data Isolation → Domain Ontology (Domain Context Management)
  - Domain Ontology → Model Evaluation (Model Selection)
  - Model Evaluation → Terraform (Infrastructure as Code)
  - Terraform → Prometheus (Monitoring & Maintenance)
- These arrows are labeled with "to" for clarity.
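
The backbone described above is simply each cluster's entry node paired with the next one. The following sketch derives those pairs and prints them as hand-written Graphviz DOT edge statements. `ENTRY_NODES`, `backbone_edges`, and `to_dot` are hypothetical helper names, and the DOT text is an approximation for illustration, not the exact output of the `diagrams` library.

```python
# First node of each cluster, in top-to-bottom order (labels from the list above).
ENTRY_NODES = [
    "API Integration",   # Data Collection
    "Data Isolation",    # Memory Management
    "Domain Ontology",   # Domain Context Management
    "Model Evaluation",  # Model Selection
    "Terraform",         # Infrastructure as Code
    "Prometheus",        # Monitoring & Maintenance
]


def backbone_edges(entry_nodes):
    """Pair each entry node with the next one: n nodes give n - 1 edges."""
    return list(zip(entry_nodes, entry_nodes[1:]))


def to_dot(edges, color="blue", label="to"):
    """Render the edges as Graphviz DOT edge statements (hand-written sketch)."""
    return [
        f'"{src}" -> "{dst}" [color={color}, label="{label}"];'
        for src, dst in edges
    ]


if __name__ == "__main__":
    for line in to_dot(backbone_edges(ENTRY_NODES)):
        print(line)
```

Six clusters yield five backbone edges, matching the five arrows listed above.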

### Visual Style
- **Nodes**: Each node is drawn as its icon with the label beneath it in a sans-serif font. With the library's default styling, icon nodes have no visible border or fill.
- **Clusters**: Rounded rectangles with a light default tint and the label at the top.
- **Edges**: Orthogonal arrows within clusters (in the library's default gray), blue arrows between clusters, with small text labels ("to") where applicable.
- **Icons**: Built-in `diagrams` icons (e.g., Redis, Docker, Prometheus) provide visual distinction for key tools, while the generic client icon (a computer) is used for the rest.
- **Clarity**: Orthogonal edges and generous padding prevent overlap, making the diagram readable even with many nodes.

### Expected Output
- **File**: `agent_pipeline.png`, a high-resolution PNG image (e.g., 1200x2000 pixels, depending on rendering settings).
- **Interactivity**: The PNG can be zoomed and panned in image viewers (e.g., Windows Photos, macOS Preview) or browsers. It’s not clickable like a Draw.io diagram but is clear and detailed for reference.
- **Use Case**: Suitable for presentations, documentation, or printing, capturing the entire pipeline’s topology.

## Instructions to Generate the Image

To generate the `agent_pipeline.png` image yourself, follow these steps using the `diagram.py` script provided in the earlier response. I’ll provide a simplified version of the script here for convenience, modified to avoid custom icons and ensure it runs immediately.

### Step 1: Install Dependencies
1. **Install Python**: Ensure Python 3.7+ is installed (https://www.python.org/downloads/).
2. **Install the `diagrams` Library**:
   ```bash
   pip install diagrams
   ```
3. **Install Graphviz**:
   - **Windows**: Download and install from https://graphviz.org/download/, then add the `bin` folder to your system PATH.
   - **macOS**: `brew install graphviz` (requires Homebrew: https://brew.sh/).
   - **Linux**: `sudo apt-get install graphviz` (Ubuntu/Debian) or equivalent.
   - Verify Graphviz installation: `dot -V` in a terminal should return the version.
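
The same check can be scripted from Python, which is handy when the terminal and the Python environment see different PATH values. This is a minimal sketch using only the standard library; `find_graphviz` and `graphviz_version` are illustrative helper names.

```python
import shutil
import subprocess


def find_graphviz(executable="dot"):
    """Return the full path of the Graphviz executable, or None if it is not on PATH."""
    return shutil.which(executable)


def graphviz_version(executable="dot"):
    """Return the `dot -V` version banner, or None if Graphviz is not available."""
    path = find_graphviz(executable)
    if path is None:
        return None
    # `dot -V` typically prints its banner to stderr, so capture both streams.
    result = subprocess.run([path, "-V"], capture_output=True, text=True, check=False)
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = graphviz_version()
    print(version if version else "Graphviz `dot` was not found on PATH")
```

If the script reports that `dot` was not found while `dot -V` works in your terminal, the Python process is probably running with a different PATH (for example, inside an IDE or notebook kernel).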

### Step 2: Save the Script
Copy the following simplified `diagram.py` script into a file named `diagram.py`. This version uses only built-in `diagrams` nodes to avoid needing custom PNG files.

```python
from diagrams import Diagram, Cluster, Edge
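# Kubernetes icons live under diagrams.k8s; Kubelet is aliased so the code below stays unchanged.
from diagrams.k8s.controlplane import Kubelet as Kubernetes
from diagrams.onprem.client import Client
from diagrams.onprem.container import Docker
from diagrams.onprem.database import PostgreSQL
# Terraform is provided by onprem.iac, not onprem.workflow.
from diagrams.onprem.iac import Terraform
from diagrams.onprem.inmemory import Redis
from diagrams.onprem.monitoring import Prometheus, Grafana
from diagrams.onprem.vcs import Github
# diagrams has no Pandas icon; the Python icon is used as a stand-in.
from diagrams.programming.language import Python as Pandas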

graph_attr = {
    "fontsize": "12",
    "bgcolor": "white",
    "splines": "ortho",
    "rankdir": "TB",
    "pad": "0.5",
}

with Diagram("Domain-Specific Agent Pipeline", show=False, outformat="png", graph_attr=graph_attr, filename="agent_pipeline"):
    with Cluster("Data Collection"):
        api = Client("API Integration")
        scrape = Client("Web Scraping")
        db = PostgreSQL("Database Queries")
        clean = Pandas("Data Cleaning")
        validate = Client("Data Validation")
        profile = Client("Data Profiling")
        visualize = Client("Data Visualization")
        store = Client("Data Storage")
        vector = Client("Vector DB Conversion")
        version = Client("Version Control")
        with Cluster("Tools"):
            t_requests = Client("Requests")
            t_scrapy = Client("Scrapy, BeautifulSoup")
            t_sqlalchemy = Client("SQLAlchemy")
            t_pandas = Pandas("Pandas")
            t_great_expectations = Client("Great Expectations")
            t_plotly = Client("Plotly")
            t_dvc = Client("DVC")
            t_milvus = Client("Milvus")
        api >> Edge(label="to") >> clean
        scrape >> Edge(label="to") >> clean
        db >> Edge(label="to") >> clean
        clean >> Edge(label="to") >> validate
        validate >> Edge(label="to") >> profile
        profile >> Edge(label="to") >> visualize
        visualize >> Edge(label="to") >> store
        store >> Edge(label="to") >> vector
        vector >> Edge(label="to") >> version

    with Cluster("Memory Management"):
        isolation = Client("Data Isolation")
        vector_db = Client("Vector Database")
        caching = Redis("Caching")
        retention = Client("Data Retention")
        encryption = Client("Encryption")
        indexing = Client("Indexing")
        redis = Redis("Redis")
        archiving = Client("Archiving")
        key_management = Client("Key Management")
        sharding = Client("Sharding")
        backup = Client("Backup")
        with Cluster("Tools"):
            t_rbac = Client("RBAC")
            t_milvus_mm = Client("Milvus")
            t_redis = Redis("Redis")
            t_aws_s3 = Client("AWS S3")
            t_aws_kms = Client("AWS KMS")
        isolation >> Edge(label="to") >> encryption
        vector_db >> Edge(label="to") >> indexing
        caching >> Edge(label="to") >> redis
        retention >> Edge(label="to") >> archiving
        encryption >> Edge(label="to") >> key_management
        indexing >> Edge(label="to") >> sharding
        archiving >> Edge(label="to") >> backup

    with Cluster("Domain Context Management"):
        ontology = Client("Domain Ontology")
        knowledge_graph = Client("Knowledge Graph")
        dsl = Client("DSL")
        owl = Client("OWL")
        neo4j = Client("Neo4j")
        antlr = Client("ANTLR")
        reasoning = Client("Reasoning")
        querying = Client("Querying")
        parsing = Client("Parsing")
        with Cluster("Tools"):
            t_protege = Client("Protégé")
            t_neo4j = Client("Neo4j")
            t_antlr = Client("ANTLR")
            t_sparql = Client("SPARQL")
            t_cypher = Client("Cypher")
        ontology >> Edge(label="to") >> owl
        knowledge_graph >> Edge(label="to") >> neo4j
        dsl >> Edge(label="to") >> antlr
        owl >> Edge(label="to") >> reasoning
        neo4j >> Edge(label="to") >> querying
        antlr >> Edge(label="to") >> parsing

    with Cluster("Model Selection"):
        evaluation = Client("Model Evaluation")
        tuning = Client("Model Tuning")
        deployment = Client("Model Deployment")
        metrics = Client("Metrics")
        hyperparameter = Client("Hyperparameter Tuning")
        serving = Client("Model Serving")
        cross_validation = Client("Cross-Validation")
        grid_search = Client("Grid Search")
        fastapi = Client("FastAPI")
        with Cluster("Tools"):
            t_sklearn = Client("Scikit-learn")
            t_optuna = Client("Optuna")
            t_docker = Docker("Docker")
            t_fastapi = Client("FastAPI")
        evaluation >> Edge(label="to") >> metrics
        tuning >> Edge(label="to") >> hyperparameter
        deployment >> Edge(label="to") >> serving
        metrics >> Edge(label="to") >> cross_validation
        hyperparameter >> Edge(label="to") >> grid_search
        serving >> Edge(label="to") >> fastapi

    with Cluster("Infrastructure as Code"):
        terraform = Terraform("Terraform")
        docker = Docker("Docker")
        kubernetes = Kubernetes("Kubernetes")
        cicd = Client("CI/CD")
        provisioning = Client("Resource Provisioning")
        containerization = Client("Containerization")
        orchestration = Client("Orchestration")
        github_actions = Github("GitHub Actions")
        with Cluster("Tools"):
            t_terraform = Terraform("Terraform")
            t_docker = Docker("Docker")
            t_kubernetes = Kubernetes("Kubernetes")
            t_github_actions = Github("GitHub Actions")
        terraform >> Edge(label="to") >> provisioning
        docker >> Edge(label="to") >> containerization
        kubernetes >> Edge(label="to") >> orchestration
        cicd >> Edge(label="to") >> github_actions

    with Cluster("Monitoring & Maintenance"):
        prometheus = Prometheus("Prometheus")
        grafana = Grafana("Grafana")
        logging = Client("Logging")
        metrics_collection = Client("Metrics Collection")
        dashboards = Client("Dashboards")
        elk = Client("ELK Stack")
        with Cluster("Tools"):
            t_prometheus = Prometheus("Prometheus")
            t_grafana = Grafana("Grafana")
            t_elk = Client("ELK Stack")
        prometheus >> Edge(label="to") >> metrics_collection
        grafana >> Edge(label="to") >> dashboards
        logging >> Edge(label="to") >> elk

    api >> Edge(label="to", color="blue") >> isolation
    isolation >> Edge(label="to", color="blue") >> ontology
    ontology >> Edge(label="to", color="blue") >> evaluation
    evaluation >> Edge(label="to", color="blue") >> terraform
    terraform >> Edge(label="to", color="blue") >> prometheus
```

### Step 3: Run the Script
1. **Save the Script**:
   - Copy the above code into a file named `diagram.py` using a text editor (e.g., VS Code, Notepad).
2. **Open a Terminal**:
   - Navigate to the directory containing `diagram.py`:
     ```bash
     cd path/to/your/directory
     ```
3. **Run the Script**:
   ```bash
   python diagram.py
   ```
   - This generates `agent_pipeline.png` in the same directory.
4. **View the Image**:
   - Open `agent_pipeline.png` in an image viewer (e.g., Windows Photos, macOS Preview) or a browser.
   - Zoom and pan to explore the diagram’s details.
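
To confirm that the output file is a real PNG and to see its actual dimensions (the 1200x2000 figure quoted earlier is only an estimate), the header can be read with the standard library alone. This is a sketch under the assumption that the file is named `agent_pipeline.png` as in the script; `png_size` is an illustrative helper, not part of `diagrams`.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) of a PNG file, using only its first 24 bytes.

    Raises ValueError if the file does not start with a PNG signature
    followed by an IHDR chunk.
    """
    header = Path(path).read_bytes()[:24]
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16-23.
    return struct.unpack(">II", header[16:24])


if __name__ == "__main__":
    target = Path("agent_pipeline.png")
    if target.exists():
        width, height = png_size(target)
        print(f"{target}: {width}x{height} pixels")
    else:
        print(f"{target} not found; run `python diagram.py` first")
```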

### Troubleshooting
- **Graphviz Not Found**:
  - If you get an error like `ExecutableNotFound`, ensure Graphviz is installed and added to your system PATH.
  - Reinstall Graphviz and restart your terminal.
- **Module Not Found**:
  - If `diagrams` is not found, re-run `pip install diagrams`.
- **Output Issues**:
  - If no PNG is generated, check for errors in the terminal. Ensure Python and all dependencies are up to date.
  - Try setting `outformat="svg"` in the script to generate an SVG file instead (`agent_pipeline.svg`), which may work better in some environments.
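
An `ImportError` on one of the `from diagrams...` lines is another possible failure, since the available icon classes may differ between releases of the library. The sketch below checks a list of module and class names before the full script runs. `ICON_IMPORTS` and `missing_imports` are hypothetical names, and the three entries are examples only; extend the list with whatever your script imports.

```python
import importlib

# (module path, class name) pairs to verify; the entries below are examples only.
ICON_IMPORTS = [
    ("diagrams.onprem.database", "PostgreSQL"),
    ("diagrams.onprem.inmemory", "Redis"),
    ("diagrams.onprem.client", "Client"),
]


def missing_imports(pairs):
    """Return the (module, name) pairs that cannot be imported in this environment."""
    missing = []
    for module_name, attr in pairs:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Covers both a missing package and a missing submodule.
            missing.append((module_name, attr))
            continue
        if not hasattr(module, attr):
            missing.append((module_name, attr))
    return missing


if __name__ == "__main__":
    for module_name, attr in missing_imports(ICON_IMPORTS):
        print(f"cannot import {attr} from {module_name}")
```

An empty result means every listed icon class resolved; any printed line points at the import to replace or remove.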

## Alternatives and Clarifications
- **Why No Direct Image?**: My interface is text-only, so I cannot render or embed images directly. The `diagrams` library generates the image on your system, which is the most accurate way to produce the diagram as intended.
- **Different Tool?**: If you meant to use a different tool (e.g., Draw.io, Matplotlib, or a manual drawing), please clarify. For example:
  - **Draw.io**: I can provide XML files (as done previously) to import into Draw.io.
  - **Matplotlib**: I can create a simpler diagram using Matplotlib, but it’s less suited for complex flowcharts.
- **Need the Image File?**: If you cannot run the script (e.g., no Python environment), consider:
  - Sharing the script with someone who can run it.
  - Using an online Python environment (e.g., Google Colab, though Graphviz setup may be tricky).
  - Requesting a different format or tool that aligns with your needs.

## Next Steps
If you encounter issues running the script or need the image in a specific format (e.g., SVG, PDF), let me know, and I can provide additional guidance or adjust the script. If you meant something else by "draw an image" (e.g., a web-based interactive diagram or a different visual style), please provide more details, and I’ll tailor the response accordingly.

In [83]:
# from diagrams.aws.security import KeyManagementService
# from diagrams.aws.storage import SimpleStorageServiceS3
# # File: src/diagrams/diagram.ipynb
# # This script generates a diagram using the Diagrams library with custom icons.
# # Agent Pipeline Diagram
# import os
# from diagrams import Diagram, Cluster, Edge
# from diagrams.onprem.database import PostgreSQL
# from diagrams.onprem.inmemory import Redis
# from diagrams.onprem.monitoring import Prometheus, Grafana
# from diagrams.onprem.container import Docker
# from diagrams.k8s.controlplane import Kubelet
# from diagrams.onprem.iac import Terraform
# from diagrams.onprem.vcs import Github, Git
# # from diagrams.onprem.analytics import Pandas
# from diagrams.custom import Custom
#
# resources_path = os.path.join(os.path.dirname(os.path.abspath(".")), "diagrams/resources")
#
# print(f"Diagram Resources Path: \n{resources_path}")
#
# # # Diagram configuration
# # graph_attr = {
# #     "fontsize": "20",
# #     "bgcolor": "white",
# #     "splines": "ortho",
# #     "rankdir": "TB",    # Top to Bottom layout
# #     "size": "10,10!",   # Increase graph size
# #     "nodesep": "2",     # Increase spacing between nodes
# #     "ranksep": "2",     # Increase spacing between ranks
# # }
#
# with Diagram("Domain Specific Agent Pipeline",
#              show=True,
#              outformat="png",
#              filename="docs/domain_specific_agent_pipeline",
#              # Diagram configuration
#              graph_attr={
#                  "label": "Domain Specific Agent Pipeline",
#                  "fontsize": "20",
#                  # "bgcolor": "white",
#                  "splines": "ortho",
#                  "style": "rounded",
#                  "rankdir": "TB",    # Top to Bottom layout
#                  "size": "10,10!",   # Increase graph size
#                  "nodesep": "2",     # Increase spacing between nodes
#                  "ranksep": "2",     # Increase spacing between ranks
#                 }
#              ):
#
#
#     # High-Level Components
#     with Cluster("Data Collection",
#                  graph_attr={
#                      "label": "Data Collection",
#                      "fontsize": "18",
#                      # "bgcolor": "#f0f0f0",
#                      "splines": "ortho",
#                      "style": "rounded",
#                      "rankdir": "LR"
#                  }):
#         api = Custom("API Integration",
#                      os.path.join(resources_path, "api.png"))
#         scrape = Custom("Web Scraping",
#                         os.path.join(resources_path, "scrape.png"))
#         db = PostgreSQL("Database Queries")
#         clean = Custom("Data Cleaning",
#                        os.path.join(resources_path, "data_cleaning.png"))
#         validate = Custom("Data Validation",
#                           os.path.join(resources_path, "validate.png"))
#         profile = Custom("Data Profiling",
#                          os.path.join(resources_path, "profile.png"))
#         visualize = Custom("Data Visualization",
#                            os.path.join(resources_path, "visualize.png"))
#         store = Custom("Data Storage",
#                        os.path.join(resources_path, "storage.png"))
#         vector = Custom("Vector DB Conversion",
#                         os.path.join(resources_path, "vector_data_store.png"))
#         version = Custom("Version Control",
#                          os.path.join(resources_path, "version_control.png"))
#
#
#         # Tools for Data Collection
#         with Cluster("Tools",
#                      graph_attr={
#                      "label": "Data Collection",
#                      "fontsize": "18",
#                      # "bgcolor": "#f0f0f0",
#                      "splines": "ortho",
#                      "style": "rounded",
#                      "rankdir": "LR"
#                  }):
#             t_requests = Custom("Requests",
#                                 os.path.join(resources_path, "requests.png"))
#             t_scrapy = Custom("Scrapy, BeautifulSoup",
#                               os.path.join(resources_path, "scrapy.png"))
#             t_sqlalchemy = Custom("SQLAlchemy",
#                                   os.path.join(resources_path, "sqlalchemy.png"))
#             t_pandas = Custom("Pandas",
#                               os.path.join(resources_path, "data_cleaning.png"))
#             t_great_expectations = Custom("Great Expectations",
#                                           os.path.join(resources_path, "great_expectations.png"))
#             t_plotly = Custom("Plotly",
#                               os.path.join(resources_path, "plotly.png"))
#             t_dvc = Custom("Data Version Control",
#                            os.path.join(resources_path, "version_control.png"))
#             t_milvus = Custom("Milvus",
#                               os.path.join(resources_path, "milvus.png"))
#
#
#         # Data Collection Flow
#         api >> Edge(label="to") >> clean
#         scrape >> Edge(label="to") >> clean
#         db >> Edge(label="to") >> clean
#         clean >> Edge(label="to") >> validate
#         validate >> Edge(label="to") >> profile
#         profile >> Edge(label="to") >> visualize
#         visualize >> Edge(label="to") >> store
#         store >> Edge(label="to") >> vector
#         vector >> Edge(label="to") >> version
#
#         with Cluster("Memory Management",
#                      graph_attr={
#                      "label": "Data Collection",
#                      "fontsize": "18",
#                      # "bgcolor": "#f0f0f0",
#                      "splines": "ortho",
#                      "style": "rounded",
#                      "rankdir": "LR"
#                  }):
#             isolation = Custom("Data Isolation",
#                                os.path.join(resources_path, "data_isolation.png"))
#             # vector_db = Custom("Vector Database",
#             #                    os.path.join(resources_path, "vector_data_store.png"))
#             caching = Redis("Caching")
#             retention = Custom("Data Retention",
#                                os.path.join(resources_path, "data_retention.png"))
#             encryption = Custom("Encryption",
#                                 os.path.join(resources_path, "data_encryption.png"))
#             indexing = Custom("Indexing",
#                               os.path.join(resources_path, "indexing.png"))
#             redis = Redis("Redis")
#             archiving = Custom("Archiving",
#                                os.path.join(resources_path, "data_archiving.png"))
#             key_management = Custom("Key Management",
#                                     os.path.join(resources_path, "hashicorp_key_management.png"))
#             sharding = Custom("Sharding",
#                               os.path.join(resources_path, "data_sharding.png"))
#             backup = Custom("Backup",
#                             os.path.join(resources_path, "data_backup.png"))
#
#             # Tools for Memory Management
#             with Cluster("Tools",
#                      graph_attr={
#                      "label": "Data Collection",
#                      "fontsize": "18",
#                      # "bgcolor": "#f0f0f0",
#                      "splines": "ortho",
#                      "style": "rounded",
#                      "rankdir": "LR"
#                  }):
#                 t_rbac = Custom("RBAC",
#                                 os.path.join(resources_path, "rbac.png"))
#                 t_milvus_mm = Custom("Milvus",
#                                      os.path.join(resources_path, "milvus.png"))
#                 t_redis = Redis("Redis")
#                 t_aws_s3 = SimpleStorageServiceS3("AWS Storage S3")
#                 t_aws_kms = KeyManagementService("AWS Key Management Service")
#
#             # Memory Management Flow
#             isolation >> Edge(label="to") >> encryption
#             vector >> Edge(label="to") >> indexing
#             caching >> Edge(label="to") >> redis
#             retention >> Edge(label="to") >> archiving
#             encryption >> Edge(label="to") >> key_management
#             indexing >> Edge(label="to") >> sharding
#             archiving >> Edge(label="to") >> backup




    # with Cluster("Domain Context Management"):
    #     ontology = Custom("Domain Ontology", os.path.join(resources_path, "ontology.png"))
    #     knowledge_graph = Custom("Knowledge Graph", os.path.join(resources_path, "knowledge_graph.png"))
    #     dsl = Custom("DSL", os.path.join(resources_path, "dsl.png"))
    #     owl = Custom("OWL", os.path.join(resources_path, "owl.png"))
    #     neo4j = Custom("Neo4j", os.path.join(resources_path, "neo4j.png"))
    #     antlr = Custom("ANTLR", os.path.join(resources_path, "antlr.png"))
    #     reasoning = Custom("Reasoning", os.path.join(resources_path, "reasoning.png"))
    #     querying = Custom("Querying", os.path.join(resources_path, "querying.png"))
    #     parsing = Custom("Parsing", os.path.join(resources_path, "parsing.png"))
    #
    #     # Tools for Domain Context Management
    #     with Cluster("Tools"):
    #         t_protege = Custom("Protégé", os.path.join(resources_path, "protege.png"))
    #         t_neo4j = Custom("Neo4j", os.path.join(resources_path, "neo4j.png"))
    #         t_antlr = Custom("ANTLR", os.path.join(resources_path, "antlr.png"))
    #         t_sparql = Custom("SPARQL", os.path.join(resources_path, "sparql.png"))
    #         t_cypher = Custom("Cypher", os.path.join(resources_path, "cypher.png"))
    #
    #     # Domain Context Management Flow
    #     ontology >> Edge(label="to") >> owl
    #     knowledge_graph >> Edge(label="to") >> neo4j
    #     dsl >> Edge(label="to") >> antlr
    #     owl >> Edge(label="to") >> reasoning
    #     neo4j >> Edge(label="to") >> querying
    #     antlr >> Edge(label="to") >> parsing
    #
    # with Cluster("Model Selection"):
    #     evaluation = Custom("Model Evaluation", os.path.join(resources_path, "evaluation.png"))
    #     tuning = Custom("Model Tuning", os.path.join(resources_path, "tuning.png"))
    #     deployment = Custom("Model Deployment", os.path.join(resources_path, "deployment.png"))
    #     metrics = Custom("Metrics", os.path.join(resources_path, "metrics.png"))
    #     hyperparameter = Custom("Hyperparameter Tuning", os.path.join(resources_path, "hyperparameter.png"))
    #     serving = Custom("Model Serving", os.path.join(resources_path, "serving.png"))
    #     cross_validation = Custom("Cross-Validation", os.path.join(resources_path, "cross_validation.png"))
    #     grid_search = Custom("Grid Search", os.path.join(resources_path, "grid_search.png"))
    #     fastapi = Custom("FastAPI", os.path.join(resources_path, "fastapi.png"))
    #
    #     # Tools for Model Selection
    #     with Cluster("Tools"):
    #         t_sklearn = Custom("Scikit-learn", os.path.join(resources_path, "sklearn.png"))
    #         t_optuna = Custom("Optuna", os.path.join(resources_path, "optuna.png"))
    #         t_docker = Docker("Docker")
    #         t_fastapi = Custom("FastAPI", os.path.join(resources_path, "fastapi.png"))
    #
    #     # Model Selection Flow
    #     evaluation >> Edge(label="to") >> metrics
    #     tuning >> Edge(label="to") >> hyperparameter
    #     deployment >> Edge(label="to") >> serving
    #     metrics >> Edge(label="to") >> cross_validation
    #     hyperparameter >> Edge(label="to") >> grid_search
    #     serving >> Edge(label="to") >> fastapi
    #
    # with Cluster("Infrastructure as Code"):
    #     terraform = Terraform("Terraform")
    #     docker = Docker("Docker")
    #     kubernetes = Kubelet("Kubernetes")
    #     cicd = Custom("CI/CD", os.path.join(resources_path, "cicd.png"))
    #     provisioning = Custom("Resource Provisioning", os.path.join(resources_path, "provisioning.png"))
    #     containerization = Custom("Containerization", os.path.join(resources_path, "containerization.png"))
    #     orchestration = Custom("Orchestration", os.path.join(resources_path, "orchestration.png"))
    #     github_actions = Github("GitHub Actions")
    #
    #     # Tools for Infrastructure as Code
    #     with Cluster("Tools"):
    #         t_terraform = Terraform("Terraform")
    #         t_docker = Docker("Docker")
    #         t_kubernetes = Kubelet("Kubernetes")
    #         t_github_actions = Github("GitHub Actions")
    #
    #     # Infrastructure as Code Flow
    #     terraform >> Edge(label="to") >> provisioning
    #     docker >> Edge(label="to") >> containerization
    #     kubernetes >> Edge(label="to") >> orchestration
    #     cicd >> Edge(label="to") >> github_actions
    #
    # with Cluster("Monitoring & Maintenance"):
    #     prometheus = Prometheus("Prometheus")
    #     grafana = Grafana("Grafana")
    #     logging = Custom("Logging", os.path.join(resources_path, "logging.png"))
    #     metrics_collection = Custom("Metrics Collection", os.path.join(resources_path, "metrics_collection.png"))
    #     dashboards = Custom("Dashboards", os.path.join(resources_path, "dashboards.png"))
    #     elk = Custom("ELK Stack", os.path.join(resources_path, "elk.png"))
    #
    #     # Tools for Monitoring & Maintenance
    #     with Cluster("Tools"):
    #         t_prometheus = Prometheus("Prometheus")
    #         t_grafana = Grafana("Grafana")
    #         t_elk = Custom("ELK Stack", os.path.join(resources_path, "elk.png"))
    #
    #     # Monitoring & Maintenance Flow
    #     prometheus >> Edge(label="to") >> metrics_collection
    #     grafana >> Edge(label="to") >> dashboards
    #     logging >> Edge(label="to") >> elk
    #
    # # High-Level Pipeline Flow
    # api >> Edge(label="to", color="blue") >> isolation
    # isolation >> Edge(label="to", color="blue") >> ontology
    # ontology >> Edge(label="to", color="blue") >> evaluation
    # evaluation >> Edge(label="to", color="blue") >> terraform
    # terraform >> Edge(label="to", color="blue") >> prometheus

# Notes:
# - Install dependencies: `pip install diagrams`
# - Install Graphviz: https://graphviz.org/download/
# - Custom icons in ./resources/ should be PNG files representing each component (create placeholders or download icons).
# - Output: agent_pipeline.png in the working directory.
# - To modify, edit the node connections or add new clusters for extensibility.
# - Linkage: Subcomponents are grouped in clusters; refer to cluster names for high-level to low-level mapping.
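
The notes above assume that Graphviz and the custom icon files are already in place. A minimal preflight sketch, using only the standard library, might look like the following. The helper names `missing_icons` and `graphviz_available` are hypothetical, and the `resources` directory is an assumption; the icon file names are the ones referenced by the `Custom` nodes in the script.

```python
import shutil
from pathlib import Path


def missing_icons(resources_dir, icon_names):
    """Return the icon file names that are not present in resources_dir."""
    root = Path(resources_dir)
    return [name for name in icon_names if not (root / name).is_file()]


def graphviz_available():
    """Return True if the Graphviz `dot` executable can be found on PATH."""
    return shutil.which("dot") is not None


if __name__ == "__main__":
    # Icon files used by the Custom nodes; adjust the list to your diagram.
    icons = ["logging.png", "metrics_collection.png", "dashboards.png", "elk.png"]
    if not graphviz_available():
        print("Graphviz `dot` not found on PATH; install it before rendering.")
    # "resources" is an assumed location relative to the working directory.
    for name in missing_icons("resources", icons):
        print(f"Missing icon: resources/{name}")
```

Running a check like this before rendering reports a missing icon or a missing `dot` binary up front, rather than leaving it to surface during the render step.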

[2025-05-04 05:36:29][DEBUG][graphviz._tools][48]: os.makedirs('docs')
[2025-05-04 05:36:29][DEBUG][graphviz.saving][78]: write lines to 'docs/domain_specific_agent_pipeline'
[2025-05-04 05:36:29][DEBUG][graphviz.backend.execute][61]: run [PosixPath('dot'), '-Kdot', '-Tpng', '-O', 'domain_specific_agent_pipeline']


Diagram Resources Path: 
/Users/dalexander/SynologyDrive/Repos/milvus_ha_cluster/src/diagrams/resources


[2025-05-04 05:36:30][DEBUG][graphviz.backend.viewing][48]: view: ['open', 'docs/domain_specific_agent_pipeline.png']


In [64]:
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

with Diagram("Grouped Workers", show=False,
             direction="TB", outformat="png",
             graph_attr={
                 "fontsize": "20",
                 "splines": "ortho",
                 "style": "rounded",
                 "rankdir": "TB",    # Top to Bottom layout
                 "size": "10,10!",   # Increase graph size
                 "nodesep": "2",     # Increase spacing between nodes
                 "ranksep": "2",     # Increase spacing between ranks
             },
             filename="docs/grouped_workers"):
    ELB("lb") >> [EC2("worker1"),
                  EC2("worker2"),
                  EC2("worker3"),
                  EC2("worker4"),
                  EC2("worker5")] >> RDS("events")


In [87]:
from diagrams import Cluster, Diagram, Edge
from diagrams.custom import Custom
# The Spark icon is aliased as a stand-in for the Pandas nodes below.
from diagrams.onprem.analytics import Spark as Pandas
from diagrams.onprem.client import Client
from diagrams.onprem.container import Docker
from diagrams.onprem.database import PostgreSQL
from diagrams.onprem.iac import Terraform
from diagrams.onprem.inmemory import Redis
from diagrams.onprem.monitoring import Grafana, Prometheus
from diagrams.onprem.vcs import Github

graph_attr = {
    "fontsize": "12",
    "bgcolor": "white",
    "splines": "ortho",
    "rankdir": "TB",
    "pad": "0.5",
}

with Diagram("Domain Specific Agent Pipeline",
             show=True, outformat="png",
             graph_attr=graph_attr, filename="docs/agent_pipeline"):

    with Cluster("Data Collection"):
        api = Client("API Integration")
        scrape = Client("Web Scraping")
        db = PostgreSQL("Database Queries")
        clean = Pandas("Data Cleaning")
        validate = Client("Data Validation")
        profile = Client("Data Profiling")
        visualize = Client("Data Visualization")
        store = Client("Data Storage")
        vector = Client("Vector DB Conversion")
        version = Client("Version Control")
        with Cluster("Tools"):
            t_requests = Client("Requests")
            t_scrapy = Client("Scrapy, BeautifulSoup")
            t_sqlalchemy = Client("SQLAlchemy")
            t_pandas = Pandas("Pandas")
            t_great_expectations = Client("Great Expectations")
            t_plotly = Client("Plotly")
            t_dvc = Client("DVC")
            t_milvus = Client("Milvus")
        api >> Edge(label="to") >> clean
        scrape >> Edge(label="to") >> clean
        db >> Edge(label="to") >> clean
        clean >> Edge(label="to") >> validate
        validate >> Edge(label="to") >> profile
        profile >> Edge(label="to") >> visualize
        visualize >> Edge(label="to") >> store
        store >> Edge(label="to") >> vector
        vector >> Edge(label="to") >> version

    with Cluster("Memory Management"):
        isolation = Client("Data Isolation")
        vector_db = Client("Vector Database")
        caching = Redis("Caching")
        retention = Client("Data Retention")
        encryption = Client("Encryption")
        indexing = Client("Indexing")
        redis = Redis("Redis")
        archiving = Client("Archiving")
        key_management = Client("Key Management")
        sharding = Client("Sharding")
        backup = Client("Backup")
        with Cluster("Tools"):
            t_rbac = Client("RBAC")
            t_milvus_mm = Client("Milvus")
            t_redis = Redis("Redis")
            t_aws_s3 = Client("AWS S3")
            t_aws_kms = Client("AWS KMS")
        isolation >> Edge(label="to") >> encryption
        vector_db >> Edge(label="to") >> indexing
        caching >> Edge(label="to") >> redis
        retention >> Edge(label="to") >> archiving
        encryption >> Edge(label="to") >> key_management
        indexing >> Edge(label="to") >> sharding
        archiving >> Edge(label="to") >> backup

    with Cluster("Domain Context Management"):
        ontology = Client("Domain Ontology")
        knowledge_graph = Client("Knowledge Graph")
        dsl = Client("DSL")
        owl = Client("OWL")
        neo4j = Client("Neo4j")
        antlr = Client("ANTLR")
        reasoning = Client("Reasoning")
        querying = Client("Querying")
        parsing = Client("Parsing")
        with Cluster("Tools"):
            t_protege = Client("Protégé")
            t_neo4j = Client("Neo4j")
            t_antlr = Client("ANTLR")
            t_sparql = Client("SPARQL")
            t_cypher = Client("Cypher")
        ontology >> Edge(label="to") >> owl
        knowledge_graph >> Edge(label="to") >> neo4j
        dsl >> Edge(label="to") >> antlr
        owl >> Edge(label="to") >> reasoning
        neo4j >> Edge(label="to") >> querying
        antlr >> Edge(label="to") >> parsing

    with Cluster("Model Selection"):
        evaluation = Client("Model Evaluation")
        tuning = Client("Model Tuning")
        deployment = Client("Model Deployment")
        metrics = Client("Metrics")
        hyperparameter = Client("Hyperparameter Tuning")
        serving = Client("Model Serving")
        cross_validation = Client("Cross-Validation")
        grid_search = Client("Grid Search")
        fastapi = Client("FastAPI")
        with Cluster("Tools"):
            t_sklearn = Client("Scikit-learn")
            t_optuna = Client("Optuna")
            t_docker = Docker("Docker")
            t_fastapi = Client("FastAPI")
        evaluation >> Edge(label="to") >> metrics
        tuning >> Edge(label="to") >> hyperparameter
        deployment >> Edge(label="to") >> serving
        metrics >> Edge(label="to") >> cross_validation
        hyperparameter >> Edge(label="to") >> grid_search
        serving >> Edge(label="to") >> fastapi

    with Cluster("Infrastructure as Code"):
        terraform = Terraform("Terraform")
        docker = Docker("Docker")
        kubernetes = Custom("Kubernetes", "resources/k8s.png")
        cicd = Client("CI/CD")
        provisioning = Client("Resource Provisioning")
        containerization = Client("Containerization")
        orchestration = Client("Orchestration")
        github_actions = Github("GitHub Actions")
        with Cluster("Tools"):
            t_terraform = Terraform("Terraform")
            t_docker = Docker("Docker")
            t_kubernetes = Custom("Kubernetes", "resources/k8s.png")
            t_github_actions = Github("GitHub Actions")
        terraform >> Edge(label="to") >> provisioning
        docker >> Edge(label="to") >> containerization
        kubernetes >> Edge(label="to") >> orchestration
        cicd >> Edge(label="to") >> github_actions

    with Cluster("Monitoring & Maintenance"):
        prometheus = Prometheus("Prometheus")
        grafana = Grafana("Grafana")
        logging = Client("Logging")
        metrics_collection = Client("Metrics Collection")
        dashboards = Client("Dashboards")
        elk = Client("ELK Stack")
        with Cluster("Tools"):
            t_prometheus = Prometheus("Prometheus")
            t_grafana = Grafana("Grafana")
            t_elk = Client("ELK Stack")
        prometheus >> Edge(label="to") >> metrics_collection
        grafana >> Edge(label="to") >> dashboards
        logging >> Edge(label="to") >> elk

    api >> Edge(label="to", color="blue") >> isolation
    isolation >> Edge(label="to", color="blue") >> ontology
    ontology >> Edge(label="to", color="blue") >> evaluation
    evaluation >> Edge(label="to", color="blue") >> terraform
    terraform >> Edge(label="to", color="blue") >> prometheus

[2025-05-04 13:55:31][DEBUG][graphviz._tools][48]: os.makedirs('docs')
[2025-05-04 13:55:31][DEBUG][graphviz.saving][78]: write lines to 'docs/agent_pipeline'
[2025-05-04 13:55:31][DEBUG][graphviz.backend.execute][61]: run [PosixPath('dot'), '-Kdot', '-Tpng', '-O', 'agent_pipeline']
[2025-05-04 13:55:35][DEBUG][graphviz.backend.viewing][48]: view: ['open', 'docs/agent_pipeline.png']
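
Most flows in the cell above are linear chains written out one edge at a time. One way to keep such chains short is to derive the (source, target) pairs from an ordered list of names. The sketch below assumes the nodes are kept in a dict keyed by name; `chain_pairs` is a hypothetical helper, not part of the `diagrams` library.

```python
def chain_pairs(names):
    """Return consecutive (source, target) pairs for an ordered list of names."""
    return list(zip(names, names[1:]))


# Ordered stages of the Data Collection flow, taken from the cell above.
data_collection_chain = [
    "clean", "validate", "profile", "visualize", "store", "vector", "version",
]

for src, dst in chain_pairs(data_collection_chain):
    print(f"{src} -> {dst}")
```

Inside the `with Diagram(...)` block, each pair could then be wired as `nodes[src] >> Edge(label="to") >> nodes[dst]`, where `nodes` is an assumed dict mapping these names to the node objects.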


In [88]:
# File: src.utils.resize_image.py
import os

from src.utils import resize_image

# Example usage: convert resources/rbac.webp into a 512x512 PNG icon.
resources_path = os.path.join(os.path.dirname(os.path.abspath(".")), "diagrams/resources")

# Path to the input image (must already exist in the resources directory)
input_path = os.path.join(resources_path, "rbac.webp")

# Path to save the resized image
output_path = os.path.join(resources_path, "rbac.png")

# Desired dimensions in pixels
new_width = 512
new_height = 512

resize_image(input_path, output_path, new_width, new_height)

Error resizing image: The input file /Users/dalexander/SynologyDrive/Repos/milvus_ha_cluster/src/diagrams/resources/rbac.webp does not exist.
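
The error above comes from a missing input file. A small guard, sketched here with the standard library only, can report which files with the same stem actually exist before a resize is attempted. `find_candidates` is a hypothetical helper and is not part of `src.utils`.

```python
from pathlib import Path


def find_candidates(resources_dir, stem):
    """Return sorted file names in resources_dir whose stem equals `stem`."""
    root = Path(resources_dir)
    if not root.is_dir():
        # A missing directory simply yields no candidates.
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file() and p.stem == stem)
```

For example, `find_candidates(resources_path, "rbac")` returns an empty list when no `rbac.*` source image is present in the resources directory, which is the situation the error message describes.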
