## Chapter 12: Commit to Contribute

This notebook powers the final chapter by automating the extraction, curation, and visualization of open-source AI projects mentioned throughout the book. Using CrewAI agents, it builds a structured glossary and an interactive reference architecture. It supports reproducibility, highlights licensing and contribution pathways, and shows how automation can document and sustain the open-source ecosystem.


### Set up

In [None]:
%%capture --no-stderr
%pip install -U --quiet 'crewai[tools]' aisuite databricks-sdk

In [None]:
%pip install markdown2 python-docx

In [None]:
# Constants and API Key Configuration
import os
from google.colab import userdata

# === Load API keys securely from Google Colab Secrets ===
def load_api_keys():
    keys = {
        "HF_TOKEN": userdata.get("HF_TOKEN"),
        "SERPER_API_KEY": userdata.get("SERPER_API_KEY"),
        "OPENAI_API_KEY": userdata.get("OPENAI_API_KEY"),
    }
    for key, value in keys.items():
        if not value:
            raise ValueError(f"❌ Missing {key}. Please set this API key in Colab secrets.")
        os.environ[key] = value
    print("✅ All API keys loaded and configured successfully.")

# Execute API key loading upon running this cell
load_api_keys()
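
Outside Colab, `google.colab` is not importable, so the cell above fails at the import. A minimal sketch of a fallback, assuming the same key names are exported as environment variables. `get_secret` is a hypothetical helper, not part of Colab or CrewAI:

```python
import os

def get_secret(name):
    """Return a secret from Colab Secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # Assumption: any lookup failure (missing secret, access not granted)
        # should fall through to the environment rather than abort.
        return os.environ.get(name)

# Usage: collect every missing name instead of failing on the first one
required = ["HF_TOKEN", "SERPER_API_KEY", "OPENAI_API_KEY"]
missing = [key for key in required if not get_secret(key)]
print("Missing keys:", missing or "none")
```

Reporting all missing keys at once saves a round trip when several secrets are unset.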

### Listing 12-1: Agents that Extract and Merge AI Glossary Entries

This listing uses CrewAI agents to scan chapter files, extract open-source project mentions, write one structured JSON file per chapter, and assemble a Markdown glossary. A second script consolidates the per-chapter entries, merges duplicates, and exports a clean CSV with hyperlinks. Together, they automate building a glossary from long-form content using agentic workflows.



In [None]:
# === Imports ===
import os
import re
import time
from datetime import datetime
from crewai import Agent, Task, Crew, Process
from crewai_tools import FileReadTool, DirectoryReadTool, FileWriterTool

# === Constants ===
DEFAULT_MODEL = "gpt-4o-mini"
CHAPTER_DIR = "/content/chapters"
GLOSSARY_MD_PATH = "open_source_glossary.md"

# === Tools ===
file_tool = FileReadTool()
dir_tool = DirectoryReadTool(directory=CHAPTER_DIR)
file_writer_tool = FileWriterTool()

# === Agent Definitions ===

directory_enumerator = Agent(
    role="Directory Enumerator",
    goal="List all chapter files inside the provided directory.",
    backstory="You are responsible for returning a complete list of text files representing chapters.",
    tools=[dir_tool],
    llm=DEFAULT_MODEL,
)

chapter_reader = Agent(
    role="Chapter Scanner",
    goal="Scan each chapter file (don't skip any) and extract structured open-source project information.",
    backstory="You read chapter files from a book on open-source AI and extract complete project data, saving per-chapter output.",
    tools=[file_tool, file_writer_tool],
    llm=DEFAULT_MODEL,
)

glossary_writer = Agent(
    role="Glossary Assembler",
    goal="Generate a clean Markdown glossary of all open-source projects mentioned in the book.",
    backstory="You transform structured project data into a readable glossary for publication.",
    tools=[file_tool, dir_tool],
    llm=DEFAULT_MODEL,
)

# === Task Definitions ===

list_chapter_files_task = Task(
    description=(
        "List all chapter files in the given directory. These are text-based files, "
        "each representing a chapter from a book on open-source AI. "
        "Return a list of file paths (or names) to be passed to the Chapter Scanner."
    ),
    expected_output="List of file paths or filenames for all chapter files.",
    agent=directory_enumerator,
)

extract_projects_task = Task(
    description=(
        "You will be given a list of file paths. \n"
        "You must process every file, do not skip any!\n"
        "Each file is a chapter from a book on open-source AI.\n\n"
        "For each file:\n"
        "- Read the file using the file reading tool.\n"
        "- Extract the chapter number and title if present (format: 'Chapter X: Title').\n"
        "- Identify all open-source projects, frameworks and tools mentioned.\n"
        "- For each project, extract or infer the following:\n"
        "  - Project name\n"
        "  - Creator (person or organization)\n"
        "  - Description (1 sentence)\n"
        "  - Year (of inception)\n"
        "  - URL (homepage or repository)\n"
        #"- If you are not certain write'N/A'.\n\n"
        "After analyzing each file, write the output as a JSON file using the file writing tool.\n"
        "Use the same name as the input file, but change the extension to '.json'.\n"
        "Write the .json file to the same directory as the original chapter file."
    ),
    expected_output=(
        "One JSON file saved per chapter, in the same folder as the chapter file. "
        "Each file contains a structured list of project dictionaries."
    ),
    agent=chapter_reader,
    context=[list_chapter_files_task],
)

generate_glossary_task = Task(
    description=(
        "Read all .json files in the same directory as the chapter files. "
        "These files contain structured lists of open-source projects extracted per chapter.\n"
        "- Merge all entries into one unified list.\n"
        "Write a clean Markdown-formatted glossary. Each project should include:\n"
        "- Project name\n"
        "- Creator\n"
        "- Description\n"
        "- Estimated year of inception\n"
        "- List of chapters it appears in\n"
        "- Project URL\n"
        "Ensure consistent formatting and readability."
    ),
    expected_output="Markdown glossary combining all chapter-level extractions.",
    agent=glossary_writer,
    context=[extract_projects_task],
    output_file=GLOSSARY_MD_PATH
)

# === Crew Definition ===

glossary_crew = Crew(
    agents=[directory_enumerator, chapter_reader, glossary_writer],
    tasks=[list_chapter_files_task, extract_projects_task, generate_glossary_task],
    process=Process.sequential,
    verbose=True
)

# === Run Program ===

def run_open_source_glossary():
    print(f"\n📚 Starting glossary generation from: {CHAPTER_DIR}")
    start = time.time()
    glossary_crew.kickoff()
    end = time.time()
    print("\n✅ Open-source glossary complete.")
    print(f"⏱️ Duration: {end - start:.2f} seconds.")
    print(f"📄 Glossary written to: {GLOSSARY_MD_PATH}")
    print(f"📁 Per-chapter output saved in: {CHAPTER_DIR} (as .json files)")

# === Entry Point ===
if __name__ == "__main__":
    run_open_source_glossary()
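
Because the agents write the JSON files themselves, the file shape can drift between runs. The Part 2 script reads a `chapter` (or `chapter_number`) field, a `title`, and a `projects` list, so it can help to check each file against that shape first. A minimal sketch, where `check_chapter_json` and the sample record are hypothetical:

```python
import json

def check_chapter_json(data):
    """Return a list of problems; an empty list means the shape looks usable."""
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    problems = []
    if not (data.get("chapter") or data.get("chapter_number")):
        problems.append("missing chapter number")
    projects = data.get("projects")
    if not isinstance(projects, list):
        problems.append("'projects' is missing or not a list")
        return problems
    for i, proj in enumerate(projects):
        if not isinstance(proj, dict):
            problems.append(f"project {i} is not an object")
        elif not (proj.get("project_name") or proj.get("name")):
            problems.append(f"project {i} has no name")
    return problems

# Hypothetical sample in the shape the merge script reads
sample = json.loads("""
{"chapter": 1, "title": "Example", "projects": [
  {"project_name": "NumPy", "url": "https://numpy.org/"},
  {"description": "entry without a name"}
]}
""")
print(check_chapter_json(sample))
```

Running a check like this over every `.json` file in `CHAPTER_DIR` shows which chapters need a second extraction pass before merging.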


#### Part 2: Save Chapter JSONs to a CSV

In [None]:
import os
import json
import csv

CHAPTER_DIR = "/content/chapters"
OUTPUT_CSV = "open_source_glossary.csv"

def normalize_project_name(name):
    return name.strip().lower()

def merge_project_entries(entries):
    merged = {}
    for entry in entries:
        name_key = normalize_project_name(entry.get("name", ""))
        if not name_key:
            continue

        if name_key not in merged:
            merged[name_key] = entry
        else:
            existing = merged[name_key]
            # Merge chapter lists
            existing_chapters = set(existing.get("chapter_list", []))
            new_chapters = set(entry.get("chapter_list", []))
            existing["chapter_list"] = sorted(existing_chapters.union(new_chapters))
    return list(merged.values())

def standardize_entry(raw, chapter_label):
    # Try both naming styles
    name = raw.get("project_name") or raw.get("name", "N/A")
    creator = raw.get("creator", "N/A")
    description = raw.get("description", "N/A")
    year = raw.get("year") or raw.get("year_inception", "N/A")
    url = raw.get("url", "N/A")

    return {
        "name": name,
        "creator": creator,
        "description": description,
        "inception_year": year,
        "project_url": url,
        "chapter_list": [chapter_label]
    }

def load_all_projects_from_json(directory):
    all_entries = []
    json_files = [f for f in os.listdir(directory) if f.endswith(".json")]

    if not json_files:
        print(f"⚠️ No .json files found in: {directory}")
        return all_entries

    for filename in json_files:
        filepath = os.path.join(directory, filename)
        try:
            with open(filepath, "r", encoding="utf-8") as f:
                data = json.load(f)

            # The agent may write a bare list of projects instead of a dict; accept both
            if isinstance(data, list):
                data = {"projects": data}

            chapter_number = data.get("chapter") or data.get("chapter_number")
            chapter_title = data.get("title", "Unknown Title")
            if chapter_number is None:
                # Fall back to the file name when the chapter number is missing
                chapter_label = os.path.splitext(filename)[0]
            else:
                chapter_label = f"Chapter {chapter_number}: {chapter_title}"

            projects = data.get("projects", [])
            for raw_proj in projects:
                all_entries.append(standardize_entry(raw_proj, chapter_label))

            print(f"✅ Loaded {len(projects)} projects from {filename}")
        except Exception as e:
            print(f"❌ Failed to read {filename}: {e}")
    return all_entries

def write_csv(projects, output_path):
    fieldnames = ["name", "creator", "description", "inception_year", "chapter_list", "HYPERLINK"]
    with open(output_path, "w", newline='', encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for proj in projects:
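            url = proj.get("project_url", "")
            name = proj.get("name", "")
            # Skip the formula when the URL is missing or is the "N/A" placeholder
            has_link = bool(name) and url not in ("", "N/A", None)
            hyperlink = f'=HYPERLINK("{url}", "{name}")' if has_link else ""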

            writer.writerow({
                "name": name,
                "creator": proj.get("creator", ""),
                "description": proj.get("description", ""),
                "inception_year": proj.get("inception_year", ""),
                "chapter_list": ", ".join(proj.get("chapter_list", [])),
                "HYPERLINK": hyperlink
            })
    print(f"📄 CSV with hyperlinks saved to: {output_path}")

def run_merge_and_export():
    print(f"\n📥 Reading JSON files from: {CHAPTER_DIR}")
    all_entries = load_all_projects_from_json(CHAPTER_DIR)
    print(f"📦 Found {len(all_entries)} total project entries")

    merged_projects = merge_project_entries(all_entries)
    print(f"🔁 Merged to {len(merged_projects)} unique project entries")

    write_csv(merged_projects, OUTPUT_CSV)

# === Run It ===
if __name__ == "__main__":
    run_merge_and_export()
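
The `=HYPERLINK(...)` formula in `write_csv` interpolates the URL and name directly, so a double quote in either value would end the formula string early. Inside a spreadsheet formula string, a literal double quote is written as two double quotes. A minimal sketch of an escaping helper, where `excel_hyperlink` is a hypothetical name and not part of the listing:

```python
def excel_hyperlink(url, label):
    """Build a HYPERLINK formula, doubling any embedded double quotes."""
    safe_url = url.replace('"', '""')
    safe_label = label.replace('"', '""')
    return f'=HYPERLINK("{safe_url}", "{safe_label}")'

print(excel_hyperlink("https://numpy.org/", "NumPy"))
print(excel_hyperlink("https://example.org/", 'The "Example" Project'))
```

The `csv` module still applies its own quoting when the formula is written to the file, so the two layers of escaping do not interfere.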


### Sample Glossary Output

# Glossary of Open-Source Projects

## Chapter 1
### Python
- **Creator:** Guido van Rossum  
- **Description:** Python is a high-level, interpreted programming language known for its readability and versatility, widely used for web development, data analysis, artificial intelligence, and more.  
- **Estimated Year of Inception:** 1991  
- **Project URL:** [python.org](https://www.python.org/)

### NumPy
- **Creator:** Travis Oliphant, et al.  
- **Description:** NumPy is a fundamental package for scientific computing in Python that provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.  
- **Estimated Year of Inception:** 2006  
- **Project URL:** [numpy.org](https://numpy.org/)

### Pandas
- **Creator:** Wes McKinney  
- **Description:** Pandas is an open-source data analysis and manipulation library for Python, providing data structures and functions needed to work with structured data effectively.  
- **Estimated Year of Inception:** 2008  
- **Project URL:** [pandas.pydata.org](https://pandas.pydata.org/)

### PyTorch
- **Creator:** Facebook AI Research  
- **Description:** PyTorch is an open-source machine learning library based on the Torch library, used for applications such as natural language processing and deep learning.  
- **Estimated Year of Inception:** 2016  
- **Project URL:** [pytorch.org](https://pytorch.org/)

### Matplotlib
- **Creator:** John D. Hunter  
- **Description:** Matplotlib is a plotting library for Python and its numerical mathematics extension NumPy, allowing for the creation of static, animated, and interactive visualizations.  
- **Estimated Year of Inception:** 2003  
- **Project URL:** [matplotlib.org](https://matplotlib.org/)

### Jupyter Notebooks
- **Creator:** Project Jupyter  
- **Description:** Jupyter Notebooks is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.  
- **Estimated Year of Inception:** 2014  
- **Project URL:** [jupyter.org](https://jupyter.org/)

### Google Colab
- **Creator:** Google  
- **Description:** Google Colaboratory, or Colab, is a free cloud service for Python that allows you to write and execute code in a web-based Jupyter environment, with easy access to GPUs.  
- **Estimated Year of Inception:** 2017  
- **Project URL:** [colab.research.google.com](https://colab.research.google.com/)

### scikit-learn
- **Creator:** David Cournapeau, et al.  
- **Description:** Scikit-learn is a machine learning library for Python that features various classification, regression, and clustering algorithms, along with tools for model selection and evaluation.  
- **Estimated Year of Inception:** 2007  
- **Project URL:** [scikit-learn.org](https://scikit-learn.org/)

## Chapter 2
### Fairlearn
- **Creator:** Microsoft  
- **Description:** Fairlearn is an open-source Python library that helps in assessing and improving the fairness of machine learning models.  
- **Estimated Year of Inception:** 2020  
- **Project URL:** [fairlearn.org](https://fairlearn.org/)

### Hugging Face
- **Creator:** Hugging Face, Inc.  
- **Description:** Hugging Face is a company known for its transformer models in NLP and provides an extensive library to easily implement and deploy machine learning models.  
- **Estimated Year of Inception:** 2016  
- **Project URL:** [huggingface.co](https://huggingface.co/)

### Milvus
- **Creator:** Zilliz  
- **Description:** Milvus is an open-source vector database designed for managing embedding data, offering high-performance searching and analytics capabilities.  
- **Estimated Year of Inception:** 2020  
- **Project URL:** [milvus.io](https://milvus.io/)

### FAISS
- **Creator:** Facebook AI Research  
- **Description:** FAISS is a library for efficient similarity search and clustering of dense vectors, providing algorithms that optimize searches for big datasets.  
- **Estimated Year of Inception:** 2017  
- **Project URL:** [faiss.ai](https://faiss.ai/)

### Weaviate
- **Creator:** SeMI Technologies  
- **Description:** Weaviate is an open-source vector search engine that allows developers to build semantic search applications powered by machine learning.  
- **Estimated Year of Inception:** 2019  
- **Project URL:** [weaviate.io](https://weaviate.io/)

### ChromaDB
- **Creator:** Chroma Team  
- **Description:** ChromaDB is an open-source embedding database that provides features for high-dimensional data and machine learning-based applications.  
- **Estimated Year of Inception:** 2022  
- **Project URL:** [chroma.ai](https://chroma.ai/)

### Stable Diffusion
- **Creator:** Stability AI  
- **Description:** Stable Diffusion is a deep learning model designed for generating detailed images based on text prompts, known for its open-source nature and high quality.  
- **Estimated Year of Inception:** 2022  
- **Project URL:** [stability.ai](https://stability.ai/)

### Gradient
- **Creator:** Paperspace  
- **Description:** Gradient is a platform that simplifies the process of building and deploying machine learning models, providing a suite of tools for developers and data scientists.  
- **Estimated Year of Inception:** 2020  
- **Project URL:** [paperspace.com/gradient](https://www.paperspace.com/gradient)

### Giant-Machine
- **Creator:** Giant Team  
- **Description:** Giant-Machine is an advanced toolkit for building and deploying robust AI models with ease.  
- **Estimated Year of Inception:** 2023  
- **Project URL:** [giantmachine.ai](https://giantmachine.ai/)

## Chapter 6
### HumanLayer
- **Creator:** Human Layer  
- **Description:** HumanLayer is an open-source library that allows developers to build applications that leverage human input alongside AI systems.  
- **Estimated Year of Inception:** 2022  
- **Project URL:** [humanlayer.com](https://www.humanlayer.com/)

### Gandalf
- **Creator:** Gandalf Team  
- **Description:** Gandalf is an open-source resource for building intuitive search applications, powered by AI.  
- **Estimated Year of Inception:** 2021  
- **Project URL:** [gandalf.dev](https://gandalf.dev/)

## Chapter 11
### TensorFlow
- **Creator:** Google Brain Team  
- **Description:** TensorFlow is an end-to-end open-source platform for machine learning, offering a comprehensive ecosystem for building ML applications.  
- **Estimated Year of Inception:** 2015  
- **Project URL:** [tensorflow.org](https://www.tensorflow.org/)

### OpenAI Gym
- **Creator:** OpenAI  
- **Description:** OpenAI Gym is a toolkit for developing and comparing reinforcement learning (RL) algorithms, providing a standard API for environments.  
- **Estimated Year of Inception:** 2016  
- **Project URL:** [gym.openai.com](https://gym.openai.com/)

### IBM Watson
- **Creator:** IBM  
- **Description:** IBM Watson is a suite of AI services and applications designed to help enterprises leverage advanced data analytics and cognitive services.  
- **Estimated Year of Inception:** 2011  
- **Project URL:** [ibm.com/watson](https://www.ibm.com/watson/)

### IBM watsonx
- **Creator:** IBM  
- **Description:** IBM watsonx is IBM’s next-generation data, AI, and integration platform designed to empower organizations in their AI journey.  
- **Estimated Year of Inception:** 2023  
- **Project URL:** [ibm.com/watsonx](https://www.ibm.com/watsonx/)

### Mistral
- **Creator:** Mistral  
- **Description:** Mistral is an open-source LLM that specializes in generating high-quality text based on prompts, with a focus on efficiency and accessibility.  
- **Estimated Year of Inception:** 2023  
- **Project URL:** [mistral.ai](https://mistral.ai/)

### CrewAI
- **Creator:** CrewAI  
- **Description:** CrewAI is a collaborative AI platform focused on improving teams and organizations through machine learning tools and data analytics.  
- **Estimated Year of Inception:** 2023  
- **Project URL:** [crewai.com](https://crewai.com/)

### LangChain
- **Creator:** Harrison Chase, et al.  
- **Description:** LangChain is a framework for developing applications powered by language models, emphasizing modular components for integrations, chains, and agents.  
- **Estimated Year of Inception:** 2022  
- **Project URL:** [langchain.readthedocs.io](https://langchain.readthedocs.io/)


### Listing 12-2: Generate Open Source AI Architecture View

This listing parses an Excel-based glossary of open-source AI projects, counts how many chapters mention each project, and keeps the most-mentioned entries in each subcategory. It then groups and styles them into an HTML reference architecture with clickable project links, color-coded by top-level category. The output offers a visual snapshot of the ecosystem, useful for education, planning, or documentation.
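
The selection step can be illustrated on toy data without pandas. The sketch below counts how often each name appears, then keeps the most-mentioned names per subcategory up to a limit. The rows and limits are made up for illustration, and `top_per_subcategory` is a hypothetical helper:

```python
from collections import Counter, defaultdict

def top_per_subcategory(rows, limits):
    """rows: (name, subcategory_code) pairs, one per chapter mention.
    limits: {subcategory_code: maximum number of names to keep}."""
    mentions = Counter(name.strip().lower() for name, _ in rows)
    by_subcat = defaultdict(dict)
    for name, code in rows:
        # Keep the first spelling seen for each normalized name
        by_subcat[code].setdefault(name.strip().lower(), name)
    result = {}
    for code, names in by_subcat.items():
        # sorted() is stable, so ties keep their first-seen order
        ranked = sorted(names, key=lambda key: -mentions[key])
        result[code] = [names[key] for key in ranked[: limits.get(code, 0)]]
    return result

rows = [
    ("PyTorch", 300), ("TensorFlow", 300), ("PyTorch", 300),
    ("JAX", 300), ("Keras", 300), ("TensorFlow", 300), ("PyTorch", 300),
    ("scikit-learn", 310),
]
print(top_per_subcategory(rows, {300: 2, 310: 3}))
# {300: ['PyTorch', 'TensorFlow'], 310: ['scikit-learn']}
```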


In [None]:
import pandas as pd
import re
import requests
from io import BytesIO
from openpyxl import load_workbook

# === CATEGORY AND SUBCATEGORY DEFINITIONS ===
category = {
    1: {"label": "Tools & Ecosystem", "color": "#546E7A"},
    2: {"label": "Data Layer", "color": "#FFEB3B"},
    3: {"label": "Model Development", "color": "#2196F3"},
    4: {"label": "Agents & Operations", "color": "#ECEFF1"},
    5: {"label": "Platform Services", "color": "#9E9E9E"},
    6: {"label": "Governance & Oversight", "color": "#F44336"}
}

subcategory = {
    100: {"label": "Developer Environments", "limit": 2},
    110: {"label": "Model Hubs / Repos", "limit": 2},
    200: {"label": "Data Basics", "limit": 2},
    210: {"label": "Data Augmentation", "limit": 2},
    220: {"label": "Data Synth", "limit": 2},
    300: {"label": "Deep Learning Frameworks", "limit": 3},
    310: {"label": "Classical ML", "limit": 3},
    320: {"label": "Models", "limit": 3},
    400: {"label": "Agent Frameworks", "limit": 2},
    410: {"label": "Model Serve", "limit": 2},
    420: {"label": "Flow Control", "limit": 2},
    500: {"label": "Vector Stores", "limit": 2},
    510: {"label": "Experiment Tracking", "limit": 2},
    520: {"label": "Benchmarks", "limit": 2},
    600: {"label": "Security & Guardrails", "limit": 3},
    610: {"label": "Licensing & Compliance", "limit": 3},
    620: {"label": "Ethics & Responsibility", "limit": 3}
}

# === Constants ===
BASE_URL = "https://opensourceai-book.github.io/code/datasets/"
FILE_NAME = "open_source_ai_glossary.xlsx"
EXCEL_URL = BASE_URL + FILE_NAME

# === Load Excel file directly into memory ===
response = requests.get(EXCEL_URL, timeout=30)
response.raise_for_status()  # fail early on a 404 or other HTTP error
excel_data = BytesIO(response.content)

# === Load workbook from in-memory bytes ===
wb = load_workbook(excel_data, data_only=False)
ws = wb.active

# === Parse headers and setup ===
headers = [cell.value for cell in next(ws.iter_rows(min_row=1, max_row=1))]
name_col_idx = headers.index("Name")
headers.insert(name_col_idx + 1, "URL")

data = []
for row in ws.iter_rows(min_row=2):
    cell = row[name_col_idx]
    value = str(cell.value)
    # Handle Excel HYPERLINK formulas
    hyperlink_match = re.match(r'=HYPERLINK\("([^"]+)",\s*"([^"]+)"\)', value)
    if hyperlink_match:
        url = hyperlink_match.group(1)
        display = hyperlink_match.group(2)
    else:
        url = cell.hyperlink.target if cell.hyperlink else None
        display = cell.value

    row_values = [c.value for c in row]
    row_values.insert(name_col_idx + 1, url)
    row_values[name_col_idx] = display
    data.append(row_values)

df = pd.DataFrame(data, columns=headers)
df = df.dropna(subset=["Name"])

# Normalize and clean
df["Category"] = pd.to_numeric(df["Category"], errors="coerce")
df["Name_normalized"] = df["Name"].str.strip().str.lower()
df = df.drop_duplicates(subset=["Name_normalized", "Chapter"])

# Count mentions across all chapters
mention_counts = df["Name_normalized"].value_counts().to_dict()
df["Mention_Count"] = df["Name_normalized"].map(mention_counts)

# Add category/subcategory labels
df["Category_Label"] = df["Category"].apply(
    lambda x: category.get(int(x // 100), {}).get("label") if pd.notnull(x) else None
)
df["Category_Color"] = df["Category"].apply(
    lambda x: category.get(int(x // 100), {}).get("color") if pd.notnull(x) else None
)
df["Subcategory_Label"] = df["Category"].map(lambda x: subcategory.get(x, {}).get("label"))
df["Subcategory_Limit"] = df["Category"].map(lambda x: subcategory.get(x, {}).get("limit"))

# === PREVIEW BEFORE FILTERING ===
display_cols = ["Name", "URL", "Category", "Category_Label", "Category_Color", "Subcategory_Label", "Mention_Count"]
print("🟡 Full dataset sample before filtering:\n")
print(df[display_cols].head(100).to_string(index=False))
print("\n\n")

# === FILTER TOP PROJECTS PER SUBCATEGORY (dedup by name within subcat) ===
df_sorted = df.sort_values(by=["Category", "Mention_Count"], ascending=[True, False])

filtered_dfs = []
for subcat_code, meta in subcategory.items():
    limit = meta["limit"]
    sub_df = df_sorted[df_sorted["Category"] == subcat_code]
    sub_df = sub_df.drop_duplicates(subset=["Name"])  # only keep one row per Name
    sub_df = sub_df.head(limit)
    filtered_dfs.append(sub_df)

final_df = pd.concat(filtered_dfs, ignore_index=True)

# === BEFORE FINAL DEDUP (raw filtered)
print("🟢 Filtered `final_df` before optional deduping by Name:\n")
print(final_df[display_cols].head(50).to_string(index=False))
print("\n\n")

# === OPTIONAL: DEDUP FINAL OUTPUT (ONLY KEEP ONE PER NAME)
final_df = final_df.drop_duplicates(subset=["Name"])
print("✅ `final_df` after deduping by Name:\n")
print(final_df[display_cols].head(50).to_string(index=False))
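
The label lookups in the cell above rely on the numbering scheme: a three-digit subcategory code such as 310 belongs to the top-level category given by its hundreds digit. A small sketch of that decoding, using a two-entry excerpt of the `category` and `subcategory` dictionaries. `describe_code` is a hypothetical helper:

```python
CATEGORY = {3: "Model Development", 5: "Platform Services"}   # excerpt
SUBCATEGORY = {310: "Classical ML", 500: "Vector Stores"}     # excerpt

def describe_code(code):
    """Map a subcategory code to (category label, subcategory label)."""
    return CATEGORY.get(code // 100), SUBCATEGORY.get(code)

print(describe_code(310))   # ('Model Development', 'Classical ML')
print(describe_code(500))   # ('Platform Services', 'Vector Stores')
print(describe_code(999))   # (None, None): unknown codes yield no labels
```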


#### Part 2: Generate HTML for Reference Architecture

In [None]:
# === Generate HTML Reference Architecture from final_df ===

from collections import defaultdict

# Group data by top-level category and subcategory
grouped = defaultdict(lambda: defaultdict(list))
for _, row in final_df.iterrows():
    cat_label = row["Category_Label"]
    cat_color = row["Category_Color"]
    subcat_label = row["Subcategory_Label"]
    name = row["Name"]
    url = row["URL"] or "#"  # fallback if URL is missing
    grouped[(cat_label, cat_color)][subcat_label].append((name, url))

# Start building HTML content
html = [
    "<!DOCTYPE html>",
    "<html lang='en'>",
    "<head><meta charset='UTF-8'><title>Open Source AI Reference Architecture</title><style>",
    "body { font-family: Arial, sans-serif; background: #fdfdfd; margin: 0; padding: .05em; }",
    ".layer-wrapper { display: flex; align-items: flex-start; gap: .5em; margin-bottom: .5em; }",
    ".layer-label { width: 140px; font-weight: bold; font-size: 1.1em; text-align: center; padding-top: 1em; }",
    ".layer-box { flex: 1; border-radius: 8px; padding: .75em; border: 2px solid #333; }",
    ".category-row { display: flex; flex-wrap: wrap; justify-content: center; gap: 2em; }",
    ".category { background: white; border-radius: 12px; padding: .25em; width: 240px; box-shadow: 0 3px 8px rgba(0,0,0,0.15); display: flex; flex-direction: column; align-items: center; }",
    ".category-title { font-weight: bold; font-size: 1em; margin-bottom: 0.5em; text-align: center; }",
    ".project-link { display: block; margin: 0.25em 0; text-align: center; font-size: .8em; text-decoration: none; color: #0056b3; }",
    ".project-link:hover { text-decoration: underline; }",
    "</style></head><body>"
]

# Populate HTML content
for (layer_label, color), subcats in grouped.items():
    html.append('<div class="layer-wrapper">')
    html.append(f'<div class="layer-label">{layer_label}</div>')
    html.append(f'<div class="layer-box" style="background-color: {color}">')
    html.append('<div class="category-row">')
    for subcat, projects in subcats.items():
        html.append('<div class="category">')
        html.append(f'<div class="category-title">{subcat}</div>')
        for name, url in projects:
            html.append(f'<a class="project-link" href="{url}" target="_blank">{name}</a>')
        html.append('</div>')
    html.append('</div></div></div>')

html.extend(["</body>", "</html>"])

# Write HTML file
with open("open_source_ai_reference_architecture.html", "w", encoding="utf-8") as f:
    f.write("\n".join(html))

print("✅ Saved: open_source_ai_reference_architecture.html")
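
The generator interpolates names and URLs from the spreadsheet straight into the markup, so a value containing `&`, `<`, or a quote could produce invalid HTML. A minimal sketch of escaping with the standard library's `html.escape`. `render_link` is a hypothetical helper, and only `escape` is imported because the cell above already uses `html` as a variable name:

```python
from html import escape

def render_link(name, url):
    """Return an anchor tag with both the label and the href escaped."""
    return (
        f'<a class="project-link" href="{escape(url, quote=True)}" '
        f'target="_blank">{escape(name)}</a>'
    )

print(render_link("R&D <Toolkit>", "https://example.org/?a=1&b=2"))
```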
