13 changes: 10 additions & 3 deletions AGENTS.md
@@ -3,6 +3,10 @@ Assume an appropriate virtual environment is activated. If it isn't, just abort.

## Agent Hints

**CRITICAL: Ensure the working directory**. Agents sometimes change working directories during a task and forget to change them back.
PyDyna agent documentation and scripts are often sensitive to the working directory, so check the pwd before each agent operation if unsure.

**CRITICAL: Never redirect output to /dev/null on Windows**. This triggers a VS Code security prompt that halts execution. Instead:
- For commands where you want to suppress output: Use `>$null 2>&1` (PowerShell) or just run the command without redirection
- For commands where you want to check output: Use `python codegen/generate.py 2>&1 | Out-Null` or capture in a variable
@@ -13,13 +17,16 @@ Assume an appropriate virtual environment is activated. If it isn't, just abort.
- ✅ GOOD: `python codegen/generate.py` (no redirection)
- ✅ GOOD: `$output = python codegen/generate.py 2>&1`

**Documentation builds**: To build docs without examples, use:
**Documentation builds**: To build docs without examples:

Ensure that the python3.13 environment is active, then:

```bash
# Build docs without examples or autokeywords (fast)
cd doc && BUILD_EXAMPLES=false BUILD_AUTOKEYWORDS_API=false ./make.bat html
BUILD_EXAMPLES=false BUILD_AUTOKEYWORDS_API=false ./doc/make.bat html

# Build docs with autokeywords but no examples (slow, ~8+ min for keyword imports alone)
cd doc && BUILD_EXAMPLES=false BUILD_AUTOKEYWORDS_API=true ./make.bat html
BUILD_EXAMPLES=false BUILD_AUTOKEYWORDS_API=true ./doc/make.bat html
```

## Agent Coding Style Preferences
File renamed without changes.
93 changes: 93 additions & 0 deletions agents/projects/doc-todo.md
@@ -0,0 +1,93 @@
# Plan: Autodoc for generated keywords

## Performance Optimization Plan

**Goal**: Optimize PyDyna documentation build to handle 3,214+ auto-generated keyword classes efficiently.

**Problem**: Full build with auto-keywords takes 30+ minutes and uses excessive RAM. This makes iteration too slow for optimization work.

**Strategy**: Use representative subset for rapid iteration, optimize on subset, then validate on full build.

### Learnings

- **AutoAPI caching**: Must `rm -rf doc/source/api` before subset builds, otherwise Sphinx uses cached docs from previous full build.
- **Subset selection**: boundary (168), contact (155), control (216) = 539 keywords provides good representation (~17% of total).
- **Clean vs incremental**: Incremental builds (no `rm -rf source/api _build`) are 18% faster (154.57s → 126.66s).
- **Parallelization impact**: No scaling observed when increasing SPHINXJOBS from 1 to 4 to 8.

### Steps

1. ✅ **Generate auto keywords for a specific subset using the domain feature from the codegen CLI** (e.g. boundary/*, contact/*, and control/*).
- Implementation: Added `--subset` CLI option to `codegen/generate.py` accepting comma-delimited domains
- Usage: `python codegen/generate.py --subset "boundary,contact,control"`
- Result: Generates 536 classes in 3 domains

2. ✅ **Add timing instrumentation** to `doc/make.bat` and `doc/Makefile` with timestamps before/after each major phase, and modify `conf.py` to add Sphinx event handlers (`autodoc-process-docstring`, `source-read`, etc.) that log phase durations to `doc/_build/timing.log`.
- Implementation: Added `setup()` function in `conf.py` with event handlers for builder-inited, env-get-outdated, env-before-read-docs, doctree-resolved, build-finished
- Output: Logs to `doc/_build/timing.log` with phase breakdown and total document count
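   A minimal sketch of what such a `setup()` hook can look like (the event names are standard Sphinx events; the exact handler set and log format in the repository's `conf.py` may differ):

```python
# Illustrative only: a pared-down version of the timing hooks described above.
import time
from pathlib import Path

_start_times = {}
_LOG = Path(__file__).resolve().parent.parent / "_build" / "timing.log"  # doc/_build/timing.log


def _log(message):
    _LOG.parent.mkdir(parents=True, exist_ok=True)
    with _LOG.open("a") as fh:
        fh.write(f"{time.strftime('%H:%M:%S')} {message}\n")


def _builder_inited(app):
    _start_times["total"] = time.perf_counter()
    _log("phase: builder-inited")


def _before_read_docs(app, env, docnames):
    _start_times["read"] = time.perf_counter()
    _log(f"phase: env-before-read-docs ({len(docnames)} documents)")


def _build_finished(app, exception):
    total = time.perf_counter() - _start_times["total"]
    _log(f"phase: build-finished (total {total:.2f}s)")


def setup(app):
    app.connect("builder-inited", _builder_inited)
    app.connect("env-before-read-docs", _before_read_docs)
    app.connect("build-finished", _build_finished)
```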

3. ✅ **Establish baseline metrics** by running 3 timed builds with the subset: (a) current configuration, (b) with `SPHINXJOBS=4`, (c) with `SPHINXJOBS=8`, measuring total time, memory usage, and time per file to identify optimal parallelization.
- **Baseline metrics (1,461 documents, boundary/contact/control subset)**:
- Clean build (rm -rf source/api _build): ~154s total (41s read-docs, 113s process-doctrees)
- SPHINXJOBS=1: 128.94s total (36.05s read, 92.26s process) - FASTEST
- SPHINXJOBS=4: 147.34s total (40.46s read, 106.38s process)
- SPHINXJOBS=8: 134.88s total (35.74s read, 98.66s process)
- SPHINXJOBS=auto (default): 154.57s total (41.02s read, 113.07s process)
- **Finding**: Serial processing (SPHINXJOBS=1) is fastest for this subset size, likely due to overhead outweighing parallelization benefits

4. ✅ **Profile subset build** using `python -m cProfile -o doc/_build/build.prof` wrapper and `snakeviz` or `pyinstrument` to visualize hotspots, focusing on autodoc import time, docstring processing, and HTML generation phases.
- **Key findings**:
- **Wall-clock time**: 7m22s (442 seconds total)
- **Sphinx-reported time**: 140s (37s read-docs, 102s process-doctrees)
- **Hidden AutoAPI overhead**: ~302 seconds (68% of total time!) before Sphinx even initializes
- **AutoAPI phases**: "Reading files" → "Mapping Data" → "Rendering Data" all happen pre-init
- **Intersphinx**: 6 inventories loaded at startup (python, numpy, matplotlib, imageio, pandas, pytest)
- **CPU vs I/O**: pyinstrument showed 1070s wall-clock vs 198s CPU time - heavily I/O-bound
- **Optimization opportunities**:
1. AutoAPI "Reading files" and "Mapping Data" - 5+ minutes of parsing Python files
2. Intersphinx inventory loading - network I/O at startup
3. Autosummary generation running before AutoAPI
4. Process-doctrees phase (102s) - Jinja2 template compilation, numpydoc processing
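   For reference, a wrapper along these lines reproduces the profile (a sketch only; the builder arguments and output paths are assumptions and may not match the exact commands used for these measurements):

```python
# Hypothetical profiling wrapper; run from the doc/ directory.
import cProfile
import pstats

from sphinx.cmd.build import main as sphinx_main

profiler = cProfile.Profile()
profiler.enable()
sphinx_main(["-b", "html", "source", "_build/html"])  # roughly equivalent to `make html`
profiler.disable()
profiler.dump_stats("_build/build.prof")

# Print the top hotspots; open _build/build.prof with snakeviz for an interactive view.
pstats.Stats("_build/build.prof").sort_stats("cumulative").print_stats(30)
```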

5. **Measure, evaluate, and optimize AutoAPI** - The critical path (68% of build time is AutoAPI overhead)

**Phase A: Measure AutoAPI baseline**
1. Add detailed timing to AutoAPI phases (Reading files, Mapping Data, Rendering Data)
2. Measure time per file, identify slowest files to parse
3. Profile AST parsing with cProfile targeting AutoAPI/astroid modules
4. Baseline: 539 files in ~302s = 0.56s/file average
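   One way to get the per-file numbers above (a sketch; the keyword package path is an assumption and should be adjusted to wherever the generated classes live):

```python
# Sketch of per-file parse timing with astroid, the parser AutoAPI relies on.
# NOTE: KEYWORD_DIR is an assumed location for the generated keyword classes.
import time
from pathlib import Path

import astroid

KEYWORD_DIR = Path("src/ansys/dyna/core/keywords/keyword_classes/auto")

timings = []
for py_file in sorted(KEYWORD_DIR.rglob("*.py")):
    start = time.perf_counter()
    astroid.MANAGER.ast_from_file(str(py_file))  # same parse step AutoAPI performs
    timings.append((time.perf_counter() - start, py_file))

for elapsed, py_file in sorted(timings, reverse=True)[:20]:
    print(f"{elapsed:6.3f}s  {py_file}")
print(f"total: {sum(t for t, _ in timings):.1f}s over {len(timings)} files")
```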

**Phase B: Quick wins (target: 15-25% improvement)**
1. **Intersphinx caching** (~5-10s):
- Pre-download 6 inventories to `doc/_build/intersphinx_cache/`
- Configure `intersphinx_cache_limit = -1`
2. **Optimize AutoAPI options** (~30-60s):
- Add `autoapi_options = ['members', 'show-inheritance']`
- Test `autoapi_python_class_content = 'class'`
- Test `autoapi_member_order = 'bysource'`
3. Document `SPHINXJOBS=1` as optimal (no time savings, just documentation)
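   A sketch of the corresponding `conf.py` settings (these are existing intersphinx / sphinx-autoapi option names; the values are the candidates listed above, not final choices):

```python
# Candidate Phase B settings for doc/source/conf.py; to be validated against build time.
intersphinx_cache_limit = -1  # keep downloaded inventories indefinitely

autoapi_options = ["members", "show-inheritance"]
autoapi_python_class_content = "class"  # class docstring only, skip __init__
autoapi_member_order = "bysource"
```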

**Phase C: Incremental builds (target: 40-60% savings on rebuilds)**
1. Enable `autoapi_keep_files = True` for caching generated RST files
2. Test incremental build after no-op (expect 5x+ speedup: 7.5min → <2min)
3. Test incremental build after single keyword change
4. Document when clean builds needed (config changes)

**Phase D: Deep optimization (target: 20-30% improvement)**
1. Simplify AutoAPI templates in `doc/source/autoapi/`
2. Investigate astroid caching options
3. Test if reducing type annotations helps parsing
4. Consider optimizing keyword class structure (codegen changes)

**Phase E: Alternative approaches (if insufficient)**
1. Pre-built documentation cache (commit AutoAPI RST files)
2. Selective documentation (important vs reference-only classes)
3. Split documentation builds (manual vs auto-keywords)

**Success metrics**:
- Subset build: <3 min (currently 7.5 min) = 60% improvement
- Full build: <10 min (currently 30+ min) = 67% improvement
- Incremental (no changes): <1 min = 85% improvement

6. **Validate on full build** by running optimized configuration against all 3,214 auto-keywords, comparing total time and memory usage to establish that improvements scale, then document optimal build settings in `AGENTS.md` and CI workflows.

78 changes: 66 additions & 12 deletions codegen/generate.py
@@ -98,10 +108,20 @@ def skip_generate_keyword_class(keyword: str) -> bool:
return False


def get_undefined_alias_keywords(keywords_list: typing.List[typing.Dict]) -> typing.List[typing.Dict]:
def get_undefined_alias_keywords(
keywords_list: typing.List[typing.Dict], subset_domains: typing.Optional[typing.List[str]] = None
) -> typing.List[typing.Dict]:
from keyword_generation.utils.domain_mapper import get_keyword_domain

undefined_aliases: typing.List[typing.Dict] = []
for alias, kwd in data_model.ALIAS_TO_KWD.items():
if alias not in [kwd["name"] for kwd in keywords_list]:
# Filter by subset domains if specified
if subset_domains:
domain = get_keyword_domain(alias)
if domain not in subset_domains:
continue

fixed_keyword = fix_keyword(alias).lower()
classname = get_classname(fixed_keyword)
fixed_base_keyword = fix_keyword(kwd).lower()
@@ -251,20 +261,33 @@ def generate_autodoc_file(autodoc_output_path, all_keywords, env):
logger.info(f"Generated index.rst with {len(categories)} category links")


def get_keywords_to_generate(kwd_name: typing.Optional[str] = None) -> typing.List[typing.Dict]:
def get_keywords_to_generate(
kwd_name: typing.Optional[str] = None, subset_domains: typing.Optional[typing.List[str]] = None
) -> typing.List[typing.Dict]:
"""Get keywords to generate. If a kwd name is not none, only generate
it and its generations."""
it and its generations. If subset_domains is provided, only generate keywords
from those domains (e.g., ['boundary', 'contact', 'control'])."""
assert data_model.KWDM_INSTANCE is not None, "KWDM_INSTANCE not initialized"
keywords = []
kwd_list = data_model.KWDM_INSTANCE.get_keywords_list()

# first get all aliases
add_aliases(kwd_list)

# Import domain mapper to properly determine keyword domain
from keyword_generation.utils.domain_mapper import get_keyword_domain

# then get keywords to generate
for keyword in kwd_list:
if kwd_name != None and keyword != kwd_name:
continue

# Filter by subset domains if specified
if subset_domains:
domain = get_keyword_domain(keyword)
if domain not in subset_domains:
continue

for keyword, keyword_options in get_generations(keyword):
item = get_keyword_item(keyword)
item["options"] = keyword_options
@@ -273,16 +296,26 @@ def get_keywords_to_generate(kwd_name: typing.Optional[str] = None) -> typing.Li
return keywords


def generate_classes(lib_path: str, kwd_name: typing.Optional[str] = None, autodoc_output_path: str = "") -> None:
def generate_classes(
lib_path: str,
kwd_name: typing.Optional[str] = None,
autodoc_output_path: str = "",
subset_domains: typing.Optional[typing.List[str]] = None,
) -> None:
"""Generates the keyword classes, importer, and type-mapper
if kwd_name is not None, this only generates that particular keyword class
if subset_domains is not None, only generates keywords from those domains
"""
logger.debug(f"Starting class generation with lib_path={lib_path}, kwd_name={kwd_name}")
logger.debug(
f"Starting class generation with lib_path={lib_path}, kwd_name={kwd_name}, subset_domains={subset_domains}"
)
if subset_domains:
logger.info(f"Subset mode: generating only domains {subset_domains}")
autodoc_entries = []
env = Environment(loader=get_loader(), trim_blocks=True, lstrip_blocks=True)
output_manager = OutputManager(lib_path)
# Generate only requested keyword(s)
keywords_list = get_keywords_to_generate(kwd_name)
keywords_list = get_keywords_to_generate(kwd_name, subset_domains)
logger.info(f"Generating {len(keywords_list)} keyword classes")
generated_count = 0
skipped_count = 0
@@ -304,9 +337,9 @@ def generate_classes(lib_path: str, kwd_name: typing.Optional[str] = None, autod

# Always rewrite autodoc for all keywords
if autodoc_output_path and not kwd_name:
all_keywords = get_keywords_to_generate()
all_keywords = get_keywords_to_generate(subset_domains=subset_domains)
generate_autodoc_file(autodoc_output_path, all_keywords, env)
keywords_list.extend(get_undefined_alias_keywords(keywords_list))
keywords_list.extend(get_undefined_alias_keywords(keywords_list, subset_domains))
if kwd_name == None:
generate_entrypoints(env, output_manager, keywords_list)

@@ -349,23 +382,37 @@ def run_codegen(args):
return
load_inputs(this_folder, args)

# Handle subset domains
subset_domains = None
if args.subset:
subset_domains = [d.strip() for d in args.subset.split(",")]
logger.info(f"Subset mode enabled: generating only {subset_domains} domains")

# Handle autodoc-only mode
if args.autodoc_only:
logger.info("Generating autodoc files only")
env = Environment(loader=get_loader(), trim_blocks=True, lstrip_blocks=True)
all_keywords = get_keywords_to_generate()
all_keywords = get_keywords_to_generate(subset_domains=subset_domains)
generate_autodoc_file(autodoc_path, all_keywords, env)
logger.info("Autodoc generation complete")
return

if args.keyword == "":
kwd = None
logger.info("Generating code for all keywords")
generate_classes(output, autodoc_output_path=autodoc_path)
logger.info(
"Generating code for all keywords" if not subset_domains else f"Generating subset: {subset_domains}"
)
generate_classes(output, autodoc_output_path=autodoc_path, subset_domains=subset_domains)
else:
kwd = args.keyword
logger.info(f"Generating code for {kwd}")
generate_classes(output, kwd, autodoc_output_path=autodoc_path)
generate_classes(output, kwd, autodoc_output_path=autodoc_path, subset_domains=subset_domains)


def parse_args():
@@ -419,6 +466,13 @@ def parse_args():
choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
help="Set the logging level. Defaults to INFO.",
)
parser.add_argument(
"--subset",
"-s",
default="",
help="Generate only a subset of keyword domains (comma-delimited list, e.g., 'boundary,contact,control')."
"Useful for fast iteration during optimization work.",
)
return parser.parse_args()


2 changes: 1 addition & 1 deletion doc/.vale.ini
@@ -17,7 +17,7 @@ SkippedScopes = script, style, pre, figure
WordTemplate = \b(?:%s)\b

# Ignore autogenerated files and internal development notes
IgnoredFiles = **/todo.md, AGENTS.md, agents/**, source/_autosummary/airbag.rst, source/_autosummary/ale.rst, source/_autosummary/battery.rst, source/_autosummary/boundary.rst, source/_autosummary/case.rst, source/_autosummary/cese.rst, source/_autosummary/change.rst, source/_autosummary/chemistry.rst, source/_autosummary/component.rst, source/_autosummary/constrained.rst, source/_autosummary/contact.rst, source/_autosummary/control.rst, source/_autosummary/controller.rst, source/_autosummary/cosim.rst, source/_autosummary/damping.rst, source/_autosummary/database.rst, source/_autosummary/define.rst, source/_autosummary/deformable.rst, source/_autosummary/delete.rst, source/_autosummary/dualcese.rst, source/_autosummary/ef.rst, source/_autosummary/element.rst, source/_autosummary/em.rst, source/_autosummary/eos.rst, source/_autosummary/fatigue.rst, source/_autosummary/frequency.rst, source/_autosummary/icfd.rst, source/_autosummary/iga.rst, source/_autosummary/include.rst, source/_autosummary/index.rst, source/_autosummary/initial.rst, source/_autosummary/integration.rst, source/_autosummary/interface.rst, source/_autosummary/keyword.rst, source/_autosummary/load.rst, source/_autosummary/lso.rst, source/_autosummary/mat.rst, source/_autosummary/mesh.rst, source/_autosummary/module.rst, source/_autosummary/node.rst, source/_autosummary/other.rst, source/_autosummary/parameter.rst, source/_autosummary/part.rst, source/_autosummary/particle.rst, source/_autosummary/perturbation.rst, source/_autosummary/rail.rst, source/_autosummary/rigid.rst, source/_autosummary/rigidwall.rst, source/_autosummary/rve.rst, source/_autosummary/section.rst, source/_autosummary/sensor.rst, source/_autosummary/set.rst
IgnoredFiles = AGENTS.md, agents/**, source/_autosummary/airbag.rst, source/_autosummary/ale.rst, source/_autosummary/battery.rst, source/_autosummary/boundary.rst, source/_autosummary/case.rst, source/_autosummary/cese.rst, source/_autosummary/change.rst, source/_autosummary/chemistry.rst, source/_autosummary/component.rst, source/_autosummary/constrained.rst, source/_autosummary/contact.rst, source/_autosummary/control.rst, source/_autosummary/controller.rst, source/_autosummary/cosim.rst, source/_autosummary/damping.rst, source/_autosummary/database.rst, source/_autosummary/define.rst, source/_autosummary/deformable.rst, source/_autosummary/delete.rst, source/_autosummary/dualcese.rst, source/_autosummary/ef.rst, source/_autosummary/element.rst, source/_autosummary/em.rst, source/_autosummary/eos.rst, source/_autosummary/fatigue.rst, source/_autosummary/frequency.rst, source/_autosummary/icfd.rst, source/_autosummary/iga.rst, source/_autosummary/include.rst, source/_autosummary/index.rst, source/_autosummary/initial.rst, source/_autosummary/integration.rst, source/_autosummary/interface.rst, source/_autosummary/keyword.rst, source/_autosummary/load.rst, source/_autosummary/lso.rst, source/_autosummary/mat.rst, source/_autosummary/mesh.rst, source/_autosummary/module.rst, source/_autosummary/node.rst, source/_autosummary/other.rst, source/_autosummary/parameter.rst, source/_autosummary/part.rst, source/_autosummary/particle.rst, source/_autosummary/perturbation.rst, source/_autosummary/rail.rst, source/_autosummary/rigid.rst, source/_autosummary/rigidwall.rst, source/_autosummary/rve.rst, source/_autosummary/section.rst, source/_autosummary/sensor.rst, source/_autosummary/set.rst

# List of Packages to be used for our guidelines
Packages = Google
8 changes: 6 additions & 2 deletions doc/Makefile
@@ -3,7 +3,8 @@

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS = -j auto
SPHINXJOBS ?= auto
SPHINXOPTS = -j $(SPHINXJOBS)
SPHINXBUILD = sphinx-build
SOURCEDIR = source
BUILDDIR = _build
@@ -24,7 +25,10 @@ keyword_classes:
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@echo "⏱️ Starting Sphinx build with -j $(SPHINXJOBS)..."
@echo "Build started at $$(date '+%Y-%m-%d %H:%M:%S')"
@time $(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@echo "Build finished at $$(date '+%Y-%m-%d %H:%M:%S')"

clean:
rm -rf $(BUILDDIR)/*
1 change: 1 addition & 0 deletions doc/changelog/981.maintenance.md
@@ -0,0 +1 @@
Doc build optimization plan
12 changes: 11 additions & 1 deletion doc/make.bat
@@ -11,6 +11,11 @@ set SOURCEDIR=source
set BUILDDIR=_build
set APIDIR=source\api

REM Set parallel build option (use SPHINXJOBS environment variable or default to auto)
if "%SPHINXJOBS%" == "" (
set SPHINXJOBS=auto
)

if "%1" == "" goto help
if "%1" == "clean" goto clean
if "%1" == "pdf" goto pdf
@@ -28,7 +33,12 @@ if errorlevel 9009 (
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
echo Starting Sphinx build with -j %SPHINXJOBS%...
echo Build started at %TIME%
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% -j %SPHINXJOBS% %SPHINXOPTS% %O%
set BUILD_EXIT_CODE=%ERRORLEVEL%
echo Build finished at %TIME%
exit /b %BUILD_EXIT_CODE%
goto end

:clean