13 changes: 10 additions & 3 deletions AGENTS.md
@@ -3,6 +3,10 @@ Assume an appropriate virtual environment is activated. If it isn't, just abort.

## Agent Hints

**CRITICAL: Ensure the working directory**. Agents sometimes change working directories during a task and forget to change them back.
PyDyna agent documentation and scripts are often sensitive to the working directory, so check the pwd before each agent operation if unsure.

**CRITICAL: Never redirect output to /dev/null on Windows**. This triggers a VS Code security prompt that halts execution. Instead:
- For commands where you want to suppress output: Use `>$null 2>&1` (PowerShell) or just run the command without redirection
- For commands where you want to check output: Use `python codegen/generate.py 2>&1 | Out-Null` or capture in a variable
@@ -13,13 +17,16 @@ Assume an appropriate virtual environment is activated. If it isn't, just abort.
- ✅ GOOD: `python codegen/generate.py` (no redirection)
- ✅ GOOD: `$output = python codegen/generate.py 2>&1`

**Documentation builds**: To build docs without examples, use:
**Documentation builds**: To build docs without examples:

Ensure that the python3.13 environment is active, then:

```bash
# Build docs without examples or autokeywords (fast)
cd doc && BUILD_EXAMPLES=false BUILD_AUTOKEYWORDS_API=false ./make.bat html
BUILD_EXAMPLES=false BUILD_AUTOKEYWORDS_API=false ./doc/make.bat html

# Build docs with autokeywords but no examples (slow, ~8+ min for keyword imports alone)
cd doc && BUILD_EXAMPLES=false BUILD_AUTOKEYWORDS_API=true ./make.bat html
BUILD_EXAMPLES=false BUILD_AUTOKEYWORDS_API=true ./doc/make.bat html
```

## Agent Coding Style Preferences
File renamed without changes.
93 changes: 93 additions & 0 deletions agents/projects/doc-todo.md
@@ -0,0 +1,93 @@
# Plan: Autodoc for generated keywords

## Performance Optimization Plan

**Goal**: Optimize PyDyna documentation build to handle 3,214+ auto-generated keyword classes efficiently.

**Problem**: Full build with auto-keywords takes 30+ minutes and uses excessive RAM. This makes iteration too slow for optimization work.

**Strategy**: Use representative subset for rapid iteration, optimize on subset, then validate on full build.

### Learnings

- **AutoAPI caching**: Must `rm -rf doc/source/api` before subset builds, otherwise Sphinx uses cached docs from previous full build.
- **Subset selection**: boundary (168), contact (155), control (216) = 539 keywords provides good representation (~17% of total).
- **Clean vs incremental**: Incremental builds (no `rm -rf source/api _build`) are 18% faster (154.57s → 126.66s).
- **Parallelization impact**: No scaling observed when increasing SPHINXJOBS from 1 to 4 to 8.

### Steps

1. ✅ **Generate auto keywords for a specific subset using the domain feature from the codegen CLI** (e.g. boundary/*, contact/*, and control/*).
- Implementation: Added `--subset` CLI option to `codegen/generate.py` accepting comma-delimited domains
- Usage: `python codegen/generate.py --subset "boundary,contact,control"`
- Result: Generates 536 classes in 3 domains

2. ✅ **Add timing instrumentation** to `doc/make.bat` and `doc/Makefile` with timestamps before/after each major phase, and modify `conf.py` to add Sphinx event handlers (`autodoc-process-docstring`, `source-read`, etc.) that log phase durations to `doc/_build/timing.log`.
- Implementation: Added `setup()` function in `conf.py` with event handlers for builder-inited, env-get-outdated, env-before-read-docs, doctree-resolved, build-finished
- Output: Logs to `doc/_build/timing.log` with phase breakdown and total document count
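   A minimal sketch of what such a `setup()` hook can look like (the event names are standard Sphinx events; the exact handler set and log format in the repository's `conf.py` may differ):

```python
# Illustrative only: a pared-down version of the timing hooks described above.
import time
from pathlib import Path

_start_times = {}
_LOG = Path(__file__).resolve().parent.parent / "_build" / "timing.log"  # doc/_build/timing.log


def _log(message):
    _LOG.parent.mkdir(parents=True, exist_ok=True)
    with _LOG.open("a") as fh:
        fh.write(f"{time.strftime('%H:%M:%S')} {message}\n")


def _builder_inited(app):
    _start_times["total"] = time.perf_counter()
    _log("phase: builder-inited")


def _before_read_docs(app, env, docnames):
    _start_times["read"] = time.perf_counter()
    _log(f"phase: env-before-read-docs ({len(docnames)} documents)")


def _build_finished(app, exception):
    total = time.perf_counter() - _start_times["total"]
    _log(f"phase: build-finished (total {total:.2f}s)")


def setup(app):
    app.connect("builder-inited", _builder_inited)
    app.connect("env-before-read-docs", _before_read_docs)
    app.connect("build-finished", _build_finished)
```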

3. ✅ **Establish baseline metrics** by running 3 timed builds with the subset: (a) current configuration, (b) with `SPHINXJOBS=4`, (c) with `SPHINXJOBS=8`, measuring total time, memory usage, and time per file to identify optimal parallelization.
- **Baseline metrics (1,461 documents, boundary/contact/control subset)**:
- Clean build (rm -rf source/api _build): ~154s total (41s read-docs, 113s process-doctrees)
- SPHINXJOBS=1: 128.94s total (36.05s read, 92.26s process) - FASTEST
- SPHINXJOBS=4: 147.34s total (40.46s read, 106.38s process)
- SPHINXJOBS=8: 134.88s total (35.74s read, 98.66s process)
- SPHINXJOBS=auto (default): 154.57s total (41.02s read, 113.07s process)
- **Finding**: Serial processing (SPHINXJOBS=1) is fastest for this subset size, likely due to overhead outweighing parallelization benefits

4. ✅ **Profile subset build** using `python -m cProfile -o doc/_build/build.prof` wrapper and `snakeviz` or `pyinstrument` to visualize hotspots, focusing on autodoc import time, docstring processing, and HTML generation phases.
- **Key findings**:
- **Wall-clock time**: 7m22s (442 seconds total)
- **Sphinx-reported time**: 140s (37s read-docs, 102s process-doctrees)
- **Hidden AutoAPI overhead**: ~302 seconds (68% of total time!) before Sphinx even initializes
- **AutoAPI phases**: "Reading files" → "Mapping Data" → "Rendering Data" all happen pre-init
- **Intersphinx**: 6 inventories loaded at startup (python, numpy, matplotlib, imageio, pandas, pytest)
- **CPU vs I/O**: pyinstrument showed 1070s wall-clock vs 198s CPU time - heavily I/O-bound
- **Optimization opportunities**:
1. AutoAPI "Reading files" and "Mapping Data" - 5+ minutes of parsing Python files
2. Intersphinx inventory loading - network I/O at startup
3. Autosummary generation running before AutoAPI
4. Process-doctrees phase (102s) - Jinja2 template compilation, numpydoc processing
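   For reference, a wrapper along these lines reproduces the profile (a sketch only; the builder arguments and output paths are assumptions and may not match the exact commands used for these measurements):

```python
# Hypothetical profiling wrapper; run from the doc/ directory.
import cProfile
import pstats

from sphinx.cmd.build import main as sphinx_main

profiler = cProfile.Profile()
profiler.enable()
sphinx_main(["-b", "html", "source", "_build/html"])  # roughly equivalent to `make html`
profiler.disable()
profiler.dump_stats("_build/build.prof")

# Print the top hotspots; open _build/build.prof with snakeviz for an interactive view.
pstats.Stats("_build/build.prof").sort_stats("cumulative").print_stats(30)
```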

5. **Measure, evaluate, and optimize AutoAPI** - The critical path (68% of build time is AutoAPI overhead)

**Phase A: Measure AutoAPI baseline**
1. Add detailed timing to AutoAPI phases (Reading files, Mapping Data, Rendering Data)
2. Measure time per file, identify slowest files to parse
3. Profile AST parsing with cProfile targeting AutoAPI/astroid modules
4. Baseline: 539 files in ~302s = 0.56s/file average
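   One way to get the per-file numbers above (a sketch; the keyword package path is an assumption and should be adjusted to wherever the generated classes live):

```python
# Sketch of per-file parse timing with astroid, the parser AutoAPI relies on.
# NOTE: KEYWORD_DIR is an assumed location for the generated keyword classes.
import time
from pathlib import Path

import astroid

KEYWORD_DIR = Path("src/ansys/dyna/core/keywords/keyword_classes/auto")

timings = []
for py_file in sorted(KEYWORD_DIR.rglob("*.py")):
    start = time.perf_counter()
    astroid.MANAGER.ast_from_file(str(py_file))  # same parse step AutoAPI performs
    timings.append((time.perf_counter() - start, py_file))

for elapsed, py_file in sorted(timings, reverse=True)[:20]:
    print(f"{elapsed:6.3f}s  {py_file}")
print(f"total: {sum(t for t, _ in timings):.1f}s over {len(timings)} files")
```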

**Phase B: Quick wins (target: 15-25% improvement)**
1. **Intersphinx caching** (~5-10s):
- Pre-download 6 inventories to `doc/_build/intersphinx_cache/`
- Configure `intersphinx_cache_limit = -1`
2. **Optimize AutoAPI options** (~30-60s):
- Add `autoapi_options = ['members', 'show-inheritance']`
- Test `autoapi_python_class_content = 'class'`
- Test `autoapi_member_order = 'bysource'`
3. Document `SPHINXJOBS=1` as optimal (no time savings, just documentation)
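   A sketch of the corresponding `conf.py` settings (these are existing intersphinx / sphinx-autoapi option names; the values are the candidates listed above, not final choices):

```python
# Candidate Phase B settings for doc/source/conf.py; to be validated against build time.
intersphinx_cache_limit = -1  # keep downloaded inventories indefinitely

autoapi_options = ["members", "show-inheritance"]
autoapi_python_class_content = "class"  # class docstring only, skip __init__
autoapi_member_order = "bysource"
```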

**Phase C: Incremental builds (target: 40-60% savings on rebuilds)**
1. Enable `autoapi_keep_files = True` for caching generated RST files
2. Test incremental build after no-op (expect 5x+ speedup: 7.5min → <2min)
3. Test incremental build after single keyword change
4. Document when clean builds needed (config changes)

**Phase D: Deep optimization (target: 20-30% improvement)**
1. Simplify AutoAPI templates in `doc/source/autoapi/`
2. Investigate astroid caching options
3. Test if reducing type annotations helps parsing
4. Consider optimizing keyword class structure (codegen changes)

**Phase E: Alternative approaches (if insufficient)**
1. Pre-built documentation cache (commit AutoAPI RST files)
2. Selective documentation (important vs reference-only classes)
3. Split documentation builds (manual vs auto-keywords)

**Success metrics**:
- Subset build: <3 min (currently 7.5 min) = 60% improvement
- Full build: <10 min (currently 30+ min) = 67% improvement
- Incremental (no changes): <1 min = 85% improvement

6. **Validate on full build** by running optimized configuration against all 3,214 auto-keywords, comparing total time and memory usage to establish that improvements scale, then document optimal build settings in `AGENTS.md` and CI workflows.

78 changes: 66 additions & 12 deletions codegen/generate.py
@@ -98,10 +108,20 @@ def skip_generate_keyword_class(keyword: str) -> bool:
return False


def get_undefined_alias_keywords(keywords_list: typing.List[typing.Dict]) -> typing.List[typing.Dict]:
def get_undefined_alias_keywords(
keywords_list: typing.List[typing.Dict], subset_domains: typing.Optional[typing.List[str]] = None
) -> typing.List[typing.Dict]:
from keyword_generation.utils.domain_mapper import get_keyword_domain

undefined_aliases: typing.List[typing.Dict] = []
for alias, kwd in data_model.ALIAS_TO_KWD.items():
if alias not in [kwd["name"] for kwd in keywords_list]:
# Filter by subset domains if specified
if subset_domains:
domain = get_keyword_domain(alias)
if domain not in subset_domains:
continue

fixed_keyword = fix_keyword(alias).lower()
classname = get_classname(fixed_keyword)
fixed_base_keyword = fix_keyword(kwd).lower()
@@ -251,20 +261,33 @@ def generate_autodoc_file(autodoc_output_path, all_keywords, env):
logger.info(f"Generated index.rst with {len(categories)} category links")


def get_keywords_to_generate(kwd_name: typing.Optional[str] = None) -> typing.List[typing.Dict]:
def get_keywords_to_generate(
kwd_name: typing.Optional[str] = None, subset_domains: typing.Optional[typing.List[str]] = None
) -> typing.List[typing.Dict]:
"""Get keywords to generate. If a kwd name is not none, only generate
it and its generations."""
it and its generations. If subset_domains is provided, only generate keywords
from those domains (e.g., ['boundary', 'contact', 'control'])."""
assert data_model.KWDM_INSTANCE is not None, "KWDM_INSTANCE not initialized"
keywords = []
kwd_list = data_model.KWDM_INSTANCE.get_keywords_list()

# first get all aliases
add_aliases(kwd_list)

# Import domain mapper to properly determine keyword domain
from keyword_generation.utils.domain_mapper import get_keyword_domain

# then get keywords to generate
for keyword in kwd_list:
if kwd_name != None and keyword != kwd_name:
continue

# Filter by subset domains if specified
if subset_domains:
domain = get_keyword_domain(keyword)
if domain not in subset_domains:
continue

for keyword, keyword_options in get_generations(keyword):
item = get_keyword_item(keyword)
item["options"] = keyword_options
@@ -273,16 +296,26 @@ def get_keywords_to_generate(kwd_name: typing.Optional[str] = None) -> typing.Li
return keywords


def generate_classes(lib_path: str, kwd_name: typing.Optional[str] = None, autodoc_output_path: str = "") -> None:
def generate_classes(
lib_path: str,
kwd_name: typing.Optional[str] = None,
autodoc_output_path: str = "",
subset_domains: typing.Optional[typing.List[str]] = None,
) -> None:
"""Generates the keyword classes, importer, and type-mapper
if kwd_name is not None, this only generates that particular keyword class
if subset_domains is not None, only generates keywords from those domains
"""
logger.debug(f"Starting class generation with lib_path={lib_path}, kwd_name={kwd_name}")
logger.debug(
f"Starting class generation with lib_path={lib_path}, kwd_name={kwd_name}, subset_domains={subset_domains}"
)
if subset_domains:
logger.info(f"Subset mode: generating only domains {subset_domains}")
autodoc_entries = []
env = Environment(loader=get_loader(), trim_blocks=True, lstrip_blocks=True)
output_manager = OutputManager(lib_path)
# Generate only requested keyword(s)
keywords_list = get_keywords_to_generate(kwd_name)
keywords_list = get_keywords_to_generate(kwd_name, subset_domains)
logger.info(f"Generating {len(keywords_list)} keyword classes")
generated_count = 0
skipped_count = 0
@@ -304,9 +337,9 @@ def generate_classes(lib_path: str, kwd_name: typing.Optional[str] = None, autod

# Always rewrite autodoc for all keywords
if autodoc_output_path and not kwd_name:
all_keywords = get_keywords_to_generate()
all_keywords = get_keywords_to_generate(subset_domains=subset_domains)
generate_autodoc_file(autodoc_output_path, all_keywords, env)
keywords_list.extend(get_undefined_alias_keywords(keywords_list))
keywords_list.extend(get_undefined_alias_keywords(keywords_list, subset_domains))
if kwd_name == None:
generate_entrypoints(env, output_manager, keywords_list)

@@ -349,23 +382,37 @@ def run_codegen(args):
return
load_inputs(this_folder, args)

# Handle subset domains
subset_domains = None
if args.subset:
subset_domains = [d.strip() for d in args.subset.split(",")]
logger.info(f"Subset mode enabled: generating only {subset_domains} domains")

# Handle autodoc-only mode
if args.autodoc_only:
logger.info("Generating autodoc files only")
env = Environment(loader=get_loader(), trim_blocks=True, lstrip_blocks=True)
all_keywords = get_keywords_to_generate()
all_keywords = get_keywords_to_generate(subset_domains=subset_domains)
generate_autodoc_file(autodoc_path, all_keywords, env)
logger.info("Autodoc generation complete")
return

if args.keyword == "":
kwd = None
logger.info("Generating code for all keywords")
generate_classes(output, autodoc_output_path=autodoc_path)
logger.info(
"Generating code for all keywords" if not subset_domains else f"Generating subset: {subset_domains}"
)
generate_classes(output, autodoc_output_path=autodoc_path, subset_domains=subset_domains)
else:
kwd = args.keyword
logger.info(f"Generating code for {kwd}")
generate_classes(output, kwd, autodoc_output_path=autodoc_path)
generate_classes(output, kwd, autodoc_output_path=autodoc_path, subset_domains=subset_domains)


def parse_args():
@@ -419,6 +466,13 @@ def parse_args():
choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
help="Set the logging level. Defaults to INFO.",
)
parser.add_argument(
"--subset",
"-s",
default="",
help="Generate only a subset of keyword domains (comma-delimited list, e.g., 'boundary,contact,control')."
"Useful for fast iteration during optimization work.",
)
return parser.parse_args()


2 changes: 1 addition & 1 deletion doc/.vale.ini
@@ -17,7 +17,7 @@ SkippedScopes = script, style, pre, figure
WordTemplate = \b(?:%s)\b

# Ignore autogenerated files and internal development notes
IgnoredFiles = **/todo.md, AGENTS.md, agents/**, source/_autosummary/airbag.rst, source/_autosummary/ale.rst, source/_autosummary/battery.rst, source/_autosummary/boundary.rst, source/_autosummary/case.rst, source/_autosummary/cese.rst, source/_autosummary/change.rst, source/_autosummary/chemistry.rst, source/_autosummary/component.rst, source/_autosummary/constrained.rst, source/_autosummary/contact.rst, source/_autosummary/control.rst, source/_autosummary/controller.rst, source/_autosummary/cosim.rst, source/_autosummary/damping.rst, source/_autosummary/database.rst, source/_autosummary/define.rst, source/_autosummary/deformable.rst, source/_autosummary/delete.rst, source/_autosummary/dualcese.rst, source/_autosummary/ef.rst, source/_autosummary/element.rst, source/_autosummary/em.rst, source/_autosummary/eos.rst, source/_autosummary/fatigue.rst, source/_autosummary/frequency.rst, source/_autosummary/icfd.rst, source/_autosummary/iga.rst, source/_autosummary/include.rst, source/_autosummary/index.rst, source/_autosummary/initial.rst, source/_autosummary/integration.rst, source/_autosummary/interface.rst, source/_autosummary/keyword.rst, source/_autosummary/load.rst, source/_autosummary/lso.rst, source/_autosummary/mat.rst, source/_autosummary/mesh.rst, source/_autosummary/module.rst, source/_autosummary/node.rst, source/_autosummary/other.rst, source/_autosummary/parameter.rst, source/_autosummary/part.rst, source/_autosummary/particle.rst, source/_autosummary/perturbation.rst, source/_autosummary/rail.rst, source/_autosummary/rigid.rst, source/_autosummary/rigidwall.rst, source/_autosummary/rve.rst, source/_autosummary/section.rst, source/_autosummary/sensor.rst, source/_autosummary/set.rst
IgnoredFiles = AGENTS.md, agents/**, source/_autosummary/airbag.rst, source/_autosummary/ale.rst, source/_autosummary/battery.rst, source/_autosummary/boundary.rst, source/_autosummary/case.rst, source/_autosummary/cese.rst, source/_autosummary/change.rst, source/_autosummary/chemistry.rst, source/_autosummary/component.rst, source/_autosummary/constrained.rst, source/_autosummary/contact.rst, source/_autosummary/control.rst, source/_autosummary/controller.rst, source/_autosummary/cosim.rst, source/_autosummary/damping.rst, source/_autosummary/database.rst, source/_autosummary/define.rst, source/_autosummary/deformable.rst, source/_autosummary/delete.rst, source/_autosummary/dualcese.rst, source/_autosummary/ef.rst, source/_autosummary/element.rst, source/_autosummary/em.rst, source/_autosummary/eos.rst, source/_autosummary/fatigue.rst, source/_autosummary/frequency.rst, source/_autosummary/icfd.rst, source/_autosummary/iga.rst, source/_autosummary/include.rst, source/_autosummary/index.rst, source/_autosummary/initial.rst, source/_autosummary/integration.rst, source/_autosummary/interface.rst, source/_autosummary/keyword.rst, source/_autosummary/load.rst, source/_autosummary/lso.rst, source/_autosummary/mat.rst, source/_autosummary/mesh.rst, source/_autosummary/module.rst, source/_autosummary/node.rst, source/_autosummary/other.rst, source/_autosummary/parameter.rst, source/_autosummary/part.rst, source/_autosummary/particle.rst, source/_autosummary/perturbation.rst, source/_autosummary/rail.rst, source/_autosummary/rigid.rst, source/_autosummary/rigidwall.rst, source/_autosummary/rve.rst, source/_autosummary/section.rst, source/_autosummary/sensor.rst, source/_autosummary/set.rst

# List of Packages to be used for our guidelines
Packages = Google
8 changes: 6 additions & 2 deletions doc/Makefile
@@ -3,7 +3,8 @@

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS = -j auto
SPHINXJOBS ?= auto
SPHINXOPTS = -j $(SPHINXJOBS)
SPHINXBUILD = sphinx-build
SOURCEDIR = source
BUILDDIR = _build
@@ -24,7 +25,10 @@ keyword_classes:
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@echo "⏱️ Starting Sphinx build with -j $(SPHINXJOBS)..."
@echo "Build started at $$(date '+%Y-%m-%d %H:%M:%S')"
@time $(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@echo "Build finished at $$(date '+%Y-%m-%d %H:%M:%S')"

clean:
rm -rf $(BUILDDIR)/*
1 change: 1 addition & 0 deletions doc/changelog/981.maintenance.md
@@ -0,0 +1 @@
Doc build optimization plan
12 changes: 11 additions & 1 deletion doc/make.bat
@@ -11,6 +11,11 @@ set SOURCEDIR=source
set BUILDDIR=_build
set APIDIR=source\api

REM Set parallel build option (use SPHINXJOBS environment variable or default to auto)
if "%SPHINXJOBS%" == "" (
set SPHINXJOBS=auto
)

if "%1" == "" goto help
if "%1" == "clean" goto clean
if "%1" == "pdf" goto pdf
@@ -28,7 +33,12 @@ if errorlevel 9009 (
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
echo Starting Sphinx build with -j %SPHINXJOBS%...
echo Build started at %TIME%
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% -j %SPHINXJOBS% %SPHINXOPTS% %O%
set BUILD_EXIT_CODE=%ERRORLEVEL%
echo Build finished at %TIME%
exit /b %BUILD_EXIT_CODE%
goto end

:clean