Develop by YWHyuk · Pull Request #221 · PSAL-POSTECH/PyTorchSim

YWHyuk · 2026-04-07T04:20:42Z

Changelog — `develop` → `master`

TOGSim (simulator)

Memory backend: updated to Ramulator 2.1.
Config format: Configuration files have migrated from JSON to YAML format.
Stats & robustness: Clearer DRAM bandwidth reporting, safer idle-stat handling, fixes for local/remote memory stats.
Scheduling: Internal graph API cleanup (non-breaking, no user-facing API changes). Trace files support comments; improved CLI help.
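The JSON-to-YAML config migration listed above can be scripted. The following is a minimal sketch, not the project's own converter: it uses only the Python standard library, writes nested objects as block-style YAML mappings, and writes everything else (scalars, lists, empty objects) in JSON syntax, which YAML accepts as flow style. The keys in the example (`num_cores`, `dram`, `channels`, `verbose`) are hypothetical and are not actual TOGSim options.

```python
import json
import re

# Keys matching this pattern are written unquoted; anything else is quoted.
_PLAIN_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def to_yaml_lines(mapping, indent=0):
    """Yield block-style YAML lines for a (possibly nested) dict."""
    pad = "  " * indent
    for key, value in mapping.items():
        name = key if _PLAIN_KEY.fullmatch(key) else json.dumps(key)
        if isinstance(value, dict) and value:
            # Non-empty object: open a nested block mapping.
            yield f"{pad}{name}:"
            yield from to_yaml_lines(value, indent + 1)
        else:
            # Scalars, lists, and empty objects are written in JSON syntax.
            yield f"{pad}{name}: {json.dumps(value)}"


def json_to_yaml(json_text):
    """Convert the text of a JSON object into YAML text."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    if not data:
        return "{}\n"
    return "\n".join(to_yaml_lines(data)) + "\n"


# Hypothetical config; these keys are illustrative only.
example = '{"num_cores": 2, "dram": {"type": "simple", "channels": [0, 1]}, "verbose": false}'
yaml_text = json_to_yaml(example)
print(yaml_text, end="")
```

Where PyYAML is available, `yaml.safe_dump(json.load(f))` is likely the more robust route; the sketch above only avoids the extra dependency.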

Compiler & runtime (PyTorchSim / MLIR)

PyTorch version: 2.1 → 2.8 (PyTorch version update #196)
Operators: SDPA can now be routed to a dedicated NPU kernel via torch.nn.attention.sdpa_kernel([SDPBackend.FLASH_ATTENTION]) context manager; TopK, Bitonic sort, Cat added. ([BUG]Support for repeat_interleave operation to enable Grouped Query Attention (GQA) #198)
CNNs: MobileNet CI and 1×1 spatial conv as linear; baseline group convolution decomposition + tests. ([BUG] Cannot schedule MobileNet-SSDLite model #205)
Dtypes / codegen: Fixed float16 codegen in MLIR templates; worked around gem5 lmul8 widening issue by avoiding the problematic vector-width in codegen.
TOGSim session: Run kernels inside a `with TOGSimulator(config_path=...):` block so that the config and simulator lifecycle are scoped to that block.
Multi-tenant launch: Call `torch.npu.launch_model(opt_fn, *args, stream_index=..., timestamp=..., **kwargs)` inside that block.
Cleanup: Removed legacy scheduler code; standardized on the TOGSimulator-oriented API.
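The session and multi-tenant launch items above describe a call shape that can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the session object and the launch function are passed in as parameters, and the `fake_session` and `fake_launch_model` stand-ins exist only so the sketch runs without PyTorchSim installed. The return value of the launch call, the meaning and units of `timestamp`, and the `config.yml` path are all assumptions.

```python
from contextlib import contextmanager


def launch_all(session, launch_model, jobs, config_path):
    """Launch every job inside a single simulator session.

    `session` and `launch_model` are injected so that the real
    TOGSimulator and torch.npu.launch_model can be passed in
    where PyTorchSim is available.
    """
    results = []
    with session(config_path=config_path):
        for opt_fn, args, stream_index, timestamp in jobs:
            results.append(
                launch_model(opt_fn, *args, stream_index=stream_index, timestamp=timestamp)
            )
    return results


# --- Stand-ins (hypothetical) so the sketch runs on its own ---
events = []


@contextmanager
def fake_session(config_path):
    events.append(("open", config_path))
    try:
        yield
    finally:
        events.append(("close", config_path))


def fake_launch_model(opt_fn, *args, stream_index, timestamp, **kwargs):
    events.append(("launch", stream_index, timestamp))
    return opt_fn(*args, **kwargs)


# Two tenants on different streams; timestamp values are arbitrary.
jobs = [
    (lambda x: x + 1, (1,), 0, 0),
    (lambda x: x * 2, (5,), 1, 100),
]
results = launch_all(fake_session, fake_launch_model, jobs, "config.yml")
print(results)
print(events)
```

With PyTorchSim installed, the same helper would presumably be called as `launch_all(TOGSimulator, torch.npu.launch_model, jobs, config_path)`; the import path for `TOGSimulator` is not given here, and test_scheduler.py in the repository is the authoritative usage reference.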

Device (OpenReg / NPU)

Device API: Use torch.device("npu") (and torch.device("npu:0"), etc.) like any built-in device type — no extra package import beyond import torch; the NPU backend registers with PyTorch's device system.
Eager mode: CPU fallback is applied automatically when graph compilation is not available.

⚠️ Breaking Changes

Config format migration: Configuration files must be converted from JSON to YAML format. Existing .json config files are no longer supported.
Multi-tenant API redesign: The scheduler-based multi-tenant launch pattern has been replaced. The old API required manual Scheduler instantiation, Request object construction, and a `while not scheduler.is_finished():` loop. The new API uses a `with TOGSimulator(config_path=...):` context and direct `torch.npu.launch_model(..., stream_index=..., timestamp=...)` calls. See test_scheduler.py for the updated usage pattern.

CI, tests, experiments

Added or tightened tests for DeepSeek, YOLOv5, MobileNet; CI image updated for PyTorch 2.8.

Other

Misc. codegen, indexing, and matmul-related bugfixes and small refactors.


- Remove cell execution timestamps from metadata
- Simplify path setup: remove base_dir/sys.path.append, use absolute paths
- Replace extension_config.CONFIG_TOGSIM_CONFIG with direct config paths
- Update log file paths to latest run timestamps
- Adjust tensor sizes and minor wording fixes


YWHyuk added 30 commits December 5, 2025 13:05

[Frontend] Use ops instead of raw assembly code

0d4ae79

[Test] Add matmul vector fusion case

bea9bd2

[Frontend] Fix ops conversion

837b062

[Frontend] Use custom malloc in the validation wrapper code

a33659a

[Device] Add missing operations

4e2d0a0

[Frontend] Add typecasting for logical operation

6e70edc

[Device] register amp

54f450a

[Frontend+Test] Support scatter pattern with a test case

8985ab8

[Fix] minor bugs

1c2c8bf

[Fix] Fix the access to wrong variable

1895958

[Log] Add print lock to prevent log crash

cd14109

[Device] Add custom zero_, zeros_like

5fe87e9

[Frontend/Spike] Use 64byte aligned buffer size

db18cbd

[Refactor] Separate OpOverrides

1152428

[Test] Add Llama1&2 test cases

8452f5c

[TOGSim] Add error handling

00cd8c7

[Scheduler] Use given config file for compilations

a8d96cd

[Fix/ops] Fix wrong implementation of sigmoid

8aac3ab

[Tests] Use manual mask for Llama

fd6a846

[TOGSim] Use YAML instead of json

dea7f47

[Frontend] Use YAML config file instead of json

d66df91

[Test] Change attention mask for Llama

dce58d0

[Autotune] Fix autotune log path

1c2ab36

[Fix] Fix codegen error in ops.select

20af550

Merge pull request #164 from PSAL-POSTECH/ops

2276450

[Frontend] Use ops instead of raw assembly code

[Tutorial] Update environment setting for the tutorial

c39c3a3

[Tutorial] Add tutorial env setting scripts

8678fe6

[Tutorial] Change format of config files to yml

0a5d0e7

[Tutorial] Fix typo dockerfile

008cf4c

[Tutorial] Fix wrong config name

18d7bab

YWHyuk added 3 commits April 17, 2026 14:57

[Autotune] subprocess timeouts from first finite-cycle wall time

0993319

[Autotune] Add non-subtiling option in tile_candidates

46c4954

[Lower] Add filter condition

fbe0bc0

YWHyuk force-pushed the develop branch from 07c6911 to fbe0bc0 on April 17, 2026 05:57

YunseonShin and others added 25 commits April 20, 2026 01:29

[Tutorial] ispass2026 session1

901f93e

[Tutorial] Add guideline for a hands-on

174b3cc

[Tutorial] Add CI for tutorial image

6d64afa

[Tutorial] Add missing script

0043b01

[Tutorial] Fix paths in Dockerfile for gem5 and PyTorchSimDevice

c83d321

[Tutorial] fix

50e210c

[Config] derive req size, freq, peak BW from Ramulator2; simplify simple DRAM YAML

24062d1

[TOGSim/Log] Improve simulator log clarity and wording

67d87ce

[Tutorial] Update session2 jupyter notebook

28745d6

[Tutorial] Fix ramulator config path

3cdfb7c

[Doc] update README for v1.1.0 release

9df9b07

[Config] Tighten Ramulator2 config output and log raw on-disk file

69ce680

- gen_configs: use JSONEncoder to emit more compact JSON (regenerated yaml files)
- Simulator: read Ramulator2 config with ifstream and log text instead of YAML::Dump

[CI] run cycle + speedup in one job

9bfd11b

Merge branch 'master' into develop

84a2aad

[Doc] Update docker image tag

300b1cd

[Tutorial] build image with PyTorchSim at triggering commit

73467df

[TOGSim] Fix booksim config path

edf8b44

[Config] Fix ramulator config

a60cacc

[Tutorial] change param *_out to *_output

794c5f3

[Tutorial] Set default shell as bash

81ce7c3

[Tutorial] Separate log analysis hands-on

b00a967

[CI] Fix speedup script + add missing file

614aa2c

[Doc] Update artifact evaluation explanation in README

1b1fa6d

[Doc] Enhance README clarity and examples

48c4979

YWHyuk merged commit f595cef into master Apr 25, 2026
202 of 204 checks passed

YWHyuk merged 203 commits into master from develop


6 participants
