Add language model architecture for inference script #108
Merged
Conversation
* docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. * CUDA header declarations for (LayerNorm) forward and backward (#66) * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces --------- Co-authored-by: Max <eamon5174@gmail.com> * Add CUDA attention kernels, gradient norms, and CI improvements (#69) * exp(#58) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp --------- Co-authored-by: Max <eamon5174@gmail.com> * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv Co-Authored-By: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-Authored-By: Eamon Sippy <eamon112009@gmail.com> * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Max <eamon5174@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * merge branch master (#78) * Add CUDA kernels, optimize CI, and update documentation (#74) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. * CUDA header declarations for (LayerNorm) forward and backward (#66) * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces --------- Co-authored-by: Max <eamon5174@gmail.com> * Add CUDA attention kernels, gradient norms, and CI improvements (#69) * exp(#58) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp --------- Co-authored-by: Max <eamon5174@gmail.com> * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv Co-Authored-By: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-Authored-By: Eamon Sippy <eamon112009@gmail.com> * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Max <eamon5174@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Eamon <eamon112009@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Optimize CI workflow and Docker configurations with refactoring (#72) (#76) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp * docs:report [run_20260530_165216](~791 tok/s) (#60) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs: report [run_20260530_165216] (~791 tok/s) (#62) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 --------- * chore: clang-format configuration file based on LLVM (#63) * ci: add manual PR checks workflow with slash command support * ci: add manual PR checks workflow with slash command support * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. * Refactor core architecture and optimize CUDA features (#75) * exp(#58) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp --------- * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. --------- * Add CUDA kernels, optimize CI, and update documentation (#74) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. * CUDA header declarations for (LayerNorm) forward and backward (#66) * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces --------- * Add CUDA attention kernels, gradient norms, and CI improvements (#69) * exp(#58) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp --------- * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. --------- --------- --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Max <eamon5174@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Refactor .dockerignore for improved clarity Removed unnecessary entries to streamline the build context. * Enhance .gitignore with additional exclusions Expanded .gitignore to include more file types and directories. * Modify training configuration parameters (#80) Updated training parameters including batch size, iterations, evaluation interval, learning rate, and dropout rate. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Max <eamon5174@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add optimize CI, and update documentation (#84) * merge branch master of codeaddict (#77) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. * CUDA header declarations for (LayerNorm) forward and backward (#66) * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum…
* Modify training configuration parameters Updated training parameters including batch size, iterations, evaluation interval, learning rate, and dropout rate. * Refactor training configuration and optimize CI workflows (#81) * Modify training configuration parameters (#80) Updated training parameters including batch size, iterations, evaluation interval, learning rate, and dropout rate. * merge branch master of codeaddict (#77) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. * CUDA header declarations for (LayerNorm) forward and backward (#66) * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces --------- * Add CUDA attention kernels, gradient norms, and CI improvements (#69) * exp(#58) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp --------- * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv Co-Authored-By: Eamon Sippy <eamon112009@gmail.com> * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. --------- * merge branch master (#78) * Add CUDA kernels, optimize CI, and update documentation (#74) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. * CUDA header declarations for (LayerNorm) forward and backward (#66) * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces --------- * Add CUDA attention kernels, gradient norms, and CI improvements (#69) * exp(#58) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp --------- * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv Co-Authored-By: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-Authored-By: Eamon Sippy <eamon112009@gmail.com> * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Max <eamon5174@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Eamon <eamon112009@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Optimize CI workflow and Docker configurations with refactoring (#72) (#76) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp * docs:report [run_20260530_165216](~791 tok/s) (#60) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs: report [run_20260530_165216] (~791 tok/s) (#62) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 --------- * chore: clang-format configuration file based on LLVM (#63) * ci: add manual PR checks workflow with slash command support * ci: add manual PR checks workflow with slash command support * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. * Refactor core architecture and optimize CUDA features (#75) * exp(#58) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp --------- * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. --------- * Add CUDA kernels, optimize CI, and update documentation (#74) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. * CUDA header declarations for (LayerNorm) forward and backward (#66) * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces --------- * Add CUDA attention kernels, gradient norms, and CI improvements (#69) * exp(#58) * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * feat(ci): optimize workflow pipeline and update docker configurations * refactor(ci): optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * refactor : optimize workflow pipeline and update docker configurations * Added MIT LICENSE to this project Quadtrix.cpp * Refactor Dockerfile to use ARG for CUDA version * Refactor Dockerfile for backend dependencies * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * Delete .devops/Dockerfile.frontend * Delete .devops/Dockerfile.dev.frontend * refactor : Dockerfile.backend optimize workflow pipeline * refactor : Dockerfile.backend optimize workflow pipeline * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactored (CI): consolidated manual Docker build jobs into a matrix strategy to reduce duplication * refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS keyframes * refactor : message bubble layout to use inline styles * refactor(ui): complete inline-style migration and update auto-scroll implementation * refactor(ui): complete inline-style migration for MessageAvatar component * refactor(ui): rewrite EmptyState component using pure inline styles * refactored(tensor): vectorize element-wise addition and scalar scaling using AVX/SSE - Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations. - Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions. - Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout. - Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`. * refactor(main): redesign training loop to log per-step and sample during evaluation - Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`). - Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline. - Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows. - Streamlined architecture parameter reporting and consolidated command-line configuration visual prints. * feat: implement GPT training loop with multi-GPU and memory optimizations - Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU. - Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling. - Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options. - Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes. * Update README.md with new banner for qudtrix.cpp --------- * ci: add manual PR checks workflow with slash command support * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- * feat(cuda): add checkpoint metadata struct and stub functions * feat(cuda): introduce core type definitions and error handling utilities - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8). - Implements `dtype_name` and `dtype_size` metadata helper functions. - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation. - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro. * feat(cuda): add TokenBatchView struct and DataLoader stub class * feat(cuda): add GeLU activation forward and backward declarations - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants. - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints. - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`. * feat(cuda): add gradient norm calculation and clipping interfaces * feat(cuda): add LayerNorm forward and backward kernel declarations * refactor(ci): organize workflow into push-triggered QA and manual docker builds Updated CI workflow to restrict branches for push events and improved input descriptions for image selection and push options. * Fix formatting and update CI workflow steps * Enhance CI with macOS binary build and release Added macOS binary build and release steps to CI workflow. * feat(docker): add Dockerfile for frontend application * feat(docker): add Dockerfile for frontend application * refactor(ci): remove release job from GitHub actions * ci: add unified release and docker build workflow * ci: add unified release and docker build workflow * Refactor macOS build workflow for arm64 architecture * Update release workflow to remove macOS x64 build Removed dependency on build-macos-x64 for the release job. * perf: update execution time benchmarks in csv * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * ci(docker): refactor image build workflow and add frontend job * Remove frontend job from Docker Images workflow * Update release workflow to remove s390x and add notes Removed s390x build configurations and added a step to write detailed release notes. * feat: add local orchestration script for frontend and backend servers Introduces a central Python execution script to concurrently manage and orchestrate the development environment for both the frontend and backend. - Detects system OS to invoke correct `npm` and `python` (virtualenv) binary variants. - Verifies existence of the local PyTorch `.pt` model checkpoint before starting. - Configures environment variables dynamically for Uvicorn (FastAPI) and Vite. - Handles cross-origin setups (CORS) linking ports interactively. - Gracefully handles process termination (`Ctrl+C`) by forwarding termination signals. - Automatically launches the frontend application in the system web browser. * chore(deps): bump actions/github-script from 7 to 9 (#71) Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... * feat(cuda): introduce log_message utility and LogLevel enum * feat(cuda): add cuBLAS handle wrapper and matmul operations * feat(cuda): implement core Tensor, TensorShape, and TensorView abstractions * refactor: untie embedding and lm_head weights and to quadtrix * feat(cuda): add NCCL communicator wrapper and all-reduce primitives * Update README.md with workflow badges Added badges for release, package, and CI workflows. * kernels: add AdamW optimization kernel with stochastic rounding Introduces the AdamW fused CUDA kernel including linear interpolation optimizations (`lerp`), multi-slice batching support via 2D grids, and `init_from_master` utility functions for low-precision parameter handling. * cudnn: implement cached SDPA forward graph using cuDNN frontend * feat(cuda): implement Packed128 memory vectorization utilities * feat: add distributed sharded DataLoader for binary token files * feat(multi-gpu): add foundational utilities for ZeRO sharding * feat(utils): add I/O and memory error-checking wrappers * feat : add PyTorch-compatible Mersenne Twister random utilities * README : Enhance README with header and workflow badges Updated README to include a header and badges for release, package, and CI workflows. * utils:`fopenCheck`, `freadCheck`, `fwriteCheck`, `fcloseCheck`, and `fseekCheck` with explicit crash details and project-specific troubleshooting hints. - Add cross-platform socket closure wrappers (`scloseCheck` and `closesocketCheck`) for Linux and Windows. * mfu: add GPU specifications database and utilities for MFU estimation * Modify project title in README.md Changed the project title to include 'llm.cpp' for clarity. * Update README to remove image and clean up content Removed image from README and adjusted formatting. --------- --------- --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Max <eamon5174@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Refactor .dockerignore for improved clarity Removed unnecessary entries to streamline the build context. * Enhance .gitignore with additional exclusions Expanded .gitignore to include more file types and directories. * Modify training configuration parameters (#80) Updated training parameters including batch size, iterations, evaluation interval, learning rate, and dropout rate. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Max <eamon5174@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Max <eamon5174@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update main.py * Delete quadtrix_training_report.png * Delete docker-compose.yml * Delete docker-compose.gpu.yml * Delete docker-compose.dev.yml * Delete benchmark_results.csv * Delete SECURITY.md * refactor: switch tokenizer from gpt2 to tiktoken o200k * Delete contributing.md * Delete CUDA/llmcpp directory * Delete CUDA directory --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Max <eamon5174@gmail.com> Co-authored-by: codeenthusiasm23 <273188204+codeenthusiasm23@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Optimize training configuration and CI workflows (#87) * Optimization (#82) * Modify training configuration parameters Updated training parameters including batch size, iterations, evaluation interval, learning rate, and dropout rate. * Refactor training configuration and optimize CI workflows (#81) * Modify training configuration parameters (#80) Updated training parameters including batch size, iterations, evaluation interval, learning rate, and dropout rate. * merge branch master of codeaddict (#77) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. * CUDA header declarations for (LayerNorm) forward and backward (#66) * feat(cuda): add attention forward backward kernel declarations (#64) * docs: report [run_20260530_165216] (~791 tok/s) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 * docs:report [run_20260530_165216](~791 tok/s) (#61) Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900 Co-authored-by: Max <eamon5174@gmail.com> * feat(cuda): add attention forward and backward kernel declarations Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning. --------- Co-authored-by: Max <eamon5174@gmail.com> …
Updated image links and improved project description.
Added new images to the README and updated project description for clarity.
Bumps [hadolint/hadolint-action](https://github.com/hadolint/hadolint-action) from 3.1.0 to 3.3.0. - [Release notes](https://github.com/hadolint/hadolint-action/releases) - [Commits](hadolint/hadolint-action@v3.1.0...v3.3.0) --- updated-dependencies: - dependency-name: hadolint/hadolint-action dependency-version: 3.3.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [pydantic](https://github.com/pydantic/pydantic) from 2.10.4 to 2.13.4. - [Release notes](https://github.com/pydantic/pydantic/releases) - [Changelog](https://github.com/pydantic/pydantic/blob/main/HISTORY.md) - [Commits](pydantic/pydantic@v2.10.4...v2.13.4) --- updated-dependencies: - dependency-name: pydantic dependency-version: 2.13.4 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Update .gitattributes to mark frontend/** as linguist-vendored and add a submodule stanza for PyTorch (path: pytorch, url: https://github.com/pytorch/pytorch). This records the PyTorch submodule location and ensures frontend files are treated as vendored by GitHub linguist.
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. - [Release notes](https://github.com/docker/metadata-action/releases) - [Commits](docker/metadata-action@v5...v6) --- updated-dependencies: - dependency-name: docker/metadata-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Eamon Sippy <eamon112009@gmail.com>
Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](docker/login-action@v3...v4) --- updated-dependencies: - dependency-name: docker/login-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/docker/login-action/releases">docker/login-action's releases</a>.</em></p> <blockquote> <h2>v4.0.0</h2> <ul> <li>Node 24 as default runtime (requires <a href="https://github.com/actions/runner/releases/tag/v2.327.1">Actions Runner v2.327.1</a> or later) by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/login-action/pull/929">docker/login-action#929</a></li> <li>Switch to ESM and update config/test wiring by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/login-action/pull/927">docker/login-action#927</a></li> <li>Bump <code>@actions/core</code> from 1.11.1 to 3.0.0 in <a href="https://redirect.github.com/docker/login-action/pull/919">docker/login-action#919</a></li> <li>Bump <code>@aws-sdk/client-ecr</code> from 3.890.0 to 3.1000.0 in <a href="https://redirect.github.com/docker/login-action/pull/909">docker/login-action#909</a> <a href="https://redirect.github.com/docker/login-action/pull/920">docker/login-action#920</a></li> <li>Bump <code>@aws-sdk/client-ecr-public</code> from 3.890.0 to 3.1000.0 in <a href="https://redirect.github.com/docker/login-action/pull/909">docker/login-action#909</a> <a href="https://redirect.github.com/docker/login-action/pull/920">docker/login-action#920</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.63.0 to 0.77.0 in <a href="https://redirect.github.com/docker/login-action/pull/910">docker/login-action#910</a> <a href="https://redirect.github.com/docker/login-action/pull/928">docker/login-action#928</a></li> <li>Bump <code>@isaacs/brace-expansion</code> from 5.0.0 to 5.0.1 in <a href="https://redirect.github.com/docker/login-action/pull/921">docker/login-action#921</a></li> <li>Bump js-yaml from 4.1.0 to 4.1.1 in <a href="https://redirect.github.com/docker/login-action/pull/901">docker/login-action#901</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/login-action/compare/v3.7.0...v4.0.0">https://github.com/docker/login-action/compare/v3.7.0...v4.0.0</a></p> <h2>v3.7.0</h2> <ul> <li>Add <code>scope</code> input to set scopes for the authentication token by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/login-action/pull/912">docker/login-action#912</a></li> <li>Add support for AWS European Sovereign Cloud ECR by <a href="https://github.com/dphi"><code>@dphi</code></a> in <a href="https://redirect.github.com/docker/login-action/pull/914">docker/login-action#914</a></li> <li>Ensure passwords are redacted with <code>registry-auth</code> input by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/login-action/pull/911">docker/login-action#911</a></li> <li>build(deps): bump lodash from 4.17.21 to 4.17.23 in <a href="https://redirect.github.com/docker/login-action/pull/915">docker/login-action#915</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/login-action/compare/v3.6.0...v3.7.0">https://github.com/docker/login-action/compare/v3.6.0...v3.7.0</a></p> <h2>v3.6.0</h2> <ul> <li>Add <code>registry-auth</code> input for raw authentication to registries by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/login-action/pull/887">docker/login-action#887</a></li> <li>Bump <code>@aws-sdk/client-ecr</code> to 3.890.0 in <a href="https://redirect.github.com/docker/login-action/pull/882">docker/login-action#882</a> <a href="https://redirect.github.com/docker/login-action/pull/890">docker/login-action#890</a></li> <li>Bump <code>@aws-sdk/client-ecr-public</code> to 3.890.0 in <a href="https://redirect.github.com/docker/login-action/pull/882">docker/login-action#882</a> <a href="https://redirect.github.com/docker/login-action/pull/890">docker/login-action#890</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.62.1 to 0.63.0 in <a href="https://redirect.github.com/docker/login-action/pull/883">docker/login-action#883</a></li> <li>Bump brace-expansion from 1.1.11 to 1.1.12 in <a href="https://redirect.github.com/docker/login-action/pull/880">docker/login-action#880</a></li> <li>Bump undici from 5.28.4 to 5.29.0 in <a href="https://redirect.github.com/docker/login-action/pull/879">docker/login-action#879</a></li> <li>Bump tmp from 0.2.3 to 0.2.4 in <a href="https://redirect.github.com/docker/login-action/pull/881">docker/login-action#881</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/login-action/compare/v3.5.0...v3.6.0">https://github.com/docker/login-action/compare/v3.5.0...v3.6.0</a></p> <h2>v3.5.0</h2> <ul> <li>Support dual-stack endpoints for AWS ECR by <a href="https://github.com/Spacefish"><code>@Spacefish</code></a> <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/login-action/pull/874">docker/login-action#874</a> <a href="https://redirect.github.com/docker/login-action/pull/876">docker/login-action#876</a></li> <li>Bump <code>@aws-sdk/client-ecr</code> to 3.859.0 in <a href="https://redirect.github.com/docker/login-action/pull/860">docker/login-action#860</a> <a href="https://redirect.github.com/docker/login-action/pull/878">docker/login-action#878</a></li> <li>Bump <code>@aws-sdk/client-ecr-public</code> to 3.859.0 in <a href="https://redirect.github.com/docker/login-action/pull/860">docker/login-action#860</a> <a href="https://redirect.github.com/docker/login-action/pull/878">docker/login-action#878</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.57.0 to 0.62.1 in <a href="https://redirect.github.com/docker/login-action/pull/870">docker/login-action#870</a></li> <li>Bump form-data from 2.5.1 to 2.5.5 in <a href="https://redirect.github.com/docker/login-action/pull/875">docker/login-action#875</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/login-action/compare/v3.4.0...v3.5.0">https://github.com/docker/login-action/compare/v3.4.0...v3.5.0</a></p> <h2>v3.4.0</h2> <ul> <li>Bump <code>@actions/core</code> from 1.10.1 to 1.11.1 in <a href="https://redirect.github.com/docker/login-action/pull/791">docker/login-action#791</a></li> <li>Bump <code>@aws-sdk/client-ecr</code> to 3.766.0 in <a href="https://redirect.github.com/docker/login-action/pull/789">docker/login-action#789</a> <a href="https://redirect.github.com/docker/login-action/pull/856">docker/login-action#856</a></li> <li>Bump <code>@aws-sdk/client-ecr-public</code> to 3.758.0 in <a href="https://redirect.github.com/docker/login-action/pull/789">docker/login-action#789</a> <a href="https://redirect.github.com/docker/login-action/pull/856">docker/login-action#856</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.35.0 to 0.57.0 in <a href="https://redirect.github.com/docker/login-action/pull/801">docker/login-action#801</a> <a href="https://redirect.github.com/docker/login-action/pull/806">docker/login-action#806</a> <a href="https://redirect.github.com/docker/login-action/pull/858">docker/login-action#858</a></li> <li>Bump cross-spawn from 7.0.3 to 7.0.6 in <a href="https://redirect.github.com/docker/login-action/pull/814">docker/login-action#814</a></li> <li>Bump https-proxy-agent from 7.0.5 to 7.0.6 in <a href="https://redirect.github.com/docker/login-action/pull/823">docker/login-action#823</a></li> <li>Bump path-to-regexp from 6.2.2 to 6.3.0 in <a href="https://redirect.github.com/docker/login-action/pull/777">docker/login-action#777</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/login-action/compare/v3.3.0...v3.4.0">https://github.com/docker/login-action/compare/v3.3.0...v3.4.0</a></p> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/docker/login-action/commit/650006c6eb7dba73a995cc03b0b2d7f5ca915bee"><code>650006c</code></a> Merge pull request <a href="https://redirect.github.com/docker/login-action/issues/960">#960</a> from docker/dependabot/npm_and_yarn/aws-sdk-dependenc...</li> <li><a href="https://github.com/docker/login-action/commit/99df1a3f6d65e48177ea57671a50e2242eae4b63"><code>99df1a3</code></a> chore: update generated content</li> <li><a href="https://github.com/docker/login-action/commit/3ab375f324f46da5f6901efeda4be4e2566ebaa2"><code>3ab375f</code></a> build(deps): bump the aws-sdk-dependencies group across 1 directory with 2 up...</li> <li><a href="https://github.com/docker/login-action/commit/39d85804ae465a1816c68ff58158ec66883981b4"><code>39d8580</code></a> Merge pull request <a href="https://redirect.github.com/docker/login-action/issues/970">#970</a> from docker/dependabot/npm_and_yarn/docker/actions-to...</li> <li><a href="https://github.com/docker/login-action/commit/4eefcd33ca7213989697445a78b6730274bfaba6"><code>4eefcd3</code></a> chore: update generated content</li> <li><a href="https://github.com/docker/login-action/commit/56d092c8b3f04006c22f4fc20a2b3d2442caed56"><code>56d092c</code></a> build(deps): bump <code>@docker/actions-toolkit</code> from 0.86.0 to 0.90.0</li> <li><a href="https://github.com/docker/login-action/commit/e2e31ca87063ae00fd41ad3b9c548dd8ec24c5ff"><code>e2e31ca</code></a> Merge pull request <a href="https://redirect.github.com/docker/login-action/issues/976">#976</a> from docker/dependabot/npm_and_yarn/actions/core-3.0.1</li> <li><a href="https://github.com/docker/login-action/commit/0bced941e843afc786fbfd58b1c6c13ca11e09c9"><code>0bced94</code></a> chore: update generated content</li> <li><a href="https://github.com/docker/login-action/commit/3e75a0f266b07e09777a621d0ca5f4432ef9f10c"><code>3e75a0f</code></a> build(deps): bump <code>@actions/core</code> from 3.0.0 to 3.0.1</li> <li><a href="https://github.com/docker/login-action/commit/365bebd9d646160567ebad47824f026e09ee6970"><code>365bebd</code></a> Merge pull request <a href="https://redirect.github.com/docker/login-action/issues/984">#984</a> from docker/dependabot/github_actions/aws-actions/con...</li> <li>Additional commits viewable in <a href="https://github.com/docker/login-action/compare/v3...v4">compare view</a></li> </ul> </details> <br />
Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3 to 4. - [Release notes](https://github.com/docker/setup-qemu-action/releases) - [Commits](docker/setup-qemu-action@v3...v4) --- updated-dependencies: - dependency-name: docker/setup-qemu-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3 to 4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/docker/setup-qemu-action/releases">docker/setup-qemu-action's releases</a>.</em></p> <blockquote> <h2>v4.0.0</h2> <ul> <li>Node 24 as default runtime (requires <a href="https://github.com/actions/runner/releases/tag/v2.327.1">Actions Runner v2.327.1</a> or later) by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/245">docker/setup-qemu-action#245</a></li> <li>Switch to ESM and update config/test wiring by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/241">docker/setup-qemu-action#241</a></li> <li>Bump <code>@actions/core</code> from 1.11.1 to 3.0.0 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/244">docker/setup-qemu-action#244</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.67.0 to 0.77.0 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/243">docker/setup-qemu-action#243</a></li> <li>Bump <code>@isaacs/brace-expansion</code> from 5.0.0 to 5.0.1 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/240">docker/setup-qemu-action#240</a></li> <li>Bump js-yaml from 3.14.1 to 3.14.2 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/231">docker/setup-qemu-action#231</a></li> <li>Bump lodash from 4.17.21 to 4.17.23 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/238">docker/setup-qemu-action#238</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-qemu-action/compare/v3.7.0...v4.0.0">https://github.com/docker/setup-qemu-action/compare/v3.7.0...v4.0.0</a></p> <h2>v3.7.0</h2> <ul> <li>Bump <code>@docker/actions-toolkit</code> from 0.56.0 to 0.67.0 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/217">docker/setup-qemu-action#217</a> <a href="https://redirect.github.com/docker/setup-qemu-action/pull/230">docker/setup-qemu-action#230</a></li> <li>Bump brace-expansion from 1.1.11 to 1.1.12 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/220">docker/setup-qemu-action#220</a></li> <li>Bump form-data from 2.5.1 to 2.5.5 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/218">docker/setup-qemu-action#218</a></li> <li>Bump tmp from 0.2.3 to 0.2.4 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/221">docker/setup-qemu-action#221</a></li> <li>Bump undici from 5.28.4 to 5.29.0 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/219">docker/setup-qemu-action#219</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-qemu-action/compare/v3.6.0...v3.7.0">https://github.com/docker/setup-qemu-action/compare/v3.6.0...v3.7.0</a></p> <h2>v3.6.0</h2> <ul> <li>Display binfmt version by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/202">docker/setup-qemu-action#202</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-qemu-action/compare/v3.5.0...v3.6.0">https://github.com/docker/setup-qemu-action/compare/v3.5.0...v3.6.0</a></p> <h2>v3.5.0</h2> <ul> <li>Bump <code>@docker/actions-toolkit</code> from 0.54.0 to 0.56.0 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/205">docker/setup-qemu-action#205</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-qemu-action/compare/v3.4.0...v3.5.0">https://github.com/docker/setup-qemu-action/compare/v3.4.0...v3.5.0</a></p> <h2>v3.4.0</h2> <ul> <li>Bump <code>@docker/actions-toolkit</code> from 0.49.0 to 0.54.0 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/193">docker/setup-qemu-action#193</a> <a href="https://redirect.github.com/docker/setup-qemu-action/pull/197">docker/setup-qemu-action#197</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-qemu-action/compare/v3.3.0...v3.4.0">https://github.com/docker/setup-qemu-action/compare/v3.3.0...v3.4.0</a></p> <h2>v3.3.0</h2> <ul> <li>Add <code>cache-image</code> input to enable/disable caching of binfmt image by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/130">docker/setup-qemu-action#130</a></li> <li>Bump <code>@actions/core</code> from 1.10.1 to 1.11.1 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/172">docker/setup-qemu-action#172</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.35.0 to 0.49.0 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/187">docker/setup-qemu-action#187</a></li> <li>Bump cross-spawn from 7.0.3 to 7.0.6 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/182">docker/setup-qemu-action#182</a></li> <li>Bump path-to-regexp from 6.2.2 to 6.3.0 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/162">docker/setup-qemu-action#162</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-qemu-action/compare/v3.2.0...v3.3.0">https://github.com/docker/setup-qemu-action/compare/v3.2.0...v3.3.0</a></p> <h2>v3.2.0</h2> <ul> <li>Bump <code>@docker/actions-toolkit</code> from 0.31.0 to 0.35.0 in <a href="https://redirect.github.com/docker/setup-qemu-action/pull/154">docker/setup-qemu-action#154</a> <a href="https://redirect.github.com/docker/setup-qemu-action/pull/155">docker/setup-qemu-action#155</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-qemu-action/compare/v3.1.0...v3.2.0">https://github.com/docker/setup-qemu-action/compare/v3.1.0...v3.2.0</a></p> <h2>v3.1.0</h2> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/docker/setup-qemu-action/commit/06116385d9baf250c9f4dcb4858b16962ea869c3"><code>0611638</code></a> Merge pull request <a href="https://redirect.github.com/docker/setup-qemu-action/issues/21">#21</a> from crazy-max/uninst</li> <li><a href="https://github.com/docker/setup-qemu-action/commit/ce59c818a5ff16552ddf7407ee7cb00bea682925"><code>ce59c81</code></a> chore: update generated content</li> <li><a href="https://github.com/docker/setup-qemu-action/commit/2ddad4401e17fa807e8a3c4bd289ccdd993f0868"><code>2ddad44</code></a> uninstall current emulators</li> <li><a href="https://github.com/docker/setup-qemu-action/commit/8c37cd6f3456e1f3f3026250eac496709e9e7e10"><code>8c37cd6</code></a> Merge pull request <a href="https://redirect.github.com/docker/setup-qemu-action/issues/250">#250</a> from docker/dependabot/npm_and_yarn/docker/actions-to...</li> <li><a href="https://github.com/docker/setup-qemu-action/commit/d1a0ff34af591b8e290e46f3fa114ef5bb81cd1c"><code>d1a0ff3</code></a> chore: update generated content</li> <li><a href="https://github.com/docker/setup-qemu-action/commit/0a8f3dc12541cc2c3b19c182a1a2c90a2c8b8d93"><code>0a8f3dc</code></a> build(deps): bump <code>@docker/actions-toolkit</code> from 0.79.0 to 0.91.0</li> <li><a href="https://github.com/docker/setup-qemu-action/commit/9430f61a7691bd1bfdc4d6ba70e558659d36fa7a"><code>9430f61</code></a> Merge pull request <a href="https://redirect.github.com/docker/setup-qemu-action/issues/291">#291</a> from docker/dependabot/npm_and_yarn/tmp-0.2.6</li> <li><a href="https://github.com/docker/setup-qemu-action/commit/978bd7796cb6698377e7af6726b726e5ced642d0"><code>978bd77</code></a> chore: update generated content</li> <li><a href="https://github.com/docker/setup-qemu-action/commit/3479febc62cc0fbcb98c7c7fc0dac778c0d79d6a"><code>3479feb</code></a> build(deps): bump tmp from 0.2.5 to 0.2.6</li> <li><a href="https://github.com/docker/setup-qemu-action/commit/b113c264143c28c2974bed61af25be32d32f4782"><code>b113c26</code></a> Merge pull request <a href="https://redirect.github.com/docker/setup-qemu-action/issues/255">#255</a> from docker/dependabot/npm_and_yarn/fast-xml-parser-5...</li> <li>Additional commits viewable in <a href="https://github.com/docker/setup-qemu-action/compare/v3...v4">compare view</a></li> </ul> </details> <br />
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](docker/setup-buildx-action@v3...v4) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@v4...v7) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Updated README to clarify project structure and usage instructions.
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/docker/setup-buildx-action/releases">docker/setup-buildx-action's releases</a>.</em></p> <blockquote> <h2>v4.0.0</h2> <ul> <li>Node 24 as default runtime (requires <a href="https://github.com/actions/runner/releases/tag/v2.327.1">Actions Runner v2.327.1</a> or later) by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/483">docker/setup-buildx-action#483</a></li> <li>Remove deprecated inputs/outputs by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/464">docker/setup-buildx-action#464</a></li> <li>Switch to ESM and update config/test wiring by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/481">docker/setup-buildx-action#481</a></li> <li>Bump <code>@actions/core</code> from 1.11.1 to 3.0.0 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/475">docker/setup-buildx-action#475</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.63.0 to 0.79.0 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/482">docker/setup-buildx-action#482</a> <a href="https://redirect.github.com/docker/setup-buildx-action/pull/485">docker/setup-buildx-action#485</a></li> <li>Bump js-yaml from 4.1.0 to 4.1.1 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/452">docker/setup-buildx-action#452</a></li> <li>Bump lodash from 4.17.21 to 4.17.23 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/472">docker/setup-buildx-action#472</a></li> <li>Bump minimatch from 3.1.2 to 3.1.5 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/480">docker/setup-buildx-action#480</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-buildx-action/compare/v3.12.0...v4.0.0">https://github.com/docker/setup-buildx-action/compare/v3.12.0...v4.0.0</a></p> <h2>v3.12.0</h2> <ul> <li>Deprecate <code>install</code> input by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/455">docker/setup-buildx-action#455</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.62.1 to 0.63.0 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/434">docker/setup-buildx-action#434</a></li> <li>Bump brace-expansion from 1.1.11 to 1.1.12 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/436">docker/setup-buildx-action#436</a></li> <li>Bump form-data from 2.5.1 to 2.5.5 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/432">docker/setup-buildx-action#432</a></li> <li>Bump undici from 5.28.4 to 5.29.0 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/435">docker/setup-buildx-action#435</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-buildx-action/compare/v3.11.1...v3.12.0">https://github.com/docker/setup-buildx-action/compare/v3.11.1...v3.12.0</a></p> <h2>v3.11.1</h2> <ul> <li>Fix <code>keep-state</code> not being respected by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/429">docker/setup-buildx-action#429</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-buildx-action/compare/v3.11.0...v3.11.1">https://github.com/docker/setup-buildx-action/compare/v3.11.0...v3.11.1</a></p> <h2>v3.11.0</h2> <ul> <li>Keep BuildKit state support by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/427">docker/setup-buildx-action#427</a></li> <li>Remove aliases created when installing by default by <a href="https://github.com/hashhar"><code>@hashhar</code></a> in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/139">docker/setup-buildx-action#139</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.56.0 to 0.62.1 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/422">docker/setup-buildx-action#422</a> <a href="https://redirect.github.com/docker/setup-buildx-action/pull/425">docker/setup-buildx-action#425</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-buildx-action/compare/v3.10.0...v3.11.0">https://github.com/docker/setup-buildx-action/compare/v3.10.0...v3.11.0</a></p> <h2>v3.10.0</h2> <ul> <li>Bump <code>@docker/actions-toolkit</code> from 0.54.0 to 0.56.0 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/408">docker/setup-buildx-action#408</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-buildx-action/compare/v3.9.0...v3.10.0">https://github.com/docker/setup-buildx-action/compare/v3.9.0...v3.10.0</a></p> <h2>v3.9.0</h2> <ul> <li>Bump <code>@docker/actions-toolkit</code> from 0.48.0 to 0.54.0 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/402">docker/setup-buildx-action#402</a> <a href="https://redirect.github.com/docker/setup-buildx-action/pull/404">docker/setup-buildx-action#404</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-buildx-action/compare/v3.8.0...v3.9.0">https://github.com/docker/setup-buildx-action/compare/v3.8.0...v3.9.0</a></p> <h2>v3.8.0</h2> <ul> <li>Make cloud prefix optional to download buildx if driver is cloud by <a href="https://github.com/crazy-max"><code>@crazy-max</code></a> in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/390">docker/setup-buildx-action#390</a></li> <li>Bump <code>@actions/core</code> from 1.10.1 to 1.11.1 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/370">docker/setup-buildx-action#370</a></li> <li>Bump <code>@docker/actions-toolkit</code> from 0.39.0 to 0.48.0 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/389">docker/setup-buildx-action#389</a></li> <li>Bump cross-spawn from 7.0.3 to 7.0.6 in <a href="https://redirect.github.com/docker/setup-buildx-action/pull/382">docker/setup-buildx-action#382</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/docker/setup-buildx-action/compare/v3.7.1...v3.8.0">https://github.com/docker/setup-buildx-action/compare/v3.7.1...v3.8.0</a></p> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/docker/setup-buildx-action/commit/d7f5e7f509e45cec5c76c4d5afdd7de93d0b3df5"><code>d7f5e7f</code></a> Merge pull request <a href="https://redirect.github.com/docker/setup-buildx-action/issues/489">#489</a> from docker/dependabot/npm_and_yarn/docker/actions-to...</li> <li><a href="https://github.com/docker/setup-buildx-action/commit/92bc5c9777806d0a73d9d668ba2114fa1177f164"><code>92bc5c9</code></a> chore: update generated content</li> <li><a href="https://github.com/docker/setup-buildx-action/commit/da11e35abee0f20cb4f1c1b7c461d37c29be52f5"><code>da11e35</code></a> build(deps): bump <code>@docker/actions-toolkit</code> from 0.79.0 to 0.90.0</li> <li><a href="https://github.com/docker/setup-buildx-action/commit/f021e162ef95b6fba51af1c6674f537f25bce851"><code>f021e16</code></a> Merge pull request <a href="https://redirect.github.com/docker/setup-buildx-action/issues/492">#492</a> from docker/dependabot/npm_and_yarn/undici-6.24.1</li> <li><a href="https://github.com/docker/setup-buildx-action/commit/b5af94fab700aee0c64d6077e0e34ae987815b67"><code>b5af94f</code></a> chore: update generated content</li> <li><a href="https://github.com/docker/setup-buildx-action/commit/16ad9776a801d0c47f0a05f007b88a3789aa8ab6"><code>16ad977</code></a> build(deps): bump undici from 6.23.0 to 6.25.0</li> <li><a href="https://github.com/docker/setup-buildx-action/commit/d7a12d7df895b33bd02a9b4bf62a12f2b9a24458"><code>d7a12d7</code></a> Merge pull request <a href="https://redirect.github.com/docker/setup-buildx-action/issues/495">#495</a> from docker/dependabot/npm_and_yarn/glob-10.5.0</li> <li><a href="https://github.com/docker/setup-buildx-action/commit/28ff27de4eed7518d361591f2cd1dfb69c34a7cb"><code>28ff27d</code></a> build(deps): bump glob from 10.3.12 to 13.0.6</li> <li><a href="https://github.com/docker/setup-buildx-action/commit/daf436b50e13d9053b9730cbc16516891878b019"><code>daf436b</code></a> Merge pull request <a href="https://redirect.github.com/docker/setup-buildx-action/issues/496">#496</a> from docker/dependabot/npm_and_yarn/fast-xml-parser-5...</li> <li><a href="https://github.com/docker/setup-buildx-action/commit/9725348367859764880f2f2e688a6b0c353e3f35"><code>9725348</code></a> chore: update generated content</li> <li>Additional commits viewable in <a href="https://github.com/docker/setup-buildx-action/compare/v3...v4">compare view</a></li> </ul> </details> <br />
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's releases</a>.</em></p> <blockquote> <h2>v7.0.0</h2> <h2>v7 What's new</h2> <h3>Direct Uploads</h3> <p>Adds support for uploading single files directly (unzipped). Callers can set the new <code>archive</code> parameter to <code>false</code> to skip zipping the file during upload. Right now, we only support single files. The action will fail if the glob passed resolves to multiple files. The <code>name</code> parameter is also ignored with this setting. Instead, the name of the artifact will be the name of the uploaded file.</p> <h3>ESM</h3> <p>To support new versions of the <code>@actions/*</code> packages, we've upgraded the package to ESM.</p> <h2>What's Changed</h2> <ul> <li>Add proxy integration test by <a href="https://github.com/Link"><code>@Link</code></a>- in <a href="https://redirect.github.com/actions/upload-artifact/pull/754">actions/upload-artifact#754</a></li> <li>Upgrade the module to ESM and bump dependencies by <a href="https://github.com/danwkennedy"><code>@danwkennedy</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/762">actions/upload-artifact#762</a></li> <li>Support direct file uploads by <a href="https://github.com/danwkennedy"><code>@danwkennedy</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/764">actions/upload-artifact#764</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/Link"><code>@Link</code></a>- made their first contribution in <a href="https://redirect.github.com/actions/upload-artifact/pull/754">actions/upload-artifact#754</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v6...v7.0.0">https://github.com/actions/upload-artifact/compare/v6...v7.0.0</a></p> <h2>v6.0.0</h2> <h2>v6 - What's new</h2> <blockquote> <p>[!IMPORTANT] actions/upload-artifact@v6 now runs on Node.js 24 (<code>runs.using: node24</code>) and requires a minimum Actions Runner version of 2.327.1. If you are using self-hosted runners, ensure they are updated before upgrading.</p> </blockquote> <h3>Node.js 24</h3> <p>This release updates the runtime to Node.js 24. v5 had preliminary support for Node.js 24, however this action was by default still running on Node.js 20. Now this action by default will run on Node.js 24.</p> <h2>What's Changed</h2> <ul> <li>Upload Artifact Node 24 support by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/719">actions/upload-artifact#719</a></li> <li>fix: update <code>@actions/artifact</code> for Node.js 24 punycode deprecation by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/744">actions/upload-artifact#744</a></li> <li>prepare release v6.0.0 for Node.js 24 support by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/745">actions/upload-artifact#745</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v5.0.0...v6.0.0">https://github.com/actions/upload-artifact/compare/v5.0.0...v6.0.0</a></p> <h2>v5.0.0</h2> <h2>What's Changed</h2> <p><strong>BREAKING CHANGE:</strong> this update supports Node <code>v24.x</code>. This is not a breaking change per-se but we're treating it as such.</p> <ul> <li>Update README.md by <a href="https://github.com/GhadimiR"><code>@GhadimiR</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/681">actions/upload-artifact#681</a></li> <li>Update README.md by <a href="https://github.com/nebuk89"><code>@nebuk89</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/712">actions/upload-artifact#712</a></li> <li>Readme: spell out the first use of GHES by <a href="https://github.com/danwkennedy"><code>@danwkennedy</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/727">actions/upload-artifact#727</a></li> <li>Update GHES guidance to include reference to Node 20 version by <a href="https://github.com/patrikpolyak"><code>@patrikpolyak</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/725">actions/upload-artifact#725</a></li> <li>Bump <code>@actions/artifact</code> to <code>v4.0.0</code></li> <li>Prepare <code>v5.0.0</code> by <a href="https://github.com/danwkennedy"><code>@danwkennedy</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/734">actions/upload-artifact#734</a></li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/actions/upload-artifact/commit/043fb46d1a93c77aae656e7c1c64a875d1fc6a0a"><code>043fb46</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/797">#797</a> from actions/yacaovsnc/update-dependency</li> <li><a href="https://github.com/actions/upload-artifact/commit/634250c1388765ea7ed0f053e636f1f399000b94"><code>634250c</code></a> Include changes in typespec/ts-http-runtime 0.3.5</li> <li><a href="https://github.com/actions/upload-artifact/commit/e454baaac2be505c9450e11b8f3215c6fc023ce8"><code>e454baa</code></a> Readme: bump all the example versions to v7 (<a href="https://redirect.github.com/actions/upload-artifact/issues/796">#796</a>)</li> <li><a href="https://github.com/actions/upload-artifact/commit/74fad66b98a6d799dc004d3353ccd0e6f6b2530e"><code>74fad66</code></a> Update the readme with direct upload details (<a href="https://redirect.github.com/actions/upload-artifact/issues/795">#795</a>)</li> <li><a href="https://github.com/actions/upload-artifact/commit/bbbca2ddaa5d8feaa63e36b76fdaad77386f024f"><code>bbbca2d</code></a> Support direct file uploads (<a href="https://redirect.github.com/actions/upload-artifact/issues/764">#764</a>)</li> <li><a href="https://github.com/actions/upload-artifact/commit/589182c5a4cec8920b8c1bce3e2fab1c97a02296"><code>589182c</code></a> Upgrade the module to ESM and bump dependencies (<a href="https://redirect.github.com/actions/upload-artifact/issues/762">#762</a>)</li> <li><a href="https://github.com/actions/upload-artifact/commit/47309c993abb98030a35d55ef7ff34b7fa1074b5"><code>47309c9</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/754">#754</a> from actions/Link-/add-proxy-integration-tests</li> <li><a href="https://github.com/actions/upload-artifact/commit/02a8460834e70dab0ce194c64360c59dc1475ef0"><code>02a8460</code></a> Add proxy integration test</li> <li><a href="https://github.com/actions/upload-artifact/commit/b7c566a772e6b6bfb58ed0dc250532a479d7789f"><code>b7c566a</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/745">#745</a> from actions/upload-artifact-v6-release</li> <li><a href="https://github.com/actions/upload-artifact/commit/e516bc8500aaf3d07d591fcd4ae6ab5f9c391d5b"><code>e516bc8</code></a> docs: correct description of Node.js 24 support in README</li> <li>Additional commits viewable in <a href="https://github.com/actions/upload-artifact/compare/v4...v7">compare view</a></li> </ul> </details> <br />
Bumps [pydantic](https://github.com/pydantic/pydantic) from 2.10.4 to 2.13.4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/pydantic/pydantic/releases">pydantic's releases</a>.</em></p> <blockquote> <h2>v2.13.4 2026-05-06</h2> <h2>v2.13.4 (2026-05-06)</h2> <h3>What's Changed</h3> <h4>Packaging</h4> <ul> <li>Bump libc from 0.2.155 to 0.2.185 by <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13109">#13109</a></li> <li>Adapt <code>pydantic-core</code> linker flags on macOS by <a href="https://github.com/washingtoneg"><code>@washingtoneg</code></a> and <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13147">#13147</a></li> </ul> <h4>Fixes</h4> <ul> <li>Preserve <code>RootModel</code> core metadata by <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13129">#13129</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/pydantic/pydantic/compare/v2.13.3...v2.13.4">https://github.com/pydantic/pydantic/compare/v2.13.3...v2.13.4</a></p> <h2>v2.13.3 2026-04-20</h2> <h2>v2.13.3 (2026-04-20)</h2> <h3>What's Changed</h3> <h4>Fixes</h4> <ul> <li>Handle <code>AttributeError</code> subclasses with <code>from_attributes</code> by <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13096">#13096</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/pydantic/pydantic/compare/v2.13.2...v2.13.3">https://github.com/pydantic/pydantic/compare/v2.13.2...v2.13.3</a></p> <h2>v2.13.2 2026-04-17</h2> <h2>v2.13.2 (2026-04-17)</h2> <h3>What's Changed</h3> <h4>Fixes</h4> <ul> <li>Fix <code>ValidationInfo.field_name</code> missing with <code>model_validate_json()</code> by <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13084">#13084</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/pydantic/pydantic/compare/v2.13.1...v2.13.2">https://github.com/pydantic/pydantic/compare/v2.13.1...v2.13.2</a></p> <h2>v2.13.1 2026-04-15</h2> <h2>v2.13.1 (2026-04-15)</h2> <h3>What's Changed</h3> <h4>Fixes</h4> <ul> <li>Fix <code>ValidationInfo.data</code> missing with <code>model_validate_json()</code> by <a href="https://github.com/davidhewitt"><code>@davidhewitt</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13079">#13079</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/pydantic/pydantic/compare/v2.13.0...v2.13.1">https://github.com/pydantic/pydantic/compare/v2.13.0...v2.13.1</a></p> <h2>v2.13.0 2026-04-13</h2> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/pydantic/pydantic/blob/main/HISTORY.md">pydantic's changelog</a>.</em></p> <blockquote> <h2>v2.13.4 (2026-05-06)</h2> <p><a href="https://github.com/pydantic/pydantic/releases/tag/v2.13.4">GitHub release</a></p> <h3>What's Changed</h3> <h4>Packaging</h4> <ul> <li>Bump libc from 0.2.155 to 0.2.185 by <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13109">#13109</a></li> <li>Adapt <code>pydantic-core</code> linker flags on macOS by <a href="https://github.com/washingtoneg"><code>@washingtoneg</code></a> and <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13147">#13147</a></li> </ul> <h4>Fixes</h4> <ul> <li>Preserve <code>RootModel</code> core metadata by <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13129">#13129</a></li> </ul> <h2>v2.13.3 (2026-04-20)</h2> <p><a href="https://github.com/pydantic/pydantic/releases/tag/v2.13.3">GitHub release</a></p> <h3>What's Changed</h3> <h4>Fixes</h4> <ul> <li>Handle <code>AttributeError</code> subclasses with <code>from_attributes</code> by <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13096">#13096</a></li> </ul> <h2>v2.13.2 (2026-04-17)</h2> <p><a href="https://github.com/pydantic/pydantic/releases/tag/v2.13.2">GitHub release</a></p> <h3>What's Changed</h3> <h4>Fixes</h4> <ul> <li>Fix <code>ValidationInfo.field_name</code> missing with <code>model_validate_json()</code> by <a href="https://github.com/Viicos"><code>@Viicos</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13084">#13084</a></li> </ul> <h2>v2.13.1 (2026-04-15)</h2> <p><a href="https://github.com/pydantic/pydantic/releases/tag/v2.13.1">GitHub release</a></p> <h3>What's Changed</h3> <h4>Fixes</h4> <ul> <li>Fix <code>ValidationInfo.data</code> missing with <code>model_validate_json()</code> by <a href="https://github.com/davidhewitt"><code>@davidhewitt</code></a> in <a href="https://redirect.github.com/pydantic/pydantic/pull/13079">#13079</a></li> </ul> <h2>v2.13.0 (2026-04-13)</h2> <p><a href="https://github.com/pydantic/pydantic/releases/tag/v2.13.0">GitHub release</a></p> <p>The highlights of the v2.13 release are available in the <a href="https://pydantic.dev/articles/pydantic-v2-13-release">blog post</a>.</p> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/pydantic/pydantic/commit/cf67d4b3193c3fe43ede18612ed62785eee11382"><code>cf67d4b</code></a> Fix linting</li> <li><a href="https://github.com/pydantic/pydantic/commit/f0d8a214a5803036db46a56b1f62f1e56b81d662"><code>f0d8a21</code></a> Prepare release v2.13.4</li> <li><a href="https://github.com/pydantic/pydantic/commit/5e3fe1d41a00f441204241c66078003ae0391f9a"><code>5e3fe1d</code></a> Check for pydantic tag pattern in CI</li> <li><a href="https://github.com/pydantic/pydantic/commit/7f9edcc2a191d2eaa9751220eb910914e716a686"><code>7f9edcc</code></a> Document tagging conventions</li> <li><a href="https://github.com/pydantic/pydantic/commit/b46a0c9b8a4dd967fda8ec1a92f6437076bf262c"><code>b46a0c9</code></a> Adapt <code>pydantic-core</code> linker flags on macOS</li> <li><a href="https://github.com/pydantic/pydantic/commit/50629c851e61d887d5420452c311ec6203f1f400"><code>50629c8</code></a> Update to PyPy 7.3.22</li> <li><a href="https://github.com/pydantic/pydantic/commit/8522ebb71e5e9a6f7188af5f009f01785b8cf725"><code>8522ebb</code></a> Preserve <code>RootModel</code> core metadata</li> <li><a href="https://github.com/pydantic/pydantic/commit/a37f3aff090ca342dc5f48304889963530b993f8"><code>a37f3af</code></a> Adapt <code>MISSING</code> sentinel test to work with unreleased <code>typing_extensions</code> ver...</li> <li><a href="https://github.com/pydantic/pydantic/commit/909259a9df660518033aa686b689f045a6eaf9d2"><code>909259a</code></a> Remove Logfire example in documentation</li> <li><a href="https://github.com/pydantic/pydantic/commit/2c4174c366606fc2dc46cb806833a080aefa77df"><code>2c4174c</code></a> Bump libc from 0.2.155 to 0.2.185</li> <li>Additional commits viewable in <a href="https://github.com/pydantic/pydantic/compare/v2.10.4...v2.13.4">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details>
Bumps [redis](https://github.com/redis/redis-py) from 5.2.1 to 8.0.1. - [Release notes](https://github.com/redis/redis-py/releases) - [Changelog](https://github.com/redis/redis-py/blob/master/CHANGES) - [Commits](redis/redis-py@v5.2.1...v8.0.1) --- updated-dependencies: - dependency-name: redis dependency-version: 8.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Corrected grammar and phrasing in the project description.
Owner
Author
|
/run-checks |
|
✅ All checks passed! |
Owner
Author
|
label python |
|
✅ All checks passed! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Tokenizer Upgrade: Integrated tiktoken (using the "gpt2" encoding scheme) for robust vocabulary handling and text tokenization, updating vocab_size, encode(), and decode() steps accordingly.
Hyperparameter Configuration: Standardized initial model parameters directly within the module (block_size = 12, n_embd = 64, n_head = 4, n_layer = 4) to align with downstream training constraints.