Conversation

@Qubitium Qubitium commented Nov 22, 2025

  1. Add an HF-compatible `hf_select_quant_linear_v2` API.
  2. Separate the AWQ GEMM kernel into `GEMM_TORCH`, `GEMM_CUDA`, and `GEMM_TRITON` backends (see the dispatch sketch below).
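
A minimal sketch of the backend split described in item 2, assuming a simple priority-based dispatch; the enum values come from the PR description, but `select_backend` and its arguments are illustrative, not the PR's actual API.

```python
from enum import Enum

class AwqGemmBackend(Enum):
    GEMM_TORCH = "torch"    # pure-PyTorch fallback, always available
    GEMM_CUDA = "cuda"      # compiled CUDA extension kernel
    GEMM_TRITON = "triton"  # Triton JIT kernel

def select_backend(has_cuda_ext: bool, has_triton: bool) -> AwqGemmBackend:
    # Assumed priority order: compiled CUDA kernel, then Triton, then torch.
    if has_cuda_ext:
        return AwqGemmBackend.GEMM_CUDA
    if has_triton:
        return AwqGemmBackend.GEMM_TRITON
    return AwqGemmBackend.GEMM_TORCH
```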

pytest.skip(f"Triton backend is incompatible: {err}")
return torch.matmul(x, weight.to(x.dtype))

def run_fused_gemm():
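
A self-contained sketch of the skip-or-fallback test pattern the excerpt implies; `reference_gemm`, the test name, and the tensor shapes are assumptions, with only `pytest.skip` and the plain-torch matmul taken from the excerpt.

```python
import pytest
import torch

def reference_gemm(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Plain-torch reference path used when the fused kernel is unavailable.
    return torch.matmul(x, weight.to(x.dtype))

def test_fused_gemm_matches_reference():
    try:
        import triton  # noqa: F401  (hypothetical availability check)
    except ImportError as err:
        pytest.skip(f"Triton backend is incompatible: {err}")
    x = torch.randn(4, 8)
    weight = torch.randn(8, 16)
    out = reference_gemm(x, weight)
    assert out.shape == (4, 16)
```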
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@Qubitium Qubitium marked this pull request as draft November 24, 2025 06:39
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
f"{duration:.3f}",
)

linear_layer = linear_layer.cpu()
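
A hedged sketch of the timing loop behind the `duration` excerpt; `time_forward` and its synchronization choices are assumptions, not the benchmark's actual code.

```python
import time
import torch

def time_forward(layer: torch.nn.Module, x: torch.Tensor) -> float:
    # Synchronize before and after so pending CUDA kernels are fully counted.
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    layer(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return time.perf_counter() - start

# duration = time_forward(linear_layer, x)  # then logged as f"{duration:.3f}"
```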
Comment on lines +148 to +152
```python
# if self.padded_infeatures != self.in_features:
#     self.qweight.resize_(self.padded_infeatures // self.pack_dtype_bits * self.bits, self.out_features)
#     self.qzeros.resize_(
#         math.ceil(self.padded_infeatures / self.group_size),
#         self.out_features // self.pack_dtype_bits * self.bits
```
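
The commented-out resize logic derives packed-tensor shapes from the padded input features. A worked example under assumed values (bits=4, pack_dtype_bits=32, group_size=128; all numbers are illustrative):

```python
import math

bits = 4              # assumed quantization width
pack_dtype_bits = 32  # assumed packing dtype width (int32)
group_size = 128      # assumed quantization group size
padded_infeatures = 4096
out_features = 11008

# qweight packs pack_dtype_bits // bits quantized values into each int32.
qweight_rows = padded_infeatures // pack_dtype_bits * bits   # 512
# qzeros stores one packed zero-point row per quantization group.
qzeros_rows = math.ceil(padded_infeatures / group_size)      # 32
qzeros_cols = out_features // pack_dtype_bits * bits         # 1376
```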
```python
from ...utils.logger import setup_logger

log = setup_logger()

# try_import (defined elsewhere in the package, not shown in this excerpt)
# appears to return the compiled extension (or None) plus a diagnostic message.
awq_ext, msg = try_import("gptqmodel_awq_kernels")
user_has_been_warned = False
```
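
A sketch of how the excerpt's one-time warning flag is typically consumed, assuming `try_import` returns `None` plus a message on failure; `_warn_kernel_fallback` is a hypothetical name.

```python
def _warn_kernel_fallback():
    # Emit the missing-kernel warning at most once per process.
    global user_has_been_warned
    if awq_ext is None and not user_has_been_warned:
        log.warning(f"gptqmodel_awq_kernels unavailable ({msg}); falling back to torch GEMM.")
        user_has_been_warned = True
```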
Qubitium and others added 2 commits November 25, 2025 02:39
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>