[Feature]: Support different quantization bit widths for the same model on Mac and GPU

### Is your feature request related to a problem?

Using the same quantization bit width on every device may leave GPU memory underutilized. There is also a problem with the Qwen3-next-fp8 model.

### Describe the Solution you'd like

Allow Mac and GPU devices to use different parameter byte sizes (quantization bit widths) for the same model.
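
To illustrate the idea, here is a rough sketch of per-device selection. It is not based on the project's actual code: `Device`, `pick_param_bytes`, and the candidate list are hypothetical names, and the estimate counts only weight memory (no KV cache, activations, or sharding).

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical candidate precisions in bytes per parameter (16-bit, 8-bit, 4-bit).
CANDIDATE_BYTES = (2.0, 1.0, 0.5)


@dataclass
class Device:
    name: str         # hypothetical label, e.g. "mac" or "gpu"
    memory_gb: float  # memory assumed to be available for weights


def pick_param_bytes(
    device: Device,
    num_params_billion: float,
    candidates: Sequence[float] = CANDIDATE_BYTES,
) -> Optional[float]:
    """Return the largest bytes-per-parameter whose weights fit, or None."""
    for param_bytes in sorted(candidates, reverse=True):
        # 1e9 parameters at `param_bytes` bytes each is `param_bytes` GB (decimal).
        weights_gb = num_params_billion * param_bytes
        if weights_gb <= device.memory_gb:
            return param_bytes
    return None


if __name__ == "__main__":
    for dev in (Device("gpu", 80.0), Device("mac", 48.0)):
        print(dev.name, pick_param_bytes(dev, num_params_billion=70))
```

With a selection like this, each device could load the same model at whatever precision its memory allows instead of one width shared by all devices.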

### Alternatives Considered (Optional)

_No response_

### Additional Context (Optional)

_No response_