hf-model-checker is a command-line tool that analyzes Hugging Face model URLs and recommends the most suitable quantization options based on your system's available resources. By comparing a model's estimated memory requirements against your system's RAM and VRAM, the tool helps you choose a quantization that will fit in memory when loading large machine learning models.
- System Resource Analysis: Detects available RAM and VRAM to determine feasible model quantizations.
- Quantization Recommendations: Uses predefined multipliers to suggest the quantization method that best balances performance and memory usage.
- Comprehensive Reporting: Provides detailed information about the model size, required memory, and recommended quantization in a user-friendly format.
- Supports Multiple Model Formats: Handles .safetensors and .bin models as well as GGUF quantized models, whether the URL points to a whole GGUF repository or to a specific quantization file.
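The README does not say how hf-model-checker detects memory. As a rough illustration only, the sketch below reads free RAM through POSIX os.sysconf and free VRAM by querying nvidia-smi. Both choices are assumptions (the tool may use a different approach), and the helper names available_ram_bytes and free_vram_mib are hypothetical.

```python
import os
import shutil
import subprocess


def available_ram_bytes():
    """Return free physical RAM in bytes, or None if it cannot be read.

    Hypothetical helper. SC_AVPHYS_PAGES exists on Linux but not on every
    platform (e.g. macOS, Windows), so any lookup failure yields None.
    """
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        free_pages = os.sysconf("SC_AVPHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return page_size * free_pages


def free_vram_mib():
    """Return free VRAM in MiB for each NVIDIA GPU, or [] if none is found.

    Hypothetical helper; assumes nvidia-smi is on PATH for NVIDIA systems.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    # One line per GPU, e.g. "23040"; ignore anything that is not a number.
    return [int(line) for line in out.splitlines() if line.strip().isdigit()]


print(available_ram_bytes(), free_vram_mib())
```

On Linux, SC_AVPHYS_PAGES counts only free pages and excludes reclaimable page cache, so this figure tends to underestimate the memory that is actually available.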
- Clone the Repository:

      git clone https://github.com/Adversing/hf-model-checker.git
      cd hf-model-checker

- Set Up a Virtual Environment (Optional but Recommended):

      python3 -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install Dependencies:
  Ensure you have Python 3.8 or higher installed, then run:

      pip install -r requirements.txt
- quant_multipliers.json: Defines a multiplier for each quantization method; the tool uses these multipliers to estimate the RAM required for each quantization type.
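The file's schema is not documented here. The sketch below assumes a flat JSON object mapping quantization names to multipliers of the model's base size. The names and values are illustrative placeholders, not the project's real data, and estimate_requirements and recommend_quantization are hypothetical helpers: the second picks the least-compressed quantization whose estimate fits a memory budget.

```python
import json

# Hypothetical contents of quant_multipliers.json: each value scales the
# model's base size to estimate the memory needed for that quantization.
# The real file's keys and values may differ.
EXAMPLE_MULTIPLIERS_JSON = """
{
  "F16":    1.00,
  "Q8_0":   0.53,
  "Q6_K":   0.41,
  "Q4_K_M": 0.30,
  "Q2_K":   0.17
}
"""


def estimate_requirements(base_size_gb, multipliers):
    """Map each quantization name to its estimated memory need in GB."""
    return {name: base_size_gb * m for name, m in multipliers.items()}


def recommend_quantization(base_size_gb, multipliers, budget_gb):
    """Pick the least-compressed quantization that fits within budget_gb.

    Assumes a larger multiplier means less compression (and usually higher
    quality). Returns None if nothing fits.
    """
    fitting = [
        (need, name)
        for name, need in estimate_requirements(base_size_gb, multipliers).items()
        if need <= budget_gb
    ]
    return max(fitting)[1] if fitting else None


multipliers = json.loads(EXAMPLE_MULTIPLIERS_JSON)
# A ~140 GB FP16 model with a 48 GB budget: Q6_K (57.4 GB) is too big,
# Q4_K_M (42 GB) fits.
print(recommend_quantization(140.0, multipliers, budget_gb=48.0))  # prints Q4_K_M
```

How the budget is derived (RAM alone, VRAM alone, or their sum) is also an assumption; the tool's own rule may differ.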
- Run the Script:
  From the project directory, run:

      python hf_model_checker.py

- Enter a Hugging Face Model URL:
  When prompted, enter the URL of the model you want to analyze (or exit to quit):

      Enter a Hugging Face model URL (or 'exit' to quit):

- Receive Analysis:
  The tool displays an analysis of the model, including its size, the memory it requires, and the recommended quantization.
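The interactive flow above amounts to a simple prompt loop. Here is a minimal sketch. The analyze argument is a hypothetical placeholder for the tool's actual analysis routine, and the input function is injectable so the loop can be exercised without a terminal.

```python
def run_prompt_loop(analyze, read_line=input):
    """Prompt for model URLs until the user types 'exit'.

    `analyze` is a hypothetical stand-in for the real analysis function;
    `read_line` defaults to input() but can be swapped out for testing.
    """
    while True:
        try:
            url = read_line("Enter a Hugging Face model URL (or 'exit' to quit): ").strip()
        except EOFError:
            break  # treat end of input (e.g. Ctrl-D) like 'exit'
        if url.lower() == "exit":  # case-insensitive here; an assumption
            break
        if url:  # ignore empty lines
            analyze(url)
```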
hf-model-checker accepts three types of Hugging Face model URLs. Each type corresponds to different model formats and quantization methods.
Standard model URLs (.safetensors / .bin):
- Description: URLs that point to a model repository containing .safetensors or .bin weight files, the standard Hugging Face checkpoint formats.
- Usage Scenario: When the model is published as standard weights without any specific quantization applied.
- Example URL: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
- Behavior: The tool inspects the repository's weight files and estimates memory requirements from the model's total size.
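For standard repositories, the model's size is essentially the sum of its weight files. The sketch below assumes the file sizes have already been fetched. One option is huggingface_hub's HfApi().model_info(repo_id, files_metadata=True), whose siblings entries expose rfilename and size. Whether hf-model-checker does it this way is not stated, and total_weight_bytes is a hypothetical name.

```python
def total_weight_bytes(file_sizes):
    """Sum the sizes (in bytes) of a repository's weight files.

    `file_sizes` maps repository file names to sizes in bytes. Many repos
    ship the same weights in both formats, so .safetensors files are
    preferred and .bin files are counted only when no .safetensors exist.
    """
    safetensors = [s for name, s in file_sizes.items() if name.endswith(".safetensors")]
    if safetensors:
        return sum(safetensors)
    return sum(s for name, s in file_sizes.items() if name.endswith(".bin"))
```

A fuller version would also skip non-weight .bin files such as training_args.bin.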
Non-specific GGUF URLs:
- Description: URLs that point to a repository containing GGUF quantized versions of a model without specifying a particular quantization variant.
- Usage Scenario: When the repository includes multiple GGUF quantization options and you want the tool to evaluate all of them.
- Example URL: https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF
- Behavior: The tool scans the repository for GGUF files, evaluates each quantization against your system resources, and recommends the most suitable one.
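Evaluating every quantization in a GGUF repository requires grouping files by their quantization tag, since large variants are often split into several parts. This sketch assumes that the tag appears in the file name, as in the example URLs (e.g. Q4_K_M). The regular expression and the helper names gguf_sizes_by_quant and largest_fitting are hypothetical.

```python
import re
from collections import defaultdict

# Matches common llama.cpp-style tags such as Q4_K_M, Q8_0, IQ3_XS, BF16, F16.
# This pattern is an assumption; repositories follow no single naming scheme.
QUANT_TAG = re.compile(
    r"(?:^|[-_.])((?:I?Q\d+(?:_[A-Z0-9]+)*)|BF16|F16|F32)(?=[-.])", re.IGNORECASE
)


def gguf_sizes_by_quant(file_sizes):
    """Group GGUF files by quantization tag and sum their sizes in bytes.

    Split files such as "...-Q8_0-00001-of-00002.gguf" add up under one tag.
    """
    totals = defaultdict(int)
    for name, size in file_sizes.items():
        if not name.lower().endswith(".gguf"):
            continue
        match = QUANT_TAG.search(name.rsplit("/", 1)[-1])  # basename only
        if match:
            totals[match.group(1).upper()] += size
    return dict(totals)


def largest_fitting(sizes_by_quant, budget_bytes):
    """Return the largest quantization whose total size fits the budget, or None."""
    fitting = {q: s for q, s in sizes_by_quant.items() if s <= budget_bytes}
    return max(fitting, key=fitting.get) if fitting else None
```

File size is only a lower bound on the memory needed at run time, since the context (KV cache) adds overhead on top of the weights.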
Specific GGUF URLs:
- Description: URLs that point directly to a specific GGUF quantized model file.
- Usage Scenario: When you have a particular GGUF quantization variant in mind and want to verify that your system can run it.
- Example URL: https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF/blob/main/Llama-3.3-70B-Instruct-Q4_K_M.gguf
- Behavior: The tool analyzes the specified GGUF file, estimates the memory it requires, and indicates whether your system can handle it.
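The three URL types can be distinguished from the URL path plus the repository's file listing. This minimal sketch assumes the file listing has already been retrieved. The function classify_hf_url and its return labels are hypothetical and not part of the tool's documented interface.

```python
from urllib.parse import urlparse

WEIGHT_SUFFIXES = (".safetensors", ".bin")


def classify_hf_url(url, repo_files=()):
    """Return 'specific_gguf', 'gguf_repo', 'standard', or 'unknown'.

    `repo_files` is the list of file names in the repository; fetching it
    is left out of this sketch.
    """
    parsed = urlparse(url)
    if parsed.netloc not in ("huggingface.co", "www.huggingface.co"):
        raise ValueError(f"not a Hugging Face URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"missing owner/repo in URL: {url}")
    # /owner/repo/blob/<revision>/<path> (or /resolve/) points at one file.
    if len(parts) >= 5 and parts[2] in ("blob", "resolve") and parts[-1].lower().endswith(".gguf"):
        return "specific_gguf"
    names = [f.lower() for f in repo_files]
    if any(n.endswith(".gguf") for n in names):
        return "gguf_repo"
    if any(n.endswith(WEIGHT_SUFFIXES) for n in names):
        return "standard"
    return "unknown"
```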
Contributions are welcome! Please follow these steps:
- Fork the Repository
- Create a Feature Branch:

      git checkout -b feature/YourFeature

- Commit Your Changes:

      git commit -m "Add your feature"

- Push to the Branch:

      git push origin feature/YourFeature

- Open a Pull Request
This project is licensed under the MIT License.
