
Add support for HuggingFace GGUF models in Ollama #3

Merged
neoneye merged 1 commit into PlanExeOrg:main from FeelTheFonk:feature/gguf_support on Feb 26, 2025

@FeelTheFonk
Contributor

Add support for HuggingFace GGUF models in Ollama

Description

This PR adds support for running GGUF models directly from HuggingFace through Ollama using the hf.co/ prefix in model configurations. This enables users to leverage a wider range of models without requiring manual model installation.

Changes

  • Updated OllamaInfo.is_model_available() to recognize HuggingFace GGUF model paths
  • Added documentation and examples for using GGUF models
  • Added test case for GGUF model path validation

Example Configuration

{
    "lmstudio-qwen2.5-7b-instruct-1m-gguf": {
        "comment": "This runs on your own computer via Ollama using GGUF models from HuggingFace.",
        "class": "Ollama",
        "arguments": {
            "model": "hf.co/lmstudio-community/Qwen2.5-7B-Instruct-1M-GGUF:Q6_K",
            "temperature": 0.5,
            "request_timeout": 120.0,
            "is_function_calling_model": false
        }
    }
}
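
As a rough illustration of how a configuration like the one above could be inspected, here is a minimal sketch that parses the JSON and lists the entries whose model uses the hf.co/ prefix. The helper name `huggingface_models` is hypothetical and is not part of PlanExe; the sketch only assumes the JSON layout shown in this PR.

```python
import json
from typing import Dict

# Trimmed copy of the example configuration from this PR.
CONFIG_TEXT = """
{
    "lmstudio-qwen2.5-7b-instruct-1m-gguf": {
        "class": "Ollama",
        "arguments": {
            "model": "hf.co/lmstudio-community/Qwen2.5-7B-Instruct-1M-GGUF:Q6_K",
            "temperature": 0.5,
            "request_timeout": 120.0,
            "is_function_calling_model": false
        }
    }
}
"""


def huggingface_models(config_text: str) -> Dict[str, str]:
    """Hypothetical helper: map entry name -> model string for
    Ollama entries whose model starts with the hf.co/ prefix."""
    config = json.loads(config_text)
    result = {}
    for name, entry in config.items():
        if entry.get("class") != "Ollama":
            continue
        model = entry.get("arguments", {}).get("model", "")
        if model.startswith("hf.co/"):
            result[name] = model
    return result


if __name__ == "__main__":
    for name, model in huggingface_models(CONFIG_TEXT).items():
        print(f"{name}: {model}")
```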

Updated OllamaInfo.is_model_available() to recognize HuggingFace GGUF model paths
@neoneye
Member

neoneye commented Feb 24, 2025

I'm curious about your setup. I imagine you have uploaded your SSH key to Hugging Face. And when you run the model, does the inference happen on Hugging Face?

Can you talk on PlanExe Discord?

@FeelTheFonk
Contributor Author

Hey! To download models from Hugging Face, you simply need a valid HF_TOKEN set as an environment variable locally. There's no need to use huggingface_cli or any other pip library—model inference is handled locally via Ollama rather than on Hugging Face’s servers.

I'll try to join your discord server when I can, thank you for your attention!

@neoneye
Member

neoneye commented Feb 25, 2025

I can't make sense of the find_model.startswith("hf.co/") check. Why is it needed?

I have tried recreating your scenario using hf.co, and I can run models locally. However, I'm unable to run models on HF.

I was unaware that ollama could fetch GGUFs directly from HF, thanks. I have updated the docs roughly describing how to fetch GGUF models. Let me know if the docs can be improved further.
https://github.com/neoneye/PlanExe/blob/main/extra/ollama.md

@FeelTheFonk
Contributor Author

The find_model.startswith("hf.co/") check is needed to distinguish between standard Ollama models and models that should be downloaded directly from HuggingFace. This prefix tells your system to fetch the GGUF model from HuggingFace instead of looking for it in the local Ollama repository.
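
To make the check concrete, here is a minimal sketch of the logic as described in this thread. It is not the actual `OllamaInfo.is_model_available()` method, whose signature is not shown here; the free function and its `local_models` parameter are assumptions made for illustration.

```python
from typing import List


def is_model_available(find_model: str, local_models: List[str]) -> bool:
    """Sketch of the availability check discussed in this PR.

    `local_models` is a hypothetical stand-in for the model names
    already known to the local Ollama installation.
    """
    # HuggingFace GGUF references are treated as available, since Ollama
    # is expected to download them on demand rather than find them locally.
    if find_model.startswith("hf.co/"):
        return True
    # Standard Ollama models must already be present locally.
    return find_model in local_models


if __name__ == "__main__":
    local = ["llama3.1:latest"]
    print(is_model_available("hf.co/unsloth/Llama-3.1-Tulu-3-8B-GGUF:Q4_K_M", local))
    print(is_model_available("llama3.1:latest", local))
    print(is_model_available("mistral:latest", local))
```

One consequence of this design is that a mistyped hf.co/ path also passes the check, so any error would only surface later, when Ollama tries to fetch the model.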

I think there's a misunderstanding in your documentation. When using HuggingFace GGUF models with the hf.co/ prefix, you must specify a specific quantization version (like :Q4_K_M, :Q6_K, etc.) and cannot use :latest. The :latest syntax only works with standard Ollama models, not with HuggingFace GGUF models.

For example:

  • Correct: hf.co/unsloth/Llama-3.1-Tulu-3-8B-GGUF:Q4_K_M (or any other versions/LLM)
  • Incorrect: hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:latest

The confusion might be because your ollama list output shows a model with :latest, but this likely won't work properly when actually trying to run inference. For HuggingFace GGUF models, you need to specify the exact quantization version you want to use.
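
The naming recommendation above can be sketched as a small validator. The helper names `split_hf_model` and `has_explicit_quantization` are hypothetical, and the rule encoded here is only the one stated in this comment (an explicit quantization tag, not :latest); whether Ollama itself enforces it is not confirmed in this thread.

```python
from typing import Optional, Tuple

HF_PREFIX = "hf.co/"


def split_hf_model(model: str) -> Tuple[str, Optional[str]]:
    """Split 'hf.co/<user>/<repo>:<tag>' into (repo path, tag or None)."""
    if not model.startswith(HF_PREFIX):
        raise ValueError(f"not a HuggingFace model reference: {model!r}")
    path = model[len(HF_PREFIX):]
    repo, sep, tag = path.partition(":")
    return repo, (tag if sep else None)


def has_explicit_quantization(model: str) -> bool:
    """True when the reference carries a tag other than 'latest',
    following the recommendation in this comment (an assumption,
    not a documented Ollama rule)."""
    _, tag = split_hf_model(model)
    return bool(tag) and tag.lower() != "latest"


if __name__ == "__main__":
    for ref in (
        "hf.co/unsloth/Llama-3.1-Tulu-3-8B-GGUF:Q4_K_M",
        "hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:latest",
    ):
        print(ref, has_explicit_quantization(ref))
```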

To clarify the core functionality: Ollama downloads the model file from HuggingFace and then runs inference locally on the user's machine, not on HuggingFace's servers.

@neoneye neoneye merged commit 21aa826 into PlanExeOrg:main Feb 26, 2025
@neoneye
Member

neoneye commented Feb 26, 2025

I would like to talk with you on Discord about your fix.

In particular, why return True when it's a HuggingFace model?
Is it because you do inference on HuggingFace's servers?

I have updated the docs with your recommendations. Thank you.
https://github.com/neoneye/PlanExe/blob/main/extra/ollama.md

neoneye added a commit that referenced this pull request Oct 31, 2025
…d instead of a question mark (e.g., #3 “Does the plan use excessive buzzwords without evidence of knowledge.”; #4 “Does this plan grossly underestimate risks.”).
82deutschmark referenced this pull request in VoynichLabs/PlanExe2026 Feb 8, 2026
Phase 1 (Critical Security):
- Fix SECRET_KEY validation to detect both 'your-secret-key' AND 'dev-secret-key' defaults
- Fail hard in production (when FLASK_ENV=production or PLANEXE_PUBLIC_BASE_URL set)
- Add session cookie security flags (SECURE, HTTPONLY, SAMESITE=Lax)
- Update .env examples with SECRET_KEY generation command

Phase 2 (Error Handling & UX):
- Wrap OAuth callback in try/except for better error handling
- Add profile field validation with clear error messages
- Log warning when OAuth profile missing email
- Update login.html to display error messages

Addresses Issues #1, #3, #5, PlanExeOrg#6, PlanExeOrg#7 from OAUTH_ANALYSIS.md
huangyingting pushed a commit to repomesh/PlanExe that referenced this pull request May 22, 2026
…hip-set

Updates two docs to reflect the post-PlanExeOrg#753 state of the napkin-math pipeline.

methology.md: describe the current pipeline behaviour — two-batch compress with paraphrase-tolerant quote match and cross-bucket promoter; extract's source-arithmetic preservation, threshold-pairing, and dropped_signals field; 19-check validator (added aggregate_not_bounded, requirement_has_margin, dropped_signals_schema); bounds' asymmetric source label on commitment defaults, calculation-output strip, reserved correlations block, reserved lognormal/pert disciplines with loud NotImplementedError; advisory audit_source_preservation.py step.

20260520_plan.md → 20260522_plan.md: bump status date; mark PR PlanExeOrg#750 merged; add PR PlanExeOrg#751/PlanExeOrg#752/PlanExeOrg#753 entries (proposal 141 implementation); update Phase status table (added 4.5 audit row, reclassified Phase 8 as partially done, Phase 10 marked done for current ship-set); add v58 14-plan empirical snapshot (1 viable / 5 fragile / 8 doom); reorder Next likely move now that proposal 141 has shipped — Phase 5 citation verifier promoted to PlanExeOrg#1, Phase 8 samplers added as PlanExeOrg#2 with v58 cases that bite now, Phase 9 composite-band cap as PlanExeOrg#3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>