Add support for HuggingFace GGUF models in Ollama by FeelTheFonk · Pull Request #3 · PlanExeOrg/PlanExe

FeelTheFonk · 2025-02-24T09:25:33Z

Add support for HuggingFace GGUF models in Ollama

Description

This PR adds support for running GGUF models directly from HuggingFace through Ollama using the hf.co/ prefix in model configurations. This enables users to leverage a wider range of models without requiring manual model installation.

Changes

Updated OllamaInfo.is_model_available() to recognize HuggingFace GGUF model paths
Added documentation and examples for using GGUF models
Added test case for GGUF model path validation

Example Configuration

{
    "lmstudio-qwen2.5-7b-instruct-1m-gguf": {
        "comment": "This runs on your own computer via Ollama using GGUF models from HuggingFace.",
        "class": "Ollama",
        "arguments": {
            "model": "hf.co/lmstudio-community/Qwen2.5-7B-Instruct-1M-GGUF:Q6_K",
            "temperature": 0.5,
            "request_timeout": 120.0,
            "is_function_calling_model": false
        }
    }
}
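As a rough illustration of how a configuration entry like the one above could be consumed, the sketch below parses the JSON and collects the model string of every entry whose class is "Ollama". The helper name `ollama_model_names` is hypothetical and is not taken from the PlanExe codebase; only the JSON keys come from the example.

```python
import json

# The example configuration from this PR, embedded as text for a self-contained demo.
CONFIG_TEXT = """
{
    "lmstudio-qwen2.5-7b-instruct-1m-gguf": {
        "comment": "This runs on your own computer via Ollama using GGUF models from HuggingFace.",
        "class": "Ollama",
        "arguments": {
            "model": "hf.co/lmstudio-community/Qwen2.5-7B-Instruct-1M-GGUF:Q6_K",
            "temperature": 0.5,
            "request_timeout": 120.0,
            "is_function_calling_model": false
        }
    }
}
"""


def ollama_model_names(config_text: str) -> dict:
    """Map each config entry id to its model string (hypothetical helper)."""
    config = json.loads(config_text)
    return {
        entry_id: entry["arguments"]["model"]
        for entry_id, entry in config.items()
        if entry.get("class") == "Ollama"
    }


models = ollama_model_names(CONFIG_TEXT)
for entry_id, model in models.items():
    # Entries that point at HuggingFace carry the hf.co/ prefix.
    print(entry_id, "->", model, "(HuggingFace)" if model.startswith("hf.co/") else "")
```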

neoneye · 2025-02-24T20:08:28Z

I'm curious about your setup. I imagine you have uploaded your SSH key to Hugging Face. And when you run the model, does the inference happen on Hugging Face?

Can you talk on PlanExe Discord?

FeelTheFonk · 2025-02-25T07:18:43Z

Hey! To download models from Hugging Face, you simply need a valid HF_TOKEN set as an environment variable locally. There's no need to use huggingface_cli or any other pip library; model inference is handled locally via Ollama rather than on Hugging Face's servers.

I'll try to join your discord server when I can, thank you for your attention!

neoneye · 2025-02-25T16:40:22Z

I can't make sense of the find_model.startswith("hf.co/") check. Why is it needed?

I have tried recreating your scenario using hf.co, and I can run models locally. I'm unable to run models on HF.

I was unaware that ollama could fetch GGUFs directly from HF, thanks. I have updated the docs roughly describing how to fetch GGUF models. Let me know if the docs can be improved further.
https://github.com/neoneye/PlanExe/blob/main/extra/ollama.md

FeelTheFonk · 2025-02-25T18:21:17Z

The find_model.startswith("hf.co/") check is needed to distinguish between standard Ollama models and models that should be downloaded directly from HuggingFace. This prefix tells your system to fetch the GGUF model from HuggingFace instead of looking for it in the local Ollama repository.
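A minimal sketch of the check being discussed, assuming the availability test receives the requested model name and a list of locally installed names. The standalone function and the `installed_models` parameter are illustrative assumptions; the real method is OllamaInfo.is_model_available(), whose exact signature is not shown in this thread.

```python
from typing import List


def is_model_available(find_model: str, installed_models: List[str]) -> bool:
    """Hypothetical stand-in for OllamaInfo.is_model_available()."""
    # Assumption: an hf.co/ reference is treated as available without
    # consulting the local list, because Ollama can download it on demand.
    if find_model.startswith("hf.co/"):
        return True
    # Standard Ollama models must already be installed locally.
    return find_model in installed_models


installed = ["llama3.1:latest"]
print(is_model_available("hf.co/unsloth/Llama-3.1-Tulu-3-8B-GGUF:Q4_K_M", installed))
print(is_model_available("llama3.1:latest", installed))
print(is_model_available("mistral:latest", installed))
```

Note that in this sketch the early return says nothing about whether the HuggingFace repository or tag actually exists; it only skips the local lookup.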

I think there's a misunderstanding in your documentation. When using HuggingFace GGUF models with the hf.co/ prefix, you must name an explicit quantization tag (like :Q4_K_M, :Q6_K, etc.) and cannot use :latest. The :latest syntax only works with standard Ollama models, not with HuggingFace GGUF models.

For example:

Correct: hf.co/unsloth/Llama-3.1-Tulu-3-8B-GGUF:Q4_K_M (or any other versions/LLM)
Incorrect: hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:latest

The confusion might be because your ollama list output shows a model with :latest, but this likely won't work properly when actually trying to run inference. For HuggingFace GGUF models, you need to specify the exact quantization version you want to use.
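The rule as stated in this comment could be checked with a small helper like the one below. This is only a sketch of the commenter's recommendation (explicit quantization tag, no :latest), not a description of what Ollama itself enforces; both function names are hypothetical.

```python
from typing import Optional, Tuple

HF_PREFIX = "hf.co/"


def split_hf_model_ref(model: str) -> Tuple[str, Optional[str]]:
    """Split 'hf.co/<user>/<repo>:<tag>' into (repo path, tag or None)."""
    if not model.startswith(HF_PREFIX):
        raise ValueError(f"not a HuggingFace model reference: {model!r}")
    repo, sep, tag = model[len(HF_PREFIX):].partition(":")
    return repo, (tag if sep and tag else None)


def has_explicit_quantization(model: str) -> bool:
    """True when the reference names a tag other than 'latest'."""
    _, tag = split_hf_model_ref(model)
    return tag is not None and tag.lower() != "latest"


for ref in (
    "hf.co/unsloth/Llama-3.1-Tulu-3-8B-GGUF:Q4_K_M",
    "hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:latest",
):
    print(ref, "->", "ok" if has_explicit_quantization(ref) else "needs a quantization tag")
```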

To clarify the core functionality: Ollama downloads the model file from HuggingFace and then runs inference locally on the user's machine, not on HuggingFace's servers.

neoneye · 2025-02-26T17:46:31Z

I would like to talk with you on Discord about your fix.

In particular, why return True when it's a HuggingFace model?
Is it because you do inference on HuggingFace's servers?

I have updated the docs with your recommendations. Thank you.
https://github.com/neoneye/PlanExe/blob/main/extra/ollama.md

Commit 8987a39: Add support for HuggingFace GGUF models in Ollama (updated OllamaInfo.is_model_available() to recognize HuggingFace GGUF model paths)

neoneye merged commit 21aa826 into PlanExeOrg:main Feb 26, 2025

82deutschmark mentioned this pull request Feb 8, 2026

OAuth Security Fixes: SECRET_KEY validation, session cookies, error handling #16

Merged

neoneye mentioned this pull request May 21, 2026

docs(napkin-math): refresh methodology + plan status for 2026-05-22 ship-set #754

Merged

neoneye merged 1 commit into PlanExeOrg:main from FeelTheFonk:feature/gguf_support
