# Debugging GPU LLM Loading and Summarization with GGUF Models

This notebook helps you debug GPU loading issues for GGUF models with llama-cpp-python and demonstrates summarizing a .txt file with a custom `summarize_text` function. The workflow lives in a 'debugging' folder with 'input', 'output', and 'src' subfolders.

In [None]:
# Section 1: Setup Debugging Folder Structure
import os
from pathlib import Path

base_dir = Path.cwd() / 'debugging'
input_dir = base_dir / 'input'
output_dir = base_dir / 'output'
src_dir = base_dir / 'src'

for d in [input_dir, output_dir, src_dir]:
    d.mkdir(parents=True, exist_ok=True)

print(f"Created folders:\n- {input_dir}\n- {output_dir}\n- {src_dir}")

In [None]:
# Section 2: Import Required Libraries
import os
from pathlib import Path
try:
    from llama_cpp import Llama
except ImportError:
    print("llama_cpp not installed. Please install with the correct CUDA wheel.")

In [None]:
# Section 3: Summarize Function Implementation (src/summarize.py)
def summarize_text(text, model_path, llm_model, n_ctx=4096, n_gpu_layers=48):
    """
    Summarize text using a GGUF LLM model with llama-cpp-python.
    Args:
        text (str): The input text to summarize.
        model_path (str): Path to the directory containing the model.
        llm_model (str): Filename of the GGUF model.
        n_ctx (int): Context window size.
        n_gpu_layers (int): Number of layers to run on GPU.
    Returns:
        str: The summary or error message.
    """
    import os
    try:
        from llama_cpp import Llama
        model_file = os.path.join(model_path, llm_model)
        llm = Llama(model_path=model_file, n_ctx=n_ctx, n_gpu_layers=n_gpu_layers)
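        # The prompt is in Spanish and asks for a Spanish summary: "Summarize the following
        # text in Spanish, concisely and clearly, using bullet points if possible."
        # Only the first 2000 characters of the input are sent to the model.
        prompt = f"Resumí el siguiente texto en español de forma concisa y clara, usando viñetas si es posible.\n\nTexto:\n{text[:2000]}\n\nResumen:"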
        output = llm(prompt, max_tokens=256)
        result = output['choices'][0]['text'].strip()
        if not result:
            print('[DEBUG] LLM returned empty summary. Full output:', output)
            return '[LLAMA ERROR] LLM returned empty summary.'
        print('[DEBUG] LLM summary:', result)
        return result
    except Exception as e:
        print(f'[DEBUG] Exception in LLM summarization: {e}')
        return f"[LLAMA ERROR] {e}\n{text[:200]}..."

In [None]:
# Section 4: Load and Display Input Text File
input_files = list(input_dir.glob('*.txt'))
if not input_files:
    print(f"No .txt files found in {input_dir}. Please add a file to summarize.")
else:
    input_file = input_files[0]
    with open(input_file, 'r', encoding='utf-8') as f:
        input_text = f.read()
    print(f"Loaded file: {input_file.name}\n---\n{input_text[:500]}\n{'...' if len(input_text) > 500 else ''}")

In [None]:
# Section 5: Run Summarization with GGUF Model
# Set your GGUF model filename and model path here
llm_model = 'your_model.gguf'  # Change to your actual model filename
model_path = str(Path('models').absolute())  # Adjust if your models are elsewhere

if not input_files:
    summary = None
    print("No input file to summarize.")
else:
    summary = summarize_text(input_text, model_path, llm_model, n_ctx=4096, n_gpu_layers=48)
    print(f"Summary:\n{summary}")

In [None]:
# Section 6: Save Summary to Output Folder
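# summarize_text returns a string prefixed with '[LLAMA ERROR]' on failure,
# so skip those instead of writing an error message to disk as a summary.
if summary and not summary.startswith('[LLAMA ERROR]'):
    output_file = output_dir / f"summary_{input_file.stem}.txt"
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(summary)
    print(f"Summary saved to: {output_file}")
else:
    print("No valid summary to save.")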

# Troubleshooting and Checklist

**If you encounter issues with GPU usage or summarization, review the following:**

- Ensure you have an NVIDIA GPU and the correct CUDA Toolkit installed.
- Install llama-cpp-python with the CUDA-enabled wheel for your CUDA version.
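- Set `n_gpu_layers` to the number of layers that fits in your VRAM. 7B models such as Llama 2 7B have 32 transformer layers, so the value of 48 used in this notebook requests all of them; `-1` also offloads every layer. Lower the value if loading fails with an out-of-memory error.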
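- Watch the load logs for a line that reports GPU offload. Recent llama.cpp builds print something like `offloaded 33/33 layers to GPU`; the exact wording and prefix vary by version.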
- If you see only CPU messages, double-check your CUDA version and llama-cpp-python installation.
- For DLL errors, ensure CUDA's `bin` directory is in your PATH and reboot if needed.
- For more help, see: https://github.com/abetlen/llama-cpp-python/discussions/1587
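
To make the log check above less error-prone, the load output can be scanned programmatically. The following is a minimal sketch, assuming the log contains a line of the form `offloaded N/M layers to GPU`; that format, the helper name `parse_gpu_offload`, and the sample line are illustrative assumptions and may not match your llama.cpp version.

```python
import re

# Assumed log format: "... offloaded 33/33 layers to GPU".
# The prefix before "offloaded" differs between llama.cpp versions, so it is not matched.
OFFLOAD_PATTERN = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def parse_gpu_offload(log_text):
    """Return (offloaded, total) parsed from load logs, or None if no such line exists."""
    match = OFFLOAD_PATTERN.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical sample line; paste your own captured log text here instead.
sample_log = "llm_load_tensors: offloaded 33/33 layers to GPU"
result = parse_gpu_offload(sample_log)

if result is None:
    print("No GPU offload line found; the model is probably running on CPU.")
else:
    offloaded, total = result
    print(f"{offloaded} of {total} layers offloaded to GPU")
```

A `None` result, or an offloaded count of 0, suggests a CPU-only build or `n_gpu_layers=0`, which points back to the installation items in the list above.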

## Checklist
- [ ] NVIDIA GPU detected (`nvidia-smi`)
- [ ] CUDA Toolkit installed and in PATH
- [ ] Correct llama-cpp-python CUDA wheel installed
- [ ] `n_gpu_layers` set in code
- [ ] Logs report layers offloaded to the GPU
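
Part of this checklist can be automated from inside the notebook. The sketch below uses only the standard library. The function name `check_environment` and the "any PATH entry containing `cuda`" heuristic are assumptions made for illustration; it confirms that `llama_cpp` is importable but cannot tell whether the installed wheel was built with CUDA.

```python
import importlib.util
import os
import shutil


def check_environment():
    """Return a dict mapping each checklist item to True/False for this machine."""
    path_entries = os.environ.get("PATH", "").split(os.pathsep)
    return {
        # nvidia-smi ships with the NVIDIA driver; finding it suggests a GPU driver is installed.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Heuristic (assumption): a CUDA Toolkit install adds a directory with "cuda" in its name.
        "cuda_dir_in_path": any("cuda" in entry.lower() for entry in path_entries),
        # Only checks that the package is installed, not that it is a CUDA build.
        "llama_cpp_installed": importlib.util.find_spec("llama_cpp") is not None,
    }


for item, ok in check_environment().items():
    print(f"[{'x' if ok else ' '}] {item}")
```

The last two checklist items still need a manual look at the code and the load logs.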
