<a href="https://colab.research.google.com/github/InferenceIllusionist/MilkDropLM-Lite/blob/main/MilkDropLM_7b_Lite_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# MilkDropLM Version: 7b-0.3 @ Q6_K
# Lite (Notebook) Version: 1.1b - New Gradio front-end
# To Run: Go to Runtime -> Run All, then find Gradio UI or URL at bottom of notebook

# Install required packages
%pip install https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.4-cu124/llama_cpp_python-0.3.4-cp310-cp310-linux_x86_64.whl
%pip install gradio

import gradio as gr
import torch
from llama_cpp import Llama
import time

In [2]:
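# Initialize the model once, globally, so it is not reloaded on every request.
# Sampling settings such as max_tokens and temperature are per-call arguments
# (see generate_preset), so they are not passed to the constructor here.
llm = Llama.from_pretrained(
    repo_id="Quant-Cartel/MilkDropLM-7b-v0.3-GGUF",
    filename="*Q6_K.gguf",  # glob pattern selecting the Q6_K quantization
    n_ctx=16384,            # context window, in tokens
    n_gpu_layers=-1,        # offload all layers to the GPU
)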

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


MilkDropLM-v0.3C-Q5_K_S.gguf:   0%|          | 0.00/5.32G [00:00<?, ?B/s]

llama_model_loader: loaded meta data with 31 key-value pairs and 339 tensors from /root/.cache/huggingface/hub/models--Quant-Cartel--MilkDropLM-7b-v0.3-GGUF/snapshots/ed3d5a7e561418a3b4e86a6f73ff41db946a9893/./MilkDropLM-v0.3C-Q5_K_S.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct
llama_model_loader: - kv   3:                       general.organization str              = Unsloth
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Qwen2.5-Coder
llama_model_loa

In [3]:
# Generate a preset from a prompt using the given sampler settings
def generate_preset(
    prompt,
    temperature=0.3,
    top_p=0.8,
    top_k=20,
    min_p=0.05,
    repeat_penalty=1.05,
    max_tokens=8192
):
    try:
        # Wrap a bare description in the default prompt template
        # unless the user's prompt already mentions "preset"
        if "preset" not in prompt.lower():
            prompt = f"Make me a {prompt} milkdrop preset."

        # Start time for generation
        start_time = time.time()

        # Generate response
        response = llm(
            prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            min_p=min_p,
            repeat_penalty=repeat_penalty,
        )

        # Calculate generation time
        generation_time = time.time() - start_time

        # Format the response
        formatted_response = f"""
Generation Time: {generation_time:.2f} seconds

Generated Preset:
{response['choices'][0]['text']}
"""
        return formatted_response

    except Exception as e:
        return f"An error occurred: {str(e)}"
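
The prompt-wrapping rule inside `generate_preset` can be checked without loading the model. The sketch below pulls that one step into a standalone function; the name `build_prompt` is hypothetical and not part of the notebook, and the logic is meant to mirror the check above rather than replace it.

```python
def build_prompt(description: str) -> str:
    """Return the text that would be sent to the model.

    Hypothetical helper mirroring the check in generate_preset:
    a description that does not mention "preset" (case-insensitive)
    is wrapped in the default template; anything else passes through.
    """
    if "preset" not in description.lower():
        return f"Make me a {description} milkdrop preset."
    return description


# A bare description is wrapped in the template.
print(build_prompt("Glowsticks Mirror"))
# A prompt that already mentions "preset" is left unchanged.
print(build_prompt("Make me a fractal Preset"))
```

One consequence of this rule is that any prompt containing the word "preset" anywhere is sent to the model verbatim, so a full sentence is expected in that case.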


In [4]:
# Create the Gradio interface
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
    # 🎵 MilkDropLM Preset Generator Lite
    Generate custom MilkDrop presets using natural language prompts.
    """)

    with gr.Row():
        with gr.Column():
            prompt = gr.Textbox(
                label="Your Preset Description",
                placeholder="e.g., Glowsticks Mirror",
                lines=3
            )

            with gr.Accordion("Advanced Settings", open=False):
                temperature = gr.Slider(
                    minimum=0.1,
                    maximum=1.0,
                    value=0.3,
                    step=0.1,
                    label="Temperature"
                )
                top_p = gr.Slider(
                    minimum=0.1,
                    maximum=1.0,
                    value=0.8,
                    step=0.1,
                    label="Top P"
                )
                top_k = gr.Slider(
                    minimum=1,
                    maximum=100,
                    value=20,
                    step=1,
                    label="Top K"
                )
                min_p = gr.Slider(
                    minimum=0.01,
                    maximum=0.2,
                    value=0.05,
                    step=0.01,
                    label="Min P"
                )
                repeat_penalty = gr.Slider(
                    minimum=1.0,
                    maximum=2.0,
                    value=1.05,
                    step=0.05,
                    label="Repeat Penalty"
                )

            generate_btn = gr.Button("Generate Preset", variant="primary")

        with gr.Column():
            output = gr.Textbox(
                label="Generated Preset",
                lines=20,
                show_copy_button=True
            )

    # Handle the generation
    generate_btn.click(
        fn=generate_preset,
        inputs=[
            prompt,
            temperature,
            top_p,
            top_k,
            min_p,
            repeat_penalty
        ],
        outputs=output
    )

    gr.Markdown("""
    ### Tips for Better Results:
    - Please be patient: generation can take 5-6 minutes per preset on a free Colab T4 GPU
    - Try to prompt for categories suggested on the [model card](https://huggingface.co/Quant-Cartel/MilkDropLM-7b-v0.3-GGUF#text-prompt-suggestions)
    - Mention colors, shapes, and movement patterns
    """)

In [None]:
demo.launch(share=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://15924ae5a3695ef11b.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)



llama_print_timings:        load time =     537.65 ms
llama_print_timings:      sample time =   13591.43 ms /  5815 runs   (    2.34 ms per token,   427.84 tokens per second)
llama_print_timings: prompt eval time =     537.49 ms /     6 tokens (   89.58 ms per token,    11.16 tokens per second)
llama_print_timings:        eval time =  190115.75 ms /  5814 runs   (   32.70 ms per token,    30.58 tokens per second)
llama_print_timings:       total time =  249583.26 ms /  5820 tokens
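
The timing figures above can be turned into a rough estimate of how long a generation takes. The snippet below is only a back-of-the-envelope sketch: the numbers are copied from this one run, and it assumes the per-token eval rate stays constant up to the `max_tokens=8192` limit, which ignores sampling overhead and any slowdown as the context grows.

```python
# Figures copied from the llama_print_timings output of this run.
eval_ms = 190115.75      # time spent generating tokens
eval_tokens = 5814       # number of generated tokens
total_ms = 249583.26     # wall-clock total reported by llama.cpp

# Throughput and per-token latency for the eval phase.
tokens_per_second = eval_tokens / (eval_ms / 1000)
ms_per_token = eval_ms / eval_tokens

# Assumption: the same rate holds for a generation that hits max_tokens.
max_tokens = 8192
worst_case_eval_s = max_tokens * ms_per_token / 1000

print(f"eval throughput: {tokens_per_second:.2f} tokens/s")
print(f"eval latency:    {ms_per_token:.2f} ms/token")
print(f"this run total:  {total_ms / 60000:.1f} min")
print(f"eval time at max_tokens (estimate): {worst_case_eval_s / 60:.1f} min")
```

The eval-only estimate is a lower bound on wall-clock time, since the reported total for this run also includes sampling and prompt evaluation.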
