<a href="https://colab.research.google.com/github/Alyxx-The-Sniper/CNN/blob/main/my_gemma.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To run this, click "*Runtime*" and then "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions in our documentation [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


### News

Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).

Read our **[Gemma 3N Guide](https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants, which outperform other quantization methods!

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [3]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" huggingface_hub hf_transfer
    !pip install --no-deps unsloth

In [4]:
%%capture
# Upgrade timm (only needed for Gemma 3N)
!pip install --no-deps --upgrade timm # Only for Gemma 3N
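
The install cell above decides which `pip` commands to run by joining all environment variable names and looking for the substring `COLAB_`. A minimal sketch of the same check as a plain function, so it can be tried outside a notebook. The name `looks_like_colab` is hypothetical and not part of Unsloth or Colab.

```python
import os
from typing import Iterable


def looks_like_colab(env_names: Iterable[str]) -> bool:
    """Mirror the notebook's check: join all variable names and search for 'COLAB_'.

    Hypothetical helper for illustration only. Like the original one-liner, it is a
    substring heuristic, not an official way to detect Colab.
    """
    return "COLAB_" in "".join(env_names)


# Apply the heuristic to the current process environment.
running_in_colab = looks_like_colab(os.environ.keys())
print("Colab detected" if running_in_colab else "Colab not detected")
```

Passing the names in as an argument keeps the heuristic easy to test with a hand-written list instead of the real environment.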

### Unsloth

`FastModel` supports loading nearly any model now! This includes Vision and Text models!

In [5]:
from unsloth import FastModel
import torch

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3n-E4B-it",
    dtype = None, # None for auto detection
    max_seq_length = 4096, # Choose any for long context!
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)


Please restructure your imports with 'import unsloth' at the top of your file.
  from unsloth import FastModel


NotImplementedError: Unsloth currently only works on NVIDIA GPUs and Intel GPUs.
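
The `NotImplementedError` above is raised when no supported GPU is visible to the runtime. A small sketch, assuming an NVIDIA setup, of a pre-flight check that could be run before the model-loading cell. The function name `cuda_available` is hypothetical, and the check does not cover Intel GPUs, which the error message also mentions.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device.

    Hypothetical helper for illustration; it covers NVIDIA (CUDA) devices only.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also works without PyTorch installed

    return torch.cuda.is_available()


if not cuda_available():
    print("No CUDA GPU detected. In Colab, select a GPU runtime (for example T4) before loading the model.")
```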

# Start Here

In [3]:
# !pip install --upgrade gradio

In [2]:
# ===============  Gradio demo ===============
import gradio as gr, os, json, datetime, torch
from transformers import TextStreamer

# ------------------------------------------------------------------
# Helper: run Gemma-3n once and return the string (not stream)
# ------------------------------------------------------------------
@torch.no_grad()
def run_gemma(messages, max_new_tokens=1024):
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to("cuda")
    gen_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    # Decode only the new tokens
    new_ids = gen_ids[:, inputs['input_ids'].shape[1]:]
    return tokenizer.decode(new_ids[0], skip_special_tokens=True).strip()

# ------------------------------------------------------------------
# Core workflow steps
# ------------------------------------------------------------------
def step_transcribe(audio):
    """
    audio can be a string (path) or a tempfile object.
    Coerce to str before sending to the model.
    """
    if audio is None:
        raise gr.Error("No audio provided or unclear audio.")

    # Gradio >= 4: audio can be a tuple (path, samplerate) -> take first element
    if isinstance(audio, tuple):
        audio_path = str(audio[0])
    else:
        audio_path = str(audio)

    # -- Transcribe --
    transcript = run_gemma([
        {"role": "user",
         "content": [
             {"type": "audio", "audio": audio_path},
             {"type": "text",  "text": "Provide an accurate transcript of the audio."}
         ]}
    ], max_new_tokens=2048)

    # -- Summarise --
    summary = run_gemma([
        {"role": "user",
         "content": [
             {"type": "text", "text": f"Transcript:\n{transcript}\n\nSummarise it concisely."}
         ]}
    ], max_new_tokens=2048)

    return transcript, summary, gr.update(visible=True), gr.update(visible=False)




def step_approve(choice, transcript, summary, feedback):
    """Handle approve / reject / regenerate."""
    feedback = feedback or ""  # <-- NEW: guard against None

    if choice == "✅ Approve":
        # save
        stamp = datetime.datetime.utcnow().isoformat(timespec="seconds")
        fname = f"saved_{stamp}.json".replace(":", "-")
        with open(fname, "w", encoding="utf-8") as f:
            json.dump({"transcript": transcript, "summary": summary}, f, ensure_ascii=False, indent=2)
        return (
            gr.update(visible=False),  # approval row
            gr.update(visible=False),  # feedback row
            f"Saved to `{fname}` ✅"
        )
    else:  # Rejected
        if not feedback.strip():
            return (
                gr.update(visible=True),
                gr.update(visible=True),
                "Please provide feedback first."
            )
        # regenerate
        new_summary = run_gemma([
            {"role": "user",
             "content": [
                 {"type": "text", "text": f"Transcript:\n{transcript}\n\nPrevious summary was rejected. Summarize again with regards to this: {feedback}"}
             ]}
        ], max_new_tokens=2048)
        return (
            gr.update(visible=True),   # keep approval block visible
            gr.update(visible=True),   # keep feedback box visible for further edits
            new_summary
        )


# ==================== Voice-to-Summary with Approval ====================

# ------------------------------------------------------------------
# Helper: run Gemma-3n and return the decoded string
# ------------------------------------------------------------------
@torch.no_grad()
def run_gemma(messages, max_new_tokens=1024):
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to("cuda")
    gen_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    new_ids = gen_ids[:, inputs['input_ids'].shape[1]:]
    return tokenizer.decode(new_ids[0], skip_special_tokens=True).strip()


# ------------------------------------------------------------------
# Step 1: Transcribe + Summarize
# ------------------------------------------------------------------
def step_transcribe(audio_path):
    if audio_path is None:
        raise gr.Error("No audio provided. Please record or upload an audio file.")

    # No need to check for type, Gradio gives us a path string
    print(f"Processing audio file at: {audio_path}")

    # Now proceed with transcription
    transcript = run_gemma([
        {"role": "user",
         "content": [
             {"type": "audio", "audio": audio_path},
             {"type": "text", "text": "Provide an accurate transcript of the audio."}
         ]}
    ], max_new_tokens=2048)

    summary = run_gemma([
        {"role": "user",
         "content": [
             {"type": "text", "text": f"Transcript:\n{transcript}\n\nSummarise it concisely."}
         ]}
    ], max_new_tokens=2048)

    # Make the approval buttons visible and hide the feedback box
    return transcript, summary, gr.update(visible=True), gr.update(visible=False)



# ------------------------------------------------------------------
# Step 2/3: Approve or Reject/Improve
# ------------------------------------------------------------------
def step_approve(choice, transcript, summary, feedback):
    feedback = feedback or ""

    if choice == "✅ Approve":
        # Timezone-aware UTC timestamp; utcnow() is deprecated in newer Python versions
        stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
        fname = f"saved_{stamp}.json"
        with open(fname, "w", encoding="utf-8") as f:
            json.dump({"transcript": transcript, "summary": summary}, f, ensure_ascii=False, indent=2)
        return gr.update(visible=False), gr.update(visible=False), f"Saved to `{fname}` ✅"

    # Rejected but no feedback: keep the current summary and ask for feedback.
    # Both reject branches return (summary, feedback_box update, status text),
    # which is the shape reject_and_regen unpacks.
    if not feedback.strip():
        return summary, gr.update(visible=True), "Please provide feedback first."

    # Rejected with feedback: regenerate summary
    new_summary = run_gemma([
        {"role": "user",
         "content": [
             {"type": "text", "text": f"Transcript:\n{transcript}\n\nPrevious summary was rejected. Summarize again with regards to this: {feedback}"}
         ]}
    ], max_new_tokens=2048)

    return new_summary, gr.update(visible=True), "Summary regenerated from your feedback."


# ------------------------------------------------------------------
# Gradio UI
# ------------------------------------------------------------------
with gr.Blocks(title="Voice-to-Summary with Approval") as demo:
    gr.Markdown("# 🎙️ Voice to Summary with Human-in-the-loop Approval")

    audio_in = gr.Audio(sources=["upload", "microphone"], type="filepath", label="Upload or record audio")

    btn_run = gr.Button("🚀 Transcribe & Summarise", variant="primary")

    transcript_box = gr.Textbox(label="Transcript", lines=10, interactive=False)
    summary_box = gr.Textbox(label="Summary", lines=5, interactive=True)
    summary_state = gr.State()

    with gr.Row(visible=True) as approval_row:
        approve_btn = gr.Button("✅ Approve")
        reject_btn = gr.Button("❌ Reject / Improve")

    feedback_box = gr.Textbox(visible=False, label="Feedback / Instructions for new summary", lines=3)
    status = gr.Textbox(label="Status", interactive=False)

    # STEP 1: Transcribe & summarize
    def first_run(audio):
        transcript, summary, *ui = step_transcribe(audio)
        return transcript, summary, *ui, summary

    btn_run.click(
        fn=first_run,
        inputs=audio_in,
        outputs=[transcript_box, summary_box, approval_row, feedback_box, summary_state]
    )

    # STEP 2: Approve
    approve_btn.click(
        fn=lambda t, s: step_approve("✅ Approve", t, s, ""),
        inputs=[transcript_box, summary_box],
        outputs=[approval_row, feedback_box, status]
    )

    # STEP 3: Reject & regenerate
    def reject_and_regen(t, s_old, f):
        new_summary, feedback_vis, status_txt = step_approve("❌ Reject / Improve", t, s_old, f)
        return new_summary, feedback_vis, status_txt, new_summary

    reject_btn.click(
        fn=reject_and_regen,
        inputs=[transcript_box, summary_state, feedback_box],
        outputs=[summary_box, feedback_box, status, summary_state]
    )

demo.queue().launch(debug=True)





It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://97c8bf55d4eaee0bac.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Processing audio file at: /tmp/gradio/ea8170c1fdcfb128497c28e517695cc07c3b14e04f13e1839f2e1b003736dcd3/audio2.mp3


Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/gradio/queueing.py", line 626, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/gradio/route_utils.py", line 350, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/gradio/blocks.py", line 2235, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/gradio/blocks.py", line 1746, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^

Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://97c8bf55d4eaee0bac.gradio.live
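
The approve step in the demo writes the transcript and summary to a timestamped JSON file. A self-contained sketch of that save step, which can be tested without Gradio or the model. The function name `save_result` and its `directory` and `now` parameters are hypothetical additions for illustration; the file naming follows the `saved_<timestamp>.json` pattern used in the notebook.

```python
import datetime
import json
import os
import tempfile


def save_result(transcript, summary, directory=".", now=None):
    """Write transcript and summary to saved_<UTC timestamp>.json and return the path.

    Hypothetical helper for illustration. `now` can be injected to make the
    file name predictable in tests; by default the current UTC time is used.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    # Colons are avoided in the timestamp so the name is valid on all common file systems.
    stamp = now.strftime("%Y-%m-%dT%H-%M-%S")
    path = os.path.join(directory, f"saved_{stamp}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"transcript": transcript, "summary": summary}, f, ensure_ascii=False, indent=2)
    return path


# Usage: write into a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    fixed = datetime.datetime(2025, 1, 2, 3, 4, 5, tzinfo=datetime.timezone.utc)
    saved_path = save_result("hello world", "greeting", directory=tmp, now=fixed)
    saved_name = os.path.basename(saved_path)
    with open(saved_path, encoding="utf-8") as f:
        saved_data = json.load(f)
    print(saved_name, saved_data)
```

Keeping the file I/O in its own function means the Gradio callback only has to call it and report the returned path in the status box.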


