<a href="https://colab.research.google.com/github/NShravanReddy/SatvickRecipe/blob/master/satvick_recipe_model_gradio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install gradio
!pip install bitsandbytes>=0.39.0
!pip install --upgrade accelerate transformers

Collecting gradio
  Downloading gradio-5.23.3-py3-none-any.whl.metadata (16 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting fastapi<1.0,>=0.115.2 (from gradio)
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.8.0 (from gradio)
  Downloading gradio_client-1.8.0-py3-none-any.whl.metadata (7.1 kB)
Collecting groovy~=0.1 (from gradio)
  Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.18 (from gradio)
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting ruff>=0.9.3 (from gradio)
  Downloading ruff-0.11.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)
Collecting safehttpx<0.2.0,>=0.1.6 

In [2]:
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer
model_name = "Nadiveedishravanreddy/satvick_lora_model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Define the prediction function
def get_preparation_steps(recipe_name):
    """
    Generate preparation steps for a given recipe name.
    """
    # Format the input prompt
    prompt = f"Recipe: {recipe_name}\nPreparation Steps:"

    # Tokenize the input
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    # Stream text generation
    streamer = TextStreamer(tokenizer)
    print("Generating response...")
    _ = model.generate(**inputs, streamer=streamer, max_new_tokens=128)

    # Generate and decode the response
    outputs = model.generate(inputs.input_ids, max_length=256, num_return_sequences=1)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response[len(prompt):].strip()  # Exclude the prompt from the output

# Create the Gradio Interface
interface = gr.Interface(
    fn=get_preparation_steps,
    inputs=gr.Textbox(label="Enter Recipe Name", placeholder="E.g., Ragi Idiyappam"),
    outputs=gr.Textbox(label="Preparation Steps"),
    title="Recipe Chatbot",
    description="Enter a recipe name to get the preparation steps generated by the Satvick Lora Model.",
    theme="compact"
)

# Launch the interface
if __name__ == "__main__":
    interface.launch()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/459 [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/805 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.52k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/235 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]


Sorry, we can't find the page you are looking for.


Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://0d6dc02d8611895213.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


In [11]:
import nbformat
from google.colab import files

# Download and reupload workaround
nb_name = 'satvick_recipe_model_gradio.ipynb'

nb = nbformat.v4.new_notebook()
nb['cells'] = []
nb['metadata'] = {}

with open(nb_name, 'w') as f:
    nbformat.write(nb, f)


In [None]:
# Count the number of parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Total Parameters: {total_params:,}")
print(f"Trainable Parameters: {trainable_params:,}")

Total Parameters: 4,582,543,360
Trainable Parameters: 0


In [None]:
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer on CPU
model_name = "Nadiveedishravanreddy/satvick_lora_model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cpu")  # Explicitly specify CPU

def get_preparation_steps(recipe_name):
    """
    Generate preparation steps for a given recipe name.
    """
    # Format the input prompt
    prompt = f"Recipe: {recipe_name}\nPreparation Steps:"

    # Tokenize the input on CPU
    inputs = tokenizer(prompt, return_tensors="pt").to("cpu")  # Use CPU

    # Stream text generation
    streamer = TextStreamer(tokenizer)
    print("Generating response...")
    _ = model.generate(**inputs, streamer=streamer, max_new_tokens=128)

    # Generate and decode the response
    outputs = model.generate(inputs.input_ids, max_length=256, num_return_sequences=1)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response[len(prompt):].strip()  # Exclude the prompt from the output

# Create the Gradio Interface
interface = gr.Interface(
    fn=get_preparation_steps,
    inputs=gr.Textbox(label="Enter Recipe Name", placeholder="E.g., Ragi Idiyappam"),
    outputs=gr.Textbox(label="Preparation Steps"),
    title="Recipe Chatbot",
    description="Enter a recipe name to get the preparation steps generated by the Satvick Lora Model.",
    theme="compact"
)

# Launch the interface
if __name__ == "__main__":
    interface.launch()

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/459 [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/805 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.41k [00:00<?, ?B/s]

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/230 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]


Sorry, we can't find the page you are looking for.


Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://b833d485a5440b76be.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
