<a href="https://colab.research.google.com/github/Kimi-chuheng/LoRA_Streamlit_HandsOn/blob/main/LoRA_Streamlit_HandsOn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Fine-Tuning AI Models with LoRA and Deploying with Streamlit**
## **Hands-On Workshop**
### **Duration: 45 minutes**

This hands-on session covers fine-tuning AI models using **LoRA (Low-Rank Adaptation)** and deploying them using **Streamlit**.

### **Objectives:**
- Understand LoRA and its impact on efficient model fine-tuning.
- Apply LoRA fine-tuning to AI models based on project requirements.
- Fine-tune models including **GPT-2, BERT, Whisper, and Stable Diffusion**.
- Build and deploy an interactive **Streamlit web application**.
- Customize LoRA models for real-world project applications.


## **Step 1: Install Dependencies**
First, install the required libraries.

In [1]:
!pip install transformers peft accelerate streamlit diffusers torch torchaudio plotly


## **Step 2: Select and Load Your Model**
Choose the model based on your project:
- **GPT-2** for text generation.
- **BERT** for text classification.
- **Whisper** for speech-to-text.
- **Stable Diffusion** for text-to-image.

In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSequenceClassification, AutoModelForSpeechSeq2Seq
from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

# Choose model
model_choice = 'gpt2'  # Change to 'bert', 'whisper', or 'stable-diffusion' as needed

if model_choice == 'gpt2':
    model_name = 'gpt2'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
elif model_choice == 'bert':
    model_name = 'bert-base-uncased'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
elif model_choice == 'whisper':
    model_name = 'openai/whisper-small'
    tokenizer = None
    model = AutoModelForSpeechSeq2Seq.from_pretrained(model_name)
elif model_choice == 'stable-diffusion':
    model_name = 'runwayml/stable-diffusion-v1-5'
    tokenizer = None
    model = StableDiffusionPipeline.from_pretrained(model_name)


## **Step 3: Apply LoRA Fine-Tuning**
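Wrap the model with LoRA adapters: the pretrained weights stay frozen, and only small low-rank matrices added to the chosen layers are trainable. This cell attaches the adapters and reports the trainable parameter count; it does not run a training loop.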
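The idea behind a single LoRA layer can be sketched in NumPy. This is an illustration under stated assumptions, not PEFT's implementation: the frozen weight `W` is a random stand-in, the shapes mirror GPT-2's `c_attn` projection (768 to 2304), and the initialization (small random `A`, zero `B`) follows the scheme described in the LoRA paper. The name `lora_forward` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 768, 2304   # shape of GPT-2's c_attn projection
r, lora_alpha = 8, 16     # same values as the LoraConfig used in this notebook

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight (random stand-in)
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init (assumption)
B = np.zeros((d_out, r))                    # trainable, zero init

def lora_forward(x):
    """Base projection plus the low-rank update, scaled by lora_alpha / r."""
    return W @ x + (lora_alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
base_out = W @ x
lora_out = lora_forward(x)

# Because B starts at zero, the adapter changes nothing until it is trained.
print(np.allclose(base_out, lora_out))
print("frozen:", W.size, "trainable:", A.size + B.size)
```

Only `A` and `B` would receive gradients during training, which is why the trainable share of parameters is so small.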

In [5]:
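# LoRA configuration for GPT-2; other models need different target_modules and task_type
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor; the update is scaled by lora_alpha / r
    lora_dropout=0.05,          # dropout applied on the LoRA branch during training
    target_modules=["c_attn"],  # GPT-2's fused query/key/value projection
    task_type="CAUSAL_LM",      # causal language modeling
)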
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.2364
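The trainable count printed above can be reproduced by hand. The sketch below assumes GPT-2 small's dimensions (12 transformer blocks, hidden size 768, and a `c_attn` projection that maps 768 to 3 × 768 = 2304 features) and takes the total parameter count from the printed output. The helper name `lora_params_per_module` is hypothetical.

```python
n_layers = 12        # transformer blocks in GPT-2 small
d_in = 768           # hidden size
d_out = 3 * d_in     # c_attn emits query, key and value together
rank = 8             # r in the LoraConfig

def lora_params_per_module(d_in, d_out, rank):
    # A has shape (rank, d_in) and B has shape (d_out, rank)
    return rank * (d_in + d_out)

trainable = n_layers * lora_params_per_module(d_in, d_out, rank)
total = 124_734_720  # "all params" as printed by print_trainable_parameters()
percent = 100 * trainable / total

print(f"trainable params: {trainable:,} || trainable%: {percent:.4f}")
```

Since the count is linear in `rank`, doubling `r` doubles the number of trainable parameters.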




## **Step 4: Test Fine-Tuned Model**
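Provide a sample prompt to check that the LoRA-wrapped model generates text. No training step has been run at this point, so the adapters are still at their initial values and the output reflects the base model.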

In [6]:
# Example for GPT-2
if model_choice == 'gpt2':
    prompt = "The future of AI is"
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids
    output = model.generate(input_ids, max_length=50)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


The future of AI is uncertain. The future of AI is uncertain.

The future of AI is uncertain. The future of AI is uncertain.

The future of AI is uncertain. The future of AI is uncertain.

The future
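The output above loops because greedy decoding always picks the single most likely token. One remedy is a repetition penalty, which lowers the scores of tokens that have already been generated. The NumPy sketch below illustrates the rule on a toy three-token vocabulary; it is a simplified stand-in for what `repetition_penalty` does inside `generate`, and the function name and example logits are hypothetical.

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Make already-generated tokens less likely before the next token is chosen."""
    logits = np.array(logits, dtype=float)
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= penalty   # shrink positive scores
        else:
            logits[token_id] *= penalty   # push negative scores further down
    return logits

logits = [2.4, -1.0, 0.5]   # toy scores for a three-token vocabulary
generated_ids = [0, 1]      # tokens 0 and 1 were already produced
penalized = apply_repetition_penalty(logits, generated_ids, penalty=1.2)
print(penalized)
```

Token 2 has not been generated yet, so its score is left unchanged.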


In [8]:
import torch

if model_choice == 'gpt2':
    prompt = "The future of AI is"
    # The tokenizer returns input_ids together with the matching attention_mask
    inputs = tokenizer(prompt, return_tensors='pt')
    input_ids, attention_mask = inputs.input_ids, inputs.attention_mask

    # Sampling plus a repetition penalty avoids the loop seen in the greedy output above
    output = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=50,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
    )

    print(tokenizer.decode(output[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The future of AI is already in its infancy. The most recent example was the creation by Google's artificial intelligence division, which has been developing smart cars for nearly a decade and can predict when they'll hit their target road speed limit (at least until
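The `temperature` and `top_p` arguments used in the cell above control how the next token is drawn. The following NumPy sketch shows the general technique on a toy four-token vocabulary: divide the logits by the temperature, convert them to probabilities, keep the smallest set of top tokens whose cumulative probability reaches `top_p`, and renormalize. It is a simplified illustration, not the library's code, and the function name `sampling_distribution` is hypothetical.

```python
import numpy as np

def sampling_distribution(logits, temperature=0.7, top_p=0.9):
    """Return next-token probabilities after temperature scaling and top-p filtering."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max()                 # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()  # softmax

    order = np.argsort(probs)[::-1]                # most likely token first
    cumulative = np.cumsum(probs[order])
    # Smallest prefix whose cumulative probability reaches top_p
    n_keep = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:n_keep]

    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.0, -1.0]                     # toy scores
probs = sampling_distribution(logits, temperature=1.0, top_p=0.9)
next_token = rng.choice(len(logits), p=probs)      # draw from the filtered distribution
print(probs, next_token)
```

A lower temperature concentrates probability on the top tokens, and a lower `top_p` removes more of the unlikely tail before sampling.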


## **Step 5: Deploy as a Streamlit Web App**
Now, create a simple **Streamlit web interface** for model interaction.

In [9]:
%%writefile app.py
import streamlit as st
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

st.title('LoRA Fine-Tuned Model Web Interface')

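# Load the base GPT-2 weights once and reuse them across Streamlit reruns.
# Note: this loads the pretrained 'gpt2' checkpoint, not LoRA adapters.
@st.cache_resource
def load_model():
    return AutoTokenizer.from_pretrained('gpt2'), AutoModelForCausalLM.from_pretrained('gpt2')

tokenizer, model = load_model()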

# User input
prompt = st.text_input('Enter your prompt:')

if st.button('Generate Text'):
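    # Pass the attention mask and pad token explicitly to avoid generation warnings
    inputs = tokenizer(prompt, return_tensors='pt')
    with torch.no_grad():
        output = model.generate(**inputs, max_length=50, do_sample=True,
                                pad_token_id=tokenizer.eos_token_id)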
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    st.write(generated_text)

Writing app.py


## **Step 6: Run the Streamlit App**
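Run the following cells in Colab to launch the application. The first cell prints the Colab VM's public IP address, which localtunnel asks for as the tunnel password. The last cell starts Streamlit in the background and exposes port 8501 through localtunnel.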

In [12]:
!curl https://loca.lt/mytunnelpassword


34.125.185.26

In [14]:
# Pass pad_token_id explicitly so generate() does not fall back to eos_token_id with a warning
attention_mask = torch.ones(input_ids.shape, dtype=torch.long)
output = model.generate(input_ids=input_ids, attention_mask=attention_mask, max_length=50,
                        pad_token_id=tokenizer.eos_token_id)


In [15]:
!streamlit run app.py & npx localtunnel --port 8501


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://34.125.185.26:8501

your url is: https://nice-ends-dream.loca.lt
2025-02-14 00:06:01.949822: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739491561.971911    4227 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739491561.977871    4227 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one h

## **Step 7: Customize for Your Project**
Participants should adapt LoRA fine-tuning and Streamlit deployment based on their specific project requirements.

### **Customizing LoRA for Your Project:**
- Adjust LoRA parameters such as rank and dropout based on dataset size.
- Train with domain-specific data to improve model accuracy.

### **Enhancing the Web Interface:**
- Modify the UI to include more features such as dropdowns and sliders.
- Reduce latency (for example, cache the loaded model so it is not reloaded on every interaction) and tune the generation parameters to improve text quality.

### **Deploying Your Model:**
- Consider deploying the model on **Hugging Face Spaces** or **AWS Lambda** for wider accessibility.
- Document project results and improvements.

In [35]:
%%writefile app.py
import streamlit as st
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import plotly.express as px
import numpy as np

# Page configuration
st.set_page_config(
    page_title="GPT-2 LoRA Fine-tuning Interface",
    page_icon="🤖",
    layout="wide"
)

# Set page title and description
st.title('🤖 GPT-2 LoRA Fine-tuning Interface')
st.markdown("""
Use this interface to generate text and view detailed information about the generation process.
On the left, you can adjust model parameters, and the right side displays the generation results and analysis.
""")

# Create a two-column layout
left_col, right_col = st.columns([2, 1])

with left_col:
    # Text generation area
    st.subheader("📝 Text Generation")

    # Model selection
    model_name = st.selectbox(
        "Select Base Model",
        ["gpt2", "gpt2-medium", "gpt2-large"],
        help="Choose the pre-trained model you want to use"
    )

    # Generation parameter settings
    with st.expander("🎯 Generation Parameters", expanded=True):
        max_length = st.slider("Max Generation Length", 10, 200, 50)
        temperature = st.slider("Temperature", 0.1, 2.0, 0.7)
        top_p = st.slider("Top-p (nucleus sampling)", 0.1, 1.0, 0.9)

    # Input area
    prompt = st.text_area(
        "Enter Prompt:",
        height=100,
        placeholder="Enter the text you want the model to continue..."
    )

    # Generation button settings
    col1, col2, col3 = st.columns(3)
    with col1:
        num_samples = st.number_input("Number of Generations", 1, 5, 1)
    with col2:
        show_probs = st.checkbox("Show Word Probabilities")
    with col3:
        stream_output = st.checkbox("Stream Output", True)  # not yet wired up to generation

    # Generate button
    if st.button('Start Generation', use_container_width=True):
        try:
            # Load model and tokenizer
            with st.spinner('Loading model...'):
                tokenizer = AutoTokenizer.from_pretrained(model_name)
                model = AutoModelForCausalLM.from_pretrained(model_name)

            # Generate text
            for i in range(num_samples):
                st.markdown(f"### Generated Result #{i+1}")

                with st.spinner('Generating...'):
                    # Tokenize once; the result carries input_ids and attention_mask
                    inputs = tokenizer(prompt, return_tensors='pt')

                    # Cosmetic progress bar: it fills immediately and does not track generation
                    progress_bar = st.progress(0)
                    for percent_complete in range(100):
                        progress_bar.progress(percent_complete + 1)

                    # Generate text
                    with torch.no_grad():
                        output = model.generate(
                            **inputs,
                            max_length=max_length,
                            do_sample=True,
                            temperature=temperature,
                            top_p=top_p,
                            num_return_sequences=1,
                            pad_token_id=tokenizer.eos_token_id
                        )

                    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
                    st.write(generated_text)

                    # If showing probabilities is enabled
                    if show_probs:
                        # Placeholder plot: random values, not the model's token probabilities
                        probs = np.random.rand(10)
                        fig = px.bar(
                            x=list(range(10)),
                            y=probs,
                            title="Word Generation Probability Distribution"
                        )
                        st.plotly_chart(fig, use_container_width=True)

        except Exception as e:
            st.error(f'Error occurred: {str(e)}')

with right_col:
    # Right sidebar - Analysis and Statistics
    st.subheader("📊 Generation Statistics")

    # Create tabs
    tab1, tab2 = st.tabs(["Word Frequency Statistics", "Generation Parameters"])

    with tab1:
        # Simulated word frequency data
        st.markdown("#### Common Word Statistics")
        data = {
            'words': ['Word1', 'Word2', 'Word3', 'Word4', 'Word5'],
            'frequency': [10, 8, 6, 4, 2]
        }
        fig = px.bar(data, x='words', y='frequency')
        st.plotly_chart(fig, use_container_width=True)

    with tab2:
        st.markdown("#### Current Parameter Settings")
        st.json({
            "Model": model_name,
            "Max Length": max_length,
            "Temperature": temperature,
            "Top-p": top_p
        })

    # Placeholder metrics: hard-coded values, not computed from the generated text
    col1, col2 = st.columns(2)
    with col1:
        st.metric(label="Average Generation Length", value="45 words")
        st.metric(label="Generation Speed", value="2.3 words/sec")
    with col2:
        st.metric(label="Vocabulary Diversity", value="0.85")
        st.metric(label="Repetition Rate", value="3.2%")

    # Download button (the data is a placeholder string, not the generated text)
    st.download_button(
        label="Download Generated Results",
        data="Generated text content",
        file_name="generated_text.txt",
        mime="text/plain"
    )


Overwriting app.py
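The statistics panel in the app above shows simulated word frequencies and hard-coded metric values. As a starting point for replacing them, here is a minimal sketch that computes comparable numbers from a piece of text. The definitions are assumptions made for illustration, since the notebook does not define them: vocabulary diversity is taken as the ratio of unique words to total words, and repetition rate as the share of repeated adjacent word pairs. The function name `text_stats` is hypothetical.

```python
from collections import Counter
import re

def text_stats(text, top_k=5):
    """Compute simple word statistics for a generated text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    bigrams = list(zip(words, words[1:]))

    diversity = len(counts) / len(words) if words else 0.0
    repetition = 1 - len(set(bigrams)) / len(bigrams) if bigrams else 0.0

    return {
        "word_count": len(words),
        "top_words": counts.most_common(top_k),  # list of (word, count) pairs
        "vocabulary_diversity": diversity,       # unique words / total words
        "repetition_rate": repetition,           # share of repeated adjacent word pairs
    }

sample = "The future of AI is uncertain. The future of AI is uncertain."
stats = text_stats(sample)
print(stats)
```

In the app, these values could be passed to `st.metric` and `px.bar` in place of the fixed strings and the simulated data.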


In [32]:
!curl https://loca.lt/mytunnelpassword


34.125.185.26

In [33]:
!streamlit run app.py & npx localtunnel --port 8501


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://34.125.185.26:8501

your url is: https://floppy-cats-do.loca.lt
2025-02-14 00:24:52.593 Examining the path of torch.classes raised:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/streamlit/watcher/local_sources_watcher.py", line 217, in get_module_paths
    potential_paths = extract_paths(module)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/streamlit/watcher/local_sources_watcher.py", line 210, in <lambda>
    lambda m: list(m.__path__._path),
                   ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-pack

In [21]:
# !pip install streamlit torch transformers peft plotly

In [36]:
!pip freeze > requirements.txt
