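## Week 11 Assignment (04/29): Build Your Own Image Generation Web App
📌 National Taiwan Normal University | 613K0019C | Graduate Institute of AI, first-year master's student | 劉思廷

### Assignment description:
- Choose a suitable model and use it to generate a variety of images
- Use LLaMA3 70B to rewrite the user's prompt into a more complete description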


In [1]:
""" 1. Install required packages """
!pip install openai diffusers transformers accelerate safetensors huggingface_hub gradio --upgrade


In [2]:
""" 2. Import required packages """
import torch
import random
import matplotlib.pyplot as plt
import gc # Python's garbage-collection module, used to manually free unused memory (to avoid running out of memory on Colab)

In [3]:
""" Load a pretrained Stable Diffusion pipeline from the Hugging Face Hub """
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler # Stable Diffusion pipeline and the UniPC multistep scheduler

# Select the model
model_name = "digiplay/majicMIX_realistic_v6"

# Load the pretrained model
#   model_name: the model repository on the Hugging Face Hub to load
#   torch_dtype=torch.float16: half precision saves GPU memory and speeds up inference
#   use_safetensors=True: load the weights in the safetensors format, which is safer and usually loads faster
pipe = StableDiffusionPipeline.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    use_safetensors=True
).to("cuda") # Move the model to the GPU (CUDA) for faster computation

# Replace the default scheduler with UniPCMultistepScheduler, aiming for good quality in fewer sampling steps
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
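
As a rough check of the memory saving from `torch_dtype=torch.float16`, the arithmetic below compares weight storage at 4 and 2 bytes per parameter. The figure of about 860 million UNet parameters is an assumption typical of Stable Diffusion 1.x checkpoints, not a value measured from this model, and `weight_gib` is a hypothetical helper.

```python
def weight_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the storage needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

# Assumed parameter count (typical of a Stable Diffusion 1.x UNet).
UNET_PARAMS = 860_000_000

fp32_gib = weight_gib(UNET_PARAMS, 4)  # float32: 4 bytes per parameter
fp16_gib = weight_gib(UNET_PARAMS, 2)  # float16: 2 bytes per parameter

print(f"float32: {fp32_gib:.2f} GiB, float16: {fp16_gib:.2f} GiB")
```

This counts weights only; activations and the other pipeline components (text encoder, VAE) add to the actual GPU footprint.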




In [4]:
""" Set up the model that suggests improved user prompts """
# Import required packages
import os
from google.colab import userdata # Access secrets stored in Colab (here, the API key)

# Read the API key and choose the model to use
api_key = userdata.get('Groq')
model = "llama3-70b-8192"

# Store the key in the 'OPENAI_API_KEY' environment variable so the OpenAI client picks it up automatically
os.environ['OPENAI_API_KEY'] = api_key

# Import the OpenAI class, which connects to the OpenAI API or a compatible service (such as Groq)
from openai import OpenAI
client = OpenAI(base_url = "https://api.groq.com/openai/v1")
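
The cell above only works inside Colab, because `google.colab.userdata` is not importable elsewhere. A minimal sketch of a fallback follows, assuming the key may also be supplied through an environment variable; `load_api_key` and the variable name `GROQ_API_KEY` are hypothetical and not part of the original notebook.

```python
import os

def load_api_key(secret_name: str, env_var: str) -> str:
    """Read a key from Colab secrets when available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(secret_name)
    except Exception:
        # Not running on Colab, or the secret is missing or not accessible.
        key = None
    key = key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found in Colab secret {secret_name!r} or in ${env_var}"
        )
    return key
```

The result could then be passed explicitly, for example `OpenAI(base_url="https://api.groq.com/openai/v1", api_key=load_api_key("Groq", "GROQ_API_KEY"))`, instead of being written into `OPENAI_API_KEY`.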

In [5]:
# System prompt for LLaMA3
system = '''
You are a professional prompt engineer specializing in image generation.
Your job is to take a user's raw, vague, or casual description, and convert it into a rich,
detailed prompt suitable for text-to-image models like Stable Diffusion, DALL·E, or Midjourney.

Focus on enhancing the visual clarity, artistic style, atmosphere, and composition of the prompt.
Avoid repeating user input literally. Instead, interpret and expand it into a well-structured
and evocative visual scene involving one or more human characters.

Prioritize details such as the subject's appearance, clothing, pose, expression, setting, lighting, and overall mood.

Output only the final prompt. Do not include explanations or meta comments.

Here are some examples:
-
Example 1
User input: "a girl working in a café"
Final prompt: "A young woman working on a laptop in a cozy café,
wearing a beige sweater and glasses, natural sunlight streaming through the window,
wooden furniture, latte on the table, warm and peaceful atmosphere, cinematic soft lighting"

-
Example 2
User input: "a man walking in the rain"
Final prompt: "A man in a long coat walking down a rainy city street at night,
reflections on the wet pavement, holding a transparent umbrella, moody lighting, soft blur,
street lights glowing in the background, melancholic tone, cinematic look"

-
Example 3
User input: "a couple at the beach"
Final prompt: "A couple walking barefoot along a sunset beach, holding hands,
warm golden light reflecting on the water, relaxed and romantic atmosphere,
wind blowing through their hair, pastel sky, soft focus"

Now, continue this pattern. When the user provides a request,
return only the detailed image prompt, in the same style as above.'''
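
The system prompt above hard-codes its three few-shot examples. As a sketch of how the same layout could be generated from data, the hypothetical helper `format_examples` below renders (user input, final prompt) pairs in that format. The final prompts are shortened to their first clause here for brevity.

```python
def format_examples(pairs):
    """Render (user input, final prompt) pairs in the layout used by the system prompt."""
    blocks = []
    for index, (user_input, final_prompt) in enumerate(pairs, start=1):
        blocks.append(
            f"-\nExample {index}\n"
            f'User input: "{user_input}"\n'
            f'Final prompt: "{final_prompt}"\n'
        )
    return "\n".join(blocks)

# Two pairs taken from the system prompt, with shortened final prompts.
pairs = [
    ("a man walking in the rain",
     "A man in a long coat walking down a rainy city street at night"),
    ("a couple at the beach",
     "A couple walking barefoot along a sunset beach, holding hands"),
]
few_shot_text = format_examples(pairs)
print(few_shot_text)
```

Keeping the examples in a list makes it easier to add or swap examples without editing the long string by hand.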

In [6]:
""" Define prompt_adivce, which takes the user's prompt and returns an enhanced version """
def prompt_adivce(prompt):
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt}
    ]
    # Call the model API with the messages (system prompt plus user prompt) and the chosen model
    chat_completion = client.chat.completions.create(messages=messages, model=model)
    # Take the content of the first choice in the response
    reply = chat_completion.choices[0].message.content
    return reply # Return the model's reply text
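
The function above assumes that the API call always succeeds and that the input is non-empty. A minimal sketch of a defensive wrapper follows; `advise_or_fallback` is a hypothetical name, and the enhancement function is passed in as an argument so the wrapper can be tried without network access.

```python
def advise_or_fallback(prompt, advise_fn):
    """Return the enhanced prompt, or the stripped original if enhancement is unavailable."""
    cleaned = (prompt or "").strip()
    if not cleaned:
        return ""  # nothing to enhance, so skip the API call
    try:
        reply = advise_fn(cleaned)
    except Exception as error:  # e.g. network failure, rate limit, invalid key
        print(f"Prompt enhancement failed: {error}")
        return cleaned
    # The reply may be None or whitespace; fall back to the original in that case too.
    return reply.strip() if reply and reply.strip() else cleaned
```

In this notebook it could be called as `advise_or_fallback(text, prompt_adivce)`.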

In [7]:
def generate_images(prompt, use_enhance, enhance_text, use_negative, negative_text,
                    use_custom_seed, custom_seed, height, width, steps, num_images):
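    # Convert inputs to integers (Gradio may pass them as strings or floats)
    height = int(height)
    width = int(width)
    steps = int(steps)
    num_images = int(num_images)

    # Check that the image size is a multiple of 8 (required by Stable Diffusion)
    if height % 8 != 0 or width % 8 != 0:
        raise ValueError("Height and width must be multiples of 8!")

    # Set the random seed (user-specified or generated automatically)
    if use_custom_seed:
        base_seed = int(custom_seed) # Seed chosen by the user
    else:
        base_seed = random.randint(0, 2**32 - 1) # Generate a random seed

    # Derive a distinct seed for each image (so the images are not identical)
    seeds = [base_seed + i for i in range(num_images)]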

    # Initialize lists
    prompts = []
    negative_prompts = []
    generators = []

    # Build the final prompt, appending the enhancement text if enabled
    final_prompt = prompt

    if use_enhance and enhance_text:
        final_prompt = prompt + ", " + enhance_text

    # Set the negative prompt (used to steer generation away from unwanted content)
    final_negative = negative_text if use_negative else None

    # Prepare a generator and a prompt for each image
    for seed in seeds:
        g = torch.Generator("cuda").manual_seed(seed) # CUDA random generator seeded with this value
        generators.append(g)
        prompts.append(final_prompt)
        negative_prompts.append(final_negative)

    # Free memory
    gc.collect()
    torch.cuda.empty_cache()

    # Generate the images
    images = []
    for i in range(num_images):
        with torch.no_grad():
            image = pipe(
                prompt=prompts[i], # Main prompt
                negative_prompt=negative_prompts[i] if final_negative else None, # Negative prompt (if enabled)
                height=height,
                width=width,
                num_inference_steps=steps, # Denoising steps (more steps generally give finer detail but take longer)
                guidance_scale=7.5, # How strongly the prompt guides generation
                generator=generators[i] # Generator seeded for this image
            ).images[0] # Take the generated image
            images.append(image)

    # Return the images and the seeds used
    return images, f"Random seeds used: {seeds}"
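
Two pieces of the function above can be tried without a GPU: the size check and the per-image seed derivation. The sketch below isolates them. Both helper names are hypothetical, and rounding the size down is an alternative to raising an error, not what the notebook does.

```python
import random

def snap_to_multiple_of_8(value) -> int:
    """Round a size down to the nearest multiple of 8 (minimum 8)."""
    return max(8, (int(value) // 8) * 8)

def make_seeds(num_images: int, base_seed=None):
    """Return consecutive seeds starting from base_seed (random when None)."""
    if base_seed is None:
        base_seed = random.randint(0, 2**32 - 1)
    return [int(base_seed) + i for i in range(int(num_images))]

print(snap_to_multiple_of_8("770"))   # 768
print(make_seeds(3, base_seed=45))    # [45, 46, 47]
```

Because the seeds are consecutive, reusing the reported base seed with the same settings should reproduce the same set of images.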

In [8]:
""" Default enhancement and negative prompts """
# Extended from the instructor's example prompt with help from ChatGPT
default_enhance = (
    "masterpiece, best quality, photorealistic, ultra high resolution, "
    "8k, HDR, beautifully lit, volumetric lighting, dramatic shadows, "
    "extremely detailed skin texture, realistic facial expressions, "
    "sharp focus, shallow depth of field, DSLR photography, "
    "cinematic composition, finely rendered, professional studio lighting"
)
default_negative = (
    "blurry, low resolution, bad anatomy, disfigured face, "
    "distorted body, extra limbs, missing fingers, extra fingers, mutated hands, "
    "poorly drawn face, asymmetrical eyes, broken limbs, out of frame, "
    "text, watermark, logo, ugly, low quality, worst quality, grainy, jpeg artifacts, "
    "overexposed, underexposed, bad lighting, unnatural colors"
)
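
CLIP's text encoder accepts at most 77 tokens, and the run log of this notebook reports part of the enhancement text being truncated for that reason. The sketch below appends enhancement tags only while an estimated token count stays within a budget. `rough_token_count` is a crude stand-in that counts words and punctuation marks, not the real CLIP tokenizer, so it tends to undercount. The default budget of 75 assumes two of the 77 positions are taken by start and end tokens. Both function names are hypothetical.

```python
import re

def rough_token_count(text: str) -> int:
    """Very rough stand-in for a tokenizer: counts words and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))

def fit_enhancement(prompt: str, enhance_text: str, budget: int = 75,
                    count=rough_token_count) -> str:
    """Append enhancement tags one by one while the estimated length stays within budget."""
    result = prompt.strip()
    for tag in (t.strip() for t in enhance_text.split(",")):
        if not tag:
            continue
        candidate = f"{result}, {tag}" if result else tag
        if count(candidate) > budget:
            break  # stop at the first tag that would exceed the budget
        result = candidate
    return result

print(fit_enhancement("a cat", "masterpiece, best quality, 8k", budget=6))
```

For an accurate budget, a counting function based on the pipeline's own tokenizer could be passed as `count` in place of the rough estimate.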

In [13]:
# Import Gradio
import gradio as gr

with gr.Blocks(css="""
.gradio-container { background-color: #111111; font-family: 'Segoe UI', sans-serif; padding: 25px; }
.gr-button {font-size: 18px; background: linear-gradient(to right, #667eea, #764ba2); color: white;}
.markdown h1, .markdown h2, .markdown p {color: #222222 !important;}
""") as demo:

    gr.Markdown("""
    # 🎨 MajicMIX v6 Interactive Image Generator
    Welcome! Enter a prompt, choose your settings, and generate your own photorealistic images!
    """)

    with gr.Row():
        with gr.Column(scale=6):
            prompt_input = gr.Textbox(label="Prompt", placeholder="Enter your prompt", lines=3)
            better_prompt_output = gr.Textbox(label="Suggested prompt", lines=3)

            get_advice_btn = gr.Button("🧠 AI prompt suggestion")

            get_advice_btn.click(
                fn=prompt_adivce,
                inputs=prompt_input,
                outputs=better_prompt_output
            )

            with gr.Accordion("Advanced settings", open=False):
                use_enhance = gr.Checkbox(label="Enhance prompt", value=True)
                enhance_text = gr.Textbox(label="Enhancement text", value=default_enhance)
                use_negative = gr.Checkbox(label="Use negative prompt", value=True)
                negative_text = gr.Textbox(label="Negative prompt text", value=default_negative)
                use_custom_seed = gr.Checkbox(label="Custom random seed", value=False)
                custom_seed = gr.Number(label="Seed", value=45)

            with gr.Row():
                height = gr.Dropdown(["512", "768", "1024"], label="Height", value="512")
                width = gr.Dropdown(["512", "768", "1024"], label="Width", value="512")
            with gr.Row():
                steps = gr.Slider(10, 50, value=30, step=5, label="Inference steps")
                num_images = gr.Slider(1, 4, step=1, value=1, label="Number of images")
            generate_btn = gr.Button("🚀 Generate!")

        with gr.Column(scale=6):
            gallery = gr.Gallery(label="Results", columns=2, object_fit="contain", height="auto")
            seed_info = gr.Label(label="Random seeds used")

    generate_btn.click(
        fn=generate_images,
        inputs=[better_prompt_output, use_enhance, enhance_text, use_negative,
                negative_text, use_custom_seed, custom_seed, height, width, steps, num_images],
        outputs=[gallery, seed_info]
    )
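
In the wiring above, the generate button reads its prompt from the suggested-prompt box (`better_prompt_output`), so clicking it before requesting a suggestion sends an empty prompt. A minimal sketch of a fallback rule follows; `choose_prompt` is a hypothetical helper and not part of the original app.

```python
def choose_prompt(raw_prompt: str, suggested_prompt: str) -> str:
    """Prefer the suggested prompt; fall back to the raw prompt when it is blank."""
    suggested = (suggested_prompt or "").strip()
    raw = (raw_prompt or "").strip()
    if suggested:
        return suggested
    if raw:
        return raw
    raise ValueError("Please enter a prompt before generating.")

print(choose_prompt("a girl working in a café", ""))
```

One way to use it would be a small wrapper that receives both text boxes as inputs, calls `choose_prompt`, and passes the result to `generate_images`.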

In [14]:
demo.launch(share=True, debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://556dbd246fca6b49a3.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: [', 8 k, hdr, beautifully lit, volumetric lighting, dramatic shadows, extremely detailed skin texture, realistic facial expressions, sharp focus, shallow depth of field, dslr photography, cinematic composition, finely rendered, professional studio lighting']



Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://556dbd246fca6b49a3.gradio.live


