<a href="https://colab.research.google.com/github/Wendylin0112/Multimedia_Final_Project/blob/main/Multimedia_FN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1. Audio Processing and Classification

In [None]:
# Install the required libraries (if not already installed)
!pip install librosa tensorflow tensorflow_hub pandas tensorflow_io

In [None]:
import librosa
import tensorflow_hub as hub
import tensorflow as tf
import pandas as pd
import numpy as np
import os
import json

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Load the YAMNet model
model = hub.load('https://tfhub.dev/google/yamnet/1')

# Look up the class indices dynamically from YAMNet's class map
class_map_path = model.class_map_path().numpy().decode('utf-8')
class_map = pd.read_csv(class_map_path)

# Keep every class whose display name contains one of these keywords (case-insensitive),
# e.g. "Rain" also matches "Raindrop" and "Rain on surface"
keywords = ["Rain", "Ocean", "Stream", "Train", "Writing", "Insect"]
filtered_classes = class_map[class_map['display_name'].str.contains('|'.join(keywords), case=False)]

# Extract the filtered results
selected_indices = filtered_classes['index'].tolist()
class_names = filtered_classes['display_name'].tolist()
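# Chinese label returned for each keyword
detailed_labels = {
    "Rain": "雨聲",        # rain
    "Ocean": "海洋聲",     # ocean
    "Stream": "溪流聲",    # stream
    "Train": "火車聲",     # train
    "Writing": "寫字聲",   # writing
    "Insect": "昆蟲鳴聲"   # insect chirping
}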

print(f"Selected indices: {selected_indices}")
print(f"Class names: {class_names}")

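def classify_audio(audio_file):
    """
    Classify a single audio file with the YAMNet model.
    """
    try:
        # Load the audio as 16 kHz mono, the input format YAMNet expects
        waveform, _ = librosa.load(audio_file, sr=16000, mono=True)

        # Run YAMNet; scores has shape (num_frames, 521)
        scores, _, _ = model(waveform)
        scores_np = scores.numpy()

        # Check the model output
        if scores_np.ndim == 0 or scores_np.size == 0:
            raise ValueError(f"The model returned no valid classification scores for: {audio_file}")

        # Keep only the selected classes and average them over all frames
        filtered_scores = scores_np[:, selected_indices]
        mean_scores = filtered_scores.mean(axis=0)

        # Display name of the best selected class, e.g. "Raindrop" or "Train horn"
        base_label = class_names[np.argmax(mean_scores)]

        # Map the class name back to its keyword and return the Chinese label.
        # "Train" is checked before "Rain" because "train" contains "rain".
        name = base_label.lower()
        for keyword in ["Train", "Rain", "Ocean", "Stream", "Writing", "Insect"]:
            if keyword.lower() in name:
                return detailed_labels[keyword]
        raise ValueError(f"No keyword matches YAMNet class {base_label!r}")
    except Exception as e:
        print(f"Error processing file {audio_file}: {e}")
        return "分類失敗"  # "classification failed"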

def classify_audio_folder(folder_path):
    """
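    Classify every audio file in a folder.
    """
    try:
        # Collect all .wav files (case-insensitive extension), in a stable sorted order
        audio_files = [os.path.join(folder_path, f) for f in sorted(os.listdir(folder_path)) if f.lower().endswith('.wav')]

        # Store the classification results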
        results = {}
        for file in audio_files:
            label = classify_audio(file)
            results[file] = label
            print(f"File: {file} -> Label: {label}")

        return results
    except Exception as e:
        print(f"Error processing folder {folder_path}: {e}")
        return {}

# Classify every audio file in the test_audio folder
audio_folder = '/content/drive/My Drive/多媒體_期末報告/test_audio/'
results = classify_audio_folder(audio_folder)

# Print the classification results
print("\nClassification results:")
for file, label in results.items():
    print(f"{file}: {label}")
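`classify_audio` averages each class over all frames and then keeps the single highest class. A hedged alternative, not what the notebook does, is to sum the averaged scores of all classes that belong to the same keyword, so a clip whose energy is split across `Rain`, `Raindrop`, and `Rain on surface` is not outvoted by one strong class from another group. The function name `classify_by_group` and the toy numbers below are purely illustrative.

```python
import numpy as np


def classify_by_group(scores, groups):
    """Return the keyword whose classes have the largest summed mean score.

    scores: array of shape (num_frames, num_classes), one row per YAMNet frame.
    groups: dict mapping a keyword to the list of class columns in that group.
    """
    mean_per_class = scores.mean(axis=0)  # average each class over all frames
    group_scores = {
        keyword: float(mean_per_class[cols].sum()) for keyword, cols in groups.items()
    }
    best = max(group_scores, key=group_scores.get)
    return best, group_scores


# Toy example (made-up numbers): 2 frames, 4 classes;
# columns 0-1 belong to "Rain", columns 2-3 to "Train".
toy_scores = np.array([
    [0.4, 0.0, 0.3, 0.2],
    [0.3, 0.0, 0.3, 0.4],
])
toy_groups = {"Rain": [0, 1], "Train": [2, 3]}
best, group_scores = classify_by_group(toy_scores, toy_groups)
print(best, group_scores)
```

Here the single best class is column 0 (a Rain class, mean 0.35), but the two Train classes together score about 0.6, so the grouped rule picks Train. In the notebook the columns would be positions within `selected_indices`. The trade-off is that keywords covering many YAMNet classes (such as Train) gain an advantage over single-class keywords (such as Ocean).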

## 2. Generate the Corresponding Text Prompts

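1. **Rain (heavy rain)**
   - Background: a city street.
   - Focus: heavy raindrops drawn in a scratchboard style, with clearly visible falling streaks; drops striking the ground and the eaves throw up small splashes.
   - Sky: light gray.
   - Only the rain is dynamic; everything else stays static.

2. **Stream**
   - Background: a quiet forest with rocks in the stream.
   - Required: gently babbling water.
   - Optional (may be omitted): moss on the rocks, leaves in the water.
   - Only the flowing water is dynamic; everything else stays static.

3. **Waves**
   - Background: a sandy beach at night.
   - Focus: rolling, churning waves.
   - Optional (may be omitted): starfish on the sand, a starry sky, footprints and a sandcastle on the beach.
   - Only the focus is animated in the video; everything else stays static.

4. **Train**
   - Background: an indoor platform with yellow wall lamps and platform benches.
   - Focus: a moving train.
   - Only the focus is animated in the video; everything else stays static.

5. **Writing**
   - Background: a library seen from a first-person view; there is a book on the desk, and a hand holding a pen is writing.
   - Focus: the pen in the hand.
   - Only the focus is animated in the video; everything else stays static.

6. **Insect chirping**
   - Background: a quiet forest with a few sunbeams shining through gaps in the leaves.
   - Required: small insects (jumping, flying, and so on) and small birds (flying).
   - Optional (may be omitted): rabbits, foxes, deer, raccoons, squirrels (static or moving).
   - Only the creatures that appear are animated; everything else stays static.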

1. **Rain** A city street in heavy rain, with clearly visible falling raindrops and small splashes where they hit the ground. The sky is light gray. Only the rain is dynamic; everything else remains static.

2. **Stream** A stream with rocks in a forest. Optionally, moss on the rocks and leaves in the stream. Only the gently flowing water is dynamic; everything else remains static.

3. **Waves** A beach at night with rolling waves. Optionally, small starfish on the beach, stars in the sky and footprints on the sand. Only the waves are dynamic; everything else remains static.

4. **Train** An indoor platform with yellow-lit wall lamps and platform benches. A train is passing through the platform. Only the train is moving; everything else remains static.

5. **Writing** First-person perspective in a library. A book lies on the desk, and I am writing with a pen held in my hand. Only the pen is moving; everything else remains static.

6. **Insects** Small insects and birds are active in a forest with a few sunbeams filtering through the leaves. Optionally, lively rabbits, foxes, deer, raccoons and squirrels in the forest. Only the creatures are dynamic; everything else remains static.
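Each English prompt follows the same pattern as the scene specs: a scene sentence, optional elements, and a single moving part. A minimal sketch of assembling prompts from those parts; `build_prompt` and `join_items` are hypothetical helpers of our own, and the notebook itself uses hand-written strings.

```python
def join_items(items):
    """Join items as "a", "a and b", or "a, b and c"."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def build_prompt(scene, optional=None, motion=None):
    """Assemble a video prompt from the parts of a scene spec.

    scene:    background and focused subject, one or more sentences.
    optional: elements the model may include or leave out.
    motion:   clause naming the only moving part, e.g. "the train is moving".
    """
    parts = [scene.rstrip(".") + "."]
    if optional:
        parts.append("Optionally, " + join_items(optional) + ".")
    if motion:
        parts.append(f"Only {motion}; everything else remains static.")
    return " ".join(parts)


print(build_prompt(
    "A stream with rocks in a forest",
    optional=["moss on the rocks", "leaves in the stream"],
    motion="the gently flowing water is dynamic",
))
```

Keeping the parts separate makes it easy to vary one element (for example, the optional extras) while the "only X moves" constraint stays worded identically across all six scenes.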

### 2-5. Map Each Classification to Its Prompt


In [None]:
# Map each Chinese label returned by classify_audio to its English video prompt
label_to_prompt = {
    "雨聲": "A city street in heavy rain, with clearly visible falling raindrops and small splashes where they hit the ground. The sky is light gray. Only the rain is dynamic; everything else remains static.",  # rain
    "溪流聲": "A stream with rocks in a forest. Optionally, moss on the rocks and leaves in the stream. Only the gently flowing water is dynamic; everything else remains static.",  # stream
    "海洋聲": "A beach at night with rolling waves. Optionally, small starfish on the beach, stars in the sky and footprints on the sand. Only the waves are dynamic; everything else remains static.",  # ocean
    "火車聲": "An indoor platform with yellow-lit wall lamps and platform benches. A train is passing through the platform. Only the train is moving; everything else remains static.",  # train
    "寫字聲": "First-person perspective in a library. A book lies on the desk, and I am writing with a pen held in my hand. Only the pen is moving; everything else remains static.",  # writing
    "昆蟲鳴聲": "Small insects and birds are active in a forest with a few sunbeams filtering through the leaves. Optionally, lively rabbits, foxes, deer, raccoons and squirrels in the forest. Only the creatures are dynamic; everything else remains static.",  # insects
}

prompt = None  # section 3 uses the prompt of the last successfully classified file
for file, label in results.items():
    if label not in label_to_prompt:
        # e.g. "分類失敗" (classification failed): skip instead of falling back to the insect prompt
        print(f"File: {file}, Label: {label}: no matching prompt, skipped")
        continue
    prompt = label_to_prompt[label]
    print(f"File: {file}, Label: {label}, Prompt: {prompt}")

## 3. Generate the Video from the Text

In [None]:
# Install the required libraries (if not already installed)
!pip install diffusers transformers accelerate

In [None]:
!pip install "imageio[ffmpeg]" moviepy

In [None]:
!nvidia-smi

In [None]:
!nvcc -V

In [None]:
import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

In [None]:
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

In [None]:
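from diffusers import StableDiffusionPipeline

# Text-to-image model that generates the first frame for Stable Video Diffusion.
# If the Hub asks for authentication, log in first with huggingface_hub.login().
pipeline = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)

# Moves each sub-model to the GPU only while it runs, so .to("cuda") is not needed
pipeline.enable_model_cpu_offload()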


#### Generate image

In [None]:
image = pipeline(prompt).images[0]  # take the first generated image from the list

# Save the image to a file
output_image_path = "example.jpg"  # output file name
image.save(output_image_path)

print(f"Image exported successfully to {output_image_path}")
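Stable Diffusion v1-4 generates 512×512 images by default, while `StableVideoDiffusionPipeline`, to our understanding, works at 1024×576 by default and resizes its input to that size, which stretches a square image horizontally. A minimal sketch of computing a centered 16:9 crop box before resizing; `center_crop_box` is a hypothetical helper, and 1024×576 is the assumed target size.

```python
def center_crop_box(width, height, target_w=1024, target_h=576):
    """Return a (left, top, right, bottom) box that center-crops an image of
    size width x height to the target aspect ratio (16:9 by default)."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Too wide: keep the full height and trim both sides equally
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Too tall (or already 16:9): keep the full width and trim top and bottom
    new_h = round(width / target_ratio)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


print(center_crop_box(512, 512))
```

With PIL this would be used as `image = image.crop(center_crop_box(*image.size)).resize((1024, 576))`. For a 512×512 image the box keeps the middle 288 rows, discarding 112 pixels at the top and bottom, so subjects near those edges may be cut off.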

#### Generate video

In [None]:
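video_path = "generated_video.mp4"
generator = torch.manual_seed(42)  # fixed seed for reproducible results
# motion_bucket_id: higher means more motion; decode_chunk_size: frames decoded per VAE pass (lower saves memory)
frames = pipe(image, decode_chunk_size=8, generator=generator, num_frames=30,
              motion_bucket_id=90, noise_aug_strength=0.05).frames[0]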

In [None]:
export_to_video(frames, video_path, fps=10)  # 30 frames at 10 fps gives a 3-second clip
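With `num_frames=30` and `fps=10` the clip lasts 30 / 10 = 3 seconds, so the `-shortest` flag in section 4 trims the audio to 3 seconds. A minimal sketch of picking a frame rate so the 30 frames span the whole recording instead, using only Python's standard `wave` module (which reads PCM WAV files); `wav_duration_seconds`, `fps_to_match`, and the 1-fps floor are our own illustrative choices.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM .wav file, computed as frame count / sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def fps_to_match(num_frames, audio_seconds, min_fps=1.0):
    """Frame rate that spreads num_frames over audio_seconds, never below min_fps."""
    return max(min_fps, num_frames / audio_seconds)


# 30 frames over a 6-second recording -> 5 frames per second
print(fps_to_match(30, 6.0))
```

In the notebook this could become `export_to_video(frames, video_path, fps=fps_to_match(len(frames), wav_duration_seconds(audio_file)))`, rounding the value if the video writer expects an integer. Very low frame rates look choppy, so for long recordings looping the clip may work better.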

## 4. Combine Audio and Video

In [None]:
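import os
import subprocess
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

# Audio folder, input video, and output file names
audio_path = '/content/drive/My Drive/多媒體_期末報告/test_audio/'
video_file = "generated_video.mp4"
output_file = "output_with_audio.mp4"  # name of the output video

# The folder must contain exactly one .wav file
audio_files = [f for f in os.listdir(audio_path) if f.endswith('.wav')]

if len(audio_files) == 1:
    # Absolute path to the audio file
    audio_file = os.path.abspath(os.path.join(audio_path, audio_files[0]))
    print(f"Found audio file: {audio_file}")
else:
    raise FileNotFoundError("Expected exactly one .wav file in the folder but found none or several. Please check its contents!")

# Arguments are passed as a list, so paths with spaces or non-ASCII characters need no shell quoting.
# -y overwrites an existing output, -c:v copy keeps the video stream unchanged,
# -c:a aac encodes the audio as AAC, -shortest stops at the end of the shorter stream.
ffmpeg_command = [
    "ffmpeg", "-y",
    "-i", video_file,
    "-i", audio_file,
    "-c:v", "copy",
    "-c:a", "aac",
    "-shortest",
    output_file,
]

# Print the FFmpeg command being run
print(f"Running: {' '.join(ffmpeg_command)}")

# Run FFmpeg and check its exit code, not just whether an (older) output file exists
result = subprocess.run(ffmpeg_command, capture_output=True, text=True)

if result.returncode == 0 and os.path.exists(output_file):
    print(f"Audio and video combined successfully: {output_file}")
else:
    print("Failed to combine audio and video!")
    print(result.stderr)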
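Because `-shortest` ends the output at the shorter stream, pairing the 3-second generated clip with a longer recording keeps only the first 3 seconds of audio. If the whole recording should be kept, one option (our assumption, not what the notebook does) is FFmpeg's `-stream_loop -1` input option, which repeats the video input indefinitely so that `-shortest` cuts the result at the end of the audio. `build_loop_command` is a hypothetical helper; we have not verified the output against the project's own clips.

```python
def build_loop_command(video_file, audio_file, output_file):
    """FFmpeg arguments that loop the video until the audio ends.

    -stream_loop is an input option, so it must come before the -i it applies to.
    """
    return [
        "ffmpeg", "-y",
        "-stream_loop", "-1", "-i", video_file,  # video input, repeated indefinitely
        "-i", audio_file,                         # audio input
        "-map", "0:v:0", "-map", "1:a:0",         # video from input 0, audio from input 1
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",                              # stop when the audio ends
        output_file,
    ]


print(" ".join(build_loop_command("generated_video.mp4", "clip.wav", "output_with_audio.mp4")))
```

The list can be passed directly to `subprocess.run(...)` in place of `ffmpeg_command`. The explicit `-map` options make sure the audio comes from the .wav file even if the generated video ever carries an audio track of its own.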


In [None]:
# Download the resulting video
from google.colab import files

files.download(output_file)