Each stage is run as a separate `.py` script so that CUDA memory is released automatically when that stage's process exits.

This notebook organizes the pipeline: it sets up the inputs, then calls each stage in order.
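
Running a stage in its own process means the GPU memory it held is returned when the process ends. A minimal sketch of one way to launch a stage, assuming the scripts accept `--name value` flags as in the cells below; `build_stage_command` and `run_stage` are hypothetical helpers, not part of the project:

```python
import subprocess
import sys


def build_stage_command(script, **options):
    """Build an argv list such as [python, script, '--data_path', '...']."""
    command = [sys.executable, script]
    for name, value in options.items():
        command += [f'--{name}', str(value)]
    return command


def run_stage(script, **options):
    """Run one stage in a child process and return its exit code.

    GPU memory held by the stage is released when the child exits.
    """
    completed = subprocess.run(build_stage_command(script, **options))
    return completed.returncode
```

Passing the command as a list rather than a single shell string keeps paths with spaces (for example a track name used as a folder) intact. A non-zero return value signals that the stage failed.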

## Setup

### What you need to do

- Put the audio files in the `data/` directory

- Correctly set `input_list`

- Create a `.env` file in the `process/` directory if using `glm-4`
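
The checklist above can be verified before any model is loaded. A minimal sketch, assuming the audio files sit directly under the data directory and the `.env` file lives at `process/.env`; `check_setup` is a hypothetical helper and does not inspect the contents of `.env`:

```python
import os


def check_setup(data_path, input_list, llm_model, env_path='process/.env'):
    """Return a list of problems; an empty list means the setup looks ready."""
    problems = []
    # Every entry of input_list should exist as a file under data_path.
    for name in input_list:
        if not os.path.isfile(os.path.join(data_path, name)):
            problems.append(f'missing audio file: {name}')
    # The .env file is only needed when glm-4 is the selected LLM.
    if llm_model == 'glm-4' and not os.path.isfile(env_path):
        problems.append(f'missing {env_path} (needed for glm-4)')
    return problems
```

Calling `check_setup(DATA_PATH, input_list, LLM_MODEL)` right after `input_list` is defined (and before the extensions are stripped) should return `[]` when everything is in place.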

In [1]:
import os

In [2]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"
CONTENT_PATH = DATA_PATH + '.tmp/generate/'
STYLE_PATH = DATA_PATH + 'style/illustration_style/'

# Create the working directory for every stage (no-op if it already exists).
os.makedirs(DATA_PATH + '.tmp/', exist_ok=True)

# Named stage_dirs to avoid shadowing the built-in `list`.
stage_dirs = ['extract/', 'generate/', 'process/', 'inprompt/', 'style_transfer/']

for folder in stage_dirs:
  os.makedirs(DATA_PATH + '.tmp/' + folder, exist_ok=True)


In [3]:
input_list = [
  'Taylor Swift - Love Story.mp3',
]
prompts = [r'''

''',
]
# Pick the style images from the style library
style_list = [
  # 'opia.png'
]
num_char = 2 # default 1
num_non_char = 2 # default 1
image_num = 1 
# Make sure both input_list and prompts have been updated before running this cell!
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)

with open(DATA_PATH + 'style_list.txt', 'w') as f:
  for item in style_list:
    f.write("%s\n" % item)

# Strip the file extension (works for any extension length, not just 3 characters).
input_list = [os.path.splitext(item)[0] for item in input_list]

# if not os.path.exists(DATA_PATH + '.tmp/inprompt/'):
#   os.makedirs(DATA_PATH + '.tmp/inprompt/')
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)
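
Note that `zip` stops at the shorter sequence, so a track without a matching prompt is skipped silently. A small sketch of a stricter pairing, using a hypothetical helper `pair_prompts` that is not part of the project:

```python
def pair_prompts(prompts, names):
    """Pair each prompt with its track name, refusing mismatched lengths."""
    if len(prompts) != len(names):
        raise ValueError(
            f'{len(prompts)} prompt(s) for {len(names)} track(s); '
            'input_list and prompts must have the same length'
        )
    return list(zip(prompts, names))
```

Replacing `zip(prompts, input_list)` with `pair_prompts(prompts, input_list)` would turn a forgotten prompt into an immediate error instead of a missing `.prompt` file later on.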

## Extract

In [4]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 2')

Taylor Swift - Love Story.mp3
['Taylor Swift - Love Story.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.50it/s]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.43it/s]


False
using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
successfully add prompt for Taylor Swift - Love Story.wav
This music is cut into 10 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: A pop/rock song with a light, bouncy feel.
Description piece 2: A pop song with a country feel. Features acoustic guitar, electric guitar, drums, bass, and female vocals. The mood is uplifting and positive. This song would work well in a corporate video, commercial, or advertisement.
Description piece 3: A driving, energetic pop-rock song with a strong female lead vocal. The song is upbeat and has a catchy chorus. The song is a good fit for commercials, corporate videos, and presentations.
Description piece 4: A catchy pop rock song with a female lead vocal. The song is upbeat and has a strong catchy chorus. The

0
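
The log above reports 30-second pieces with a 5-second overlap. A minimal sketch of how such windows could be laid out; `chunk_bounds` is a hypothetical helper, not taken from `extract/extract.py`, and the real script may treat the last piece differently:

```python
def chunk_bounds(duration, length=30.0, overlap=5.0):
    """Return (start, end) times in seconds for overlapping pieces.

    Consecutive pieces start `length - overlap` seconds apart; the last
    piece is cut short at `duration`.
    """
    duration = float(duration)
    if duration <= 0:
        raise ValueError('duration must be positive')
    if not 0 <= overlap < length:
        raise ValueError('overlap must satisfy 0 <= overlap < length')
    stride = length - overlap
    bounds = []
    start = 0.0
    while True:
        end = min(start + length, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        start += stride
    return bounds
```

Under these assumptions, a 60-second input gives three pieces starting at 0, 25 and 50 seconds, and a 235-second input gives 10 pieces.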

## Process

In [5]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/ --num_char {num_char} --num_non_char {num_non_char}')

['Taylor Swift - Love Story']
Loading model
Model loaded
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
15
15
15
15
Token spent: 31587


0

In [6]:
for file_name in input_list:
  for t in range(num_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt' + str(t), 'r') as f:
      print(f.read())
  for t in range(num_non_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt_nc' + str(t), 'rb') as f:
      print(f.read())

upbeat, vibrant, youthful energy, summer day, two young adults, playful interaction, garden setting, bright colors, emotional connection, love story, memories, joy, nostalgia, 8k resolution, 16:9 aspect ratio, 60fps
female singer, emotional expression, acoustic guitar, electric guitar, drums, piano, uplifting mood, friendship, love, country pop, pop rock, medium tempo, driving rhythm, catchy melody, passionate vocal, summer balcony scene, past relationship, regret, longing, vibrant colors, 8k resolution, 16:9 aspect ratio, 60fps
b'vibrant garden setting, colorful flower beds, vibrant greenery, warm sunset hues, whimsical gazebo, romantic ambiance,Juliet-style balcony, throwback summer vibe, sparkling pebble path, emotional atmosphere, heartfelt memories, dynamic lighting, abstract patterns of light and shadow, warm and inviting, nostalgic essence, playful staircase, 8k resolution, 16:9 aspect ratio, 60fps.'
b'vibrant garden setting, blooming flowers, colorful butterfly wings, warm suns
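
The non-character prompts print as `b'...'` because those files are opened in binary mode. A minimal sketch that reads both kinds as text, assuming the files are UTF-8 encoded and follow the `.prompt<t>` / `.prompt_nc<t>` naming used above; `read_prompts` is a hypothetical helper:

```python
import os


def read_prompts(process_dir, name, num_char, num_non_char, encoding='utf-8'):
    """Return {'char': [...], 'non_char': [...]} with every prompt as str."""
    def read(suffix):
        path = os.path.join(process_dir, name + suffix)
        with open(path, 'r', encoding=encoding) as f:
            return f.read().strip()

    return {
        'char': [read(f'.prompt{t}') for t in range(num_char)],
        'non_char': [read(f'.prompt_nc{t}') for t in range(num_non_char)],
    }
```

For example, `read_prompts(DATA_PATH + '.tmp/process/', file_name, num_char, num_non_char)` returns plain strings for both groups.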

## Generate

In [7]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num {image_num} --num_char {num_char} --num_non_char {num_non_char}')

Loading prompt from file
Taylor Swift - Love Story.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00,  7.47it/s]


Model loaded
Generating for Taylor Swift - Love Story.prompt


100%|██████████| 50/50 [00:06<00:00,  7.55it/s]
100%|██████████| 50/50 [00:06<00:00,  7.79it/s]


Generated for Taylor Swift - Love Story.prompt
Loading prompt from file
Generating image without characters
Prompt loaded
Generating for Taylor Swift - Love Story.prompt_nc


Token indices sequence length is longer than the specified maximum sequence length for this model (79 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (79 > 77). Running this sequence through the model will result in indexing errors
100%|██████████| 50/50 [00:06<00:00,  7.78it/s]
100%|██████████| 50/50 [00:06<00:00,  7.78it/s]


Generated for Taylor Swift - Love Story.prompt_nc


0
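
The warning above means a prompt exceeded the 77-token maximum reported by the tokenizer. One possible workaround is to drop trailing tags until the prompt fits. This is a sketch only: `trim_prompt` is a hypothetical helper, and its default `count_tokens` is a crude whitespace count standing in for the model's real tokenizer, which will usually report more tokens:

```python
def trim_prompt(prompt, max_tokens=77, count_tokens=None):
    """Drop trailing comma-separated tags until the prompt fits max_tokens.

    count_tokens should map a string to its token count; the default is a
    rough whitespace-based stand-in, NOT the tokenizer used by the model.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())
    tags = [tag.strip() for tag in prompt.split(',') if tag.strip()]
    # Always keep at least one tag so the prompt never becomes empty.
    while len(tags) > 1 and count_tokens(', '.join(tags)) > max_tokens:
        tags.pop()
    return ', '.join(tags)
```

Because tags are removed from the end, the generic suffixes such as `60fps` go first. To get accurate counts, pass a `count_tokens` function built on the tokenizer that the generation pipeline actually uses.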

## Style transfer

If you don't want to keep the images from previous runs, run the cell below to remove them, so that the final results show only the newly generated images.

In [8]:
import glob
import os

for file_name in input_list:
    for f in glob.glob(DATA_PATH + '.tmp/style_transfer/' + file_name + '/*'):
        os.remove(f)

In [4]:
os.system(f'python style_transfer/style_transfer.py --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/style_transfer/ --style_path {STYLE_PATH} --content_path {CONTENT_PATH} -l_o --num_char {num_char} --num_non_char {num_non_char} --attn --aams')

2024-06-06 13:46:53,494 - modelscope - INFO - PyTorch version 2.1.1 Found.
2024-06-06 13:46:53,496 - modelscope - INFO - TensorFlow version 2.16.1 Found.
2024-06-06 13:46:53,496 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-06-06 13:46:53,536 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 8b52ec5d6c1ca61827fbe6521de44ef5 and a total number of 976 components indexed


['0-0.png', '1-0.png', 'nc0-0.png', 'nc1-0.png']


2024-06-06 13:47:06,491 - modelscope - INFO - initiate model from /root/.cache/modelscope/hub/damo/cv_aams_style-transfer_damo
2024-06-06 13:47:06,491 - modelscope - INFO - initiate model from location /root/.cache/modelscope/hub/damo/cv_aams_style-transfer_damo.
2024-06-06 13:47:06.948187: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-06 13:47:07.056438: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2024-06-06 13:47:08.772847: I

content: Taylor Swift - Love Story/0-0.png
style: 15.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Taylor Swift - Love Story/0-0.png to /root/LLM_project/codes/data/style/illustration_style/15.png


2024-06-06 13:47:11.240564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:0 with 21330 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:37:00.0, compute capability: 8.9
2024-06-06 13:47:11.240828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:1 with 22283 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:9a:00.0, compute capability: 8.9
2024-06-06 13:47:11.357806: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-06-06 13:47:13.124426: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 8902
2024-06-06 13:47:13.662427: I tensorflow/core/util/cuda_solvers.cc:178] Creating GpuSolver handles for stream 0x138a9310


torch.Size([3, 224, 224])
torch.Size([1024, 1024, 1])
Transfer from /root/LLM_project/codes/data/.tmp/generate/Taylor Swift - Love Story/0-0.png to /root/LLM_project/codes/data/style/illustration_style/15.png done
content: Taylor Swift - Love Story/1-0.png
style: 15.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Taylor Swift - Love Story/1-0.png to /root/LLM_project/codes/data/style/illustration_style/15.png


2024-06-06 13:47:21.316540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:0 with 21330 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:37:00.0, compute capability: 8.9
2024-06-06 13:47:21.316994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:1 with 22283 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:9a:00.0, compute capability: 8.9


torch.Size([3, 224, 224])
torch.Size([1024, 1024, 1])
Transfer from /root/LLM_project/codes/data/.tmp/generate/Taylor Swift - Love Story/1-0.png to /root/LLM_project/codes/data/style/illustration_style/15.png done
content: Taylor Swift - Love Story/nc0-0.png
style: 15.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Taylor Swift - Love Story/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/15.png


2024-06-06 13:47:27.427218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:0 with 21330 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:37:00.0, compute capability: 8.9
2024-06-06 13:47:27.427489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:1 with 22283 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:9a:00.0, compute capability: 8.9


torch.Size([3, 224, 224])
torch.Size([1024, 1024, 1])
Transfer from /root/LLM_project/codes/data/.tmp/generate/Taylor Swift - Love Story/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/15.png done
content: Taylor Swift - Love Story/nc1-0.png
style: 15.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Taylor Swift - Love Story/nc1-0.png to /root/LLM_project/codes/data/style/illustration_style/15.png


2024-06-06 13:47:32.268741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:0 with 21330 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:37:00.0, compute capability: 8.9
2024-06-06 13:47:32.269017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:1 with 22283 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:9a:00.0, compute capability: 8.9


torch.Size([3, 224, 224])
torch.Size([1024, 1024, 1])
Transfer from /root/LLM_project/codes/data/.tmp/generate/Taylor Swift - Love Story/nc1-0.png to /root/LLM_project/codes/data/style/illustration_style/15.png done


0

## Final Results

In [None]:
import matplotlib.pyplot as plt
from PIL import Image

result = {}
for music in input_list:
  result[music] = os.listdir(DATA_PATH + '.tmp/style_transfer/'+music)

# Show every stylized PNG, grouped by track.
for music, pics in result.items():
  print(music)
  for pic in pics:
    if pic.endswith('.png'):
      image = Image.open(DATA_PATH + '.tmp/style_transfer/' + music + '/' + pic)
      plt.imshow(image)
      plt.axis('off')
      plt.show()