Each stage is run as a separate `.py` script so that CUDA memory is released automatically when the stage finishes.

This notebook organizes the pipeline code and runs its stages in order.
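
A child process hands all of its GPU memory back to the driver when it exits, which is why each stage is launched as a script instead of being imported into the notebook kernel. The cells in this notebook use `os.system` for that. As a minimal sketch of an alternative, the hypothetical helpers `build_command` and `run_stage` (not part of this project) launch a stage through `subprocess.run` with an argument list, so paths that contain spaces are passed through intact.

```python
import subprocess
import sys


def build_command(script, **options):
    """Build an argument list of the form: python <script> --key value ..."""
    cmd = [sys.executable, script]
    for key, value in options.items():
        cmd += [f"--{key}", str(value)]
    return cmd


def run_stage(script, **options):
    """Run one stage in a child process and return its exit code.

    GPU memory held by the stage is released when the child process exits.
    """
    completed = subprocess.run(build_command(script, **options), check=False)
    return completed.returncode
```

A call such as `run_stage('extract/extract.py', device_num=2)` would return `0` on success, like `os.system` does. This sketch only handles `--key value` options; valueless flags such as `--attn` would need extra handling.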

## Setup

### What you need to do

- Put the audio files in the `data/` directory

- Set `input_list` to the file names of the tracks you want to process

- Create a `.env` file in the `process/` directory if using `glm-4`
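
The checklist above can be verified before any model is loaded. The sketch below is only an illustration: `check_setup` is a hypothetical helper, and `audio_dir` is a parameter because the checklist says `data/` while the configuration cell also defines `MUSIC_PATH` as `data/music/`. What the `.env` file must contain is not specified here, so only its existence is checked.

```python
import os


def check_setup(audio_dir, input_list, llm_model, env_path):
    """Return a list of problems found; an empty list means the setup looks fine."""
    problems = []
    # Every entry of input_list should be an existing audio file.
    for name in input_list:
        if not os.path.isfile(os.path.join(audio_dir, name)):
            problems.append(f"missing audio file: {name}")
    # The .env file is only required when glm-4 is the selected LLM.
    if llm_model == "glm-4" and not os.path.isfile(env_path):
        problems.append(f"missing .env file: {env_path}")
    return problems
```

For example, `check_setup(DATA_PATH, input_list, LLM_MODEL, 'process/.env')` could be called right after `input_list` is defined and before the file extensions are stripped.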

In [1]:
import os

In [2]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"
CONTENT_PATH = DATA_PATH + '.tmp/generate/'
STYLE_PATH = DATA_PATH + 'style/illustration_style/'

# Create the temporary working directories, one per pipeline stage.
# (The original name `list` shadowed the Python built-in.)
subfolders = ['extract/', 'generate/', 'process/', 'inprompt/', 'style_transfer/']

os.makedirs(DATA_PATH + '.tmp/', exist_ok=True)
for folder in subfolders:
  os.makedirs(DATA_PATH + '.tmp/' + folder, exist_ok=True)


In [3]:
input_list = [
  'Distorted Fate.mp3',
  'Retribution ~ Cycle of Redemption ~.mp3',
]
prompts = [""]
# Pick the style images in the style library
style_list = [
  # 'opia.png'
]
num_char = 2 # default 1
num_non_char = 2 # default 1
image_num = 1 
# Check that both input_list and prompts are up to date; prompts[i] is paired with input_list[i].
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)

with open(DATA_PATH + 'style_list.txt', 'w') as f:
  for item in style_list:
    f.write("%s\n" % item)

# Strip the file extension so the names can be reused for per-track files.
input_list = [os.path.splitext(item)[0] for item in input_list]

# The .tmp/inprompt/ directory is created in the setup cell above.
# Note: zip stops at the shorter list, so tracks without a matching entry in prompts get no .prompt file.
for prompt, name in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)
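
In the cell above, `prompts` has one entry while `input_list` has two, so `zip` writes a `.prompt` file for the first track only. Whether the later stages expect a file for every track is not shown here. If they do, one option is to pad `prompts` with empty strings first. The helper name `pair_prompts` is hypothetical:

```python
def pair_prompts(names, prompts, default=""):
    """Pair every track name with a prompt, padding missing prompts with `default`."""
    if len(prompts) > len(names):
        raise ValueError("more prompts than tracks")
    padded = list(prompts) + [default] * (len(names) - len(prompts))
    return list(zip(names, padded))
```

The write loop would then iterate over `pair_prompts(input_list, prompts)` instead of `zip(prompts, input_list)`, with the tuple order swapped to `(name, prompt)`.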

## Extract

In [9]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 2')

GOODTEK.mp3
Retribution ~ Cycle of Redemption ~.mp3
Chronologika.mp3
CrossSoul.mp3
['GOODTEK.wav', 'Retribution ~ Cycle of Redemption ~.wav', 'Chronologika.wav', 'CrossSoul.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:34<00:00,  3.79s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:26<00:00,  2.91s/it]


using device 0


2024-06-06 14:34:56.374748: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-06 14:34:56.434966: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


using device 1
using device 0
using device 1
using device 0
successfully add prompt for GOODTEK.wav
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
successfully add prompt for Retribution ~ Cycle of Redemption ~.wav
using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
successfully add prompt for Chronologika.wav
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
successfully add prompt for CrossSoul.wav
This music is cut into 5 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: A fast-paced techno song with a strong beat and a repetitive melody. The instruments are powerful and the sound is energetic. The song is perfect for action scenes, sports, and fast-paced events.
Description piece 2: This is a high-energy electronic

0
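
The log above reports that a track is cut into 30-second pieces with a 5-second overlap. How `extract.py` computes the boundaries is not shown in this notebook. The sketch below is one plausible scheme, with a hypothetical function name and a 130-second duration chosen purely for illustration:

```python
def segment_bounds(duration, piece_len=30.0, overlap=5.0):
    """Return (start, end) times in seconds for overlapping fixed-length pieces."""
    step = piece_len - overlap  # each piece starts 25 s after the previous one
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + piece_len, duration)
        bounds.append((start, end))
        if end >= duration:
            break  # the last piece already reaches the end of the track
        start += step
    return bounds


# A 130-second track yields 5 pieces under this scheme.
print(len(segment_bounds(130)))
```

Under these assumptions consecutive pieces share their last and first 5 seconds, so a sound that straddles a boundary falls entirely within at least one piece.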

## Process

In [6]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/ --num_char {num_char} --num_non_char {num_non_char}')

['Distorted Fate']
Loading model
Model loaded
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
1
3
7
7
Token spent: 31734


0

In [11]:
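# Print the prompts produced for each track.
for file_name in input_list:
  # Prompts that include characters
  for t in range(num_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt' + str(t), 'r') as f:
      print(f.read())
  # Prompts without characters ("nc"); these are read in binary mode, so they print as b'...'
  for t in range(num_non_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt_nc' + str(t), 'rb') as f:
      print(f.read())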

dynamic scene, vibrant colors, abstract shapes, futuristic cityscape, neon lights, dancing crowd, fast motion blur, glowing energy, sense of exhilaration, 8k resolution, 16:9 aspect ratio, 60fps
energetic dance floor, strobe lights, silhouette of a lone dancer, neon glow, electronic beats, abstract patterns, color shifts, sense of movement, dynamic angles, bright highlights, dark background, urban setting, 8k resolution, 16:9 aspect ratio, 60fps.
b'dynamic cyber landscape, neon-lit horizon, pulsating rhythm lines, vibrant color palette, abstract shapes, intense energy, glowing orbs, electronic glitches, futuristic cityscape, instrumental symbolism, ethereal light pillars, sense of height and elation, dreamlike state, no characters, high contrast, 8k resolution, 16:9 aspect ratio, 60fps.'
b'dynamic cyber landscape, neon-lit horizon, pulsating rhythm lines, abstract shapes in motion, electric blue and acid green hues, glitchy digital textures, sense of speed and urgency, ethereal atmosph

## Generate

In [12]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num {image_num} --num_char {num_char} --num_non_char {num_non_char}')

2024-06-06 14:43:02.194690: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-06 14:43:02.247466: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Loading prompt from file
GOODTEK.prompt
Retribution ~ Cycle of Redemption ~.prompt
Chronologika.prompt
CrossSoul.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00,  5.38it/s]


Model loaded
Generating for GOODTEK.prompt


100%|██████████| 50/50 [00:06<00:00,  7.51it/s]
100%|██████████| 50/50 [00:06<00:00,  7.88it/s]


Generated for GOODTEK.prompt
Generating for Retribution ~ Cycle of Redemption ~.prompt


 54%|█████▍    | 27/50 [00:03<00:02,  7.79it/s]

100%|██████████| 50/50 [00:06<00:00,  7.86it/s]
100%|██████████| 50/50 [00:06<00:00,  7.86it/s]


Generated for Retribution ~ Cycle of Redemption ~.prompt
Generating for Chronologika.prompt


Token indices sequence length is longer than the specified maximum sequence length for this model (478 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (478 > 77). Running this sequence through the model will result in indexing errors
100%|██████████| 50/50 [00:06<00:00,  7.35it/s]
100%|██████████| 50/50 [00:06<00:00,  7.74it/s]


Generated for Chronologika.prompt
Generating for CrossSoul.prompt


100%|██████████| 50/50 [00:06<00:00,  7.76it/s]
100%|██████████| 50/50 [00:06<00:00,  7.78it/s]


Generated for CrossSoul.prompt
Loading prompt from file
Generating image without characters
Prompt loaded
Generating for GOODTEK.prompt_nc


100%|██████████| 50/50 [00:06<00:00,  7.78it/s]
100%|██████████| 50/50 [00:06<00:00,  7.78it/s]


Generated for GOODTEK.prompt_nc
Generating for Retribution ~ Cycle of Redemption ~.prompt_nc


100%|██████████| 50/50 [00:06<00:00,  7.78it/s]
100%|██████████| 50/50 [00:06<00:00,  7.77it/s]


Generated for Retribution ~ Cycle of Redemption ~.prompt_nc
Generating for Chronologika.prompt_nc


100%|██████████| 50/50 [00:06<00:00,  7.78it/s]
100%|██████████| 50/50 [00:06<00:00,  7.77it/s]


Generated for Chronologika.prompt_nc
Generating for CrossSoul.prompt_nc


100%|██████████| 50/50 [00:06<00:00,  7.78it/s]
100%|██████████| 50/50 [00:06<00:00,  7.78it/s]


Generated for CrossSoul.prompt_nc


0

## Style transfer

If you don't want to keep the images from previous runs, run the cell below to delete them, so that the final results show only the newly generated images.

In [11]:
import glob
import os

# Delete the images left over from previous runs, track by track.
for file_name in input_list:
    for f in glob.glob(DATA_PATH + '.tmp/style_transfer/' + file_name + '/*'):
        os.remove(f)

In [12]:
os.system(f'python style_transfer/style_transfer.py --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/style_transfer/ --style_path {STYLE_PATH} --content_path {CONTENT_PATH} -l_o --num_char {num_char} --num_non_char {num_non_char} --attn --aams')

2024-06-11 21:11:24,874 - modelscope - INFO - PyTorch version 2.1.1 Found.
2024-06-11 21:11:24,876 - modelscope - INFO - TensorFlow version 2.16.1 Found.
2024-06-11 21:11:24,876 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-06-11 21:11:24,916 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 254d387c1784b4dbb125f4cdeb89d21c and a total number of 980 components indexed


['0-0.png', '1-0.png', 'nc0-0.png', 'nc1-0.png']


2024-06-11 21:11:29,806 - modelscope - INFO - initiate model from /root/.cache/modelscope/hub/damo/cv_aams_style-transfer_damo
2024-06-11 21:11:29,806 - modelscope - INFO - initiate model from location /root/.cache/modelscope/hub/damo/cv_aams_style-transfer_damo.
2024-06-11 21:11:30.061474: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-11 21:11:30.107230: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2024-06-11 21:11:31.435510: I

content: Distorted Fate/0-0.png
style: 1.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/0-0.png to /root/LLM_project/codes/data/style/illustration_style/1.png


2024-06-11 21:11:32.194142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:0 with 22181 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:36:00.0, compute capability: 8.9
2024-06-11 21:11:32.194401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:1 with 21901 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:37:00.0, compute capability: 8.9
2024-06-11 21:11:32.250192: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-06-11 21:11:34.039840: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 8902
2024-06-11 21:11:34.338323: I tensorflow/core/util/cuda_solvers.cc:178] Creating GpuSolver handles for stream 0xe5d3320


torch.Size([3, 224, 224])
torch.Size([1024, 1024, 1])
Transfer from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/0-0.png to /root/LLM_project/codes/data/style/illustration_style/1.png done
content: Distorted Fate/1-0.png
style: 3.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/1-0.png to /root/LLM_project/codes/data/style/illustration_style/3.png


2024-06-11 21:11:38.319264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:0 with 22181 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:36:00.0, compute capability: 8.9
2024-06-11 21:11:38.319531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:1 with 21901 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:37:00.0, compute capability: 8.9


torch.Size([3, 224, 224])
torch.Size([1024, 1024, 1])
Transfer from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/1-0.png to /root/LLM_project/codes/data/style/illustration_style/3.png done
content: Distorted Fate/nc0-0.png
style: 7.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png


2024-06-11 21:11:39.411303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:0 with 22181 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:36:00.0, compute capability: 8.9
2024-06-11 21:11:39.411562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:1 with 21901 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:37:00.0, compute capability: 8.9


torch.Size([3, 224, 224])
torch.Size([1024, 1024, 1])
Transfer from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png done
content: Distorted Fate/nc1-0.png
style: 11.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/nc1-0.png to /root/LLM_project/codes/data/style/illustration_style/11.png


2024-06-11 21:11:40.501115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:0 with 22181 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:36:00.0, compute capability: 8.9
2024-06-11 21:11:40.501374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:1 with 21901 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:37:00.0, compute capability: 8.9


torch.Size([3, 224, 224])
torch.Size([1024, 1024, 1])
Transfer from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/nc1-0.png to /root/LLM_project/codes/data/style/illustration_style/11.png done


0

## Final results

In [15]:
import matplotlib.pyplot as plt
from PIL import Image

# Collect the output file names for each track.
result = {}
for music in input_list:
  result[music] = os.listdir(DATA_PATH + '.tmp/style_transfer/' + music)

# Show every PNG, grouped by track.
for music, pics in result.items():
  print(music)
  for pic in pics:
    if pic.endswith('.png'):
      image = Image.open(DATA_PATH + '.tmp/style_transfer/' + music + '/' + pic)
      plt.imshow(image)
      plt.axis('off')
      plt.show()

FileNotFoundError: [Errno 2] No such file or directory: '/root/LLM_project/codes/data/.tmp/style_transfer/Chronologika'
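
The `FileNotFoundError` above most likely comes from `os.listdir`, which raises when a track in `input_list` has no directory under `.tmp/style_transfer/`, for example because style transfer was not run for that track. A minimal sketch that skips such tracks instead of stopping, using a hypothetical helper `list_result_images`:

```python
import os


def list_result_images(root, names):
    """Map each track that has an output directory to its sorted PNG file names."""
    found = {}
    for name in names:
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            # No output for this track: report it and move on.
            print(f"skipping {name}: no directory {folder}")
            continue
        found[name] = sorted(p for p in os.listdir(folder) if p.endswith(".png"))
    return found
```

Calling `list_result_images(DATA_PATH + '.tmp/style_transfer/', input_list)` would give a dictionary that can replace `result` in the display loop.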