Each stage is run as a separate `.py` script so that CUDA memory is released automatically when that stage's process exits.

This notebook organizes those script calls.
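The process-per-stage idea can be sketched with the standard library's `subprocess` module. This is only an illustration: `build_command` and `run_stage` are hypothetical helper names, and the sketch assumes every option is passed as `--name value` (value-less flags are not handled).

```python
import subprocess
import sys


def build_command(script, **options):
    """Build an argument list such as [python, script, '--key', 'value', ...]."""
    cmd = [sys.executable, script]
    for key, value in options.items():
        cmd += [f"--{key}", str(value)]
    return cmd


def run_stage(script, **options):
    """Run one stage in a child process and return its exit code.

    When the child exits, the operating system reclaims everything it
    allocated, including its GPU memory.
    """
    completed = subprocess.run(build_command(script, **options))
    return completed.returncode
```

A call could then look like `run_stage('extract/extract.py', device_num=4)`, with a non-zero return value signalling that the stage failed.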

## Setup

### What you need to do

- Put your audio files in the `data/` directory

- Set `input_list` (and the matching `prompts`) to the files you want to process

- Create a `.env` file in the `process/` directory if using `glm-4`
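The notebook does not show what the `.env` file contains or how `process/process.py` reads it. As a rough sketch, assuming it holds plain `KEY=VALUE` lines (the helper `load_env_file` and the key name in the usage note are hypothetical), such a file could be loaded like this:

```python
import os


def load_env_file(path):
    """Parse simple KEY=VALUE lines and export them through os.environ."""
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Drop surrounding whitespace and optional quotes around the value
            loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded
```

For example, a file containing the line `EXAMPLE_API_KEY=...` would make `os.environ['EXAMPLE_API_KEY']` available after `load_env_file('process/.env')`.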

In [16]:
import os

In [17]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"
CONTENT_PATH = DATA_PATH + '.tmp/generate/'
STYLE_PATH = DATA_PATH + 'style/illustration_style/'

# exist_ok=True makes the explicit existence checks unnecessary
os.makedirs(DATA_PATH + '.tmp/', exist_ok=True)

# Named `subfolders` to avoid shadowing the built-in `list`
subfolders = ['extract/', 'generate/', 'process/', 'inprompt/', 'style_transfer/']

for folder in subfolders:
  os.makedirs(DATA_PATH + '.tmp/' + folder, exist_ok=True)


In [18]:
input_list = [
  'Kerberos.mp3',
]
prompts = [r'''
  The name of this song is "Kerberos". 
''',
]
# Pick style images from the style library
style_list = [
  # 'opia.png'
]
num_char = 1 # default
num_non_char = 1 # default
# Make sure input_list and prompts have both been updated and have the same length!
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)

with open(DATA_PATH + 'style_list.txt', 'w') as f:
  for item in style_list:
    f.write("%s\n" % item)

# Strip the file extension, whatever its length (e.g. '.mp3', '.flac')
input_list = [os.path.splitext(item)[0] for item in input_list]

# if not os.path.exists(DATA_PATH + '.tmp/inprompt/'):
#   os.makedirs(DATA_PATH + '.tmp/inprompt/')
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)
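Because `zip` stops at the shorter sequence, a mismatch between `input_list` and `prompts` would silently skip files. A minimal check, with `check_inputs` as a hypothetical helper name, might look like this:

```python
def check_inputs(input_list, prompts):
    """Raise early if the songs and prompts do not pair up one to one."""
    if len(input_list) != len(prompts):
        raise ValueError(
            f"input_list has {len(input_list)} entries "
            f"but prompts has {len(prompts)}"
        )
    for name in input_list:
        # An empty name would produce a file called just '.prompt'
        if not name:
            raise ValueError("input_list contains an empty name")
```

Calling `check_inputs(input_list, prompts)` before the `zip` loop turns a silent omission into an immediate error.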

## Extract

In [19]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 4')

Kerberos.mp3
['Kerberos.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [01:08<00:00,  7.62s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [01:08<00:00,  7.60s/it]


using device 0
using device 1
using device 0
using device 1
using device 0
successfully add prompt for Kerberos.wav
This music is cut into 5 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: This is a high-energy electronic track with a strong emphasis on drums and bass. The atmosphere is intense and urgent, with a sense of danger and excitement. The music is perfect for action scenes, video games, and sports events. The track features a distorted, glitchy synth lead that gives the music a futuristic and edgy feel. Overall, this is a dynamic and exciting piece of music that will get your heart racing.
Description piece 2: This is a high-energy electronic track with a strong emphasis on drums and percussion. The tempo is fast and the music is intense and urgent. The track is suitable for action scenes, high-energy video games, and sports videos. The music is also suitable for use in a club or danc

0

In [23]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/extract/' + file_name + '.prompt', 'r') as f:
    print(f.read())

This music is cut into 5 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: This is a high-energy electronic track with a strong emphasis on drums and bass. The atmosphere is intense and urgent, with a sense of danger and excitement. The music is perfect for action scenes, video games, and sports events. The track features a distorted, glitchy synth lead that gives the music a futuristic and edgy feel. Overall, this is a dynamic and exciting piece of music that will get your heart racing.
Description piece 2: This is a high-energy electronic track with a strong emphasis on drums and percussion. The tempo is fast and the music is intense and urgent. The track is suitable for action scenes, high-energy video games, and sports videos. The music is also suitable for use in a club or dance environment. The instruments used in the track include synthesizers, electric guitars, and drums. The overall soun
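The segmentation described in the output (30-second pieces with a 5-second overlap) implies that consecutive pieces start 25 seconds apart. The cutting code in `extract/extract.py` is not shown here, so the following is only a sketch of that arithmetic; `segment_bounds` is a hypothetical name, and the real last piece may be shorter if the track ends early.

```python
def segment_bounds(num_pieces, piece_len=30, overlap=5):
    """Return (start, end) times in seconds for overlapping pieces."""
    stride = piece_len - overlap  # distance between consecutive starts
    return [(i * stride, i * stride + piece_len) for i in range(num_pieces)]


print(segment_bounds(5))
# [(0, 30), (25, 55), (50, 80), (75, 105), (100, 130)]
```

Under these assumptions, five pieces cover at most 130 seconds of audio.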

## Process

In [26]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/ --num_char {num_char} --num_non_char {num_non_char}')

['Kerberos']
Loading model
Model loaded
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
Token spent: 8680


0

In [27]:
for file_name in input_list:
  for t in range(num_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt' + str(t), 'r') as f:
      print(f.read())
  for t in range(num_non_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt_nc' + str(t), 'r') as f:
      print(f.read())

ferocious guardian dog, Greek mythology inspiration, three heads, dark and intense atmosphere, red and orange hues, glowing eyes, electronic circuitry, drum beats visualized, fast-paced motion blur, 8k resolution, 16:9 aspect ratio, 60fps.
dark futuristic cityscape, intense red and blue neon lights, glitchy digital patterns, strong geometric shapes, deep bass vibrations, edgy synthesizer waves, no human presence, dynamic motion, fast-paced, urban atmosphere, Kerberos reference with abstract three-headed guard dog silhouette, high-energy conducive to action, 8k resolution, 16:9 aspect ratio, 60fps


## Generate

In [28]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num 3 --num_char {num_char} --num_non_char {num_non_char}')

Loading prompt from file
Kerberos.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00,  3.66it/s]


Model loaded
Generating for Kerberos.prompt


100%|██████████| 50/50 [00:34<00:00,  1.45it/s]


Generated for Kerberos.prompt
Generating image without characters
Loading prompt from file
Prompt loaded
Generating for Kerberos.prompt_nc


Token indices sequence length is longer than the specified maximum sequence length for this model (80 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (80 > 77). Running this sequence through the model will result in indexing errors
100%|██████████| 50/50 [00:33<00:00,  1.48it/s]


Generated for Kerberos.prompt_nc


0

## Style transfer

In [29]:
os.system(f'python style_transfer/style_transfer.py --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/style_transfer/ --style_path {STYLE_PATH} --content_path {CONTENT_PATH} -l_o')

  warn(
Traceback (most recent call last):
  File "/root/LLM_project/codes/style_transfer/style_transfer.py", line 10, in <module>
    import torchvision.models as models
  File "/opt/conda/lib/python3.10/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/opt/conda/lib/python3.10/site-packages/torchvision/models/__init__.py", line 2, in <module>
    from .convnext import *
  File "/opt/conda/lib/python3.10/site-packages/torchvision/models/convnext.py", line 8, in <module>
    from ..ops.misc import Conv2dNormActivation, Permute
  File "/opt/conda/lib/python3.10/site-packages/torchvision/ops/__init__.py", line 23, in <module>
    from .poolers import MultiScaleRoIAlign
  File "/opt/conda/lib/python3.10/site-packages/torchvision/ops/poolers.py", line 10, in <module>
    from .roi_align import roi_align
  File "/opt/conda/lib/python3.10/site-packages/torchvision/ops/roi_align.py

256
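The trailing number printed after each stage appears to be the return value of `os.system`. On Unix-like systems that value is a wait status rather than a plain exit code, so `0` means success and `256` corresponds to exit code 1. A small sketch of the conversion, assuming Python 3.9 or later (`exit_code` is a hypothetical helper name):

```python
import os


def exit_code(status):
    """Convert an os.system() wait status into the child's exit code."""
    return os.waitstatus_to_exitcode(status)


print(exit_code(0))    # 0
print(exit_code(256))  # 1
```

Checking this value after each `os.system` call makes it possible to stop the notebook as soon as a stage fails instead of continuing with missing outputs.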