Each pipeline stage is run as a separate `.py` script so that CUDA memory is released automatically when that stage's process exits.

This notebook drives those scripts in order and displays their output.
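
A minimal sketch of the idea, assuming each stage is a standalone script that takes `--key value` options. The helper names `build_command` and `run_stage` are hypothetical and are not part of the project. The cells in this notebook use `os.system`, which behaves the same way with respect to GPU memory; `subprocess.run` with `check=True` only adds an exception when a stage exits with a non-zero status.

```python
import subprocess
import sys


def build_command(script, **options):
    """Build an argv list such as [python, script, '--key', 'value', ...]."""
    command = [sys.executable, script]
    for key, value in options.items():
        command += [f'--{key}', str(value)]
    return command


def run_stage(script, **options):
    """Run one pipeline stage in a child process and fail loudly on errors."""
    # The child process owns its own CUDA context, so the GPU memory it
    # allocated is freed when the process exits.
    completed = subprocess.run(build_command(script, **options), check=True)
    return completed.returncode


# Hypothetical usage, mirroring the Extract cell:
# run_stage('extract/extract.py', model_path=MODEL_PATH, data_path=DATA_PATH)
```

Value-less flags such as `-l_o` are not covered by this sketch and would need to be appended to the list by hand.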

## Setup

### What you need to do

- Put the audio files in the `data/` directory

- Set `input_list` to the names of those files

- Create a `.env` file in the `process/` directory if using `glm-4`
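
The notebook does not say which variables `process/process.py` reads from the `.env` file. As an illustration only, the sketch below parses `KEY=VALUE` lines with the standard library; the function name `load_env_file` and the variable name `ZHIPUAI_API_KEY` in the usage comment are assumptions, not something the source confirms.

```python
def load_env_file(path):
    """Parse KEY=VALUE lines from a .env file into a dict."""
    values = {}
    with open(path, 'r', encoding='utf-8') as f:
        for raw_line in f:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an '='.
            if not line or line.startswith('#') or '=' not in line:
                continue
            key, _, value = line.partition('=')
            # Drop surrounding whitespace and optional quotes.
            values[key.strip()] = value.strip().strip('"\'')
    return values


# Hypothetical usage:
# env = load_env_file('process/.env')
# print('ZHIPUAI_API_KEY' in env)
```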

In [1]:
import os

In [2]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"
CONTENT_PATH = DATA_PATH + '.tmp/generate/'
STYLE_PATH = DATA_PATH + 'style/illustration_style/'

# Create one working directory per pipeline stage under data/.tmp/.
# The variable is named `folders` to avoid shadowing the built-in `list`.
folders = ['extract/', 'generate/', 'process/', 'inprompt/', 'style_transfer/']

for folder in folders:
  os.makedirs(DATA_PATH + '.tmp/' + folder, exist_ok=True)


In [3]:
input_list = [
  'Kerberos.mp3',
]
prompts = [r'''
  The name of this song is "Kerberos". 
''',
]
# Pick the style images in the style library
style_list = [
  # 'opia.png'
]
num_char = 1 # default
num_non_char = 1 # default
# Make sure input_list and prompts are both updated and have the same length.
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)

with open(DATA_PATH + 'style_list.txt', 'w') as f:
  for item in style_list:
    f.write("%s\n" % item)

# Strip the file extension (e.g. 'Kerberos.mp3' -> 'Kerberos'),
# whatever its length.
input_list = [os.path.splitext(item)[0] for item in input_list]

# if not os.path.exists(DATA_PATH + '.tmp/inprompt/'):
#   os.makedirs(DATA_PATH + '.tmp/inprompt/')
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)
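
The setup cell pairs `prompts` with `input_list` using `zip`, which silently drops the extra items when the two lists differ in length. A minimal sketch of a guard follows; the function name `check_inputs` is hypothetical, and it would be called right after the two lists are defined.

```python
import os


def check_inputs(input_list, prompts):
    """Return the file stems, or raise if the two lists do not line up."""
    if len(input_list) != len(prompts):
        raise ValueError(
            f'{len(input_list)} input file(s) but {len(prompts)} prompt(s)'
        )
    # 'Kerberos.mp3' -> 'Kerberos'
    return [os.path.splitext(name)[0] for name in input_list]
```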

## Extract

In [4]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 4')

Kerberos.mp3
['Kerberos.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:45<00:00,  5.04s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:45<00:00,  5.07s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:47<00:00,  5.23s/it]
The model is automatic

using device 0
using device 1
using device 2
using device 3
using device 0
successfully add prompt for Kerberos.wav
This music is cut into 5 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: This is a high-energy electronic track with a strong emphasis on drums and percussion. The tempo is fast and relentless, with an insistent beat that drives the music forward. The instruments are heavily processed and distorted, giving the track a raw and edgy sound. The overall mood is intense and aggressive, with a sense of urgency and intensity. This track would be well-suited for action scenes or high-energy sports footage.
Description piece 2: This is a high-energy electronic track with a strong emphasis on drums and percussion. The tempo is fast and the music is intense and urgent. The track is suitable for action scenes, high-energy video games, and sports videos. The music is also suitable for use in a

0

In [5]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/extract/' + file_name + '.prompt', 'r') as f:
    print(f.read())

This music is cut into 5 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: This is a high-energy electronic track with a strong emphasis on drums and percussion. The tempo is fast and relentless, with an insistent beat that drives the music forward. The instruments are heavily processed and distorted, giving the track a raw and edgy sound. The overall mood is intense and aggressive, with a sense of urgency and intensity. This track would be well-suited for action scenes or high-energy sports footage.
Description piece 2: This is a high-energy electronic track with a strong emphasis on drums and percussion. The tempo is fast and the music is intense and urgent. The track is suitable for action scenes, high-energy video games, and sports videos. The music is also suitable for use in a club or dance environment. The instruments used in the track include synthesizers, electric guitars, and drums. The
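
The extract output reports 30-second pieces with a 5-second overlap, so consecutive pieces start 25 seconds apart. The sketch below computes such windows for a given duration. It is an illustration under assumptions: the function name `segment_bounds` is hypothetical, and how `extract/extract.py` treats a short final piece is not shown in the source.

```python
def segment_bounds(duration, length=30.0, overlap=5.0):
    """Return (start, end) times in seconds for overlapping pieces."""
    if length <= overlap:
        raise ValueError('length must be greater than overlap')
    step = length - overlap
    bounds = []
    start = 0.0
    while start < duration:
        # The last piece is clipped to the end of the track.
        end = min(start + length, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        start += step
    return bounds
```

With these settings, a track of 130 seconds would yield five pieces, the last one covering seconds 100 to 130.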

## Process

In [6]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/ --num_char {num_char} --num_non_char {num_non_char}')

['Kerberos']
Loading model
Model loaded
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
Token spent: 8856


0

In [7]:
for file_name in input_list:
  for t in range(num_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt' + str(t), 'r') as f:
      print(f.read())
  for t in range(num_non_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt_nc' + str(t), 'rb') as f:
      print(f.read())

dark urban landscape, glowing neon lights, lone figure in silhouette, hoodie, intense gaze, Kerberos reference, mythical creature, three heads, guarding entrance, fast-paced motion blur, edgy electronics, abstract shapes, aggressive styling, red and black color scheme, high contrast, 8k resolution, 16:9 aspect ratio, 60fps.
b'dark futuristic cityscape, glowing neon lights, intense reds and blues, abstract patterns resembling digital code, strong geometric shapes, dynamic motion blur, deep shadows, edgy electronic textures, fast-paced rhythm visible in light trails, sense of power and energy, intense atmosphere, no characters, 8k resolution, 16:9 aspect ratio, 60fps'
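
The cell above relies on a naming convention: character prompts are stored as `<name>.prompt<t>` and non-character prompts as `<name>.prompt_nc<t>`. A small sketch that lists the expected paths, with the hypothetical helper name `prompt_paths`:

```python
def prompt_paths(process_dir, name, num_char=1, num_non_char=1):
    """Return (character prompt paths, non-character prompt paths)."""
    char = [f'{process_dir}{name}.prompt{t}' for t in range(num_char)]
    non_char = [f'{process_dir}{name}.prompt_nc{t}' for t in range(num_non_char)]
    return char, non_char


# Hypothetical usage:
# char, non_char = prompt_paths(DATA_PATH + '.tmp/process/', 'Kerberos')
```

Note that the second prompt in the output above is printed as a bytes literal (`b'...'`) because that file is opened in `'rb'` mode, while the first is opened in text mode.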


## Generate

In [11]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num 3 --num_char {num_char} --num_non_char {num_non_char}')

Loading prompt from file
Kerberos.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00,  6.59it/s]


Model loaded
Generating for Kerberos.prompt


  return F.conv2d(input, weight, bias, self.stride,
100%|██████████| 50/50 [00:16<00:00,  3.02it/s]


Generated for Kerberos.prompt
Generating image without characters
Loading prompt from file
Prompt loaded
Generating for Kerberos.prompt_nc


100%|██████████| 50/50 [00:16<00:00,  3.07it/s]


Generated for Kerberos.prompt_nc


0
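
The file listing printed by the style transfer stage (`0-0.png`, `nc0-0.png`, and so on) suggests that generated images are named `<t>-<i>.png` for character prompts and `nc<t>-<i>.png` for non-character prompts, where `t` is the prompt index and `i` the image index. That reading is inferred from the log, not confirmed by `generate/generate.py`. Under that assumption, the expected file names can be listed with a hypothetical helper:

```python
def expected_images(image_num, num_char=1, num_non_char=1):
    """List the image file names one song is expected to produce."""
    char = [f'{t}-{i}.png' for t in range(num_char) for i in range(image_num)]
    non_char = [
        f'nc{t}-{i}.png' for t in range(num_non_char) for i in range(image_num)
    ]
    return char + non_char
```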

## Style transfer

In [24]:
os.system(f'python style_transfer/style_transfer.py --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/style_transfer/ --style_path {STYLE_PATH} --content_path {CONTENT_PATH} -l_o --num_char {num_char} --num_non_char {num_non_char}')

  warn(


['0-0.png', '0-1.png', '0-2.png', 'nc0-0.png', 'nc0-1.png', 'nc0-2.png']
style_file_name: {'Kerberos': ['19.png', '5.png']}
map: {'Kerberos': {'0-0.png': '19.png', '0-1.png': '19.png', '0-2.png': '19.png', 'nc0-0.png': '5.png', 'nc0-1.png': '5.png', 'nc0-2.png': '5.png'}}
content: Kerberos/0-0.png
style: 19.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Kerberos/0-0.png to /root/LLM_project/codes/data/style/illustration_style/19.png
Building the style transfer model..


Style Loss : 0.049068 Content Loss: 0.270643:  80%|████████  | 40/50 [00:04<00:01,  9.35it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/Kerberos/0-0.png to /root/LLM_project/codes/data/style/illustration_style/19.png done
content: Kerberos/0-1.png
style: 19.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Kerberos/0-1.png to /root/LLM_project/codes/data/style/illustration_style/19.png
Building the style transfer model..


Style Loss : 0.066875 Content Loss: 0.320130:  80%|████████  | 40/50 [00:04<00:01,  9.48it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/Kerberos/0-1.png to /root/LLM_project/codes/data/style/illustration_style/19.png done
content: Kerberos/0-2.png
style: 19.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Kerberos/0-2.png to /root/LLM_project/codes/data/style/illustration_style/19.png
Building the style transfer model..


Style Loss : 0.052950 Content Loss: 0.255260:  80%|████████  | 40/50 [00:04<00:01,  9.48it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/Kerberos/0-2.png to /root/LLM_project/codes/data/style/illustration_style/19.png done
content: Kerberos/nc0-0.png
style: 5.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Kerberos/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/5.png
Building the style transfer model..


Style Loss : 0.418165 Content Loss: 1.649557:  80%|████████  | 40/50 [00:04<00:01,  9.48it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/Kerberos/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/5.png done
content: Kerberos/nc0-1.png
style: 5.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Kerberos/nc0-1.png to /root/LLM_project/codes/data/style/illustration_style/5.png
Building the style transfer model..


Style Loss : 0.350551 Content Loss: 1.469282:  80%|████████  | 40/50 [00:04<00:01,  9.47it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/Kerberos/nc0-1.png to /root/LLM_project/codes/data/style/illustration_style/5.png done
content: Kerberos/nc0-2.png
style: 5.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Kerberos/nc0-2.png to /root/LLM_project/codes/data/style/illustration_style/5.png
Building the style transfer model..


Style Loss : 0.363506 Content Loss: 1.482159:  80%|████████  | 40/50 [00:04<00:01,  9.46it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/Kerberos/nc0-2.png to /root/LLM_project/codes/data/style/illustration_style/5.png done


0
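
In the `map` printed above, every character image is paired with one style image and every non-character image (prefix `nc`) with another. The sketch below rebuilds that mapping from a file list. How `style_transfer/style_transfer.py` actually picks the two style images is not shown in the output, so they are passed in explicitly here; the function name `build_style_map` is hypothetical.

```python
def build_style_map(content_files, char_style, non_char_style):
    """Pair each content image with a style image based on its 'nc' prefix."""
    return {
        name: (non_char_style if name.startswith('nc') else char_style)
        for name in content_files
    }


# Hypothetical usage, reproducing the mapping from the log:
# build_style_map(['0-0.png', 'nc0-0.png'], '19.png', '5.png')
```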