Each stage is run as a separate `.py` script so that CUDA memory is released automatically when that stage finishes.

This notebook organizes the code and runs the stages in order.
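As a minimal sketch of why separate scripts help: a child process owns its own CUDA context, and that context is destroyed when the process exits. The helper below is hypothetical (`run_stage` is not part of this project) and shows one way to launch a stage with `subprocess.run` instead of `os.system`. Passing the arguments as a list also avoids shell quoting problems with paths that contain spaces.

```python
import subprocess
import sys


def run_stage(script, **options):
    """Run one pipeline script in a child process and return its exit code.

    `run_stage` is a hypothetical helper; each keyword argument is passed
    to the script as `--name value`.
    """
    cmd = [sys.executable, script]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    # The child process has its own CUDA context. When it exits, its GPU
    # memory is freed, so nothing carries over to the next stage.
    completed = subprocess.run(cmd)
    return completed.returncode
```

For example, `run_stage('extract/extract.py', data_path=DATA_PATH, device_num=2)` would return `0` on success, the same value `os.system` prints in the cells below.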

## Setup

### What you need to do

- Put the audio files in the `data/` directory

- Set `input_list` to the audio file names you want to process

- Create a `.env` file in the `process/` directory if using `glm-4`
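The notebook does not say which variables `process/process.py` reads from `.env`, so the sketch below only illustrates the general `KEY=VALUE` format. The helper `load_env_file` and the variable name `ZHIPUAI_API_KEY` in the usage note are assumptions, not taken from this project.

```python
import os


def load_env_file(path):
    """Parse simple KEY=VALUE lines and copy them into os.environ.

    Hypothetical helper: a tiny stand-in for a dotenv loader.
    """
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Drop optional surrounding quotes around the value.
            loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded
```

A `.env` file in this format might contain a single line such as `ZHIPUAI_API_KEY="your-key"`; check `process/process.py` for the name it actually expects.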

In [1]:
import os

In [2]:
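# Root directories for data, models, and input music.
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
# Models used by the process and generate stages.
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"
# Content and style images consumed by the style-transfer stage.
CONTENT_PATH = DATA_PATH + '.tmp/generate/'
STYLE_PATH = DATA_PATH + 'style/illustration_style/'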

# Create the working directory for each pipeline stage.
# exist_ok=True makes the cell safe to re-run; the parent .tmp/ is created as needed.
folders = ['extract/', 'generate/', 'process/', 'inprompt/', 'style_transfer/']

for folder in folders:
  os.makedirs(DATA_PATH + '.tmp/' + folder, exist_ok=True)


In [3]:
input_list = [
  'Distorted Fate.mp3',
]
prompts = [r'''

''',
]
# Pick style images from the style library
style_list = [
  # 'opia.png'
]
num_char = 2 # default 1
num_non_char = 2 # default 1
image_num = 1
# Make sure both input_list and prompts have been updated before running this cell.
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)

with open(DATA_PATH + 'style_list.txt', 'w') as f:
  for item in style_list:
    f.write("%s\n" % item)

# Strip the file extension (of any length) to get the track names.
input_list = [os.path.splitext(item)[0] for item in input_list]

# if not os.path.exists(DATA_PATH + '.tmp/inprompt/'):
#   os.makedirs(DATA_PATH + '.tmp/inprompt/')
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)
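Because `zip(prompts, input_list)` stops at the shorter list, a missing prompt is dropped silently. A minimal sketch of a check that could run before the extensions are stripped is shown here; `check_inputs` is a hypothetical helper, not part of the project.

```python
import os


def check_inputs(input_list, prompts):
    """Return the track names, or raise ValueError if pairing would go wrong."""
    if len(input_list) != len(prompts):
        raise ValueError(
            f"{len(input_list)} input files but {len(prompts)} prompts; "
            "zip() would silently drop the extras"
        )
    names = [os.path.splitext(name)[0] for name in input_list]
    # Two files that differ only by extension would share one .prompt file.
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"prompt files would overwrite each other: {duplicates}")
    return names
```

Calling `check_inputs(['Distorted Fate.mp3'], prompts)` with the one-element `prompts` list above returns `['Distorted Fate']`.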

## Extract

In [4]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 2 --ignore_lyrics True')

Distorted Fate.mp3
['Distorted Fate.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:16<00:00,  1.84s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:21<00:00,  2.37s/it]


using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
successfully add prompt for Distorted Fate.wav
This music is cut into 7 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: A powerful, energetic, intense, aggressive, powerful dubstep track with strong beats, powerful bass, extreme synthesizers, aggressive sound effects, and other powerful elements.
Description piece 2: The fast-paced electro song features distorted synthesizers, punchy drums, and loud sirens. It sounds energetic, intense, and powerful.
Description piece 3: The electronic track features a strong beat, punchy kick, claps, shimmering hi hats, powerful bass, glitchy effects, shimmering arpeggios, repetitive melodies, and vocal samples. It sounds energetic, intense, powerful, and futuristic.
Description piece 4: This is a track with a powerful and intense atmosphere. The distorted 

0
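The log above reports 30-second pieces with a 5-second overlap. How `extract/extract.py` computes the cuts is not shown in this notebook, so the following is only a sketch of one common scheme, assuming each window starts 25 seconds after the previous one and the last window is clipped to the track length. The function name and the 170-second duration in the usage note are illustrative.

```python
def segment_windows(duration, length=30.0, overlap=5.0):
    """Return (start, end) times in seconds for overlapping windows."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    if length <= overlap:
        raise ValueError("length must be greater than overlap")
    stride = length - overlap  # each window starts this long after the previous one
    windows = []
    start = 0.0
    while True:
        end = min(start + length, duration)  # clip the last window
        windows.append((start, end))
        if end >= duration:
            break
        start += stride
    return windows
```

Under these assumptions, any track longer than 155 seconds and no longer than 180 seconds gives 7 windows; for instance `segment_windows(170)` ends with `(150.0, 170.0)`.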

## Process

In [5]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/ --num_char {num_char} --num_non_char {num_non_char}')

['Distorted Fate']
Loading model
Model loaded
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
7
7
7
16
Token spent: 30414


0

In [6]:
for file_name in input_list:
  for t in range(num_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt' + str(t), 'r') as f:
      print(f.read())
  for t in range(num_non_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt_nc' + str(t), 'rb') as f:
      print(f.read())

intense action scene, urban setting, dynamic lighting, figures in motion, agility, strength, inline with music's aggression, edgy fashion, red and black color theme, industrial elements, cityscape background, fiery glow, abstract shapes, sharp angles, deep bass visualized as ripples or shockwaves, 8k resolution, 16:9 aspect ratio, 60fps, sense of adrenaline and power.
intense action scene, urban setting, dynamic lighting, silhouette of a lone figure, focus on intense expression, post-apocalyptic fashion, cyberpunk elements, glowing cybernetic implants, aggressive stance, background with contrasting neon lights, dark alleys, sense of urgency, deep bass vibrations, distorted guitar riffs, synthesizer waves, edgy atmosphere, logo reveal, powerful impact, 8k resolution, 16:9 aspect ratio, 60fps.
b'dynamic abstract composition, intense reds and blacks with electric blue highlights, sharp geometric shapes, sense of impact and motion, deep bass vibrations represented by wavy lines or visual s

## Generate

In [7]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num {image_num} --num_char {num_char} --num_non_char {num_non_char}')

Loading prompt from file
Distorted Fate.prompt
Prompt loaded
Loading model


Loading pipeline components...:  71%|███████▏  | 5/7 [00:00<00:00,  5.74it/s]

Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00,  4.91it/s]


Model loaded
Generating for Distorted Fate.prompt


Token indices sequence length is longer than the specified maximum sequence length for this model (82 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (82 > 77). Running this sequence through the model will result in indexing errors
  return F.conv2d(input, weight, bias, self.stride,
100%|██████████| 50/50 [00:06<00:00,  7.84it/s]
100%|██████████| 50/50 [00:06<00:00,  8.32it/s]


Generated for Distorted Fate.prompt
Loading prompt from file
Generating image without characters
Prompt loaded
Generating for Distorted Fate.prompt_nc


100%|██████████| 50/50 [00:05<00:00,  8.34it/s]
100%|██████████| 50/50 [00:06<00:00,  8.32it/s]


Generated for Distorted Fate.prompt_nc


0

## Style transfer

In [8]:
os.system(f'python style_transfer/style_transfer.py --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/style_transfer/ --style_path {STYLE_PATH} --content_path {CONTENT_PATH} -l_o --num_char {num_char} --num_non_char {num_non_char}')

  warn(


['0-0.png', '1-0.png', 'nc0-0.png', 'nc1-0.png']
content: Distorted Fate/0-0.png
style: 7.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/0-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png
Building the style transfer model..


Style Loss : 0.582267 Content Loss: 1.583437:  80%|████████  | 40/50 [00:04<00:01,  9.17it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/0-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png done
content: Distorted Fate/1-0.png
style: 7.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/1-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png
Building the style transfer model..


Style Loss : 0.790870 Content Loss: 2.323792:  80%|████████  | 40/50 [00:04<00:01,  9.44it/s] 


Transfer from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/1-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png done
content: Distorted Fate/nc0-0.png
style: 7.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png
Building the style transfer model..


Style Loss : 1.046949 Content Loss: 2.381912:  80%|████████  | 40/50 [00:04<00:01,  9.44it/s] 


Transfer from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png done
content: Distorted Fate/nc1-0.png
style: 7.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/nc1-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png
Building the style transfer model..


Style Loss : 0.823237 Content Loss: 2.060661:  80%|████████  | 40/50 [00:04<00:01,  9.44it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/Distorted Fate/nc1-0.png to /root/LLM_project/codes/data/style/illustration_style/7.png done


0
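The file list printed above (`0-0.png`, `1-0.png`, `nc0-0.png`, `nc1-0.png`) suggests a naming pattern of prompt index, then image index, with an `nc` prefix for images without characters. Assuming that pattern holds (it is inferred from this one run, not confirmed by the scripts), the expected file names can be sketched as follows; `expected_image_names` is a hypothetical helper.

```python
def expected_image_names(num_char, num_non_char, image_num):
    """List the image names one track should produce, under the inferred pattern."""
    # Images with characters: "<prompt index>-<image index>.png"
    names = [f"{t}-{i}.png" for t in range(num_char) for i in range(image_num)]
    # Images without characters carry an "nc" prefix.
    names += [f"nc{t}-{i}.png" for t in range(num_non_char) for i in range(image_num)]
    return names
```

With the settings used in this notebook (`num_char = 2`, `num_non_char = 2`, `image_num = 1`) this reproduces the four names in the log, which makes it easy to spot a stage that produced fewer images than expected.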

## Final Results

In [None]:
import matplotlib.pyplot as plt
from PIL import Image

# Collect the stylized images per track, sorted so the display order is stable.
result = {}
for music in input_list:
  result[music] = sorted(os.listdir(DATA_PATH + '.tmp/style_transfer/' + music))

for music, pics in result.items():
  print(music)
  for pic in pics:
    if pic.endswith('.png'):
      image = Image.open(os.path.join(DATA_PATH, '.tmp/style_transfer', music, pic))
      plt.imshow(image)
      plt.axis('off')
      plt.show()