Each stage is launched as a separate `.py` script so that CUDA memory is released automatically when that stage's process exits.

This notebook organizes the code and runs the stages in order.
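The idea of one child process per stage can be sketched as follows. This is a minimal illustration, not the notebook's own method: the cells below use `os.system`, and `build_command` and `run_stage` are hypothetical helper names. Passing the arguments as a list avoids shell quoting problems when a path contains spaces. The helper only covers `--name value` options, so a bare flag such as `-l_o` would have to be appended to the list by hand.

```python
import subprocess
import sys


def build_command(script, **options):
    """Build the argument list for one stage script (hypothetical helper)."""
    cmd = [sys.executable, script]
    for name, value in options.items():
        # Each keyword becomes a "--name value" pair on the command line.
        cmd += [f"--{name}", str(value)]
    return cmd


def run_stage(cmd):
    """Run the command in a child process and return its exit code.

    Resources held by the child, including GPU memory, are freed
    when the child process exits.
    """
    return subprocess.run(cmd).returncode


# Self-contained demo: a child process that exits with code 3.
demo_exit_code = run_stage([sys.executable, "-c", "raise SystemExit(3)"])
```

With these helpers, a stage call would look like `run_stage(build_command('extract/extract.py', model_path=MODEL_PATH, device_num=2))`, and a non-zero return value signals that the stage failed.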

## Setup

### What you need to do

- Put the audio files in the `data/` directory

- Set `input_list` to the names of the audio files to process

- Create a `.env` file in the `process/` directory if using `glm-4`
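The checklist above can be verified before any stage is run. The following is a minimal sketch under stated assumptions: `check_setup` is a hypothetical helper, and the directory and `.env` locations are passed in rather than hard-coded because the exact layout depends on where the notebook is started.

```python
import os


def check_setup(audio_dir, input_list, llm_model, env_path):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Every entry of input_list should exist as a file in the audio directory.
    for name in input_list:
        if not os.path.isfile(os.path.join(audio_dir, name)):
            problems.append(f"missing audio file: {name}")
    # The .env file is only required when glm-4 is the selected LLM.
    if llm_model == "glm-4" and not os.path.isfile(env_path):
        problems.append(f"missing .env file: {env_path}")
    return problems
```

A call such as `check_setup(DATA_PATH, input_list, LLM_MODEL, 'process/.env')` would need to run while `input_list` still holds the full file names, that is, before the extensions are stripped.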

In [20]:
import os

In [21]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"
CONTENT_PATH = DATA_PATH + '.tmp/generate/'
STYLE_PATH = DATA_PATH + 'style/illustration_style/'

# Create the working directory of each stage (no-op if it already exists).
# os.makedirs also creates the parent '.tmp/' directory when needed.
folders = ['extract/', 'generate/', 'process/', 'inprompt/', 'style_transfer/']

for folder in folders:
  os.makedirs(DATA_PATH + '.tmp/' + folder, exist_ok=True)


In [22]:
input_list = [
  'Distorted Fate.mp3',
]
prompts = [r'''

''',
]
# Pick style images from the style library
style_list = [
  # 'opia.png'
]
num_char = 2 # default 1
num_non_char = 2 # default 1
image_num = 1
# Check that both input_list and prompts have been updated before running!
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)

with open(DATA_PATH + 'style_list.txt', 'w') as f:
  for item in style_list:
    f.write("%s\n" % item)

# Strip the file extension so the names can be reused for the intermediate files
input_list = [os.path.splitext(item)[0] for item in input_list]

# if not os.path.exists(DATA_PATH + '.tmp/inprompt/'):
#   os.makedirs(DATA_PATH + '.tmp/inprompt/')
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)
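
Because `zip` stops at the shorter of its inputs, a track without a matching prompt entry is skipped silently. A minimal sketch of an explicit check, assuming one prompt per track is intended; `pair_prompts` is a hypothetical helper and not part of the pipeline:

```python
def pair_prompts(names, prompts):
    """Pair each track name with its prompt, failing loudly on a mismatch."""
    if len(names) != len(prompts):
        raise ValueError(
            f"{len(names)} track(s) but {len(prompts)} prompt(s); "
            "update input_list and prompts together"
        )
    return list(zip(names, prompts))
```

The loop above could then iterate over `pair_prompts(input_list, prompts)` instead of the bare `zip`.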

## Extract

In [25]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 2')

Distorted Fate.mp3
['Distorted Fate.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:45<00:00,  5.09s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:46<00:00,  5.15s/it]


using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
successfully add prompt for Distorted Fate.wav
This music is cut into 7 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: A powerful, energetic, intense, aggressive, powerful dubstep track with strong beats, powerful bass, extreme synthesizers, aggressive sound effects, and other powerful elements.
Description piece 2: This is a high-energy electronic track that is perfect for any type of action scene or high-speed chase. The beat is intense and driving, with heavy basslines and distorted synths that create a sense of urgency and excitement. The track also features a variety of sound effects, such as beeping and screeching, that add to the sense of intensity and danger. Overall, this track is perfect for adding a sense of excitement and energy to any project.
Description piece 3: The genre of 

0
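
The log above reports 7 pieces of 30 seconds with a 5-second overlap. The arithmetic can be sketched as follows, assuming consecutive pieces start every 25 seconds and the last piece is cut at the end of the track. How `extract.py` actually handles the final piece is not shown in this notebook, and `segment_bounds` is a hypothetical helper.

```python
def segment_bounds(duration, length=30.0, overlap=5.0):
    """Return (start, end) times in seconds for overlapping pieces."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and shorter than length")
    step = length - overlap  # distance between consecutive piece starts
    bounds = []
    start = 0.0
    while True:
        end = min(start + length, duration)  # the last piece may be shorter
        bounds.append((start, end))
        if end >= duration:
            break
        start += step
    return bounds
```

Under these assumptions a track yields 7 pieces when it is longer than 155 seconds and at most 180 seconds long.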

## Process

In [24]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/ --num_char {num_char} --num_non_char {num_non_char}')

['Distorted Fate']
Loading model
Model loaded
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
16
7
3
12
Token spent: 32871


0

In [None]:
for file_name in input_list:
  for t in range(num_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt' + str(t), 'r') as f:
      print(f.read())
  for t in range(num_non_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt_nc' + str(t), 'rb') as f:
      print(f.read())

dynamic solo figure, intense expression, starry eyes, confident posture, night scene, urban backdrop, glowing city lights, tightfitting outfit, makeup, winning attitude, queen of the night, end of the line, dreamlike atmosphere, moonlight, searching for answers, sunlight through trees, stars in the sky, vivid colors, neon accents, 8k resolution, 16:9 aspect ratio, 60fps.
dynamic solo figure, confident posture, stars adorned on clothing, intense gaze, nightlife setting, make-up, winner's mindset, abstract skin texture, dark urban background with moonlight, tight clothing, edgy aesthetics, shining sunlight breaking through, tree silhouettes, starry sky, sense of determination, aggressive energy, 8k resolution, 16:9 aspect ratio, 60fps.
b'dynamic solo figure, confident posture, starry attire, vivid makeup, intense gaze, nightclub setting, neon lights, abstract patterns, dark background with contrasting highlights, sense of power and aggression, distorted shapes and textures, heavy bass vi

## Generate

In [None]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num {image_num} --num_char {num_char} --num_non_char {num_non_char}')

Loading prompt from file
Distorted Fate.prompt
Chronostasis.prompt
Disorder.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:02<00:00,  3.46it/s]


Model loaded
Generating for Distorted Fate.prompt


Token indices sequence length is longer than the specified maximum sequence length for this model (84 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (84 > 77). Running this sequence through the model will result in indexing errors
  return F.conv2d(input, weight, bias, self.stride,
100%|██████████| 50/50 [00:06<00:00,  7.80it/s]
100%|██████████| 50/50 [00:06<00:00,  8.33it/s]


Generated for Distorted Fate.prompt
Generating for Chronostasis.prompt


100%|██████████| 50/50 [00:06<00:00,  8.31it/s]
100%|██████████| 50/50 [00:06<00:00,  8.33it/s]


Generated for Chronostasis.prompt
Generating for Disorder.prompt


100%|██████████| 50/50 [00:06<00:00,  8.30it/s]
100%|██████████| 50/50 [00:06<00:00,  8.30it/s]


Generated for Disorder.prompt
Loading prompt from file
Generating image without characters
Prompt loaded
Generating for Distorted Fate.prompt_nc


100%|██████████| 50/50 [00:06<00:00,  8.29it/s]
100%|██████████| 50/50 [00:06<00:00,  8.28it/s]


Generated for Distorted Fate.prompt_nc
Generating for Chronostasis.prompt_nc


100%|██████████| 50/50 [00:06<00:00,  8.28it/s]
100%|██████████| 50/50 [00:06<00:00,  8.26it/s]


Generated for Chronostasis.prompt_nc
Generating for Disorder.prompt_nc


100%|██████████| 50/50 [00:06<00:00,  8.22it/s]
100%|██████████| 50/50 [00:06<00:00,  8.28it/s]


Generated for Disorder.prompt_nc


0

## Style transfer

In [None]:
os.system(f'python style_transfer/style_transfer.py --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/style_transfer/ --style_path {STYLE_PATH} --content_path {CONTENT_PATH} -l_o --num_char {num_char} --num_non_char {num_non_char}')

Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
	Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.


256
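
The stage failed here: the `256` returned by `os.system` is a wait status that corresponds to exit code 1. The error message suggests setting the threading layer explicitly. A minimal sketch of passing `MKL_THREADING_LAYER` to a child process follows; the value `GNU` is a commonly used workaround for this message, and whether it resolves the problem on this machine is an assumption that has not been verified. `mkl_safe_env` and `run_with_env` are hypothetical helper names.

```python
import os
import subprocess
import sys


def mkl_safe_env(threading_layer="GNU"):
    """Copy the current environment and set the MKL threading layer."""
    env = os.environ.copy()
    env["MKL_THREADING_LAYER"] = threading_layer
    return env


def run_with_env(cmd, env):
    """Run the command with the given environment and return its exit code."""
    return subprocess.run(cmd, env=env).returncode


# Demo: the child process sees the variable set by the parent.
probe = [sys.executable, "-c", "import os; print(os.environ['MKL_THREADING_LAYER'])"]
probe_exit_code = run_with_env(probe, mkl_safe_env())
```

The same `env` can be passed when launching `style_transfer/style_transfer.py`. Setting `os.environ['MKL_THREADING_LAYER']` in the notebook before the `os.system` call has a similar effect, since child processes inherit the environment.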

## Final Results

In [None]:
import matplotlib.pyplot as plt
from PIL import Image

result = {}
for music in input_list:
  result[music] = os.listdir(DATA_PATH + '.tmp/style_transfer/'+music)

for music, pics in result.items():
  print(music)
  for pic in pics:
    if pic.endswith('.png'):
      image = Image.open((DATA_PATH + '.tmp/style_transfer/'+music+'/'+pic))
      plt.imshow(image)
      plt.axis('off')
      plt.show()