Each stage is run as a separate `.py` script so that CUDA memory is released automatically when that stage's process exits.

This notebook ties the stages of the pipeline together.
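
The one-process-per-stage idea can be sketched as below. This is a minimal sketch, not the project's code: `run_stage` is a hypothetical helper, and the cells in this notebook use `os.system` instead. Passing the command as a list avoids shell quoting problems when a path contains spaces.

```python
import subprocess
import sys


def run_stage(script, *args):
    """Run one pipeline stage in its own Python process and return its exit code.

    Hypothetical helper: any GPU memory the stage allocated is freed by the
    driver when the child process exits.
    """
    cmd = [sys.executable, script, *map(str, args)]
    completed = subprocess.run(cmd)
    return completed.returncode


# Example call (paths are the ones used later in this notebook):
# run_stage('extract/extract.py', '--data_path', DATA_PATH, '--device_num', 2)
```

A non-zero return value means the stage failed, so the next stage should not be started.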

## Setup

### What you need to do

- Put your audio files in the `data/` directory

- Set `input_list` to the file names you want to process

- Create a `.env` file in the `process/` directory if using `glm-4`
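
The contents of the `.env` file are not shown in this notebook. As a rough sketch, assuming it holds plain `KEY=VALUE` lines, it could be read as follows. `load_env_file` and the key name `ZHIPUAI_API_KEY` are assumptions made for illustration; check `process/process.py` for the name it actually expects.

```python
import os


def load_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper: blank lines, comment lines and lines without '='
    are skipped, and surrounding quotes are stripped from values.
    """
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded


# Example (assumed key name):
# load_env_file('process/.env'); os.environ.get('ZHIPUAI_API_KEY')
```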

In [1]:
import os

In [2]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"
CONTENT_PATH = DATA_PATH + '.tmp/generate/'
STYLE_PATH = DATA_PATH + 'style/'

os.makedirs(DATA_PATH + '.tmp/', exist_ok=True)

# Working sub-directories under data/.tmp/, one per pipeline stage.
# Named tmp_folders so the built-in `list` is not shadowed.
tmp_folders = ['extract/', 'generate/', 'process/', 'inprompt/', 'style_transfer/']

for folder in tmp_folders:
  os.makedirs(DATA_PATH + '.tmp/' + folder, exist_ok=True)


In [3]:
input_list = [
  'FULi AUTO SHOOTER.mp3',
]
prompts = [r'''
  The name of this song is 'FULi AUTO SHOOTER'. 
''',
]
# Pick style images from the style library (data/style/)
style_list = [
  # 'opia.png'
]
# Make sure input_list and prompts have both been updated and pair up one-to-one
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)

with open(DATA_PATH + 'style_list.txt', 'w') as f:
  for item in style_list:
    f.write("%s\n" % item)

# Strip the file extension; splitext also handles extensions that are not 3 characters long
input_list = [os.path.splitext(item)[0] for item in input_list]

# data/.tmp/inprompt/ is created in the setup cell above
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)

## Extract

In [4]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 2')

FULi AUTO SHOOTER.mp3
['FULi AUTO SHOOTER.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.61it/s]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.60it/s]


using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
successfully add prompt for FULi AUTO SHOOTER.wav
successfully write prompt for FULi AUTO SHOOTER.wav


0

In [5]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/extract/' + file_name + '.prompt', 'r') as f:
    print(f.read())

This music is cut into 8 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: A fast-paced, energetic electronic track with a strong bassline, fast-paced drums, and a catchy melody. It is perfect for action scenes, extreme sports, and fast-paced video games.
Description piece 2: This is a high-energy dubstep track featuring distorted basslines, pounding drums, and powerful lead synths. The mood is intense and energetic, with a strong focus on the beat and the bass. The track is ideal for action scenes, sports montages, and high-energy media.
Description piece 3: This is a dubstep track with a strong bassline and powerful drums. The lead synthesizer has a distorted, robotic sound, and the overall atmosphere is energetic and intense. This track would be well-suited for action scenes in a video game or movie.
Description piece 4: This is a dubstep track with a strong beat and electronic elements. The s
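
The cutting scheme described in the output above (30-second pieces with a 5-second overlap) can be sketched as below. This is only an illustration under stated assumptions: the actual logic in `extract/extract.py` is not shown in this notebook, `segment_bounds` is a hypothetical name, and the sketch assumes a fixed stride of `length - overlap` with a final piece that may be shorter.

```python
def segment_bounds(duration, length=30.0, overlap=5.0):
    """Return (start, end) times in seconds for overlapping pieces covering `duration`."""
    if duration <= 0:
        return []
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    stride = length - overlap  # consecutive pieces start this far apart
    bounds = []
    start = 0.0
    while True:
        end = min(start + length, duration)  # the last piece may be shorter
        bounds.append((start, end))
        if end >= duration:
            break
        start += stride
    return bounds
```

Under these assumptions, a 205-second input yields 8 pieces, the last one spanning 175 to 205 seconds.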

## Process

In [6]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/')

['FULi AUTO SHOOTER']
Loading model
Model loaded
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
Loading model
Model loaded
Token spent: 4721


0

In [7]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt', 'r') as f:
    print(f.read())

dynamic cyberpunk cityscape, neon lights, futuristic vehicles, male protagonist in focus, intense expression, wearing high-tech gear, red and black color scheme, abstract patterns, strong shadows, glowing elements, sense of motion, urban chaos, 8k resolution, 16:9 aspect ratio, 60fps.


## Generate

In [8]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num 3')

Loading prompt from file
FULi AUTO SHOOTER.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00,  6.59it/s]


Model loaded
Generating for FULi AUTO SHOOTER.prompt


100%|██████████| 50/50 [00:16<00:00,  2.95it/s]


Generated for FULi AUTO SHOOTER.prompt
Generating image without characters
Loading prompt from file
Prompt loaded
Generating for FULi AUTO SHOOTER.prompt2


Token indices sequence length is longer than the specified maximum sequence length for this model (84 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (84 > 77). Running this sequence through the model will result in indexing errors
100%|██████████| 50/50 [00:16<00:00,  2.95it/s]


Generated for FULi AUTO SHOOTER.prompt2


0

## Style transfer

In [13]:
os.system(f'python style_transfer/style_transfer.py --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/style_transfer/ --style_path {STYLE_PATH} --content_path {CONTENT_PATH} -l_o')

Transferring from /root/LLM_project/codes/data/.tmp/generate/FULi AUTO SHOOTER/0.png to /root/LLM_project/codes/data/style/starry_night.jpeg
Building the style transfer model..


Style Loss : 0.127708 Content Loss: 0.615244: 100%|██████████| 300/300 [00:33<00:00,  9.02it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/FULi AUTO SHOOTER0.png to /root/LLM_project/codes/data/style/starry_night.jpeg done
Transferring from /root/LLM_project/codes/data/.tmp/generate/FULi AUTO SHOOTER/10.png to /root/LLM_project/codes/data/style/starry_night.jpeg
Building the style transfer model..


Style Loss : 0.193818 Content Loss: 1.617119: 100%|██████████| 300/300 [00:33<00:00,  8.94it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/FULi AUTO SHOOTER10.png to /root/LLM_project/codes/data/style/starry_night.jpeg done


0