Each stage is run as a separate `.py` script so that CUDA memory is released automatically when that stage's process exits.

This notebook organizes the code and runs the stages in order.
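
The idea of running each stage in its own process can be sketched as follows. This is only a minimal sketch, not the notebook's own code: `build_command` and `run_stage` are hypothetical helpers, and the sketch uses `subprocess.run` rather than `os.system` so that a non-zero exit code raises an error instead of being returned silently. Because the stage runs in a child process, the GPU memory it allocated is freed when that process exits.

```python
import subprocess
import sys


def build_command(script, **options):
    """Build the argument list for one stage script.

    Hypothetical helper: each keyword argument becomes a
    `--name value` pair, with no validation of the option names.
    """
    cmd = [sys.executable, script]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_stage(script, **options):
    """Run one stage in a child process and return its exit code."""
    # check=True raises CalledProcessError on a non-zero exit code
    return subprocess.run(build_command(script, **options), check=True).returncode
```

For instance, `run_stage('extract/extract.py', device_num=2)` would pass `--device_num 2` to the script; the remaining flags would be supplied the same way.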

## Setup

### What you need to do

- Put the audio files in the `data/` directory

- Set `input_list` (and the matching `prompts`) to the files you want to process

- Create a `.env` file in the `process/` directory if you use `glm-4`
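
A `.env` file is conventionally a plain-text list of `KEY=VALUE` lines. The sketch below shows how such a file could be parsed. It is an illustration under assumptions: `parse_env` is a hypothetical helper, `EXAMPLE_API_KEY` is a placeholder name, and the variable that `process/process.py` actually reads is not shown in this notebook.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# EXAMPLE_API_KEY is a placeholder, not the name the project expects
example = "# credentials\nEXAMPLE_API_KEY=your-key-here\n"
print(parse_env(example))
```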

In [1]:
import os

In [2]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"

# Create the working directory for intermediate files if it is missing
os.makedirs(DATA_PATH + '.tmp/', exist_ok=True)

In [3]:
input_list = [
  'cryout.mp3',
  'opia.mp3'
]
prompts = [r'''
  The name of this song is 'cryout'. 
''',
r'''
  The name of this song is 'opia'. 
'''
# Update input_list and prompts together: one prompt per file, in the same order.
]
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)
# Strip the file extension to get the base names used by the later stages
input_list = [os.path.splitext(item)[0] for item in input_list]

# Create the directory for the per-song input prompts if it is missing
os.makedirs(DATA_PATH + '.tmp/inprompt/', exist_ok=True)
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)
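
Because `zip` stops at the shorter of its inputs, a missing prompt would silently leave a song without a `.prompt` file. A small check such as the one below could catch the mismatch before anything is written. It is a sketch only, and `pair_prompts` is a hypothetical helper that is not part of the project.

```python
import os


def pair_prompts(files, prompts):
    """Pair each file's base name with its prompt, or fail loudly."""
    if len(files) != len(prompts):
        raise ValueError(
            f"{len(files)} input files but {len(prompts)} prompts; "
            "they must match one to one"
        )
    # splitext leaves names that have no extension unchanged
    return [(os.path.splitext(name)[0], prompt)
            for name, prompt in zip(files, prompts)]
```

The returned pairs can be iterated in place of `zip(prompts, input_list)`, keeping in mind that here the name comes first in each pair.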

## Extract

In [4]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 2')

cryout.mp3
opia.mp3
['cryout.wav', 'opia.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:04<00:00,  2.23it/s]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.28it/s]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:04<00:00,  2.23it/s]
The model is automatic

using device 0
using device 1
using device 0
successfully add prompt for cryout.wav
using device 1
using device 2
using device 3
using device 0
using device 1
using device 2
using device 3
using device 0
using device 1
using device 2
using device 3
using device 0
successfully add prompt for opia.wav
successfully write prompt for cryout.wav
successfully write prompt for opia.wav


0

In [5]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/extract/' + file_name + '.prompt', 'r') as f:
    print(f.read())

This music is cut into 5 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: A cinematic track with a mix of suspense, hope, and optimism. It has a powerful and emotional feel to it, with a strong beat and a catchy melody. It is perfect for film, TV, and video games.
Description piece 2: A cinematic track with piano, drums, and electronic elements. Perfect for vlogs, travel videos, and commercials.
Description piece 3: This is a dynamic, intense, and upbeat electronic track featuring drums, bass, and various synths. The music is energetic, powerful, and exciting, making it perfect for action scenes, trailers, and commercials.
Description piece 4: This is a high energy electronic track that is perfect for any type of action or sports video. The track is very fast paced and features a lot of synthesizer and drum machine sounds. The track is very dynamic and features a lot of changes in tempo and styl
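
The first line of this output describes the chunking: 30-second pieces with a 5-second overlap. A minimal sketch of how such boundaries could be computed is given below. It assumes that the hop between consecutive pieces is the piece length minus the overlap (25 seconds) and that the last piece is cut off at the end of the track. `segment_bounds` is a hypothetical name, and the actual logic in `extract/extract.py` is not shown in this notebook and may differ.

```python
def segment_bounds(duration, piece_len=30.0, overlap=5.0):
    """Return (start, end) times in seconds for overlapping pieces."""
    if overlap >= piece_len:
        raise ValueError("overlap must be shorter than the piece length")
    hop = piece_len - overlap  # distance between consecutive piece starts
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + piece_len, duration)
        bounds.append((start, end))
        if end >= duration:
            break  # this piece already reaches the end of the track
        start += hop
    return bounds


# Under these assumptions a 130-second track yields 5 pieces
print(len(segment_bounds(130)))
```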

## Process

In [6]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/')

['cryout', 'opia']
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
Token spent: 9720


0

In [7]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt', 'r') as f:
    print(f.read())

lonely figure, stormy sky, calling out, powerful pose, dynamically charged, epic landscape, hopeful light, 8k resolution, 16:9 aspect ratio, 60fps
sleek futuristic cityscape, intense glowing lights, abstract digital art, bold colors, aggressive stance, strings in motion, 8k resolution, 16:9 aspect ratio, 60fps.


## Generate

In [8]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num 3')

Loading prompt from file
cryout.prompt
opia.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00,  7.45it/s]


Model loaded
Generating for cryout.prompt


100%|██████████| 50/50 [00:17<00:00,  2.94it/s]


Generated for cryout.prompt
Generating for opia.prompt


100%|██████████| 50/50 [00:16<00:00,  2.96it/s]


Generated for opia.prompt
Generating image without characters
Loading prompt from file
Prompt loaded
Generating for cryout.prompt2


100%|██████████| 50/50 [00:16<00:00,  2.95it/s]


Generated for cryout.prompt2
Generating for opia.prompt2


100%|██████████| 50/50 [00:17<00:00,  2.93it/s]


Generated for opia.prompt2


0