Each stage is run as a separate `.py` script so that CUDA memory is released automatically when that stage's process exits.

This notebook organizes the code and runs the stages in order.
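
The reason for running each stage as a script can be made concrete with a small helper. This is a minimal sketch, not the notebook's own code: `run_stage` is a hypothetical name, and it uses `subprocess.run` instead of the `os.system` calls in the cells of this notebook. Because each stage runs in a child process, any GPU memory it holds is freed when that process exits.

```python
import subprocess
import sys


def run_stage(script, *args):
  """Run one pipeline stage in its own Python process (hypothetical helper).

  Returns the child's exit code; 0 means the stage finished successfully.
  """
  # Build the command as a list so paths with spaces need no shell quoting.
  cmd = [sys.executable, script, *[str(a) for a in args]]
  completed = subprocess.run(cmd)
  return completed.returncode
```

A call such as `run_stage('extract/extract.py', '--data_path', DATA_PATH)` would return 0 on success, matching the `0` printed after the `os.system` cells.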

## Setup

### What you need to do

- Put the audio files in the `data/` directory

- Set `input_list` to the audio file names, including their extensions

- Create a `.env` file in the `process/` directory if using `glm-4`
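
The checklist above can be verified before any stage is launched. The following is a sketch only: `missing_inputs` and its `env_path` parameter are hypothetical names, and the check assumes nothing about the contents of the `.env` file, only that it exists.

```python
import os


def missing_inputs(data_path, input_list, llm_model, env_path="process/.env"):
  """Return a list of problems found; an empty list means ready to run."""
  problems = []
  # Every listed audio file must exist in the data directory.
  for name in input_list:
    if not os.path.isfile(os.path.join(data_path, name)):
      problems.append(f"audio file not found: {name}")
  # The .env file is only needed when glm-4 is the selected LLM.
  if llm_model == "glm-4" and not os.path.isfile(env_path):
    problems.append(f"{env_path} is required for glm-4 but was not found")
  return problems
```

Call it with the file names as first entered, with extensions, for example `missing_inputs(DATA_PATH, input_list, LLM_MODEL)`.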

In [1]:
import os

In [2]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"

# Create the scratch directory if it does not exist yet
os.makedirs(DATA_PATH + '.tmp/', exist_ok=True)

In [22]:
input_list = [
  'HyuN - Infinity Heaven.wav',
  'NceS - Burn.mp3'
]
prompts = [r'''
  The name of this song is Infinity Heaven. I want an angel in white standing in the middle of a field of flowers. I want it to be anime.
''',
r'''
  The name of this song is Burn. I want a little girl standing in the middle of the night. I want it to be anime.
'''
]
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)
# Strip the extension (works for extensions of any length, unlike item[:-4])
input_list = [os.path.splitext(item)[0] for item in input_list]

os.makedirs(DATA_PATH + '.tmp/inprompt/', exist_ok=True)
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)

## Extract

In [10]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 4')

HyuN - Infinity Heaven.wav
NceS - Burn.mp3
['HyuN - Infinity Heaven.wav', 'NceS - Burn.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.64it/s]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.63it/s]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.61it/s]
The model is automatic

using device 0
using device 1
using device 2
using device 3
using device 0
using device 1
successfully add prompt for HyuN - Infinity Heaven.wav
using device 2
using device 3
using device 0
using device 1
using device 2
using device 3
successfully add prompt for NceS - Burn.wav
successfully write prompt for HyuN - Infinity Heaven.wav
successfully write prompt for NceS - Burn.wav


0

In [11]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/extract/' + file_name + '.prompt', 'r') as f:
    print(f.read())

This music is cut into 6 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: A fast-paced, energetic drum and bass track with a powerful piano lead and a driving bassline. Perfect for action, sports, and energetic videos.
Description piece 2: This is a fast-paced electronic track with a strong emphasis on drums and percussion. The synthesizers and keyboard create a sense of urgency and excitement. The track is ideal for action scenes, fast-paced video games, and high-energy videos.
Description piece 3: This is a fast-paced electronic dance music piece that features synthesizers, electric guitars, and a strong beat. The music is energetic and upbeat, making it perfect for use in action scenes or fast-paced video games. The instruments are played with precision and skill, creating a sense of excitement and urgency. The overall sound is intense and powerful, making it ideal for use in scenes that requ
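
The first line of the output above describes 30-second pieces that overlap by 5 seconds. How `extract/extract.py` computes the cut points is not shown in this notebook, so the following is only a sketch, assuming fixed-length windows advanced by a 25-second stride with a shorter final piece; `segment_bounds` is a hypothetical name.

```python
def segment_bounds(duration, piece_len=30.0, overlap=5.0):
  """Return (start, end) times in seconds for overlapping pieces.

  Assumption: each piece starts piece_len - overlap seconds after the
  previous one, and the last piece is cut short at the end of the track.
  """
  if overlap >= piece_len:
    raise ValueError("overlap must be smaller than piece_len")
  step = piece_len - overlap
  bounds = []
  start = 0.0
  while start < duration:
    end = min(start + piece_len, duration)
    bounds.append((start, end))
    if end >= duration:
      break  # the track is fully covered
    start += step
  return bounds
```

Under this assumption, a 150-second track yields 6 pieces, since `len(segment_bounds(150))` is 6.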

## Process

In [23]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/')

['HyuN - Infinity Heaven', 'NceS - Burn']
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
Token spent: 4041


0

In [24]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt', 'r') as f:
    print(f.read())

angel in white, field of flowers, anime style, intense energy, 8k resolution, 16:9 aspect ratio, 60fps, dynamic sky, powerful aura, floral breeze, emotional depth
little girl, night, anime style, fiery hair, determined expression, stars in the sky, 8k resolution, 16:9 aspect ratio, vibrant colors, secret whispers, energetic aura.


## Generate

In [25]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --image_num 3')

Loading prompt from file
HyuN - Infinity Heaven.prompt
NceS - Burn.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00,  7.54it/s]


Model loaded
Generating for HyuN - Infinity Heaven.prompt


100%|██████████| 50/50 [00:16<00:00,  2.95it/s]


Generated for HyuN - Infinity Heaven.prompt
Generating for NceS - Burn.prompt


100%|██████████| 50/50 [00:16<00:00,  2.97it/s]


Generated for NceS - Burn.prompt


0