Each stage is run as a separate `.py` script so that CUDA memory is released automatically when that stage's process exits.
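
As a minimal sketch of this isolation, assuming each stage is a standalone script: launching it in a child process means any GPU memory it allocates is returned once the child exits. The helper name `run_stage` is hypothetical and not part of this repository; the cells in this notebook use `os.system` instead.

```python
import subprocess
import sys


def run_stage(script, *args):
    """Run one pipeline stage in a child Python process and return its exit code.

    Hypothetical helper for illustration. Memory held by the child,
    including CUDA allocations, is freed when the child process exits.
    """
    # Passing a list (no shell) avoids quoting problems with spaces in file names.
    cmd = [sys.executable, script, *map(str, args)]
    completed = subprocess.run(cmd)
    return completed.returncode
```

A return code of 0 means the stage finished without error, which matches the `0` printed after each `os.system` call in this notebook.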

This notebook organizes the pipeline: it sets up the inputs and then runs the extract, process, and generate scripts in order.

## Setup

### What you need to do

- Put the audio files in the `data/` directory

- Set `input_list` to the names of the audio files to process

- Create a `.env` file in the `process/` directory if using `glm-4`
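
The exact contents that `process/process.py` expects in `.env` are not shown in this notebook. As a rough sketch, assuming the file holds plain `KEY=VALUE` lines, a quick pre-flight check could parse it as follows. Both `read_env_file` and the key name `API_KEY` in the usage note are hypothetical placeholders.

```python
def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env-style file into a dict.

    Assumption: one KEY=VALUE pair per line, '#' starts a comment line.
    """
    values = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without '='.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Remove surrounding whitespace and optional quotes from the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

For example, `"API_KEY" in read_env_file("process/.env")` would confirm that a (hypothetical) key is present before the process step is started.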

In [1]:
import os

In [2]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"

# Create the working directory for intermediate files if it does not exist yet.
os.makedirs(DATA_PATH + '.tmp/', exist_ok=True)

In [6]:
input_list = [
  '茶鸣拾贰律 - Feast远东之宴.mp3',
]
prompts = [r'''
  The name of the song is "远东之宴" by 茶鸣拾贰律. I want to create a Chinese-style painting based on this song.
'''
]
# Record the original file names (with extension).
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)
# Drop the extension; the later cells refer to each song by its base name.
input_list = [os.path.splitext(item)[0] for item in input_list]

# Save each user prompt under the base name of the song it describes.
os.makedirs(DATA_PATH + '.tmp/inprompt/', exist_ok=True)
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)
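
A mistyped entry in `input_list` is easy to miss until the extract step fails. As a small sketch, the check below lists the names that are not present in a given directory. `missing_files` is a hypothetical helper, and which directory to check (for example `MUSIC_PATH`) is an assumption; it should be called with the original file names, before the extensions are stripped.

```python
import os


def missing_files(names, directory):
    """Return the entries of names that are not regular files inside directory."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]
```

An empty result means every listed audio file was found.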

## Extract

In [8]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 2')

茶鸣拾贰律 - Feast远东之宴.mp3
['茶鸣拾贰律 - Feast远东之宴.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.38it/s]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:03<00:00,  2.41it/s]


using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
using device 0
using device 1
successfully add prompt for 茶鸣拾贰律 - Feast远东之宴.wav
successfully write prompt for 茶鸣拾贰律 - Feast远东之宴.wav


0

In [10]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/extract/' + file_name + '.prompt', 'r') as f:
    print(f.read())

This music is cut into 8 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: This is a song whose genre is Pop, and the lyrics are "求生之命 适合作为 破坏者 邀请一坛酒请 我保护你 你系谁的眷属 请风流一流 永远不息的血泪 湖光山色 情义江湖".
Description piece 2: The genre of the music is electronic. The tempo is fast with a strong beat. The music is upbeat and energetic. The lyrics are in Chinese. The lyrics are about a person who is deeply in love with someone and is willing to do anything for them. The music is perfect for a dance party or a club.
Description piece 3: This is a high-energy electronic dance music piece. It features a catchy melody, a strong beat, and synthesizer arrangements. The overall emotion of the piece is energetic and upbeat. It would be suitable for use in a variety of settings, including workout videos, dance clubs, and video games. The gender of the piece is difficult to determine as it does not contain any lyrics.
Descr
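
How `extract/extract.py` actually splits the audio is not shown in this notebook. The sketch below is one plausible scheme consistent with the message above (30-second pieces, 5-second overlap, so each piece starts 25 seconds after the previous one). The function name `segment_windows` and the choice to shorten the last piece at the end of the track are assumptions.

```python
def segment_windows(total_seconds, piece_seconds=30.0, overlap_seconds=5.0):
    """Return (start, end) times in seconds for overlapping pieces covering a track."""
    if overlap_seconds >= piece_seconds:
        raise ValueError("overlap must be shorter than the piece length")
    step = piece_seconds - overlap_seconds  # distance between piece starts
    windows = []
    start = 0.0
    while True:
        # The final piece is clipped to the end of the track.
        end = min(start + piece_seconds, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break
        start += step
    return windows
```

Under this scheme, any track longer than 180 seconds and at most 205 seconds is cut into 8 pieces.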

## Process

In [11]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/')

['茶鸣拾贰律 - Feast远东之宴']
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
Token spent: 2182


0

In [12]:
for file_name in input_list:
  with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt', 'r') as f:
    print(f.read())

Chinese-style painting, elegant garden, traditional architecture, fast-paced dancers, passionate expressions, pop of color from lanterns, synthesizer lights, 8k resolution, 16:9 aspect ratio, static image.


## Generate

In [9]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --image_num 3')

Loading prompt from file
茶鸣拾贰律 - Feast远东之宴.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00,  7.30it/s]


Model loaded
Generating for 茶鸣拾贰律 - Feast远东之宴.prompt


Token indices sequence length is longer than the specified maximum sequence length for this model (89 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (89 > 77). Running this sequence through the model will result in indexing errors
100%|██████████| 50/50 [00:16<00:00,  2.96it/s]


Generated for 茶鸣拾贰律 - Feast远东之宴.prompt


0