Each stage is run as a separate `.py` script so that CUDA memory is released automatically when that stage's process exits.

This notebook organizes the pipeline code and runs the stages in order.
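
The reason for launching scripts instead of importing them can be sketched as follows: a child process owns its own CUDA context, so the GPU memory it allocated is returned once it exits. This is a minimal sketch rather than the notebook's own code. `run_stage` is a hypothetical helper, and it uses `subprocess.run` with an argument list where the cells below use `os.system`.

```python
import subprocess
import sys


def run_stage(script, *args):
    """Run one pipeline stage in its own Python process and return its exit code.

    `run_stage` is a hypothetical helper, not part of the original notebook.
    """
    # sys.executable keeps the child on the same interpreter as the notebook.
    cmd = [sys.executable, script, *map(str, args)]
    completed = subprocess.run(cmd)
    # Memory held by the child, including GPU memory, is released when it exits.
    return completed.returncode
```

A call such as `run_stage('extract/extract.py', '--data_path', DATA_PATH)` returns the same kind of exit code that `os.system` prints after each cell, with `0` meaning success.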

## Setup

### What you need to do

- Put the audio files in the `data/` directory

- Set `input_list` to the names of those audio files

- Create a `.env` file in the `process/` directory if using `glm-4`
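
How `process/process.py` reads the `.env` file is not shown in this notebook. As an illustration only, a `.env` file is conventionally a list of `KEY=VALUE` lines, which could be loaded with the standard library as below. `load_env` is a hypothetical helper, and the variable names the file must contain depend on what the script expects.

```python
import os


def load_env(path):
    """Parse KEY=VALUE lines from a .env-style file into os.environ.

    Hypothetical helper; blank lines and lines starting with '#' are skipped.
    """
    loaded = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#') or '=' not in line:
                continue
            # Split on the first '=' only, so values may contain '='.
            key, value = line.split('=', 1)
            # Remove surrounding whitespace and optional quotes from the value.
            loaded[key.strip()] = value.strip().strip('"\'')
    os.environ.update(loaded)
    return loaded
```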

In [1]:
import os

In [2]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"
CONTENT_PATH = DATA_PATH + '.tmp/generate/'
STYLE_PATH = DATA_PATH + 'style/illustration_style/'

# Create the temporary working directory and one subdirectory per stage.
os.makedirs(DATA_PATH + '.tmp/', exist_ok=True)

# Named stage_dirs to avoid shadowing the built-in `list`.
stage_dirs = ['extract/', 'generate/', 'process/', 'inprompt/', 'style_transfer/']

for folder in stage_dirs:
  os.makedirs(DATA_PATH + '.tmp/' + folder, exist_ok=True)


In [3]:
input_list = [
  'cryout.mp3',
]
prompts = [r'''
  The name of this song is "cryout". 
''',
]
# Pick the style images in the style library
style_list = [
  # 'opia.png'
]
num_char = 2 # default 1
num_non_char = 2 # default 1
image_num = 1 
# Make sure input_list and prompts are both updated and listed in the same order.
with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)

with open(DATA_PATH + 'style_list.txt', 'w') as f:
  for item in style_list:
    f.write("%s\n" % item)

# Strip the file extension so the base names can be used for per-track files.
input_list = [os.path.splitext(item)[0] for item in input_list]

# if not os.path.exists(DATA_PATH + '.tmp/inprompt/'):
#   os.makedirs(DATA_PATH + '.tmp/inprompt/')
for (prompt, name) in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)
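
Because `zip` stops at the shorter sequence, a missing prompt would silently leave a track without a `.prompt` file. A small guard, shown here as a sketch with a hypothetical helper name, makes that mismatch fail loudly before any files are written.

```python
def pair_prompts(names, prompt_texts):
    """Return (prompt, name) pairs, refusing lists of different lengths.

    Hypothetical helper; zip() alone would drop the unmatched tail.
    """
    if len(names) != len(prompt_texts):
        raise ValueError(
            f'{len(names)} input files but {len(prompt_texts)} prompts'
        )
    return list(zip(prompt_texts, names))
```

On Python 3.10 or later, `zip(prompts, input_list, strict=True)` gives the same protection without a helper.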

## Extract

In [4]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 4')

cryout.mp3
['cryout.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:41<00:00,  4.64s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:40<00:00,  4.49s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:40<00:00,  4.47s/it]
The model is automatic

using device 0
using device 1
using device 2
using device 3
using device 0
successfully add prompt for cryout.wav
This music is cut into 5 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: A cinematic track with a mix of suspense, hope, and optimism.
Description piece 2: A cinematic track with piano, drums, and electronic elements. Perfect for vlogs, commercials, and more.
Description piece 3: This is a dynamic, intense, and upbeat electronic track featuring drums, bass, and various synths. The music is energetic, powerful, and exciting, making it perfect for action scenes, trailers, and commercials. The track has a strong beat and a catchy melody that will stick in your head. The overall sound is modern and fresh, making it ideal for contemporary projects.
Description piece 4: This is a high energy electronic track that is perfect for any type of action or sports video. The track features a fast

0
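
The log above reports 30-second pieces with a 5-second overlap. The splitting code in `extract/extract.py` is not shown here, so the following is only a sketch of one way to compute such windows. `segment_bounds` is a hypothetical helper, and letting the last piece be shorter than 30 seconds is an assumption.

```python
def segment_bounds(duration, length=30.0, overlap=5.0):
    """Return (start, end) times in seconds for overlapping windows.

    Hypothetical helper; consecutive windows start `length - overlap` apart.
    """
    if not 0 <= overlap < length:
        raise ValueError('overlap must be non-negative and shorter than length')
    bounds = []
    start = 0.0
    while True:
        # Clip the final window to the end of the track.
        end = min(start + length, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        start += length - overlap
    return bounds
```

Under these assumptions, a track longer than 105 seconds and at most 130 seconds long would yield five windows, the same count the log reports.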

## Process

In [5]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/ --num_char {num_char} --num_non_char {num_non_char}')

['cryout']
Loading model
Model loaded
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
Token spent: 17002


0

In [6]:
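# Print the prompts written by the process stage for each track.
for file_name in input_list:
  # Character prompts are read in text mode.
  for t in range(num_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt' + str(t), 'r') as f:
      print(f.read())
  # Non-character prompts are read in binary mode, so they print as bytes literals.
  for t in range(num_non_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt_nc' + str(t), 'rb') as f:
      print(f.read())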

solo figure, looking upwards, expressive face, tears streaming down cheeks, rain falling, city streets, emotional turmoil, mix of hope and despair, cinematic atmosphere, bright light in distance,象征性光明, modern fashion, urban setting, blue and silver color scheme, fast-paced motion blur, energetic feel, drum and bass rhythm, synths in background, powerful imagery, 8k resolution, 16:9 aspect ratio, 60fps.
lonely figure, looking out over a cityscape, rain falling, somber expression, hoodie or jacket, wet streets reflecting city lights, sense of contemplation, mix of electronic and organic elements in background, abstract shapes representing hope and suspense, vibrant colors contrasting with the dark mood, neon lights, fast-paced motion blur, intense atmosphere, 8k resolution, 16:9 aspect ratio, 60fps.
b'ethereal sky with floating clouds, a glimpse of sunlight breaking through, cityscape in the distance with rain falling, abstract representation of emotions with colorful streams of light, r
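
The `b'` prefix on the last printed line comes from opening the `.prompt_nc` files in binary mode. If plain text is preferred for both kinds of prompt, a sketch like the following reads them all as UTF-8. `read_prompts` is a hypothetical helper, and the file naming is taken from the cell above.

```python
from pathlib import Path


def read_prompts(process_dir, name, num_char, num_non_char):
    """Return character prompts followed by non-character prompts as text.

    Hypothetical helper; expects files named <name>.prompt<t> and
    <name>.prompt_nc<t> inside process_dir.
    """
    base = Path(process_dir)
    suffixes = [f'.prompt{t}' for t in range(num_char)]
    suffixes += [f'.prompt_nc{t}' for t in range(num_non_char)]
    # An explicit encoding avoids depending on the platform default.
    return [(base / (name + s)).read_text(encoding='utf-8') for s in suffixes]
```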

## Generate

In [7]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num {image_num} --num_char {num_char} --num_non_char {num_non_char}')

Loading prompt from file
cryout.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:02<00:00,  2.89it/s]


Model loaded
Generating for cryout.prompt


Token indices sequence length is longer than the specified maximum sequence length for this model (102 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (102 > 77). Running this sequence through the model will result in indexing errors
  return F.conv2d(input, weight, bias, self.stride,
100%|██████████| 50/50 [00:06<00:00,  7.71it/s]
100%|██████████| 50/50 [00:06<00:00,  8.32it/s]


Generated for cryout.prompt
Loading prompt from file
Generating image without characters
Prompt loaded
Generating for cryout.prompt_nc


100%|██████████| 50/50 [00:06<00:00,  8.28it/s]
100%|██████████| 50/50 [00:06<00:00,  8.27it/s]


Generated for cryout.prompt_nc


0
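
The warning `102 > 77` in the output above means the prompt is longer than the text encoder's maximum sequence length. One possible mitigation, sketched here under assumptions, is to drop trailing comma-separated tags until the prompt fits. `trim_tags` is a hypothetical helper, and its default word count is only a rough stand-in: the real count should come from the tokenizer the pipeline uses, passed in as `count_tokens`.

```python
def trim_tags(prompt, max_tokens=77, count_tokens=None):
    """Drop trailing comma-separated tags until the prompt fits the budget.

    Hypothetical helper; `count_tokens` maps a string to a token count.
    """
    if count_tokens is None:
        # Rough stand-in (an assumption): one token per whitespace-separated word.
        count_tokens = lambda text: len(text.split())
    tags = [t.strip() for t in prompt.split(',') if t.strip()]
    # Later tags are removed first, which keeps the leading tags intact.
    while tags and count_tokens(', '.join(tags)) > max_tokens:
        tags.pop()
    return ', '.join(tags)
```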

## Style transfer

In [10]:
os.system(f'python style_transfer/style_transfer.py --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/style_transfer/ --style_path {STYLE_PATH} --content_path {CONTENT_PATH} -l_o --num_char {num_char} --num_non_char {num_non_char}')

  warn(


['0-0.png', '1-0.png', 'nc0-0.png', 'nc1-0.png']
content: cryout/0-0.png
style: 10.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/cryout/0-0.png to /root/LLM_project/codes/data/style/illustration_style/10.png
Building the style transfer model..


Style Loss : 0.829802 Content Loss: 2.775409:  80%|████████  | 40/50 [00:04<00:01,  9.19it/s] 


Transfer from /root/LLM_project/codes/data/.tmp/generate/cryout/0-0.png to /root/LLM_project/codes/data/style/illustration_style/10.png done
content: cryout/1-0.png
style: 10.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/cryout/1-0.png to /root/LLM_project/codes/data/style/illustration_style/10.png
Building the style transfer model..


Style Loss : 0.351353 Content Loss: 1.966159:  80%|████████  | 40/50 [00:04<00:01,  9.47it/s]


Transfer from /root/LLM_project/codes/data/.tmp/generate/cryout/1-0.png to /root/LLM_project/codes/data/style/illustration_style/10.png done
content: cryout/nc0-0.png
style: 10.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/cryout/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/10.png
Building the style transfer model..


Style Loss : 0.699911 Content Loss: 2.372981:  80%|████████  | 40/50 [00:04<00:01,  9.47it/s] 


Transfer from /root/LLM_project/codes/data/.tmp/generate/cryout/nc0-0.png to /root/LLM_project/codes/data/style/illustration_style/10.png done
content: cryout/nc1-0.png
style: 10.png
Transferring from /root/LLM_project/codes/data/.tmp/generate/cryout/nc1-0.png to /root/LLM_project/codes/data/style/illustration_style/10.png
Building the style transfer model..


Style Loss : 0.850943 Content Loss: 3.052303:  80%|████████  | 40/50 [00:04<00:01,  9.47it/s] 


Transfer from /root/LLM_project/codes/data/.tmp/generate/cryout/nc1-0.png to /root/LLM_project/codes/data/style/illustration_style/10.png done


0
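
The file list printed at the start of the output above suggests a naming scheme of `<prompt index>-<image index>.png`, with an `nc` prefix for images without characters. This is inferred from the log, not from `generate/generate.py` itself. Under that assumption, the expected file names can be enumerated to check that nothing is missing; `expected_image_names` is a hypothetical helper.

```python
def expected_image_names(num_char, num_non_char, image_num):
    """List the image file names expected for one track.

    Hypothetical helper; the naming pattern is inferred from the log output.
    """
    names = [f'{t}-{i}.png' for t in range(num_char) for i in range(image_num)]
    names += [f'nc{t}-{i}.png' for t in range(num_non_char) for i in range(image_num)]
    return names


print(expected_image_names(2, 2, 1))
# -> ['0-0.png', '1-0.png', 'nc0-0.png', 'nc1-0.png']
```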