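Each stage is launched by calling its `.py` script in a separate process, so that CUDA memory is released automatically when that stage finishes.

This notebook organizes the code and runs the stages in order: Extract, Process, Generate, and Style transfer.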
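The cells below launch each stage with `os.system`. A minimal sketch of the same idea with `subprocess.run` follows; `run_stage` is a hypothetical helper, not part of this repository. Passing the arguments as a list avoids shell quoting problems, and `check=True` raises an exception when a stage fails instead of returning a status code that is easy to overlook.

```python
import subprocess
import sys


def run_stage(script, *args):
    """Run one pipeline stage in a fresh Python interpreter (hypothetical helper).

    When the child process exits, the OS reclaims all of its memory, including
    its CUDA context and GPU allocations, so the next stage starts clean.
    """
    # Build the command as a list so no shell is involved
    cmd = [sys.executable, script, *map(str, args)]
    # check=True raises subprocess.CalledProcessError on a non-zero exit code
    result = subprocess.run(cmd, check=True)
    return result.returncode
```

For example, the Extract cell could call `run_stage('extract/extract.py', '--model_path', MODEL_PATH, '--data_path', DATA_PATH, '--device_num', 4)` followed by its remaining flags.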

## Setup

### What you need to do

- Put the audio files into the `data/` directory

- Set `input_list` (and the matching `prompts`) in the input cell below

- Create a `.env` file in the `process/` directory if `LLM_MODEL` is `glm-4`

In [11]:
import os

In [12]:
DATA_PATH = os.getcwd() + '/data/'
MODEL_PATH = '/ssdshare/LLMs/'
MUSIC_PATH = os.getcwd() + '/data/music/'
LLM_MODEL = "glm-4"
GENRATE_MODEL = "playground-v2.5-1024px-aesthetic"  # image model for Generate (name kept as-is; used below)
CONTENT_PATH = DATA_PATH + '.tmp/generate/'  # generated images, input to Style transfer
STYLE_PATH = DATA_PATH + 'style/illustration_style/'

# Create .tmp/ and one working sub-directory per stage
# (exist_ok=True makes this cell safe to re-run; avoid shadowing the built-in `list`)
tmp_folders = ['extract/', 'generate/', 'process/', 'inprompt/', 'style_transfer/']
for folder in tmp_folders:
  os.makedirs(DATA_PATH + '.tmp/' + folder, exist_ok=True)



In [13]:
input_list = [
  'Protoflicker.mp3',
  'Nhelv.mp3',
]
# One user prompt per audio file, in the same order as input_list
prompts = [r'''
The name of this song is "Protoflicker".
''',
r'''
The name of this song is "Nhelv".
'''
]
# Pick the style images in the style library
style_list = [
  # 'opia.png'
]
num_char = 1      # prompts with characters per song (default 1)
num_non_char = 1  # prompts without characters per song (default 1)
image_num = 2     # images generated per prompt

# input_list and prompts must be edited together: prompts[i] describes input_list[i]
assert len(prompts) == len(input_list), "input_list and prompts must have the same length"

with open(DATA_PATH + 'input_list.txt', 'w') as f:
  for item in input_list:
    f.write("%s\n" % item)

with open(DATA_PATH + 'style_list.txt', 'w') as f:
  for item in style_list:
    f.write("%s\n" % item)

# Strip the file extension ('Protoflicker.mp3' -> 'Protoflicker'), whatever its length
input_list = [os.path.splitext(item)[0] for item in input_list]

# Save each user prompt as .tmp/inprompt/<name>.prompt (the folder is created in the cell above)
for prompt, name in zip(prompts, input_list):
  with open(DATA_PATH + '.tmp/inprompt/' + name + '.prompt', 'w') as f:
    f.write(prompt)

## Extract

In [14]:
os.system(f'python extract/extract.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --music_path {MUSIC_PATH} --output_path {DATA_PATH}.tmp/extract/ --device_num 4')

Protoflicker.mp3
Nhelv.mp3
['Protoflicker.wav', 'Nhelv.wav']
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.


The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:13<00:00,  1.46s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:17<00:00,  1.90s/it]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|██████████| 9/9 [00:32<00:00,  3.63s/it]
The model is automatic

using device 0
No second element found in split lyrics NOLYRICS:
using device 1
using device 2
using device 3
using device 0
using device 1
using device 2
successfully add prompt for Protoflicker.wav
using device 3
using device 0
using device 1
using device 2
using device 3
using device 0
using device 1
successfully add prompt for Nhelv.wav
This music is cut into 7 pieces. Each piece has a length of 30 seconds and an overlap of 5 seconds. The description of each piece is as follows:
Description piece 1: A powerful, energetic, motivational, uplifting, inspiring, powerful, motivational, energetic, uplifting, inspirational, dynamic, fresh, energetic, energetic, dynamic, fresh, uplifting, inspiring, motivational, powerful, dynamic, uplifting, fresh, dynamic, fresh, dynamic, energetic, uplifting, inspiring, dynamic, fresh, energetic, dynamic, fresh, uplifting, inspiring, motivational, powerful, dynamic, uplifting, fresh, dynamic, fresh, dynamic, energetic, uplifting, inspiring, motivational

0
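
The Extract log reports that a track is cut into 30-second pieces that overlap by 5 seconds. A minimal sketch of that windowing follows, assuming the hop between pieces is the window length minus the overlap and that the last piece is simply shortened to end with the track; `segment_windows` is a hypothetical name, and how `extract/extract.py` actually handles the tail is not shown here.

```python
def segment_windows(duration, window=30.0, overlap=5.0):
    """Return (start, end) times in seconds for overlapping pieces of a track.

    Assumes duration > 0 and overlap < window.
    """
    hop = window - overlap  # 25 s between consecutive piece starts
    windows = []
    start = 0.0
    while True:
        end = min(start + window, duration)  # last piece is clipped to the track
        windows.append((start, end))
        if end >= duration:
            break
        start += hop
    return windows
```

Under these assumptions a 180-second track yields 7 pieces, the last one spanning 150 s to 180 s; a track shorter than 30 seconds yields a single piece.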

## Process

In [16]:
os.system(f'python process/process.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {LLM_MODEL} --prompt_path {DATA_PATH}.tmp/extract/ --output_path {DATA_PATH}.tmp/process/ --num_char {num_char} --num_non_char {num_non_char}')

['Protoflicker', 'Nhelv']
Loading model
Model loaded
<class 'zhipuai._client.ZhipuAI'> <class 'NoneType'>
Token spent: 43302


0

In [17]:
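# Print <name>.prompt<t> (with characters) and <name>.prompt_nc<t> (without characters)
for file_name in input_list:
  for t in range(num_char):
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt' + str(t), 'r') as f:
      print(f.read())
  for t in range(num_non_char):
    # Open in text mode so the prompt prints as text rather than as a b'...' bytes literal
    with open(DATA_PATH + '.tmp/process/' + file_name + '.prompt_nc' + str(t), 'r') as f:
      print(f.read())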

dynamic and inspiring scene, a person standing on top of a mountain, looking up at the sky with arms outstretched, wearing casual outdoor clothing, a determined and hopeful expression, sun rays breaking through clouds, vibrant colors of orange, yellow, and blue, sense of triumph and aspiration, background with abstract shapes and patterns, geometric forms, 8k resolution, 16:9 aspect ratio, 60fps
vibrant cityscape at night, neon lights, dynamic silhouette of a person standing on top of a skyscraper, arms outstretched towards the stars, sense of ambition and determination, city lights reflecting off the windows, skyscrapers with futuristic designs, flying vehicles in the distance, glowing trails, starry sky above, energetic atmosphere, vivid colors, 8k resolution, 16:9 aspect ratio, 60fps
b'dynamic, energetic, uplifting, inspiring, powerful, motivational, fresh, dynamic, fresh, uplifting, inspiring, motivational, powerful, dynamic, fresh, dynamic, fresh, dynamic, energetic, uplifting, in
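
Before running Generate it can help to confirm that Process wrote every prompt file. A small sketch, assuming only the naming used in the cell above (`<name>.prompt<t>` and `<name>.prompt_nc<t>`); both helper names are hypothetical.

```python
import os


def expected_prompt_files(names, num_char, num_non_char):
    """List the prompt file names Process is expected to write for each song."""
    files = []
    for name in names:
        files += [f"{name}.prompt{t}" for t in range(num_char)]      # with characters
        files += [f"{name}.prompt_nc{t}" for t in range(num_non_char)]  # without characters
    return files


def missing_prompt_files(prompt_dir, names, num_char, num_non_char):
    """Return the expected prompt files that are not present in prompt_dir."""
    return [f for f in expected_prompt_files(names, num_char, num_non_char)
            if not os.path.isfile(os.path.join(prompt_dir, f))]
```

For example, `missing_prompt_files(DATA_PATH + '.tmp/process/', input_list, num_char, num_non_char)` should return an empty list before the Generate cell is run.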

## Generate

In [21]:
os.system(f'python generate/generate.py --model_path {MODEL_PATH} --data_path {DATA_PATH} --model {GENRATE_MODEL} --output_path {DATA_PATH}.tmp/generate/ --prompt_path {DATA_PATH}.tmp/process/ --image_num {image_num} --num_char {num_char} --num_non_char {num_non_char}')

Loading prompt from file
Protoflicker.prompt
Nhelv.prompt
Prompt loaded
Loading model


Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00,  4.57it/s]


Model loaded
Generating for Protoflicker.prompt


Token indices sequence length is longer than the specified maximum sequence length for this model (84 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (84 > 77). Running this sequence through the model will result in indexing errors
  return F.conv2d(input, weight, bias, self.stride,
100%|██████████| 50/50 [00:11<00:00,  4.37it/s]
100%|██████████| 50/50 [00:11<00:00,  4.49it/s]


Generated for Protoflicker.prompt
Generating for Nhelv.prompt


100%|██████████| 50/50 [00:11<00:00,  4.48it/s]
100%|██████████| 50/50 [00:11<00:00,  4.47it/s]


Generated for Nhelv.prompt
Loading prompt from file
Generating image without characters
Prompt loaded
Generating for Protoflicker.prompt_nc


100%|██████████| 50/50 [00:12<00:00,  3.96it/s]
100%|██████████| 50/50 [00:11<00:00,  4.46it/s]


Generated for Protoflicker.prompt_nc
Generating for Nhelv.prompt_nc


100%|██████████| 50/50 [00:11<00:00,  4.44it/s]
100%|██████████| 50/50 [00:11<00:00,  4.46it/s]


Generated for Nhelv.prompt_nc


0

## Style transfer
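
The traceback in this section shows that `style_transfer.py` lists `CONTENT_PATH/<name>` for each song and fails when that directory does not exist. The execution counters suggest this cell (In [19]) ran before the Generate cell shown above (In [21]), so `.tmp/generate/Nhelv` had likely not been produced yet at that point. A pre-flight check along these lines, with the hypothetical helper `missing_generated_dirs`, catches that before launching the script; treating an empty directory as missing is an extra assumption of this sketch.

```python
import os


def missing_generated_dirs(content_path, names):
    """Return the song names whose image directory under content_path is missing or empty."""
    missing = []
    for name in names:
        song_dir = os.path.join(content_path, name)
        # style_transfer.py calls os.listdir on this directory for every song
        if not os.path.isdir(song_dir) or not os.listdir(song_dir):
            missing.append(name)
    return missing
```

For example, `missing_generated_dirs(CONTENT_PATH, input_list)` should return `[]` before the style transfer cell is run.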

In [19]:
os.system(f'python style_transfer/style_transfer.py --data_path {DATA_PATH} --output_path {DATA_PATH}.tmp/style_transfer/ --style_path {STYLE_PATH} --content_path {CONTENT_PATH} -l_o --num_char {num_char} --num_non_char {num_non_char}')

  warn(


['0-0.png', '0-1.png']


Traceback (most recent call last):
  File "/root/LLM_project/codes/style_transfer/style_transfer.py", line 64, in <module>
    print(os.listdir(CONTENT_PATH+"/"+x))
FileNotFoundError: [Errno 2] No such file or directory: '/root/LLM_project/codes/data/.tmp/generate//Nhelv'


256
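
The `256` printed at the end is the return value of `os.system`, not the script's exit code. On POSIX systems `os.system` returns the raw wait status, in which the exit code sits in the high byte, so exit code 1 (from the uncaught `FileNotFoundError`) appears as 1 << 8 = 256, while the `0` after the earlier cells means success. A small sketch for decoding it; `exit_code` is a hypothetical helper and is only meaningful on POSIX.

```python
import os


def exit_code(status):
    """Convert the value returned by os.system() on POSIX into the child's exit code."""
    if hasattr(os, "waitstatus_to_exitcode"):  # available since Python 3.9
        return os.waitstatus_to_exitcode(status)
    # Older Pythons: extract the exit code from the wait status directly
    return os.WEXITSTATUS(status)
```

For example, `exit_code(256)` is `1` and `exit_code(0)` is `0`.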