Sounakray2003/issue91 #92

Open
wants to merge 78 commits into base: dongchao
609d4bb
Add WIP items for the project
ftshijt Mar 28, 2023
8d0203b
Merge pull request #3 from ftshijt/main
Rongjiehuang Mar 28, 2023
f28a3c5
update
Rongjiehuang Mar 28, 2023
fb52b59
update
lmzjms Mar 28, 2023
0b56c58
Merge pull request #4 from lmzjms/main
Rongjiehuang Mar 28, 2023
7099521
Revert "update"
Rongjiehuang Mar 28, 2023
f89ee96
Merge pull request #5 from AIGC-Audio/revert-4-main
Rongjiehuang Mar 28, 2023
03f41ec
update
lmzjms Mar 28, 2023
fd6a33c
update
lmzjms Mar 28, 2023
c97c3f7
Merge pull request #6 from lmzjms/main
Rongjiehuang Mar 28, 2023
a869dce
Update README.md
yangdongchao Mar 29, 2023
7492e97
update
lmzjms Mar 30, 2023
819b3c4
update
lmzjms Mar 30, 2023
2d1dd31
update
lmzjms Mar 31, 2023
cc0c224
merge tts and t2s into NeuralSeq
PeppaPiggeee Mar 31, 2023
6c6e077
update
PeppaPiggeee Mar 31, 2023
3fbfbf9
update
PeppaPiggeee Mar 31, 2023
7094f94
update
PeppaPiggeee Mar 31, 2023
c7e6dbe
update
PeppaPiggeee Mar 31, 2023
3131ec4
Merge pull request #8 from lmzjms/main
Rongjiehuang Mar 31, 2023
f6c8cbe
Update README.md
Rongjiehuang Mar 31, 2023
7a315a8
update
PeppaPiggeee Apr 1, 2023
a23e0f3
Merge branch 'main' into hzq
PeppaPiggeee Apr 1, 2023
3be5167
update
PeppaPiggeee Apr 1, 2023
7a709a6
update
PeppaPiggeee Apr 2, 2023
8569aa0
Merge pull request #9 from AIGC-Audio/hzq
Rongjiehuang Apr 2, 2023
9e2a24b
delect cache
Rongjiehuang Apr 2, 2023
ff82f7a
delect cache
Rongjiehuang Apr 2, 2023
9b4a830
Merge pull request #10 from Rongjiehuang/main
Rongjiehuang Apr 2, 2023
1a69271
cleaning
Rongjiehuang Apr 3, 2023
7ee017c
Update README.md
yangdongchao Apr 4, 2023
322ed8c
detection and extraction
yangdongchao Apr 5, 2023
5cfa061
Merge pull request #12 from AIGC-Audio/ydc
Rongjiehuang Apr 6, 2023
e3a7194
fix e
yangdongchao Apr 6, 2023
3c47c2c
Merge pull request #13 from AIGC-Audio/ydc
Rongjiehuang Apr 6, 2023
112d87b
Merge branch 'main' of github.com:Rongjiehuang/AudioGPT
Rongjiehuang Apr 6, 2023
2da3ccd
update huggingface
Rongjiehuang Apr 6, 2023
e8fdbbf
update huggingface
Rongjiehuang Apr 6, 2023
514f233
add assets
yangdongchao Apr 9, 2023
0dff745
Merge pull request #14 from AIGC-Audio/ydc
yangdongchao Apr 9, 2023
f3cf2be
update
Rongjiehuang Apr 9, 2023
028bf0c
Merge branch 'main' of github.com:Rongjiehuang/AudioGPT
Rongjiehuang Apr 9, 2023
69aca79
update
Rongjiehuang Apr 9, 2023
236a2aa
Merge pull request #15 from Rongjiehuang/main
Rongjiehuang Apr 9, 2023
963fb77
Add Visinger
A-Quarter-Mile Apr 10, 2023
f5c6c4c
update
lmzjms Apr 11, 2023
4d14f89
update
lmzjms Apr 11, 2023
e2b06d3
Merge pull request #16 from A-Quarter-Mile/main
Rongjiehuang Apr 11, 2023
181bcee
add enh / ss
simpleoier Apr 11, 2023
9d9ad78
Merge pull request #18 from simpleoier/enh_ss
Rongjiehuang Apr 11, 2023
84a5493
update
lmzjms Apr 11, 2023
ea246e7
update
lmzjms Apr 11, 2023
e03a456
update
lmzjms Apr 11, 2023
70d54b5
update
lmzjms Apr 11, 2023
cb62a28
Merge pull request #20 from lmzjms/main
Rongjiehuang Apr 12, 2023
aab80e0
clean some codes
Rongjiehuang Apr 12, 2023
8975378
Merge pull request #21 from Rongjiehuang/main
Rongjiehuang Apr 12, 2023
7c6f83a
update
lmzjms Apr 13, 2023
34d0365
Merge branch 'main' of github.com:lmzjms/AudioGPT into main
lmzjms Apr 13, 2023
209995f
update
lmzjms Apr 13, 2023
46e0dbe
update
lmzjms Apr 13, 2023
d218ef7
Merge pull request #22 from lmzjms/main
Rongjiehuang Apr 13, 2023
89d47f0
clean
Rongjiehuang Apr 16, 2023
a06c041
Merge branch 'main' of github.com:Rongjiehuang/AudioGPT
Rongjiehuang Apr 16, 2023
b7ef7f0
Update README.md
MoonInTheRiver Apr 18, 2023
1c4b42f
Update README.md
RayeRen Apr 21, 2023
7ecef2b
update
Rongjiehuang Apr 26, 2023
4a0a02e
Merge branch 'AIGC-Audio:main' into main
Rongjiehuang Apr 26, 2023
36b86ad
Merge pull request #23 from Rongjiehuang/main
Rongjiehuang Apr 26, 2023
ed28e06
update
Rongjiehuang Apr 26, 2023
d1c2e98
Merge branch 'main' of github.com:Rongjiehuang/AudioGPT
Rongjiehuang Apr 26, 2023
afbc05e
Merge branch 'main' of github.com:Rongjiehuang/AudioGPT
Rongjiehuang Apr 26, 2023
97a9a2f
Refine readme
Rongjiehuang Apr 26, 2023
9b6d51d
update
lmzjms Apr 30, 2023
79fe509
update
lmzjms Apr 30, 2023
526a05a
update
lmzjms Apr 30, 2023
f61a97c
update
lmzjms Apr 30, 2023
148737e
Merge pull request #40 from lmzjms/main
lmzjms Apr 30, 2023
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

# Byte-compiled / optimized / DLL files
*__pycache__/
__pycache__/
*.py[cod]
*$py.class

Expand Down
File renamed without changes.
File renamed without changes.
1 change: 1 addition & 0 deletions NeuralSeq/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
In this directory, we support FastSpeech, GenerSpeech, SyntaSpeech, and DiffSinger.
Original file line number Diff line number Diff line change
Expand Up @@ -110,11 +110,6 @@ def process(self):
f.writelines([f'{l}\n' for l in mfa_dict])
with open(f"{processed_dir}/{self.meta_csv_filename}.json", 'w') as f:
f.write(re.sub(r'\n\s+([\d+\]])', r'\1', json.dumps(items, ensure_ascii=False, sort_keys=False, indent=1)))

# save to csv
meta_df = pd.DataFrame(items)
meta_df.to_csv(f"{processed_dir}/metadata_phone.csv")

remove_file(wav_processed_tmp_dir)


Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,8 @@
from nltk import pos_tag
from nltk.tokenize import TweetTokenizer

from text_to_speech.data_gen.tts.txt_processors.base_text_processor import BaseTxtProcessor, register_txt_processors
from text_to_speech.utils.text.text_encoder import PUNCS, is_sil_phoneme

from data_gen.tts.txt_processors.base_text_processor import BaseTxtProcessor, register_txt_processors
from data_gen.tts.data_gen_utils import is_sil_phoneme, PUNCS

class EnG2p(G2p):
word_tokenize = TweetTokenizer().tokenize
Expand Down Expand Up @@ -75,4 +74,4 @@ def process(cls, txt, preprocess_args):
else:
txt_struct[i_word][1].append(p)
txt_struct = cls.postprocess(txt_struct, preprocess_args)
return txt_struct, txt
return txt_struct, txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import re
import jieba
from pypinyin import pinyin, Style
from data_gen.tts.data_gen_utils import PUNCS
from data_gen.tts.txt_processors.base_text_processor import BaseTxtProcessor
Expand All @@ -20,6 +21,7 @@ def preprocess_text(text):
text = re.sub(f"([{PUNCS}])+", r"\1", text) # !! -> !
text = re.sub(f"([{PUNCS}])", r" \1 ", text)
text = re.sub(rf"\s+", r"", text)
text = re.sub(rf"[A-Za-z]+", r"$", text)
return text

@classmethod
Expand Down
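The added substitution runs after whitespace has already been stripped, so a run of Latin words separated by spaces collapses into a single `$` placeholder rather than one per word. A minimal sketch of the four visible substitutions, assuming `PUNCS` is the ASCII punctuation set `!,.?;:` (the real value lives in `data_gen.tts.data_gen_utils` and is not shown in this diff); the function name `normalize_tail` is hypothetical:

```python
import re

# Assumption: the real PUNCS is imported from data_gen.tts.data_gen_utils.
PUNCS = "!,.?;:"


def normalize_tail(text):
    """Replay the four substitutions visible in the preprocess_text hunk."""
    text = re.sub(f"([{PUNCS}])+", r"\1", text)   # collapse repeats: !! -> !
    text = re.sub(f"([{PUNCS}])", r" \1 ", text)  # pad punctuation with spaces
    text = re.sub(r"\s+", "", text)               # then drop all whitespace
    text = re.sub(r"[A-Za-z]+", "$", text)        # Latin runs -> "$" placeholder
    return text


example = normalize_tail("你好!!hello world。")
print(example)
```

Because the whitespace removal comes first, `hello world` is seen as the single run `helloworld` by the last substitution.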
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,10 @@
import subprocess
import librosa
import numpy as np
from text_to_speech.data_gen.tts.wav_processors.base_processor import BaseWavProcessor, register_wav_processors
from text_to_speech.utils.audio import trim_long_silences
from text_to_speech.utils.audio.io import save_wav
from text_to_speech.utils.audio.rnnoise import rnnoise
from text_to_speech.utils.commons.hparams import hparams
from data_gen.tts.wav_processors.base_processor import BaseWavProcessor, register_wav_processors
from data_gen.tts.data_gen_utils import trim_long_silences
from utils.audio import save_wav, rnnoise
from utils.hparams import hparams


@register_wav_processors(name='sox_to_wav')
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from text_to_speech.data_gen.tts.base_preprocess import BasePreprocessor
from data_gen.tts.base_preprocess import BasePreprocessor


class LJPreprocess(BasePreprocessor):
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
task_cls: usr.task.DiffFsTask
task_cls: tasks.svs.task.DiffFsTask
pitch_type: frame
timesteps: 100
dilation_cycle_length: 1
Expand Down
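The `task_cls` values in these configs are dotted paths (module path plus class name), which is why moving the task modules out of `usr` means touching every config that names one. A minimal sketch of how such a string can be resolved, assuming the loader splits on the last dot and imports the module with `importlib`; the `resolve_class` helper is hypothetical and not taken from this repository:

```python
import importlib


def resolve_class(dotted_path):
    """Turn 'package.module.ClassName' into the class object it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'package.module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)  # raises if the module moved
    return getattr(module, class_name)             # raises if the class is missing


# A stdlib class stands in for a task class such as tasks.svs.task.DiffFsTask.
task_cls = resolve_class("collections.OrderedDict")
print(task_cls.__name__)
```

With this style of lookup, a stale `usr.` prefix only fails when the config is actually loaded, so each renamed path is worth loading once after the move.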
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ spec_max: [ -0.5982, -0.0778, 0.1205, 0.2747, 0.4657, 0.5123, 0.5684, 0.70
0.0086, -0.0698, 0.1385, 0.0941, 0.1864, 0.1225, 0.2176, 0.2566,
0.1670, 0.1007, 0.1444, 0.0888, 0.1998, 0.2414, 0.2932, 0.3047 ]

task_cls: usr.diffspeech_task.DiffSpeechTask
task_cls: tasks.svs.diffspeech_task.DiffSpeechTask
vocoder: vocoders.hifigan.HifiGAN
vocoder_ckpt: checkpoints/0414_hifi_lj_1
num_valid_plots: 10
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
base_config:
- configs/singing/fs2.yaml
- usr/configs/midi/cascade/opencs/opencpop_statis.yaml
- egs/egs_bases/svs/midi/cascade/opencs/opencpop_statis.yaml

audio_sample_rate: 24000
hop_size: 128 # Hop size.
Expand Down Expand Up @@ -42,8 +42,8 @@ test_prefixes: [
'2100',
]

task_cls: usr.diffsinger_task.AuxDecoderMIDITask
#vocoder: usr.singingvocoder.highgan.HighGAN
task_cls: tasks.svs.diffsinger_task.AuxDecoderMIDITask
#vocoder: tasks.svs.singingvocoder.highgan.HighGAN
#vocoder_ckpt: checkpoints/h_2_model/checkpoint-530000steps.pkl
vocoder: vocoders.hifigan.HifiGAN
vocoder_ckpt: checkpoints/0109_hifigan_bigpopcs_hop128
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
base_config:
- usr/configs/popcs_ds_beta6.yaml
- usr/configs/midi/cascade/opencs/opencpop_statis.yaml
- egs/egs_bases/svs/popcs_ds_beta6.yaml
- egs/egs_bases/svs/midi/cascade/opencs/opencpop_statis.yaml

binarizer_cls: data_gen.singing.binarize.OpencpopBinarizer
binary_data_dir: 'data/binary/opencpop-midi-dp'
Expand All @@ -21,7 +21,7 @@ pe_ckpt: ''

fs2_ckpt: 'checkpoints/0302_opencpop_fs_midi/model_ckpt_steps_160000.ckpt' #
#num_valid_plots: 0
task_cls: usr.diffsinger_task.DiffSingerMIDITask
task_cls: tasks.svs.diffsinger_task.DiffSingerMIDITask

K_step: 60
max_tokens: 36000
Expand Down
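Each of these files lists its parents under `base_config`, so a renamed parent path breaks every child that inherits from it. A minimal sketch of that inheritance, assuming parents are merged in list order and the child's own keys win; the real loader is not part of this diff, nested-dict merging is omitted, and the file names and values below are illustrative only:

```python
def resolve_config(name, configs, _seen=()):
    """Flatten a config and its base_config chain into one dict."""
    if name in _seen:
        raise ValueError(f"circular base_config chain at {name!r}")
    raw = dict(configs[name])            # KeyError here means a stale path
    bases = raw.pop("base_config", [])
    if isinstance(bases, str):           # a single base may be a bare string
        bases = [bases]
    merged = {}
    for base in bases:                   # later bases override earlier ones
        merged.update(resolve_config(base, configs, _seen + (name,)))
    merged.update(raw)                   # the child's own keys win
    return merged


# In-memory stand-ins for YAML files (hypothetical names and values).
configs = {
    "base_a.yaml": {"timesteps": 100, "max_tokens": 20000},
    "base_b.yaml": {"max_tokens": 30000},
    "child.yaml": {"base_config": ["base_a.yaml", "base_b.yaml"], "K_step": 60},
}
resolved = resolve_config("child.yaml", configs)
print(resolved)
```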
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
base_config:
- usr/configs/popcs_ds_beta6.yaml
- usr/configs/midi/cascade/opencs/opencpop_statis.yaml
- egs/egs_bases/svs/popcs_ds_beta6.yaml
- egs/egs_bases/svs/midi/cascade/opencs/opencpop_statis.yaml

binarizer_cls: data_gen.singing.binarize.OpencpopBinarizer
binary_data_dir: 'data/binary/opencpop-midi-dp'
Expand All @@ -17,7 +17,7 @@ dur_predictor_layers: 5 # *

fs2_ckpt: '' #
#num_valid_plots: 0
task_cls: usr.diffsinger_task.DiffSingerMIDITask
task_cls: tasks.svs.diffsinger_task.DiffSingerMIDITask

timesteps: 1000
K_step: 1000
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
base_config:
- usr/configs/popcs_ds_beta6.yaml
- usr/configs/midi/cascade/opencs/opencpop_statis.yaml
- egs/egs_bases/svs/popcs_ds_beta6.yaml
- egs/egs_bases/svs/midi/cascade/opencs/opencpop_statis.yaml

binarizer_cls: data_gen.singing.binarize.OpencpopBinarizer
binary_data_dir: 'data/binary/opencpop-midi-dp'
Expand All @@ -17,7 +17,7 @@ dur_predictor_layers: 5 # *

fs2_ckpt: '' #
#num_valid_plots: 0
task_cls: usr.diffsinger_task.DiffSingerMIDITask
task_cls: tasks.svs.diffsinger_task.DiffSingerMIDITask

# for diffusion schedule
timesteps: 1000
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
base_config:
- usr/configs/popcs_ds_beta6.yaml
- usr/configs/midi/cascade/opencs/opencpop_statis.yaml
- egs/egs_bases/svs/popcs_ds_beta6.yaml
- egs/egs_bases/svs/midi/cascade/opencs/opencpop_statis.yaml

binarizer_cls: data_gen.singing.binarize.OpencpopBinarizer
binary_data_dir: 'data/binary/opencpop-midi-dp'
Expand All @@ -17,7 +17,7 @@ dur_predictor_layers: 5 # *

fs2_ckpt: '' #
#num_valid_plots: 0
task_cls: usr.diffsinger_task.DiffSingerMIDITask
task_cls: tasks.svs.diffsinger_task.DiffSingerMIDITask

K_step: 100
max_tokens: 36000
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
base_config:
- usr/configs/popcs_ds_beta6.yaml
- usr/configs/midi/cascade/popcs/popcs_statis.yaml
- egs/egs_bases/svs/popcs_ds_beta6.yaml
- egs/egs_bases/svs/midi/cascade/popcs/popcs_statis.yaml

binarizer_cls: data_gen.singing.binarize.MidiSingingBinarizer
binary_data_dir: 'data/binary/popcs-midi-dp'
Expand All @@ -17,7 +17,7 @@ dur_predictor_layers: 5 # *

fs2_ckpt: '' #
#num_valid_plots: 0
task_cls: usr.diffsinger_task.DiffSingerMIDITask
task_cls: tasks.svs.diffsinger_task.DiffSingerMIDITask

K_step: 100
max_tokens: 40000
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,8 +48,8 @@ spec_max: [ 0.2645, 0.0583, -0.2344, -0.0184, 0.1227, 0.1533, 0.1103, 0.121
-0.8770, -0.9520, -0.8749, -0.7297, -0.8374, -0.8667, -0.7157, -0.9035,
-0.9219, -0.8801, -0.9298, -0.9009, -0.9604, -1.0537, -1.0781, -1.3766]

task_cls: usr.diffsinger_task.DiffSingerTask
#vocoder: usr.singingvocoder.highgan.HighGAN
task_cls: tasks.svs.diffsinger_task.DiffSingerTask
#vocoder: tasks.svs.singingvocoder.highgan.HighGAN
#vocoder_ckpt: checkpoints/h_2_model/checkpoint-530000steps.pkl
vocoder: vocoders.hifigan.HifiGAN
vocoder_ckpt: checkpoints/0109_hifigan_bigpopcs_hop128
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ base_config:

fs2_ckpt: checkpoints/popcs_fs2_pmf0_1230/model_ckpt_steps_160000.ckpt # to be infer
num_valid_plots: 0
task_cls: usr.diffsinger_task.DiffSingerOfflineTask
task_cls: tasks.svs.diffsinger_task.DiffSingerOfflineTask

# tmp:
#pe_enable: true
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ test_prefixes: [
]

task_cls: tasks.tts.fs2.FastSpeech2Task
#vocoder: usr.singingvocoder.highgan.HighGAN
#vocoder: tasks.svs.singingvocoder.highgan.HighGAN
#vocoder_ckpt: checkpoints/h_2_model/checkpoint-530000steps.pkl
vocoder: vocoders.hifigan.HifiGAN
vocoder_ckpt: checkpoints/0109_hifigan_bigpopcs_hop128
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
base_config: ./fs.yaml
base_config: ./fs2.yaml

###########################
# models
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
base_config: ./ps.yaml
base_config: ./ps2.yaml
task_cls: tasks.tts.ps_flow.PortaSpeechFlowTask

use_post_flow: true
Expand Down
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,11 +1,9 @@
import torch
# from inference.tts.fs import FastSpeechInfer
# from modules.tts.fs2_orig import FastSpeech2Orig
from inference.svs.base_svs_infer import BaseSVSInfer
from utils import load_ckpt
from utils.hparams import hparams
from usr.diff.shallow_diffusion_tts import GaussianDiffusion
from usr.diffsinger_task import DIFF_DECODERS
from modules.diff.shallow_diffusion_tts import GaussianDiffusion
from tasks.svs.diffsinger_task import DIFF_DECODERS

class DiffSingerCascadeInfer(BaseSVSInfer):
def build_model(self):
Expand Down Expand Up @@ -53,4 +51,4 @@ def forward_model(self, inp):
} # input like Opencpop dataset.
DiffSingerCascadeInfer.example_run(inp)

# # CUDA_VISIBLE_DEVICES=1 python inference/svs/ds_cascade.py --config usr/configs/midi/cascade/opencs/ds60_rel.yaml --exp_name 0303_opencpop_ds58_midi
# # CUDA_VISIBLE_DEVICES=1 python inference/svs/ds_cascade.py --config egs/egs_bases/svs/midi/cascade/opencs/ds60_rel.yaml --exp_name 0303_opencpop_ds58_midi
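A rename that touches this many import lines is easy to get subtly wrong, and a misspelled package name only surfaces when the module is first imported. A minimal sketch of a static check, assuming it is enough to confirm that the top-level package of each absolute import can be located; the helper name `unresolved_imports` and the package name in the usage line are hypothetical:

```python
import ast
import importlib.util


def unresolved_imports(source):
    """Return absolute imports whose top-level package cannot be found."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            # find_spec returns None for a top-level name that is not importable.
            if importlib.util.find_spec(root) is None:
                missing.append(name)
    return missing


snippet = "import json\nfrom no_such_pkg_xyz.diff import Thing\n"
print(unresolved_imports(snippet))
```

The check only looks at the first path component, so it will not catch a wrong submodule inside an existing package; importing the module in a smoke test covers that case.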
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@
from inference.svs.base_svs_infer import BaseSVSInfer
from utils import load_ckpt
from utils.hparams import hparams
from usr.diff.shallow_diffusion_tts import GaussianDiffusion
from usr.diffsinger_task import DIFF_DECODERS
from modules.diff.shallow_diffusion_tts import GaussianDiffusion
from tasks.svs.diffsinger_task import DIFF_DECODERS
from modules.fastspeech.pe import PitchExtractor
import utils

Expand Down Expand Up @@ -64,4 +64,4 @@ def forward_model(self, inp):
DiffSingerE2EInfer.example_run(inp)


# CUDA_VISIBLE_DEVICES=3 python inference/svs/ds_e2e.py --config usr/configs/midi/e2e/opencpop/ds100_adj_rel.yaml --exp_name 0228_opencpop_ds100_rel
# CUDA_VISIBLE_DEVICES=3 python inference/svs/ds_e2e.py --config egs/egs_bases/svs/midi/e2e/opencpop/ds100_adj_rel.yaml --exp_name 0228_opencpop_ds100_rel
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
def cpop_pinyin2ph_func():
# In the README file of opencpop dataset, they defined a "pinyin to phoneme mapping table"
pinyin2phs = {'AP': 'AP', 'SP': 'SP'}
with open('text_to_sing/DiffSinger/inference/svs/opencpop/cpop_pinyin2ph.txt') as rf:
with open('NeuralSeq/inference/svs/opencpop/cpop_pinyin2ph.txt') as rf:
for line in rf.readlines():
elements = [x.strip() for x in line.split('|') if x.strip() != '']
pinyin2phs[elements[0]] = elements[1]
Expand Down
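The hard-coded path above is resolved against the current working directory, so the function only works when the process is started from the directory that contains `NeuralSeq/`. A minimal sketch of the same parsing with the table location passed in, assuming each line of the table has the form `| pinyin | phonemes |`; the names `parse_pinyin2ph` and `load_pinyin2ph` are hypothetical:

```python
from pathlib import Path


def parse_pinyin2ph(lines):
    """Build the pinyin -> phoneme-string map from '|'-separated lines."""
    pinyin2phs = {'AP': 'AP', 'SP': 'SP'}
    for line in lines:
        elements = [x.strip() for x in line.split('|') if x.strip() != '']
        if len(elements) < 2:   # skip blank or malformed lines
            continue
        pinyin2phs[elements[0]] = elements[1]
    return pinyin2phs


def load_pinyin2ph(table_path):
    """Read the table from an explicit path instead of a cwd-relative one."""
    with Path(table_path).open(encoding='utf-8') as rf:
        return parse_pinyin2ph(rf)


table = parse_pinyin2ph(["| a | a |", "| ba | b a |", ""])
print(table)
```

A caller could pass `Path(__file__).parent / 'cpop_pinyin2ph.txt'` so the lookup no longer depends on where the script is launched from.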